Hugging Face puts the squeeze on Nvidia's software ambitions

Hugging Face this week announced HUGS, its answer to Nvidia's Inference Microservices (NIMs), which the AI repo claims will let customers deploy and run LLMs and other models on a much wider variety of hardware.

Like Nvidia's previously announced NIMs, Hugging Face Generative AI Services (HUGS) are essentially just containerized model images that contain everything a user might need to deploy the model. The idea is that rather than having to futz with vLLM or TensorRT LLM to get a large language model running optimally at scale, users can instead spin up a preconfigured container image in Docker or Kubernetes and connect to it via standard OpenAI API calls.
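Because the containers speak the standard OpenAI API, any OpenAI-compatible client can be pointed at them once they're running. Here's a minimal sketch of what such a request looks like; the endpoint address and model name are illustrative assumptions, not confirmed HUGS defaults:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload.

    A HUGS (or NIM) container exposes the familiar chat-completions
    interface, so the payload shape is the same one the OpenAI API uses.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Model name and local endpoint below are assumptions for illustration.
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
body = json.dumps(payload)

# You would then POST `body` to the running container, e.g.:
#   http://localhost:8080/v1/chat/completions
# with the header Content-Type: application/json
```

The point is that no client-side code changes when you swap the container out for another backend that speaks the same API.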

HUGS are built around Hugging Face's open source Text Generation Inference (TGI) and Transformers frameworks and libraries, which means they can be deployed on a variety of hardware platforms, including Nvidia and AMD GPUs, with support eventually extending to more specialized AI accelerators like Amazon's Inferentia or Google's TPUs. Apparently there's no love for Intel Gaudi just yet.

Despite being based on open source technologies, HUGS, like NIMs, aren't free. If deployed in AWS or Google Cloud, they'll run you about $1 an hour per container.

For comparison, Nvidia charges $1 per hour per GPU for NIMs deployed in the cloud or $4,500 a year per GPU on-prem. If you're deploying a larger model, say Meta's Llama 3.1 405B, that spans eight GPUs, Hugging Face's offering will be significantly less expensive to deploy. What's more, support for alternative hardware types means customers won't be limited to Nvidia's hardware ecosystem.
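The gap between per-GPU and per-container pricing is easy to quantify. A back-of-the-envelope sketch using the figures above, assuming an eight-GPU deployment running around the clock (cloud compute costs, which apply in both cases, are excluded):

```python
GPUS = 8
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

# Nvidia NIM: $1/hour per GPU in the cloud, or $4,500/year per GPU on-prem.
nim_cloud_yearly = 1.00 * GPUS * HOURS_PER_YEAR
nim_onprem_yearly = 4_500 * GPUS

# Hugging Face HUGS: roughly $1/hour per container, regardless of how many
# GPUs that container spans.
hugs_cloud_yearly = 1.00 * HOURS_PER_YEAR

print(f"NIM cloud:    ${nim_cloud_yearly:,.0f}/yr")   # $70,080/yr
print(f"NIM on-prem:  ${nim_onprem_yearly:,.0f}/yr")  # $36,000/yr
print(f"HUGS cloud:   ${hugs_cloud_yearly:,.0f}/yr")  # $8,760/yr
```

For a model like Llama 3.1 405B spanning eight GPUs, the per-container pricing works out to roughly an eighth of Nvidia's cloud rate.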

Whether HUGS will be more performant or better optimized than NIMs remains to be seen.

For those looking to deploy HUGS at a smaller scale, Hugging Face will also make the images available on DigitalOcean's cloud platform at no additional cost, but you'll still have to pay for the compute.

DigitalOcean recently announced the availability of GPU-accelerated VMs based on Nvidia's H100 accelerators, which will run you between $2.50 and $6.74 per hour per GPU, depending on whether you opt for a single accelerator or sign a 12-month commitment for eight.

Finally, subscribers shelling out $20 a month per user for Hugging Face's Enterprise Hub will have the option to deploy HUGS on their own infrastructure.

In terms of models, Hugging Face is fairly conservative, focusing on some of the most popular open models, including:

We expect Hugging Face will quickly expand support to additional models, such as Microsoft's Phi series of LLMs.

But if paying for what is essentially a bundle of open source software and model files doesn't strike your fancy, nothing stops you from building your own containerized models using vLLM, Llama.cpp, TGI, or TensorRT LLM. You can find our hands-on guide on containerizing AI apps here.

With that said, what you're really paying for with Hugging Face's HUGS, or Nvidia's NIMs for that matter, is the time and effort spent tuning and optimizing the containers for maximum performance. ®
