Wanted: A handy metric for gauging if GPUs are being used optimally

GPU accelerators used in AI processing are costly items, so making sure you get the best usage out of them ought to be a priority, yet the industry lacks an effective way of measuring this, says the Uptime Institute.

According to some sources, the cost of an Nvidia H100 can be anywhere from $27,000 to $40,000, while renting via a cloud provider costs, for example, $6.98 per hour for an H100 instance on Microsoft's Azure platform. That's just for a single GPU, and naturally AI training will often require many more.

Users want to keep those units working as efficiently as possible. However, research literature, disclosures by AI cluster operators, and model benchmarks all suggest that GPU resources are often wasted, Uptime says in a new report, "GPU utilization is a confusing metric."

Many AI development teams are also unaware of their actual GPU utilization, often assuming higher levels than those achieved in practice.

Uptime, which created the Tier classification levels for datacenters, says GPU servers engaged in training are only operational about 80 percent of the time, and while running, even well-optimized models are likely to use only 35 to 45 percent of the compute performance the silicon can deliver.

Having a simple usage metric for GPUs would be a boon for the industry, writes the report author, research analyst (and former Reg staffer) Max Smolaks. But, he says, GPUs are not comparable with other server components and require fresh ways of accounting for performance.

Current ways of tracking accelerator utilization include monitoring the average operational time for the entire server node, or tracking individual GPU load via tools supplied by the hardware provider itself, typically Nvidia or AMD.

The first method is of limited use to datacenter operators, although it can indicate a cluster's overall power consumption over time. The second is the most commonly used, the report says, but not always the best metric for understanding GPU efficiency, as the tools typically measure what proportion of processing elements on the chip are executing at a given time and take no account of the actual work being done.

A better method, according to Uptime, is model FLOPS (floating point operations per second) utilization, or MFU. This tracks the ratio of the observed performance of the model (measured in tokens per second) to the theoretical maximum performance of the underlying hardware, with a higher MFU equating to higher efficiency, which means shorter (and therefore less costly) training runs.

The downside is that this metric, introduced by Google Research, is difficult to calculate and the resultant figures may appear puzzlingly low, with even well-optimized models only delivering between 35 and 45 percent MFU.

This is because performance is impacted by factors such as network latency and storage throughput, which means a 100 percent score is unachievable in practice; results above 50 percent represent the current pinnacle.
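The MFU ratio described above can be sketched in a few lines. This is an illustrative calculation only, not code from the report: the function names and the sample figures (FLOPs per token, throughput, and a 989 TFLOPS peak, roughly the BF16 dense figure quoted for an H100 SXM) are assumptions chosen to show how achieved throughput maps to a utilization percentage.

```python
def model_flops_utilization(tokens_per_sec: float,
                            flops_per_token: float,
                            peak_flops: float) -> float:
    """Ratio of the model's achieved FLOPS to the hardware's theoretical peak.

    Achieved FLOPS = observed throughput (tokens/s) x compute cost per token.
    """
    achieved_flops = tokens_per_sec * flops_per_token
    return achieved_flops / peak_flops


# Hypothetical training run: 50,000 tokens/s on a model costing ~6e9 FLOPs
# per token, against a 989 TFLOPS (BF16 dense) hardware peak.
mfu = model_flops_utilization(tokens_per_sec=50_000,
                              flops_per_token=6e9,
                              peak_flops=989e12)
print(f"MFU: {mfu:.1%}")  # prints "MFU: 30.3%"
```

With these made-up inputs the run lands at roughly 30 percent MFU, comfortably inside the 35-to-45-percent band the report describes as well-optimized only if throughput improves, which illustrates why the raw figures can look "puzzlingly low" to teams expecting numbers near 100.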

Uptime concludes there is currently no entirely satisfactory metric to gauge whether GPU resources are being used effectively, but that MFU shows promise, particularly as it has a more-or-less direct relationship with power consumption.

More data gathered from real-world deployments is needed to establish what "good" looks like for an efficient AI cluster, the report states, but many organizations treat this information as proprietary and therefore keep it to themselves. ®