Microsoft Azure to spin up AMD MI200 GPU clusters for 'large scale' AI training

Microsoft Build Microsoft Azure on Thursday revealed it will use AMD's top-tier MI200 Instinct GPUs to perform "large-scale" AI training in the cloud.

"Azure will be the first public cloud to deploy clusters of AMD's flagship MI200 GPUs for large-scale AI training," Microsoft CTO Kevin Scott said during the company's Build conference this week. "We've already started testing these clusters using some of our own AI workloads with great performance."

AMD launched its MI200-series GPUs at its Accelerated Datacenter event last fall. The GPUs are based on AMD's CDNA2 architecture and pack 58 billion transistors and up to 128GB of high-bandwidth memory into a dual-die package.

At launch, AMD's Forrest Norrod, SVP and GM of datacenter and embedded solutions, claimed the chips were nearly 5X faster than Nvidia's then top-tier A100 GPU, at least in "highly precise" FP64 calculations. That lead narrowed substantially in more common FP16 workloads, where AMD claims the chips are about 20 percent faster than the A100 from Nvidia, the dominant datacenter GPU player.

However, it remains to be seen if and when Azure instances based on AMD's GPUs will become generally available, or whether Microsoft plans to reserve them for internal workloads.

What we do know is Microsoft is working closely with AMD to optimize the chipmaker's GPUs for PyTorch machine learning workloads.

"We're also deepening our investments in the open-source PyTorch framework, working with the PyTorch core team and AMD both to optimize the performance and developer experience for customers running PyTorch on Azure, and to ensure that developers' PyTorch projects work great on AMD hardware," Scott said.

The Register reached out to Microsoft about its plans for AMD's GPUs. We'll let you know if we hear anything.

The enterprise GPU market heats up

PyTorch development was a central component of Microsoft's strategic partnership with Meta AI, announced earlier this week.

In addition to working with Microsoft to optimize its infrastructure for PyTorch workloads, the social media giant announced it would deploy "cutting-edge ML training workloads" on a dedicated Azure cluster of 5,400 Nvidia A100 GPUs.

The deployment came just as the datacenter became Nvidia's largest business unit at $3.75 billion, outstripping gaming ($3.62 billion) for the first time.

While Nvidia may still command the bulk of the enterprise GPU market, it's facing its stiffest competition in years, and it's not just AMD knocking at its door.

Intel's long-hyped Ponte Vecchio GPUs are expected to roll out later this year alongside the chipmaker's long-delayed Sapphire Rapids Xeon Scalable processors.

And, earlier this month, Intel debuted its second-generation AI training and inference accelerators, which it claims offer A100-beating performance. ®
