Meta warns that bit flips and other hardware faults cause AI errors

Meta has identified another reason AI might produce rubbish output: hardware faults that corrupt data.

As noted in a paper [PDF] released last week and a June 19 post, hardware faults can corrupt data. No prizes for Meta there - phenomena such as "bit flips", in which a stored bit changes from zero to one or vice versa, are well known and have even been attributed to cosmic rays hitting memory or hard disks.

Meta labels such faults "Silent data corruptions" (SDCs) and its researchers suggest that when they occur in AI systems they create "parameter corruption, where AI model parameters are corrupted and their original values are altered."

"When this occurs during AI inference/servicing it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services," Meta's boffins wrote.
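To make the idea of parameter corruption concrete, here is a minimal Python sketch of a single bit flip in one 32-bit floating-point weight. The helper name `flip_bit` and the example weight are illustrative assumptions, not anything taken from Meta's paper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` stored as an IEEE 754 float32 with one bit inverted.

    Bit 0 is the least significant mantissa bit; bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return corrupted


weight = 0.5  # hypothetical model parameter
for bit in (0, 30, 31):
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

Which bit flips matters a great deal: a flip in the lowest mantissa bit barely moves the value, while a flip in a high exponent bit changes it by many orders of magnitude, and a flip in the sign bit negates it.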

Bit flips are not a new thing - Meta has documented their prevalence in its own infrastructure - but they are hard to detect at the best of times. In their paper, Meta's researchers suggest the AI stack complicates matters further.

"The escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults," the paper states.

What to do? Meta suggests measuring hardware faults so that builders of AI systems at least understand the risks.

Its boffins therefore proposed the "parameter vulnerability factor" (PVF) - "a novel metric we've introduced with the aim to standardize the quantification of AI model vulnerability against parameter corruptions."

PVF is apparently "adaptable to different hardware fault models" and can be tweaked for different models and tasks.

"Furthermore, PVF can be extended to the training phase to evaluate the effects of parameter corruptions on the model's convergence capability," Meta's post asserts.

The paper explains that Meta simulated silent corruption incidents using "DLRM" - a tool the social media giant uses to generate personalized content recommendations. Under some circumstances, Meta's authors found four in every thousand inferences would be incorrect.
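A fault-injection experiment of this general kind can be sketched in a few lines of Python. This is not Meta's method or DLRM: it assumes a toy linear classifier, a fault model of one random bit flip in one random float32 weight per inference, and a reading of PVF as the fraction of corrupted inferences whose output differs from the fault-free result. All names (`predict`, `estimate_pvf`) and values are hypothetical.

```python
import random
import struct


def flip_bit(value, bit):
    """Invert one bit of `value` stored as an IEEE 754 float32."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def predict(weights, features):
    """Toy stand-in for a model: the sign of a dot product."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def estimate_pvf(weights, inputs, trials=10_000, seed=0):
    """Estimate the share of corrupted inferences that give a wrong output.

    Assumed fault model: exactly one random bit flips in one randomly
    chosen weight for each inference.
    """
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        features = rng.choice(inputs)
        golden = predict(weights, features)  # fault-free reference output
        faulty = list(weights)
        index = rng.randrange(len(faulty))
        faulty[index] = flip_bit(faulty[index], rng.randrange(32))
        if predict(faulty, features) != golden:
            mismatches += 1
    return mismatches / trials


# Hypothetical weights (exactly representable in float32) and inputs.
weights = [0.5, -0.25, 0.125]
inputs = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [-2.0, 0.25, 1.0]]
print(f"estimated vulnerability: {estimate_pvf(weights, inputs):.4f}")
```

The figure this prints describes only the toy model and the assumed fault model. A real study would inject faults into an actual model, at realistic fault rates, and compare outputs using a task-specific definition of "incorrect".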

The paper concludes by suggesting that AI hardware designers consider PVF, to help them balance fault protection with performance and efficiency.

If this all sounds a bit familiar, your déjà vu is spot on. PVF builds on the architectural vulnerability factor (AVF) - an idea described in 2003 by researchers from Intel and the University of Michigan. ®
