Meta warns that bit flips and other hardware faults cause AI errors

Meta has identified another reason AI might produce rubbish output: hardware faults that corrupt data.

As noted in a paper [PDF] released last week and a June 19 post, such faults can silently alter stored data. No prizes for Meta there - phenomena such as "bit flips", which see a stored value change from zero to one or vice versa, are well known and have even been attributed to cosmic rays hitting memory or hard disks.

Meta labels such faults "Silent data corruptions" (SDCs) and its researchers suggest that when they occur in AI systems they create "parameter corruption, where AI model parameters are corrupted and their original values are altered."

"When this occurs during AI inference/servicing it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services," Meta's boffins wrote.
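The paper discusses parameter corruption in the abstract, but the mechanism is easy to demonstrate. The sketch below - illustrative only, and not Meta's code - shows how flipping a single bit in a float32 model weight can change its value by many orders of magnitude, which is why a lone flipped exponent bit can degrade an inference. The `flip_bit` helper is a hypothetical name introduced here for illustration.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value and return the corrupted result."""
    # Reinterpret the float's bytes as a 32-bit unsigned integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly the requested bit, then reinterpret as float again.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return corrupted

weight = 0.5
# Flipping a high exponent bit (bit 30) silently turns a modest weight
# into an astronomically large one - a "silent data corruption".
print(weight, "->", flip_bit(weight, 30))
```

Flips in low mantissa bits, by contrast, nudge a value only slightly - which is part of why such corruptions can go undetected.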

Bit flips are not a new thing - Meta has documented their prevalence in its own infrastructure - but they are hard to detect at the best of times. In their paper, Meta's researchers suggest the AI stack complicates matters further.

"The escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults," the paper states.

What to do? Meta suggests measuring hardware faults so that builders of AI systems at least understand the risks.

Its boffins therefore proposed the "parameter vulnerability factor" (PVF) - "a novel metric we've introduced with the aim to standardize the quantification of AI model vulnerability against parameter corruptions."

PVF is apparently "adaptable to different hardware fault models" and can be tweaked for different models and tasks.

"Furthermore, PVF can be extended to the training phase to evaluate the effects of parameter corruptions on the model's convergence capability," Meta's post asserts.
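Meta's paper defines PVF formally; as a rough sketch only, a PVF-style number can be estimated by injecting random single-bit flips into a model's parameters and counting how often the output changes. Everything below is a hypothetical illustration - the toy `predict` model and the `estimate_pvf` helper are assumptions for demonstration, not Meta's methodology.

```python
import random
import struct

def flip_random_bit(value: float, rng: random.Random) -> float:
    """Corrupt a float32 value by flipping one randomly chosen bit."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << rng.randrange(32))))
    return out

def predict(weights, x):
    # Toy stand-in for a real model: a thresholded dot product.
    return sum(w * xi for w, xi in zip(weights, x)) > 0.0

def estimate_pvf(weights, inputs, trials=1000, seed=0):
    """Fraction of random single-bit parameter flips that change any output."""
    rng = random.Random(seed)
    baseline = [predict(weights, x) for x in inputs]
    mismatches = 0
    for _ in range(trials):
        # Inject a fault into one randomly chosen parameter.
        corrupted = list(weights)
        idx = rng.randrange(len(weights))
        corrupted[idx] = flip_random_bit(weights[idx], rng)
        if [predict(corrupted, x) for x in inputs] != baseline:
            mismatches += 1
    return mismatches / trials

pvf = estimate_pvf([0.2, -0.4, 0.7], [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
print(f"estimated vulnerability: {pvf:.3f}")
```

A flip in a sign or exponent bit usually changes the output, while low mantissa bits rarely do - so the estimate lands somewhere between zero and one, which is the kind of quantification PVF is meant to standardize.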

The paper explains that Meta simulated silent corruption incidents using "DLRM" - a tool the social media giant uses to generate personalized content recommendations. Under some circumstances, Meta's authors found four in every thousand inferences would be incorrect.

The paper concludes by suggesting that AI hardware designers and operators consider PVF, to help them balance fault protection against performance and efficiency.

If this all sounds a bit familiar, your déjà vu is spot on. PVF builds on the architectural vulnerability factor (AVF) - an idea described last year by researchers from Intel and the University of Michigan. ®
