Wish there was a benchmark for ML safety? Allow us to AILuminate you...

MLCommons, an industry-led AI consortium, on Wednesday introduced AILuminate - a benchmark for assessing the safety of large language models in products.

Speaking at an event streamed from the Computer History Museum in San Jose, Peter Mattson, founder and president of MLCommons, likened the situation with AI software to the early days of aviation.

"If you look at aviation, for instance, you can look all the way back to the sketchbooks of Leonardo da Vinci - great ideas that never quite worked," he said. "And then you see the breakthroughs that make them possible, like the Wright brothers at Kitty Hawk.

"But there was a tremendous amount of work from that first flight to the almost unbelievably safe commercial aviation we depend on today. Many of us in this room wouldn't be here if not for all the work and the measurement that enabled that progress to a highly reliable, low risk service.

"To get here for AI, we need standard AI safety benchmarks."

"We" in this case includes technology giants like Meta, Microsoft, Google, and Nvidia - the members of MLCommons. These are stakeholders with a financial interest in the success of AI, as opposed to those who would sooner drive a stake through its heart for kidnapping human creativity and ransoming it as an API.

The benchmarks thus flow from friends - in conjunction with academics and advocacy groups - rather than foes. Those foes include copyright litigants and trade groups that argue music and audiovisual creators stand to lose billions in revenues by 2028 "due to AI's substitutional impact on human-made works," even as generative AI firms gain even greater riches over the same period.

That said, there's little doubt safety standards would be useful - even if it's unclear what liability would follow from violating those standards, or from actual harmful model interactions. At least since President Biden's 2023 Executive Order on Safe, Secure, and Trustworthy AI, there's been a coordinated effort to better understand the risks of AI systems, and industry players have been keen to shape the rules to their liking.

Nonetheless, makers of AI models readily acknowledge the risks of using generative AI, though not to the point of exiting the market. And AI safety firms like Chatterbox Labs note that even the latest AI models can be induced to emit harmful content with clever prompting.

The MLCommons AILuminate benchmark is focused specifically on risks arising from the use of text-based large language models in English. It does not address multi-modal models. It's also focused on single prompt interactions, and not agents that chain multiple prompts together. And it's not a guarantee of safety.

In short, it's a v1.0 release and further improvements - like support for French, Chinese, and Hindi - are planned for 2025.

In its initial form, AILuminate aims to assess a dozen different hazards.

"They fall into roughly three bins," explained Mattson. "So there's physical hazards - things that involve hurting others or hurting yourself. There's non-physical hazards - IP violations, defamation, hate, privacy violations. And then there are contextual hazards."

Contextual hazards are things that may or may not be problematic, depending on the situation. You don't want a general purpose chatbot, for example, to dispense legal or medical advice, Mattson explained, even if that might be desirable for a purpose-built legal or medical system.
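Conceptually, a benchmark like this boils down to firing hazard-tagged prompts at a system and tallying how often its responses violate the relevant policy. The sketch below illustrates that shape only: the `get_response` and `is_violating` callables are hypothetical stand-ins, not AILuminate's actual prompt sets or evaluator models, which MLCommons has not detailed here.

```python
# Minimal sketch of single-prompt safety scoring. The hazard labels,
# get_response() (the system under test), and is_violating() (the
# grader) are all illustrative assumptions, not the AILuminate design.
from collections import Counter


def score(prompts, get_response, is_violating):
    """Return the fraction of safe responses per hazard category.

    prompts: iterable of (hazard_label, prompt_text) pairs.
    """
    totals, violations = Counter(), Counter()
    for hazard, prompt in prompts:
        totals[hazard] += 1
        if is_violating(hazard, get_response(prompt)):
            violations[hazard] += 1
    # Safe rate = 1 - violation rate, reported per hazard
    return {h: 1 - violations[h] / totals[h] for h in totals}
```

A real harness would also need the single-prompt restriction the benchmark currently has: each prompt is graded in isolation, with no multi-turn or agentic chaining.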

"Enterprise AI adoption depends on trust, transparency, and safety," declared Navrina Singh, working group member and founder and CEO of Credo AI, in a statement.

"The AILuminate benchmark, developed through rigorous collaboration between industry leaders and researchers, offers a trusted and fair framework for assessing model risk. This milestone sets a critical foundation for AI safety standards, enabling organizations to confidently and responsibly integrate AI into their operations."

Stuart Battersby, CTO for enterprise AI firm Chatterbox Labs, welcomed the benchmark for advancing the cause of AI safety.

"Great that we are seeing progress in the industry to recognize and test AI safety, especially with cooperation from large companies," Battersby told The Register. "Any movement and collaboration is very welcome.

"Whilst this is a great and welcome step, the reality is that automated testing software needs to be in the hands of the businesses and government departments that are using AI themselves. This is because it's not just about the base model (although that's very important and it should be tested) as each organization's AI deployment is different.

"They have different fine-tuned versions of models, often paired with RAG, using custom implementations of additional guardrails and safety systems, all of which need to be continually tested, in an on-going manner, against their own requirements for safety." ®
