Google DeepMind's latest models kinda sorta take silver at Math Olympiad

Researchers at Google DeepMind claim they've developed a pair of AI models capable of taking home a silver medal in the International Mathematical Olympiad (IMO) - although not within the allotted time limit.

Dubbed AlphaProof and AlphaGeometry 2, these models are designed to help clear one of the bigger hurdles facing popular AI systems today: thanks to limitations in artificial reasoning and training data, they kind of suck at math.

To overcome this, DeepMind developed AlphaProof, which combines a language model with its AlphaZero reinforcement learning algorithm. You may recall AlphaZero as the same system DeepMind used to master chess, shogi and Go a few years back.

AlphaProof trains itself to prove mathematical statements using Lean, a functional programming language and interactive theorem prover.

"Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified," the DeepMind team wrote in a recent blog.

This means that AlphaProof can not only provide an answer, but prove that it's correct. This differs from existing natural language processing, which will confidently hallucinate a plausible-sounding answer - but doesn't actually know whether it's correct.
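To make that concrete, here's a toy example of the sort of machine-checkable statement Lean deals in - a trivial one for illustration, not an IMO problem or anything from DeepMind's training runs. If the proof term is wrong, the file simply won't compile:

```lean
-- A trivially true statement about natural numbers. Lean's kernel
-- checks the proof term mechanically: there is no way to "hallucinate"
-- a proof that type-checks but is wrong.
theorem n_le_succ (n : Nat) : n ≤ n + 1 :=
  Nat.le_succ n
```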

That's not to say NLP or LLMs can't be useful. DeepMind fine-tuned a Gemini LLM to translate natural language problem statements into ones the AlphaZero algorithm can interpret and use.
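To illustrate what that translation step produces - with a made-up example, not actual Gemini output - here's an informal statement alongside one way it might be rendered in Lean:

```lean
-- Informal: "the sum of two even natural numbers is even."
-- One possible Lean formalization (hypothetical, for illustration only).
theorem even_add_even (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  omega  -- linear arithmetic with mod-by-constant is decidable
```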

"When presented with a problem, AlphaProof generates candidates and then proves or disproves them by searching other possible proof steps in Lean. Every time a solution is verified it's used to reinforce the model.

The DeepMind team did this for millions of problems in the lead-up to the IMO - in fact, because it's a reinforcement learning model, training continued throughout the competition.

AlphaGeometry 2, on the other hand, functions in much the same way as its predecessor, detailed back in January. It combines a neural language model with a "rule-bound deduction engine" and the two work together to find proofs for problems.

As DeepMind explains it, the language model is used to identify patterns and suggest useful constructs. The symbolic engine, meanwhile, uses formal logic to arrive at solutions. The downside to this approach is that the symbolic deduction step is comparatively slow.
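For a feel of that division of labour, here's a minimal sketch in Lean (as a plain functional program, not a proof). Every name here is a stand-in invented for illustration - this is not DeepMind's code or architecture:

```lean
-- Facts are just strings in this toy model.
abbrev Fact := String

-- One pass of a "rule-bound deduction engine": try every rule on the
-- current facts and keep whatever new facts fall out.
def deduceOnce (rules : List (Fact → Option Fact)) (facts : List Fact) : List Fact :=
  rules.foldl (fun acc r => acc ++ facts.filterMap r) facts

-- The alternation described above: a proposer (standing in for the
-- language model) suggests a construct, the engine deduces from it, and
-- we check whether the goal fact has been reached. `fuel` bounds the
-- number of iterations so the recursion visibly terminates.
def solve (propose : List Fact → Fact) (rules : List (Fact → Option Fact))
    (goal : Fact) : List Fact → Nat → Bool
  | _, 0 => false
  | facts, fuel + 1 =>
    let facts' := deduceOnce rules (propose facts :: facts)
    if facts'.contains goal then true
    else solve propose rules goal facts' fuel
```

The slow part in the real system is the analogue of deduceOnce: exhaustive symbolic deduction over a far richer language of geometric facts.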

For its second-gen AlphaGeometry model, DeepMind says the language processing is based on Gemini and was trained using an "order of magnitude" more synthetic data than its predecessor. The symbolic engine has also been sped up considerably, and is said to be "two orders of magnitude" faster.

Putting them to the test

To put these to the test, the DeepMind team tasked AlphaProof and AlphaGeometry 2 with solving the six advanced mathematics problems faced by competitors in this year's IMO.

The competition - which dates back to 1959 and sees pre-college mathematicians tackle some of the hardest problems in algebra, combinatorics, geometry, and number theory - has become something of a proving ground for machine learning devs in recent years.

According to DeepMind, the two models were able to complete four of the six problems - AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 tackled this year's geometry problem. Neither model could crack the two combinatorics questions.

Tallied up, DeepMind's models still did fairly well, scoring 28 out of 42 - the equivalent of a silver medal, and just one point shy of gold.

However, there seems to be plenty of room for improvement. At least for this competition, DeepMind conceded it was still necessary to manually translate the problems given to competitors into a formal mathematical language the models could understand.

The models also failed to solve the majority of the problems within the allotted time, which spans two 4.5-hour sessions spread across two days. While the models were able to solve one of the problems within a few minutes, the others took upwards of three days to solve.

The DeepMind researchers are not done yet, of course. They report they are already experimenting with a natural language reasoning system, built on Gemini, that wouldn't require problems to be translated into a formal language and could be combined with other AI systems. That should speed things up a bit. ®
