Google DeepMind used a large language model to solve an unsolved math problem
They had to throw away most of what it produced, but there was gold among the garbage.
Google DeepMind has used a large language model to crack a famous unsolved problem in pure mathematics. In a paper published in Nature today, the researchers say it is the first time a large language model has been used to discover a solution to a long-standing scientific puzzle—producing verifiable and valuable new information that did not previously exist. “It’s not in the training data—it wasn’t even known,” says coauthor Pushmeet Kohli, vice president of research at Google DeepMind.
Large language models have a reputation for making things up, not for providing new facts. Google DeepMind’s new tool, called FunSearch, could change that. It shows that they can indeed make discoveries—if they are coaxed just so, and if you throw out the majority of what they come up with.
FunSearch (so called because it searches for mathematical functions, not because it’s fun) continues a streak of discoveries in fundamental math and computer science that DeepMind has made using AI. First AlphaTensor found a way to speed up a calculation at the heart of many different kinds of code, beating a 50-year record. Then AlphaDev found ways to make key algorithms used trillions of times a day run faster.
Yet those tools did not use large language models. Built on top of DeepMind’s game-playing AI AlphaZero, both solved math problems by treating them as if they were puzzles in Go or chess. The trouble is that they are stuck in their lanes, says Bernardino Romera-Paredes, a researcher at the company who worked on both AlphaTensor and FunSearch: “AlphaTensor is great at matrix multiplication, but basically nothing else.”
FunSearch takes a different tack. It combines a large language model called Codey, a version of Google’s PaLM 2 that is fine-tuned on computer code, with other systems that reject incorrect or nonsensical answers and plug good ones back in.
“To be very honest with you, we have hypotheses, but we don’t know exactly why this works,” says Alhussein Fawzi, a research scientist at Google DeepMind. “In the beginning of the project, we didn’t know whether this would work at all.”
The researchers started by sketching out the problem they wanted to solve in Python, a popular programming language. But they left out the lines in the program that would specify how to solve it. That is where FunSearch comes in. It gets Codey to fill in the blanks—in effect, to suggest code that will solve the problem.
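For readers who want to see the shape of this, here is a minimal Python sketch of such a skeleton, using the cap set problem (described below) as the concrete example. The function names and the greedy structure are illustrative assumptions in the spirit of the paper's setup, not DeepMind's actual code:

```python
import itertools

def priority(candidate: tuple, n: int) -> float:
    """The blank FunSearch asks the LLM to fill in: score how promising
    each point is. This trivial body is just a seed; Codey repeatedly
    proposes better bodies for this one function."""
    return 0.0

def keeps_lines_out(cap: list, point: tuple) -> bool:
    """Problem-specific check: in the standard formulation over Z_3^n,
    three distinct points lie on a line exactly when they sum to zero
    mod 3 in every coordinate."""
    return point not in cap and not any(
        all((a + b + c) % 3 == 0 for a, b, c in zip(p, q, point))
        for p, q in itertools.combinations(cap, 2)
    )

def solve(n: int) -> list:
    """Fixed scaffolding written by the researchers: greedily add the
    highest-priority points that keep the set free of lines."""
    points = sorted(itertools.product(range(3), repeat=n),
                    key=lambda p: priority(p, n), reverse=True)
    cap = []
    for p in points:
        if keeps_lines_out(cap, p):
            cap.append(p)
    return cap

print(len(solve(4)))  # size of the cap set the current priority finds
```

With the trivial priority above, solve just grabs points in an arbitrary order; FunSearch's job is to discover priority functions that make the same unchanged scaffolding produce much larger cap sets.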
A second algorithm then checks and scores what Codey comes up with. The best suggestions—even if not yet correct—are saved and given back to Codey, which tries to complete the program again. “Many will be nonsensical, some will be sensible, and a few will be truly inspired,” says Kohli. “You take those truly inspired ones and you say, ‘Okay, take these ones and repeat.’”
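That propose-check-repeat loop can be sketched schematically. Here `llm_propose` and `score` are hypothetical stand-ins for Codey and the evaluator, and the single loop below glosses over the distributed, sandboxed system the team actually ran:

```python
import random

def llm_propose(best_programs: list[str]) -> str:
    """Hypothetical stand-in for Codey: prompted with high-scoring
    programs, return a new candidate. (A real LLM call goes here; this
    placeholder just recycles an existing program.)"""
    return random.choice(best_programs)

def score(program: str) -> float:
    """Run the candidate and measure it, rejecting anything that crashes
    or is nonsensical. Here the measure is the cap set size found by the
    candidate's `solve` function. (In practice this runs sandboxed.)"""
    try:
        env: dict = {}
        exec(program, env)                   # run the candidate code
        return float(len(env["solve"](4)))   # bigger cap set = better
    except Exception:
        return float("-inf")                 # incorrect or nonsensical: discard

def funsearch_loop(seed: str, rounds: int = 100, pool_size: int = 10) -> str:
    """Propose, check, keep the best suggestions, and feed them back in."""
    pool = [(score(seed), seed)]
    for _ in range(rounds):
        candidate = llm_propose([prog for _, prog in pool])
        s = score(candidate)
        if s > float("-inf"):
            pool.append((s, candidate))
            pool.sort(reverse=True)          # truly inspired ones first
            pool = pool[:pool_size]
    return pool[0][1]                        # best program found so far
```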
After a couple of million suggestions and a few dozen repetitions of the overall process—which took a few days—FunSearch was able to come up with code that produced a correct and previously unknown solution to the cap set problem, which involves finding the largest size of a certain type of set. Imagine plotting dots on graph paper. The cap set problem is like trying to figure out how many dots you can put down without three of them ever forming a straight line.
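To make the no-three-in-a-line rule concrete: in the standard formulation, the dots live in an n-dimensional grid over the integers mod 3, and a brute-force check of the smallest interesting case, the 3×3 grid, shows that at most four dots fit. This tiny demonstration illustrates the problem itself, not FunSearch:

```python
import itertools

def is_cap_set(points) -> bool:
    """No three distinct points sum to zero mod 3 coordinate-wise, which
    over Z_3^n is exactly the three-in-a-line condition (lines wrap
    around the grid in this formulation)."""
    return not any(
        all((a + b + c) % 3 == 0 for a, b, c in zip(p, q, r))
        for p, q, r in itertools.combinations(points, 3)
    )

grid = list(itertools.product(range(3), repeat=2))   # the 3x3 case, n = 2
largest = max(
    (subset for r in range(len(grid) + 1)
            for subset in itertools.combinations(grid, r)
            if is_cap_set(subset)),
    key=len,
)
print(len(largest))  # 4: no fifth dot can be added without making a line
```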
It’s super niche, but important. Mathematicians do not even agree on how to solve it, let alone what the solution is. (It is also connected to matrix multiplication, the computation that AlphaTensor found a way to speed up.) Terence Tao at the University of California, Los Angeles, who has won many of the top awards in mathematics, including the Fields Medal, called the cap set problem “perhaps my favorite open question” in a 2007 blog post.
Tao is intrigued by what FunSearch can do. “This is a promising paradigm,” he says. “It is an interesting way to leverage the power of large language models.”
A key advantage that FunSearch has over AlphaTensor is that it can, in theory, be used to find solutions to a wide range of problems. That’s because it produces code—a recipe for generating the solution, rather than the solution itself. Different code will solve different problems. FunSearch’s results are also easier to understand. A recipe is often clearer than the weird mathematical solution it produces, says Fawzi.
To test its versatility, the researchers used FunSearch to approach another hard problem in math: the bin packing problem, which involves trying to pack items into as few bins as possible. This is important for a range of applications in computer science, from data center management to e-commerce. FunSearch came up with heuristics that beat widely used human-devised ones, packing items into fewer bins.
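The setup mirrors the cap set case: the scaffolding that places items stays fixed, and FunSearch evolves the scoring function that decides which bin each item goes into. Below is a hedged sketch of that pattern, with the classic human-devised best-fit rule standing in as the baseline; the names and structure are illustrative assumptions, not the paper's code:

```python
def best_fit_score(item: float, remaining: float) -> float:
    """Human-devised baseline: prefer the bin the item fills most
    tightly. FunSearch evolves replacements for this one function."""
    return -(remaining - item)  # less leftover space = higher score

def pack(items: list[float], capacity: float = 1.0,
         score=best_fit_score) -> list[list[float]]:
    """Place each arriving item into the feasible bin with the highest
    score, opening a new bin only when nothing fits."""
    bins: list[list[float]] = []
    for item in items:
        feasible = [b for b in bins if sum(b) + item <= capacity]
        if feasible:
            best = max(feasible, key=lambda b: score(item, capacity - sum(b)))
            best.append(item)
        else:
            bins.append([item])
    return bins

print(len(pack([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1])))  # 4 bins for this order
```

Swapping a different function in for best_fit_score changes which bin wins each placement; FunSearch's discovered heuristics play exactly that role.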
Mathematicians are “still trying to figure out the best way to incorporate large language models into our research workflow in ways that harness their power while mitigating their drawbacks,” Tao says. “This certainly indicates one possible way forward.”