OpenAI’s internal reasoning model has disproved the Erdős unit distance conjecture, an open problem in discrete geometry that had stood unsolved since 1946. The model succeeded on approximately 50% of attempts, according to OpenAI researcher Sébastien Bubeck. The proof was posted May 20 on OpenAI.com, and the fallout over verification and transparency has been building since.
On June 2, a group of researchers published the Leiden Declaration on Artificial Intelligence and Mathematics, calling for tight guardrails around AI use in mathematical research. As of June 5, 1,590 mathematicians had signed it. The declaration is now endorsed by the International Mathematical Union.
What the Model Did
Paul Erdős posed the unit distance problem in 1946: given n points in a plane, what is the maximum number of pairs that can be exactly one unit apart? He conjectured an upper bound on that number, and mathematicians believed it was correct. OpenAI’s model proved it wrong by constructing a counterexample using tools from algebra and number theory, fields that seem unrelated to the geometry problem.
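To make the quantity in question concrete, here is a minimal Python sketch that counts unit-distance pairs in a small point set by brute force. It illustrates only what is being counted, not the model's counterexample or any method from the proof. The function name `count_unit_pairs`, the tolerance value, and the two sample point sets are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist, isclose, sqrt


def count_unit_pairs(points, tol=1e-9):
    """Count pairs of points whose Euclidean distance is 1 (within tol).

    Brute force over all pairs, so only suitable for small point sets.
    The tolerance is an illustrative choice to absorb floating-point error.
    """
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


# Hypothetical example 1: corners of a unit square.
# The four sides have length 1; the two diagonals have length sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical example 2: two equilateral triangles sharing an edge.
# Five of the six pairs are one unit apart; the long diagonal is sqrt(3).
h = sqrt(3) / 2
rhombus = [(0.0, 0.0), (1.0, 0.0), (0.5, h), (1.5, h)]

print(count_unit_pairs(square))
print(count_unit_pairs(rhombus))
```

With four points, the rhombus already beats the square, five unit pairs to four. The open question is how fast the best achievable count can grow as n increases.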
“It’s a beautiful piece of mathematics that has been discovered,” Melanie Matchett Wood of Harvard told Science News. Fields Medal winner Tim Gowers called it “a milestone in AI mathematics,” according to Ars Technica.
The model used no math-specific tools or external software. Bubeck told Science News that the team “didn’t guide the model in any particular way.” The prompt, composed by AI, described the conjecture and instructed the model to either prove or disprove it.
Perseverance, Not Insight
The consensus among reviewing mathematicians is that this was a brute-force achievement, not a creative leap. Thomas Bloom of the University of Manchester noted in the expert review that it would have been “truly incredible” if the model had proved the conjecture, as that would have required genuine creative insight.
Ars Technica’s analysis frames the result as consistent with AI’s existing trajectory: three years ago, LLMs struggled with arithmetic. Last year, they started acing high school math competitions. This is the next step, not a discontinuous jump.
One researcher posted on X that he reproduced the proof using a publicly available model, suggesting the breakthrough relied on scale and patience rather than capabilities unique to OpenAI’s unreleased system.
The Transparency Gap
The Leiden Declaration targets several specific problems. OpenAI does not disclose how many times its model fails to solve open problems, making it impossible for external researchers to assess reliability. The company also will not reveal how long the model spent working on its solution.
As Wood told Science News: “LLMs have read ALL the papers. They have read all the commentary and notes, and everything that’s online. It’s not clear that there’s a way for AI to reasonably attribute the source of the ideas.”
The declaration calls for mandatory disclosure of AI involvement in research, human responsibility for AI-generated results, proper attribution protocols, and institutional policies governing AI use. It frames these as preserving the “characteristic values of mathematical research,” including verifiability and scientific integrity.
The Agent Reliability Question
The failure-rate problem extends beyond mathematics. Reasoning models are foundational to autonomous agent decision-making. If a model produces correct proofs 50% of the time but no external party can independently verify that rate, the same opacity carries over to agent systems making high-stakes decisions in trading, infrastructure management, and code generation.
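A short Python sketch shows why the raw attempt counts matter to an outside auditor. It assumes attempts are independent and uses hypothetical counts, since OpenAI has not published the number of attempts behind the 50% figure. The function names `wilson_interval` and `at_least_one_success` are illustrative, and the Wilson score interval is one standard choice of confidence interval, not something taken from the declaration.

```python
from math import sqrt


def wilson_interval(successes, attempts, z=1.96):
    """Wilson score interval for a success rate (z=1.96 is about 95%).

    Assumes independent attempts with a fixed success probability.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p_hat = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p_hat + z**2 / (2 * attempts)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / attempts + z**2 / (4 * attempts**2)) / denom
    return center - half, center + half


def at_least_one_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


# Hypothetical counts: the same 50% rate observed over 10 and over 1,000 attempts.
small = wilson_interval(5, 10)
large = wilson_interval(500, 1000)
print(small, large)

# With a 50% per-attempt rate, three independent tries succeed at least once
# with probability 1 - 0.5**3.
print(at_least_one_success(0.5, 3))  # 0.875
```

Under these assumptions, a 50% rate observed over 10 attempts is compatible with a true rate anywhere from roughly 0.24 to 0.76, while the same rate over 1,000 attempts pins it down far more tightly. Without the counts, an external party cannot tell the two cases apart.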
Bloom raised the practical concern to Science News: people are already using AI to generate hundreds of pages of mathematical reasoning they cannot read or verify. “It could be right. It could be nonsense. Who’s going to be able to check this?”
For teams deploying autonomous agents, the Leiden Declaration’s core demand, that AI systems disclose failure rates and reasoning provenance, maps directly onto the auditability requirements that enterprises are still struggling to implement.