AI forecasting advances as Metaculus Cup spotlights machine-led predictions
A UK startup's AI placed among the top contenders in a geopolitics forecasting contest, highlighting growing capabilities and ongoing debates about reliability and impact.

Artificial intelligence is learning to predict the future, and a recent Metaculus forecasting cup underscored the trend. Metaculus is a forecasting platform that asks users to estimate the probability of geopolitical events, such as whether Thailand will experience a military coup before September 2025 or whether Israel will strike the Iranian military again before September 2025; its cups award prize pots of about $5,000. Forecasters submit probabilistic answers weeks to months in advance, and the results have at times shown notable accuracy, including early predictions of the date of Russia’s invasion of Ukraine and of the overturning of Roe v. Wade.
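Once a question resolves, those probabilistic answers are graded against the outcome. As an illustration of how such grading works, here is a minimal sketch using the Brier score, a standard proper scoring rule chosen here for simplicity; it is not necessarily the cup's exact scoring formula.

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome.
    Lower is better; an uninformative 50% forecast always scores 0.25."""
    return (forecast - outcome) ** 2

# A question resolves "yes" (outcome = 1):
print(f"{brier_score(0.90, 1):.3f}")  # 0.010 -- confident and correct
print(f"{brier_score(0.50, 1):.3f}")  # 0.250 -- hedged
print(f"{brier_score(0.10, 1):.3f}")  # 0.810 -- confident and wrong
```

Rules like this reward forecasters for being both confident and right, which is why well-calibrated probabilities, not bold yes/no calls, win these contests.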
At this year’s Summer Cup, an AI developed by a UK startup named Mantic rose to eighth place among 549 contestants, the first time a bot has cracked the competition’s top 10. The top-ranked bot in the cup earlier this year had placed 25th, illustrating how quickly AI forecasting is improving. Mantic built its bot on top of OpenAI’s o1 reasoning model, and the result suggests machine forecasting is approaching human-level performance on a tightly scoped set of questions. Still, experts caution that the contest involves a small sample of questions and a format that can advantage automated predictors.
The cup covers a relatively small slate of about 60 questions, and most of its several hundred contestants are amateurs who forecast only a handful of them. That setup, plus the ability of AI systems to continuously update estimates as new information arrives, can favor machines that keep revising their forecasts. A human forecaster is unlikely to sustain the same cadence across hundreds of questions without substantial time and resources.
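The bot's edge here is partly mechanical: it can rerun an update step whenever fresh evidence arrives. A minimal sketch of Bayesian updating in odds form, with made-up likelihood ratios and no claim about Mantic's actual method, shows how each piece of news shifts a running probability.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability given new evidence, working in odds form.
    likelihood_ratio > 1 means the evidence favors the event occurring."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical stream of evidence about a yes/no geopolitical question.
p = 0.20  # starting estimate
for lr in [2.0, 1.5, 0.8]:  # each new report nudges the odds
    p = bayes_update(p, lr)
    print(f"updated probability: {p:.3f}")  # 0.333, 0.429, 0.375
```

A machine can run this loop on every open question every time a news feed refreshes; a human revisiting dozens of questions by hand cannot.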
Industry observers say geopolitical and technological forecasting involves complex, interdependent factors that are hard for any single model to master. Deger Turan, chief executive of Metaculus, notes that large language models can handle messy information and simulate human judgment by learning from vast numbers of past questions and outcomes. They improve by answering many questions, observing the results, and adjusting their methods at a scale beyond what individual humans can manage.
Ben Turtel, chief executive of LightningRod, which builds forecasting AIs, says a key step in improving these systems is exposure to large training sets. His team trained a recent model on about 100,000 forecasting questions, a scale that lets the AI imitate the way humans learn from repeated trial and error. That training shows up in the rankings as AI systems grow more adept at converting raw data into probabilistic estimates.
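A large corpus of resolved questions also makes calibration checks possible: among all the times a model said 70 percent, the event should have happened roughly 70 percent of the time. The toy check below, on made-up data, sketches the idea; it is an illustrative bucketing scheme, not LightningRod's evaluation pipeline.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare each bin's
    average forecast with the observed frequency of 'yes' outcomes."""
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    for b in sorted(bins):
        pairs = bins[b]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        print(f"bin {b}: mean forecast {mean_p:.2f}, "
              f"observed {freq:.2f}, n={len(pairs)}")

# Toy resolved questions; a well-calibrated forecaster's columns match.
calibration_table([0.1, 0.15, 0.7, 0.75, 0.72, 0.9],
                  [0,   0,    1,   1,    0,    1])
```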
Anthony Vassalo, co-director of RAND’s Forecasting Initiative, emphasizes that forecasting is decision support. Forecasters help institutions anticipate future developments and adjust plans accordingly, updating predictions after new policies are enacted. If a forecast points toward an outcome decision makers dislike, forecasters can illustrate how policy changes might alter it. In practice, Vassalo notes, forecasting must be integrated with the ability to influence decisions, not just predict them. Yet he cautions that geopolitical forecasting remains inherently hard, and AI is not a substitute for human judgment but a tool to extend it.
For observers, the most compelling takeaway is not that a bot defeated humans, but that machines can monitor hundreds of questions simultaneously and provide early, broad coverage that human teams cannot sustain. Vassalo argues that the AI’s potential lies in matching broad crowd-level insight rather than supplanting expert forecasters. If an AI can reach the caliber of the crowd across many topics, it becomes a valuable companion for decision makers who need timely, scalable signals. Still, the Metaculus community prediction, the aggregate of all user forecasts, remains one of the platform’s strongest performers, illustrating why human judgment and machine-assisted forecasting together may offer the most robust view of uncertain futures.
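The community prediction rests on a simple idea: pooling many independent probability estimates. Metaculus's published aggregation is more elaborate than this sketch, but even a plain median shows why the crowd is hard to beat: outlier forecasts get washed out.

```python
import statistics

def community_prediction(user_forecasts: list[float]) -> float:
    """Pool individual probability forecasts with a median.
    (Illustrative only; the platform's actual aggregation is
    more sophisticated than an unweighted median.)"""
    return statistics.median(user_forecasts)

# Hypothetical forecasts on one question: the median ignores the outlier.
print(community_prediction([0.55, 0.60, 0.62, 0.58, 0.95]))  # 0.6
```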
In the end, forecasters stress that their craft is about guidance and preparation rather than certainty. Ben Wilson, an engineer at Metaculus who benchmarks AIs against humans on forecasting challenges, notes that the competition’s limited scope and participant base mean the results should be interpreted with caution. AI contributions can expand coverage and speed, but they do not remove the need for critical human analysis when decisions carry high stakes. As AI tools scale up, the balance between automated insight and strategic human oversight will shape how institutions prepare for an uncertain horizon.