Forecasting AI: Tournament shows even top experts struggle to predict progress
A forecasting tournament put prominent AI thinkers to the test and found their predictions often missed the mark, underscoring uncertainty about timelines for advanced capabilities

A recent forecasting tournament that asked experts to predict milestones in artificial intelligence found that even widely followed researchers and commentators often failed to anticipate how the field would evolve.
Organizers convened skilled forecasters and prominent AI figures to compare judgments about when particular capabilities or benchmarks would be reached. The exercise highlighted how quickly views can shift in response to new research and why predicting AI progress remains difficult even for insiders.
Two participants who drew attention during the tournament were François Chollet and Dwarkesh Patel. Chollet, the creator of the Keras deep‑learning library and author of the ARC‑AGI benchmark, which tests whether systems can solve novel reasoning problems that humans find easy, is known as a cautious voice in AI debates and has often criticized overly optimistic timelines. In the tournament, Chollet said his own timelines had shortened after researchers made progress on obstacles he had long identified, such as models’ difficulty in adapting what they have already learned to genuinely novel problems.
Dwarkesh Patel, host of a widely followed podcast that tracks what top AI scientists are thinking, moved in the opposite direction. After reporting on and interviewing researchers, Patel said he had become more skeptical that current AI approaches can achieve the kind of continuous, on‑the‑job learning that people display. That shift illustrates how exposure to new evidence can push even informed observers in divergent directions.
Tournament organizers designed questions intended to probe specific technical capabilities and the timing of benchmarks rather than solicit general opinion. Participants’ predictions were then compared with subsequent research developments. The overall result was that many forecasts did not align with what actually occurred, a finding the tournament organizers and participants described as evidence of persistent uncertainty about both the pace and the nature of AI progress.
Experts said the mismatch between forecasts and outcomes reflects several factors. Breakthroughs can arrive unexpectedly from research groups pursuing varied approaches, and incremental improvements in areas such as memory, data efficiency and model architecture can combine to produce sudden capability gains. Conversely, engineered systems often encounter unanticipated constraints when researchers attempt to scale or generalize methods, slowing progress.
Forecasting accuracy matters beyond academic curiosity: public policy, corporate investment, and safety planning all depend on plausible expectations about when advanced systems will emerge. If experts’ forecasts are unreliable, policymakers and institutions that plan on a particular timeline may underprepare or overcommit resources. Tournament participants and observers said systematic calibration — measuring which kinds of forecasters or questions produce better predictions — could help decision‑makers weigh competing views.
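The mechanics of such calibration are simple to sketch. The snippet below is a minimal, hypothetical illustration, not the tournament's actual scoring method: each resolved yes/no question gets a Brier score (the squared gap between the stated probability and the 0 or 1 outcome), and scores are averaged per forecaster. All names and numbers are invented for the example.

```python
# Minimal calibration-scoring sketch. Hypothetical data; not the tournament's method.
from dataclasses import dataclass
from collections import defaultdict


@dataclass
class Forecast:
    forecaster: str     # who made the prediction
    question: str       # e.g. "Benchmark X surpassed by 2026?"
    probability: float  # predicted probability that the event occurs (0..1)
    outcome: bool       # how the question actually resolved


def brier_score(probability: float, outcome: bool) -> float:
    """Squared error between the stated probability and the 0/1 outcome.
    Lower is better; an uninformative 50% forecast always scores 0.25."""
    return (probability - float(outcome)) ** 2


def score_by_forecaster(forecasts: list[Forecast]) -> dict[str, float]:
    """Average Brier score per forecaster across their resolved questions."""
    scores: dict[str, list[float]] = defaultdict(list)
    for f in forecasts:
        scores[f.forecaster].append(brier_score(f.probability, f.outcome))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


if __name__ == "__main__":
    # Invented resolved forecasts, for illustration only.
    sample = [
        Forecast("expert_a", "capability milestone reached by 2025", 0.30, True),
        Forecast("expert_a", "benchmark saturated by 2025", 0.80, True),
        Forecast("forecaster_b", "capability milestone reached by 2025", 0.60, True),
        Forecast("forecaster_b", "benchmark saturated by 2025", 0.70, True),
    ]
    for name, score in score_by_forecaster(sample).items():
        print(f"{name}: mean Brier score = {score:.3f}")
```

Tracked over many questions, averages like these let decision‑makers see which forecasters, or which kinds of questions, tend to be well calibrated rather than relying on reputation alone.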
The tournament also underscored differences in how experts reason about progress. Some prioritize observable technical bottlenecks and whether recent results address them, while others emphasize the historical pattern of spurts and plateaus or focus on conceptual obstacles such as continual learning. These different heuristics produced opposing updates from Chollet and Patel, even though both are deeply embedded in the research community.
Organizers argued that repeated, structured forecasting exercises can improve collective understanding by revealing which assumptions produce accurate predictions and which do not. Participants welcomed the feedback but also cautioned against overinterpreting a single tournament. They noted that the AI landscape is shaped by a mix of reproducible engineering advances, serendipity, and shifts in research focus, making consistent long‑term forecasting inherently challenging.
As AI systems are deployed in more consequential domains, the tournament’s results are likely to renew interest in methods for improving foresight. Structured prediction tournaments, transparent tracking of forecasts against outcomes, and closer study of why particular predictions fail may help institutions navigate uncertainty. For now, the tournament reinforced a central lesson: even well‑informed experts can be surprised by the pace and direction of AI progress.