“May you burn in hell like you are going to burn here.” This was the final message in the 1983 US wargame Proud Prophet, which ended in a simulated nuclear exchange and a ruined world. The game shocked participants and contributed to a shift in US policy away from the idea that nuclear escalation could be controlled.
Proud Prophet explored something that cannot be studied in real life yet must be understood to guide policy: nuclear war. Today, as AI enters military command systems, we face a new methodological challenge: understanding how classified, future-focused decision-support AI might reshape nuclear crises and heighten escalation risks.
Experimental wargaming, a variant of the wargaming long used by militaries and analysts alike, blends realism with social-science rigour. It offers a way to give policymakers sound evidence while tackling the fundamental difficulty of generating quality data on AI and nuclear risk.
The escalatory potential of decision-support AI
The integration of AI into nuclear command and control seems all but inevitable. It could provide strategic advantages while also enhancing nuclear safety mechanisms.
In this article, ‘AI’ refers primarily to decision-support systems—tools that provide recommendations, predictions, or strategic assessments. Rather than acting fully autonomously, these systems are positioned to shape the decision environment by influencing how humans frame problems, perceive options, and choose actions.
Humans are prone to judgment-clouding biases; in theory, this is precisely the kind of fallibility that integrating decision-support tools into command structures could correct.
In the nuclear case, concern over losing control in large nuclear states means the final say regarding the use of force will likely, for the most part, remain in human hands. It’s important to understand how these human choices can be improved, but it’s equally vital to identify where the dangers of AI influence might lie. In contrast to safety-oriented arguments, recent work suggests these systems’ influence could be aggressive and escalatory.
Researchers found that large language models deployed in simulated nuclear wargames demonstrated significant escalatory tendencies. Others have warned that autonomous systems might be predisposed to exploiting tactical or strategic shifts, risking inadvertent escalation and missed de-escalation signals.
Even more troubling is the human side, where decision support can trigger cognitive failings such as automation bias and complacency. Cognitive biases in decision makers tend to favour aggression and “hawkishness”, and these biases are likely magnified by other AI-related phenomena, such as compressed decision timelines or de-skilling.
Contrary to assumptions that AI will reduce bias and enhance control, these dynamics suggest it may do the opposite. This gap between perception and reality is where danger lies and conflict risk grows.
The real-world use of Israel’s “Lavender” AI system raises further red flags. Lavender generated target lists during recent operations in Gaza. Before Lavender, Israeli intelligence put targets through a lengthy “incrimination” process; with the system in use, human verification reportedly took as little as 20 seconds, with the operator doing little more than confirming a target was male before approving the strike.
This information comes from the testimony of six unnamed Israeli intelligence officers and remains largely unverified. Nonetheless, it is known that the “Lavender” system was used and that casualties have been extraordinarily high. Taken together, this raises troubling questions about the potential erosion of human control, questions which merit deep exploration.
A method for the unknown: Why wargaming works
To study these risks, we need new methods. Experimental wargaming—realistic simulations with controlled variables—offers a powerful tool for investigating how AI could affect nuclear decision-making.
Wargames are uniquely suited to tackling “weakly structured” problems—complex, poorly defined challenges with no obvious right answer. The methodology deals “with choice in the face of incomplete information, and represents a sophisticated method for unpacking, understanding, and preparing for potentially significant multi-factor (but low-frequency) phenomena.”
AI-human teaming in nuclear command systems is one such problem. Wargames involve consequential decision-making over multiple turns, allowing researchers to test how decisions unfold when information is partial, stakes are high, and cognitive biases may creep in. They let us observe the messy, human part of strategic judgment.
Scholars have found that wargaming generates data that better reflects the real world because players in wargames behave more like their real-world counterparts. Unlike surveys or interviews, wargames immerse players in a competitive environment where their choices have consequences, producing data with a greater degree of external validity. Others have stressed “that gaming, as a story-living experience, engages the human brain, and hence the human being participating in a game, in ways more akin to real-life experience.”
Experimental wargaming differs from traditional military games like Proud Prophet by emphasising reproducibility and analytical clarity. Researchers simplify game mechanics to focus on specific variables, such as the presence or absence of AI decision support, so they can trace cause and effect. This allows dozens, or even hundreds, of playthroughs, far more than traditional wargaming permits. In doing so, we can generate a large pool of quality data from which to build and test theories.
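To make the design concrete, below is a minimal sketch in Python of how such an experiment could be structured: each game session is randomly assigned to a single manipulated condition (AI decision support present or absent) and yields an escalation score that can be compared across conditions. Everything here is hypothetical, from the condition names to the toy distribution standing in for human players; in a real study, each session would be a human-played game whose outcomes are coded by researchers.

```python
import random
import statistics

# Hypothetical conditions: the single variable manipulated across sessions.
CONDITIONS = ["ai_support", "no_ai_support"]

def run_session(condition: str, rng: random.Random) -> int:
    """Placeholder for one playthrough, returning an escalation score (0-10).

    A real experiment would run a human-played game here and have
    researchers code its outcome; this toy distribution merely assumes,
    for illustration, that AI-supported teams escalate slightly more.
    """
    score = rng.gauss(5.0, 1.5)
    if condition == "ai_support":
        score += 1.0  # assumed effect size, purely illustrative
    return max(0, min(10, round(score)))

def run_experiment(n_sessions: int = 200, seed: int = 1) -> dict[str, list[int]]:
    """Randomly assign sessions to conditions and collect their scores."""
    rng = random.Random(seed)
    results: dict[str, list[int]] = {c: [] for c in CONDITIONS}
    for _ in range(n_sessions):
        condition = rng.choice(CONDITIONS)  # random assignment
        results[condition].append(run_session(condition, rng))
    return results

if __name__ == "__main__":
    for condition, scores in run_experiment().items():
        print(f"{condition}: n={len(scores)}, "
              f"mean escalation = {statistics.mean(scores):.2f}")
```

The point is the design, not the numbers: because only one variable differs between conditions and the game is cheap to replay, differences in recorded behaviour can be attributed to the manipulated variable rather than to the idiosyncrasies of a single playthrough.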
The goal, and the challenge, of experimental wargaming is to retain wargaming’s original ability to tackle complex problems while still drawing defensible insights from the work. This doesn’t mean experimental games are rigid. Just as chess allows deep creativity within strict rules, well-designed games engage the imagination while still producing structured insight.
By balancing analytical utility with realism and player engagement, experimental wargaming offers a strong method for generating policy-relevant evidence on how AI might shape the most consequential decisions made in the most safety-critical of environments.
Three paths forward
First, because of the risks of nuclear-AI integration, policymakers and researchers should adopt a strategic lens when assessing military AI risk-reduction measures. While valuable, the conversation too often focuses narrowly on implausible fully autonomous systems or “killer robots.” But subtler shifts, such as changes to information flow, confidence in predictions, or compressed timelines, may carry even greater strategic consequences. Understanding how AI influences real people making hard decisions should be a top priority.
Second, experimental wargaming should be scaled up and better supported as a key method for generating behavioural evidence. Wargames allow us to simulate plausible futures, observe how human-machine teams operate under pressure, and test escalation pathways before real-world crises occur. Governments, funders, and think tanks should invest in this work now—while policy can still be shaped proactively.
One recent example from the Institute for Security and Technology used a structured tabletop exercise to examine strategic stability risks related to AI integration with nuclear command systems. While not experimental wargaming in the strict sense, it demonstrates the kind of project that can explore this topic and shows that structured simulations can generate meaningful insights.
Third, the insights gained through this research should inform consistent, recurring reassessment of national policy and international governance efforts. Much of what we ‘know’ rests on theory and historical analogy. As we close the data gap, actively ensuring the new evidence is put to good use will be key to achieving real risk reduction.
AI is not just a technical challenge—it’s a human one. If we want to govern its use responsibly, we need to understand how it changes the way we think, choose, and act when the stakes are highest.
The European Leadership Network itself as an institution holds no formal policy positions. The opinions articulated above represent the views of the authors rather than the European Leadership Network or its members. The ELN aims to encourage debates that will help develop Europe’s capacity to address the pressing foreign, defence, and security policy challenges of our time, to further its charitable purposes.
Image: Alamy, Arlume