Teaching a Computer to Play 4X: How the Annhexation AI Works

Building a believable computer opponent for a 4X strategy game is one of those problems that turns out to be bottomless. I’d use the cliché that it looks simple from the outside, but I don’t think that’s true: I expected this to be a tough nut from the outset. I’ve built a chess-playing engine before, and getting a strong opponent there was far simpler, though it helps that chess is such a well-understood and well-documented problem. The player wants an opponent that explores, expands, exploits and exterminates with apparent intent: one that musters an army over several turns, marches it across a continent, lands it on your shore and takes your city, all while you watched it coming and couldn’t quite stop it. They do not want an opponent that teleports units, reads your mind, or sits inert in its starting cities until you wander into range.

This post is a tour through the Annhexation AI — explaining how it makes decisions, what it remembers between turns, and how the same core machinery produces eight distinct civilizations and four difficulty levels. Annhexation isn’t open source, so rather than quote the implementation I’ll describe the design and illustrate the interesting bits with pseudocode.

I should note that the AI is still under development, but after a lot of bashing with a hammer it’s feeling like it’s in a pretty decent place.

The core idea

The single most important design decision in the Annhexation AI is that strategy, planning and execution are decoupled. The three are separated on purpose, and an AI turn flows through each layer in order:

Strategic layer — What am I trying to achieve? Peace or war, expansion or consolidation, science race or turtle with wonders. This layer thinks in goals that last tens of turns.
Operational layer — How do I achieve my strategic goals? Resource allocation, unit quotas, attack plans, city production, research direction. This is the planning.
Tactical / execution layer — What do I actually do this turn? Move this unit here, attack that stack, fortify this garrison, embark these troops. Turn to turn execution.

The payoff of this separation is, hopefully, coherence over time. A greedy turn-by-turn AI looks twitchy: it builds an army, gets distracted, disbands it, builds another. By contrast, an Annhexation AI that adopts a militaryPush goal will hold that goal for twenty-plus turns, funnelling production, research and unit movement toward a single objective until the city falls, the campaign demonstrably fails, or something seismic interrupts it. Strategy should be sticky while execution is flexible.

A complete turn runs as an ordered sequence of discrete phases — from threat assessment and diplomacy through combat, movement, production and fortification:

function runTurn(player, world, aiState):
    detectEvents(aiState, world)          # diff against last turn → fire interrupts
    aiState.goals = evaluateStrategy(player, world, aiState)
    plans = buildOperationalPlans(aiState.goals, player, world)
    executeTactics(plans, player, world)  # the phase sequence (see below)
    aiState.snapshot = snapshot(world)     # remember this turn for next time
    return aiState

Strategy

At the heart of the strategic layer is a prioritized goal stack. Each turn the AI either keeps its current goals or re-evaluates them, and the menu of things it can want is rich:

earlyExpand — plant N cities before consolidating
earlyRush — exploit the opening with an aggressive early attack
infrastructureConsolidation — buildings, population, growth
militaryPush — sustained warfare against a chosen player
defensiveWar / counterattack — react to aggression, retake what was lost
navalInvasion — assault a distant landmass
wonderRace, scienceVictoryPush, scoreOptimisation — the peaceful victory paths
raidWar, asymmetricWar — economic harassment instead of conquest
warPreparation, nuclearFirstStrike, recovery — the situational specials

Goals don’t fire on rigid rules; rather, they’re scored against each other and the highest-utility ones win. The scoring blends several signals:

Proximity. How far is the nearest enemy city? Distant neighbours (≥14 hexes) push the AI toward peaceful expansion; close ones (≤4 hexes) pull it toward military goals. Geography shapes temperament.
Force balance. Am I winning the simulated battles? Losing exchanges suppress military goals and inflate defensive ones.
Catch-up. Falling behind on city count inflates expansion scores so a boxed-in AI tries harder to grow.
Opportunity. Multipliers derived from how every met rival is currently behaving (more on that below).

Every score is then multiplied by a personality weight. Roughly:

function scoreGoals(player, world, personality):
    scores = {}
    for goal in CANDIDATE_GOALS:
        base = goal.baseValue(player, world)
        world_factors = proximity × forceBalance × catchUp × opportunity
        scores[goal] = base × world_factors × personality.weightFor(goal)
    return sortDescending(scores)

# e.g. early-expand ≈ base × siteRatio × proximityAdj × catchUp × personality.expansion

Two of those terms are about the world; one is about who this civ is. That’s how the same evaluation function produces a cautious turtle and a rampaging horde.

The top goal (priority 0) drives the turn. Secondary goals queue behind it, ready to take over the moment an interrupt fires.
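Going back to the proximity signal, here is a minimal Python sketch of how a distance could become a multiplier. The 4 and 14 hex thresholds come from the text above; the linear ramp between them, the 0.5 to 1.5 range, and the names peaceful_bias and proximity_factor are assumptions made purely for illustration, not the game’s actual curve.

```python
CLOSE_HEXES = 4     # "close" threshold quoted in the text
DISTANT_HEXES = 14  # "distant" threshold quoted in the text


def peaceful_bias(nearest_enemy_hexes: int) -> float:
    """0.0 = enemy on the doorstep, 1.0 = enemy far away.

    The linear ramp between the two thresholds is an assumption.
    """
    if nearest_enemy_hexes <= CLOSE_HEXES:
        return 0.0
    if nearest_enemy_hexes >= DISTANT_HEXES:
        return 1.0
    return (nearest_enemy_hexes - CLOSE_HEXES) / (DISTANT_HEXES - CLOSE_HEXES)


def proximity_factor(goal_is_military: bool, nearest_enemy_hexes: int) -> float:
    """Score multiplier in an assumed range of 0.5 to 1.5."""
    bias = peaceful_bias(nearest_enemy_hexes)
    # Military goals are boosted when the enemy is close,
    # peaceful goals when the enemy is far away.
    return 1.5 - bias if goal_is_military else 0.5 + bias
```

With these assumed numbers a neighbour three hexes away gives military goals a 1.5 multiplier and peaceful goals 0.5, and the two cross over at nine hexes.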

Reading the opponents

A 4X AI that only looks at its own empire plays in a vacuum. Annhexation’s AI explicitly models every player it has met before deciding who to fight.

The AI profiles each known rival across roughly eleven dimensions, each normalised to [0, 1]:

militarisation, development, expansionism, techPace
exposure and coastalExposure (undefended or weakly-garrisoned cities)
borderTension and aggression (forces massed near our borders, active wars)
wonderFocus, scienceFocus, and the all-important isRunawayLeader flag

It also tracks trends — rising, flat or falling over the last five turns — so the AI reacts to a rival who is accelerating, not just one who is currently strong. Those snapshots are kept in persistent state so trend detection survives across turns.
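A rough Python sketch of that trend tracking, assuming a five-turn rolling window per profile dimension. The PostureHistory class, the classify_trend helper and the 0.05 "flat" tolerance are hypothetical; the post only says that trends are rising, flat or falling over the last five turns.

```python
from collections import deque

WINDOW_TURNS = 5   # history length mentioned in the text
FLAT_BAND = 0.05   # assumed tolerance for calling a trend "flat"


def classify_trend(history, flat_band=FLAT_BAND):
    """Compare the oldest and newest values in the window."""
    if len(history) < 2:
        return "flat"
    delta = history[-1] - history[0]
    if delta > flat_band:
        return "rising"
    if delta < -flat_band:
        return "falling"
    return "flat"


class PostureHistory:
    """Rolling window of one normalised [0, 1] dimension for one rival."""

    def __init__(self):
        # deque with maxlen drops the oldest value automatically
        self._values = deque(maxlen=WINDOW_TURNS)

    def record(self, value):
        self._values.append(value)

    def trend(self):
        return classify_trend(list(self._values))
```

Because the window lives in an object that can be serialised with the rest of the AI state, the trend survives from one turn to the next.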

A second pass turns those profiles into a war-target ranking. For each rival it weighs:

Aggression affinity — does attacking this player suit my personality?
Strength — can I actually win?
Accessibility — can I even reach them?
Stability — are they conveniently distracted by another war?

function scoreWarTargets(rivals, me, personality):
    for r in rivals:
        affinity      = personality.aggression × r.borderTension
        winnable      = clamp(myStrength / max(r.militarisation, ε), 0, 1)   # ε guards against dividing by zero
        reachable     = 1 / (1 + travelCost(me, r))
        distracted    = r.aggression_elsewhere
        r.score       = affinity × winnable × reachable × (1 + distracted)
    return sortDescending(rivals)

The winner of that scoring becomes the target of a militaryPush, and the magnitude feeds back as an opportunity multiplier into goal evaluation. An exposed, accessible, distracted neighbour is a temptation the AI is built to notice and exploit.

Personalities and doctrine

Personality in Annhexation isn’t a single “aggression” slider — it’s a vector of about twenty weights (military production, attack appetite, expansion, wonder-building, research, naval production, raid preference, plus early-game tuning like second-city urgency and first-build preference).

On top of that sits the doctrine system — eight civ-specific playbooks that override those weights and the AI’s unit-composition preferences:

Civ	Doctrine	Signature
Mongolia	`HORSE_RUSH`	+50% military production, +50% attack, double raid preference, cavalry-heavy armies
Aztecs	`WARRIOR_RUSH`	+40% military & attack, −20% expansion, melee-heavy early aggression
Russia	`EXPAND_WIDE`	+40% expansion, +30% garrison commitment
Rome	`INFRA_FIRST`	+40% infrastructure, +30% expansion
France	`WAR_FOR_SCIENCE`	+40% research, +30% science-victory focus
Greece	`STRATEGIST`	balanced militarisation across all domains
Egypt	`TURTLE_WONDERS`	+50% wonders & culture, −20% military
England	`COASTAL_ONLY`	+40% naval, +50% coastal-site preference, harbour priority

Because the doctrine only modulates shared machinery, Egypt and Mongolia run the identical goal-evaluation and combat code — they simply weight it toward completely different ends. Mongolia drowns you in cavalry; Egypt hides behind wonders and culture; England fights for the coastline.

Combined with unique per-civ units, this gives each civ a distinctive personality.
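To make "doctrine only modulates shared machinery" concrete, here is a small Python sketch that treats a doctrine as a set of multipliers over a base personality vector. The percentages for HORSE_RUSH and TURTLE_WONDERS are taken from the table above; the weight names, the neutral 1.0 baseline, and mapping Egypt’s "−20% military" onto a single militaryProduction weight are assumptions.

```python
# Hypothetical neutral personality: every weight starts at 1.0.
BASE_PERSONALITY = {
    "militaryProduction": 1.0,
    "attack": 1.0,
    "raid": 1.0,
    "expansion": 1.0,
    "wonders": 1.0,
    "culture": 1.0,
}

# Multipliers derived from the doctrine table (two rows shown).
DOCTRINES = {
    "HORSE_RUSH": {"militaryProduction": 1.5, "attack": 1.5, "raid": 2.0},
    "TURTLE_WONDERS": {"wonders": 1.5, "culture": 1.5, "militaryProduction": 0.8},
}


def apply_doctrine(base, doctrine):
    """Return a new weight vector; weights the doctrine omits are unchanged."""
    modifiers = DOCTRINES[doctrine]
    return {name: value * modifiers.get(name, 1.0) for name, value in base.items()}


mongolia = apply_doctrine(BASE_PERSONALITY, "HORSE_RUSH")
egypt = apply_doctrine(BASE_PERSONALITY, "TURTLE_WONDERS")
```

Both civs end up with the same set of keys, so the same goal-scoring code can consume either vector unchanged.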

Operational planning: from intent to orders

Once a goal is chosen, the operational layer turns intent into concrete plans.

Unit quotas compute empire-wide demand for each unit class — settlers, workers, garrison, field army, reserve, naval, raiders — each scaled by goals, threat levels, personality and difficulty. During a militaryPush against a walled city, for instance, the garrison quota rises with threat level, melee demand jumps, and siege units become mandatory — you cannot crack walls without them, and the AI knows it.

Unit composition picks the melee/ranged/siege/mounted ratio for an army. Against an unwalled city it loads up on ranged units (free damage); against walls it must bring siege. Doctrine tilts the mix, and resource gating caps it — no horses means no cavalry, no iron means no siege, full stop:

function targetComposition(target, doctrine, resources):
    if target.walled: mix = {melee: 0.4, siege: 0.4, ranged: 0.2}
    else:             mix = {melee: 0.4, ranged: 0.5, mounted: 0.1}
    mix = applyDoctrineBias(mix, doctrine)   # HORSE_RUSH → more mounted, etc.
    if not resources.horses: mix.mounted = 0
    if not resources.iron:   mix.siege   = 0
    return normalise(mix)

Attack plans are first-class, multi-turn objects with an explicit lifecycle:

mustering → gathering → advancing → besieging → assaulting
                ↘ (naval) awaitingTransport → embarking → sailing → landing ↗
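One way to read that lifecycle is as two fixed routes through the same set of stages. The Python sketch below assumes the naval branch leaves after gathering and rejoins at besieging; that join point is my reading of the diagram rather than something the post states, and the Stage enum and next_stage helper are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    MUSTERING = "mustering"
    GATHERING = "gathering"
    ADVANCING = "advancing"
    BESIEGING = "besieging"
    ASSAULTING = "assaulting"
    AWAITING_TRANSPORT = "awaitingTransport"
    EMBARKING = "embarking"
    SAILING = "sailing"
    LANDING = "landing"


LAND_ROUTE = [
    Stage.MUSTERING, Stage.GATHERING, Stage.ADVANCING,
    Stage.BESIEGING, Stage.ASSAULTING,
]

# Assumption: the naval detour replaces "advancing" and rejoins at "besieging".
NAVAL_ROUTE = [
    Stage.MUSTERING, Stage.GATHERING, Stage.AWAITING_TRANSPORT,
    Stage.EMBARKING, Stage.SAILING, Stage.LANDING,
    Stage.BESIEGING, Stage.ASSAULTING,
]


def next_stage(stage, naval):
    """Return the following stage, or None once the plan is assaulting."""
    route = NAVAL_ROUTE if naval else LAND_ROUTE
    if stage not in route:
        raise ValueError(f"{stage.value} is not part of this route")
    index = route.index(stage)
    return route[index + 1] if index + 1 < len(route) else None
```

Storing only the current Stage value plus a naval flag is enough to resume a plan after the state has been serialised between turns.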

Target selection scores enemy cities by proximity (−5 per hex of distance), with bonuses for being unwalled (+15), being a capital (+10), and sitting near iron or horses the AI needs (a big multiplier gated on personality and urgency). It goes for the weakest reachable target first — and it commits.
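The additive part of that scoring is easy to sketch in Python. The −5 per hex, +15 unwalled and +10 capital figures are the ones quoted above; the resource-proximity multiplier is left out because its size isn’t given, and the CityTarget type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CityTarget:
    name: str
    distance_hexes: int
    walled: bool
    is_capital: bool


def score_target(city: CityTarget) -> int:
    score = -5 * city.distance_hexes   # -5 per hex of distance
    if not city.walled:
        score += 15                    # unwalled bonus
    if city.is_capital:
        score += 10                    # capital bonus
    return score                       # resource multiplier deliberately omitted


def pick_target(cities):
    """Highest score wins; ties go to the first city listed."""
    return max(cities, key=score_target)
```

For example, an unwalled town four hexes away scores −5, while a walled capital six hexes away scores −20, so the town is attacked first.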

City production is a distributed priority queue: high-output cities feed global military needs first, low-output cities backfill settlers and workers. The priority cascade runs upgrades → settlers → garrison → military → naval → workers/roads → buildings → wonders, gated by the active goal.
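A minimal Python sketch of the cascade for a single city, assuming demand is a per-category count and the active goal is expressed as a set of allowed categories. Both of those representations, and the next_build name, are assumptions; only the ordering comes from the text.

```python
# Priority order quoted in the text, highest first.
CASCADE = [
    "upgrades", "settlers", "garrison", "military",
    "naval", "workers/roads", "buildings", "wonders",
]


def next_build(demand, allowed_by_goal):
    """Return the first category that still has demand and is allowed by the goal."""
    for category in CASCADE:
        if category in allowed_by_goal and demand.get(category, 0) > 0:
            return category
    return None  # nothing outstanding that the active goal permits
```

The distributed part, where high-output cities take global military needs and low-output cities backfill settlers and workers, would sit one level above this and decide which demand each city sees.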

Research follows the goal: an expanding AI beelines the wheel and animal husbandry. A science-victory AI walks a hardcoded path toward rocketry while a warring AI weights military techs. It searches the prerequisite tree but abandons paths longer than three techs — no hundred-turn detours. In theory!
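The "no long detours" rule can be sketched as a prerequisite walk with a length cap. In this Python sketch the tech tree is a made-up placeholder with single-letter names, the cap is counted as the number of still-unknown techs including the target, and the function names are hypothetical.

```python
MAX_PATH_TECHS = 3  # paths longer than this are abandoned (from the text)


def missing_techs(target, prereqs, known):
    """Unknown techs needed to reach target, prerequisites first."""
    ordered, seen = [], set()

    def visit(tech):
        if tech in known or tech in seen:
            return
        seen.add(tech)
        for parent in prereqs.get(tech, []):
            visit(parent)
        ordered.append(tech)  # post-order: parents are appended before children

    visit(target)
    return ordered


def plan_research(target, prereqs, known):
    """Return the research order, or None if it is empty or too long."""
    path = missing_techs(target, prereqs, known)
    if not path or len(path) > MAX_PATH_TECHS:
        return None
    return path


# Placeholder tree: d needs c, c needs b, b needs a.
TREE = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
```

With nothing researched, "c" is reachable in three techs and gets planned, while "d" needs four and is skipped until "a" has been researched.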

Worker management plans and caches road routes between cities and strategic resources, invalidating them when borders flip. Bottleneck detection explicitly diagnoses why military modernisation is stalled — waiting on a tech, lacking road access to iron, missing currency for trade — and escalates urgency the longer the bottleneck persists.

Tactical execution: a turn, phase by phase

When the planning is done, the AI executes the turn as an ordered sequence of phases. Roughly:

Event detection & city-loss response      (compare against last turn's snapshot)
Emergency garrison fill                    (enemy standing on a city tile)
Unit upgrades & recalls
Retreats                                   (pull damaged units that aren't committed)
Combat                                     (city defence first, then general)
Naval invasion lifecycle                   (drive the beachhead state machines)
Settler escorts & transport convergence
Army movement                              (via the movement planner)
Build orders                               (worker tasks, roads)
Diplomacy                                  (trade, war declarations)
City Defence Commander                     (per-city garrison assignment)
Government & tech completion
Fortification & hidden-unit setup

A few pieces deserve a closer look.

Combat simulation estimates each attack before committing: attack strength (scaled by a difficulty-dependent effectiveness multiplier) versus defence strength (garrison, terrain and fortify bonuses), turned into a win probability and an expected HP loss.
On higher difficulties, combat phasing models ranged-fires-first, melee-counterattacks, melee-finishes — so the AI understands the value of softening a target with archers before the melee goes in. On Easy, that phasing is switched off, dumbing the AI down on purpose.

function shouldAttack(attacker, defender, difficulty):
    atk = attacker.strength × difficulty.combatEffectiveness
    def = defender.strength × terrainBonus × fortifyBonus × garrisonBonus
    winProb = clamp(0.5 + (atk - def) × 0.1, 0, 1)
    return winProb ≥ attacker.riskTolerance

Movement shares a context across all units so two units never plan into the same tile (no accidental stacking). It uses strategic pathing with an A* fallback, plus anti-oscillation rules — it won’t step back onto a tile it occupied in the last couple of turns unless it’s hurt or there’s an enemy adjacent — which kills the classic “AI unit jitters back and forth forever” bug.
Retreat pulls units below an HP threshold (50% on Easy, down to 20% on Deity) or when outnumbered 2:1 nearby — but garrisons never retreat, assault-committed units only break below 15%, and loaded transports never run. Commitment is respected.
The City Defence Commander automates each threatened city’s garrison through its own little state machine — reinforcing → defending → critical → secure — tracking the local force balance and issuing movement orders to defenders. Cities defend themselves intelligently without the strategic layer micromanaging every hex.
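The retreat rules above translate fairly directly into a predicate. This Python sketch takes the HP threshold as a parameter instead of hard-coding a value per difficulty; the 15% figure and the 2:1 ratio are from the text, while the UnitStatus fields, the role strings, and counting the unit itself among nearby friendlies are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UnitStatus:
    hp_fraction: float           # 0.0 to 1.0
    role: str                    # e.g. "garrison", "fieldArmy", "transport"
    assault_committed: bool = False
    loaded: bool = False         # only meaningful for transports


def should_retreat(unit, hp_threshold, friendly_nearby, enemy_nearby):
    if unit.role == "garrison":
        return False                       # garrisons never retreat
    if unit.role == "transport" and unit.loaded:
        return False                       # loaded transports never run
    if unit.assault_committed:
        return unit.hp_fraction < 0.15     # committed units only break below 15%
    outnumbered = enemy_nearby > 0 and enemy_nearby >= 2 * friendly_nearby
    return unit.hp_fraction < hp_threshold or outnumbered
```

Checking the commitment rules first is what keeps a damaged but committed unit in the fight even when the generic threshold says it should leave.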

Memory: what the AI carries between turns

None of this multi-turn coherence works without persistence. The AI’s state object is serialised between turns and carries, among other things:

the goal stack and all live attack plans with their lifecycle state
unit assignments — which unit is a garrison, a field-army member, a raider, a scout — and what it’s committed to
the border model, classifying cities as capital / frontier / critical / interior and tracking tension per neighbour
posture snapshots (five turns of history), grievances, pending attacks and city-defence commands
cached road routes, resource-access graphs, and settler journey state
the IDs of cities we’ve lost, so a counterattack knows what to retake
a full snapshot of last turn for event detection

That last point drives the AI’s reactivity. Each turn it diffs the current world against last turn’s snapshot to spot captured or lost cities, fresh war declarations, lost wonders, completed techs, detected nukes, and pillaged tiles. Any of these can fire an interrupt that pre-empts the current goal — lose a city and the AI drops what it was doing to respond; lose your capital and counterattack jumps the stack.

function detectEvents(aiState, world):
    prev = aiState.snapshot
    for change in diff(prev, world):
        if change is CITY_LOST:        raise Interrupt(counterattack, change.city)
        if change is WAR_DECLARED:     raise Interrupt(defensiveWar, change.by)
        if change is NUKE_DETECTED:    raise Interrupt(recovery, change.where)
        ...                            # wonders lost, tiles pillaged, techs done
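For a runnable flavour of the diff itself, here is a small Python sketch covering just city ownership. It assumes each snapshot can be reduced to a mapping from city ID to owner ID; that shape, the event tuples and the function name are illustrative assumptions, and the other event types are not covered.

```python
def diff_city_ownership(prev_owners, curr_owners, me):
    """Compare two {city_id: owner_id} snapshots from one player's point of view."""
    events = []
    for city_id, old_owner in prev_owners.items():
        new_owner = curr_owners.get(city_id)  # None if the city no longer exists
        if old_owner == me and new_owner != me:
            events.append(("CITY_LOST", city_id))
        elif old_owner != me and new_owner == me:
            events.append(("CITY_CAPTURED", city_id))
    return events
```

Cities founded since the previous snapshot don’t appear in prev_owners and are ignored here; a fuller diff would report those as their own event.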

Difficulty: honest tuning plus a few sanctioned cheats

Difficulty in Annhexation is partly competence and partly bonus — and the line between them is deliberate.

Setting	Easy	Normal	Hard	Deity
Production / Research / Gold	0.8×	1.0×	1.15× / 1.1× / 1.1×	1.3× / 1.25× / 1.2×
Combat phasing & focus fire	off	on	on	on
Will retreat	no	yes	yes	yes
Combat effectiveness	0.95×	1.0×	1.08×	1.15×
Decision accuracy	~60%	100%	100%	100%
Strategy re-evaluation	every 20 turns	12	10	8

So an Easy AI isn’t just weaker — it genuinely plays worse: it makes suboptimal choices more often, doesn’t phase its combat, doesn’t retreat damaged units, and reconsiders its strategy only sluggishly. A Deity AI plays the engine to its full ability and gets economic bonuses on top.
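One plausible way to implement a "decision accuracy" figure is to take the best-scored option with that probability and otherwise pick one of the others at random. The Python sketch below does exactly that; it is an assumed mechanism for illustration, and pick_option is a hypothetical name.

```python
import random


def pick_option(ranked_options, accuracy, rng):
    """ranked_options is sorted best-first; accuracy is a probability in [0, 1]."""
    if len(ranked_options) == 1 or rng.random() < accuracy:
        return ranked_options[0]
    # Deliberate mistake: choose uniformly among the non-best options.
    return rng.choice(ranked_options[1:])


# Passing the generator in keeps batch runs reproducible.
rng = random.Random(1234)
choice = pick_option(["attack", "fortify", "retreat"], accuracy=0.6, rng=rng)
```

At an accuracy of 1.0 the best option is always taken, which matches the 100% rows in the table; at roughly 0.6 about four decisions in ten would be deliberately off.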

The higher difficulties also unlock a small, clearly-scoped set of adaptive cheats: a fog-of-war peek at rival posture, conditional production boosts while pursuing a goal, completion boosts on the home stretch of a wonder or spaceship, and an increased chance of coordinating a joint attack with another AI. These are bonuses with a purpose rather than omniscience.

What it’s optimised for

The Annhexation AI deliberately trades short-term tactical perfection for long-term strategic coherence. Its unit movement is somewhat greedy; it will occasionally make a locally-suboptimal step. But it musters real armies, plans amphibious invasions across several turns, reads which neighbour is weak and accessible, holds a campaign together through a dozen turns of grinding siege, and reacts when you take one of its cities.

The architecture is what makes that possible: a sticky goal stack on top, multi-turn plans in the middle, flexible greedy execution at the bottom, and a persistent memory threading it all together — with personality and difficulty as multipliers reaching into every layer. The result is eight civilizations that feel different, four difficulty levels that genuinely play differently, and an opponent whose intentions you can usually see coming. Stopping them is the game.

Testing and tools

It doesn’t take long to realise that working on the AI means analysing a lot of games and a lot of data. You need to see why it did something. As the AI grew in complexity I found myself with units sitting idle, units oscillating between two positions, hopeless attacks, and settlers refusing to found cities. Any of these can stem from the interaction between the complex set of rules the AI follows and the situations that develop on the map.

So you need instrumentation, a way to interrogate the AI, and a way to play more games than you humanly can. At least as a solo developer!

A big chunk of the work turned out not to be the AI itself but the tools that let me exercise and interrogate it.

A headless CLI for batch simulation

Playing the game by hand to test the AI is hopeless — turns are slow, and you need hundreds of them across many games to spot patterns. So there’s a command-line testbed that runs all-AI games with no rendering and no human in the loop:

testbed new   --map continent --difficulty deity --players 6   # create an all-AI game
testbed run   <gameId> --turns 250 --snapshot-every 10         # advance it, headless
testbed inspect <gameId>                                        # one-shot state summary
testbed list                                                    # all games + winners

run advances a game by N turns as fast as the machine will go, printing per-turn progress and bailing early if someone wins. inspect dumps a per-player table — civ, city count, unit count, gold, current research, alive or dead — and list shows every game in the diagnostics directory with its current turn and winner. This is what turns “I think the Mongolian AI rushes too hard” into “I ran forty games and Mongolia wins by turn 90 in thirty of them” — the difference between a hunch and a regression test. Everything is stored in a per-game directory (state.json, ai-states.json, a run.log of notable events like cities founded and wars declared) ready for inspection.

An in-browser testbed and AI inspector

The CLI is great for volume but blind to space — it can’t show you that the army is stuck because a single enemy scout is sitting on the only bridge. For that I run all-AI games inside the actual client. When a game has no human player the normal “End Turn” button is replaced by a testbed panel: buttons to advance 1, 5, 10, 20, 50 or 250 turns, and a “view as” dropdown that swaps the map’s fog-of-war filter so you can watch the game unfold from any AI’s perspective.

Layered on top of that is an AI inspector. Select any AI unit or city and it surfaces the internal state that the JSON logs hold, anchored to what you’re looking at on the map:

the player’s goal stack — each goal’s type, priority, whether it’s active or blocked, the turn it was created, and goal-specific detail (militaryPush vs player_2 → city_42, scienceVictory: 4/4 parts, 5 techs left)
live attack plans — target city, lifecycle state (gathering → besieging → assault), unit fill (5/8 units, siege needed) and rally point
the selected unit’s assignment — role, commitment, target, the turn it was assigned, and the plan it belongs to
the selected city’s classification (interior / border / coastal), the goals that involve it, and its garrison strength
the border model — per-rival tension, culture pressure with turns-to-flip, chokepoint counts
the personality weights that are notably high or low

Turn-by-turn decision logs

Underneath both of those is the thing I lean on most: every AI writes a complete, structured record of its reasoning every single turn. Point an environment variable at a directory and each turn produces a pretty-printed JSON file per AI player — turn-014-mongolia.json and a companion full-state ai-state-014-mongolia.json.

These aren’t log lines; they’re a forensic snapshot of the entire decision. A single turn file captures the goal stack with its scores, the posture and opportunity score it assigned every rival, every city’s production and classification, every unit’s assignment (role, target, commitment, position, HP), the active attack plans — and, crucially, a command trace: an ordered list of every command the AI issued that turn, tagged with the phase that issued it, and success: true or a blocked reason straight from the engine. So when a move silently does nothing, the log tells you the engine rejected it and why.

There are dedicated traces for the gnarly subsystems too: a combat trace of every simulated fight, a naval lifecycle narrative for debugging amphibious invasions (the single most fiddly thing in the whole AI), and a citySiteDecisions list recording every settle attempt and its outcome: accepted, too-close-to-foreign-city, food-tiles-short, on-foreign-landmass-blocked. That last one is the cure for the maddening “why won’t this settler settle?” bug, because the answer is right there in the file. Here’s a heavily trimmed example of the JSON from one turn:

{
  "turn": 18, "playerId": "player_4", "civilisation": "greece",
  "doctrine": "STRATEGIST", "difficulty": "hard",

  "goals": [
    { "type": "earlyExpand", "priority": 0, "status": "active", "createdOnTurn": 11,
      "targetCityCount": 4, "settlerCount": 0,
      "bestSites": [
        { "q": 23, "r": 20, "totalScore": 111.4, "penalties": 0 },
        { "q": 25, "r": 19, "totalScore": 109.6, "penalties": 0 }
        /* … 277 more, descending … */
      ] },
    { "type": "infrastructureConsolidation", "priority": 1, "status": "active" },
    { "type": "warPreparation", "priority": 2, "status": "active",
      "targetPlayerId": "player_1", "targetForceSize": 4, "currentForceSize": 3 }
  ],

  "postures": {
    "player_2": { "militarisation": 0.69, "isRunawayLeader": true, "borderTension": 0.27 }
  },

  "cities": [
    { "name": "Athens", "population": 2, "production": "library", "classification": "border" }
  ],

  "commandTrace": [
    { "step": "10", "command": "moveUnit", "unitId": "unit_14", "role": "worker",
      "from": "25,23", "to": "26,23", "success": true },
    { "step": "10", "command": "buildImprovement", "unitId": "unit_14", "success": true },
    { "step": "16", "command": "endTurn", "success": true }
  ]
}
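Because the records are structured, a few lines of Python are enough to pull the rejected commands out of a turn. The sketch below works on an inline sample rather than a real file; the failing entry and its blockedReason field are invented for illustration, since the post doesn’t show how a blocked command is recorded.

```python
import json

# Inline stand-in for a turn file; the second entry is made up for this sketch.
SAMPLE_TURN = json.loads("""
{
  "turn": 18,
  "commandTrace": [
    {"step": "10", "command": "moveUnit", "unitId": "unit_14", "success": true},
    {"step": "12", "command": "moveUnit", "unitId": "unit_9", "success": false,
     "blockedReason": "tile occupied"},
    {"step": "16", "command": "endTurn", "success": true}
  ]
}
""")


def blocked_commands(turn_record):
    """Every command whose success field is anything other than true."""
    return [c for c in turn_record.get("commandTrace", []) if c.get("success") is not True]


for cmd in blocked_commands(SAMPLE_TURN):
    reason = cmd.get("blockedReason", cmd.get("success"))  # field name is hypothetical
    print(cmd["step"], cmd["command"], cmd.get("unitId"), reason)
```

Filtering on "anything other than true" keeps the sketch working whether the engine stores the reason in a separate field or in the success field itself.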

The workflow ties together neatly. Run a few hundred turns headless with the CLI; spot a game that went wrong in the list output; either replay it in the browser with the F3 inspector or crack open the turn-N JSON and read, in order, exactly what the AI was thinking and what the engine let it do. Most of the “the AI is being dumb” moments turn out to be one specific, fixable thing — and these tools are how you find it instead of guessing.

Conclusions

Creating an AI for a 4X is definitely quite an undertaking. It’s pretty easy to get units moving around, but getting the AI to act in ways that are both interesting and credible takes a lot of effort. It’s not that the code is complicated, but that there is so much interacting that small changes can produce hard-to-predict second- and third-order effects.

I spent countless hours on things that seem simple on the surface, like “stop a unit from oscillating between A and B”, but turn out to be really rather complex. Yes, you can put in guards (“don’t do this”), but the guards themselves can have unforeseen effects and don’t fix the root problems.

You also can’t automate all this away. Yes, you can create test cases, and yes, you can have the AI play countless games against itself, but an AI isn’t a human, and it’s the human the AI has to respond interestingly to.

I’ve released Annhexation into early access now, and the primary reason for that is the AI. I need more people to play it so that I can resolve the things that will inevitably emerge.

If you’d like to give it a go you can play it online, for free, now.