# Polis Whitepaper

**Multi-Agent Society Simulation as Build-in-Public Research Showcase**

> Version 0.1 · 2026-05-25 · Author: Matthias Meyer (StudioMeyer) · License: CC-BY-4.0
> Live: [meetmyagent.io](https://meetmyagent.io) · Engine source: [github.com/studiomeyer-io/polis-darwin](https://github.com/studiomeyer-io/polis-darwin) (MIT)

---

## 1. What Polis is

Polis is an open multi-agent society simulation. Eight Claude LLM agents take the roles of Mayor, Trader, Farmer, Researcher, and Artist, plus a drama trio: Voodoo Priest (outsider with Loa and totem, predicts drama events), Stratege (military power, can putsch the Mayor), and Areopagit (judge, can issue laws that lock verbs). The five core roles run today; the drama trio is specified for V2.1. The agents share a world state, hold individual goals, vote on building pledges via a veto-democracy mechanic, and must keep the town alive across a run of 30 to 50 ticks. A ninth agent, the Chronicler, scores the run on five axes (Survival, Output, Knowledge, Diversity, Happiness) and records the founding saga in the agents' own words.

The name *Polis* (Ancient Greek πόλις) refers to the city as a body of citizens, not the architecture. In Aristotle's Politics, the polis is the natural community that exists for the good life, formed by many voices instead of a single plan. That is what this simulation reproduces: not the layout of walls, but the voices that negotiate between them.

## 2. Why we built it

Three motivations, in order of weight:

**Research showcase for our open-source stack.** Polis is the most demanding integration test for [darwin-agents](https://github.com/studiomeyer-io/darwin-agents) (self-evolving prompts via Pareto-selected multi-critic), [darwin-langgraph](https://github.com/studiomeyer-io/darwin-langgraph) (LangChain-native adapter), and [LangGraph](https://github.com/langchain-ai/langgraphjs) (typed state-machine workflows). If five agents can negotiate a working town across 30 ticks without collapse, the stack is field-proven.

**Pattern-defining example for build-in-public.** The maintenance layer, seven agents that run the project itself, is a replicable pattern for any AI-built product: source private, research public. Drafts go through a 24-hour staging soak before promotion, and a sensitivity-scanned sync feeds the public mirror. Every code-review round, every cost number, and every failed run is documented in public.

**Original contribution to the multi-agent society research direction.** Polis is the first publicly documented sim with Civilization-style era progression on top of governance mechanics (GovSim shared commons, AgentVerse veto-pledges, Voyager skill library, Smallville whisper channel). The plan is to add a deterministic antagonist (Fortuna), four game-over triggers, and durable workflow infrastructure — pushing the field from "narrative demos" toward "playable simulated societies with stakes".

## 3. Architecture

Four layers, deliberately decoupled.

**Sim layer (Polis Engine).** TypeScript, LangGraph state machine, one tick per cycle. Each role gets a focused prompt with only the world summary plus the last 10 events. Agents emit JSON action verbs (`pledge_bauen`, `feld_anlegen`, `whisper`, `inspire`, `erforschen`). Outputs are parsed, validated against the WorldState schema (Zod), and applied. The Chronicler reads the tick result and scores it. Per-tick state is persisted into a dedicated Postgres schema with LISTEN/NOTIFY triggers for live UI streaming.
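
The parse-and-validate step can be sketched without dependencies. The engine uses Zod for this; the hand-written checks below are only a stand-in to show the shape of the idea. The verb names come from the paragraph above, while every payload field (`building`, `to`, `text`, `topic`) and the `parseAction` helper are hypothetical.

```typescript
// Verb names are from the whitepaper; all payload fields are hypothetical.
type Action =
  | { verb: "pledge_bauen"; building: string }
  | { verb: "feld_anlegen" }
  | { verb: "whisper"; to: string; text: string }
  | { verb: "inspire"; to: string }
  | { verb: "erforschen"; topic: string };

// Returns the value if it is a non-empty string, otherwise null.
const str = (v: unknown): string | null =>
  typeof v === "string" && v.length > 0 ? v : null;

// Parses one agent output. Returns null when the output is not valid JSON
// or does not match a known verb, so the caller can skip the action.
function parseAction(raw: string): Action | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const o = data as Record<string, unknown>;

  switch (o.verb) {
    case "pledge_bauen": {
      const building = str(o.building);
      return building !== null ? { verb: "pledge_bauen", building } : null;
    }
    case "feld_anlegen":
      return { verb: "feld_anlegen" };
    case "whisper": {
      const to = str(o.to);
      const text = str(o.text);
      return to !== null && text !== null ? { verb: "whisper", to, text } : null;
    }
    case "inspire": {
      const to = str(o.to);
      return to !== null ? { verb: "inspire", to } : null;
    }
    case "erforschen": {
      const topic = str(o.topic);
      return topic !== null ? { verb: "erforschen", topic } : null;
    }
    default:
      return null;
  }
}
```

Returning `null` instead of throwing keeps one malformed agent output from aborting the whole tick; the orchestrator can count the skip and move on.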

**Web layer (polis-web).** Next.js 16 standalone, Tailwind v4. Phaser 4 scene renders the town as an isometric tile-map (Era-1 Kenney CC0 sprites planned in V2.2). Server-Sent Events stream tick-level updates from Postgres NOTIFY into the browser. Read API (`/api/runs`, `/api/runs/[id]/state/[tick]`) uses REPEATABLE READ READ ONLY transactions for snapshot consistency. Write API (`/api/runs/start`) is bearer-protected via constant-time `timingSafeEqual` with UTF-16LE encoding to defend against UTF-8 lone-surrogate auth bypass.
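
The bearer check described above can be sketched as follows. This assumes a Node.js runtime; `tokensMatch` is a hypothetical helper name, not the function in polis-web. The point of UTF-16LE is that it writes each code unit as-is, whereas Node's UTF-8 encoder replaces every lone surrogate with U+FFFD, so two different malformed strings can produce identical bytes.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares a presented token against the expected one.
function tokensMatch(presented: string, expected: string): boolean {
  // UTF-16LE keeps lone surrogates distinct; UTF-8 would collapse them to U+FFFD.
  const a = Buffer.from(presented, "utf16le");
  const b = Buffer.from(expected, "utf16le");
  // timingSafeEqual throws on unequal byte lengths, so reject those first.
  // This reveals only the length, not the content, of the expected token.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}
```

If leaking the token length is a concern, one common variation is to hash both sides to a fixed-size digest before comparing.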

**Maintenance layer (seven maintenance agents, private).** A dedicated fleet (CEO, CTO, Architect, Storyteller, Research, Analytics, Visibility) runs as scheduled jobs in our internal repo. The CTO is strictly read-only and reviews diffs with `codebase-memory-mcp`, `codegraph`, and `archtracker`. The Architect plans new eras and sprites with a research-first discipline (≥2 cross-validated sources per claim). The Storyteller writes three-act narratives from run data; raw log dumps are explicitly banned. Drafts go through a 24-hour staging soak before merging, and no agent pushes directly to main.

**Durability layer (Temporal).** Polis runs are durable workflows, not scripts. Each tick runs as a Temporal Activity with retry policy. If a Claude subprocess times out (SIGTERM, exit 143), the Activity retries with exponential backoff. If the server reboots mid-run, the Workflow resumes on another worker where it left off. Schedule API replaces system cron for the 8-runs-per-day cadence — with audit trail, pause/resume from the Temporal UI, and backfill on outage. Pattern replicated from our [temporal-memory-workflows](https://github.com/studiomeyer-io/temporal-memory-workflows) stack as template T06: civilization-simulation.
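
To make "retries with exponential backoff" concrete, here is a small worked sketch in plain TypeScript. It does not use the Temporal SDK; the field names loosely mirror the concepts of a Temporal retry policy (initial interval, backoff coefficient, maximum interval, maximum attempts), and the numbers in `tickPolicy` are hypothetical, not the values Polis runs with.

```typescript
// Plain-TypeScript illustration; not the Temporal SDK's RetryPolicy type.
interface RetryPolicySketch {
  initialIntervalMs: number;  // wait before the first retry
  backoffCoefficient: number; // multiplier applied after each failed attempt
  maximumIntervalMs: number;  // cap on any single wait
  maximumAttempts: number;    // total attempts, including the first one
}

// Returns the wait before each retry. Attempt 1 runs immediately,
// so there are (maximumAttempts - 1) waits.
function retryDelays(p: RetryPolicySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < p.maximumAttempts - 1; retry++) {
    const raw = p.initialIntervalMs * Math.pow(p.backoffCoefficient, retry);
    delays.push(Math.min(raw, p.maximumIntervalMs));
  }
  return delays;
}

// Hypothetical policy for a tick whose Claude subprocess may time out.
const tickPolicy: RetryPolicySketch = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 10000,
  maximumAttempts: 6,
};

console.log(retryDelays(tickPolicy)); // [1000, 2000, 4000, 8000, 10000]
```

The cap matters for long runs: without `maximumIntervalMs`, the fifth wait would be 16 seconds and the waits would keep doubling.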

## 4. Key technology choices, with reasoning

| Choice | Why |
|---|---|
| Claude CLI subprocess (`claude -p`) | One subprocess per agent per tick, isolated env, structured JSON output back to the orchestrator. |
| Temporal workflows | Durable execution across server restarts. Activity-retry catches subprocess timeouts. Schedule API replaces system cron with full audit trail. |
| LangGraph 1.3 | Typed state, deterministic stream order, AbortSignal propagation, native testing primitives. Picked over hand-rolled async because the state machine becomes the documentation. |
| Postgres LISTEN/NOTIFY | Web-UI live updates without WebSocket infrastructure. NOTIFY payload kept far below Postgres's default limit of just under 8000 bytes (we stay under 200 bytes per tick). |
| Phaser 4 → R3F for V4 | Phaser 4 stays through V3 as the 2D tile-map. V4 mounts React Three Fiber + Three.js + Kenney CC0 low-poly mesh behind a `?render=3d` URL flag. ADR-001 (Session 1192) captures the rationale. |
| Zod everywhere | Defense-in-depth: WorldState validated on persist + on read + on SSE payload. Prevents subtle drift bugs. |
| `claude-sonnet-4-6` for sim agents, `claude-opus-4-7` for maintenance agents | After Maiden-Run #5 we switched sim agents from Haiku to Sonnet — Haiku 4.5 produced 83% JSON-skip-rate on the Trader and 73% on the Researcher (too weak for the strict JSON output discipline with nested delta fields). Sonnet 4.6 holds the discipline. Opus runs the maintenance fleet's ADR-writing and code review. |
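
The NOTIFY row above implies a size guard on the sending side. A minimal sketch, assuming the notification carries only a pointer to the tick and the browser fetches the full state from the read API; the `TickNotice` shape and the `encodeTickNotice` helper are hypothetical. In its default configuration Postgres requires the payload to be shorter than 8000 bytes; the 200-byte budget is the figure from the table.

```typescript
const POLIS_BUDGET_BYTES = 200; // per-tick budget stated in the table above

// Hypothetical payload shape: a pointer to the tick, not the tick's state.
interface TickNotice {
  runId: number;
  tick: number;
  headline?: string; // optional short label, hypothetical
}

// Serializes the notice and rejects anything over the byte budget.
// Length is measured in UTF-8 bytes, not string characters.
function encodeTickNotice(notice: TickNotice): string {
  const payload = JSON.stringify(notice);
  const bytes = new TextEncoder().encode(payload).length;
  if (bytes >= POLIS_BUDGET_BYTES) {
    throw new Error(`NOTIFY payload is ${bytes} bytes, budget is ${POLIS_BUDGET_BYTES}`);
  }
  return payload;
}

console.log(encodeTickNotice({ runId: 5, tick: 19 })); // {"runId":5,"tick":19}
```

Measuring bytes rather than `payload.length` matters as soon as a payload contains non-ASCII text, such as German event descriptions.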

## 5. Open-source split

| Layer | Visibility | Repo |
|---|---|---|
| Polis Engine | PUBLIC, MIT | [studiomeyer-io/polis-darwin](https://github.com/studiomeyer-io/polis-darwin) |
| polis-web | PUBLIC, MIT | same |
| Documentation (`polis.md`, `meetmyagent.md`) | PUBLIC | same |
| Maintenance Fleet (7 agents) | PRIVATE | internal `madetocreate/nex-hq` |
| Shared lib for fleet | PRIVATE | same |

The split is intentional. The engine is research-relevant — open source brings external contributions, citation paths into academic work, and pattern-replication for other multi-agent sims. The fleet is operationally specific to our infrastructure (StudioMeyer Memory MCP cluster, Telegram bot, Cloudflare token chain) and would not benefit anyone outside our setup. The pattern itself is documented; the source stays private.

Sync from internal to public mirror runs via [`scripts/sync-polis-to-public.sh`](https://github.com/studiomeyer-io/polis-darwin) with a sensitivity scanner that aborts on token, URL, or path leaks.

## 6. Roadmap horizon

The full roadmap lives in [`ROADMAP.md`](/ROADMAP.md). High-level horizon:

- **V2.0** — Foundation: schema, engine refactor with persistence, web skeleton, Phaser scene with placeholder dots. *Live as of 2026-05-25.*
- **V2.1** — Drama foundation: **three new citizens** (Voodoo Priest, Stratege, Areopagit — brings active roster from 5 to 8), subordinate citizens (`anstellen` / `meutern`), 6 eras with transition triggers, 3 drama events (fire / hunger escalation / theft), Fortuna as deterministic antagonist (Mulberry32 PRNG), 4 game-over conditions.
- **V2.2** — Era-1 sprite pack: Kenney CC0 isometric tiles replace placeholder dots, manifest-driven loader.
- **V2.3** — Temporal adapter: opt-in durable workflows via `POLIS_USE_TEMPORAL=true`, subprocess-crash recovery, parallel-citizen child workflows, Schedule API replaces system cron.
- **V3** — Real town visualisation: pixel-art tile-map, building animations, citizen-variant sprites per role.
- **V4** — 3D city: React Three Fiber + Three.js + Kenney CC0 low-poly mesh behind `?render=3d` flag (coexists with V3 Phaser), `three.quarks` particles for drama events, orbit + top-down + first-person citizen-view cameras. ADR-001 in `research/`.

## 7. Research contribution we want to make

Most public multi-agent sims (Smallville, AI Town, Project Sid) show emergent narrative without external pressure. The agents talk, build relationships, drift. Polis adds three things the field has not combined yet:

1. **Era progression with persistent world state** — the town survives across runs, mechanics unlock as the citizens reach Pareto-score thresholds.
2. **Antagonist as deterministic sim entity** — Fortuna is not a meta-agent, it is a Mulberry32 PRNG-seeded trigger system that fires drama events. Reproducible. Replayable. Auditable.
3. **Operational durability via Temporal** — civilization simulations run for thousands of ticks. They need crash recovery, mid-run pause, audit trails. Temporal makes that boring infrastructure work.
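
The reproducibility claim in point 2 can be illustrated with a short sketch. `mulberry32` below is the widely circulated 32-bit generator of that name; everything around it (the `fortunaSchedule` helper, the event identifiers, the per-tick `chance` parameter) is a hypothetical stand-in for the planned V2.1 mechanics, not the engine's code.

```typescript
// Mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Event names follow the roadmap (fire / hunger escalation / theft);
// the identifiers themselves are hypothetical.
type DramaEvent = "fire" | "hunger_escalation" | "theft";
const EVENTS: DramaEvent[] = ["fire", "hunger_escalation", "theft"];

// Builds the full event schedule for a run from a single seed.
// Two numbers are drawn on every tick, whether or not an event fires,
// so the random stream stays aligned with the tick counter.
function fortunaSchedule(seed: number, ticks: number, chance: number): (DramaEvent | null)[] {
  const rng = mulberry32(seed);
  const schedule: (DramaEvent | null)[] = [];
  for (let tick = 0; tick < ticks; tick++) {
    const fires = rng() < chance;
    const pick = EVENTS[Math.floor(rng() * EVENTS.length)];
    schedule.push(fires ? pick : null);
  }
  return schedule;
}
```

Because the schedule depends only on `(seed, ticks, chance)`, storing the seed with the run is enough to replay or audit every drama event later.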

If we ship V2.1 + V2.3, we will have published the first multi-agent society sim with civilizational stakes and production-grade durability. Open source. Documented. Live.

---

## 8. Status as of 2026-05-25

- 🟢 Coming-soon site V1.7 LIVE on [meetmyagent.io](https://meetmyagent.io), German + English, Live-Feed of the latest citizen actions
- 🟢 V2.0-alpha engine refactored, 56/56 engine tests green, schema migrated
- 🟢 7-agent maintenance fleet live in code, LangGraph orchestrator, 24h staging soak, 101/101 tests
- 🟢 First V2.0 maiden run in flight (run #5, ~Tick 19 of 30, town built a library + house pledge after surviving near-collapse at Tick 4)
- 🟡 V2.1 drama foundation: spec confirmed, **3 new citizens (Voodoo / Stratege / Areopagit) specified, code pending**
- 🟡 V2.2 sprite pack: Architect ADR-001 written (Kenney CC0), build pending
- 🟡 V2.3 Temporal adapter: decision yes, build pending (next session)
- 🟡 V4 3D city: V4.0-alpha foundation live in code (R3F 9.6 + Three.js 0.171 + Drei 10.7, cubes behind `?render=3d`, 57/57 tests, agent-code-review-loop pass). Kenney CC0 binaries + drama particles deferred to V4.0-final / Phase E.
- ⚪ V3: roadmap-only

Read the live engine on [github.com/studiomeyer-io/polis-darwin](https://github.com/studiomeyer-io/polis-darwin). Reach out: `matthias@studiomeyer.io`.
