# Chess 3-Layer Lab — Roadmap

> **As of:** 2026-05-28 (Session 1222 Public-Deploy)
> Phases A-D (foundation) are done, Phases E + G are LIVE, Phases F + H are backlog.

## Phase A — Schema + LISTEN/NOTIFY (DONE, S1220)

- Postgres schema `chess.*` with 8 operational tables + 4 NOTIFY triggers (extended to 5 in S1221).
- View `chess.games_overview` for dashboard counts.
- Migration scripts idempotent.

## Phase B — Engine + LangGraph Pipeline (DONE, S1220)

- chess.js wrapper for legal moves + FEN + PGN.
- 9-station LangGraph pipeline with append-only stations reducer.
- Deterministic mock-mover for CI (seed-based).
- Subprocess LLM adapter with env-strip for ANTHROPIC_API_KEY + CLAUDE_API_KEY.
- Lichess Opening Explorer + Cloud Eval + Tablebase adapters with rate-limit guard (1s min interval, 60s penalty on 429).
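
The rate-limit guard from the last bullet could look roughly like the sketch below. Only the 1 s minimum interval and the 60 s penalty on HTTP 429 come from this roadmap; the class name, method names, and the choice to pass the clock in as a parameter are assumptions made for illustration, not the project's actual adapter code.

```typescript
// Hypothetical sketch of a rate-limit guard for the Lichess adapters.
// Time is passed in explicitly so the logic is easy to test.
class RateLimitGuard {
  private nextAllowedAt = 0;
  private readonly minIntervalMs: number;
  private readonly penaltyMs: number;

  constructor(minIntervalMs = 1_000, penaltyMs = 60_000) {
    this.minIntervalMs = minIntervalMs;
    this.penaltyMs = penaltyMs;
  }

  /** Milliseconds the caller should wait before sending the next request. */
  waitMs(now: number): number {
    return Math.max(0, this.nextAllowedAt - now);
  }

  /** Record a request sent at `now` together with the HTTP status it returned. */
  record(now: number, status: number): void {
    // A 429 response pushes the next allowed request out by the full penalty.
    const delay = status === 429 ? this.penaltyMs : this.minIntervalMs;
    this.nextAllowedAt = Math.max(this.nextAllowedAt, now + delay);
  }
}
```

A caller would check `waitMs(Date.now())`, sleep for that long, send the request, and then call `record(Date.now(), response.status)`.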

## Phase C — Persistence (DONE, S1220)

- withTx wrapper + ON CONFLICT idempotent inserts.
- Stale-game cleanup on startup (auto-aborts games stuck >2h in running status).
- tool_calls.seq column for ordered nested traces.
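
A minimal sketch of the `withTx` plus `ON CONFLICT` pattern from the first bullet, under stated assumptions: the `TxClient` interface is a stand-in for whatever database client the project uses, and the `chess.moves` table with its `(game_id, ply)` conflict target is hypothetical and only illustrates an idempotent insert.

```typescript
// Stand-in for the real database client (assumption, not the project's type).
interface TxClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Run `fn` inside a transaction: COMMIT on success, ROLLBACK on any error.
async function withTx<T>(
  client: TxClient,
  fn: (c: TxClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}

// Hypothetical table and columns. Re-running the same insert is a no-op,
// which is what makes retries after a crash safe.
async function insertMove(
  c: TxClient,
  gameId: string,
  ply: number,
  san: string,
): Promise<void> {
  await c.query(
    "INSERT INTO chess.moves (game_id, ply, san) VALUES ($1, $2, $3) " +
      "ON CONFLICT (game_id, ply) DO NOTHING",
    [gameId, ply, san],
  );
}
```

The wrapper keeps the transaction boundaries in one place, so individual persistence functions only need to be idempotent on their own.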

## Phase D — Memory Bridge V1 Stub (DONE, S1220)

- 4 stub functions with clear V2/V3 API.
- Pipeline runs even with 0 memory hits (recall stations mark skipped).

## Phase E — Real LLM Calls + Memory V2 + Langfuse + Cost-Tracking (DONE, S1221)

- Subprocess `claude -p` calls in plan + candidates + reflect nodes with strict-JSON prompts.
- Memory V2: direct Postgres reads from `nex_learnings` with project + tag filter (tenant-isolated per agent).
- Langfuse trace + move-span + station-span hierarchy. Costs recorded in the DB match the Langfuse API to within 0.0001 USD.
- Post-game auto-reflect stores winning + losing lessons in `nex_learnings`.
- 2 live smoke games verified (Ruy Lopez 6 plies $0.38, second game 4 plies $0.22).

## Phase G — Frontend Live Dashboard (DONE, S1221)

- Next.js 16 with basePath `/chess` on port 5281.
- SSE broadcaster ported from polis-web with exponential-backoff reconnect + per-IP + global cap.
- 4-board live dashboard with flash animation + tier accent.
- Move accordion with all 9 stations + tool-call detail + per-station cost + model used.
- Replay slider with per-ply fetch + auto-play 0.5x / 1x / 2x / 4x.
- Agent stats page with win-rate + recent games + memory hit rate.
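
The reconnect and connection-cap behaviour of the SSE broadcaster could be sketched as follows. The base delay, maximum delay, and all names are illustrative assumptions; the roadmap states only that reconnects back off exponentially and that connections are capped per IP and globally.

```typescript
// Client side: delay before reconnect attempt `attempt` (0-based).
// Doubles each time and is clamped to `maxMs`. Values are assumptions.
function reconnectDelayMs(attempt: number, baseMs = 1_000, maxMs = 30_000): number {
  const exponential = baseMs * 2 ** Math.max(0, attempt);
  return Math.min(maxMs, exponential);
}

// Server side: hypothetical counter enforcing a per-IP and a global limit.
class ConnectionCap {
  private readonly perIp = new Map<string, number>();
  private total = 0;
  private readonly perIpMax: number;
  private readonly globalMax: number;

  constructor(perIpMax: number, globalMax: number) {
    this.perIpMax = perIpMax;
    this.globalMax = globalMax;
  }

  /** Returns true and counts the connection if both limits allow it. */
  tryAcquire(ip: string): boolean {
    const current = this.perIp.get(ip) ?? 0;
    if (current >= this.perIpMax || this.total >= this.globalMax) return false;
    this.perIp.set(ip, current + 1);
    this.total += 1;
    return true;
  }

  /** Call when a stream closes so the slot becomes available again. */
  release(ip: string): void {
    const current = this.perIp.get(ip) ?? 0;
    if (current === 0) return;
    if (current === 1) this.perIp.delete(ip);
    else this.perIp.set(ip, current - 1);
    this.total -= 1;
  }
}
```

In a real client, adding random jitter to the delay is a common refinement so that many dashboards do not reconnect at the same instant.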

## Phase G2 — Public Deploy (DONE, S1222 — this commit)

- nginx vhost `meetmyagent.io.conf` with `location ^~ /chess/` + `location = /chess` (both use proxy_pass; the redirect variant causes an endless redirect loop with the Next.js default `trailingSlash: false`).
- systemd-user `chess-web.service`, modeled on `polis-web.service` (reboot-safe via linger).
- 3 discovery files: `llms.txt` + `.well-known/agents.json` (Spec 1.0) + `.well-known/agent-card.json` (Spec 0.3).
- Pool hardening in chess-web/lib/db.ts: on('error') handler + application_name + connectionTimeoutMillis (drift correction vs polis pattern).
- recall-nodes gameId propagation fix (chess.memory_events.game_id now populated on every recall).
- Domain-level discovery: `humans.txt` + `sitemap.xml` + `robots.txt` updated with chess sub-experiment.
- 3-agent code review (Critic + Analyst + Research) R1+R2 GO with 1 HIGH + 2 MED + Pool-hardening all fixed.

## Phase F — Darwin Generation-Cycle (BACKLOG)

Goal: `chess-sonnet-tmd` evolves its reasoning prompts via [`darwin-agents@0.5.0-alpha.2`](https://www.npmjs.com/package/darwin-agents).

Steps:
1. Define `DARWIN_CHESS_OBJECTIVES` as a code constant (win-rate + position-score + cost). R5 backlog from S1220 review.
2. Reflector inspects last 20 game trajectories for board 4.
3. GepaOptimizer generates N=5 prompt variants per station (plan, candidates, reflect).
4. Pareto-select with max-length 5000 chars enforced (prompt-bloat anti-pattern).
5. Write to `chess.darwin_generations` with pareto_scores + surviving flag.
6. Next 20 games use the surviving prompt set.
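
Step 4 can be illustrated with a small, self-contained Pareto filter. This is a sketch under assumptions, not the `darwin-agents` API: the `PromptVariant` shape and function names are hypothetical, and the three objectives (win-rate and position-score maximized, cost minimized) are taken from step 1.

```typescript
// Hypothetical record for one candidate prompt and its measured objectives.
interface PromptVariant {
  id: string;
  prompt: string;
  winRate: number;       // higher is better
  positionScore: number; // higher is better
  costUsd: number;       // lower is better
}

// Length limit from step 4. Measured in UTF-16 code units via String.length.
const MAX_PROMPT_CHARS = 5_000;

// `a` dominates `b` if it is no worse on every objective and better on one.
function dominates(a: PromptVariant, b: PromptVariant): boolean {
  const noWorse =
    a.winRate >= b.winRate &&
    a.positionScore >= b.positionScore &&
    a.costUsd <= b.costUsd;
  const strictlyBetter =
    a.winRate > b.winRate ||
    a.positionScore > b.positionScore ||
    a.costUsd < b.costUsd;
  return noWorse && strictlyBetter;
}

// Drop over-long prompts first, then keep only non-dominated variants.
function paretoSelect(variants: PromptVariant[]): PromptVariant[] {
  const eligible = variants.filter((v) => v.prompt.length <= MAX_PROMPT_CHARS);
  return eligible.filter((v) => !eligible.some((other) => dominates(other, v)));
}
```

Filtering on length before the dominance check means an over-long prompt can never push a shorter variant off the front, which is one way to enforce the prompt-bloat guard.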

Exit criterion: generation N+1 of tmd beats generation N over 20 head-to-head test games against the static sonnet-tm baseline.

## Phase H — Temporal Schedule (BACKLOG)

Temporal cluster has been LIVE on dev2 since S1183. The cluster currently runs polis workflows and the t01-t05 templates. Phase H wires chess into it:

1. `chess-game-workflow` Temporal workflow with activity retry + `defineSignal` for game-stop + heartbeat per ply.
2. `chess-tournament-workflow` with `executeChild` for 24-game waves.
3. Schedule API: 4 games per day per board (6-hour rhythm); the calibration run against Stockfish happens on Sundays.
4. Memory-maintenance saga: Sunday `nex_decay` + `nex_deduplicate` over chess-tagged learnings.
5. Reference pattern: `/home/simple/temporal-memory-workflows/templates/t01-memory-aware-agent`.

Exit criterion: 24 hours of uninterrupted scheduled play with 0 stale games and at least one activity retry firing without supervisor intervention.

## Phase I — Public Press (BACKLOG, after F + H stabilize)

- Reddit post in r/ClaudeAI + r/MachineLearning with empirical data after 100+ games.
- LinkedIn long-form post with results.
- Public GitHub mirror on studiomeyer-io org.
- Submission to OSF for pre-registration of the win-rate hypothesis.

## V1.1 — Smaller Items (BACKLOG)

- Discovery-files-parity full sweep (still missing: IndexNow key dedup decision since polis owns the meetmyagent.io key, og.png raster export, sitemap-index split per sub-path).
- End-to-end browser smoke test during an active game (SSE bridge is code-complete and the initial `event:hello` is verified; the live 4-board flash animation has not yet been observed with a running game).
- LiveDashboard NEW-1 CSS-animation refactor (cosmetic, replace JS timer).
- Critic-M2 UUID-type station_id, Critic-M3 subprocess-test-coverage, Analyst-M3 parseFen useMemo + ChessBoard React.memo.

## Live URLs

- Hub: [https://meetmyagent.io/chess](https://meetmyagent.io/chess)
- Whitepaper: [/chess/WHITEPAPER.md](https://meetmyagent.io/chess/WHITEPAPER.md)
- This roadmap: [/chess/ROADMAP.md](https://meetmyagent.io/chess/ROADMAP.md)
- Sibling: [/polis](https://meetmyagent.io/polis)

## Operator

StudioMeyer / Matthias Meyer (Palma de Mallorca). Contact: matthias [at] studiomeyer.io.
