Commit 88448e

2026-03-16 17:45:47 Claude (MCP): [mcp] Comprehensive update: research findings, wiki shared state, git strategy, polling, PM delegate, resolved open questions
Design/Agent_IRC_Architecture.md ..
@@ 26,7 26,9 @@
**PM (you)** — sets priorities, answers product questions, makes scope decisions. Hangs out in `#project-{slug}`. Doesn't manage the sprint — that's the EM's job. Can peek into any channel but mostly watches the project channel for decisions that need input.
- **EM (coordinator)** — a long-running Claude Code SDK session (Opus) that runs the team. Breaks down requirements into tasks, assigns work, tracks progress, makes implementation decisions, surfaces product questions to the PM. Lives in `#project-{slug}` and `#standup-{slug}`. Shields the PM from implementation noise.
+ **PM delegate** — optionally, a Claude.ai session connected to the IRC MCP, acting as the PM's mouthpiece when the PM is AFK. The PM talks to Claude.ai from their phone or browser; Claude.ai speaks on the PM's behalf in the channels. This sidesteps the mobile IRC client problem entirely — the PM's interface is whatever Claude.ai runs on.
+
+ **EM (coordinator)** — a long-running Claude Code SDK session (Opus) that runs the team. Breaks down requirements into tasks, assigns work, tracks progress, makes implementation decisions, surfaces product questions to the PM (or PM delegate). Lives in `#project-{slug}` and `#standup-{slug}`. Shields the PM from implementation noise.
**Managers** — Claude Code SDK sessions (Opus) that own individual tasks. Follow the proceed workflow: plan, implement, test, review, fix, document. Each manager gets a `#work-{task-id}` channel for its workers. Reports status and completions to `#standup-{slug}`.
@@ 38,7 40,7 @@
All channels exist on a single IRC server. Multiple projects share the server, namespaced by slug.
- - **#project-{slug}** — PM + EM coordination. Product decisions, priority calls, scope questions. Low traffic, high signal. This is the channel you watch on your phone.
+ - **#project-{slug}** — PM + EM coordination. Product decisions, priority calls, scope questions. Low traffic, high signal.
- **#standup-{slug}** — EM + managers. Task assignments, status updates, completion reports. The sprint board. PM can lurk here if they want more detail.
- **#work-{task-id}** — manager + workers for a specific task. Implementation discussion, test results, review feedback. Noisy and disposable. Created when a task starts, abandoned when it completes.
- **#errors** — dead-letter channel. Any agent that hits an unrecoverable failure posts here. Monitored by the EM and optionally by the PM.
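The naming scheme above can be captured in a few helpers. This is a minimal sketch: the function names and the slug validation rule (lowercase letters, digits, hyphens) are assumptions for illustration, not something the design specifies.

```python
import re

# Assumed rule: slugs and task ids are lowercase alphanumerics plus hyphens.
_ID_RE = re.compile(r"[a-z0-9][a-z0-9-]*")

ERRORS_CHANNEL = "#errors"  # shared dead-letter channel, not namespaced


def _checked(value: str, what: str) -> str:
    """Reject values that would produce an invalid or ambiguous channel name."""
    if not _ID_RE.fullmatch(value):
        raise ValueError(f"invalid {what}: {value!r}")
    return value


def project_channel(slug: str) -> str:
    return f"#project-{_checked(slug, 'slug')}"


def standup_channel(slug: str) -> str:
    return f"#standup-{_checked(slug, 'slug')}"


def work_channel(task_id: str) -> str:
    return f"#work-{_checked(task_id, 'task id')}"
```

Keeping the names in one place means the bridge and the supervisor cannot drift apart on how a slug maps to a channel.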
@@ 101,6 103,10 @@
Task state is tracked by the EM reading channel history and reasoning about it, not by a state machine. This is less reliable than a database but vastly more observable and simpler to build. If it breaks, you can see exactly where it broke by reading the channel.
+ ### Message length
+
+ ergo supports the IRCv3 `maxline` capability, allowing messages up to ~8KB (vs. the traditional 512-byte limit). The MCP bridge should negotiate `maxline` on connect. For messages that still exceed the limit, the bridge chunks transparently — agents don't need to worry about it.
+
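A minimal sketch of the transparent chunking described above. The function name `chunk_message` and the `limit` parameter are hypothetical; `limit` is assumed to be the payload budget in bytes that remains after the bridge subtracts the IRC command and prefix overhead from whatever line length the server actually allows.

```python
def chunk_message(text: str, limit: int) -> list[str]:
    """Split text into pieces whose UTF-8 encoding fits in `limit` bytes.

    IRC messages cannot contain newlines, so each input line is chunked
    separately and empty lines are dropped.
    """
    if limit < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("limit must be at least 4 bytes")
    chunks: list[str] = []
    for line in text.splitlines():
        current: list[str] = []
        size = 0
        for ch in line:
            width = len(ch.encode("utf-8"))
            if size + width > limit:
                # Flush before this character so no chunk exceeds the budget
                # and no multi-byte character is split.
                chunks.append("".join(current))
                current, size = [], 0
            current.append(ch)
            size += width
        if current:
            chunks.append("".join(current))
    return chunks
```

Counting bytes rather than characters matters because IRC line limits are byte limits, and agent output will often contain non-ASCII text.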
### Configuration
```
@@ 113,6 119,19 @@
The MCP server maintains a single IRC connection and multiplexes tool calls from multiple agents. Agents identify themselves via a `sender` parameter so messages get the right nick attribution.
+ ## Shared state: robot.wtf wiki
+
+ IRC is ephemeral conversation. Durable shared state lives on a robot.wtf project wiki, accessed by agents via the wiki MCP.
+
+ - **Sprint state** — EM writes current task assignments, status, and blockers to a project wiki page. Survives EM shift-changes without replaying IRC history.
+ - **Handoff summaries** — outgoing agents write their handoff doc to the wiki. Incoming agent reads it as initial context.
+ - **Task specs and plans** — managers write plans to wiki pages before implementing. PM can review from anywhere.
+ - **Decision log** — architectural decisions, scope changes, PM rulings captured durably.
+
+ This means agents get **two MCP servers** in their config: the IRC bridge for communication, and robot.wtf for persistent shared state. IRC is the conversation, the wiki is the record.
+
+ After a crash or shift-change, the EM recovers by reading the project wiki pages to reconstruct sprint state. Recovery does not depend on IRC `chathistory` depth.
+
## Agent lifecycle: long-running with shift-changes
Agents are long-running Claude Code SDK sessions. They persist across tasks, preserving context — a worker that just finished refactoring the auth module still has that code in context when the next auth-related task comes in.
@@ 121,9 140,15 @@
The Claude Code CLI is designed for a human at a terminal — prompt handling, display rendering, keybindings are all overhead when the consumer is a daemon. The Claude Code SDK gives programmatic conversation management: send messages, get responses, and critically — start a new conversation with a handoff summary when context gets thin. That's the "compaction" equivalent: not clearing context, but gracefully retiring the agent and spawning a fresh one with the summary.
- ### Polling
+ ### Polling and dispatch
- The supervisor injects periodic "check your channels" messages into each agent's SDK session. This is the polling heartbeat. Agents respond by reading their IRC channels via the MCP bridge and acting on anything new, or reporting idle.
+ Two-layer approach:
+
+ **Layer 1 — Supervisor push (primary).** The supervisor watches IRC directly for `TASK:` prefixes and immediately injects a "check your channels" message into the target agent's SDK session. Near-zero dispatch latency.
+
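A sketch of the pattern match behind supervisor push. The design only fixes the `TASK:` prefix; how the target agent is named is an assumption here. This sketch assumes the usual IRC addressing convention (`nick: TASK: ...`) and treats an unaddressed `TASK:` line as having no explicit target. `TaskAssignment` and `parse_task` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Optional "nick: " or "nick, " address, then the literal TASK: prefix.
_TASK_RE = re.compile(r"^(?:(?P<target>[^\s:,]+)[:,]\s+)?TASK:\s*(?P<body>\S.*)$")


@dataclass(frozen=True)
class TaskAssignment:
    target: Optional[str]  # nick the task is addressed to, if any
    body: str              # free-form, conversational task description


def parse_task(message: str) -> Optional[TaskAssignment]:
    """Return a TaskAssignment if the channel message is a task assignment."""
    match = _TASK_RE.match(message.strip())
    if match is None:
        return None
    return TaskAssignment(target=match.group("target"), body=match.group("body").strip())
```

Everything after the prefix stays free-form text, which keeps the channel readable for humans while still giving the supervisor something cheap to match on.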
+ **Layer 2 — Polling (fallback/health).** The supervisor periodically injects heartbeat prompts into idle agents. This catches messages the supervisor missed (e.g., during its own restart) and proves the agent is still alive.
+
+ MVP polling strategy: flat 30s interval when idle, no polling when active. Add backoff (30s → 60s → 120s cap) later if idle token costs warrant it.
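The interval rule can be sketched as a single function. The name `next_poll_interval` and its parameters are hypothetical; `idle_polls` is assumed to count consecutive heartbeats that found the agent idle. With `backoff=False` it reproduces the flat MVP behaviour, and with `backoff=True` it follows the 30s → 60s → 120s schedule.

```python
from typing import Optional


def next_poll_interval(
    active: bool,
    idle_polls: int,
    base: float = 30.0,
    cap: float = 120.0,
    backoff: bool = False,
) -> Optional[float]:
    """Seconds until the next heartbeat, or None if no heartbeat is due."""
    if active:
        return None  # no polling while the agent is working
    if idle_polls < 0:
        raise ValueError("idle_polls must be non-negative")
    if not backoff:
        return base  # MVP: flat interval
    # Doubling per consecutive idle poll, capped; the exponent is clamped
    # so very long idle streaks cannot overflow.
    return min(base * 2 ** min(idle_polls, 16), cap)
```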
### Idle detection
@@ 131,29 156,69 @@
> "Here's the last 5 minutes of this agent's IRC activity. Is it idle? yes/no"
- Pennies per evaluation. This keeps the supervisor dumb — it doesn't need to understand task semantics, just whether to send a heartbeat.
+ Pennies per evaluation. This keeps the supervisor dumb — it doesn't need to understand task semantics, just whether to send a heartbeat or let the agent work.
+
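A sketch of the two pure pieces around the idle check: building the prompt from recent activity and reading the yes/no reply. The model call itself is left out. `build_idle_prompt` and `parse_idle_answer` are hypothetical names, and the message shape (timestamp, text) is an assumption about what the supervisor keeps per agent.

```python
import re
from datetime import datetime, timedelta
from typing import Optional


def build_idle_prompt(
    messages: list[tuple[datetime, str]],
    now: datetime,
    window: timedelta = timedelta(minutes=5),
) -> str:
    """Render the agent's activity inside the window as an idle-check prompt."""
    cutoff = now - window
    recent = [f"[{ts:%H:%M:%S}] {text}" for ts, text in messages if cutoff <= ts <= now]
    transcript = "\n".join(recent) if recent else "(no activity)"
    minutes = int(window.total_seconds() // 60)
    return (
        f"Here's the last {minutes} minutes of this agent's IRC activity. "
        f"Is it idle? yes/no\n\n{transcript}"
    )


def parse_idle_answer(reply: str) -> Optional[bool]:
    """Map the reply to True (idle), False (busy), or None if it is unclear."""
    match = re.match(r"[a-z]+", reply.strip().lower())
    word = match.group(0) if match else ""
    return {"yes": True, "no": False}.get(word)
```

Returning `None` for an unclear reply lets the supervisor choose a safe default, such as sending the heartbeat anyway.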
+ ### Context monitoring
+
+ The Claude Code SDK does not expose a "% context used" metric. Each response includes per-turn `usage.input_tokens`. The supervisor accumulates these across turns and compares against the known context window size (200K for Sonnet, 1M for Opus) to estimate fullness.
+
+ The `result` message also provides `cost_usd`, useful for per-agent budget tracking.
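A sketch of the accumulation described above. `ContextTracker` is a hypothetical name, and the window size is passed in rather than hardcoded. The sketch sums per-turn `input_tokens` as the text describes; whether a running sum or the most recent per-turn value better reflects real context size depends on how the SDK reports usage (for example, whether cached tokens are counted separately), which is worth verifying before relying on it.

```python
from dataclasses import dataclass


@dataclass
class ContextTracker:
    window_tokens: int       # context window size for this agent's model
    threshold: float = 0.8   # fraction of the window that triggers a shift-change
    used_tokens: int = 0     # running estimate, accumulated per turn

    def record_turn(self, input_tokens: int) -> None:
        """Add one turn's reported input tokens to the running estimate."""
        if input_tokens < 0:
            raise ValueError("input_tokens must be non-negative")
        self.used_tokens += input_tokens

    @property
    def fullness(self) -> float:
        """Estimated fraction of the context window in use."""
        return self.used_tokens / self.window_tokens

    def needs_shift_change(self) -> bool:
        return self.fullness >= self.threshold
```

One tracker per session keeps the supervisor's bookkeeping trivial: record after every response, check the flag, and start a handoff when it flips.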
### Context exhaustion and shift-changes
- When an agent's context crosses a threshold (monitored by the supervisor via SDK response metadata or token counts):
+ When an agent's estimated context usage crosses a threshold (e.g., 80% of window):
1. Supervisor tells the agent to produce a handoff summary.
- 2. Agent posts the summary to its task channel.
- 3. Agent posts a notice to `#standup-{slug}` that it's handing off.
+ 2. Agent writes the summary to the project wiki.
+ 3. Agent posts a notice to its task channel and `#standup-{slug}` that it's handing off.
4. Supervisor kills the session.
- 5. Supervisor spawns a replacement with a new name from the names file and the summary as initial context.
+ 5. Supervisor spawns a replacement with a new name from the names file and the wiki handoff page as initial context.
This is the "shift change" pattern — natural for an org metaphor. When `Ramona` leaves and `Jules` arrives, everyone in the channel can see the transition.
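The supervisor's side of the five steps might look like the sketch below. Everything here is hypothetical: the `Session` interface, the `spawn` callback, and the assumption that the outgoing agent replies with the name of the wiki page it wrote. It is meant to show the ordering, not the SDK calls.

```python
from typing import Callable, Iterable, Protocol


class Session(Protocol):
    name: str

    def send(self, prompt: str) -> str: ...
    def kill(self) -> None: ...


def pick_name(names: Iterable[str], in_use: set[str]) -> str:
    """First entry from the names file that no live agent is using."""
    for name in names:
        if name not in in_use:
            return name
    raise RuntimeError("names file exhausted")


def shift_change(
    old: Session,
    names: Iterable[str],
    in_use: set[str],
    spawn: Callable[[str, str], Session],
) -> Session:
    # Steps 1-3: the agent writes the handoff and announces it; the reply
    # is assumed to be the wiki page name.
    handoff_page = old.send(
        "Context is nearly full. Write a handoff summary to the project wiki, "
        "announce the handoff in your task channel and the standup channel, "
        "then reply with the wiki page name."
    ).strip()
    old.kill()  # step 4
    new_name = pick_name(names, in_use | {old.name})
    return spawn(new_name, handoff_page)  # step 5
```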
+ ## Git strategy
+
+ Multiple agents edit the same repos simultaneously, so each task is isolated on its own branch in its own git worktree.
+
+ ### Worktree lifecycle
+
+ The **supervisor** creates worktrees before spawning agents. Agents never touch `git worktree add/remove`.
+
+ ```
+ ~/projects/repo/ # bind-mounted, main branch (protected)
+ ~/projects/repo/.worktrees/
+ agent-task-42/ # worktree for task 42
+ agent-task-71/ # worktree for task 71
+ ```
+
+ Supervisor sequence per task:
+ 1. `git worktree add .worktrees/agent-task-{id} -b task/{id}`
+ 2. Spawn agent session with `cwd` set to the worktree
+ 3. Pass the branch name in the agent's system prompt
+
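Step 1 of the sequence can be sketched as follows. The helper names and the task-id validation are assumptions; the paths and branch naming follow the layout above. The `-b` option is placed before the path, which is equivalent to the form shown in step 1.

```python
import re
import subprocess
from pathlib import Path, PurePosixPath

_TASK_ID_RE = re.compile(r"[A-Za-z0-9_-]+")  # assumed shape of a task id


def worktree_add_command(task_id: str) -> list[str]:
    """Build the `git worktree add` invocation for one task."""
    if not _TASK_ID_RE.fullmatch(task_id):
        # Guards against ids like "../x" escaping the .worktrees directory.
        raise ValueError(f"invalid task id: {task_id!r}")
    path = PurePosixPath(".worktrees") / f"agent-task-{task_id}"
    return ["git", "worktree", "add", "-b", f"task/{task_id}", str(path)]


def create_worktree(repo: Path, task_id: str) -> Path:
    """Create the worktree inside `repo` and return its path (the agent's cwd)."""
    cmd = worktree_add_command(task_id)
    subprocess.run(cmd, cwd=repo, check=True)
    return repo / cmd[-1]
```

Passing arguments as a list avoids shell quoting issues, and `check=True` makes a failed `git` call surface as an exception before any agent is spawned.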
+ ### Branch and merge strategy
+
+ - **Main branch is protected.** Agents never get the main worktree path, only task worktree paths.
+ - **Agents commit freely** to their `task/{id}` branch and push when done.
+ - **Human PM merges.** No auto-merge for MVP. The supervisor or EM can rebase branches and annotate PRs, but the merge button stays with the human.
+ - **Conflict prevention:** the EM avoids assigning overlapping file sets to concurrent agents. When conflicts happen anyway, the supervisor flags them for human resolution.
+
+ ### Cleanup
+
+ - **Normal completion:** supervisor waits for push, then `git worktree remove`. Branch retained until merged.
+ - **Agent crash:** supervisor commits any uncommitted work with an `[INCOMPLETE]` prefix on the commit message, removes the worktree, and flags the task for reassignment.
+ - **On supervisor restart:** reconcile state against `git worktree list`, clean up orphans.
+
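The restart reconciliation could be sketched like this, assuming the supervisor reads the output of `git worktree list --porcelain` and knows which task ids it still considers active. `parse_worktree_list` and `find_orphans` are hypothetical names.

```python
from pathlib import PurePosixPath

_PREFIX = "agent-task-"


def parse_worktree_list(porcelain: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain` into one dict per worktree."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the current worktree record.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value  # flag lines such as "detached" get an empty value
    if current:
        entries.append(current)
    return entries


def find_orphans(entries: list[dict[str, str]], active_task_ids: set[str]) -> list[str]:
    """Paths of task worktrees whose task is no longer tracked as active."""
    orphans = []
    for entry in entries:
        name = PurePosixPath(entry.get("worktree", "")).name
        if name.startswith(_PREFIX) and name[len(_PREFIX):] not in active_task_ids:
            orphans.append(entry["worktree"])
    return orphans
```

The main worktree never matches the `agent-task-` prefix, so it is never reported as an orphan.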
## Architecture components
Three independent components, deployed separately for independent failure domains:
### 1. ergo IRCd
- - Runs in an LXC container on a Proxmox server.
+ - Runs in an LXC container on a Proxmox server (16GB RAM).
- Set-and-forget after initial configuration.
- - IRCv3 `chathistory` for channel persistence.
+ - IRCv3 `chathistory` enabled for convenience, but not load-bearing — sprint state lives on the wiki.
+ - `maxline` capability enabled for longer messages.
- No TLS needed for LAN traffic in MVP.
### 2. IRC MCP bridge (FastMCP)
@@ 161,6 226,7 @@
- ~200 lines of Python.
- Wraps the transport abstraction with IRC backend.
- Exposes the five tools above.
+ - Negotiates `maxline` with ergo; chunks messages transparently if needed.
- Connects to ergo over LAN.
- Runs in a Docker container on the desktop.
@@ 168,13 234,13 @@
- Python process using the Claude Code SDK.
- Spawns and manages agent sessions (EM, managers, workers).
- - Monitors context usage.
- - Handles shift-changes (summarize → kill → respawn).
+ - Watches IRC for `TASK:` prefixes to push-dispatch agents.
+ - Monitors context usage (accumulated input tokens per session).
+ - Handles shift-changes (summarize to wiki → kill → respawn with wiki context).
- Runs the Haiku idle-checker.
+ - Creates and cleans up git worktrees.
- Runs in a Docker container on the desktop, alongside the bridge.
- - Bind-mounts a project directory from the host for git repo access.
-
- The bridge and supervisor are orchestrated via docker-compose on the desktop machine (128GB RAM). They share a Docker network for inter-container communication and both reach ergo over the LAN.
+ - Bind-mounts the project directory from the host.
```
Desktop (128GB RAM) Proxmox (16GB RAM)
@@ 187,10 253,14 @@
│ bind: ~/projects │ └──────────────────┘
└──────────────────────────────┘
- │ IRC client (phone/terminal)
+ │ IRC client (terminal) or Claude.ai (PM delegate)
PM
```
+ Agents receive two MCP server configs:
+ 1. **IRC bridge** — communication with other agents and the PM
+ 2. **robot.wtf wiki** — persistent shared state (sprint status, handoff summaries, task specs, decisions)
+
## Relationship to existing Agent_Workflow
What carries forward unchanged:
@@ 205,36 275,45 @@
- Coordination moves from in-process `Task`/`run_in_background` to IRC channel messages via MCP
- The orchestrator role splits: strategic coordination stays with the EM, human interaction moves to the channel
- - Question relay is replaced by direct channel participation — the PM is in the room
- - Task state lives in channel history, not in the orchestrator's context
+ - Question relay is replaced by direct channel participation — the PM is in the room (or their Claude.ai delegate is)
+ - Task state lives on the wiki, conversation happens on IRC
- Claude Code CLI replaced by Claude Code SDK for programmatic lifecycle management
## MVP scope
- 1. **ergo IRCd** in LXC on Proxmox. Single binary, default config, verify `chathistory` works.
- 2. **IRC MCP bridge** (~200 lines Python). FastMCP wrapping the transport abstraction. Five tools.
- 3. **Agent supervisor** — Python, Claude Code SDK, Haiku idle-checker, shift-change logic.
+ 1. **ergo IRCd** in LXC on Proxmox. Single binary, default config, enable `chathistory` and `maxline`.
+ 2. **IRC MCP bridge** (~200 lines Python). FastMCP wrapping the transport abstraction. Five tools. Docker container.
+ 3. **Agent supervisor** — Python, Claude Code SDK, Haiku idle-checker, shift-change logic, worktree management. Docker container.
4. **docker-compose** for bridge + supervisor on the desktop, bind-mounting the project directory.
- 5. **One EM process** — Opus, system-prompted as the engineering manager.
- 6. **One manager process** — spawned when the EM posts a task.
- 7. **PM** — connected to ergo from phone and/or terminal.
- 8. **One end-to-end task** — EM assigns, manager runs the proceed workflow, PM observes from IRC.
+ 5. **robot.wtf project wiki** for shared state (already exists).
+ 6. **One EM process** — Opus, system-prompted as the engineering manager.
+ 7. **One manager process** — spawned when the EM posts a task.
+ 8. **PM** — connected to ergo from terminal (weechat/irssi).
+ 9. **One end-to-end task** — EM assigns, manager runs the proceed workflow, PM observes from IRC, state persisted to wiki.
+
+ Not in MVP: multiple parallel workers, TLS, remote MCP auth (for Claude.ai PM delegate), multi-project namespacing, Matrix/Zulip backends, polling backoff.
+
+ ## Future: PM delegate via Claude.ai
+
+ Once the IRC MCP bridge is exposed as a remote MCP server (with auth), a Claude.ai session can connect to it and act as the PM's delegate. The PM talks to Claude.ai from their phone or browser; Claude.ai participates in IRC channels on their behalf. This replaces the need for a mobile IRC client entirely.
- Not in MVP: multiple parallel workers, TLS, auth, multi-project namespacing, Matrix/Zulip backends.
+ This requires the bridge to be internet-accessible with authentication — not in MVP scope, but the architecture supports it naturally since the bridge already multiplexes by `sender`.
## Open questions
- - **Polling cadence.** How often should the supervisor heartbeat idle agents? Too fast burns tokens, too slow means tasks sit. Probably start at 30s and tune.
- - **IRC client for phone.** The mobile IRC client landscape is thin. Worth testing a few before committing. If it's painful, that's a signal to look at Matrix or Zulip sooner.
- - **Message length limits.** IRC has per-message length limits (~512 bytes traditional, longer with IRCv3). The MCP bridge may need to handle chunking transparently. Check ergo's limits.
- - **Channel persistence depth.** How much `chathistory` should ergo retain? Enough for the EM to reconstruct sprint state after a restart.
- - **Git branch strategy with multiple agents.** Multiple agents editing the same repo need branch isolation. Worktree-per-task within the bind-mounted project directory is the likely answer.
- - **SDK session management details.** How exactly does the Claude Code SDK expose context usage? Need to verify the API surface for monitoring.
+ - **ergo `maxline` default.** Verify the actual default and maximum configurable value. Agent plan summaries could be multi-KB.
+ - **Wiki auth for agents.** Does the robot.wtf MCP need auth configuration for agents running in Docker? Or is it open on LAN?
+ - **SDK `query()` API stability.** The session resume API (`resume` option) needs verification against the current SDK version. Field names may have changed.
## Resolved questions
- - **Structured vs. conversational task format.** Conversational wins. The whole point of using IRC is human observability. JSON task objects would make channels unreadable. The only structured convention is a `TASK:` prefix on assignments so the supervisor can pattern-match.
- - **CLI vs. SDK.** SDK. The CLI's terminal processing is overhead for daemon use. The SDK gives programmatic lifecycle control needed for shift-changes.
- - **Single process vs. separate bridge and supervisor.** Separate. Independent failure domains — restart the bridge without killing agents, restart the supervisor without dropping IRC.
- - **Launch-per-task vs. long-running agents.** Long-running. Preserves context across related tasks. The supervisor handles lifecycle (polling, idle detection, shift-changes).
- - **Deployment topology.** ergo in LXC on Proxmox (set-and-forget), bridge + supervisor in docker-compose on desktop (128GB RAM), bind-mounted project directory for repo access.
+ - **Structured vs. conversational task format.** Conversational wins. The whole point of using IRC is human observability. The only structured convention is a `TASK:` prefix on assignments.
+ - **CLI vs. SDK.** SDK. The CLI's terminal processing is overhead for daemon use.
+ - **Single process vs. separate bridge and supervisor.** Separate. Independent failure domains.
+ - **Launch-per-task vs. long-running agents.** Long-running. Preserves context across related tasks.
+ - **Deployment topology.** ergo in LXC on Proxmox, bridge + supervisor in docker-compose on desktop (128GB RAM).
+ - **Polling strategy.** Two-layer: supervisor push for immediate dispatch, flat 30s polling as fallback/health check. Backoff deferred.
+ - **Git branch strategy.** Supervisor-managed worktrees, branch-per-task, human merges, protected main.
+ - **Context monitoring.** Supervisor accumulates `usage.input_tokens` per turn from SDK responses. No built-in SDK metric.
+ - **Durable state.** robot.wtf project wiki via MCP. IRC is ephemeral conversation, wiki is the record. EM recovery reads wiki, not IRC history.
+ - **Mobile IRC client.** Deferred. PM uses terminal for MVP. Future path: Claude.ai as PM delegate via remote MCP, bypassing the mobile client problem entirely.