2026-03-16 17:37:43 Claude (MCP):
[mcp] Update Agent IRC Architecture with refined design decisions from brainstorming session
Design/Agent_IRC_Architecture.md ..
@@ -1,7 +1,7 @@
 ---
 category: reference
 tags: [agents, irc, mcp, architecture]
-last_updated: 2026-03-15
+last_updated: 2026-03-16
 confidence: medium
 ---
@@ -9,11 +9,9 @@
 An architecture for multi-agent coordination over IRC, where a human (the PM) and AI agents share a message bus. The goal is to externalize the coordination layer that currently lives inside Claude Code's context window, so that agents preserve context for their actual work, the human can participate from a phone or terminal, and the whole system is observable by just reading the chat.

 See also: [[Design/Agent_Workflow]] (current in-process agent hierarchy, which this supersedes for coordination plumbing but not for role definitions or the proceed workflow).

 ## The problem

-The current agent workflow (documented in [[Design/Agent_Workflow]]) runs everything inside a single Claude Code session tree. The orchestrator dispatches managers via `Task`, managers dispatch workers, questions relay back up the chain. This works, but:
+The current agent workflow runs everything inside a single Claude Code session tree. The orchestrator dispatches managers via `Task`, managers dispatch workers, questions relay back up the chain. This works, but:
 - The orchestrator's context fills up relaying messages it doesn't need to reason about.
 - The human is behind a three-hop relay for every question (worker → manager → orchestrator → human → back). You have to be at the terminal.
@@ -26,15 +24,13 @@
 ## Org structure

 The hierarchy from [[Design/Agent_Workflow]] maps to IRC roles:

 **PM (you)** — sets priorities, answers product questions, makes scope decisions. Hangs out in `#project-{slug}`. Doesn't manage the sprint — that's the EM's job. Can peek into any channel but mostly watches the project channel for decisions that need input.

-**EM (coordinator)** — a Claude Code process (Opus) that runs the team. Breaks down requirements into tasks, assigns work, tracks progress, makes implementation decisions, surfaces product questions to the PM. Lives in `#project-{slug}` and `#standup-{slug}`. Shields the PM from implementation noise.
+**EM (coordinator)** — a long-running Claude Code SDK session (Opus) that runs the team. Breaks down requirements into tasks, assigns work, tracks progress, makes implementation decisions, surfaces product questions to the PM. Lives in `#project-{slug}` and `#standup-{slug}`. Shields the PM from implementation noise.

-**Managers** — Claude Code processes (Opus) that own individual tasks. Follow the proceed workflow: plan, implement, test, review, fix, document. Each manager gets a `#work-{task-id}` channel for its workers. Reports status and completions to `#standup-{slug}`.
+**Managers** — Claude Code SDK sessions (Opus) that own individual tasks. Follow the proceed workflow: plan, implement, test, review, fix, document. Each manager gets a `#work-{task-id}` channel for its workers. Reports status and completions to `#standup-{slug}`.

-**Workers** — Claude Code processes (Sonnet/Haiku) dispatched by managers for specific jobs: implementation, testing, review, documentation. Operate in `#work-{task-id}` channels. Disposable — when context fills up, they summarize and exit.
+**Workers** — Claude Code SDK sessions (Sonnet/Haiku) dispatched by managers for specific jobs: implementation, testing, review, documentation. Operate in `#work-{task-id}` channels. Disposable — when context fills up, they summarize and exit.

 The EM decides what needs the PM's input vs. what it can handle itself. Rule of thumb: anything that changes scope, user-facing behavior, or architecture goes to `#project-{slug}`. Anything that's purely implementation strategy, the EM decides. The EM should also push back on the PM when something is technically inadvisable, just like a real EM would.
@@ -47,20 +43,16 @@
 - **#work-{task-id}** — manager + workers for a specific task. Implementation discussion, test results, review feedback. Noisy and disposable. Created when a task starts, abandoned when it completes.
 - **#errors** — dead-letter channel. Any agent that hits an unrecoverable failure posts here. Monitored by the EM and optionally by the PM.

 Nick conventions are described in the Agent Naming section below.

 ## Agent naming

 Agents get human names, not mechanical identifiers. A conversation between `schuyler`, `Harper`, and `Dinesh` is immediately readable. A conversation between `em-robot`, `mgr-e2-cdn`, and `worker-3` is a SCADA dashboard.

 Names also help with the shift-change problem. When `Ramona` hits context exhaustion and hands off to `Jules`, that's a legible event — new person joined, picked up the thread. If `worker-3` gets replaced by another `worker-3`, it's invisible, and that invisibility is exactly the kind of thing that causes confusion.

-A names file (`names.txt`, one per line) lives in the repo. The launcher daemon pops a name off the list when spawning a process and passes it as the IRC nick. The name also goes into the agent's system prompt so it knows who it is. Names are not reused within a session — once `Ramona` exits, that name is retired until the list resets.
+A names file (`names.txt`, one per line) lives in the repo. The supervisor pops a name off the list when spawning a process and passes it as the IRC nick. The name also goes into the agent's system prompt so it knows who it is. Names are not reused within a session — once `Ramona` exits, that name is retired until the list resets.

 The EM gets a persistent name that doesn't rotate — it's the one constant in the channel. Think of it as the team lead who's always there. Managers and workers get fresh names each time they're spawned.

 The names file is just a text file. It can be themed — Greek gods, jazz musicians, muppets, fictional detectives, whatever's fun. The only constraint is that names should be short (IRC nicks have length limits), distinct from each other, and not confusable with IRC commands.

 ## Transport abstraction

 IRC is the first backend, but the architecture shouldn't be welded to it. A thin transport interface keeps options open:
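The interface itself sits outside this hunk, so here is only a sketch of what such a thin transport layer could look like — Python, with hypothetical method names (`send`, `read`, `join`) and a toy in-memory backend standing in for the real IRC one:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Message:
    channel: str
    sender: str
    text: str


class Transport(Protocol):
    """Minimal message-bus surface. IRC is one backend; Zulip or Matrix could be others."""

    def join(self, channel: str, nick: str) -> None: ...
    def send(self, channel: str, sender: str, text: str) -> None: ...
    def read(self, channel: str, since_index: int = 0) -> list[Message]: ...


@dataclass
class InMemoryTransport:
    """Toy backend for tests — the same shape an IRC-backed implementation would fill in."""

    channels: dict[str, list[Message]] = field(default_factory=dict)

    def join(self, channel: str, nick: str) -> None:
        self.channels.setdefault(channel, [])

    def send(self, channel: str, sender: str, text: str) -> None:
        self.channels.setdefault(channel, []).append(Message(channel, sender, text))

    def read(self, channel: str, since_index: int = 0) -> list[Message]:
        return self.channels.get(channel, [])[since_index:]
```

The `since_index` cursor is the assumption doing the work here: it lets each agent track what it has already seen without the bridge keeping per-agent state.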
@@ -83,7 +75,7 @@
 The IRC implementation wraps an async IRC client library (`bottom` or `irc`). A Zulip or Matrix implementation could be swapped in later — Zulip's topic-per-stream model maps particularly well (stream = project, topic = task).

-## MCP server
+## MCP bridge

 A FastMCP server wraps the transport and exposes tools to agents. This is the only interface agents use — they never touch IRC directly.
@@ -93,17 +85,7 @@
 Agents communicate in natural language. The EM assigns a task by saying so in plain English. The manager reports a plan the same way. The PM can read `#standup-{slug}` on their phone and immediately follow the state of the sprint without parsing anything.

-This works because Claude Code agents are good at natural language — that's the whole product. The EM doesn't need `{"task_id": "p2-5", "type": "assignment"}` to assign work. It says:
-
-> @Ramona New task: implement the CDN fragment assembly Lambda. Spec is at Tasks/E-2_CDN_Read_Path on the dev wiki. Branch from main, target sub-300ms cold start. Report back here when you've got a plan.
-
-And the manager responds:
-
-> Plan's ready. Going with the S3 fragment approach — sidebar and content as separate objects, stitched at request time. Groucho flagged a cache invalidation edge case with nested WikiLinks, I'm handling it by invalidating the sidebar fragment on any page write. Starting implementation now.
-
-The PM can read that, understand it, and jump in if needed. That's the experience.
-
-The only concession to machine-parseability is **lightweight conventions** for the launcher daemon — the EM prefixes task assignments with `TASK:` so the launcher can pattern-match without NLP. Everything else is natural language.
+The only concession to machine-parseability is **lightweight conventions** for the supervisor — the EM prefixes task assignments with `TASK:` so the supervisor can pattern-match without NLP. Everything else is natural language.
### Tools
@@ -115,7 +97,7 @@
 | `list_channels()` | List active channels. |
 | `get_members(channel)` | List who's in a channel. |

-That's it. No `post_task`, `claim_task`, `poll_for_task`. Task assignment, claiming, and completion are conversational acts, not structured API calls. The EM says "do this," the manager says "on it," the manager says "done." The launcher daemon watches for `TASK:` prefixes to know when to spawn a process.
+That's it. No `post_task`, `claim_task`, `poll_for_task`. Task assignment, claiming, and completion are conversational acts, not structured API calls. The EM says "do this," the manager says "on it," the manager says "done."

 Task state is tracked by the EM reading channel history and reasoning about it, not by a state machine. This is less reliable than a database but vastly more observable and simpler to build. If it breaks, you can see exactly where it broke by reading the channel.
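To make "task state lives in channel history" concrete, here is a toy reconstruction pass over a message log. The `TASK: @nick` and trailing "done." conventions are hypothetical, and in the actual design the EM does this by reading and reasoning, not by running code — this is only to show the information is all there in the transcript:

```python
def reconstruct_open_tasks(history: list[str]) -> set[str]:
    """Scan channel history for TASK: assignments and completion reports."""
    open_tasks: set[str] = set()
    for line in history:
        if line.startswith("TASK: @"):
            # "TASK: @Ramona build the thing" -> assignee "Ramona"
            open_tasks.add(line.split()[1].lstrip("@"))
        elif line.endswith("done.") and line.split(":")[0] in open_tasks:
            # Hypothetical completion convention: "Ramona: ... done."
            open_tasks.discard(line.split(":")[0])
    return open_tasks
```

If the reconstruction disagrees with reality, the debugging tool is the same channel a human would read.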
@@ -123,7 +105,7 @@
 ```
 TRANSPORT_TYPE=irc
-IRC_SERVER=localhost
+IRC_SERVER=<proxmox-host-ip>
 IRC_PORT=6667
 IRC_NICK=mcp-bridge
 MCP_PORT=8090
@@ -131,36 +113,83 @@
 The MCP server maintains a single IRC connection and multiplexes tool calls from multiple agents. Agents identify themselves via a `sender` parameter so messages get the right nick attribution.

-## Agent lifecycle
+## Agent lifecycle: long-running with shift-changes

+Agents are long-running Claude Code SDK sessions. They persist across tasks, preserving context — a worker that just finished refactoring the auth module still has that code in context when the next auth-related task comes in.

+### Why the SDK, not the CLI

+The Claude Code CLI is designed for a human at a terminal — prompt handling, display rendering, keybindings are all overhead when the consumer is a daemon. The Claude Code SDK gives programmatic conversation management: send messages, get responses, and critically — start a new conversation with a handoff summary when context gets thin. That's the "compaction" equivalent: not clearing context, but gracefully retiring the agent and spawning a fresh one with the summary.
-### Option A: Launch-per-task (MVP)
-
-The simplest model. No long-running agents, no polling loops.
+### Polling

+The supervisor injects periodic "check your channels" messages into each agent's SDK session. This is the polling heartbeat. Agents respond by reading their IRC channels via the MCP bridge and acting on anything new, or reporting idle.
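The heartbeat injection can be pictured as one supervisor tick over its session table. A sketch — `AgentSession` and its `send` are stand-ins for whatever handle the Claude Code SDK actually gives the supervisor:

```python
from dataclasses import dataclass, field

HEARTBEAT = "Check your channels for new messages; act on anything new or report idle."


@dataclass
class AgentSession:
    """Stand-in for an SDK session handle; inbox models injected prompts."""

    nick: str
    inbox: list[str] = field(default_factory=list)

    def send(self, prompt: str) -> None:
        self.inbox.append(prompt)


def heartbeat_tick(sessions: list[AgentSession], idle: set[str]) -> int:
    """One supervisor tick: nudge only agents currently believed idle.

    Returns how many sessions were nudged, for logging/tuning the cadence.
    """
    nudged = 0
    for s in sessions:
        if s.nick in idle:
            s.send(HEARTBEAT)
            nudged += 1
    return nudged
```

Busy agents are left alone so a heartbeat never interrupts mid-task reasoning; the cadence of the outer loop (30s is the starting guess elsewhere in this doc) is the only tunable.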
-1. The EM is a long-running Claude Code process. It polls `#standup-{slug}` for completed work and decides what to assign next.
-2. When the EM assigns a task, a **launcher daemon** (a shell script or Python process watching `#standup-{slug}`) sees the task post and spawns a new Claude Code process with:
-   - The MCP server connection
-   - A system prompt including the task spec, the proceed workflow, and channel assignments
-   - The task channel name
-3. The manager process runs the proceed workflow (dispatching its own workers as sub-tasks via the same mechanism, or inline if simple enough).
-4. When done, the manager posts results to `#standup-{slug}` and exits.
-5. The launcher daemon cleans up.
-
-This avoids the "can Claude Code reliably poll in a loop?" question entirely. Each agent is born with a task and dies when it's done.
-
-### Option B: Long-running agents (future)
+### Idle detection

+A Haiku-class classifier determines whether an agent is idle. No conversation state needed — just a single SDK `create_message` call:

+> "Here's the last 5 minutes of this agent's IRC activity. Is it idle? yes/no"

+Pennies per evaluation. This keeps the supervisor dumb — it doesn't need to understand task semantics, just whether to send a heartbeat.
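As a sketch of that classifier: the real version would be a single Anthropic `messages.create` call against a Haiku-class model; here the model call is stubbed with a deterministic stand-in so only the supervisor-side plumbing is shown:

```python
def build_idle_prompt(transcript: str) -> str:
    """Single-shot classifier prompt — no conversation state carried over."""
    return (
        "Here's the last 5 minutes of this agent's IRC activity.\n"
        f"{transcript}\n"
        "Is it idle? Answer yes or no."
    )


def is_idle(transcript: str, classify=None) -> bool:
    """`classify` would wrap one Haiku `messages.create` call; stubbed here.

    The stub is a deliberately dumb heuristic (no activity reads as idle),
    just to make the yes/no plumbing testable without an API key.
    """
    if classify is None:
        classify = lambda prompt: "yes" if "IRC activity.\n\n" in prompt else "no"
    return classify(build_idle_prompt(transcript)).strip().lower().startswith("yes")
```

The supervisor only consumes the yes/no; any hedging or reasoning the model produces beyond the first word is ignored, which keeps the contract robust.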
-Workers poll the MCP at intervals when idle. This preserves context across tasks — a worker that just finished refactoring the auth module still has that code in context when the next auth-related task comes in. Worth exploring after the MVP, but the polling reliability question needs empirical testing first.
-
-### Context exhaustion
-
-Any agent that detects its context is getting full should:
-
-1. Post a handoff summary to its task channel (what's done, what's remaining, current state of the code).
-2. Post a notice to `#standup-{slug}` that it's handing off.
-3. Exit.
-
-The EM or launcher daemon sees the handoff and spawns a replacement with the summary as initial context. This is the "shift change" pattern — natural for an org metaphor.
+### Context exhaustion and shift-changes

+When an agent's context crosses a threshold (monitored by the supervisor via SDK response metadata or token counts):

+1. Supervisor tells the agent to produce a handoff summary.
+2. Agent posts the summary to its task channel.
+3. Agent posts a notice to `#standup-{slug}` that it's handing off.
+4. Supervisor kills the session.
+5. Supervisor spawns a replacement with a new name from the names file and the summary as initial context.

+This is the "shift change" pattern — natural for an org metaphor. When `Ramona` leaves and `Jules` arrives, everyone in the channel can see the transition.
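The five shift-change steps above can be sketched as one supervisor routine. `Session`, `ask`, and `post` are stand-ins for the SDK session and MCP bridge calls, and the token threshold is an assumed tunable:

```python
from dataclasses import dataclass, field

CONTEXT_LIMIT = 150_000  # assumed token threshold; tune against real SDK metadata


@dataclass
class Session:
    """Stand-in for a Claude Code SDK session with an MCP posting side-channel."""

    nick: str
    tokens_used: int = 0
    posts: list[tuple[str, str]] = field(default_factory=list)

    def ask(self, prompt: str) -> str:
        # Stand-in for one SDK turn; real code would call the SDK here.
        return f"Handoff from {self.nick}: what's done, what's remaining, code state."

    def post(self, channel: str, text: str) -> None:
        self.posts.append((channel, text))


def shift_change(session: Session, task_channel: str, standup: str, names: list[str]) -> Session:
    """Retire a context-exhausted session and spawn a named replacement."""
    summary = session.ask("Write a handoff summary: done, remaining, code state.")  # step 1
    session.post(task_channel, summary)                                             # step 2
    session.post(standup, f"{session.nick} handing off.")                           # step 3
    # Step 4 (kill the old session) is dropping the reference here; step 5:
    replacement = Session(nick=names.pop(0))
    replacement.post(task_channel, f"{replacement.nick} picking up: {summary}")
    return replacement
```

Because both posts go through the channels, the handoff is legible to everyone watching — the code is just the org metaphor made mechanical.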
+## Architecture components

+Three independent components, deployed separately for independent failure domains:

+### 1. ergo IRCd

+- Runs in an LXC container on a Proxmox server.
+- Set-and-forget after initial configuration.
+- IRCv3 `chathistory` for channel persistence.
+- No TLS needed for LAN traffic in MVP.

+### 2. IRC MCP bridge (FastMCP)

+- ~200 lines of Python.
+- Wraps the transport abstraction with IRC backend.
+- Exposes the five tools above.
+- Connects to ergo over LAN.
+- Runs in a Docker container on the desktop.

+### 3. Agent supervisor

+- Python process using the Claude Code SDK.
+- Spawns and manages agent sessions (EM, managers, workers).
+- Runs in a Docker container on the desktop, alongside the bridge.
+- Bind-mounts a project directory from the host for git repo access.

+The bridge and supervisor are orchestrated via docker-compose on the desktop machine (128GB RAM). They share a Docker network for inter-container communication and both reach ergo over the LAN.
 - Role definitions (manager, implementer, test runner, Groucho/Chico/Zeppo/Fixer/Documenter)
 - The proceed workflow (plan → implement → test → review → fix → document)
-- Model assignments (Opus for managers, Sonnet for workers, Haiku for documentation)
+- Model assignments (Opus for EM and managers, Sonnet for workers, Haiku for idle detection and documentation)
 - Review and fix loop limits (3 attempts before escalating)
 - Worker dispatch guidance (what context to give each worker type)
@@ -178,30 +207,34 @@
 - The orchestrator role splits: strategic coordination stays with the EM, human interaction moves to the channel
 - Question relay is replaced by direct channel participation — the PM is in the room
 - Task state lives in channel history, not in the orchestrator's context
+- Claude Code CLI replaced by Claude Code SDK for programmatic lifecycle management

 ## MVP scope

 The minimum viable experiment:

-1. **ergo** IRC server on the Proxmox box. Single binary, default config, no TLS needed for localhost.
-2. **FastMCP bridge** (~150-200 lines Python). Implements the transport abstraction with IRC backend. Exposes the five tools above.
-3. **One EM process** — Claude Code with Opus, connected to the MCP, system-prompted as the engineering manager. Given one small task to decompose and assign.
-4. **One manager process** — launched by a simple shell script when the EM posts a task. Runs the proceed workflow, reports back.
-5. **You** — connected to ergo from your phone (Palaver, goguma, or similar) and/or terminal (irssi, weechat).
-6. **One end-to-end task** — the EM posts a `TASK:` assignment to standup, the launcher spawns a manager, the manager does the work (discussing it in natural language in its work channel), posts completion to standup, the EM summarizes to `#project-{slug}`, you see it on your phone and can follow the whole conversation.
+1. **ergo IRCd** in LXC on Proxmox. Single binary, default config, verify `chathistory` works.
+2. **IRC MCP bridge** (~200 lines Python). FastMCP wrapping the transport abstraction. Five tools.
+4. **docker-compose** for bridge + supervisor on the desktop, bind-mounting the project directory.
+5. **One EM process** — Opus, system-prompted as the engineering manager.
+6. **One manager process** — spawned when the EM posts a task.
+7. **PM** — connected to ergo from phone and/or terminal.
+8. **One end-to-end task** — EM assigns, manager runs the proceed workflow, PM observes from IRC.

-Not in MVP: multiple parallel workers, Option B polling, context exhaustion handoffs, TLS, auth, persistence beyond IRC server logs.
+Not in MVP: multiple parallel workers, TLS, auth, multi-project namespacing, Matrix/Zulip backends.
 ## Open questions

-- **Polling reliability.** Can Claude Code maintain a reliable poll loop, or does it drift/hallucinate/get creative? This determines whether Option B is viable or if launch-per-task is the long-term answer too.
-- **ergo vs. other IRCd.** ergo is the obvious choice (Go, single binary, modern IRCv3) but hasn't been evaluated yet. Alternatives: ngircd (C, lightweight), unrealircd (heavy, overkill).
-- **Launcher daemon design.** How does it watch for task posts? Connect to IRC directly and watch for `TASK:` prefixes? Poll the MCP's `read_messages`? This is a small piece of glue code but it's the thing that actually makes agents appear.
-- **Git branch strategy with multiple agents.** Multiple agents editing the same repo need branch isolation. The current Agent_Workflow uses `isolation: "worktree"` — does that translate when agents are independent processes?
-- **Message length limits.** IRC has per-message length limits (~512 bytes traditional, longer with IRCv3). Agent messages — especially plan summaries and status reports — could easily exceed this. The MCP bridge may need to handle chunking transparently, or agents need prompting to keep messages concise. ergo's limits should be checked.
-- **Channel persistence.** IRC channels are ephemeral by default. If the EM restarts and needs to reconstruct sprint state, it needs channel history. ergo supports chat history via IRCv3 `chathistory` — verify this works and decide how much history to retain.
+- **Polling cadence.** How often should the supervisor heartbeat idle agents? Too fast burns tokens, too slow means tasks sit. Probably start at 30s and tune.
 - **IRC client for phone.** The mobile IRC client landscape is thin. Worth testing a few before committing. If it's painful, that's a signal to look at Matrix or Zulip sooner.
+- **Message length limits.** IRC has per-message length limits (~512 bytes traditional, longer with IRCv3). The MCP bridge may need to handle chunking transparently. Check ergo's limits.
+- **Channel persistence depth.** How much `chathistory` should ergo retain? Enough for the EM to reconstruct sprint state after a restart.
+- **Git branch strategy with multiple agents.** Multiple agents editing the same repo need branch isolation. Worktree-per-task within the bind-mounted project directory is the likely answer.
+- **SDK session management details.** How exactly does the Claude Code SDK expose context usage? Need to verify the API surface for monitoring.
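If the bridge does end up chunking transparently, the core is small. A sketch that splits on word boundaries under an assumed 400-byte payload budget (the real budget depends on ergo's configured limits and the per-line `PRIVMSG` protocol overhead):

```python
def chunk_message(text: str, limit: int = 400) -> list[str]:
    """Split a long agent message into payloads that each fit under `limit` bytes.

    Splits on word boundaries so no chunk breaks mid-word; a single word longer
    than the limit is passed through unsplit (the server would truncate it).
    """
    words, chunks, current = text.split(), [], ""
    for w in words:
        candidate = f"{current} {w}".strip()
        if len(candidate.encode("utf-8")) > limit and current:
            chunks.append(current)
            current = w
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Measuring in encoded bytes rather than characters matters because the limit IRC enforces is on the wire, and agent prose can contain multi-byte characters.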
 ## Resolved questions

-- **Structured vs. conversational task format.** Conversational wins. The whole point of using IRC is human observability. JSON task objects would make channels unreadable. Agents use natural language; the only structured convention is a `TASK:` prefix on assignments so the launcher daemon can pattern-match. See "Design principle" section above.
+- **Structured vs. conversational task format.** Conversational wins. The whole point of using IRC is human observability. JSON task objects would make channels unreadable. The only structured convention is a `TASK:` prefix on assignments so the supervisor can pattern-match.
+- **CLI vs. SDK.** SDK. The CLI's terminal processing is overhead for daemon use. The SDK gives programmatic lifecycle control needed for shift-changes.
+- **Single process vs. separate bridge and supervisor.** Separate. Independent failure domains — restart the bridge without killing agents, restart the supervisor without dropping IRC.
+- **Launch-per-task vs. long-running agents.** Long-running. Preserves context across related tasks. The supervisor handles lifecycle (polling, idle detection, shift-changes).
+- **Deployment topology.** ergo in LXC on Proxmox (set-and-forget), bridge + supervisor in docker-compose on desktop (128GB RAM), bind-mounted project directory for repo access.