Commit d3e9e7
2026-03-13 01:42:46 Claude (Dev): [mcp] Port agent conventions spec to wiki| /dev/null .. specs/agent conventions.md | |
| @@ 0,0 1,301 @@ | |
| + | # Wikibot.io Agent Conventions & Patterns |
| + | |
| + | This document is read by Opus phase managers at session start. It establishes the protocols, patterns, and rules that govern how the agent team builds wikibot.io. |
| + | |
| + | ## References |
| + | |
| + | - **PRD:** `Docs/PRD` — source of truth for domain context, architecture, and requirements |
| + | - **Task Graph:** `Specs/Task Graph` — execution spine with work units, dependencies, and acceptance criteria |
| + | - **Phase Gates:** `Specs/Phase Gates` — exit criteria and human validation procedures |
| + | - **Wiki Usage Guide exemplar:** `Meta/Wiki Usage Guide` on the Third Gulf War wiki (read via MCP) — reference for the bootstrap template (P2-6) |
| + | |
| + | --- |
| + | |
| + | ## Agent Hierarchy |
| + | |
| + | ### Three roles |
| + | |
| + | | Role | Model | Dispatched by | Purpose | |
| + | |------|-------|---------------|---------| |
| + | | **Phase Manager** | Opus | Human | Owns one phase. Reads this document, the task graph, and relevant PRD sections. Dispatches workers, reviews results, tracks progress in wiki, reports back. | |
| + | | **Worker** | Sonnet | Phase Manager | Implements one task. Receives a self-contained prompt — no PRD, no task graph. Uses worktree isolation. | |
| + | | **Reviewer** | Opus | Phase Manager | Reviews completed work against acceptance criteria and conventions. Returns approve/reject with specific feedback. | |
| + | |
| + | ### The Rule of Two |
| + | |
| + | No agent-produced artifact is accepted until a second agent has reviewed it and found no blocking concerns. This applies to all code, tests, and IaC. |
| + | |
| + | - Workers produce → Reviewers review |
| + | - Manager wiki status updates are reviewed by the human at phase gates |
| + | - Upstream PRs (to otterwiki, plugins) must be reviewed and merged before downstream work begins |
| + | |
| + | ### Manager lifecycle |
| + | |
| + | A phase manager follows this loop each session: |
| + | |
| + | 1. **Resume state.** Read `Dev/Phase N Status` from the wiki (via MCP). If this is the first session for the phase, create the status page. |
| + | 2. **Identify work.** Read the task graph. Find the next parallelism group — tasks whose dependencies are all complete. |
| + | 3. **Dispatch workers.** For each task in the group, compose a self-contained prompt (see template below) and dispatch a Sonnet worker. Use `isolation: "worktree"` for branch-level isolation. Run independent workers in parallel using `run_in_background: true`. |
| + | 4. **Review results.** As each worker completes, dispatch an Opus reviewer with the worker's output, the task's acceptance criteria, and the conventions from this document. |
| + | 5. **Handle rejections.** If the reviewer rejects, dispatch a fix worker (resumed or new) with the reviewer's specific feedback. Re-review. Maximum 3 fix cycles per task — if still failing, log the blocker in the wiki and move on to other tasks. |
| + | 6. **Merge.** Once a task passes review, merge its worktree branch into the target branch. |
| + | 7. **Update state.** Write a wiki note summarizing what was completed, what decisions were made, and any new tasks discovered. Update `Dev/Phase N Status`. |
| + | 8. **Repeat** until all tasks in the phase are complete. |
| + | 9. **Phase summary.** Write `Dev/Phase N Summary` with full results. Notify the human for gate review. |
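Steps 2, 4, and 5 amount to a small scheduling loop. The sketch below is one way to express it, assuming the task graph can be loaded into records with explicit dependency sets. `Task`, `review`, and `fix` are hypothetical stand-ins (the real manager dispatches agents rather than calling functions), and the task IDs are illustrative. A task that ends up blocked never counts as complete, so its dependents stay out of later groups while the manager moves on to other work.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

MAX_FIX_CYCLES = 3  # step 5: at most three fix cycles per task


@dataclass
class Task:
    task_id: str
    depends_on: set[str] = field(default_factory=set)
    status: str = "pending"  # "pending", "complete", or "blocked"


def next_parallelism_group(tasks: dict[str, Task]) -> list[str]:
    """Step 2: pending tasks whose dependencies are all complete."""
    done = {t.task_id for t in tasks.values() if t.status == "complete"}
    return sorted(
        t.task_id
        for t in tasks.values()
        if t.status == "pending" and t.depends_on <= done
    )


def review_until_done(
    task: Task,
    review: Callable[[Task], tuple[bool, str]],  # hypothetical: dispatches an Opus reviewer
    fix: Callable[[Task, str], None],  # hypothetical: dispatches a fix worker with feedback
) -> str:
    """Steps 4-5: review, then fix and re-review up to MAX_FIX_CYCLES times."""
    approved, feedback = review(task)
    cycles = 0
    while not approved and cycles < MAX_FIX_CYCLES:
        fix(task, feedback)
        cycles += 1
        approved, feedback = review(task)
    # Approved tasks go on to merge (step 6); others are logged as blockers.
    task.status = "complete" if approved else "blocked"
    return task.status


tasks = {
    "P0-1": Task("P0-1", status="complete"),
    "P0-2": Task("P0-2", {"P0-1"}),
    "P0-3": Task("P0-3", {"P0-1"}),
    "P0-4": Task("P0-4", {"P0-2", "P0-3"}),
}
print(next_parallelism_group(tasks))  # → ['P0-2', 'P0-3']
```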
| + | |
| + | ### Worker prompt template |
| + | |
| + | Managers compose worker prompts using this structure: |
| + | |
| + | ``` |
| + | ## Task: {task ID and title} |
| + | |
| + | ## Context |
| + | {What exists. What you're building on. Relevant architecture decisions. |
| + | Distill from the PRD — the worker does not read the PRD.} |
| + | |
| + | ## Requirements |
| + | {Specific deliverables. Be precise about interfaces, file locations, |
| + | function signatures, config formats. The worker cannot ask questions.} |
| + | |
| + | ## Tests First (TDD) |
| + | {What tests to write before implementation. Be specific about test |
| + | cases, fixtures, and assertions. Reference the testing layers below.} |
| + | |
| + | ## Acceptance Criteria |
| + | {How the reviewer will judge this. Testable, binary conditions.} |
| + | |
| + | ## Target |
| + | {Repo, branch name, key files to create or modify.} |
| + | |
| + | ## Dependencies |
| + | {What's already been built that the worker can use. Paths to existing |
| + | code, installed packages, available test fixtures.} |
| + | |
| + | ## Constraints |
| + | {Things NOT to do. Boundaries of the task. What's out of scope.} |
| + | ``` |
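Managers that compose many prompts may want the template filled programmatically so no section is silently dropped. A minimal sketch; `WorkerPrompt` and `render_worker_prompt` are hypothetical names, and rejecting empty sections is a design choice that follows from the worker being unable to ask questions.

```python
from dataclasses import dataclass


@dataclass
class WorkerPrompt:
    task: str  # task ID and title
    context: str
    requirements: str
    tests_first: str
    acceptance_criteria: str
    target: str
    dependencies: str
    constraints: str


# (heading, attribute) pairs in template order, after the "## Task:" line
SECTIONS = [
    ("Context", "context"),
    ("Requirements", "requirements"),
    ("Tests First (TDD)", "tests_first"),
    ("Acceptance Criteria", "acceptance_criteria"),
    ("Target", "target"),
    ("Dependencies", "dependencies"),
    ("Constraints", "constraints"),
]


def render_worker_prompt(prompt: WorkerPrompt) -> str:
    parts = [f"## Task: {prompt.task}"]
    for heading, attr in SECTIONS:
        body = getattr(prompt, attr).strip()
        # The worker cannot ask questions, so refuse to send an empty section.
        if not body:
            raise ValueError(f"section {heading!r} is empty")
        parts.append(f"## {heading}\n{body}")
    return "\n\n".join(parts) + "\n"
```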
| + | |
| + | ### Key rules |
| + | |
| + | - **Workers never dispatch sub-workers.** Only managers dispatch. Flat hierarchy. |
| + | - **Workers cannot read the PRD.** Managers distill all needed context into the prompt. |
| + | - **Workers cannot modify files outside their task scope.** If they discover a needed change elsewhere, they report it and the manager creates a new task. |
| + | - **Reviewers do not fix code.** They identify issues and return specific feedback. The manager dispatches a fix worker. |
| + | |
| + | --- |
| + | |
| + | ## TDD Workflow |
| + | |
| + | ### Red-Green-Refactor for agent dispatch |
| + | |
| + | The manager dispatches work in stages: |
| + | |
| + | **1. Red (write failing tests):** |
| + | Worker writes tests based on acceptance criteria. Tests must fail (no implementation yet). Reviewer confirms tests are correct, sufficient, and actually fail. |
| + | |
| + | **2. Green (minimal implementation):** |
| + | Worker (resumed or new, same worktree) writes the minimal code to pass the tests. Reviewer confirms all tests pass and implementation is sound. |
| + | |
| + | **3. Refactor (cleanup if needed):** |
| + | If the reviewer flags quality issues during the Green review, the manager dispatches a cleanup worker. |
| + | |
| + | **Practical concession:** For small, well-defined tasks (utility functions, config parsing, Pulumi resource definitions), the manager can dispatch Red+Green as a single prompt: "Write tests first, then implement. Commit tests and implementation separately." The reviewer still checks both. Use the staged dispatch for anything with ambiguity in requirements or interface design. |
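As an illustration of the single-prompt Red+Green concession, consider a hypothetical config-parsing utility (`parse_flag` is not part of the codebase). The test class is written and committed first; run on its own, every test errors with `NameError` until the implementation, committed separately, exists. Test names follow the `test_{action}_{condition}_{expected_result}` convention. The sketch uses stdlib `unittest` so it runs anywhere; the project itself standardizes on pytest.

```python
import unittest


# --- Red: written and committed first; fails until parse_flag exists ---
class TestParseFlag(unittest.TestCase):
    def test_parse_flag_true_string_returns_true(self) -> None:
        self.assertIs(parse_flag("true"), True)

    def test_parse_flag_padded_mixed_case_returns_true(self) -> None:
        self.assertIs(parse_flag(" Yes "), True)

    def test_parse_flag_zero_string_returns_false(self) -> None:
        self.assertIs(parse_flag("0"), False)

    def test_parse_flag_unknown_value_raises_value_error(self) -> None:
        with self.assertRaises(ValueError):
            parse_flag("maybe")


# --- Green: the minimal implementation, committed separately ---
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_flag(value: str) -> bool:
    normalized = value.strip().lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    raise ValueError(f"not a boolean flag: {value!r}")
```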
| + | |
| + | ### Testing layers |
| + | |
| + | | Layer | Tools | Runs where | Speed | What it tests | |
| + | |-------|-------|-----------|-------|---------------| |
| + | | **Unit** | pytest, moto, `/tmp` | Worker environment | Seconds | Business logic, data transforms, auth checks | |
| + | | **Integration** | pytest + real AWS | Worker environment with AWS creds | Seconds–minutes | Lambda deployment, EFS mount, API Gateway routing, DynamoDB queries | |
| + | | **E2E** | Phase gate scripts | Human or manager at phase boundary | Minutes | Full user journeys | |
| + | |
| + | ### What gets mocked at unit level |
| + | |
| + | | AWS Service | Mock Tool | Notes | |
| + | |-------------|-----------|-------| |
| + | | DynamoDB | `moto` | Full API coverage — queries, GSIs, conditions | |
| + | | EFS filesystem | `/tmp` or `tempfile` | Just regular file I/O once mounted | |
| + | | HTTP clients (Otterwiki API, WorkOS) | `respx` or `pytest-httpx` | For MCP server and auth middleware | |
| + | | Bedrock | Stub returning fixed-dimension vectors | Only needed in Phase 5+ (premium) | |
| + | | S3 | `moto` | For static hosting tests | |
| + | |
| + | ### What must hit real AWS |
| + | |
| + | - Lambda deployment and invocation |
| + | - EFS mount behavior and git operations on EFS |
| + | - VPC networking and endpoint routing |
| + | - API Gateway routing, custom domains, auth integration |
| + | - Cold start measurements |
| + | - CloudFront distribution behavior |
| + | |
| + | ### Test file conventions |
| + | |
| + | - Test files mirror source files: `app/auth/middleware.py` → `tests/auth/test_middleware.py` |
| + | - Fixtures in `tests/conftest.py` (shared) or `tests/{module}/conftest.py` (module-specific) |
| + | - Integration tests marked with `@pytest.mark.integration` — skipped without AWS creds |
| + | - Use `moto`'s `@mock_aws` decorator (moto 5+) for DynamoDB and S3 tests |
| + | - Each test function tests one behavior. Name format: `test_{action}_{condition}_{expected_result}` |
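The path-mirroring and naming conventions are mechanical enough to check in a lint step or reviewer script. A sketch under two assumptions: source lives under `app/` as in the example above, and a conforming test name has at least three underscore-separated lowercase parts after `test_` (the format cannot show where the action ends and the condition begins, so this is a heuristic). The helpers are deliberately not named `test_*`, since pytest would otherwise collect them as tests.

```python
import re
from pathlib import PurePosixPath

TEST_NAME = re.compile(r"^test_[a-z0-9]+(?:_[a-z0-9]+){2,}$")


def mirrored_test_path(source: str, source_root: str = "app") -> str:
    """Map app/auth/middleware.py to tests/auth/test_middleware.py."""
    relative = PurePosixPath(source).relative_to(source_root)
    return str(PurePosixPath("tests", *relative.parent.parts, f"test_{relative.name}"))


def follows_name_convention(name: str) -> bool:
    """Heuristic check for test_{action}_{condition}_{expected_result}."""
    return TEST_NAME.match(name) is not None
```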
| + | |
| + | --- |
| + | |
| + | ## Pulumi Patterns |
| + | |
| + | ### Project structure |
| + | |
| + | ``` |
| + | infra/ |
| + |   Pulumi.yaml             # Project file (stackConfigDir: config) |
| + |   __main__.py             # Top-level composition |
| + |   config/ |
| + |     Pulumi.dev.yaml       # Dev stack config |
| + |     Pulumi.prod.yaml      # Prod stack config |
| + |   components/ |
| + |     vpc.py                # VPC, subnets, security groups, endpoints |
| + |     efs.py                # EFS filesystem, mount targets, access points |
| + |     lambda_functions.py   # Lambda functions, layers, permissions |
| + |     api_gateway.py        # API Gateway, routes, integrations |
| + |     dynamodb.py           # Tables, GSIs, PITR config |
| + |     dns.py                # Route 53, ACM certificates |
| + |     monitoring.py         # CloudWatch alarms, dashboards |
| + |     frontend.py           # S3 bucket, CloudFront distribution |
| + | ``` |
| + | |
| + | ### Conventions |
| + | |
| + | - Each component file exports a class or function that creates related resources |
| + | - Resources are tagged with `project: "wikibot-io"`, `environment: "{stack}"`, `phase: "P{n}"` |
| + | - Use `pulumi.Config()` for stack-specific values (domain names, feature flags) |
| + | - Secrets via `pulumi config set --secret` (dev); Secrets Manager (prod, Phase 4+) |
| + | - Outputs for cross-component references (e.g., VPC ID, EFS filesystem ID) |
| + | |
| + | ### Testing Pulumi code |
| + | |
| + | Unit-test Pulumi code using `pulumi.runtime.set_mocks()`: |
| + | - Verify resource creation and configuration |
| + | - Check IAM policy documents |
| + | - Validate security group rules |
| + | - These tests are fast and need no AWS access |
| + | |
| + | Integration testing is `pulumi up` against the dev stack, followed by verification scripts. |
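Policy checks in a `set_mocks` test usually reduce to inspecting the JSON document recorded on a mocked resource (for example, read from `args.inputs` inside a `pulumi.runtime.Mocks.new_resource` override), so the check itself can stay a pure function. A sketch assuming a hypothetical project rule that Allow statements never grant `*` or whole-service wildcards such as `dynamodb:*`. `Resource: "*"` is not flagged, because some actions (EC2 `Describe*` calls, for instance) do not support resource-level permissions.

```python
import json


def wildcard_actions(policy_json: str) -> list[str]:
    """Return wildcard actions granted by Allow statements in an IAM policy."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object, not a list
        statements = [statements]
    offending: list[str] = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        offending.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return offending
```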
| + | |
| + | --- |
| + | |
| + | ## Cross-Repo Coordination |
| + | |
| + | ### Repositories and ownership |
| + | |
| + | | Repo | Changes | Branch Strategy | |
| + | |------|---------|-----------------| |
| + | | `otterwiki` (fork) | Mangum adapter, PROXY_HEADER enhancements | PRs to `main`; admin panel hiding on `wikibot/prod` | |
| + | | `otterwiki-api` | Lambda compatibility if needed | PRs to `main` | |
| + | | `otterwiki-semantic-search` | FAISS backend (alongside existing ChromaDB) | PRs to `main` | |
| + | | `otterwiki-mcp` | Streamable HTTP transport | PRs to `main` | |
| + | | `wikibot-io` (new, private) | Pulumi IaC, management API, auth middleware, frontend, CLI | Feature branches | |
| + | |
| + | ### Branch naming |
| + | |
| + | Feature branches: `feat/P{phase}-{task-id}-{short-description}` |
| + | |
| + | Examples: |
| + | - `feat/P0-1-pulumi-scaffold` |
| + | - `feat/P1-1-mangum-adapter` |
| + | - `feat/P2-4-management-api` |
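Branch names can be generated rather than typed. A sketch assuming the task-id part is the numeric suffix of the task ID (the `1` in `P0-1`) and that descriptions become lowercase hyphenated slugs, which matches all three examples; `branch_name` and `BRANCH_PATTERN` are hypothetical helpers.

```python
import re

BRANCH_PATTERN = re.compile(
    r"^feat/P(?P<phase>\d+)-(?P<task>\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def branch_name(phase: int, task: int, description: str) -> str:
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"feat/P{phase}-{task}-{slug}"
```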
| + | |
| + | ### Dependency protocol |
| + | |
| + | When an upstream change (e.g., Mangum adapter in `otterwiki`) is a dependency for `wikibot-io` work: |
| + | |
| + | 1. The upstream PR must pass the Rule of Two (worker implements, reviewer approves) |
| + | 2. The upstream PR must be merged |
| + | 3. Only then can the downstream task that depends on it begin |
| + | 4. The manager tracks this in the wiki status page |
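The gate is a conjunction: review approval and merge, for every upstream PR. A sketch of the check a manager might run before dispatching a downstream task; the types, task ID, and PR numbers are hypothetical, and the returned strings are shaped for pasting into the status page.

```python
from dataclasses import dataclass, field


@dataclass
class UpstreamPR:
    repo: str
    number: int
    approved: bool = False  # Rule of Two: a reviewer has signed off
    merged: bool = False


@dataclass
class DownstreamTask:
    task_id: str
    upstream: list[UpstreamPR] = field(default_factory=list)


def blockers(task: DownstreamTask) -> list[str]:
    """Reasons the task cannot start yet; an empty list means it may begin."""
    reasons: list[str] = []
    for pr in task.upstream:
        if not pr.approved:
            reasons.append(f"{pr.repo}#{pr.number}: awaiting review")
        elif not pr.merged:
            reasons.append(f"{pr.repo}#{pr.number}: awaiting merge")
    return reasons
```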
| + | |
| + | ### Upstream contribution guidelines |
| + | |
| + | Changes to the open source repos should: |
| + | - Pass the existing test suites |
| + | - Add tests for new functionality |
| + | - Not break existing interfaces |
| + | - Follow existing code style and patterns |
| + | - Be self-contained (no wikibot.io-specific logic) |
| + | |
| + | --- |
| + | |
| + | ## Wiki-Based Task Tracking |
| + | |
| + | ### Dev/ namespace |
| + | |
| + | All development tracking lives in the wiki under the `Dev/` prefix. The existing Third Gulf War wiki serves as both the development tracker and the dogfood test. |
| + | |
| + | ### Page conventions |
| + | |
| + | **`Dev/Phase N Status`** — Living document updated each session. Contains: |
| + | - Current state (which tasks are complete, in progress, blocked) |
| + | - Active decisions or blockers |
| + | - Next parallelism group to execute |
| + | |
| + | **`Dev/Phase N Summary`** — Written once when the phase completes. Contains: |
| + | - What was implemented (with repo/branch references) |
| + | - Decisions made and rationale |
| + | - Deviations from the task graph |
| + | - New tasks discovered (filed as additions to the task graph or future phase work) |
| + | - Lessons learned |
| + | |
| + | **`Dev/Decision Log`** — Append-only log of significant architectural or implementation decisions. Each entry: date, decision, context, alternatives considered, rationale. |
| + | |
| + | ### Session start protocol (for managers) |
| + | |
| + | 1. `read_note("Dev/Phase N Status")` — resume state |
| + | 2. Check the task graph document for the current parallelism group |
| + | 3. `get_recent_changes(limit=10)` — see if the human made any changes since last session |
| + | 4. Proceed with dispatch |
| + | |
| + | ### Documentation loop |
| + | |
| + | After each completed work unit, the manager writes: |
| + | 1. What was implemented (specific files, functions, resources) |
| + | 2. What decisions were made (and why) |
| + | 3. Any deviations from the task description |
| + | 4. New tasks that emerged |
| + | 5. Test results (pass/fail, coverage) |
| + | |
| + | This creates an audit trail that spans sessions and enables any manager (or the human) to pick up where the previous session left off. |
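Keeping the five items in a fixed structure makes notes from different sessions comparable. A sketch of one possible shape; `WorkUnitNote` and its Markdown layout are hypothetical. Empty sections render as "None" so a reader can tell "nothing happened" apart from "not recorded".

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnitNote:
    task_id: str
    implemented: list[str]  # specific files, functions, resources
    decisions: list[str] = field(default_factory=list)  # include the "why"
    deviations: list[str] = field(default_factory=list)
    new_tasks: list[str] = field(default_factory=list)
    test_results: str = ""  # pass/fail and coverage

    def to_markdown(self) -> str:
        lines = [f"## {self.task_id}"]
        for title, items in [
            ("Implemented", self.implemented),
            ("Decisions", self.decisions),
            ("Deviations from task description", self.deviations),
            ("New tasks", self.new_tasks),
        ]:
            lines.append(f"\n### {title}")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                lines.append("- None")  # explicit, so absence is not ambiguous
        lines.append(f"\n### Test results\n{self.test_results or 'Not recorded'}")
        return "\n".join(lines) + "\n"
```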
| + | |
| + | --- |
| + | |
| + | ## Code Style and Conventions |
| + | |
| + | ### Python |
| + | |
| + | - Python 3.12+ (Lambda runtime) |
| + | - Type hints on all function signatures |
| + | - `pyproject.toml` for project metadata and dependencies |
| + | - `ruff` for linting and formatting |
| + | - No docstrings on obvious functions; docstrings on public APIs and non-obvious logic |
| + | |
| + | ### IaC |
| + | |
| + | - Pulumi Python SDK |
| + | - Resource names: `{project}-{environment}-{resource}` (e.g., `wikibot-dev-efs`) |
| + | - All resources tagged consistently |
| + | - No hardcoded ARNs or account IDs — use `pulumi.Config()` or data sources |
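A single naming helper keeps the `{project}-{environment}-{resource}` pattern uniform. A sketch; `resource_name` is a hypothetical helper, and the lowercase-and-hyphens check is an assumed rule. Note that when such a string is used as a Pulumi logical name, Pulumi auto-names the physical AWS resource by appending a random suffix unless a physical name is set explicitly.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def resource_name(project: str, environment: str, resource: str) -> str:
    """Build '{project}-{environment}-{resource}', e.g. 'wikibot-dev-efs'."""
    name = f"{project}-{environment}-{resource}".lower()
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name
```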
| + | |
| + | ### Frontend |
| + | |
| + | - Framework TBD (P3-1 decision — React or Svelte, manager decides based on team capabilities) |
| + | - TypeScript |
| + | - Vite for builds |
| + | |
| + | ### Commit messages |
| + | |
| + | Format: `[P{phase}] {action}: {description}` |
| + | |
| + | Examples: |
| + | ``` |
| + | [P0] Add: Pulumi scaffold with VPC and gateway endpoints |
| + | [P1] Add: Mangum adapter for Flask-on-Lambda |
| + | [P2] Fix: ACL check missing on wiki delete endpoint |
| + | [P2] Refactor: Extract auth middleware into shared module |
| + | ``` |
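For `wikibot-io`, the subject format is regular enough to validate, for instance from a `commit-msg` hook (Git passes that hook the path of the message file as its first argument). A sketch; `parse_commit_subject` is a hypothetical helper, and the action vocabulary is left open as any capitalized word, since the examples show `Add`, `Fix`, and `Refactor` but no closed list is defined.

```python
import re

COMMIT_SUBJECT = re.compile(
    r"^\[P(?P<phase>\d+)\] (?P<action>[A-Z][a-z]+): (?P<description>\S.*)$"
)


def parse_commit_subject(subject: str) -> dict[str, str]:
    """Split a wikibot-io commit subject into phase, action, and description."""
    match = COMMIT_SUBJECT.match(subject)
    if match is None:
        raise ValueError(f"expected '[P<phase>] <Action>: <description>', got {subject!r}")
    return match.groupdict()
```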
| + | |
| + | For upstream repos, follow each repo's existing commit message conventions. |
| \ | No newline at end of file |