Commit d3e9e7

2026-03-13 01:42:46 Claude (Dev): [mcp] Port agent conventions spec to wiki
/dev/null .. specs/agent conventions.md
@@ -0,0 +1,301 @@
+ # Wikibot.io Agent Conventions & Patterns
+
+ This document is read by Opus phase managers at session start. It establishes the protocols, patterns, and rules that govern how the agent team builds wikibot.io.
+
+ ## References
+
+ - **PRD:** `Docs/PRD` — source of truth for domain context, architecture, and requirements
+ - **Task Graph:** `Specs/Task Graph` — execution spine with work units, dependencies, and acceptance criteria
+ - **Phase Gates:** `Specs/Phase Gates` — exit criteria and human validation procedures
+ - **Wiki Usage Guide exemplar:** `Meta/Wiki Usage Guide` on the Third Gulf War wiki (read via MCP) — reference for the bootstrap template (P2-6)
+
+ ---
+
+ ## Agent Hierarchy
+
+ ### Three roles
+
+ | Role | Model | Dispatched by | Purpose |
+ |------|-------|---------------|---------|
+ | **Phase Manager** | Opus | Human | Owns one phase. Reads this document, the task graph, and relevant PRD sections. Dispatches workers, reviews results, tracks progress in the wiki, and reports back to the human. |
+ | **Worker** | Sonnet | Phase Manager | Implements one task. Receives a self-contained prompt — no PRD, no task graph. Uses worktree isolation. |
+ | **Reviewer** | Opus | Phase Manager | Reviews completed work against acceptance criteria and conventions. Returns approve/reject with specific feedback. |
+
+ ### The Rule of Two
+
+ No agent-produced artifact is accepted until a second agent has reviewed it and found no blocking concerns. This applies to all code, tests, and IaC.
+
+ - Workers produce → Reviewers review
+ - Manager wiki status updates are reviewed by the human at phase gates
+ - Upstream PRs (to otterwiki, plugins) must be reviewed and merged before downstream work begins
+
+ ### Manager lifecycle
+
+ A phase manager follows this loop each session:
+
+ 1. **Resume state.** Read `Dev/Phase N Status` from the wiki (via MCP). If this is the first session for the phase, create the status page.
+ 2. **Identify work.** Read the task graph. Find the next parallelism group — tasks whose dependencies are all complete.
+ 3. **Dispatch workers.** For each task in the group, compose a self-contained prompt (see template below) and dispatch a Sonnet worker. Use `isolation: "worktree"` for branch-level isolation. Run independent workers in parallel using `run_in_background: true`.
+ 4. **Review results.** As each worker completes, dispatch an Opus reviewer with the worker's output, the task's acceptance criteria, and the conventions from this document.
+ 5. **Handle rejections.** If the reviewer rejects, dispatch a fix worker (resumed or new) with the reviewer's specific feedback. Re-review. Maximum 3 fix cycles per task — if still failing, log the blocker in the wiki and move on to other tasks.
+ 6. **Merge.** Once a task passes review, merge its worktree branch into the target branch.
+ 7. **Update state.** Write a wiki note summarizing what was completed, what decisions were made, and any new tasks discovered. Update `Dev/Phase N Status`.
+ 8. **Repeat** until all tasks in the phase are complete.
+ 9. **Phase summary.** Write `Dev/Phase N Summary` with full results. Notify the human for gate review.
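
Steps 2 and 5 of the loop above can be sketched in a few lines of Python. This is an illustration only, not the managers' actual tooling: the `Task` class, both function names, and the task IDs `P0-2` and `P0-3` are hypothetical, and the loop assumes "3 fix cycles" means one initial review followed by at most three fix-and-re-review rounds.

```python
from collections.abc import Callable, Collection
from dataclasses import dataclass

MAX_FIX_CYCLES = 3  # "Maximum 3 fix cycles per task" (lifecycle step 5)


@dataclass(frozen=True)
class Task:
    """Hypothetical in-memory view of one task-graph entry."""
    task_id: str
    depends_on: frozenset[str] = frozenset()


def next_parallelism_group(
    tasks: list[Task],
    completed: Collection[str],
    blocked: Collection[str] = (),
) -> list[Task]:
    """Return unfinished, unblocked tasks whose dependencies are all complete."""
    done = set(completed)
    skip = done | set(blocked)
    return [t for t in tasks if t.task_id not in skip and t.depends_on <= done]


def review_with_fixes(review: Callable[[], bool], fix: Callable[[], None]) -> bool:
    """Review once, then allow up to MAX_FIX_CYCLES fix-and-re-review rounds.

    Returns True on approval. Returns False if the task is still rejected,
    in which case the manager logs a blocker and moves on.
    """
    if review():
        return True
    for _ in range(MAX_FIX_CYCLES):
        fix()
        if review():
            return True
    return False


# Hypothetical task graph: P0-2 depends on P0-1, the others are independent.
tasks = [
    Task("P0-1"),
    Task("P0-2", frozenset({"P0-1"})),
    Task("P0-3"),
]
first_group = [t.task_id for t in next_parallelism_group(tasks, completed=[])]
second_group = [
    t.task_id for t in next_parallelism_group(tasks, completed=["P0-1", "P0-3"])
]
```

Passing the IDs of tasks that exhausted their fix cycles as `blocked` keeps them out of later groups while the rest of the phase continues.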
+
+ ### Worker prompt template
+
+ Managers compose worker prompts using this structure:
+
+ ```
+ ## Task: {task ID and title}
+
+ ## Context
+ {What exists. What you're building on. Relevant architecture decisions.
+ Distill from the PRD — the worker does not read the PRD.}
+
+ ## Requirements
+ {Specific deliverables. Be precise about interfaces, file locations,
+ function signatures, config formats. The worker cannot ask questions.}
+
+ ## Tests First (TDD)
+ {What tests to write before implementation. Be specific about test
+ cases, fixtures, and assertions. Reference the testing layers below.}
+
+ ## Acceptance Criteria
+ {How the reviewer will judge this. Testable, binary conditions.}
+
+ ## Target
+ {Repo, branch name, key files to create or modify.}
+
+ ## Dependencies
+ {What's already been built that the worker can use. Paths to existing
+ code, installed packages, available test fixtures.}
+
+ ## Constraints
+ {Things NOT to do. Boundaries of the task. What's out of scope.}
+ ```
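
One way a manager-side script might fill in the template above is shown here. It is a sketch under assumptions: the `WorkerPrompt` class and its field names are hypothetical, and rejecting empty sections is a design choice motivated by the rule that the worker cannot ask questions.

```python
from dataclasses import dataclass

# (heading, attribute) pairs in the order the template lists them.
SECTIONS = [
    ("Context", "context"),
    ("Requirements", "requirements"),
    ("Tests First (TDD)", "tests_first"),
    ("Acceptance Criteria", "acceptance_criteria"),
    ("Target", "target"),
    ("Dependencies", "dependencies"),
    ("Constraints", "constraints"),
]


@dataclass
class WorkerPrompt:
    """Hypothetical container for one self-contained worker prompt."""
    task: str
    context: str
    requirements: str
    tests_first: str
    acceptance_criteria: str
    target: str
    dependencies: str
    constraints: str

    def render(self) -> str:
        """Render the prompt as Markdown, refusing to emit empty sections."""
        parts = [f"## Task: {self.task}"]
        for heading, attr in SECTIONS:
            body = getattr(self, attr).strip()
            if not body:
                raise ValueError(f"section {heading!r} is empty")
            parts.append(f"## {heading}\n{body}")
        return "\n\n".join(parts) + "\n"


# Placeholder values for illustration; real prompts carry far more detail.
prompt = WorkerPrompt(
    task="P0-1 Pulumi scaffold",
    context="Empty repo; no infrastructure exists yet.",
    requirements="Create infra/__main__.py that composes the VPC component.",
    tests_first="Write a set_mocks-based unit test for the VPC component.",
    acceptance_criteria="All unit tests pass.",
    target="wikibot-io, branch feat/P0-1-pulumi-scaffold",
    dependencies="None.",
    constraints="Do not create resources outside the VPC component.",
)
rendered = prompt.render()
```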
+
+ ### Key rules
+
+ - **Workers never dispatch sub-workers.** Only managers dispatch. Flat hierarchy.
+ - **Workers cannot read the PRD.** Managers distill all needed context into the prompt.
+ - **Workers cannot modify files outside their task scope.** If they discover a needed change elsewhere, they report it and the manager creates a new task.
+ - **Reviewers do not fix code.** They identify issues and return specific feedback. The manager dispatches a fix worker.
+
+ ---
+
+ ## TDD Workflow
+
+ ### Red-Green-Refactor for agent dispatch
+
+ The manager dispatches work in stages:
+
+ **1. Red (write failing tests):**
+ Worker writes tests based on acceptance criteria. Tests must fail (no implementation yet). Reviewer confirms tests are correct, sufficient, and actually fail.
+
+ **2. Green (minimal implementation):**
+ Worker (resumed or new, same worktree) writes the minimal code to pass the tests. Reviewer confirms all tests pass and implementation is sound.
+
+ **3. Refactor (cleanup if needed):**
+ If the reviewer flags quality issues during the Green review, the manager dispatches a cleanup worker.
+
+ **Practical concession:** For small, well-defined tasks (utility functions, config parsing, Pulumi resource definitions), the manager can dispatch Red+Green as a single prompt: "Write tests first, then implement. Commit tests and implementation separately." The reviewer still checks both. Use the staged dispatch for anything with ambiguity in requirements or interface design.
+
+ ### Testing layers
+
+ | Layer | Tools | Runs where | Speed | What it tests |
+ |-------|-------|-----------|-------|---------------|
+ | **Unit** | pytest, moto, `/tmp` | Worker environment | Seconds | Business logic, data transforms, auth checks |
+ | **Integration** | pytest + real AWS | Worker environment with AWS creds | Seconds–minutes | Lambda deployment, EFS mount, API Gateway routing, DynamoDB queries |
+ | **E2E** | Phase gate scripts | Human or manager at phase boundary | Minutes | Full user journeys |
+
+ ### What gets mocked at unit level
+
+ | AWS Service | Mock Tool | Notes |
+ |-------------|-----------|-------|
+ | DynamoDB | `moto` | Full API coverage — queries, GSIs, conditions |
+ | EFS filesystem | `/tmp` or `tempfile` | Just regular file I/O once mounted |
+ | HTTP clients (Otterwiki API, WorkOS) | `respx` or `pytest-httpx` | For MCP server and auth middleware |
+ | Bedrock | Stub returning fixed-dimension vectors | Only needed in Phase 5+ (premium) |
+ | S3 | `moto` | For static hosting tests |
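
The EFS row above is the simplest to illustrate: code that reads and writes under a mount point can be tested against a temporary directory. The sketch below uses only the standard library. `save_page` and the Markdown-file-per-page layout are hypothetical, chosen for illustration rather than taken from the codebase.

```python
import tempfile
from pathlib import Path


def save_page(repo_root: Path, page_name: str, content: str) -> Path:
    """Write a page as a Markdown file under repo_root (hypothetical helper).

    In production repo_root would be the EFS mount path; in unit tests it
    is a temporary directory, since EFS is plain file I/O once mounted.
    """
    path = repo_root / f"{page_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


def test_save_page_nested_name_creates_parent_dirs() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = save_page(Path(tmp), "Dev/Phase 0 Status", "# Status\n")
        assert path.read_text(encoding="utf-8") == "# Status\n"
        assert path.parent.name == "Dev"
```

Under pytest, the built-in `tmp_path` fixture provides the same isolation without the explicit context manager.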
+
+ ### What must hit real AWS
+
+ - Lambda deployment and invocation
+ - EFS mount behavior and git operations on EFS
+ - VPC networking and endpoint routing
+ - API Gateway routing, custom domains, auth integration
+ - Cold start measurements
+ - CloudFront distribution behavior
+
+ ### Test file conventions
+
+ - Test files mirror source files: `app/auth/middleware.py` → `tests/auth/test_middleware.py`
+ - Fixtures in `tests/conftest.py` (shared) or `tests/{module}/conftest.py` (module-specific)
+ - Integration tests marked with `@pytest.mark.integration` — skipped without AWS creds
+ - Use `moto` decorators (`@mock_aws`) for DynamoDB tests
+ - Each test function tests one behavior. Name format: `test_{action}_{condition}_{expected_result}`
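
The mirroring and naming rules above are mechanical enough to check in code. The helpers below are a hypothetical sketch that assumes `app/` and `tests/` are the source and test roots, as in the example path. The name check is deliberately loose: it only requires at least three underscore-separated parts after `test_`, because the boundaries between action, condition, and expected result cannot be recovered from the name alone.

```python
import re
from pathlib import PurePosixPath

# test_{action}_{condition}_{expected_result}: at least three parts after "test_".
TEST_NAME_RE = re.compile(r"test_[a-z0-9]+(?:_[a-z0-9]+){2,}")


def mirrored_test_path(
    source: str, source_root: str = "app", test_root: str = "tests"
) -> str:
    """Map a source file to the test file that mirrors it."""
    relative = PurePosixPath(source).relative_to(source_root)
    return str(PurePosixPath(test_root) / relative.with_name(f"test_{relative.name}"))


def is_conventional_test_name(name: str) -> bool:
    """Loose check that a test function name has enough parts."""
    return TEST_NAME_RE.fullmatch(name) is not None


mirrored = mirrored_test_path("app/auth/middleware.py")
```

Here `mirrored` is `tests/auth/test_middleware.py`, matching the example in the first bullet.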
+
+ ---
+
+ ## Pulumi Patterns
+
+ ### Project structure
+
+ ```
+ infra/
+   __main__.py              # Top-level composition
+   config/
+     dev.yaml               # Dev stack config (Pulumi.<stack>.yaml)
+     prod.yaml              # Prod stack config
+   components/
+     vpc.py                 # VPC, subnets, security groups, endpoints
+     efs.py                 # EFS filesystem, mount targets, access points
+     lambda_functions.py    # Lambda functions, layers, permissions
+     api_gateway.py         # API Gateway, routes, integrations
+     dynamodb.py            # Tables, GSIs, PITR config
+     dns.py                 # Route 53, ACM certificates
+     monitoring.py          # CloudWatch alarms, dashboards
+     frontend.py            # S3 bucket, CloudFront distribution
+ ```
+
+ ### Conventions
+
+ - Each component file exports a class or function that creates related resources
+ - Resources are tagged with `project: "wikibot-io"`, `environment: "{stack}"`, `phase: "P{n}"`
+ - Use `pulumi.Config()` for stack-specific values (domain names, feature flags)
+ - Secrets via `pulumi config set --secret` (dev); Secrets Manager (prod, Phase 4+)
+ - Outputs for cross-component references (e.g., VPC ID, EFS filesystem ID)
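
The tagging convention above could be centralized in one helper so that component files do not repeat the literal keys. This is a sketch: `standard_tags` is a hypothetical name, and letting the required tags override any caller-supplied extras is an assumption about the desired behavior.

```python
from __future__ import annotations


def standard_tags(
    stack: str, phase: int, extra: dict[str, str] | None = None
) -> dict[str, str]:
    """Build the tag set every resource carries.

    Required tags are applied last so a caller cannot override them.
    """
    required = {
        "project": "wikibot-io",
        "environment": stack,
        "phase": f"P{phase}",
    }
    return {**(extra or {}), **required}


dev_tags = standard_tags("dev", 0, extra={"component": "vpc"})
```

Inside a Pulumi program the `stack` argument would typically come from `pulumi.get_stack()`, and the result would be passed as the `tags` argument of each resource.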
+
+ ### Testing Pulumi code
+
+ Unit-test Pulumi code using `pulumi.runtime.set_mocks()`. These tests are fast and need no AWS access. They should:
+ - Verify resource creation and configuration
+ - Check IAM policy documents
+ - Validate security group rules
+
+ Integration testing runs `pulumi up` against the dev stack, followed by verification scripts.
+
+ ---
+
+ ## Cross-Repo Coordination
+
+ ### Repositories and ownership
+
+ | Repo | Changes | Branch Strategy |
+ |------|---------|-----------------|
+ | `otterwiki` (fork) | Mangum adapter, PROXY_HEADER enhancements | PRs to `main`; admin panel hiding on `wikibot/prod` |
+ | `otterwiki-api` | Lambda compatibility if needed | PRs to `main` |
+ | `otterwiki-semantic-search` | FAISS backend (alongside existing ChromaDB) | PRs to `main` |
+ | `otterwiki-mcp` | Streamable HTTP transport | PRs to `main` |
+ | `wikibot-io` (new, private) | Pulumi IaC, management API, auth middleware, frontend, CLI | Feature branches |
+
+ ### Branch naming
+
+ Feature branches: `feat/P{phase}-{task-id}-{short-description}`, where `{task-id}` is the task's number within the phase (the `6` in task `P2-6`).
+
+ Examples:
+ - `feat/P0-1-pulumi-scaffold`
+ - `feat/P1-1-mangum-adapter`
+ - `feat/P2-4-management-api`
+
+ ### Dependency protocol
+
+ When an upstream change (e.g., Mangum adapter in `otterwiki`) is a dependency for `wikibot-io` work:
+
+ 1. The upstream PR must pass the Rule of Two (worker implements, reviewer approves)
+ 2. The upstream PR must be merged
+ 3. Only then can the downstream task that depends on it begin
+ 4. The manager tracks this in the wiki status page
+
+ ### Upstream contribution guidelines
+
+ Changes to the open-source repos should:
+ - Pass the existing test suites
+ - Add tests for new functionality
+ - Not break existing interfaces
+ - Follow existing code style and patterns
+ - Be self-contained (no wikibot.io-specific logic)
+
+ ---
+
+ ## Wiki-Based Task Tracking
+
+ ### Dev/ namespace
+
+ All development tracking lives in the wiki under the `Dev/` prefix. The existing Third Gulf War wiki serves as both the development tracker and the dogfood test.
+
+ ### Page conventions
+
+ **`Dev/Phase N Status`** — Living document updated each session. Contains:
+ - Current state (which tasks are complete, in progress, blocked)
+ - Active decisions or blockers
+ - Next parallelism group to execute
+
+ **`Dev/Phase N Summary`** — Written once when the phase completes. Contains:
+ - What was implemented (with repo/branch references)
+ - Decisions made and rationale
+ - Deviations from the task graph
+ - New tasks discovered (filed as additions to the task graph or future phase work)
+ - Lessons learned
+
+ **`Dev/Decision Log`** — Append-only log of significant architectural or implementation decisions. Each entry: date, decision, context, alternatives considered, rationale.
+
+ ### Session start protocol (for managers)
+
+ 1. `read_note("Dev/Phase N Status")` — resume state
+ 2. Check the task graph document for the current parallelism group
+ 3. `get_recent_changes(limit=10)` — see if the human made any changes since last session
+ 4. Proceed with dispatch
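
The first three steps of the protocol can be sketched against a small client interface. Only the tool names `read_note` and `get_recent_changes` come from this document. The `WikiClient` protocol, the return types, and the idea that the task graph is readable as the wiki page `Specs/Task Graph` are assumptions made for illustration, and `FakeWiki` exists only to exercise the sketch.

```python
from dataclasses import dataclass
from typing import Protocol


class WikiClient(Protocol):
    """Assumed shape of the MCP wiki tools; return types are guesses."""

    def read_note(self, path: str) -> str: ...
    def get_recent_changes(self, limit: int) -> list[str]: ...


@dataclass
class SessionContext:
    status: str
    task_graph: str
    recent_changes: list[str]


def start_session(wiki: WikiClient, phase: int) -> SessionContext:
    """Gather everything the manager needs before dispatching."""
    status = wiki.read_note(f"Dev/Phase {phase} Status")  # step 1: resume state
    task_graph = wiki.read_note("Specs/Task Graph")       # step 2: assumes a wiki page
    recent = wiki.get_recent_changes(limit=10)            # step 3: human edits
    return SessionContext(status, task_graph, recent)


class FakeWiki:
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self, notes: dict[str, str], changes: list[str]) -> None:
        self.notes = notes
        self.changes = changes

    def read_note(self, path: str) -> str:
        return self.notes[path]

    def get_recent_changes(self, limit: int) -> list[str]:
        return self.changes[:limit]


wiki = FakeWiki(
    notes={"Dev/Phase 0 Status": "P0-1: complete", "Specs/Task Graph": "(graph)"},
    changes=["Dev/Decision Log"],
)
context = start_session(wiki, phase=0)
```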
+
+ ### Documentation loop
+
+ After each completed work unit, the manager writes:
+ 1. What was implemented (specific files, functions, resources)
+ 2. What decisions were made (and why)
+ 3. Any deviations from the task description
+ 4. New tasks that emerged
+ 5. Test results (pass/fail, coverage)
+
+ This creates an audit trail that spans sessions and enables any manager (or the human) to pick up where the previous session left off.
+
+ ---
+
+ ## Code Style and Conventions
+
+ ### Python
+
+ - Python 3.12+ (Lambda runtime)
+ - Type hints on all function signatures
+ - `pyproject.toml` for project metadata and dependencies
+ - `ruff` for linting and formatting
+ - No docstrings on obvious functions; docstrings on public APIs and non-obvious logic
+
+ ### IaC
+
+ - Pulumi Python SDK
+ - Resource names: `{project}-{environment}-{resource}` (e.g., `wikibot-dev-efs`)
+ - All resources tagged consistently
+ - No hardcoded ARNs or account IDs — use `pulumi.Config()` or data sources
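
The naming rule and the ban on hardcoded identifiers both lend themselves to small checks that a reviewer or CI step might run. The sketch below is hypothetical throughout: the function names are invented, the default project prefix `wikibot` is taken from the `wikibot-dev-efs` example, and the 12-digit match is a heuristic that can produce false positives on other long numbers.

```python
import re

ARN_RE = re.compile(r"arn:aws[a-z-]*:[^\s\"']+")
ACCOUNT_ID_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")  # AWS account IDs have 12 digits


def resource_name(environment: str, resource: str, project: str = "wikibot") -> str:
    """Compose a resource name as {project}-{environment}-{resource}."""
    return f"{project}-{environment}-{resource}"


def find_hardcoded_identifiers(source: str) -> list[str]:
    """Return ARNs and 12-digit numbers found in a piece of source text."""
    hits = set(ARN_RE.findall(source)) | set(ACCOUNT_ID_RE.findall(source))
    return sorted(hits)


efs_name = resource_name("dev", "efs")
suspicious = find_hardcoded_identifiers(
    'role_arn = "arn:aws:iam::123456789012:role/deploy"'
)
clean = find_hardcoded_identifiers('vpc_id = config.require("vpcId")')
```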
+
+ ### Frontend
+
+ - Framework TBD (P3-1 decision — React or Svelte, manager decides based on team capabilities)
+ - TypeScript
+ - Vite for builds
+
+ ### Commit messages
+
+ Format: `[P{phase}] {action}: {description}`
+
+ Examples:
+ ```
+ [P0] Add: Pulumi scaffold with VPC and gateway endpoints
+ [P1] Add: Mangum adapter for Flask-on-Lambda
+ [P2] Fix: ACL check missing on wiki delete endpoint
+ [P2] Refactor: Extract auth middleware into shared module
+ ```
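
The format above can be checked with a regular expression, for example in a commit hook. This is a sketch: the document does not enumerate the allowed actions, so the pattern accepts any capitalized word, and the function name is hypothetical.

```python
from __future__ import annotations

import re

COMMIT_SUBJECT_RE = re.compile(
    r"\[P(?P<phase>\d+)\] (?P<action>[A-Z][A-Za-z]*): (?P<description>\S.*)"
)


def parse_commit_subject(subject: str) -> dict[str, str] | None:
    """Split a commit subject into phase, action, and description.

    Returns None when the subject does not follow the format.
    """
    match = COMMIT_SUBJECT_RE.fullmatch(subject)
    return match.groupdict() if match else None


parsed = parse_commit_subject("[P2] Fix: ACL check missing on wiki delete endpoint")
rejected = parse_commit_subject("Fix ACL check")
```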
+
+ For upstream repos, follow each repo's existing commit message conventions.
\ No newline at end of file