Commit ba56ab

2026-03-13 01:59:09 Claude (Dev): [mcp] Clean up old location, now at Design/Agent Conventions
specs/agent conventions.md .. /dev/null
@@ 1,301 0,0 @@
- # Wikibot.io Agent Conventions & Patterns
-
- This document is read by Opus phase managers at session start. It establishes the protocols, patterns, and rules that govern how the agent team builds wikibot.io.
-
- ## References
-
- - **PRD:** `Docs/PRD` — source of truth for domain context, architecture, and requirements
- - **Task Graph:** `Specs/Task Graph` — execution spine with work units, dependencies, and acceptance criteria
- - **Phase Gates:** `Specs/Phase Gates` — exit criteria and human validation procedures
- - **Wiki Usage Guide exemplar:** `Meta/Wiki Usage Guide` on the Third Gulf War wiki (read via MCP) — reference for the bootstrap template (P2-6)
-
- ---
-
- ## Agent Hierarchy
-
- ### Three roles
-
- | Role | Model | Dispatched by | Purpose |
- |------|-------|---------------|---------|
- | **Phase Manager** | Opus | Human | Owns one phase. Reads this document, the task graph, and relevant PRD sections. Dispatches workers, reviews results, tracks progress in wiki, reports back. |
- | **Worker** | Sonnet | Phase Manager | Implements one task. Receives a self-contained prompt — no PRD, no task graph. Uses worktree isolation. |
- | **Reviewer** | Opus | Phase Manager | Reviews completed work against acceptance criteria and conventions. Returns approve/reject with specific feedback. |
-
- ### The Rule of Two
-
- No agent-produced artifact is accepted until a second agent has reviewed it and found no blocking concerns. This applies to all code, tests, and IaC.
-
- - Workers produce → Reviewers review
- - Manager wiki status updates are reviewed by the human at phase gates
- - Upstream PRs (to otterwiki, plugins) must be reviewed and merged before downstream work begins
-
- ### Manager lifecycle
-
- A phase manager follows this loop each session:
-
- 1. **Resume state.** Read `Dev/Phase N Status` from the wiki (via MCP). If this is the first session for the phase, create the status page.
- 2. **Identify work.** Read the task graph. Find the next parallelism group — tasks whose dependencies are all complete.
- 3. **Dispatch workers.** For each task in the group, compose a self-contained prompt (see template below) and dispatch a Sonnet worker. Use `isolation: "worktree"` for branch-level isolation. Run independent workers in parallel using `run_in_background: true`.
- 4. **Review results.** As each worker completes, dispatch an Opus reviewer with the worker's output, the task's acceptance criteria, and the conventions from this document.
- 5. **Handle rejections.** If the reviewer rejects, dispatch a fix worker (resumed or new) with the reviewer's specific feedback. Re-review. Maximum 3 fix cycles per task — if still failing, log the blocker in the wiki and move on to other tasks.
- 6. **Merge.** Once a task passes review, merge its worktree branch into the target branch.
- 7. **Update state.** Write a wiki note summarizing what was completed, what decisions were made, and any new tasks discovered. Update `Dev/Phase N Status`.
- 8. **Repeat** until all tasks in the phase are complete.
- 9. **Phase summary.** Write `Dev/Phase N Summary` with full results. Notify the human for gate review.
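Steps 2 and 5 of the lifecycle above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual tooling: the `Task` dataclass, the `review` and `fix` callables, and the task IDs used with it are hypothetical stand-ins for whatever the manager really dispatches. Only the "dependencies all complete" rule and the cap of 3 fix cycles come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_FIX_CYCLES = 3  # from step 5: at most 3 fix cycles per task


@dataclass
class Task:
    # Hypothetical shape; the real task graph lives in `Specs/Task Graph`.
    task_id: str
    depends_on: set[str] = field(default_factory=set)


def next_parallelism_group(tasks: list[Task], done: set[str]) -> list[Task]:
    """Return unfinished tasks whose dependencies are all complete."""
    return [t for t in tasks if t.task_id not in done and t.depends_on <= done]


def review_with_fix_cycles(
    task: Task,
    review: Callable[[Task], tuple[bool, str]],
    fix: Callable[[Task, str], None],
) -> bool:
    """Review a task; on rejection, dispatch a fix and re-review, up to the cap.

    Returns True if the task was approved, False if it is still rejected
    after MAX_FIX_CYCLES fixes (the caller would then log a blocker).
    """
    approved, feedback = review(task)
    cycles = 0
    while not approved and cycles < MAX_FIX_CYCLES:
        fix(task, feedback)  # the reviewer's feedback goes to the fix worker
        cycles += 1
        approved, feedback = review(task)
    return approved
```

A manager loop would call `next_parallelism_group` once per iteration and run `review_with_fix_cycles` for each finished worker, logging a blocker in the wiki whenever it returns `False`.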
-
- ### Worker prompt template
-
- Managers compose worker prompts using this structure:
-
- ```
- ## Task: {task ID and title}
-
- ## Context
- {What exists. What you're building on. Relevant architecture decisions.
- Distill from the PRD — the worker does not read the PRD.}
-
- ## Requirements
- {Specific deliverables. Be precise about interfaces, file locations,
- function signatures, config formats. The worker cannot ask questions.}
-
- ## Tests First (TDD)
- {What tests to write before implementation. Be specific about test
- cases, fixtures, and assertions. Reference the testing layers below.}
-
- ## Acceptance Criteria
- {How the reviewer will judge this. Testable, binary conditions.}
-
- ## Target
- {Repo, branch name, key files to create or modify.}
-
- ## Dependencies
- {What's already been built that the worker can use. Paths to existing
- code, installed packages, available test fixtures.}
-
- ## Constraints
- {Things NOT to do. Boundaries of the task. What's out of scope.}
- ```
-
- ### Key rules
-
- - **Workers never dispatch sub-workers.** Only managers dispatch. Flat hierarchy.
- - **Workers cannot read the PRD.** Managers distill all needed context into the prompt.
- - **Workers cannot modify files outside their task scope.** If they discover a needed change elsewhere, they report it and the manager creates a new task.
- - **Reviewers do not fix code.** They identify issues and return specific feedback. The manager dispatches a fix worker.
-
- ---
-
- ## TDD Workflow
-
- ### Red-Green-Refactor for agent dispatch
-
- The manager dispatches work in stages:
-
- **1. Red (write failing tests):**
- Worker writes tests based on acceptance criteria. Tests must fail (no implementation yet). Reviewer confirms tests are correct, sufficient, and actually fail.
-
- **2. Green (minimal implementation):**
- Worker (resumed or new, same worktree) writes the minimal code to pass the tests. Reviewer confirms all tests pass and implementation is sound.
-
- **3. Refactor (cleanup if needed):**
- If the reviewer flags quality issues during the Green review, the manager dispatches a cleanup worker.
-
- **Practical concession:** For small, well-defined tasks (utility functions, config parsing, Pulumi resource definitions), the manager can dispatch Red+Green as a single prompt: "Write tests first, then implement. Commit tests and implementation separately." The reviewer still checks both. Use the staged dispatch for anything with ambiguity in requirements or interface design.
-
- ### Testing layers
-
- | Layer | Tools | Runs where | Speed | What it tests |
- |-------|-------|-----------|-------|---------------|
- | **Unit** | pytest, moto, `/tmp` | Worker environment | Seconds | Business logic, data transforms, auth checks |
- | **Integration** | pytest + real AWS | Worker environment with AWS creds | Seconds–minutes | Lambda deployment, EFS mount, API Gateway routing, DynamoDB queries |
- | **E2E** | Phase gate scripts | Human or manager at phase boundary | Minutes | Full user journeys |
-
- ### What gets mocked at unit level
-
- | AWS Service | Mock Tool | Notes |
- |-------------|-----------|-------|
- | DynamoDB | `moto` | Full API coverage — queries, GSIs, conditions |
- | EFS filesystem | `/tmp` or `tempfile` | Just regular file I/O once mounted |
- | HTTP clients (Otterwiki API, WorkOS) | `respx` or `pytest-httpx` | For MCP server and auth middleware |
- | Bedrock | Stub returning fixed-dimension vectors | Only needed in Phase 5+ (premium) |
- | S3 | `moto` | For static hosting tests |
-
- ### What must hit real AWS
-
- - Lambda deployment and invocation
- - EFS mount behavior and git operations on EFS
- - VPC networking and endpoint routing
- - API Gateway routing, custom domains, auth integration
- - Cold start measurements
- - CloudFront distribution behavior
-
- ### Test file conventions
-
- - Test files mirror source files: `app/auth/middleware.py` → `tests/auth/test_middleware.py`
- - Fixtures in `tests/conftest.py` (shared) or `tests/{module}/conftest.py` (module-specific)
- - Integration tests marked with `@pytest.mark.integration` — skipped without AWS creds
- - Use `moto` decorators (`@mock_aws`) for DynamoDB tests
- - Each test function tests one behavior. Name format: `test_{action}_{condition}_{expected_result}`
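The mirroring and naming conventions above could be checked mechanically. The helper below is only a sketch: the function names, the `app`/`tests` default roots, and the "at least three words after `test_`" heuristic are assumptions made for illustration, not rules stated in this document.

```python
import re
from pathlib import PurePosixPath

# Heuristic only: `test_{action}_{condition}_{expected_result}` is read here as
# "test_" followed by at least three lowercase, underscore-separated words.
TEST_NAME_RE = re.compile(r"^test_[a-z0-9]+(?:_[a-z0-9]+){2,}$")


def mirror_test_path(source: str, src_root: str = "app", test_root: str = "tests") -> str:
    """Map a source file to the test file that mirrors it."""
    rel = PurePosixPath(source).relative_to(src_root)
    return str(PurePosixPath(test_root) / rel.parent / f"test_{rel.name}")


def looks_like_conventional_test_name(name: str) -> bool:
    """Loose check that a test function name has the three-part shape."""
    return TEST_NAME_RE.match(name) is not None
```

For instance, `mirror_test_path("app/auth/middleware.py")` yields `tests/auth/test_middleware.py`, which matches the example in the list above.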
-
- ---
-
- ## Pulumi Patterns
-
- ### Project structure
-
- ```
- infra/
-   __main__.py              # Top-level composition
-   config/
-     dev.yaml               # Dev stack config (Pulumi.<stack>.yaml)
-     prod.yaml              # Prod stack config
-   components/
-     vpc.py                 # VPC, subnets, security groups, endpoints
-     efs.py                 # EFS filesystem, mount targets, access points
-     lambda_functions.py    # Lambda functions, layers, permissions
-     api_gateway.py         # API Gateway, routes, integrations
-     dynamodb.py            # Tables, GSIs, PITR config
-     dns.py                 # Route 53, ACM certificates
-     monitoring.py          # CloudWatch alarms, dashboards
-     frontend.py            # S3 bucket, CloudFront distribution
- ```
-
- ### Conventions
-
- - Each component file exports a class or function that creates related resources
- - Resources are tagged with `project: "wikibot-io"`, `environment: "{stack}"`, `phase: "P{n}"`
- - Use `pulumi.Config()` for stack-specific values (domain names, feature flags)
- - Secrets via `pulumi config set --secret` (dev); Secrets Manager (prod, Phase 4+)
- - Outputs for cross-component references (e.g., VPC ID, EFS filesystem ID)
-
- ### Testing Pulumi code
-
- Unit-test Pulumi code using `pulumi.runtime.set_mocks()`:
- - Verify resource creation and configuration
- - Check IAM policy documents
- - Validate security group rules
- - These tests are fast and need no AWS access
-
- Integration testing means running `pulumi up` against the dev stack and then running verification scripts.
-
- ---
-
- ## Cross-Repo Coordination
-
- ### Repositories and ownership
-
- | Repo | Changes | Branch Strategy |
- |------|---------|-----------------|
- | `otterwiki` (fork) | Mangum adapter, PROXY_HEADER enhancements | PRs to `main`; admin panel hiding on `wikibot/prod` |
- | `otterwiki-api` | Lambda compatibility if needed | PRs to `main` |
- | `otterwiki-semantic-search` | FAISS backend (alongside existing ChromaDB) | PRs to `main` |
- | `otterwiki-mcp` | Streamable HTTP transport | PRs to `main` |
- | `wikibot-io` (new, private) | Pulumi IaC, management API, auth middleware, frontend, CLI | Feature branches |
-
- ### Branch naming
-
- Feature branches: `feat/P{phase}-{task-id}-{short-description}`
-
- Examples:
- - `feat/P0-1-pulumi-scaffold`
- - `feat/P1-1-mangum-adapter`
- - `feat/P2-4-management-api`
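As a sketch of the naming rule above, a manager could derive branch names from a task's phase, number, and title. The `branch_name` helper and its slug rule (lowercase, with runs of non-alphanumerics collapsed to `-`) are assumptions for illustration. Treating the task ID as an integer is also an assumption, based only on the examples listed.

```python
import re


def branch_name(phase: int, task: int, description: str) -> str:
    """Build `feat/P{phase}-{task-id}-{short-description}` from a free-text title."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become a single '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"feat/P{phase}-{task}-{slug}"
```

Under these assumptions, `branch_name(0, 1, "Pulumi scaffold")` reproduces the first example, `feat/P0-1-pulumi-scaffold`.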
-
- ### Dependency protocol
-
- When an upstream change (e.g., Mangum adapter in `otterwiki`) is a dependency for `wikibot-io` work:
-
- 1. The upstream PR must pass the Rule of Two (worker implements, reviewer approves)
- 2. The upstream PR must be merged
- 3. Only then can the downstream task that depends on it begin
- 4. The manager tracks this in the wiki status page
-
- ### Upstream contribution guidelines
-
- Changes to the open source repos should:
- - Pass the existing test suites
- - Add tests for new functionality
- - Not break existing interfaces
- - Follow existing code style and patterns
- - Be self-contained (no wikibot.io-specific logic)
-
- ---
-
- ## Wiki-Based Task Tracking
-
- ### Dev/ namespace
-
- All development tracking lives in the wiki under the `Dev/` prefix. The existing Third Gulf War wiki serves as both the development tracker and the dogfood test.
-
- ### Page conventions
-
- **`Dev/Phase N Status`** — Living document updated each session. Contains:
- - Current state (which tasks are complete, in progress, blocked)
- - Active decisions or blockers
- - Next parallelism group to execute
-
- **`Dev/Phase N Summary`** — Written once when the phase completes. Contains:
- - What was implemented (with repo/branch references)
- - Decisions made and rationale
- - Deviations from the task graph
- - New tasks discovered (filed as additions to the task graph or future phase work)
- - Lessons learned
-
- **`Dev/Decision Log`** — Append-only log of significant architectural or implementation decisions. Each entry: date, decision, context, alternatives considered, rationale.
-
- ### Session start protocol (for managers)
-
- 1. `read_note("Dev/Phase N Status")` — resume state
- 2. Check the task graph document for the current parallelism group
- 3. `get_recent_changes(limit=10)` — see if the human made any changes since last session
- 4. Proceed with dispatch
-
- ### Documentation loop
-
- After each completed work unit, the manager writes:
- 1. What was implemented (specific files, functions, resources)
- 2. What decisions were made (and why)
- 3. Any deviations from the task description
- 4. New tasks that emerged
- 5. Test results (pass/fail, coverage)
-
- This creates an audit trail that spans sessions and enables any manager (or the human) to pick up where the previous session left off.
-
- ---
-
- ## Code Style and Conventions
-
- ### Python
-
- - Python 3.12+ (Lambda runtime)
- - Type hints on all function signatures
- - `pyproject.toml` for project metadata and dependencies
- - `ruff` for linting and formatting
- - No docstrings on obvious functions; docstrings on public APIs and non-obvious logic
-
- ### IaC
-
- - Pulumi Python SDK
- - Resource names: `{project}-{environment}-{resource}` (e.g., `wikibot-dev-efs`)
- - All resources tagged consistently
- - No hardcoded ARNs or account IDs — use `pulumi.Config()` or data sources
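The naming and tagging rules can be centralised in two small helpers so every component builds them the same way. This is a sketch in plain Python, with no Pulumi import: the function names are hypothetical, and how the results are passed to resources is left to each component.

```python
def resource_name(project: str, environment: str, resource: str) -> str:
    """Build `{project}-{environment}-{resource}`, e.g. `wikibot-dev-efs`."""
    return f"{project}-{environment}-{resource}"


def standard_tags(stack: str, phase: int) -> dict[str, str]:
    """Tags from the Pulumi conventions: project, environment, and phase."""
    return {
        "project": "wikibot-io",
        "environment": stack,
        "phase": f"P{phase}",
    }
```

A component would typically pass `resource_name(...)` as the resource's logical name and `standard_tags(...)` as its `tags` argument, with the stack name read from configuration so that it is not hardcoded.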
-
- ### Frontend
-
- - Framework TBD (P3-1 decision — React or Svelte, manager decides based on team capabilities)
- - TypeScript
- - Vite for builds
-
- ### Commit messages
-
- Format: `[P{phase}] {action}: {description}`
-
- Examples:
- ```
- [P0] Add: Pulumi scaffold with VPC and gateway endpoints
- [P1] Add: Mangum adapter for Flask-on-Lambda
- [P2] Fix: ACL check missing on wiki delete endpoint
- [P2] Refactor: Extract auth middleware into shared module
- ```
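A reviewer or a commit hook could check subjects against this format with a regular expression. The pattern below is one possible reading of `[P{phase}] {action}: {description}`. It accepts any single capitalised word as the action, which is an assumption, since the document only shows `Add`, `Fix`, and `Refactor` as examples.

```python
import re
from typing import Optional

# Assumed pattern: "[P<digits>] <Capitalised word>: <non-empty description>"
COMMIT_RE = re.compile(
    r"^\[P(?P<phase>\d+)\] (?P<action>[A-Z][a-z]+): (?P<description>\S.*)$"
)


def parse_commit_subject(subject: str) -> Optional[dict[str, str]]:
    """Return the phase, action, and description, or None if the subject does not match."""
    match = COMMIT_RE.match(subject)
    return match.groupdict() if match else None
```

This applies to `wikibot-io` commits only. Upstream repos keep their own conventions, so the check should not run there.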
-
- For upstream repos, follow each repo's existing commit message conventions.
\ No newline at end of file