commit d3e9e7 – Dev Wiki

Commit `d3e9e7`

2026-03-13 01:42:46 Claude (Dev): [mcp] Port agent conventions spec to wiki

`/dev/null` .. `specs/agent conventions.md`
@@ -0,0 +1,301 @@
+	# Wikibot.io Agent Conventions & Patterns
+
+	This document is read by Opus phase managers at session start. It establishes the protocols, patterns, and rules that govern how the agent team builds wikibot.io.
+
+	## References
+
+	- PRD: `Docs/PRD` — source of truth for domain context, architecture, and requirements
+	- Task Graph: `Specs/Task Graph` — execution spine with work units, dependencies, and acceptance criteria
+	- Phase Gates: `Specs/Phase Gates` — exit criteria and human validation procedures
+	- Wiki Usage Guide exemplar: `Meta/Wiki Usage Guide` on the Third Gulf War wiki (read via MCP) — reference for the bootstrap template (P2-6)
+
+	---
+
+	## Agent Hierarchy
+
+	### Three roles
+
+	| Role | Model | Dispatched by | Purpose |
+	|------|-------|---------------|---------|
+	| Phase Manager | Opus | Human | Owns one phase. Reads this document, the task graph, and relevant PRD sections. Dispatches workers, reviews results, tracks progress in wiki, reports back. |
+	| Worker | Sonnet | Phase Manager | Implements one task. Receives a self-contained prompt (no PRD, no task graph). Uses worktree isolation. |
+	| Reviewer | Opus | Phase Manager | Reviews completed work against acceptance criteria and conventions. Returns approve/reject with specific feedback. |
+
+	### The Rule of Two
+
+	No agent-produced artifact is accepted until a second agent has reviewed it and found no blocking concerns. This applies to all code, tests, and IaC.
+
+	- Workers produce → Reviewers review
+	- Manager wiki status updates are reviewed by the human at phase gates
+	- Upstream PRs (to otterwiki, plugins) must be reviewed and merged before downstream work begins
+
+	### Manager lifecycle
+
+	A phase manager follows this loop each session:
+
+	1. Resume state. Read `Dev/Phase N Status` from the wiki (via MCP). If this is the first session for the phase, create the status page.
+	2. Identify work. Read the task graph. Find the next parallelism group — tasks whose dependencies are all complete.
+	3. Dispatch workers. For each task in the group, compose a self-contained prompt (see template below) and dispatch a Sonnet worker. Use `isolation: "worktree"` for branch-level isolation. Run independent workers in parallel using `run_in_background: true`.
+	4. Review results. As each worker completes, dispatch an Opus reviewer with the worker's output, the task's acceptance criteria, and the conventions from this document.
+	5. Handle rejections. If the reviewer rejects, dispatch a fix worker (resumed or new) with the reviewer's specific feedback. Re-review. Maximum 3 fix cycles per task — if still failing, log the blocker in the wiki and move on to other tasks.
+	6. Merge. Once a task passes review, merge its worktree branch into the target branch.
+	7. Update state. Write a wiki note summarizing what was completed, what decisions were made, and any new tasks discovered. Update `Dev/Phase N Status`.
+	8. Repeat until all tasks in the phase are complete.
+	9. Phase summary. Write `Dev/Phase N Summary` with full results. Notify the human for gate review.
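
Step 2 of the loop above (finding the next parallelism group) can be sketched as a plain dependency check. This is a minimal illustration, not the task graph's actual format: the dict that maps each task ID to the IDs it depends on, the sample task IDs, and the function name `next_parallelism_group` are all assumptions made for the example.

```python
def next_parallelism_group(
    dependencies: dict[str, set[str]],
    completed: set[str],
) -> list[str]:
    """Return unfinished tasks whose dependencies are all complete."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in completed and deps <= completed
    )


# Hypothetical task graph: task ID -> IDs of the tasks it depends on.
graph = {
    "P0-1": set(),
    "P0-2": {"P0-1"},
    "P0-3": {"P0-1"},
    "P0-4": {"P0-2", "P0-3"},
}

print(next_parallelism_group(graph, completed=set()))
print(next_parallelism_group(graph, completed={"P0-1"}))
```

With nothing completed, only the dependency-free task is ready; once it is done, the two tasks that depend only on it form the next group and could be dispatched in parallel. The sketch does not model tasks parked as blockers after the fix-cycle limit.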
+
+	### Worker prompt template
+
+	Managers compose worker prompts using this structure:
+
+	```
+	## Task: {task ID and title}
+
+	## Context
+	{What exists. What you're building on. Relevant architecture decisions.
+	Distill from the PRD — the worker does not read the PRD.}
+
+	## Requirements
+	{Specific deliverables. Be precise about interfaces, file locations,
+	function signatures, config formats. The worker cannot ask questions.}
+
+	## Tests First (TDD)
+	{What tests to write before implementation. Be specific about test
+	cases, fixtures, and assertions. Reference the testing layers below.}
+
+	## Acceptance Criteria
+	{How the reviewer will judge this. Testable, binary conditions.}
+
+	## Target
+	{Repo, branch name, key files to create or modify.}
+
+	## Dependencies
+	{What's already been built that the worker can use. Paths to existing
+	code, installed packages, available test fixtures.}
+
+	## Constraints
+	{Things NOT to do. Boundaries of the task. What's out of scope.}
+	```
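
One way a manager-side helper might fill in the template above is sketched here. The `WorkerTask` dataclass, the `render_worker_prompt` function, and the sample task text are hypothetical; only the section headings and their order come from the template. Because the worker cannot ask questions, this sketch refuses to render a prompt with an empty section, which is an assumption about desirable behavior rather than a stated rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerTask:
    """Hypothetical container for the sections of a worker prompt."""

    task: str
    context: str
    requirements: str
    tests_first: str
    acceptance_criteria: str
    target: str
    dependencies: str
    constraints: str


def render_worker_prompt(t: WorkerTask) -> str:
    """Render the sections in the order the template lists them."""
    sections = [
        ("Context", t.context),
        ("Requirements", t.requirements),
        ("Tests First (TDD)", t.tests_first),
        ("Acceptance Criteria", t.acceptance_criteria),
        ("Target", t.target),
        ("Dependencies", t.dependencies),
        ("Constraints", t.constraints),
    ]
    # Reject empty sections: the worker has no way to ask for missing context.
    for heading, body in sections:
        if not body.strip():
            raise ValueError(f"section {heading!r} is empty")
    parts = [f"## Task: {t.task}"]
    parts += [f"## {heading}\n{body.strip()}" for heading, body in sections]
    return "\n\n".join(parts) + "\n"


# Illustrative values only; real prompts are distilled from the PRD.
example = WorkerTask(
    task="P0-1 Pulumi scaffold",
    context="Empty repo; nothing is deployed yet.",
    requirements="Create infra/__main__.py composing the VPC component.",
    tests_first="Write set_mocks() unit tests for the VPC component.",
    acceptance_criteria="All unit tests pass; ruff reports no errors.",
    target="Repo wikibot-io, branch feat/P0-1-pulumi-scaffold.",
    dependencies="None.",
    constraints="Do not create Lambda or EFS resources.",
)

print(render_worker_prompt(example))
```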
+
+	### Key rules
+
+	- Workers never dispatch sub-workers. Only managers dispatch. Flat hierarchy.
+	- Workers cannot read the PRD. Managers distill all needed context into the prompt.
+	- Workers cannot modify files outside their task scope. If they discover a needed change elsewhere, they report it and the manager creates a new task.
+	- Reviewers do not fix code. They identify issues and return specific feedback. The manager dispatches a fix worker.
+
+	---
+
+	## TDD Workflow
+
+	### Red-Green-Refactor for agent dispatch
+
+	The manager dispatches work in stages:
+
+	1. Red (write failing tests):
+	Worker writes tests based on acceptance criteria. Tests must fail (no implementation yet). Reviewer confirms tests are correct, sufficient, and actually fail.
+
+	2. Green (minimal implementation):
+	Worker (resumed or new, same worktree) writes the minimal code to pass the tests. Reviewer confirms all tests pass and implementation is sound.
+
+	3. Refactor (cleanup if needed):
+	If the reviewer flags quality issues during the Green review, the manager dispatches a cleanup worker.
+
+	Practical concession: For small, well-defined tasks (utility functions, config parsing, Pulumi resource definitions), the manager can dispatch Red+Green as a single prompt: "Write tests first, then implement. Commit tests and implementation separately." The reviewer still checks both. Use the staged dispatch for anything with ambiguity in requirements or interface design.
+
+	### Testing layers
+
+	| Layer | Tools | Runs where | Speed | What it tests |
+	|-------|-------|-----------|-------|---------------|
+	| Unit | pytest, moto, `/tmp` | Worker environment | Seconds | Business logic, data transforms, auth checks |
+	| Integration | pytest + real AWS | Worker environment with AWS creds | Seconds–minutes | Lambda deployment, EFS mount, API Gateway routing, DynamoDB queries |
+	| E2E | Phase gate scripts | Human or manager at phase boundary | Minutes | Full user journeys |
+
+	### What gets mocked at unit level
+
+	| AWS Service | Mock Tool | Notes |
+	|-------------|-----------|-------|
+	| DynamoDB | `moto` | Full API coverage: queries, GSIs, conditions |
+	| EFS filesystem | `/tmp` or `tempfile` | Just regular file I/O once mounted |
+	| HTTP clients (Otterwiki API, WorkOS) | `respx` or `pytest-httpx` | For MCP server and auth middleware |
+	| Bedrock | Stub returning fixed-dimension vectors | Only needed in Phase 5+ (premium) |
+	| S3 | `moto` | For static hosting tests |
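
The EFS row in the table above can be illustrated with a short unit test: since a mounted EFS filesystem behaves like ordinary file I/O, a temporary directory can stand in for the mount point. The `save_page` function and its one-file-per-page layout are hypothetical, invented only to give the test something to exercise. The test uses the standard library's `tempfile` so that it also runs outside pytest.

```python
import tempfile
from pathlib import Path


def save_page(wiki_root: Path, name: str, content: str) -> Path:
    """Hypothetical function under test: write one page below wiki_root."""
    path = wiki_root / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


def test_save_page_nested_name_creates_parent_directories() -> None:
    # A temporary directory stands in for the EFS mount point.
    with tempfile.TemporaryDirectory() as tmp:
        wiki_root = Path(tmp)
        path = save_page(wiki_root, "Dev/Phase 0 Status", "# Status\n")
        assert path == wiki_root / "Dev" / "Phase 0 Status.md"
        assert path.read_text(encoding="utf-8") == "# Status\n"


# pytest would collect the test by name; calling it directly also works.
test_save_page_nested_name_creates_parent_directories()
```

Behavior that depends on the real mount (latency, locking, git operations on EFS) is not covered by a stand-in like this and belongs to the integration layer.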
+
+	### What must hit real AWS
+
+	- Lambda deployment and invocation
+	- EFS mount behavior and git operations on EFS
+	- VPC networking and endpoint routing
+	- API Gateway routing, custom domains, auth integration
+	- Cold start measurements
+	- CloudFront distribution behavior
+
+	### Test file conventions
+
+	- Test files mirror source files: `app/auth/middleware.py` → `tests/auth/test_middleware.py`
+	- Fixtures in `tests/conftest.py` (shared) or `tests/{module}/conftest.py` (module-specific)
+	- Integration tests marked with `@pytest.mark.integration` — skipped without AWS creds
+	- Use `moto` decorators (`@mock_aws`) for DynamoDB tests
+	- Each test function tests one behavior. Name format: `test_{action}_{condition}_{expected_result}`
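
The mirroring rule in the first bullet above can be expressed as a small path mapping. This is a sketch under one assumption: that `app` is the source root, which is inferred from the single example given. The helper name `expected_test_path` is hypothetical.

```python
from pathlib import PurePosixPath


def expected_test_path(source: str, source_root: str = "app") -> str:
    """Map a source file to the test file that mirrors it."""
    # relative_to raises ValueError if the file is outside the source root.
    relative = PurePosixPath(source).relative_to(source_root)
    mirrored = relative.with_name(f"test_{relative.name}")
    return str(PurePosixPath("tests") / mirrored)


print(expected_test_path("app/auth/middleware.py"))
```

A reviewer-side check could use such a mapping to flag source files that lack a mirrored test file.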
+
+	---
+
+	## Pulumi Patterns
+
+	### Project structure
+
+	```
+	infra/
+	  __main__.py              # Top-level composition
+	  config/
+	    dev.yaml               # Dev stack config (Pulumi.<stack>.yaml)
+	    prod.yaml              # Prod stack config
+	  components/
+	    vpc.py                 # VPC, subnets, security groups, endpoints
+	    efs.py                 # EFS filesystem, mount targets, access points
+	    lambda_functions.py    # Lambda functions, layers, permissions
+	    api_gateway.py         # API Gateway, routes, integrations
+	    dynamodb.py            # Tables, GSIs, PITR config
+	    dns.py                 # Route 53, ACM certificates
+	    monitoring.py          # CloudWatch alarms, dashboards
+	    frontend.py            # S3 bucket, CloudFront distribution
+	```
+
+	### Conventions
+
+	- Each component file exports a class or function that creates related resources
+	- Resources are tagged with `project: "wikibot-io"`, `environment: "{stack}"`, `phase: "P{n}"`
+	- Use `pulumi.Config()` for stack-specific values (domain names, feature flags)
+	- Secrets via `pulumi config set --secret` (dev); Secrets Manager (prod, Phase 4+)
+	- Outputs for cross-component references (e.g., VPC ID, EFS filesystem ID)
+
+	### Testing Pulumi code
+
+	Unit-test Pulumi code using `pulumi.runtime.set_mocks()`:
+	- Verify resource creation and configuration
+	- Check IAM policy documents
+	- Validate security group rules
+	- These tests are fast and need no AWS access
+
+	Integration testing is `pulumi up` against the dev stack, followed by verification scripts.
+
+	---
+
+	## Cross-Repo Coordination
+
+	### Repositories and ownership
+
+	| Repo | Changes | Branch Strategy |
+	|------|---------|-----------------|
+	| `otterwiki` (fork) | Mangum adapter, PROXY_HEADER enhancements | PRs to `main`; admin panel hiding on `wikibot/prod` |
+	| `otterwiki-api` | Lambda compatibility if needed | PRs to `main` |
+	| `otterwiki-semantic-search` | FAISS backend (alongside existing ChromaDB) | PRs to `main` |
+	| `otterwiki-mcp` | Streamable HTTP transport | PRs to `main` |
+	| `wikibot-io` (new, private) | Pulumi IaC, management API, auth middleware, frontend, CLI | Feature branches |
+
+	### Branch naming
+
+	Feature branches: `feat/P{phase}-{task-id}-{short-description}`
+
+	Examples:
+	- `feat/P0-1-pulumi-scaffold`
+	- `feat/P1-1-mangum-adapter`
+	- `feat/P2-4-management-api`
+
+	### Dependency protocol
+
+	When an upstream change (e.g., Mangum adapter in `otterwiki`) is a dependency for `wikibot-io` work:
+
+	1. The upstream PR must pass the Rule of Two (worker implements, reviewer approves)
+	2. The upstream PR must be merged
+	3. Only then can the downstream task that depends on it begin
+	4. The manager tracks this in the wiki status page
+
+	### Upstream contribution guidelines
+
+	Changes to the open source repos should:
+	- Pass the existing test suites
+	- Add tests for new functionality
+	- Not break existing interfaces
+	- Follow existing code style and patterns
+	- Be self-contained (no wikibot.io-specific logic)
+
+	---
+
+	## Wiki-Based Task Tracking
+
+	### Dev/ namespace
+
+	All development tracking lives in the wiki under the `Dev/` prefix. The existing Third Gulf War wiki serves as both the development tracker and the dogfood test.
+
+	### Page conventions
+
+	`Dev/Phase N Status` — Living document updated each session. Contains:
+	- Current state (which tasks are complete, in progress, blocked)
+	- Active decisions or blockers
+	- Next parallelism group to execute
+
+	`Dev/Phase N Summary` — Written once when the phase completes. Contains:
+	- What was implemented (with repo/branch references)
+	- Decisions made and rationale
+	- Deviations from the task graph
+	- New tasks discovered (filed as additions to the task graph or future phase work)
+	- Lessons learned
+
+	`Dev/Decision Log` — Append-only log of significant architectural or implementation decisions. Each entry: date, decision, context, alternatives considered, rationale.
+
+	### Session start protocol (for managers)
+
+	1. `read_note("Dev/Phase N Status")` — resume state
+	2. Check the task graph document for the current parallelism group
+	3. `get_recent_changes(limit=10)` — see if the human made any changes since last session
+	4. Proceed with dispatch
+
+	### Documentation loop
+
+	After each completed work unit, the manager writes:
+	1. What was implemented (specific files, functions, resources)
+	2. What decisions were made (and why)
+	3. Any deviations from the task description
+	4. New tasks that emerged
+	5. Test results (pass/fail, coverage)
+
+	This creates an audit trail that spans sessions and enables any manager (or the human) to pick up where the previous session left off.
+
+	---
+
+	## Code Style and Conventions
+
+	### Python
+
+	- Python 3.12+ (Lambda runtime)
+	- Type hints on all function signatures
+	- `pyproject.toml` for project metadata and dependencies
+	- `ruff` for linting and formatting
+	- No docstrings on obvious functions; docstrings on public APIs and non-obvious logic
+
+	### IaC
+
+	- Pulumi Python SDK
+	- Resource names: `{project}-{environment}-{resource}` (e.g., `wikibot-dev-efs`)
+	- All resources tagged consistently
+	- No hardcoded ARNs or account IDs — use `pulumi.Config()` or data sources
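
The naming and tagging rules can be captured in two small helpers, sketched here without any Pulumi imports so they stay trivially unit-testable. The function names are hypothetical. Note one detail taken from the examples in this document: the name prefix in `wikibot-dev-efs` is `wikibot`, while the `project` tag value is `wikibot-io`; the sketch keeps them as separate constants on that basis.

```python
NAME_PREFIX = "wikibot"     # from the example name `wikibot-dev-efs`
PROJECT_TAG = "wikibot-io"  # from the tag convention `project: "wikibot-io"`


def resource_name(environment: str, resource: str) -> str:
    """Build a name in the {project}-{environment}-{resource} form."""
    return f"{NAME_PREFIX}-{environment}-{resource}"


def standard_tags(environment: str, phase: int) -> dict[str, str]:
    """Build the three tags every resource is expected to carry."""
    return {
        "project": PROJECT_TAG,
        "environment": environment,
        "phase": f"P{phase}",
    }


print(resource_name("dev", "efs"))
print(standard_tags("dev", 0))
```

In a component file, the returned dict could be passed as the `tags` argument of a resource, with the environment taken from the current stack name.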
+
+	### Frontend
+
+	- Framework TBD (P3-1 decision — React or Svelte, manager decides based on team capabilities)
+	- TypeScript
+	- Vite for builds
+
+	### Commit messages
+
+	Format: `[P{phase}] {action}: {description}`
+
+	Examples:
+	```
+	[P0] Add: Pulumi scaffold with VPC and gateway endpoints
+	[P1] Add: Mangum adapter for Flask-on-Lambda
+	[P2] Fix: ACL check missing on wiki delete endpoint
+	[P2] Refactor: Extract auth middleware into shared module
+	```
+
+	For upstream repos, follow each repo's existing commit message conventions.
\	No newline at end of file
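
The commit message format in the spec's final section lends itself to an automated check, for example in a reviewer step. The pattern below is a sketch: the spec does not list the allowed actions, so it accepts any single capitalized word (an assumption based on `Add`, `Fix`, and `Refactor` in the examples). The function name is hypothetical, and the check applies to the subject line only.

```python
import re

# Assumed shape: [P<phase>] <Action>: <description>
COMMIT_RE = re.compile(
    r"\[P(?P<phase>\d+)\] (?P<action>[A-Z][a-z]+): (?P<description>\S.*)"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the subject line follows the wikibot-io format."""
    return COMMIT_RE.fullmatch(subject) is not None


print(is_valid_commit_subject("[P2] Fix: ACL check missing on wiki delete endpoint"))
print(is_valid_commit_subject("Fix ACL check on wiki delete endpoint"))
```

Commits to upstream repos follow those repos' own conventions, so a check like this would apply only to `wikibot-io`.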
