Commit 70bda8
2026-03-13 01:47:16 Claude (Dev): [mcp] Port phase gates to wiki
| /dev/null .. design/phase gates.md | |
| @@ -0,0 +1,265 @@ | |
| + | # Wikibot.io Phase Gates — Phases 0–4 |
| + | |
| + | This document defines exit criteria and validation procedures for each phase boundary. The human reviews these before giving the go-ahead to proceed. |
| + | |
| + | --- |
| + | |
| + | ## How phase gates work |
| + | |
| + | 1. Phase manager completes all tasks in the phase |
| + | 2. Manager writes `Dev/Phase N Summary` to the wiki with results |
| + | 3. Human reviews the summary and this document's exit criteria |
| + | 4. Human performs the validation steps below |
| + | 5. Go/no-go decision — if go, the next phase's manager can start |
| + | |
| + | --- |
| + | |
| + | ## Phase 0 Gate: Proof of Concept |
| + | |
| + | ### Exit criteria |
| + | |
| + | | Criterion | Target | How to verify | |
| + | |-----------|--------|---------------| |
| + | | EFS warm read latency | < 500ms | Benchmark results in `Dev/Phase 0 — EFS Benchmarks` | |
| + | | EFS warm write latency | < 1s | Benchmark results | |
| + | | Lambda cold start (VPC + EFS) | < 5s | Benchmark results | |
| + | | Concurrent reads | No errors | Benchmark results (3+ simultaneous) | |
| + | | Concurrent writes | Serialized correctly | Benchmark results (5 simultaneous, git locking) | |
| + | | MCP OAuth via WorkOS | Working | Claude.ai connects and calls echo tool | |
| + | | Git library decision | Documented | Decision in wiki or PR description | |
| + | | Apple provider sub | Status known | Documented (verified or flagged as unavailable) | |
| + | | Billing alarm | Active | Check AWS Budgets console | |
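
The "serialized correctly" criterion for concurrent writes comes down to lock discipline: five simultaneous writers must queue rather than clobber each other. Here is a standalone sketch of the behavior being benchmarked, using `flock` as a stand-in for git's `index.lock` (the real benchmark exercises git itself):

```
tmp=$(mktemp -d)
# five writers race; the exclusive lock forces them to append one at a time
for i in 1 2 3 4 5; do
  ( flock -x 9; echo "writer $i" >> "$tmp/log" ) 9>"$tmp/index.lock" &
done
wait
wc -l < "$tmp/log"   # 5 -- every write landed, none lost
```

If the benchmark's five concurrent writes similarly produce five commits with no lock errors, this criterion passes.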
| + | |
| + | ### Validation steps |
| + | |
| + | 1. **Review benchmarks.** Read `Dev/Phase 0 — EFS Benchmarks` wiki note. Are the numbers acceptable? Do they leave headroom for the full Otterwiki app (which will be heavier than the PoC)? |
| + | |
| + | 2. **Test MCP yourself.** Open Claude.ai, connect to the PoC MCP endpoint. Call the echo tool. Verify the OAuth flow is smooth enough for end users. |
| + | |
| + | 3. **Check costs.** Review the AWS bill or Cost Explorer. Is the dev stack cost in line with expectations (~$0.50/mo baseline)? |
| + | |
| + | 4. **Review decisions.** Read the git library decision (gitpython vs. dulwich). Does the rationale make sense? |
| + | |
| + | 5. **Check the Pulumi state.** Run `pulumi stack` to confirm all resources are tracked, and `pulumi preview` to confirm there is no drift between the code and the deployed dev stack. |
| + | |
| + | ### Known risks to evaluate |
| + | |
| + | - **VPC cold starts:** If > 5s, consider Provisioned Concurrency (~$10-15/mo for 1 warm instance). Is the cost acceptable? |
| + | - **EFS latency variance:** NFS latency can spike under load. Are the P95 numbers acceptable, not just averages? |
| + | - **WorkOS quirks:** Any unexpected behavior in the OAuth flow? Token lifetimes? Refresh behavior? |
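
As the EFS latency bullet above notes, P95 matters more than the mean. If the benchmark kept raw per-request samples, a nearest-rank P95 is a one-liner; the inline values below stand in for the real sample file:

```
# nearest-rank P95 of latency samples (ms), one value per line
printf '%s\n' 120 95 110 300 105 100 98 102 97 99 |
  sort -n | awk '{a[NR]=$1} END {i=int(NR*0.95); if (i<1) i=1; print a[i]}'
# prints 120
```

Compare the P95 (not the average) against the 500ms read and 1s write targets.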
| + | |
| + | ### Go/no-go decision |
| + | |
| + | - **Go** if all targets met and no surprising cost or latency issues |
| + | - **No-go** if EFS latency is unacceptable → evaluate Fly.io fallback (PRD section: Alternatives Considered) |
| + | - **No-go** if MCP OAuth doesn't work → investigate alternative auth providers or debug WorkOS integration |
| + | |
| + | --- |
| + | |
| + | ## Phase 1 Gate: Single-User Serverless Wiki |
| + | |
| + | ### Exit criteria |
| + | |
| + | | Criterion | Target | How to verify | |
| + | |-----------|--------|---------------| |
| + | | Web UI works | Pages load, edit, save | Browse `dev.wikibot.io` | |
| + | | REST API works | All endpoints respond correctly | Run integration tests | |
| + | | MCP works | All 12 tools functional | Connect Claude.ai or Claude Code, exercise tools | |
| + | | Semantic search works | Returns relevant results | Search for a concept, verify results | |
| + | | Git history | Correct authorship per write path | Check git log on EFS | |
| + | | Routing + TLS | All endpoints on custom domain with valid cert | Browser + curl | |
| + | | Architecture decision | Same vs. separate Lambda for MCP | Documented with rationale | |
| + | |
| + | ### Validation steps |
| + | |
| + | 1. **Browse the wiki.** Go to `dev.wikibot.io`. Create a page with WikiLinks. Verify the web UI is responsive and functional. |
| + | |
| + | 2. **Test the API.** Run the integration test suite or manually curl a few endpoints: |
| + | ``` |
| + | curl -H "Authorization: Bearer $KEY" "https://dev.wikibot.io/api/v1/pages" |
| + | curl -H "Authorization: Bearer $KEY" "https://dev.wikibot.io/api/v1/search?q=test" |
| + | ``` |
| + | |
| + | 3. **Test MCP.** Connect Claude.ai to `dev.wikibot.io/mcp`. Exercise read_note, write_note, search_notes, semantic_search. Verify results are correct. |
| + | |
| + | 4. **Check semantic search quality.** Write a few test pages, then search for concepts using different wording. Are the results relevant? |
| + | |
| + | 5. **Check git authorship.** Pages created via web UI should show one author; pages via API/MCP should show the configured API author. Verify in git log. |
| + | |
| + | 6. **Performance sanity check.** Is the web UI snappy enough? Do API calls return in < 1s? Is MCP responsive? |
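
Step 5's authorship check can be rehearsed in a throwaway repo before inspecting the real one on EFS. The author names here are placeholders for whatever the web UI and API write paths are actually configured to use:

```
repo=$(mktemp -d) && git init -q "$repo" && cd "$repo"
# one page written "via web UI", one "via API", with different authors
echo hello > web-page.md
git add web-page.md
git -c user.name="Alice" -c user.email="alice@example.com" commit -q -m "web edit"
echo hello > api-page.md
git add api-page.md
git -c user.name="Wikibot API" -c user.email="api@wikibot.io" commit -q -m "api edit"
git log --pretty='%an' -- api-page.md   # Wikibot API
git log --pretty='%an' -- web-page.md   # Alice
```

On the real system, the same `git log --pretty='%an'` invocation run against the wiki repo on EFS should show the expected author per path.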
| + | |
| + | ### Known risks to evaluate |
| + | |
| + | - **Mangum compatibility:** Any Flask features that don't work under Mangum? (Sessions, file uploads, streaming responses) |
| + | - **FAISS index persistence:** Does the index survive Lambda recycling? Is it loaded fast enough on cold start? |
| + | - **Lambda package size:** Is the deployment package within Lambda's limits (50MB zipped for direct upload, 250MB unzipped)? If not, container images (10GB limit) will be needed. |
| + | |
| + | ### Go/no-go decision |
| + | |
| + | - **Go** if all features work and performance is acceptable |
| + | - **Partial go** if semantic search has issues → defer to Phase 5 (it's a premium feature anyway) |
| + | - **No-go** if core wiki functionality is broken or too slow |
| + | |
| + | --- |
| + | |
| + | ## Phase 2 Gate: Multi-Tenancy and Auth |
| + | |
| + | ### Exit criteria |
| + | |
| + | | Criterion | Target | How to verify | |
| + | |-----------|--------|---------------| |
| + | | Multi-user auth | Two users can log in independently | Test with two accounts | |
| + | | Wiki isolation | User A cannot access User B's private wiki | ACL enforcement test | |
| + | | Management API | All endpoints work | Integration tests | |
| + | | ACL enforcement | All roles enforced correctly | E2E test (P2-10) | |
| + | | Public wikis | Anonymous read access works | Test without auth | |
| + | | CLI tool | All commands work | Run each command | |
| + | | Bootstrap template | New wikis initialized correctly | Create wiki, inspect pages | |
| + | | Admin panel hiding | Disabled sections hidden and return 404 | Browse admin as owner | |
| + | | PROXY_HEADER auth | All permission levels work | Test each role | |
| + | | Username handling | Validation, uniqueness, reserved names | Attempt invalid usernames | |
| + | |
| + | ### Validation steps |
| + | |
| + | 1. **Create two test accounts.** Sign up as two different users (two Google accounts or Google + GitHub). |
| + | |
| + | 2. **Test isolation.** User A creates a private wiki. Log in as User B. Verify User B cannot see, list, or access User A's wiki via web UI, API, or MCP. |
| + | |
| + | 3. **Test ACLs.** User A grants User B editor access. Verify User B can now read and write. User A revokes. Verify 403. |
| + | |
| + | 4. **Test public wiki.** User A makes a wiki public. Open in incognito (no auth). Verify read-only access. |
| + | |
| + | 5. **Test the CLI.** Run through the full CLI workflow: |
| + | ``` |
| + | wiki create my-wiki "My Wiki" |
| + | wiki list |
| + | wiki token my-wiki |
| + | wiki grant my-wiki friend@example.com editor |
| + | wiki revoke my-wiki friend@example.com |
| + | wiki delete my-wiki |
| + | ``` |
| + | |
| + | 6. **Inspect bootstrap template.** Create a new wiki, then read the Home page and Wiki Usage Guide. Are they clear and complete? |
| + | |
| + | 7. **Test admin panel.** Log in as wiki owner, go to admin panel. Verify only Application Preferences, Sidebar Preferences, and Content and Editing are visible. Verify disabled routes return 404. |
| + | |
| + | 8. **Test tier limits.** As a free user, try to create a second wiki. Try to add a 4th collaborator. Verify clear error messages. |
| + | |
| + | ### Known risks to evaluate |
| + | |
| + | - **DynamoDB latency:** ACL checks add a DynamoDB read to every request. Is the added latency acceptable? Should we cache in Lambda memory? |
| + | - **WorkOS token lifetimes:** How long do MCP OAuth tokens last before refresh? Does Claude.ai handle refresh correctly? |
| + | - **Username squatting:** Not a launch concern, but are the reserved name checks in place? |
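
The reserved-name checks (and the username validation criterion from the exit table) can be smoke-tested with a helper like the one below. The character rules and reserved list here are illustrative guesses, not the product's actual policy:

```
# hypothetical policy: 3-30 chars, lowercase alphanumerics and inner hyphens, not reserved
is_valid_username() {
  case "$1" in
    admin|api|www|mcp|help|support) return 1 ;;   # illustrative reserved list
  esac
  printf '%s' "$1" | grep -Eq '^[a-z0-9][a-z0-9-]{1,28}[a-z0-9]$'
}
is_valid_username alice    && echo "alice: ok"
is_valid_username admin    || echo "admin: reserved"
is_valid_username Bad_Name || echo "Bad_Name: rejected"
```

Whatever the real rules are, each reserved name and each malformed candidate should be rejected at signup time, not only by the CLI.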
| + | |
| + | ### Go/no-go decision |
| + | |
| + | - **Go** if multi-tenancy works correctly and securely |
| + | - **No-go** if tenant isolation has any gaps — this is a security requirement, not a feature |
| + | |
| + | --- |
| + | |
| + | ## Phase 3 Gate: Frontend |
| + | |
| + | ### Exit criteria |
| + | |
| + | | Criterion | Target | How to verify | |
| + | |-----------|--------|---------------| |
| + | | SPA loads | Dashboard accessible after login | Browser test | |
| + | | Auth flow | Login, logout, token refresh | Test each flow | |
| + | | Wiki CRUD | Create, view, delete wiki from UI | Browser test | |
| + | | Collaborator management | Invite, change role, revoke from UI | Browser test | |
| + | | MCP instructions | Correct, copyable command | Verify command works | |
| + | | Public wiki toggle | Works from settings UI | Toggle and verify | |
| + | | Static hosting | SPA served via CloudFront | Check response headers | |
| + | | Mobile responsive | Usable on phone | Browser dev tools or real device | |
| + | |
| + | ### Validation steps |
| + | |
| + | 1. **Full user journey.** In a fresh incognito window: |
| + | - Visit `wikibot.io` → see landing/login |
| + | - Log in with Google → land on dashboard |
| + | - Create a wiki → see token and instructions |
| + | - Copy `claude mcp add` command → paste in terminal → verify MCP connects |
| + | - Go to wiki settings → invite a collaborator, toggle public |
| + | - Log out → verify redirected to login |
| + | |
| + | 2. **Test on mobile.** Open `wikibot.io` on a phone. Is the dashboard usable? Can you create a wiki? |
| + | |
| + | 3. **Check error states.** What happens when the API is down? When you enter an invalid wiki slug? When you try to create a wiki and hit the tier limit? |
| + | |
| + | 4. **Performance.** How fast does the SPA load? Is the bundle size reasonable (< 500KB gzipped)? |
| + | |
| + | ### Known risks to evaluate |
| + | |
| + | - **Framework choice:** Does the chosen framework (React or Svelte) feel right? Any regrets? |
| + | - **Auth UX:** Is the login flow smooth? Any confusing redirects or error messages? |
| + | - **MCP instructions clarity:** Would a new user understand how to connect? Test with fresh eyes. |
| + | |
| + | ### Go/no-go decision |
| + | |
| + | - **Go** if the user journey works end-to-end and the UX is acceptable |
| + | - **Go with issues** if cosmetic issues remain — they can be fixed post-launch |
| + | - **No-go** if auth flow or core wiki management is broken |
| + | |
| + | --- |
| + | |
| + | ## Phase 4 Gate: Launch Readiness |
| + | |
| + | ### Exit criteria |
| + | |
| + | | Criterion | Target | How to verify | |
| + | |-----------|--------|---------------| |
| + | | Git clone | `git clone` works for authorized users | Test from command line | |
| + | | Git auth | Bearer token authentication works | Clone with token | |
| + | | WAF active | Rate limiting and OWASP rules | Check WAF console, test rate limit | |
| + | | Monitoring | Dashboard shows traffic, alarms configured | CloudWatch console | |
| + | | Backups | EFS backup running, DynamoDB PITR active | AWS Backup console, DynamoDB settings | |
| + | | Backup restore | Tested at least once | Restore test documented | |
| + | | Landing page | Loads, explains product, has CTA | Browser test | |
| + | | Docs | Getting started, MCP setup documented | Read through docs | |
| + | |
| + | ### Validation steps |
| + | |
| + | 1. **Test git clone.** Create a wiki, write a few pages via MCP, then: |
| + | ``` |
| + | git clone https://token:<bearer>@<user>.wikibot.io/<wiki>.git |
| + | ``` |
| + | Verify the repo contains the expected pages. |
| + | |
| + | 2. **Test rate limiting.** Hit an endpoint rapidly (> 100 requests/minute). Verify WAF blocks with 429 or 403. Verify normal usage is not affected. |
| + | |
| + | 3. **Review monitoring.** Look at the CloudWatch dashboard. Does it show the traffic from your testing? Are all panels populated? |
| + | |
| + | 4. **Test backup restore.** Restore an EFS backup snapshot. Verify the git repo is intact. This can be a throwaway test — restore to a new filesystem, mount it, inspect, delete. |
| + | |
| + | 5. **Review landing page.** Read it as if you've never heard of wikibot.io. Does it explain the product clearly? Does the CTA lead to signup? |
| + | |
| + | 6. **Read the docs.** Follow the "Getting Started" guide from scratch. Does it work? |
| + | |
| + | 7. **Security review.** Check: |
| + | - No sensitive data in public repos or frontend bundle |
| + | - API keys rotated from development values |
| + | - WAF rules active |
| + |    - No security groups open to the world (no unintended 0.0.0.0/0 ingress) |
| + | - All endpoints require auth (except public wikis and landing page) |
| + | |
| + | 8. **Cost review.** Check AWS bill. Is the total in line with projections? Any unexpected charges? |
| + | |
| + | ### Final checklist before launch |
| + | |
| + | - [ ] All Phase 0–4 exit criteria met |
| + | - [ ] No critical bugs in the issue tracker |
| + | - [ ] Backup restore tested successfully |
| + | - [ ] Monitoring alarms tested (at least one alarm fired and notified) |
| + | - [ ] Landing page and docs reviewed |
| + | - [ ] DNS configured for production domain (`wikibot.io`) |
| + | - [ ] Production Pulumi stack deployed (separate from dev) |
| + | - [ ] Production secrets rotated from dev values |
| + | - [ ] WAF active on production |
| + | - [ ] WorkOS configured for production domain |
| + | |
| + | ### Go/no-go decision |
| + | |
| + | - **Go** if all checklist items pass |
| + | - **Soft launch** if minor issues remain — launch to a small group, fix in production |
| + | - **No-go** if security issues, data loss risks, or fundamental UX problems remain |
| \ | No newline at end of file |