---
category: design
tags: [testing, playwright, e2e, infrastructure]
last_updated: 2026-03-19
confidence: high
---

# E2E Testing

End-to-end testing for robot.wtf using Playwright and a mock ATProto PDS.

## Current State

**Branch**: `feat/e2e-tests` on `robot.wtf` repo (4 commits)

**4 passing tests** in `tests/e2e/test_login_flow.py`:
1. Login page loads
2. Client metadata endpoint serves valid ATProto OAuth metadata
3. Full OAuth login flow (mock PDS → PAR → consent → callback → cookie)
4. Logout clears cookie

**Infrastructure built**:
- `tests/e2e/mock_pds.py` — In-process mock ATProto PDS (OAuth endpoints, account creation, DID docs)
- `tests/e2e/conftest.py` — Fixtures for PDS (mock or Docker), test account, key generation, auth server in background thread, Playwright browser context
- `tests/e2e/docker-compose.yml` — Real PDS for CI environments with Docker
- `.github/workflows/e2e.yml` — CI workflow
- Production code changes gated by the `ALLOW_HTTP_PDS` env var (requires `FLASK_ENV=testing`; raises `RuntimeError` otherwise)

**Partially written test files** (on branch, need fixtures):
- `tests/e2e/test_wiki_lifecycle.py` — 4 tests written by Agent B
- `tests/e2e/test_account.py` — 4 tests written by Agent C (MCP consent test marked skip)

## Blocked On

Server consolidation (see [[Design/Server_Consolidation]]). The auth_server and api_server are separate Flask apps on different ports. This causes:
- Cookie cross-port sharing failures
- SQLite threading issues with two in-process Flask servers
- An implementation agent exhausted its entire context window trying to work around these issues

Once auth and api are merged into a single Flask app, the E2E fixtures need only one server and no cross-port cookie handling.

## Plan After Consolidation

### Step 1: Simplify conftest.py

The `auth_server` fixture currently starts only the auth Flask app. Post-consolidation, it starts the single platform app, which serves both `/auth/*` and `/app/*` routes. Rename it to `platform_server` or just `server`.

No `management_server` fixture is needed: it is the same app.

No cross-port cookie injection is needed: all routes share one origin.
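
A minimal sketch of what the single-server fixture could look like, assuming the consolidated app is an ordinary WSGI callable (a Flask app is one). It uses only the standard library's `wsgiref`; `run_server`, `demo_app`, and `fetch` are hypothetical names for illustration, not code from the branch.

```python
import threading
from contextlib import contextmanager
from urllib.request import ProxyHandler, build_opener
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Suppress per-request logging so test output stays readable."""

    def log_message(self, format, *args):
        pass


def demo_app(environ, start_response):
    """Stand-in for the consolidated app: one WSGI callable, two route prefixes."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/auth/") or path.startswith("/app/"):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"handled {path}".encode()]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


@contextmanager
def run_server(wsgi_app, host="127.0.0.1"):
    """Serve wsgi_app on an ephemeral port in a daemon thread; yield its base URL."""
    httpd = make_server(host, 0, wsgi_app, handler_class=QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://{host}:{httpd.server_port}"
    finally:
        httpd.shutdown()
        httpd.server_close()
        thread.join(timeout=5)


def fetch(url):
    """GET a URL, bypassing any proxy configured in the environment."""
    opener = build_opener(ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.read()


# In conftest.py this could be wrapped as a fixture (assumes pytest and an `app` fixture):
#
# @pytest.fixture(scope="session")
# def server(app):
#     with run_server(app) as base_url:
#         yield base_url
```

Because both route prefixes are served from one base URL, a cookie set during login under `/auth/` is sent on later `/app/` requests without any injection step.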

### Step 2: Add new fixtures

**`authenticated_page`** (function-scoped):
- Logs in via the mock PDS OAuth flow
- Returns a Playwright page with a valid `platform_token` cookie
- Cookie works for all routes (same origin)

**`wiki_fixture`** (function-scoped):
- Creates a wiki directly in the DB and filesystem (bypasses the creation route to avoid tier limits)
- Calls `WikiModel.create()` + `_init_wiki_repo()` + `_init_wiki_db()`
- Cleans up the wiki dir and DB row after the test

**`destructive_page`** (function-scoped):
- Separate browser context for tests that destroy state (account deletion, wiki deletion)
- Prevents cookie/state pollution of other tests
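
The create-then-clean-up shape of `wiki_fixture` can be sketched as a context manager. This is an illustration under stated assumptions: the `wikis` table, its columns, and `temporary_wiki` are hypothetical stand-ins, and the real fixture would call `WikiModel.create()`, `_init_wiki_repo()`, and `_init_wiki_db()` where the sketch uses raw SQL and `mkdir`.

```python
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_wiki(db, wikis_root, slug, owner_did):
    """Create a wiki row and directory, then remove both on exit.

    Hypothetical stand-in for WikiModel.create() + _init_wiki_repo() +
    _init_wiki_db(); the table and column names are assumptions.
    """
    wiki_dir = Path(wikis_root) / slug
    db.execute(
        "INSERT INTO wikis (slug, owner_did) VALUES (?, ?)", (slug, owner_did)
    )
    db.commit()
    wiki_dir.mkdir(parents=True)
    try:
        yield wiki_dir
    finally:
        # Teardown runs even if the test body raises.
        shutil.rmtree(wiki_dir, ignore_errors=True)
        db.execute("DELETE FROM wikis WHERE slug = ?", (slug,))
        db.commit()


def count_wikis(db):
    """Return the number of rows in the (assumed) wikis table."""
    return db.execute("SELECT COUNT(*) FROM wikis").fetchone()[0]


# As a pytest fixture this might be wrapped like so (assumes `db` and
# `wikis_root` fixtures exist):
#
# @pytest.fixture
# def wiki_fixture(db, wikis_root):
#     with temporary_wiki(db, wikis_root, "e2e-wiki", "did:plc:test") as d:
#         yield d
```

Putting the cleanup in a `finally` block keeps a failed test from leaving a wiki directory or DB row behind for the next one.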

### Step 3: Implement 11 additional tests

#### Auth flows (`test_auth_flows.py`, 3 tests)
- **Auto-redirect when authenticated**: Visit `/auth/login` with a valid cookie → redirects to `/app/`
- **Return-to URL preservation**: `return_to` parameter survives the full OAuth redirect chain
- **Login with DID**: Log in using a DID instead of a handle (may be redundant with the existing `test_oauth_login`; check before implementing)

#### Wiki lifecycle (`test_wiki_lifecycle.py`, 4 tests, already written)
- **Wiki creation form**: Fill slug/name → submit → redirect to settings → MCP token visible
- **Wiki settings update**: Change display_name → flash message → persists on reload
- **Wiki deletion with confirmation**: Expand danger zone → confirm slug → delete → flash
- **MCP token regeneration**: Click regen → JS confirm dialog → new token in flash

#### Account management (`test_account.py`, 3 tests, already written)
- **Account page renders**: Displays DID, handle, created_at from JWT claims
- **Account deletion**: Confirm handle → cookie cleared → cascading wiki delete
- **Account deletion wrong confirmation**: Wrong handle → stays on page → error flash

#### MCP consent (`test_account.py`, 1 test, already written, marked skip)
- **MCP consent page renders**: Consent page shows client info, wiki name, approve/deny buttons

### Step 4: Verify existing test files

Agent B's `test_wiki_lifecycle.py` and Agent C's `test_account.py` were written against fixture signatures that may differ from the simplified post-consolidation conftest. Review and update selectors/fixture names before running.

### Step 5: Run full suite

All 15 tests (4 existing + 11 new) should pass. Run unit tests too to verify no regressions.

## Architecture Notes

### Mock PDS
The mock PDS (`tests/e2e/mock_pds.py`) implements:
- `POST /xrpc/com.atproto.server.createAccount` — creates test accounts with `did:plc:` DIDs
- `POST /xrpc/com.atproto.server.createSession` — handles re-use of existing accounts
- `GET /.well-known/oauth-authorization-server` — AS metadata
- `GET /.well-known/oauth-protected-resource` — protected resource metadata
- `POST /oauth/par` — Pushed Authorization Request
- `GET/POST /oauth/authorize` — Login + consent form (simple HTML, not React SPA)
- `POST /oauth/token` — Token exchange (skips PKCE verification)
- `GET /did:plc:*` — DID document serving (acts as PLC directory)

All endpoints are served on `127.0.0.1` to avoid IPv6 resolution issues.
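
To make the endpoint list concrete, here is a hedged sketch of one mock route, the authorization-server metadata document, served from the standard library's `http.server` on `127.0.0.1`. The field names follow the OAuth metadata conventions (RFC 8414, RFC 9126), but the document is a deliberately small subset, and `MockPDSHandler`, `as_metadata`, `start_mock_pds`, and `fetch_json` are illustrative names rather than the contents of `mock_pds.py`.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener


def as_metadata(issuer):
    """Build a minimal (incomplete) authorization-server metadata document."""
    return {
        "issuer": issuer,
        "authorization_endpoint": f"{issuer}/oauth/authorize",
        "token_endpoint": f"{issuer}/oauth/token",
        "pushed_authorization_request_endpoint": f"{issuer}/oauth/par",
    }


class MockPDSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/.well-known/oauth-authorization-server":
            host, port = self.server.server_address[:2]
            body = json.dumps(as_metadata(f"http://{host}:{port}")).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock_pds():
    """Bind to 127.0.0.1 on an ephemeral port and serve in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), MockPDSHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_json(url):
    """GET a URL and decode the JSON body, bypassing environment proxies."""
    opener = build_opener(ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return json.loads(resp.read())
```

Binding to the literal IPv4 address, and building the issuer from that same address, keeps every URL in the metadata pointing at the socket the server is actually listening on.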

### Test mode env vars
- `ALLOW_HTTP_PDS=true` — relaxes SSRF protections for loopback HTTP (guarded by `FLASK_ENV=testing`)
- `PLC_DIRECTORY_URL` — points at mock PDS for DID resolution
- `PLATFORM_DOMAIN=127.0.0.1:{port}` — makes CLIENT_ID/REDIRECT_URI use HTTP
- `WIKI_TEMPLATE_DIR` — pointed at nonexistent path for predictable fallback behavior
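
The `ALLOW_HTTP_PDS` guard described above could take roughly the following shape. This is a sketch, not the production code: the function name `http_pds_allowed`, the accepted value `"true"`, and the error message are assumptions; only the two variable names and the `RuntimeError` behavior come from this document.

```python
import os


def http_pds_allowed(env=None):
    """Return True only when ALLOW_HTTP_PDS is set and FLASK_ENV is 'testing'.

    Raises RuntimeError if the flag is set outside a testing environment,
    so a misconfigured production deploy fails loudly instead of silently
    relaxing SSRF protections.
    """
    env = os.environ if env is None else env
    if env.get("ALLOW_HTTP_PDS", "").lower() != "true":
        return False
    if env.get("FLASK_ENV") != "testing":
        raise RuntimeError("ALLOW_HTTP_PDS requires FLASK_ENV=testing")
    return True
```

Accepting an optional `env` mapping lets unit tests exercise each combination without mutating `os.environ`.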

### Docker vs mock
Conftest has a 3-tier fallback: external PDS already running → Docker Compose → in-process mock. CI with Docker gets the real PDS; devcontainers without Docker get the mock. The mock is sufficient for all current tests.
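
The selection order can be sketched as a small pure function. The names `choose_pds_backend` and `port_open`, and the use of a TCP connect and `shutil.which("docker")` as probes, are assumptions made for illustration; the actual conftest may detect each tier differently.

```python
import shutil
import socket
from urllib.parse import urlparse


def port_open(url, timeout=0.5):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_pds_backend(external_url=None, is_reachable=port_open, docker_available=None):
    """Pick 'external', 'docker', or 'mock', in that order of preference."""
    if external_url and is_reachable(external_url):
        return "external"
    if docker_available is None:
        # Assumed probe: treat a docker binary on PATH as "Docker is usable".
        docker_available = shutil.which("docker") is not None
    if docker_available:
        return "docker"
    return "mock"
```

Injecting `is_reachable` and `docker_available` keeps the decision testable on machines that have neither a running PDS nor Docker.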