Blame

9bec77 Claude (MCP) 2026-03-19 22:33:29
[mcp] Add E2E testing design document with post-consolidation plan
---
category: design
tags: [testing, playwright, e2e, infrastructure]
last_updated: 2026-03-20
confidence: high
---

# E2E Testing

End-to-end testing for robot.wtf using Playwright and a mock ATProto PDS.

**Status: COMPLETE.** 23 tests on `main`, deployed 2026-03-20.

## Current State

**23 passing E2E tests** on `main` across 4 files, plus 294 unit tests.

### Test files

**`test_login_flow.py`** (5 tests):
1. Login page loads
2. Client metadata endpoint serves valid ATProto OAuth metadata
3. Full OAuth login flow (mock PDS → PAR → consent → callback → cookie)
4. Logout clears cookie
5. OAuth callback error shows flash (not 500)

**`test_auth_flows.py`** (4 tests):
1. Auto-redirect when authenticated (visit `/auth/login` with valid cookie → `/app/`)
2. Return-to URL preservation across OAuth redirect chain
3. Login with handle (tests error handling for unresolvable mock handles)
4. Unauthenticated access redirects to login with `return_to`

**`test_wiki_lifecycle.py`** (8 tests):
1. Wiki creation form (slug/name → submit → redirect → MCP token visible)
2. Wiki settings update (change display_name → flash → persists on reload)
3. Wiki deletion with confirmation (expand danger zone → confirm slug → delete)
4. MCP token regeneration (click regen → JS confirm → new token in flash)
5. Dashboard redirects to existing wiki
6. Wiki deletion wrong slug rejected (confirm mismatch → flash error → wiki survives)
7. Wiki creation duplicate slug rejected
8. Wiki creation invalid slug rejected (bypasses browser validation, tests server-side)
9. Wiki settings steady state (no token flash, regenerate button present)

**`test_account.py`** (6 tests):
1. Account page renders (displays DID, handle)
2. Account deletion wrong confirmation (wrong handle → error flash)
3. MCP consent page renders (client info, wiki name, approve/deny buttons)
4. Account deletion (correct handle → cookie cleared → redirected)
5. MCP consent deny redirects with error
6. Wiki settings steady state page elements

### Infrastructure

- `tests/e2e/mock_pds.py`: in-process mock ATProto PDS with PKCE verification and thread-safe state
- `tests/e2e/conftest.py`: fixtures for the single platform server, authenticated pages, and wiki creation

### Fixtures

- **`platform_server`** (session): Starts the consolidated Flask app in a daemon thread on a free port
- **`authenticated_page`** (function): Fresh browser context with a valid `platform_token` cookie via direct JWT minting
- **`wiki_fixture`** (function): Creates a wiki directly in DB + filesystem, cleans up after the test
- **`destructive_page`** (function): Separate browser context for tests that destroy state
- **`pds`** (session): Mock PDS in a daemon thread
- **`test_account`** (session): Test account on the mock PDS

### Production code changes for test mode

Gated by `ALLOW_HTTP_PDS=true` + `FLASK_ENV=testing` (RuntimeError at import if either is wrong):

- `app/auth/atproto_security.py`: `_ALLOW_HTTP_PDS` flag, loopback SSRF relaxation
- `app/auth/atproto_identity.py`: `PLC_DIRECTORY_URL` read at request time, skip bidirectional handle verification
- `app/auth/atproto_oauth.py`: relax HTTPS/port assertions on auth server metadata
- `app/platform_server.py`: `_SCHEME` variable, conditional `SESSION_COOKIE_SECURE`, conditional cookie `secure` flag, rate limiter disabled in test mode, limiter GC strong-reference fix
- `app/db.py`: `check_same_thread=False` scoped to `FLASK_ENV=testing`

### Bug fixes discovered during E2E work

- `resolve_did()` SSRF: upgraded from plain `requests.get` to `hardened_http` (pre-existing vulnerability, made more serious by the injectable `PLC_DIRECTORY_URL`)
- Flask-Limiter GC: the `Limiter` object was garbage collected after `create_app()` returned because only weak references to it remained. Fixed by holding a strong reference in `app.config["_LIMITER"]`

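The limiter bug is an instance of a general Python behavior: an object reachable only through weak references is collected once its last strong reference goes away. The sketch below reproduces that mechanism with a stand-in class. `FakeLimiter` and the `registry` list are hypothetical and do not model Flask-Limiter's internals; only the `app.config["_LIMITER"]` fix mirrors the document.

```python
import gc
import weakref


class FakeLimiter:
    """Stand-in for a limiter object; not the Flask-Limiter class."""


def create_app_without_ref(registry: list) -> dict:
    config = {}
    limiter = FakeLimiter()
    # The extension machinery is modelled as holding only a weak reference.
    registry.append(weakref.ref(limiter))
    return config  # the only strong reference (the local) dies here


def create_app_with_ref(registry: list) -> dict:
    config = {}
    limiter = FakeLimiter()
    registry.append(weakref.ref(limiter))
    config["_LIMITER"] = limiter  # strong reference tied to the app's lifetime
    return config


lost, kept = [], []
app_a = create_app_without_ref(lost)
app_b = create_app_with_ref(kept)
gc.collect()

limiter_lost = lost[0]() is None          # weakref now dangles
limiter_kept = kept[0]() is not None      # still alive via the config dict
```

Storing the object in the config ties its lifetime to the app, which is why the fix holds for as long as the application itself is referenced.
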
## Architecture Notes

### Mock PDS
The mock PDS (`tests/e2e/mock_pds.py`) implements the full ATProto OAuth flow:
- Account creation/session management
- OAuth AS metadata, protected resource metadata
- PAR, authorize (HTML form), token exchange with PKCE S256 verification
- DID document serving (acts as PLC directory)
- Thread-safe global state with `threading.Lock`

All on `127.0.0.1` to avoid IPv6 resolution issues.

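Two of the mock PDS responsibilities listed above, PKCE S256 verification and lock-protected shared state, fit in a short sketch. The S256 transform follows RFC 7636 (`BASE64URL(SHA256(code_verifier))` without padding). The `AuthCodeStore` class and its method names are hypothetical and are not taken from `mock_pds.py`.

```python
import base64
import hashlib
import secrets
import threading


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(code_verifier)), no padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


class AuthCodeStore:
    """Hypothetical in-memory store for issued authorization codes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._challenges = {}  # authorization code -> code_challenge

    def issue(self, code_challenge: str) -> str:
        code = secrets.token_urlsafe(16)
        with self._lock:
            self._challenges[code] = code_challenge
        return code

    def redeem(self, code: str, code_verifier: str) -> bool:
        # Pop under the lock so a code is redeemed at most once,
        # even if two token requests race.
        with self._lock:
            expected = self._challenges.pop(code, None)
        if expected is None:
            return False
        return secrets.compare_digest(expected, s256_challenge(code_verifier))


store = AuthCodeStore()
verifier = secrets.token_urlsafe(32)
code = store.issue(s256_challenge(verifier))
first_use = store.redeem(code, verifier)   # correct verifier, fresh code
second_use = store.redeem(code, verifier)  # same code again
```

Because the server thread and the test thread share one process, every read and write of the shared dict goes through the lock; the single-use `pop` also keeps a replayed code from succeeding.
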
### Test mode env vars
- `ALLOW_HTTP_PDS=true`: relaxes SSRF protections for loopback HTTP (guarded by `FLASK_ENV=testing`)
- `PLC_DIRECTORY_URL`: points at the mock PDS for DID resolution (read at request time in `resolve_did()`)
- `PLATFORM_DOMAIN=127.0.0.1:{port}`: makes CLIENT_ID/REDIRECT_URI use HTTP
- `WIKI_TEMPLATE_DIR`: pointed at a nonexistent path for predictable fallback behavior

## Future Directions (priority order)

### 1. Resolver permission tests (HIGH)
The `TenantResolver` is the only thing preventing cross-tenant access, yet no E2E test hits a wiki subdomain. The `is_bearer_token` bypass, `_apply_wiki_access_restrictions`, and the internal API key path are untested end-to-end. Requires routing to a second Host in the test environment (Playwright supports `set_extra_http_headers`).

### 2. Multi-user fixtures (HIGH)
A single test account means ownership isolation is untested. Add `test_account_b` (the mock PDS already supports multiple accounts). Test: user B cannot access user A's wiki settings, and user B gets the appropriate access level on user A's wiki content.

### 3. Fix CI pipeline (HIGH, low effort)
The current `ci.yml` doesn't install Playwright browsers. Needs: `playwright install chromium`, separate unit/E2E jobs, browser caching (`~/.cache/ms-playwright`), `--screenshot=only-on-failure` artifacts, `--timeout=60`.

### 4. Infrastructure hardening (MEDIUM)
- Port allocation race: the port is found by bind-then-close, which leaves a gap before `make_server` in which another process can take it. Pass the bound socket directly.
- Silent teardown: `wiki_fixture` swallows cleanup exceptions. Log them.
- Session-scoped `page` fixture leaks state between tests.

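The port allocation race in the list above goes away when the socket that reserves the port is the same socket the server listens on. The sketch below shows the idea with the standard library's `wsgiref` server as a stand-in: binding to port 0 lets the OS pick a free port, and the socket stays bound from then on. The trivial `app` and the `QuietHandler` class are illustrative; whether the same approach fits the project's own `make_server` call is not verified here.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def app(environ, start_response):
    """Tiny WSGI app standing in for the Flask application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep test output clean


# Port 0 asks the OS for a free port. The listening socket is bound here
# and never closed, so no other process can claim the port in between.
server = make_server("127.0.0.1", 0, app, handler_class=QuietHandler)
port = server.server_port

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
resp = conn.getresponse()
status, body = resp.status, resp.read()
conn.close()

server.shutdown()
server.server_close()
```

The fixture can then yield `port` (or a base URL built from it) to the tests, and the shutdown calls belong in its teardown.
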
### 5. MCP consent + tool invocation E2E (MEDIUM)
The MCP server (`otterwiki-mcp/` repo, separate from the `mcp_entry.py` sidecar) has 12 real tools wrapping the REST API. E2E testing the full flow (consent → token → tool invocation) is feasible now. The consent HMAC signing is security-critical.

### 6. Rate limit enforcement (LOW)
One test: 6 rapid writes, assert 6th returns 429. Catches wiring bugs where the limiter is instantiated but never called.

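The shape of the rate limit test can be sketched independently of the transport. In the sketch below, `statuses_for_burst` is a hypothetical helper, and `FakeLimitedEndpoint` stands in for the real write endpoint so the example runs on its own. The limit of 5 is an assumption inferred from "6th returns 429".

```python
from typing import Callable, List


def statuses_for_burst(send: Callable[[], int], count: int) -> List[int]:
    """Call send() `count` times back to back and collect the status codes."""
    return [send() for _ in range(count)]


class FakeLimitedEndpoint:
    """Stand-in for a write endpoint limited to `limit` requests per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return 200 if self.calls <= self.limit else 429


# Assumed limit of 5 writes per window, so the 6th call is rejected.
statuses = statuses_for_burst(FakeLimitedEndpoint(limit=5), 6)
```

In the real test, `send` would issue an authenticated write against the platform server and return the response status. Since the rate limiter is disabled in test mode, that test would also need a way to turn it back on for this one case.
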
### 7. Otterwiki integration (DEFERRED)
Full path: login → create wiki → visit subdomain → see content. Requires otterwiki installed in CI and subprocess management. Defer until CI infrastructure is more mature.