---
category: design
tags: [testing, playwright, e2e, infrastructure]
last_updated: 2026-03-20
confidence: high
---

# E2E Testing

End-to-end testing for robot.wtf using Playwright and a mock ATProto PDS.

**Status: COMPLETE** — 23 tests on `main`, deployed 2026-03-20.

## Current State

**23 passing E2E tests** on `main` across 4 files, plus 294 unit tests.

### Test files

**`test_login_flow.py`** (5 tests):
1. Login page loads
2. Client metadata endpoint serves valid ATProto OAuth metadata
3. Full OAuth login flow (mock PDS → PAR → consent → callback → cookie)
4. Logout clears cookie
5. OAuth callback error shows flash (not 500)

**`test_auth_flows.py`** (4 tests):
1. Auto-redirect when authenticated (visit `/auth/login` with valid cookie → `/app/`)
2. Return-to URL preservation across OAuth redirect chain
3. Login with handle (tests error handling for unresolvable mock handles)
4. Unauthenticated access redirects to login with `return_to`

**`test_wiki_lifecycle.py`** (8 tests):
1. Wiki creation form (slug/name → submit → redirect → MCP token visible)
2. Wiki settings update (change display_name → flash → persists on reload)
3. Wiki deletion with confirmation (expand danger zone → confirm slug → delete)
4. MCP token regeneration (click regen → JS confirm → new token in flash)
5. Dashboard redirects to existing wiki
6. Wiki deletion wrong slug rejected (confirm mismatch → flash error → wiki survives)
7. Wiki creation duplicate slug rejected
8. Wiki creation invalid slug rejected (bypasses browser validation, tests server-side)
9. Wiki settings steady state (no token flash, regenerate button present)

**`test_account.py`** (6 tests):
1. Account page renders (displays DID, handle)
2. Account deletion wrong confirmation (wrong handle → error flash)
3. MCP consent page renders (client info, wiki name, approve/deny buttons)
4. Account deletion (correct handle → cookie cleared → redirected)
5. MCP consent deny redirects with error
6. Wiki settings steady state page elements

### Infrastructure

- `tests/e2e/mock_pds.py` — In-process mock ATProto PDS with PKCE verification, thread-safe state
- `tests/e2e/conftest.py` — Fixtures for single platform server, authenticated pages, wiki creation

### Fixtures

- **`platform_server`** (session): Starts consolidated Flask app in daemon thread on a free port
- **`authenticated_page`** (function): Fresh browser context with valid `platform_token` cookie via direct JWT minting
- **`wiki_fixture`** (function): Creates wiki directly in DB + filesystem, cleans up after test
- **`destructive_page`** (function): Separate browser context for tests that destroy state
- **`pds`** (session): Mock PDS in daemon thread
- **`test_account`** (session): Test account on mock PDS

### Production code changes for test mode

Gated by `ALLOW_HTTP_PDS=true` together with `FLASK_ENV=testing` (a `RuntimeError` is raised at import if either is wrong):

- `app/auth/atproto_security.py`: `_ALLOW_HTTP_PDS` flag, loopback SSRF relaxation
- `app/auth/atproto_identity.py`: `PLC_DIRECTORY_URL` read at request time, skip bidirectional handle verification
- `app/auth/atproto_oauth.py`: Relax HTTPS/port assertions on auth server metadata
- `app/platform_server.py`: `_SCHEME` variable, conditional `SESSION_COOKIE_SECURE`, conditional cookie `secure` flag, rate limiter disabled in test mode, limiter GC strong-reference fix
- `app/db.py`: `check_same_thread=False` scoped to `FLASK_ENV=testing`

### Bug fixes discovered during E2E work

- `resolve_did()` SSRF: Upgraded from plain `requests.get` to `hardened_http` (pre-existing vulnerability, elevated by injectable `PLC_DIRECTORY_URL`)
- Flask-Limiter GC: `Limiter` object garbage collected after `create_app()` returned due to weak references. Fixed with strong ref in `app.config["_LIMITER"]`
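
The limiter bug follows a general pattern: an object created inside a factory is reachable only through weak references once the factory returns. A minimal stdlib sketch of the failure and the fix, using stand-in classes (`FakeLimiter`, `FakeApp`) instead of Flask and Flask-Limiter. It relies on CPython's reference counting and is not the project's actual code.

```python
import gc
import weakref


class FakeLimiter:
    """Stand-in for the real limiter object."""


class FakeApp:
    """Stand-in for a Flask app: a config dict plus weakly held hooks."""

    def __init__(self) -> None:
        self.config: dict = {}
        self.hooks: list = []


def create_app(keep_strong_ref: bool) -> FakeApp:
    app = FakeApp()
    limiter = FakeLimiter()
    # The app only records a weak reference to the limiter.
    app.hooks.append(weakref.ref(limiter))
    if keep_strong_ref:
        # The fix: a strong reference that lives as long as the app does.
        app.config["_LIMITER"] = limiter
    return app


def limiter_alive(app: FakeApp) -> bool:
    """Report whether the weakly referenced limiter still exists."""
    gc.collect()
    return app.hooks[0]() is not None
```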

## Architecture Notes

### Mock PDS
The mock PDS (`tests/e2e/mock_pds.py`) implements the full ATProto OAuth flow:
- Account creation/session management
- OAuth AS metadata, protected resource metadata
- PAR, authorize (HTML form), token exchange with PKCE S256 verification
- DID document serving (acts as PLC directory)
- Thread-safe global state with `threading.Lock`

All on `127.0.0.1` to avoid IPv6 resolution issues.
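
The PKCE S256 check at the token endpoint can be sketched with the standard library alone. The function names below are illustrative and are not the ones used in `mock_pds.py`; the transform itself (SHA-256 of the verifier, base64url without padding) is the one defined for the `S256` method.

```python
import base64
import hashlib
import hmac


def s256_challenge(code_verifier: str) -> str:
    """Derive the S256 code challenge from a code verifier."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce_s256(code_verifier: str, code_challenge: str) -> bool:
    """Compare the recomputed challenge with the one stored at PAR time."""
    return hmac.compare_digest(s256_challenge(code_verifier), code_challenge)
```

The mock would store `code_challenge` when the pushed authorization request arrives and call a check like `verify_pkce_s256` when the authorization code is exchanged.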

### Test mode env vars
- `ALLOW_HTTP_PDS=true` — relaxes SSRF protections for loopback HTTP (guarded by `FLASK_ENV=testing`)
- `PLC_DIRECTORY_URL` — points at mock PDS for DID resolution (read at request time in `resolve_did()`)
- `PLATFORM_DOMAIN=127.0.0.1:{port}` — makes CLIENT_ID/REDIRECT_URI use HTTP
- `WIKI_TEMPLATE_DIR` — pointed at nonexistent path for predictable fallback behavior
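
A sketch of how the guard on `ALLOW_HTTP_PDS` could look, assuming the flag is only legal when `FLASK_ENV=testing`. The function name `http_pds_allowed` is hypothetical, and the real check runs at import time rather than inside a function.

```python
import os
from typing import Mapping


def http_pds_allowed(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if loopback HTTP PDS access is enabled for this process."""
    allow = environ.get("ALLOW_HTTP_PDS", "").lower() == "true"
    if allow and environ.get("FLASK_ENV") != "testing":
        # Refuse to start rather than run with relaxed SSRF protections.
        raise RuntimeError("ALLOW_HTTP_PDS=true requires FLASK_ENV=testing")
    return allow
```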

## Future Directions (priority order)

### 1. Resolver permission tests (HIGH)
The `TenantResolver` is the only thing preventing cross-tenant access. No E2E test hits a wiki subdomain. The `is_bearer_token` bypass, `_apply_wiki_access_restrictions`, and the internal API key path are untested end-to-end. Requires routing to a second Host in the test environment (Playwright supports `set_extra_http_headers`).

### 2. Multi-user fixtures (HIGH)
Single test account means ownership isolation is untested. Add `test_account_b` (mock PDS already supports multiple accounts). Test: user B cannot access user A's wiki settings, user B gets appropriate access level on user A's wiki content.

### 3. Fix CI pipeline (HIGH, low effort)
Current `ci.yml` doesn't install Playwright browsers. Needs: `playwright install chromium`, separate unit/E2E jobs, browser caching (`~/.cache/ms-playwright`), `--screenshot=only-on-failure` artifacts, `--timeout=60`.

### 4. Infrastructure hardening (MEDIUM)
- Port allocation race: bind-then-close gap before `make_server`. Pass bound socket directly.
- Silent teardown: `wiki_fixture` swallows cleanup exceptions. Log them.
- Session-scoped `page` fixture leaks state between tests.
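
For the port allocation race, one option is to keep the bound socket open and hand it to the server instead of closing it and rebinding. The sketch below shows the idea with the standard library's `HTTPServer`; the helper names are hypothetical. Werkzeug's `make_server` takes an `fd` argument that may serve the same purpose for the Flask app, though that has not been checked against this codebase.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def bind_loopback() -> socket.socket:
    """Bind to an OS-assigned port and keep the socket open."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    return sock


def server_on_socket(sock: socket.socket, handler: type) -> HTTPServer:
    """Build an HTTPServer on an already-bound socket instead of rebinding."""
    server = HTTPServer(sock.getsockname(), handler, bind_and_activate=False)
    server.socket.close()  # drop the unbound socket the constructor created
    server.socket = sock
    server.server_address = sock.getsockname()
    server.server_name, server.server_port = server.server_address[:2]
    server.server_activate()  # listen() on the pre-bound socket
    return server


def start_in_thread(server: HTTPServer) -> threading.Thread:
    """Serve in a daemon thread, as the session fixtures do."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


class OkHandler(BaseHTTPRequestHandler):
    """Trivial handler used only to demonstrate the server answers."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args) -> None:
        pass  # keep test output quiet
```

Because the port is never released between allocation and serving, no other process can claim it in the gap.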

### 5. MCP consent + tool invocation E2E (MEDIUM)
The MCP server (`otterwiki-mcp/` repo, separate from `mcp_entry.py` sidecar) has 12 real tools wrapping the REST API. E2E testing the full flow — consent → token → tool invocation — is feasible now. The consent HMAC signing is security-critical.

### 6. Rate limit enforcement (LOW)
One test: 6 rapid writes, assert 6th returns 429. Catches wiring bugs where the limiter is instantiated but never called.
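
A sketch of the assertion such a test could make, assuming a limit of 5 writes per window (inferred from the sixth request being rejected). `send_write` is a hypothetical callable that performs one write against the platform and returns the HTTP status code.

```python
from typing import Callable, List


def status_codes(send_write: Callable[[], int], attempts: int) -> List[int]:
    """Call send_write() back to back and collect the HTTP status codes."""
    return [send_write() for _ in range(attempts)]


def assert_limit_enforced(send_write: Callable[[], int], limit: int = 5) -> None:
    """Expect `limit` accepted writes followed by one 429."""
    codes = status_codes(send_write, limit + 1)
    assert all(code < 400 for code in codes[:limit]), codes
    assert codes[-1] == 429, codes
```

A limiter that is instantiated but never wired into the request path returns a success code on the final request, so the last assertion fails.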

### 7. Otterwiki integration (DEFERRED)
Full path: login → create wiki → visit subdomain → see content. Requires otterwiki installed in CI and subprocess management. Defer until CI infrastructure is more mature.