Commit 752837
2026-03-19 21:31:11 Claude (MCP): [mcp] Add server consolidation design document
| /dev/null .. Design/Server_Consolidation.md | |
| @@ 0,0 1,174 @@ | |
| + | --- |
| + | category: design |
| + | tags: [architecture, infrastructure, refactoring] |
| + | last_updated: 2026-03-19 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Server Consolidation: Merge auth_server + api_server |
| + | |
| + | Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate. |
| + | |
| + | ## Motivation |
| + | |
| + | auth_server and api_server are tightly coupled: |
| + | - Same database (robot.db) |
| + | - Same models (UserModel, WikiModel) |
| + | - Same signing keys (RSA for JWT, EC for ATProto client) |
| + | - Same cookie (platform_token on .robot.wtf) |
| + | - Same user identity model (DID-based) |
| + | |
| + | The separation creates concrete problems: |
| + | - **E2E testing**: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this. |
| + | - **Operational overhead**: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service. |
| + | - **Code duplication**: Both apps call `_load_keys()`, `get_connection()`, `init_schema()` independently. Both have their own rate limiting setup. |
| + | |
| + | ## Current Architecture |
| + | |
| + | ``` |
| + | Caddy (TLS, port 80/443) |
| + | ├─ robot.wtf/auth/* → robot-auth (Gunicorn, port 8003, auth_server.py) |
| + | ├─ robot.wtf/app/* → robot-api (Gunicorn, port 8002, api_server.py) |
| + | ├─ robot.wtf/api/* → robot-api (port 8002) |
| + | ├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001) |
| + | ├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py) |
| + | └─ {slug}.robot.wtf/* → robot-otterwiki (port 8000) |
| + | ``` |
| + | |
| + | Four processes: three WSGI entry points (`auth_server:application`, `api_server:application`, `wsgi:application`) plus the MCP sidecar. |
| + | |
| + | ## Target Architecture |
| + | |
| + | ``` |
| + | Caddy (TLS, port 80/443) |
| + | ├─ robot.wtf/auth/* → robot-platform (Gunicorn, port 8002, platform_server.py) |
| + | ├─ robot.wtf/app/* → robot-platform (port 8002) |
| + | ├─ robot.wtf/api/* → robot-platform (port 8002) |
| + | ├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001) |
| + | ├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py) |
| + | └─ {slug}.robot.wtf/* → robot-otterwiki (port 8000) |
| + | ``` |
| + | |
| + | Three processes: two WSGI entry points plus the MCP sidecar. The Caddy routing table keeps its shape: `/auth/*` and `/app/*` already route to the same IP on different ports, so pointing `/auth/*` at port 8002 is a one-line edit. |
| + | |
| + | ## What Changes |
| + | |
| + | ### New: `app/platform_server.py` |
| + | |
| + | Single Flask app factory that combines auth and management routes: |
| + | |
| + | ```python |
| + | def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None): |
| + | app = Flask(__name__, template_folder="templates") |
| + | # templates/auth/ — login.html, consent.html, error.html, base.html |
| + | # templates/management/ — layout.html, wiki_create.html, etc. |
| + | |
| + | # Shared setup: secret key, keys, DB, models, rate limiter |
| + | ... |
| + | |
| + | # Auth routes (/auth/*) |
| + | _register_auth_routes(app, platform_jwt, client_secret_jwk, ...) |
| + | |
| + | # Management UI routes (/app/*) |
| + | _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...) |
| + | |
| + | # Management API routes (/api/*) |
| + | _register_management_api_routes(app, wiki_model, user_model, ...) |
| + | |
| + | # Well-known routes |
| + | _register_wellknown_routes(app, ...) |
| + | |
| + | return app |
| + | ``` |
| + | |
| + | The route registration functions extract the existing route definitions from `auth_server.py` and `api_server.py` into callable functions that take a Flask app and shared dependencies as arguments. |
| + | |
| + | ### Removed |
| + | - `app/auth_server.py` — routes moved to platform_server.py |
| + | - `ansible/roles/deploy/templates/robot-auth.service.j2` — systemd service removed |
| + | - `ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2` — Gunicorn config removed |
| + | |
| + | ### Modified |
| + | - `app/api_server.py` → renamed/merged into `platform_server.py` |
| + | - `ansible/roles/deploy/templates/Caddyfile.j2` — remove auth port, route `/auth/*` to port 8002 |
| + | - `ansible/roles/deploy/tasks/main.yml` — remove auth service deployment |
| + | - `app/auth/templates/` — move to `app/templates/auth/` |
| + | - `app/management/templates/` — move to `app/templates/management/` |
| + | |
| + | ### Unchanged |
| + | - `app/wsgi.py` — otterwiki entry point, completely independent |
| + | - `app/resolver.py` — TenantResolver wraps otterwiki, not the platform service |
| + | - `app/management/routes.py` — ManagementMiddleware still wraps the platform Flask app |
| + | - All auth logic — unchanged, just relocated |
| + | - All management logic — unchanged |
| + | - Database schema — unchanged |
| + | - MCP sidecar — unchanged |
| + | |
| + | ## ManagementMiddleware Handling |
| + | |
| + | Currently, `api_server.py` wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts `/api/*` for rate limiting and auth). Auth routes don't go through this middleware. |
| + | |
| + | After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle — `/auth/*` routes will pass through to Flask unchanged. No middleware changes needed. |
| + | |
| + | Verify by reading ManagementMiddleware's `__call__` — it only intercepts paths matching its configured prefixes (`/api/`). All other paths pass to the wrapped app. |
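
The pass-through behavior described above can be illustrated with a minimal WSGI sketch. `PrefixMiddleware`, `demo_app`, `call`, and the `example.intercepted` environ key are hypothetical names for illustration only; this is not the project's ManagementMiddleware, whose real `__call__` should still be read to confirm the behavior.

```python
class PrefixMiddleware:
    """Hypothetical stand-in: acts only on paths under the configured prefixes."""

    def __init__(self, app, prefixes=("/api/",)):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Placeholder for rate limiting and auth checks (assumed, not the real logic).
            environ["example.intercepted"] = True
        # Every request, intercepted or not, is forwarded to the wrapped app.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny WSGI app that reports whether the middleware marked the request."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    flag = "intercepted" if environ.get("example.intercepted") else "passthrough"
    return [flag.encode()]


def call(app, path):
    """Invoke a WSGI app for one path and return the decoded body."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    body = app(environ, lambda status, headers: None)
    return b"".join(body).decode()
```

Under these assumptions, a request to `/auth/login` reaches the wrapped app untouched, while `/api/wikis` is marked before being forwarded.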
| + | |
| + | ## Template Directory Structure |
| + | |
| + | Before: |
| + | ``` |
| + | app/auth/templates/ — base.html, login.html, consent.html, error.html |
| + | app/management/templates/ — layout.html, wiki_create.html, wiki_settings.html, account.html |
| + | ``` |
| + | |
| + | After: |
| + | ``` |
| + | app/templates/ |
| + | auth/ — base.html, login.html, consent.html, error.html |
| + | management/ — layout.html, wiki_create.html, wiki_settings.html, account.html |
| + | ``` |
| + | |
| + | Template references in route code change from `render_template("login.html")` to `render_template("auth/login.html")`. Mechanical find-and-replace. |
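
One possible way to script the find-and-replace is sketched below. The `add_prefix` helper and its regular expression are assumptions for illustration, not part of the codebase. The pattern only handles literal template names passed as the first argument and skips names that already contain a `/`, so running it twice should be harmless.

```python
import re

# Matches render_template("name.html" or render_template('name.html'
# where the template name has no directory component yet.
PATTERN = re.compile(r"""(render_template\(\s*)(["'])([^"'/]+\.html)\2""")


def add_prefix(source: str, subdir: str) -> str:
    """Return source with bare template names rewritten to subdir/name."""

    def rewrite(match: re.Match) -> str:
        call, quote, name = match.group(1), match.group(2), match.group(3)
        return f"{call}{quote}{subdir}/{name}{quote}"

    return PATTERN.sub(rewrite, source)
```

Calls that build the template name dynamically are not matched by this sketch and would still need a manual review.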
| + | |
| + | ## Database Connection Strategy |
| + | |
| + | Both apps currently obtain connections through `get_connection()`, which opens a new SQLite connection on every call. The consolidated app keeps one connection per request, stored on Flask's `g` object and closed by a `teardown_appcontext` handler. |
| + | |
| + | The auth_server pattern (`_get_db()` storing in `g._database`) is cleaner than api_server's approach (connection at startup). Adopt the per-request pattern throughout. |
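
A minimal sketch of the per-request pattern, assuming Flask is installed. The `get_db` helper, the `/health` route, and the in-memory database path are hypothetical and exist only to make the example self-contained. The real `_get_db()` in `auth_server.py` may differ in detail.

```python
import sqlite3

from flask import Flask, g


def create_app(db_path=":memory:"):
    app = Flask(__name__)

    def get_db():
        # Open the connection lazily, at most once per application context.
        if "_database" not in g:
            g._database = sqlite3.connect(db_path)
        return g._database

    @app.teardown_appcontext
    def close_db(exc):
        # Runs when the context is torn down, whether or not the request failed.
        db = g.pop("_database", None)
        if db is not None:
            db.close()

    @app.route("/health")  # hypothetical route, for illustration only
    def health():
        row = get_db().execute("SELECT 1").fetchone()
        return {"ok": row[0] == 1}

    return app
```

Because each request opens and closes its own connection in the thread that handles it, no connection is shared across threads.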
| + | |
| + | ## Rate Limiting |
| + | |
| + | - Auth routes: Flask-Limiter with per-route decorators (`@limiter.limit("1/minute")`) |
| + | - Management API routes: WSGIRateLimiter singleton in ManagementMiddleware |
| + | |
| + | Both can coexist — Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict. |
| + | |
| + | ## Session and Cookie |
| + | |
| + | One Flask app = one `secret_key` = one session. The platform_token cookie is set with `domain=COOKIE_DOMAIN`, which is the same regardless of which routes set it. No changes needed. |
| + | |
| + | ## E2E Testing Impact |
| + | |
| + | The consolidation directly unblocks E2E testing: |
| + | - One server fixture instead of two |
| + | - Cookies work naturally (same origin) |
| + | - No SQLite cross-thread issues |
| + | - `authenticated_page` fixture just logs in and the cookie works for all routes |
| + | - The 11 planned E2E tests become straightforward |
| + | |
| + | ## Implementation Sequence |
| + | |
| + | 1. Create `app/platform_server.py` with combined app factory |
| + | 2. Move templates to `app/templates/{auth,management}/` |
| + | 3. Update `render_template()` calls with subdirectory prefixes |
| + | 4. Verify all existing unit tests pass against the new structure |
| + | 5. Update Ansible: remove auth service, update Caddy routes |
| + | 6. Deploy and verify |
| + | 7. Remove old `auth_server.py` and `api_server.py` |
| + | 8. Resume E2E test implementation with simplified fixtures |
| + | |
| + | ## Risks |
| + | |
| + | - **Merge complexity**: The two app factories have different initialization patterns. Reconciling them requires care but isn't architecturally novel. |
| + | - **Template path changes**: Every `render_template()` call needs updating. Mechanical but easy to miss one. |
| + | - **Existing unit tests**: Tests that import `auth_server.create_app()` or `api_server._create_flask_app()` need updating. Many tests — but the change is the same for each. |
| + | - **Deployment window**: The Ansible change removes one service and modifies Caddy. Brief downtime for auth routes during deploy. Mitigate by deploying the combined service first (both ports), then removing the old auth service. |
