---
category: design
tags: [architecture, infrastructure, refactoring]
last_updated: 2026-03-19
confidence: high
---

# Server Consolidation: Merge auth_server + api_server

Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate.

## Motivation

auth_server and api_server are tightly coupled:

- Same database (robot.db)
- Same models (UserModel, WikiModel)
- Same signing keys (RSA for JWT, EC for ATProto client)
- Same cookie (platform_token on .robot.wtf)
- Same user identity model (DID-based)

The separation creates concrete problems:

- **E2E testing**: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this.
- **Operational overhead**: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service.
- **Code duplication**: Both apps call `_load_keys()`, `get_connection()`, `init_schema()` independently. Both have their own rate limiting setup.

## Current Architecture

```
Caddy (TLS, port 80/443)
  ├─ robot.wtf/auth/*            → robot-auth (Gunicorn, port 8003, auth_server.py)
  ├─ robot.wtf/app/*             → robot-api (Gunicorn, port 8002, api_server.py)
  ├─ robot.wtf/api/*             → robot-api (port 8002)
  ├─ {slug}.robot.wtf/mcp        → robot-mcp (uvicorn, port 8001)
  ├─ {slug}.robot.wtf/api/v1/*   → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
  └─ {slug}.robot.wtf/*          → robot-otterwiki (port 8000)
```

Four processes, three entry points (`auth_server:application`, `api_server:application`, `wsgi:application`), plus the MCP sidecar.

## Target Architecture

```
Caddy (TLS, port 80/443)
  ├─ robot.wtf/auth/*            → robot-platform (Gunicorn, port 8002, platform_server.py)
  ├─ robot.wtf/app/*             → robot-platform (port 8002)
  ├─ robot.wtf/api/*             → robot-platform (port 8002)
  ├─ {slug}.robot.wtf/mcp        → robot-mcp (uvicorn, port 8001)
  ├─ {slug}.robot.wtf/api/v1/*   → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
  └─ {slug}.robot.wtf/*          → robot-otterwiki (port 8000)
```

Three processes, two entry points. Caddy's path routing is unchanged: `/auth/*` and `/app/*` already route to the same IP on different ports, so pointing both at port 8002 is a one-line edit.

## What Changes

### New: `app/platform_server.py`

Single Flask app factory that combines auth and management routes:

```python
from flask import Flask


def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None):
    app = Flask(__name__, template_folder="templates")
    # templates/auth/: login.html, consent.html, error.html, base.html
    # templates/management/: layout.html, wiki_create.html, etc.

    # Shared setup: secret key, keys, DB, models, rate limiter
    ...

    # Auth routes (/auth/*)
    _register_auth_routes(app, platform_jwt, client_secret_jwk, ...)

    # Management UI routes (/app/*)
    _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...)

    # Management API routes (/api/*)
    _register_management_api_routes(app, wiki_model, user_model, ...)

    # Well-known routes
    _register_wellknown_routes(app, ...)

    return app
```

The route registration functions extract the existing route definitions from `auth_server.py` and `api_server.py` into callable functions that take a Flask app and shared dependencies as arguments.

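
As a minimal sketch of the registration-function pattern described above: routes are defined inside a function and close over the dependencies passed in, so the same definitions can be attached to any app the factory builds. The route `/auth/users/<handle>`, the `FakeUserModel` class, and its `get()` method are hypothetical and used only for illustration; they are not the real `UserModel` interface or an existing endpoint.

```python
from flask import Flask, jsonify


class FakeUserModel:
    """Hypothetical stand-in for UserModel, backed by a plain dict."""

    def __init__(self, users):
        self._users = users

    def get(self, handle):
        return self._users.get(handle)


def register_auth_routes(app, user_model):
    """Attach auth routes to `app`; handlers close over `user_model`."""

    @app.route("/auth/users/<handle>")  # hypothetical route for illustration
    def auth_user(handle):
        user = user_model.get(handle)
        if user is None:
            return jsonify(error="unknown user"), 404
        return jsonify(handle=handle, did=user["did"])


def create_app(user_model):
    """Simplified factory: build the app, then register route groups on it."""
    app = Flask(__name__)
    register_auth_routes(app, user_model)
    return app
```

Because dependencies arrive as arguments rather than module-level globals, tests can build an app around an in-memory fake without touching the real database.
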
### Removed

- `app/auth_server.py`: routes moved to platform_server.py
- `ansible/roles/deploy/templates/robot-auth.service.j2`: systemd service removed
- `ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2`: Gunicorn config removed

### Modified

- `app/api_server.py` → renamed/merged into `platform_server.py`
- `ansible/roles/deploy/templates/Caddyfile.j2`: remove auth port, route `/auth/*` to port 8002
- `ansible/roles/deploy/tasks/main.yml`: remove auth service deployment
- `app/auth/templates/`: move to `app/templates/auth/`
- `app/management/templates/`: move to `app/templates/management/`

### Unchanged

- `app/wsgi.py`: otterwiki entry point, completely independent
- `app/resolver.py`: TenantResolver wraps otterwiki, not the platform service
- `app/management/routes.py`: ManagementMiddleware still wraps the platform Flask app
- All auth logic: unchanged, just relocated
- All management logic: unchanged
- Database schema: unchanged
- MCP sidecar: unchanged

## ManagementMiddleware Handling

Currently, `api_server.py` wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts `/api/*` for rate limiting and auth). Auth routes don't go through this middleware.

After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle, so `/auth/*` routes will pass through to Flask unchanged. No middleware changes needed.

Verify by reading ManagementMiddleware's `__call__`: it only intercepts paths matching its configured prefixes (`/api/`). All other paths pass to the wrapped app.

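
The pass-through behaviour described above can be illustrated with a small, standard-library-only sketch. `PrefixMiddleware`, `inner_app`, and `call` are hypothetical names; this is not the real ManagementMiddleware, only the general shape of a WSGI wrapper that intercepts configured prefixes and delegates everything else.

```python
import json
from wsgiref.util import setup_testing_defaults


class PrefixMiddleware:
    """Hypothetical stand-in: handles configured prefixes, delegates the rest."""

    def __init__(self, wrapped_app, prefixes=("/api/",)):
        self.wrapped_app = wrapped_app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Intercepted at the WSGI layer; the wrapped app never sees it.
            body = json.dumps({"handled_by": "middleware", "path": path}).encode()
            start_response("200 OK", [("Content-Type", "application/json"),
                                      ("Content-Length", str(len(body)))])
            return [body]
        # Anything else (e.g. /auth/*, /app/*) passes through unchanged.
        return self.wrapped_app(environ, start_response)


def inner_app(environ, start_response):
    """Plain WSGI app standing in for the combined Flask app."""
    body = b"inner:" + environ.get("PATH_INFO", "").encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, path):
    """Run one GET request through a WSGI app and return the response body."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)
    return b"".join(app(environ, lambda status, headers: None))
```

A check of this shape against the real middleware (requesting an `/auth/*` path and confirming the wrapped app answers) would be one way to turn the "verify by reading" step into a regression test.
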
## Template Directory Structure

Before:

```
app/auth/templates/: base.html, login.html, consent.html, error.html
app/management/templates/: layout.html, wiki_create.html, wiki_settings.html, account.html
```

After:

```
app/templates/
  auth/: base.html, login.html, consent.html, error.html
  management/: layout.html, wiki_create.html, wiki_settings.html, account.html
```

Template references in route code change from `render_template("login.html")` to `render_template("auth/login.html")`. Mechanical find-and-replace.

## Database Connection Strategy

Both apps currently use `get_connection()` which opens a new SQLite connection per call. The consolidated app continues this pattern: one connection per request via Flask's `g` object and `teardown_appcontext`.

The auth_server pattern (`_get_db()` storing in `g._database`) is cleaner than api_server's approach (connection at startup). Adopt the per-request pattern throughout.

## Rate Limiting

- Auth routes: Flask-Limiter with per-route decorators (`@limiter.limit("1/minute")`)
- Management API routes: WSGIRateLimiter singleton in ManagementMiddleware

Both can coexist: Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict.

## Session and Cookie

One Flask app = one `secret_key` = one session. The platform_token cookie is set with `domain=COOKIE_DOMAIN`, which is the same regardless of which routes set it. No changes needed.

## E2E Testing Impact

The consolidation directly unblocks E2E testing:

- One server fixture instead of two
- Cookies work naturally (same origin)
- No SQLite cross-thread issues
- `authenticated_page` fixture just logs in and the cookie works for all routes
- The 11 planned E2E tests become straightforward

## Implementation Sequence

1. Create `app/platform_server.py` with combined app factory
2. Move templates to `app/templates/{auth,management}/`
3. Update `render_template()` calls with subdirectory prefixes
4. Verify all existing unit tests pass against the new structure
5. Update Ansible: remove auth service, update Caddy routes
6. Deploy and verify
7. Remove old `auth_server.py` and `api_server.py`
8. Resume E2E test implementation with simplified fixtures

## Risks and Review Findings

Plan reviewed 2026-03-19. Core approach confirmed sound. Key findings:

### Template migration (important)

The migration covers more than `render_template()` calls. The `{% extends %}` directives inside templates must also update:

- Auth templates: `{% extends 'base.html' %}` → `{% extends 'auth/base.html' %}`
- Management templates: `{% extends "layout.html" %}` → `{% extends "management/layout.html" %}`

6 auth + 4 management template files affected.

### App factory interface (resolved)

Single `create_app()` factory returns the fully-wrapped WSGI app. Module-level `application = create_app()` for Gunicorn (`app.platform_server:application`).

- Auth server applies ProxyFix inside Flask; API server applies it outermost. Consolidated app: apply once, outermost (api_server pattern).
- Tests call `create_app(db_path=..., ...)` with overrides as they do today.
- Actual import callsites across tests: ~17 (not ~46 as initially estimated). 3 test files, straightforward find-and-replace.

### Error handlers (resolved)

The 429 handler is duplicated verbatim in both apps (content-negotiated JSON/HTML). The 400/500 handlers only exist in auth_server and render HTML via `error.html`. No conflict: API routes go through ManagementMiddleware (raw WSGI, handles its own errors), so Flask error handlers only fire for `/auth/*` and `/app/*` routes. Keep one set of handlers, move `error.html` to `app/templates/error.html`.

### Deployment strategy (important)

Correct zero-downtime approach:

1. Deploy `robot-platform` on port 8002 (new service)
2. Keep `robot-auth` on 8003 running temporarily
3. Verify platform handles `/app/*` and `/api/*`
4. Update Caddy to route `/auth/*` to 8002
5. Stop and disable `robot-auth`

### Caddyfile management (resolved)

The Caddyfile lives on proxy-1 and is managed separately from this repo. During cutover, the proxy-1 agent needs one change: route `robot.wtf/auth/*` to port 8002 instead of 8003. Timing: after `robot-platform` is verified on 8002, before `robot-auth` is stopped.

### Hardcoded service references (important)

- `admin_stats()` hardcodes `["robot-otterwiki", "robot-mcp", "robot-api", "robot-auth"]` for systemctl/journalctl: must update
- `smoke-test.sh` hardcodes port 8003 checks: must update
- `healthcheck` role defaults reference port 8003: must update

### Environment variables (important)

`CLIENT_JWK_PATH` is only in `robot-auth.service` env. Must propagate to consolidated service.

### ProxyFix (minor)

Apply once at outermost WSGI layer. Don't accidentally apply twice.

### Python packages (resolved)

`app/auth/` and `app/management/` stay as-is. They represent distinct domains (auth infrastructure vs wiki lifecycle CRUD). The consolidation merges entry points, not packages. `platform_server.py` imports from both.

### Entanglement (resolved)

Only 2 systemd service files and 3 test files reference `auth_server`/`api_server`. No other app code imports from them. Ansible handlers for service restarts need updating. Clean separation, low-risk rename.

### Confirmed correct

- ManagementMiddleware passes through `/auth/*` (lines 162-165)
- Flask-Limiter + WSGIRateLimiter coexist without conflict
- Static file serving has no conflicts
