Properties

category: design
tags: [architecture, infrastructure, refactoring]
last_updated: 2026-03-19
confidence: high

Server Consolidation: Merge auth_server + api_server

Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate.

Motivation

auth_server and api_server are tightly coupled:

Same database (robot.db)
Same models (UserModel, WikiModel)
Same signing keys (RSA for JWT, EC for ATProto client)
Same cookie (platform_token on .robot.wtf)
Same user identity model (DID-based)

The separation creates concrete problems:

E2E testing: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this.
Operational overhead: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service.
Code duplication: Both apps call _load_keys(), get_connection(), init_schema() independently. Both have their own rate limiting setup.

Current Architecture

Caddy (TLS, port 80/443)
├─ robot.wtf/auth/*          → robot-auth (Gunicorn, port 8003, auth_server.py)
├─ robot.wtf/app/*           → robot-api  (Gunicorn, port 8002, api_server.py)
├─ robot.wtf/api/*           → robot-api  (port 8002)
├─ {slug}.robot.wtf/mcp      → robot-mcp  (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/*        → robot-otterwiki (port 8000)

Four processes: three WSGI entry points (auth_server:application, api_server:application, wsgi:application) plus the MCP sidecar.

Target Architecture

Caddy (TLS, port 80/443)
├─ robot.wtf/auth/*          → robot-platform (Gunicorn, port 8002, platform_server.py)
├─ robot.wtf/app/*           → robot-platform (port 8002)
├─ robot.wtf/api/*           → robot-platform (port 8002)
├─ {slug}.robot.wtf/mcp      → robot-mcp  (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/*        → robot-otterwiki (port 8000)

Three processes: two WSGI entry points plus the MCP sidecar. Caddy routing is almost unchanged: /auth/* and /app/* already route to the same IP on different ports, so pointing /auth/* at port 8002 is a one-line edit.

What Changes

New: `app/platform_server.py`

Single Flask app factory that combines auth and management routes:

from flask import Flask


def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None):
    app = Flask(__name__, template_folder="templates")
    # templates/auth/ — login.html, consent.html, error.html, base.html
    # templates/management/ — layout.html, wiki_create.html, etc.

    # Shared setup: secret key, keys, DB, models, rate limiter
    ...

    # Auth routes (/auth/*)
    _register_auth_routes(app, platform_jwt, client_secret_jwk, ...)

    # Management UI routes (/app/*)
    _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...)

    # Management API routes (/api/*)
    _register_management_api_routes(app, wiki_model, user_model, ...)

    # Well-known routes
    _register_wellknown_routes(app, ...)

    return app

The route registration functions extract the existing route definitions from auth_server.py and api_server.py into callable functions that take a Flask app and shared dependencies as arguments.
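A minimal sketch of that registration pattern, assuming Flask is installed. The route paths, the in-memory stand-ins for UserModel and WikiModel, and the simplified function signatures are hypothetical; only the shape (a function that takes the app plus its dependencies and attaches routes) reflects the design above.

```python
from flask import Flask, jsonify


def _register_auth_routes(app, user_model):
    """Attach auth routes; dependencies arrive as arguments, not globals."""

    @app.route("/auth/users/<handle>")
    def auth_user(handle):
        # user_model is captured by the closure.
        return jsonify(user_model.get(handle, {"error": "unknown"}))


def _register_management_api_routes(app, wiki_model):
    """Attach management API routes to the same app."""

    @app.route("/api/wikis")
    def api_list_wikis():
        return jsonify(sorted(wiki_model))


def create_app():
    app = Flask(__name__)
    # Hypothetical stand-ins for UserModel / WikiModel.
    user_model = {"alice": {"did": "did:plc:example"}}
    wiki_model = {"docs", "notes"}
    _register_auth_routes(app, user_model)
    _register_management_api_routes(app, wiki_model)
    return app
```

Because each registration function receives its dependencies explicitly, tests can build an app around fakes without patching module-level state.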

Removed

app/auth_server.py — routes moved to platform_server.py
ansible/roles/deploy/templates/robot-auth.service.j2 — systemd service removed
ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2 — Gunicorn config removed

Modified

app/api_server.py → renamed/merged into platform_server.py
ansible/roles/deploy/templates/Caddyfile.j2 — remove auth port, route /auth/* to port 8002
ansible/roles/deploy/tasks/main.yml — remove auth service deployment
app/auth/templates/ — move to app/templates/auth/
app/management/templates/ — move to app/templates/management/

Unchanged

app/wsgi.py — otterwiki entry point, completely independent
app/resolver.py — TenantResolver wraps otterwiki, not the platform service
app/management/routes.py — ManagementMiddleware still wraps the platform Flask app
All auth logic — unchanged, just relocated
All management logic — unchanged
Database schema — unchanged
MCP sidecar — unchanged

ManagementMiddleware Handling

Currently, api_server.py wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts /api/* for rate limiting and auth). Auth routes don't go through this middleware.

After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle — /auth/* routes will pass through to Flask unchanged. No middleware changes needed.

Confirm this before merging by reading ManagementMiddleware's __call__: it should intercept only paths matching its configured prefixes (/api/) and pass all other paths to the wrapped app.
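The pass-through behavior described above can be illustrated with a standard-library WSGI sketch. PrefixMiddleware, inner_app, call, and the demo.intercepted key are hypothetical names for illustration; this is not the real ManagementMiddleware, whose interception logic (rate limiting and auth) is reduced here to setting a marker.

```python
from wsgiref.util import setup_testing_defaults


class PrefixMiddleware:
    """Hypothetical stand-in for ManagementMiddleware (not the real class)."""

    def __init__(self, wrapped_app, prefixes=("/api/",)):
        self.wrapped_app = wrapped_app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Intercepted: rate limiting and auth checks would run here.
            environ["demo.intercepted"] = True
        # Every path, intercepted or not, continues to the wrapped app.
        return self.wrapped_app(environ, start_response)


def inner_app(environ, start_response):
    """Plays the role of the combined Flask app."""
    seen = "intercepted" if environ.get("demo.intercepted") else "passed through"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"{environ['PATH_INFO']} {seen}".encode()]


def call(app, path):
    """Invoke a WSGI app once and return its body as text."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    body = app(environ, lambda status, headers: None)
    return b"".join(body).decode()
```

A check of the same shape against the real middleware (one /api/ path, one /auth/ path) would be a cheap regression test for the assumption this section relies on.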

Template Directory Structure

Before:

app/auth/templates/     — base.html, login.html, consent.html, error.html
app/management/templates/ — layout.html, wiki_create.html, wiki_settings.html, account.html

After:

app/templates/
  auth/        — base.html, login.html, consent.html, error.html
  management/  — layout.html, wiki_create.html, wiki_settings.html, account.html

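Template references in route code change from render_template("login.html") to render_template("auth/login.html"). This is a mechanical find-and-replace. Any {% extends %} or {% include %} references inside the templates need the same prefix, because Jinja resolves template names relative to the template folder, not relative to the including file.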
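One way to script the render_template() rewrite is a small regex pass over each module's source. This is a sketch only: prefix_templates and its pattern are hypothetical, it handles only string-literal template names, and its output should be reviewed by hand.

```python
import re

# Matches render_template("name.html" or render_template('name.html'
# only when the name has no directory part yet.
_CALL = re.compile(r"""(render_template\(\s*)(["'])([^"'/]+\.html)\2""")


def prefix_templates(source: str, subdir: str) -> str:
    """Rewrite bare template names in `source` to live under `subdir`."""

    def _rewrite(match):
        call, quote, name = match.groups()
        return f"{call}{quote}{subdir}/{name}{quote}"

    return _CALL.sub(_rewrite, source)
```

Names that already contain a slash are left alone, so running the pass twice does not double the prefix. Template names built dynamically are not matched and still need a manual check.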

Database Connection Strategy

Both apps currently use get_connection(), which opens a new SQLite connection per call, but they manage that connection differently. The consolidated app opens one connection per request, stored on Flask's g object and closed in teardown_appcontext.

The auth_server pattern (_get_db() storing the connection in g._database) is cleaner than api_server's approach (a connection opened at startup). Adopt the per-request pattern throughout.
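The per-request pattern follows the usual Flask recipe, sketched below assuming Flask is installed. The DB_PATH config key, the in-memory database, and the /api/ping route are assumptions for illustration; the real app would open robot.db through get_connection().

```python
import sqlite3

from flask import Flask, g, jsonify

app = Flask(__name__)
app.config["DB_PATH"] = ":memory:"  # assumption: the real app points at robot.db


def _get_db():
    """Return this request's connection, opening it on first use."""
    if "_database" not in g:
        g._database = sqlite3.connect(app.config["DB_PATH"])
    return g._database


@app.teardown_appcontext
def _close_db(exc):
    """Close the connection when the app context ends, even after errors."""
    db = g.pop("_database", None)
    if db is not None:
        db.close()


@app.route("/api/ping")
def ping():
    first = _get_db()
    second = _get_db()  # same request, so the same connection is reused
    row = first.execute("SELECT 1").fetchone()
    return jsonify(value=row[0], same_connection=first is second)
```

Each connection is created and closed on the thread that serves the request, which keeps it within sqlite3's default same-thread check.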

Rate Limiting

Auth routes: Flask-Limiter with per-route decorators (@limiter.limit("1/minute"))
Management API routes: WSGIRateLimiter singleton in ManagementMiddleware

Both can coexist — Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict.

Session and Cookie Handling

One Flask app = one secret_key = one session. The platform_token cookie is set with domain=COOKIE_DOMAIN, which is the same regardless of which routes set it. No changes needed.

E2E Testing Impact

The consolidation directly unblocks E2E testing:

One server fixture instead of two
Cookies work naturally (same origin)
No SQLite cross-thread issues
authenticated_page fixture just logs in and the cookie works for all routes
The 11 planned E2E tests become straightforward
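A rough illustration of the single-fixture idea, using Flask's test client in place of a browser. The routes, the token value, and authenticated_client are hypothetical stand-ins; the real authenticated_page fixture and login flow are not shown in this document.

```python
from flask import Flask, make_response, request

app = Flask(__name__)


@app.route("/auth/login", methods=["POST"])
def login():
    # Stand-in for the real login flow: just issue the cookie.
    resp = make_response("ok")
    resp.set_cookie("platform_token", "demo-token")
    return resp


@app.route("/app/")
def dashboard():
    # A management route reads the cookie that the auth route set.
    return request.cookies.get("platform_token", "anonymous")


def authenticated_client():
    """Rough analogue of the authenticated_page fixture."""
    client = app.test_client()
    client.post("/auth/login")
    return client
```

With both route groups in one app, a single client session carries the cookie from /auth/* to /app/* without any cross-server coordination.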

Implementation Sequence

Create app/platform_server.py with combined app factory
Move templates to app/templates/{auth,management}/
Update render_template() calls with subdirectory prefixes
Verify all existing unit tests pass against the new structure
Update Ansible: remove auth service, update Caddy routes
Deploy and verify
Remove old auth_server.py and api_server.py
Resume E2E test implementation with simplified fixtures

Risks

Merge complexity: The two app factories have different initialization patterns. Reconciling them requires care but isn't architecturally novel.
Template path changes: Every render_template() call needs updating. Mechanical but easy to miss one.
Existing unit tests: Tests that import auth_server.create_app() or api_server._create_flask_app() need updating. Many tests — but the change is the same for each.
Deployment window: The Ansible change removes one service and modifies Caddy. Brief downtime for auth routes during deploy. Mitigate by deploying the combined service first (both ports), then removing the old auth service.