Properties
category: design
tags: [architecture, infrastructure, refactoring]
last_updated: 2026-03-19
confidence: high
Server Consolidation: Merge auth_server + api_server
Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate.
Motivation
auth_server and api_server are tightly coupled:
- Same database (robot.db)
- Same models (UserModel, WikiModel)
- Same signing keys (RSA for JWT, EC for ATProto client)
- Same cookie (platform_token on .robot.wtf)
- Same user identity model (DID-based)
The separation creates concrete problems:
- E2E testing: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this.
- Operational overhead: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service.
- Code duplication: Both apps call _load_keys(), get_connection(), and init_schema() independently. Both have their own rate limiting setup.
Current Architecture
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/* → robot-auth (Gunicorn, port 8003, auth_server.py)
├─ robot.wtf/app/* → robot-api (Gunicorn, port 8002, api_server.py)
├─ robot.wtf/api/* → robot-api (port 8002)
├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/* → robot-otterwiki (port 8000)
Four processes: three WSGI entry points (auth_server:application, api_server:application, wsgi:application) plus the MCP sidecar.
Target Architecture
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/* → robot-platform (Gunicorn, port 8002, platform_server.py)
├─ robot.wtf/app/* → robot-platform (port 8002)
├─ robot.wtf/api/* → robot-platform (port 8002)
├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/* → robot-otterwiki (port 8000)
Three processes: two WSGI entry points plus the MCP sidecar. Caddy routing is almost unchanged: /auth/* and /app/* already route to the same IP on different ports, so pointing both at the same port is a one-line edit.
What Changes
New: app/platform_server.py
Single Flask app factory that combines auth and management routes:
def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None):
    app = Flask(__name__, template_folder="templates")
    # templates/auth/: login.html, consent.html, error.html, base.html
    # templates/management/: layout.html, wiki_create.html, etc.

    # Shared setup: secret key, keys, DB, models, rate limiter
    ...

    # Auth routes (/auth/*)
    _register_auth_routes(app, platform_jwt, client_secret_jwk, ...)

    # Management UI routes (/app/*)
    _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...)

    # Management API routes (/api/*)
    _register_management_api_routes(app, wiki_model, user_model, ...)

    # Well-known routes
    _register_wellknown_routes(app, ...)

    return app
The route registration functions extract the existing route definitions from auth_server.py and api_server.py into callable functions that take a Flask app and shared dependencies as arguments.
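As a rough illustration of the registration pattern described above, the sketch below attaches two groups of routes to one shared Flask app. The function names echo the factory, but the route paths, handler bodies, and the deps dictionary are hypothetical placeholders, not the project's actual code.

```python
from flask import Flask, jsonify


def _register_auth_routes(app, deps):
    """Attach auth routes to the shared app (handler bodies are placeholders)."""

    @app.route("/auth/ping")  # hypothetical route, for illustration only
    def auth_ping():
        return jsonify(area="auth", db=deps["db_path"])


def _register_management_api_routes(app, deps):
    """Attach management API routes to the same app."""

    @app.route("/api/ping")  # hypothetical route, for illustration only
    def api_ping():
        return jsonify(area="api", db=deps["db_path"])


def create_app(*, db_path=None):
    app = Flask(__name__)
    # Shared dependencies are built once and handed to every registrar.
    deps = {"db_path": db_path or "robot.db"}
    _register_auth_routes(app, deps)
    _register_management_api_routes(app, deps)
    return app
```

Because each registrar only receives the app and its dependencies, the existing handlers can be moved over without changing their bodies.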
Removed
- app/auth_server.py: routes moved to platform_server.py
- ansible/roles/deploy/templates/robot-auth.service.j2: systemd service removed
- ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2: Gunicorn config removed
Modified
- app/api_server.py → renamed/merged into platform_server.py
- ansible/roles/deploy/templates/Caddyfile.j2: remove auth port, route /auth/* to port 8002
- ansible/roles/deploy/tasks/main.yml: remove auth service deployment
- app/auth/templates/: move to app/templates/auth/
- app/management/templates/: move to app/templates/management/
Unchanged
- app/wsgi.py: otterwiki entry point, completely independent
- app/resolver.py: TenantResolver wraps otterwiki, not the platform service
- app/management/routes.py: ManagementMiddleware still wraps the platform Flask app
- All auth logic: unchanged, just relocated
- All management logic: unchanged
- Database schema: unchanged
- MCP sidecar: unchanged
ManagementMiddleware Handling
Currently, api_server.py wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts /api/* for rate limiting and auth). Auth routes don't go through this middleware.
After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle — /auth/* routes will pass through to Flask unchanged. No middleware changes needed.
Verify by reading ManagementMiddleware's __call__ — it only intercepts paths matching its configured prefixes (/api/). All other paths pass to the wrapped app.
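The pass-through behaviour to look for can be sketched with a plain WSGI wrapper. PrefixMiddleware below is a hypothetical stand-in, not the real ManagementMiddleware; it only shows the prefix-dispatch shape that makes /auth/* safe to host behind the same middleware.

```python
class PrefixMiddleware:
    """Hypothetical stand-in for ManagementMiddleware's dispatch logic."""

    def __init__(self, app, prefixes=("/api/",)):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Intercepted: the middleware answers without calling the wrapped app.
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"handled by middleware"]
        # Everything else (e.g. /auth/*, /app/*) passes straight through.
        return self.app(environ, start_response)


def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by wrapped app"]


def call(app, path):
    """Invoke a WSGI app with a minimal environ and return the body bytes."""
    return b"".join(app({"PATH_INFO": path}, lambda status, headers: None))
```

A check of this kind against the real middleware, using /auth/login and /api/... paths, would confirm the claim before cutover.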
Template Directory Structure
Before:
app/auth/templates/        base.html, login.html, consent.html, error.html
app/management/templates/  layout.html, wiki_create.html, wiki_settings.html, account.html
After:
app/templates/
    auth/        base.html, login.html, consent.html, error.html
    management/  layout.html, wiki_create.html, wiki_settings.html, account.html
Template references in route code change from render_template("login.html") to render_template("auth/login.html"). Mechanical find-and-replace.
Database Connection Strategy
Both apps currently use get_connection() which opens a new SQLite connection per call. The consolidated app continues this pattern — one connection per request via Flask's g object and teardown_appcontext.
The auth_server pattern (_get_db() storing in g._database) is cleaner than api_server's approach (connection at startup). Adopt the per-request pattern throughout.
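A minimal sketch of the per-request pattern, assuming the connection is cached on g under the _database attribute named above. The /db-check route and the in-memory default path are illustrative only.

```python
import sqlite3

from flask import Flask, g


def create_app(db_path=":memory:"):
    app = Flask(__name__)

    def _get_db():
        # Open lazily, at most once per request, and cache on flask.g.
        if "_database" not in g:
            g._database = sqlite3.connect(db_path)
        return g._database

    @app.teardown_appcontext
    def _close_db(exc):
        # Runs when the app context ends, whether or not the request failed.
        db = g.pop("_database", None)
        if db is not None:
            db.close()

    @app.route("/db-check")  # hypothetical route, for illustration only
    def db_check():
        same = _get_db() is _get_db()  # second call reuses the cached connection
        value = _get_db().execute("SELECT 1").fetchone()[0]
        return {"same_connection": same, "value": value}

    return app
```

Each connection is opened and closed within a single request, so it never crosses a thread boundary. This is the property that matters for the SQLite issues described under Motivation.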
Rate Limiting
- Auth routes: Flask-Limiter with per-route decorators (@limiter.limit("1/minute"))
- Management API routes: WSGIRateLimiter singleton in ManagementMiddleware
Both can coexist — Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict.
Session and Cookie
One Flask app = one secret_key = one session. The platform_token cookie is set with domain=COOKIE_DOMAIN, which is the same regardless of which routes set it. No changes needed.
E2E Testing Impact
The consolidation directly unblocks E2E testing:
- One server fixture instead of two
- Cookies work naturally (same origin)
- No SQLite cross-thread issues
- authenticated_page fixture just logs in and the cookie works for all routes
- The 11 planned E2E tests become straightforward
Implementation Sequence
- Create app/platform_server.py with combined app factory
- Move templates to app/templates/{auth,management}/
- Update render_template() calls with subdirectory prefixes
- Verify all existing unit tests pass against the new structure
- Update Ansible: remove auth service, update Caddy routes
- Deploy and verify
- Remove old auth_server.py and api_server.py
- Resume E2E test implementation with simplified fixtures
Risks and Review Findings
Plan reviewed 2026-03-19. Core approach confirmed sound. Key findings:
Template migration (important)
The migration covers more than render_template() calls; the {% extends %} directives inside the templates must also be updated:
- Auth templates: {% extends 'base.html' %} → {% extends 'auth/base.html' %}
- Management templates: {% extends "layout.html" %} → {% extends "management/layout.html" %}
6 auth + 4 management template files affected.
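The extends rewrite above is mechanical enough to script. The helper below is a hedged sketch using only the standard library; prefix_extends and EXTENDS_RE are hypothetical names, and the pattern assumes each template extends a bare filename with no directory part.

```python
import re

# Matches {% extends 'name.html' %} or {% extends "name.html" %} where the
# name has no directory part yet (so already-migrated files are left alone).
EXTENDS_RE = re.compile(r"""(\{%-?\s*extends\s+)(['"])([^'"/]+)\2""")


def prefix_extends(source, subdir):
    """Return template source with its extends target moved under subdir/."""
    return EXTENDS_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{subdir}/{m.group(3)}{m.group(2)}",
        source,
    )
```

Running it twice is harmless, because a target that already contains a slash no longer matches. Any {% include %} or {% import %} directives would need the same treatment, which this sketch does not cover.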
App factory interface (resolved)
Single create_app() factory returns the fully-wrapped WSGI app. Module-level application = create_app() for Gunicorn (app.platform_server:application).
- Auth server applies ProxyFix inside Flask; API server applies it outermost. Consolidated app: apply once, outermost (api_server pattern).
- Tests call create_app(db_path=..., ...) with overrides as they do today.
- Actual import callsites across tests: ~17 (not ~46 as initially estimated). 3 test files, straightforward find-and-replace.
Error handlers (resolved)
The 429 handler is a verbatim copy (content-negotiated JSON/HTML). The 400/500 handlers only exist in auth_server and render HTML via error.html. No conflict: API routes go through ManagementMiddleware (raw WSGI, handles its own errors), so Flask error handlers only fire for /auth/* and /app/* routes. Keep one set of handlers, move error.html to app/templates/error.html.
Deployment strategy (important)
Correct zero-downtime approach:
- Deploy robot-platform on port 8002 (new service)
- Keep robot-auth on 8003 running temporarily
- Verify platform handles /app/* and /api/*
- Update Caddy to route /auth/* to 8002
- Stop and disable robot-auth
Caddyfile management (resolved)
The Caddyfile lives on proxy-1 and is managed separately from this repo. During cutover, the proxy-1 agent needs one change: route robot.wtf/auth/* to port 8002 instead of 8003. Timing: after robot-platform is verified on 8002, before robot-auth is stopped.
Hardcoded service references (important)
- admin_stats() hardcodes ["robot-otterwiki", "robot-mcp", "robot-api", "robot-auth"] for systemctl/journalctl: must update
- smoke-test.sh hardcodes port 8003 checks: must update
- healthcheck role defaults reference port 8003: must update
Environment variables (important)
CLIENT_JWK_PATH is only in robot-auth.service env. Must propagate to consolidated service.
ProxyFix (minor)
Apply once at outermost WSGI layer. Don't accidentally apply twice.
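One way to guard against double application is a small wrapper around Werkzeug's ProxyFix. wrap_once is a hypothetical helper, and the x_for=1 / x_proto=1 values assume exactly one trusted proxy hop (Caddy); adjust them if the real topology differs.

```python
from werkzeug.middleware.proxy_fix import ProxyFix


def wrap_once(wsgi_app):
    """Apply ProxyFix as the outermost layer, skipping it if already applied."""
    if isinstance(wsgi_app, ProxyFix):
        return wsgi_app
    # x_for=1 / x_proto=1 trust exactly one proxy hop (assumed: Caddy only).
    return ProxyFix(wsgi_app, x_for=1, x_proto=1)
```

The guard only detects a ProxyFix at the outermost layer. An inner application such as the one auth_server uses today (assigned to app.wsgi_app inside Flask) would still have to be removed by hand during the merge.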
Python packages (resolved)
app/auth/ and app/management/ stay as-is. They represent distinct domains (auth infrastructure vs wiki lifecycle CRUD). The consolidation merges entry points, not packages. platform_server.py imports from both.
Entanglement (resolved)
Only 2 systemd service files and 3 test files reference auth_server/api_server. No other app code imports from them. Ansible handlers for service restarts need updating. Clean separation, low-risk rename.
Confirmed correct
- ManagementMiddleware passes through /auth/* (lines 162-165)
- Flask-Limiter + WSGIRateLimiter coexist without conflict
- Static file serving has no conflicts
