---
category: design
tags: [architecture, infrastructure, refactoring]
last_updated: 2026-03-19
confidence: high
---

# Server Consolidation: Merge auth_server + api_server

Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate.

## Motivation

auth_server and api_server are tightly coupled:
- Same database (robot.db)
- Same models (UserModel, WikiModel)
- Same signing keys (RSA for JWT, EC for ATProto client)
- Same cookie (platform_token on .robot.wtf)
- Same user identity model (DID-based)

The separation creates concrete problems:
- **E2E testing**: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this.
- **Operational overhead**: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service.
- **Code duplication**: Both apps call `_load_keys()`, `get_connection()`, `init_schema()` independently. Both have their own rate limiting setup.

## Current Architecture

```
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/*          → robot-auth (Gunicorn, port 8003, auth_server.py)
├─ robot.wtf/app/*           → robot-api  (Gunicorn, port 8002, api_server.py)
├─ robot.wtf/api/*           → robot-api  (port 8002)
├─ {slug}.robot.wtf/mcp      → robot-mcp  (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/*        → robot-otterwiki (port 8000)
```

Four processes: three WSGI entry points (`auth_server:application`, `api_server:application`, `wsgi:application`) plus the MCP sidecar.

## Target Architecture

```
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/*          → robot-platform (Gunicorn, port 8002, platform_server.py)
├─ robot.wtf/app/*           → robot-platform (port 8002)
├─ robot.wtf/api/*           → robot-platform (port 8002)
├─ {slug}.robot.wtf/mcp      → robot-mcp  (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/*        → robot-otterwiki (port 8000)
```

Three processes: two WSGI entry points plus the MCP sidecar. Caddy routing keeps the same shape: `/auth/*` and `/app/*` already route to the same IP on different ports, so pointing `/auth/*` at port 8002 is a one-line edit.

## What Changes

### New: `app/platform_server.py`

Single Flask app factory that combines auth and management routes:

```python
def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None):
    app = Flask(__name__, template_folder="templates")
    # templates/auth/ — login.html, consent.html, error.html, base.html
    # templates/management/ — layout.html, wiki_create.html, etc.
    
    # Shared setup: secret key, keys, DB, models, rate limiter
    ...
    
    # Auth routes (/auth/*)
    _register_auth_routes(app, platform_jwt, client_secret_jwk, ...)
    
    # Management UI routes (/app/*)
    _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...)
    
    # Management API routes (/api/*)
    _register_management_api_routes(app, wiki_model, user_model, ...)
    
    # Well-known routes
    _register_wellknown_routes(app, ...)
    
    return app
```

The route registration functions extract the existing route definitions from `auth_server.py` and `api_server.py` into callable functions that take a Flask app and shared dependencies as arguments.

### Removed
- `app/auth_server.py` — routes moved to platform_server.py
- `ansible/roles/deploy/templates/robot-auth.service.j2` — systemd service removed
- `ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2` — Gunicorn config removed

### Modified
- `app/api_server.py` → renamed/merged into `platform_server.py`
- `ansible/roles/deploy/templates/Caddyfile.j2` — remove auth port, route `/auth/*` to port 8002
- `ansible/roles/deploy/tasks/main.yml` — remove auth service deployment
- `app/auth/templates/` — move to `app/templates/auth/`
- `app/management/templates/` — move to `app/templates/management/`

### Unchanged
- `app/wsgi.py` — otterwiki entry point, completely independent
- `app/resolver.py` — TenantResolver wraps otterwiki, not the platform service
- `app/management/routes.py` — ManagementMiddleware still wraps the platform Flask app
- All auth logic — unchanged, just relocated
- All management logic — unchanged
- Database schema — unchanged
- MCP sidecar — unchanged

## ManagementMiddleware Handling

Currently, `api_server.py` wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts `/api/*` for rate limiting and auth). Auth routes don't go through this middleware.

After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle — `/auth/*` routes will pass through to Flask unchanged. No middleware changes needed.

Verify by reading ManagementMiddleware's `__call__` — it only intercepts paths matching its configured prefixes (`/api/`). All other paths pass to the wrapped app.
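
The pass-through behaviour described above can be sketched with a plain WSGI wrapper. This is a minimal stand-in, not the real ManagementMiddleware: the class name `PrefixMiddleware`, the `handle_api` method, and the `flask_stand_in` app are hypothetical, and only the prefix check mirrors what the text says the middleware does.

```python
class PrefixMiddleware:
    """Hypothetical stand-in for ManagementMiddleware's dispatch logic."""

    def __init__(self, wrapped_app, prefixes=("/api/",)):
        self.wrapped_app = wrapped_app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Only configured prefixes are intercepted at the WSGI level
            return self.handle_api(environ, start_response)
        # Everything else (e.g. /auth/*, /app/*) goes to the wrapped app
        return self.wrapped_app(environ, start_response)

    def handle_api(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b'{"handled_by": "middleware"}']


def flask_stand_in(environ, start_response):
    """Placeholder for the combined Flask app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by wrapped app"]


def call(app, path):
    """Invoke a WSGI app for one path and return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body


application = PrefixMiddleware(flask_stand_in)
```

Calling `call(application, "/auth/login")` reaches the wrapped app, while `call(application, "/api/wikis")` is answered by the middleware. A unit test along these lines against the real middleware would confirm the claim without reading its source.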

## Template Directory Structure

Before:
```
app/auth/templates/     — base.html, login.html, consent.html, error.html
app/management/templates/ — layout.html, wiki_create.html, wiki_settings.html, account.html
```

After:
```
app/templates/
  auth/        — base.html, login.html, consent.html, error.html
  management/  — layout.html, wiki_create.html, wiki_settings.html, account.html
```

Template references in route code change from `render_template("login.html")` to `render_template("auth/login.html")`. Mechanical find-and-replace.
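
A minimal sketch of that find-and-replace, assuming every call passes the template name as a string literal in the first argument. The helper name `prefix_render_calls` is hypothetical. Names that already contain a subdirectory are left alone, so the rewrite is safe to run twice.

```python
import re

# Matches render_template("name.html" or render_template('name.html'
# where the name has no subdirectory component yet.
RENDER_CALL = re.compile(r"""render_template\(\s*(["'])([^"'/]+\.html)\1""")


def prefix_render_calls(source, subdir):
    """Prefix bare template names in render_template() calls with subdir/."""

    def add_prefix(match):
        quote, name = match.group(1), match.group(2)
        return f"render_template({quote}{subdir}/{name}{quote}"

    return RENDER_CALL.sub(add_prefix, source)
```

Applied to auth route code with `subdir="auth"`, a line such as `render_template("login.html", error=err)` becomes `render_template("auth/login.html", error=err)`. Calls that build the template name dynamically would not be caught and need a manual pass.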

## Database Connection Strategy

Both apps currently call `get_connection()`, which opens a new SQLite connection on each call. The consolidated app opens one connection per request, stored on Flask's `g` object and closed in `teardown_appcontext`.

The auth_server pattern (`_get_db()` storing in `g._database`) is cleaner than api_server's approach (connection at startup). Adopt the per-request pattern throughout.
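
The per-request lifecycle can be illustrated without Flask. The sketch below is framework-free and uses a hypothetical `request_connection` helper. In the real app the open step would sit behind `_get_db()` and the close step in a `teardown_appcontext` handler, as described above.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def request_connection(db_path):
    """Open one SQLite connection for a unit of work and always close it."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows addressable by column name
    try:
        yield conn
    finally:
        # Mirrors what a teardown handler would do at the end of a request
        conn.close()
```

Because each connection is created and closed on the thread that handles the request, nothing is shared across threads. That property is what the two-server test fixtures could not provide.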

## Rate Limiting

- Auth routes: Flask-Limiter with per-route decorators (`@limiter.limit("1/minute")`)
- Management API routes: WSGIRateLimiter singleton in ManagementMiddleware

Both can coexist — Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict.

## Session and Cookie

One Flask app = one `secret_key` = one session. The platform_token cookie is set with `domain=COOKIE_DOMAIN`, which is the same regardless of which routes set it. No changes needed.

## E2E Testing Impact

The consolidation directly unblocks E2E testing:
- One server fixture instead of two
- Cookies work naturally (same origin)
- No SQLite cross-thread issues
- `authenticated_page` fixture just logs in and the cookie works for all routes
- The 11 planned E2E tests become straightforward

## Implementation Sequence

1. Create `app/platform_server.py` with combined app factory
2. Move templates to `app/templates/{auth,management}/`
3. Update `render_template()` calls with subdirectory prefixes
4. Verify all existing unit tests pass against the new structure
5. Update Ansible: remove auth service, update Caddy routes
6. Deploy and verify
7. Remove old `auth_server.py` and `api_server.py`
8. Resume E2E test implementation with simplified fixtures

## Risks and Review Findings

Plan reviewed 2026-03-19. Core approach confirmed sound. Key findings:

### Template migration (important)

Not just `render_template()` calls — the `{% extends %}` directives inside templates must also update:
- Auth templates: `{% extends 'base.html' %}` → `{% extends 'auth/base.html' %}`
- Management templates: `{% extends "layout.html" %}` → `{% extends "management/layout.html" %}`

6 auth + 4 management template files affected.
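
One way to confirm the migration is complete is to scan the merged template tree for `{% extends %}` targets that still lack a subdirectory. The sketch below is a hypothetical checker, not part of the repo. It assumes templates use a string literal in `extends` and have an `.html` suffix.

```python
import re
from pathlib import Path

# Captures the target of {% extends "..." %} or {%- extends '...' %}
EXTENDS = re.compile(r"""\{%-?\s*extends\s+["']([^"']+)["']""")


def unprefixed_extends(templates_root):
    """Return (relative path, extends target) pairs whose target has no subdirectory."""
    root = Path(templates_root)
    findings = []
    for path in sorted(root.rglob("*.html")):
        for target in EXTENDS.findall(path.read_text(encoding="utf-8")):
            if "/" not in target:
                findings.append((path.relative_to(root).as_posix(), target))
    return findings
```

Running `unprefixed_extends("app/templates")` after the move should return an empty list. Any entry it reports is a template that would fail to resolve its parent at render time.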

### App factory interface (resolved)

Single `create_app()` factory returns the fully-wrapped WSGI app. Module-level `application = create_app()` for Gunicorn (`app.platform_server:application`).

- Auth server applies ProxyFix inside Flask; API server applies it outermost. Consolidated app: apply once, outermost (api_server pattern).
- Tests call `create_app(db_path=..., ...)` with overrides as they do today.
- Actual import callsites across tests: ~17 (not ~46 as initially estimated). 3 test files, straightforward find-and-replace.

### Error handlers (resolved)

The 429 handler is identical in both apps (content-negotiated JSON/HTML). The 400/500 handlers exist only in auth_server and render HTML via `error.html`. No conflict: API routes go through ManagementMiddleware (raw WSGI, handles its own errors), so Flask error handlers only fire for `/auth/*` and `/app/*` routes. Keep one set of handlers, move `error.html` to `app/templates/error.html`.

### Deployment strategy (important)

Correct zero-downtime approach:
1. Deploy `robot-platform` on port 8002 (new service)
2. Keep `robot-auth` on 8003 running temporarily
3. Verify platform handles `/app/*` and `/api/*`
4. Update Caddy to route `/auth/*` to 8002
5. Stop and disable `robot-auth`
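
The verification in step 3 could be scripted as a small probe run on the host before the Caddy change. This is a sketch only: the function name `probe` is hypothetical, and the paths passed to it are up to the operator, since this document does not define health endpoints for `robot-platform`.

```python
import urllib.error
import urllib.request

# Ignore any configured HTTP proxy: the target is a local port
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(base_url, paths, timeout=5.0):
    """Return {path: HTTP status code, or None if the service is unreachable}."""
    results = {}
    for path in paths:
        try:
            with _OPENER.open(base_url + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as exc:
            # The service answered, but with an error status
            results[path] = exc.code
        except (urllib.error.URLError, OSError):
            # Nothing listening, connection refused, or timed out
            results[path] = None
    return results
```

A call such as `probe("http://127.0.0.1:8002", [...])` with one path per route family gives a quick go/no-go signal: a `None` value means the service is not reachable and the cutover should wait.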

### Caddyfile management (resolved)

The Caddyfile lives on proxy-1 and is managed separately from this repo. During cutover, the proxy-1 agent needs one change: route `robot.wtf/auth/*` to port 8002 instead of 8003. Timing: after `robot-platform` is verified on 8002, before `robot-auth` is stopped.

### Hardcoded service references (important)

- `admin_stats()` hardcodes `["robot-otterwiki", "robot-mcp", "robot-api", "robot-auth"]` for systemctl/journalctl — must update
- `smoke-test.sh` hardcodes port 8003 checks — must update
- `healthcheck` role defaults reference port 8003 — must update
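
For the `admin_stats()` item, a sketch of what the updated list might look like. It assumes the consolidated unit is named `robot-platform`, as in the target architecture. The helper names and the exact `systemctl`/`journalctl` arguments are illustrative, since the real function body is not shown in this document.

```python
# Assumed post-consolidation unit names: robot-api and robot-auth
# are replaced by a single robot-platform unit.
SERVICES = ["robot-otterwiki", "robot-mcp", "robot-platform"]


def status_command(service):
    """argv for checking whether a unit is active."""
    return ["systemctl", "is-active", service]


def recent_logs_command(service, lines=20):
    """argv for fetching the last few journal lines of a unit."""
    return ["journalctl", "-u", service, "-n", str(lines), "--no-pager"]
```

Keeping the list in one module-level constant means a future rename touches one line rather than every call site.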

### Environment variables (important)

`CLIENT_JWK_PATH` is only in `robot-auth.service` env. Must propagate to consolidated service.

### ProxyFix (minor)

Apply once at outermost WSGI layer. Don't accidentally apply twice.

### Python packages (resolved)

`app/auth/` and `app/management/` stay as-is. They represent distinct domains (auth infrastructure vs wiki lifecycle CRUD). The consolidation merges entry points, not packages. `platform_server.py` imports from both.

### Entanglement (resolved)

Only 2 systemd service files and 3 test files reference `auth_server`/`api_server`. No other app code imports from them. Ansible handlers for service restarts need updating. Clean separation, low-risk rename.

### Confirmed correct

- ManagementMiddleware passes through `/auth/*` (lines 162-165)
- Flask-Limiter + WSGIRateLimiter coexist without conflict
- Static file serving has no conflicts