752837 Claude (MCP) 2026-03-19 21:31:11
[mcp] Add server consolidation design document
---
category: design
tags: [architecture, infrastructure, refactoring]
last_updated: 2026-03-19
confidence: high
---

# Server Consolidation: Merge auth_server + api_server

Merge the auth service (port 8003) and management/API service (port 8002) into a single Flask app on port 8002. Otterwiki (port 8000) and MCP sidecar (port 8001) remain separate.

## Motivation

auth_server and api_server are tightly coupled:
- Same database (robot.db)
- Same models (UserModel, WikiModel)
- Same signing keys (RSA for JWT, EC for ATProto client)
- Same cookie (platform_token on .robot.wtf)
- Same user identity model (DID-based)

The separation creates concrete problems:
- **E2E testing**: Cookies set on port 8003 aren't sent to port 8002. Standing up two Flask servers in test fixtures requires cross-thread SQLite coordination. An implementation agent burned its entire context trying to solve this.
- **Operational overhead**: Two systemd services, two Gunicorn configs, two health checks for what is logically one "platform" service.
- **Code duplication**: Both apps call `_load_keys()`, `get_connection()`, `init_schema()` independently. Both have their own rate limiting setup.

## Current Architecture

```
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/* → robot-auth (Gunicorn, port 8003, auth_server.py)
├─ robot.wtf/app/* → robot-api (Gunicorn, port 8002, api_server.py)
├─ robot.wtf/api/* → robot-api (port 8002)
├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/* → robot-otterwiki (port 8000)
```

Four processes, three entry points (`auth_server:application`, `api_server:application`, `wsgi:application`), plus the MCP sidecar.

## Target Architecture

```
Caddy (TLS, port 80/443)
├─ robot.wtf/auth/* → robot-platform (Gunicorn, port 8002, platform_server.py)
├─ robot.wtf/app/* → robot-platform (port 8002)
├─ robot.wtf/api/* → robot-platform (port 8002)
├─ {slug}.robot.wtf/mcp → robot-mcp (uvicorn, port 8001)
├─ {slug}.robot.wtf/api/v1/* → robot-otterwiki (Gunicorn, port 8000, wsgi.py)
└─ {slug}.robot.wtf/* → robot-otterwiki (port 8000)
```

Three processes, two entry points. Caddy routing is nearly unchanged: `/auth/*` and `/app/*` already route to the same IP on different ports, so pointing both at the same port is a one-line edit.

## What Changes

### New: `app/platform_server.py`

Single Flask app factory that combines auth and management routes:

```python
from flask import Flask


def create_app(*, db_path=None, client_jwk_path=None, signing_key_path=None):
    app = Flask(__name__, template_folder="templates")
    # templates/auth/: login.html, consent.html, error.html, base.html
    # templates/management/: layout.html, wiki_create.html, etc.

    # Shared setup: secret key, keys, DB, models, rate limiter
    ...

    # Auth routes (/auth/*)
    _register_auth_routes(app, platform_jwt, client_secret_jwk, ...)

    # Management UI routes (/app/*)
    _register_management_ui_routes(app, platform_jwt, wiki_model, user_model, ...)

    # Management API routes (/api/*)
    _register_management_api_routes(app, wiki_model, user_model, ...)

    # Well-known routes
    _register_wellknown_routes(app, ...)

    return app
```

The route registration functions extract the existing route definitions from `auth_server.py` and `api_server.py` into callable functions that take a Flask app and shared dependencies as arguments.
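As a rough illustration of that registration pattern, the sketch below attaches two groups of routes to one app. The keyword arguments (`issuer`, `wiki_names`) and the `/auth/health` and `/api/wikis` routes are hypothetical placeholders, not the real signatures or endpoints; the actual functions take `platform_jwt`, `wiki_model`, and so on.

```python
from flask import Flask, jsonify


def _register_auth_routes(app, *, issuer):
    """Attach auth routes to an existing app (hypothetical signature)."""

    @app.route("/auth/health")  # hypothetical route, for illustration only
    def auth_health():
        return jsonify(service="auth", issuer=issuer)


def _register_management_api_routes(app, *, wiki_names):
    """Attach management API routes to the same app (hypothetical signature)."""

    @app.route("/api/wikis")  # hypothetical route, for illustration only
    def list_wikis():
        return jsonify(wikis=list(wiki_names))


def create_app(*, issuer="https://example.invalid", wiki_names=()):
    # One Flask app; each registration function closes over the shared
    # dependencies it is handed instead of reading module-level globals.
    app = Flask(__name__)
    _register_auth_routes(app, issuer=issuer)
    _register_management_api_routes(app, wiki_names=wiki_names)
    return app
```

Because dependencies arrive as arguments, a test can build an app with overrides and exercise both route groups through a single `app.test_client()`.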

### Removed
- `app/auth_server.py`: routes moved to platform_server.py
- `ansible/roles/deploy/templates/robot-auth.service.j2`: systemd service removed
- `ansible/roles/deploy/templates/gunicorn-auth.conf.py.j2`: Gunicorn config removed

### Modified
- `app/api_server.py` → renamed/merged into `platform_server.py`
- `ansible/roles/deploy/templates/Caddyfile.j2`: remove auth port, route `/auth/*` to port 8002
- `ansible/roles/deploy/tasks/main.yml`: remove auth service deployment
- `app/auth/templates/`: move to `app/templates/auth/`
- `app/management/templates/`: move to `app/templates/management/`

### Unchanged
- `app/wsgi.py`: otterwiki entry point, completely independent
- `app/resolver.py`: TenantResolver wraps otterwiki, not the platform service
- `app/management/routes.py`: ManagementMiddleware still wraps the platform Flask app
- All auth logic: unchanged, just relocated
- All management logic: unchanged
- Database schema: unchanged
- MCP sidecar: unchanged

## ManagementMiddleware Handling

Currently, `api_server.py` wraps the Flask app with ManagementMiddleware (a WSGI middleware that intercepts `/api/*` for rate limiting and auth). Auth routes don't go through this middleware.

After consolidation, ManagementMiddleware still wraps the combined Flask app. It already passes through paths it doesn't handle, so `/auth/*` routes will reach Flask unchanged. No middleware changes needed.

Verify by reading ManagementMiddleware's `__call__`: it only intercepts paths matching its configured prefixes (`/api/`). All other paths pass to the wrapped app.
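The pass-through behaviour described here can be sketched with a stdlib-only WSGI middleware. `PrefixMiddleware`, `inner_app`, and `call` are hypothetical stand-ins written for this illustration; the real ManagementMiddleware also performs rate limiting and auth, which are omitted.

```python
class PrefixMiddleware:
    """Hypothetical stand-in for ManagementMiddleware's dispatch logic."""

    def __init__(self, app, prefixes=("/api/",)):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefixes):
            # Intercepted: handled here, never reaches the wrapped app.
            return self.handle(environ, start_response)
        # Everything else (e.g. /auth/*, /app/*) passes through untouched.
        return self.app(environ, start_response)

    def handle(self, environ, start_response):
        body = b'{"handled_by": "middleware"}'
        headers = [("Content-Type", "application/json"),
                   ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]


def inner_app(environ, start_response):
    """Stands in for the combined Flask app."""
    body = b"flask"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call(app, path):
    """Invoke a WSGI app for one path and return the response body."""
    def start_response(status, headers):
        return None
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    return b"".join(app(environ, start_response))
```

A check along these lines against the real middleware would confirm that `/auth/login` reaches Flask while `/api/...` is intercepted.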

## Template Directory Structure

Before:
```
app/auth/templates/        base.html, login.html, consent.html, error.html
app/management/templates/  layout.html, wiki_create.html, wiki_settings.html, account.html
```

After:
```
app/templates/
  auth/        base.html, login.html, consent.html, error.html
  management/  layout.html, wiki_create.html, wiki_settings.html, account.html
```

Template references in route code change from `render_template("login.html")` to `render_template("auth/login.html")`. Mechanical find-and-replace.

## Database Connection Strategy

Both apps currently obtain connections through `get_connection()`, which opens a new SQLite connection per call. The consolidated app scopes this to one connection per request, stored on Flask's `g` object and closed in `teardown_appcontext`.

The auth_server pattern (`_get_db()` storing in `g._database`) is cleaner than api_server's approach (connection at startup). Adopt the per-request pattern throughout.
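A minimal sketch of the per-request pattern, assuming plain `sqlite3` and Flask's application context. The `g._database` attribute follows the naming mentioned above; `get_db`, `close_db`, the `/ping` route, and the `:memory:` default are hypothetical and exist only to make the example self-contained.

```python
import sqlite3

from flask import Flask, g


def create_app(db_path=":memory:"):
    app = Flask(__name__)

    def get_db():
        # Open the connection lazily, at most once per application context.
        if "_database" not in g:
            g._database = sqlite3.connect(db_path)
        return g._database

    @app.teardown_appcontext
    def close_db(exc):
        # Runs when the context is torn down, even if the view raised.
        db = g.pop("_database", None)
        if db is not None:
            db.close()

    @app.route("/ping")  # hypothetical route, for illustration only
    def ping():
        row = get_db().execute("SELECT 1").fetchone()
        return {"ok": row[0] == 1}

    return app
```

Since each connection is created and closed inside the thread that serves the request, this avoids sharing a SQLite connection across threads.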

## Rate Limiting

- Auth routes: Flask-Limiter with per-route decorators (`@limiter.limit("1/minute")`)
- Management API routes: WSGIRateLimiter singleton in ManagementMiddleware

Both can coexist: Flask-Limiter operates at the Flask level, WSGIRateLimiter at the WSGI level. No conflict.

## Session and Cookie

One Flask app = one `secret_key` = one session. The platform_token cookie is set with `domain=COOKIE_DOMAIN`, which is the same regardless of which routes set it. No changes needed.

## E2E Testing Impact

The consolidation directly unblocks E2E testing:
- One server fixture instead of two
- Cookies work naturally (same origin)
- No SQLite cross-thread issues
- `authenticated_page` fixture just logs in and the cookie works for all routes
- The 11 planned E2E tests become straightforward

## Implementation Sequence

1. Create `app/platform_server.py` with combined app factory
2. Move templates to `app/templates/{auth,management}/`
3. Update `render_template()` calls with subdirectory prefixes
4. Verify all existing unit tests pass against the new structure
5. Update Ansible: remove auth service, update Caddy routes
6. Deploy and verify
7. Remove old `auth_server.py` and `api_server.py`
8. Resume E2E test implementation with simplified fixtures

## Risks and Review Findings

Plan reviewed 2026-03-19. Core approach confirmed sound. Key findings:

### Template migration (important)

Not just `render_template()` calls: the `{% extends %}` directives inside templates must also be updated:
- Auth templates: `{% extends 'base.html' %}` → `{% extends 'auth/base.html' %}`
- Management templates: `{% extends "layout.html" %}` → `{% extends "management/layout.html" %}`

6 auth + 4 management template files affected.
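One way to script the `{% extends %}` rewrite is a small regex pass over each template's source. This is a sketch under assumptions: `prefix_extends` and `EXTENDS_RE` are hypothetical names, and the pattern only handles a literal quoted template name with no existing subdirectory, which also keeps a second run from double-prefixing.

```python
import re

# Matches {% extends 'name' %} or {% extends "name" %} where name has no "/".
EXTENDS_RE = re.compile(r"""(\{%-?\s*extends\s+)(['"])([^'"/]+)\2""")


def prefix_extends(source, subdir):
    """Prefix bare template names in {% extends %} with subdir/."""
    def repl(match):
        opener, quote, name = match.group(1), match.group(2), match.group(3)
        return f"{opener}{quote}{subdir}/{name}{quote}"

    return EXTENDS_RE.sub(repl, source)
```

Applying it to each file under `app/templates/auth/` with `subdir="auth"`, and likewise for `management/`, would cover the directives listed above. `{% include %}` or `{% import %}` references, if the templates use any, would need the same treatment.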

### App factory interface (resolved)

Single `create_app()` factory returns the fully-wrapped WSGI app. Module-level `application = create_app()` for Gunicorn (`app.platform_server:application`).

- Auth server applies ProxyFix inside Flask; API server applies it outermost. Consolidated app: apply once, outermost (api_server pattern).
- Tests call `create_app(db_path=..., ...)` with overrides as they do today.
- Actual import callsites across tests: ~17 (not ~46 as initially estimated). 3 test files, straightforward find-and-replace.

### Error handlers (resolved)

The 429 handler is a verbatim copy in both servers (content-negotiated JSON/HTML). The 400/500 handlers only exist in auth_server and render HTML via `error.html`. No conflict: API routes go through ManagementMiddleware (raw WSGI, handles its own errors), so Flask error handlers only fire for `/auth/*` and `/app/*` routes. Keep one set of handlers, move `error.html` to `app/templates/error.html`.

### Deployment strategy (important)

Correct zero-downtime approach:
1. Deploy `robot-platform` on port 8002 (new service)
2. Keep `robot-auth` on 8003 running temporarily
3. Verify platform handles `/app/*` and `/api/*`
4. Update Caddy to route `/auth/*` to 8002
5. Stop and disable `robot-auth`
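The verification in step 3 could be scripted roughly as follows, using only the standard library. The function names, the "5xx or unreachable counts as failure" rule, and any concrete paths passed in are assumptions made for this sketch; the existing `smoke-test.sh` may check different things.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx
    except (urllib.error.URLError, OSError):
        return None


def failing_paths(base_url, paths, fetch=http_status):
    """Return the paths that are unreachable or answer with a 5xx."""
    failures = []
    for path in paths:
        status = fetch(base_url + path)
        if status is None or status >= 500:
            failures.append(path)
    return failures
```

For example, `failing_paths("http://127.0.0.1:8002", ["/app/", "/api/"])` returning an empty list would be one signal that it is safe to proceed to step 4. A 401 or 404 counts as "reachable" here, so this confirms the process is serving, not that every route is correct.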

### Caddyfile management (resolved)

The Caddyfile lives on proxy-1 and is managed separately from this repo. During cutover, the proxy-1 agent needs one change: route `robot.wtf/auth/*` to port 8002 instead of 8003. Timing: after `robot-platform` is verified on 8002, before `robot-auth` is stopped.

### Hardcoded service references (important)

- `admin_stats()` hardcodes `["robot-otterwiki", "robot-mcp", "robot-api", "robot-auth"]` for systemctl/journalctl; must update
- `smoke-test.sh` hardcodes port 8003 checks; must update
- `healthcheck` role defaults reference port 8003; must update

### Environment variables (important)

`CLIENT_JWK_PATH` is only in `robot-auth.service` env. Must propagate to consolidated service.

### ProxyFix (minor)

Apply once at outermost WSGI layer. Don't accidentally apply twice.

### Python packages (resolved)

`app/auth/` and `app/management/` stay as-is. They represent distinct domains (auth infrastructure vs wiki lifecycle CRUD). The consolidation merges entry points, not packages. `platform_server.py` imports from both.

### Entanglement (resolved)

Only 2 systemd service files and 3 test files reference `auth_server`/`api_server`. No other app code imports from them. Ansible handlers for service restarts need updating. Clean separation, low-risk rename.

### Confirmed correct

- ManagementMiddleware passes through `/auth/*` (lines 162-165)
- Flask-Limiter + WSGIRateLimiter coexist without conflict
- Static file serving has no conflicts