Commit f139f8
2026-03-15 00:28:30 Claude (MCP): [mcp] Add VPS architecture design doc: ATProto auth, Caddy, SQLite, OVHcloud
| /dev/null .. Design/VPS_Architecture.md | |
| @@ -0,0 +1,602 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [design, prd, architecture, atproto, vps] |
| + | last_updated: 2026-03-14 |
| + | confidence: medium |
| + | --- |
| + | |
| + | # VPS Architecture (ATProto + OVHcloud) |
| + | |
| + | **Status:** Draft — proposed alternative to the AWS serverless architecture |
| + | **Replaces (if adopted):** [[Design/Platform_Overview]], [[Design/Auth]], [[Design/Operations]] (infrastructure sections) |
| + | **Preserves:** ACL model, permission headers, MCP tools, Otterwiki multi-tenancy middleware, URL structure, semantic search logic, wiki bootstrap template, REST API surface, freemium tiers |
| + | |
| + | --- |
| + | |
| + | ## Why this exists |
| + | |
| + | The AWS serverless architecture described in [[Design/Platform_Overview]] works, but it optimizes for a problem we may not have yet: elastic scale and zero cost at rest. The tradeoff is complexity — VPC endpoints, Mangum adapters, DynamoDB Streams to avoid SQS endpoint costs, Lambda cold starts, EFS mount latency. All of that machinery exists to make Lambda work, not to make the wiki work. |
| + | |
| + | A VPS on an OVHcloud community server for ATProto apps eliminates the hosting bill entirely and replaces the AWS complexity with a conventional deployment: persistent processes, local disk, SQLite, Caddy. The application logic — multi-tenant Otterwiki, MCP tools, semantic search, ACL enforcement — ports over with minimal changes. The middleware we already built for Lambda is WSGI middleware with a Mangum wrapper; removing the wrapper gives us back the WSGI middleware. |
| + | |
| + | The ATProto identity system replaces WorkOS as the auth provider. Users sign in with their Bluesky handle (or any ATProto PDS account). Identity is a DID — portable, user-owned, and philosophically aligned with "your wiki is a git repo you can clone." The target audience (developers and researchers using AI agents) overlaps heavily with the ATProto early-adopter community, and the OVHcloud community server is specifically for ATProto apps. |
| + | |
| + | --- |
| + | |
| + | ## Infrastructure |
| + | |
| + | ### Server |
| + | |
| + | OVHcloud community VPS for ATProto applications. Shared infrastructure, zero cost. The VPS runs Linux with Docker or systemd-managed services. If we ever need to leave the community server, the deployment is portable to any VPS provider (Hetzner, DigitalOcean, Fly.io, or back to AWS on an EC2 instance) — nothing is OVHcloud-specific. |
| + | |
| + | ### Process model |
| + | |
| + | Four persistent processes, managed by systemd or Docker Compose: |
| + | |
| + | ``` |
| + | ┌─────────────────────────────────────────────────────────────────┐ |
| + | │                   Caddy (reverse proxy, TLS)                    │ |
| + | │                      *.{domain} + {domain}                      │ |
| + | │                                                                 │ |
| + | │  Routes:                                                        │ |
| + | │    {slug}.{domain}/mcp        → MCP sidecar     (port 8001)     │ |
| + | │    {slug}.{domain}/api/v1/*   → REST API        (port 8002)     │ |
| + | │    {slug}.{domain}/repo.git/* → Git smart HTTP  (port 8002)     │ |
| + | │    {slug}.{domain}/*          → Otterwiki WSGI  (port 8000)     │ |
| + | │    {domain}/auth/*            → Auth service    (port 8003)     │ |
| + | │    {domain}/api/*             → Management API  (port 8002)     │ |
| + | │    {domain}/app/*             → Static files (SPA)              │ |
| + | │    {domain}                   → Static files (landing page)     │ |
| + | └───────┬───────────┬─────────┬──────────┬────────────────────────┘ |
| + |         │           │         │          │ |
| + |   ┌─────▼────┐ ┌────▼───┐ ┌───▼────┐ ┌───▼────┐ |
| + |   │Otterwiki │ │  MCP   │ │Platform│ │  Auth  │ |
| + |   │   WSGI   │ │sidecar │ │  API   │ │service │ |
| + |   │ Gunicorn │ │FastMCP │ │ Flask  │ │ Flask  │ |
| + |   │  :8000   │ │ :8001  │ │ :8002  │ │ :8003  │ |
| + |   └─────┬────┘ └────┬───┘ └───┬────┘ └───┬────┘ |
| + |         │           │         │          │ |
| + |   ┌─────▼───────────▼─────────▼──────────▼───┐ |
| + |   │             Shared resources             │ |
| + |   │  /srv/wikis/{slug}/repo.git    (git)     │ |
| + |   │  /srv/wikis/{slug}/index.faiss (vectors) │ |
| + |   │  /srv/data/wikibot.db          (SQLite)  │ |
| + |   │  /srv/embeddings/model/        (MiniLM)  │ |
| + |   └──────────────────────────────────────────┘ |
| + | ``` |
| + | |
| + | ### Caddy |
| + | |
| + | Caddy handles TLS termination, automatic Let's Encrypt certificates (including wildcard via DNS challenge), and reverse proxy routing. It replaces API Gateway + CloudFront + ACM. |
| + | |
| + | Wildcard TLS requires the DNS-01 challenge. Caddy supports it through DNS provider plugins (Cloudflare, Route 53, OVHcloud), which are compiled into a custom Caddy binary (e.g. with `xcaddy`). The DNS zone for `{domain}` needs API credentials configured in Caddy. |
| + | |
| + | Caddy's routing is order-sensitive and matcher-based. The Caddyfile structure: |
| + | |
| + | ``` |
| + | {domain} { |
| + |     handle /auth/* { |
| + |         reverse_proxy localhost:8003 |
| + |     } |
| + |     handle /api/* { |
| + |         reverse_proxy localhost:8002 |
| + |     } |
| + |     # handle_path strips the /app prefix, so {path} resolves |
| + |     # relative to the SPA root instead of /srv/static/app/app/... |
| + |     handle_path /app/* { |
| + |         root * /srv/static/app |
| + |         try_files {path} /index.html |
| + |         file_server |
| + |     } |
| + |     handle { |
| + |         root * /srv/static/landing |
| + |         file_server |
| + |     } |
| + | } |
| + | |
| + | *.{domain} { |
| + |     @mcp path /mcp /mcp/* |
| + |     handle @mcp { |
| + |         reverse_proxy localhost:8001 |
| + |     } |
| + | |
| + |     @api path /api/v1/* |
| + |     handle @api { |
| + |         reverse_proxy localhost:8002 |
| + |     } |
| + | |
| + |     @git path /repo.git/* |
| + |     handle @git { |
| + |         reverse_proxy localhost:8002 |
| + |     } |
| + | |
| + |     handle { |
| + |         reverse_proxy localhost:8000 |
| + |     } |
| + | } |
| + | ``` |
| + | |
| + | The `{slug}` is extracted from the `Host` header by the downstream services, not by Caddy. Caddy just routes to the right backend; the backend resolves the tenant. |
| + | |
| + | ### Why not Nginx |
| + | |
| + | Caddy's automatic TLS (including wildcard via DNS challenge) eliminates certbot, cron renewal, and manual certificate management. For a single-operator deployment where the admin might not be around to fix a cert renewal failure, this matters. Nginx is more configurable but requires more maintenance. If we needed fine-grained caching rules or complex rewrite logic, Nginx would be worth the tradeoff. We don't. |
| + | |
| + | --- |
| + | |
| + | ## Authentication |
| + | |
| + | ### Identity model |
| + | |
| + | User identity is an ATProto DID (Decentralized Identifier). A DID is a persistent, portable identifier that survives handle changes and PDS migrations. When a user logs in, we resolve their handle to a DID and store the DID as the primary key. |
| + | |
| + | ``` |
| + | User { |
| + | did: string, // e.g. "did:plc:abc123..." — primary identifier |
| + | handle: string, // e.g. "sderle.bsky.social" — display name, may change |
| + | display_name: string, // from ATProto profile |
| + | avatar_url?: string, // from ATProto profile |
| + | username: string, // platform username, chosen at signup (URL slug) |
| + | created_at: ISO8601, |
| + | wiki_count: number, |
| + | } |
| + | ``` |
| + | |
| + | The `did` is the stable identity. The `handle` is refreshed from the PDS on each login (handles can change). The `username` is the platform-local slug used in URLs — it's chosen at signup and immutable for MVP, just like the current design. |
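For reference, ATProto defines two handle-resolution methods: a DNS TXT record and an HTTPS well-known endpoint. A sketch of just the lookup targets — performing the queries, and the required check that the DID document lists the handle back, is left to the auth service:

```python
def handle_resolution_targets(handle: str) -> dict[str, str]:
    """Where to look up the DID for an ATProto handle (per the handle resolution spec)."""
    handle = handle.lstrip("@").lower()
    return {
        # DNS TXT record whose value looks like "did=did:plc:abc123..."
        "dns_txt": f"_atproto.{handle}",
        # GET here returns the bare DID as text/plain
        "https_well_known": f"https://{handle}/.well-known/atproto-did",
    }
```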
| + | |
| + | ### ATProto OAuth (browser login) |
| + | |
| + | Wikibot is an ATProto OAuth **confidential client**. The flow: |
| + | |
| + | ``` |
| + | 1. User enters their handle (e.g. "sderle.bsky.social") on the login page |
| + | 2. Wikibot resolves the handle to a DID, then resolves the DID to a PDS URL |
| + | 3. Wikibot fetches the PDS's Authorization Server metadata |
| + | (GET {pds}/.well-known/oauth-authorization-server) |
| + | 4. Wikibot sends a Pushed Authorization Request (PAR) to the PDS's AS, |
| + | including PKCE code_challenge and DPoP proof |
| + | 5. User is redirected to their PDS's authorization interface |
| + | 6. User approves the authorization request |
| + | 7. PDS redirects back to {domain}/auth/callback with an authorization code |
| + | 8. Wikibot exchanges the code for tokens (access_token + refresh_token) |
| + | with DPoP binding and client authentication (signed JWT) |
| + | 9. Wikibot uses the access token to fetch the user's profile (DID, handle, |
| + | display name) from their PDS |
| + | 10. Wikibot mints a platform JWT, sets it as an HttpOnly cookie on .{domain} |
| + | 11. Redirect to {domain}/app/ |
| + | ``` |
| + | |
| + | The platform JWT is signed with our own RS256 key (stored on disk, not in Secrets Manager). After step 10, the PDS is not in the runtime path — the platform JWT is self-contained and validated locally. ATProto tokens are stored in the session database for potential future use (e.g., posting to Bluesky on behalf of the user), but they're not needed for wiki operations. |
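The PKCE values in step 4 are plain RFC 7636 S256, independent of anything ATProto-specific; a minimal sketch:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The verifier is held server-side (we are a confidential client) and sent in step 8; the challenge goes in the PAR request in step 4.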
| + | |
| + | ### Reference implementation |
| + | |
| + | Bluesky maintains a Python Flask OAuth demo in `bluesky-social/cookbook/python-oauth-web-app` (CC-0 licensed). It implements the full ATProto OAuth flow as a confidential client using `authlib` for PKCE and DPoP, with `joserfc` for JWT/JWK handling. This is the starting point for our auth service. It handles the hard parts: handle-to-DID resolution, PDS Authorization Server discovery, PAR, DPoP nonce management, and token refresh. |
| + | |
| + | Key libraries from the reference implementation: |
| + | |
| + | - `authlib` — PKCE, code challenge, general OAuth utilities |
| + | - `joserfc` — JWK generation, JWT signing/verification, DPoP proof creation |
| + | - `requests` — HTTP client for PDS communication (the demo includes a hardened HTTP client with SSRF mitigations) |
| + | |
| + | ### MCP OAuth (Claude.ai) |
| + | |
| + | This is the most architecturally significant auth flow. Claude.ai's MCP client implements standard OAuth 2.1 with Dynamic Client Registration (DCR). It discovers the Authorization Server by fetching `/.well-known/oauth-protected-resource` from the MCP endpoint. The AS must support DCR, PKCE, and standard token endpoints. |
| + | |
| + | ATProto's OAuth profile is not directly compatible with this — ATProto uses per-user Authorization Servers (each user's PDS), whereas Claude.ai expects a single AS URL from the resource metadata endpoint. |
| + | |
| + | **Solution: wikibot runs its own OAuth 2.1 Authorization Server for MCP.** |
| + | |
| + | ``` |
| + | 1. Claude.ai connects to https://{slug}.{domain}/mcp |
| + | 2. Gets 401, fetches /.well-known/oauth-protected-resource |
| + | 3. Discovers wikibot's AS at https://{domain}/auth/oauth |
| + | 4. Performs Dynamic Client Registration at {domain}/auth/oauth/register |
| + | 5. Redirects user to {domain}/auth/oauth/authorize |
| + | 6. User sees wikibot's consent page: |
| + | - If already logged in (platform JWT cookie): "Authorize Claude to access {wiki}?" |
| + | - If not logged in: "Sign in with Bluesky" → ATProto OAuth flow → then consent |
| + | 7. User approves, wikibot issues authorization code |
| + | 8. Claude.ai exchanges code for access token at {domain}/auth/oauth/token |
| + | 9. Claude.ai uses access token to make MCP requests |
| + | 10. MCP sidecar validates token against wikibot's JWKS |
| + | ``` |
| + | |
| + | Wikibot's MCP OAuth AS is a thin layer. It delegates authentication to ATProto (step 6) and handles authorization itself (does this user have access to this wiki?). The token it issues is a JWT containing the user's DID and the authorized wiki slug, signed with our RS256 key. |
| + | |
| + | Required OAuth 2.1 AS endpoints: |
| + | |
| + | | Endpoint | Purpose | |
| + | |----------|---------| |
| + | | `/.well-known/oauth-authorization-server` | AS metadata (issuer, endpoints, supported grants) | |
| + | | `/auth/oauth/register` | Dynamic Client Registration (RFC 7591) | |
| + | | `/auth/oauth/authorize` | Authorization endpoint (consent page) | |
| + | | `/auth/oauth/token` | Token endpoint (code exchange, refresh) | |
| + | | `/.well-known/jwks.json` | Public key for token validation | |
| + | |
| + | These can be implemented with `authlib`'s server components or hand-rolled (the spec surface is small — DCR, authorization code grant with PKCE, token issuance, JWKS). |
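A sketch of the AS metadata document (RFC 8414) the first endpoint would serve. The paths match the table above; the grant, scope, and auth-method values are assumptions to verify against Claude.ai's actual client behavior:

```python
def as_metadata(domain: str) -> dict:
    """Build the /.well-known/oauth-authorization-server response body."""
    issuer = f"https://{domain}"
    return {
        "issuer": issuer,
        "authorization_endpoint": f"{issuer}/auth/oauth/authorize",
        "token_endpoint": f"{issuer}/auth/oauth/token",
        "registration_endpoint": f"{issuer}/auth/oauth/register",
        "jwks_uri": f"{issuer}/.well-known/jwks.json",
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code", "refresh_token"],
        "code_challenge_methods_supported": ["S256"],
        "scopes_supported": ["wiki:read", "wiki:write"],
        "token_endpoint_auth_methods_supported": ["none", "client_secret_basic"],
    }
```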
| + | |
| + | ### MCP protected resource metadata |
| + | |
| + | Each wiki's MCP endpoint serves its own resource metadata: |
| + | |
| + | ```json |
| + | // GET https://{slug}.{domain}/.well-known/oauth-protected-resource |
| + | { |
| + | "resource": "https://{slug}.{domain}/mcp", |
| + | "authorization_servers": ["https://{domain}/auth/oauth"], |
| + | "scopes_supported": ["wiki:read", "wiki:write"] |
| + | } |
| + | ``` |
| + | |
| + | All wikis point to the same AS. The AS knows which wiki is being authorized because the `resource` parameter (RFC 8707) in the authorization request carries the wiki's MCP URL. |
| + | |
| + | ### Bearer tokens (Claude Code / API) |
| + | |
| + | Unchanged from the current design. Each wiki gets a bearer token at creation time, stored as a bcrypt hash in the database. The user sees the token once. Claude Code usage: |
| + | |
| + | ```bash |
| + | claude mcp add {slug} \ |
| + | --transport http \ |
| + | --url https://{slug}.{domain}/mcp \ |
| + | --header "Authorization: Bearer YOUR_TOKEN" |
| + | ``` |
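Token issuance and verification is a few lines. The design calls for bcrypt; the sketch below substitutes stdlib `hashlib.scrypt` so it runs without third-party packages — swap in `bcrypt.hashpw`/`bcrypt.checkpw` in production. The `wk_` prefix is an illustrative convention, not from the design:

```python
import hashlib
import hmac
import os
import secrets

def issue_token() -> tuple[str, bytes, bytes]:
    """Generate a bearer token; return (token, salt, digest). Store salt+digest only."""
    token = "wk_" + secrets.token_urlsafe(32)  # shown to the user exactly once
    salt = os.urandom(16)
    digest = hashlib.scrypt(token.encode(), salt=salt, n=2**14, r=8, p=1)
    return token, salt, digest

def verify_token(presented: str, salt: bytes, digest: bytes) -> bool:
    """Constant-time comparison against the stored hash."""
    candidate = hashlib.scrypt(presented.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```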
| + | |
| + | ### Cross-subdomain auth |
| + | |
| + | Same approach as [[Design/Frontend]]: platform JWT stored as an `HttpOnly`, `Secure`, `SameSite=Lax` cookie on `.{domain}`. Every request to any subdomain includes the cookie. The Otterwiki middleware and MCP sidecar both validate JWTs using the same public key. |
| + | |
| + | ### Auth convergence |
| + | |
| + | All three paths converge on the same identity and the same ACL check: |
| + | |
| + | ``` |
| + | Browser → ATProto OAuth → platform JWT (cookie) → resolve DID → ACL check |
| + | Claude.ai → MCP OAuth 2.1 → MCP access token (JWT) → resolve DID → ACL check |
| + | Claude Code → Bearer token → hash lookup in DB → resolve user → ACL check |
| + | |
| + | All paths → middleware → sets Otterwiki proxy headers (or authorizes MCP/API request) |
| + | ``` |
| + | |
| + | ### Migration off ATProto |
| + | |
| + | We store the DID as the primary user identifier, not the handle or PDS URL. If ATProto auth needs to be replaced, the migration path is: |
| + | |
| + | - Add alternative OAuth providers (Google, GitHub) alongside ATProto |
| + | - Link new provider identities to existing DIDs via an `identity_links` table |
| + | - Existing users continue to work; new users can sign up with either method |
| + | |
| + | This is simpler than the WorkOS migration path in the original design because we already own the JWT-issuing layer — we're not migrating off a third-party token issuer. |
| + | |
| + | --- |
| + | |
| + | ## Data Model |
| + | |
| + | ### SQLite replaces DynamoDB |
| + | |
| + | The dataset is small even at 1000 users. SQLite on local disk is simpler, faster, and free. The application layer uses SQLAlchemy (or raw `sqlite3` — the schema is simple enough). If the deployment ever needs Postgres, the migration is straightforward. |
| + | |
| + | The SQLite database lives at `/srv/data/wikibot.db`. Write concurrency is handled by SQLite's WAL mode, which supports concurrent reads with serialized writes. For a wiki platform where writes are infrequent relative to reads, this is more than adequate. |
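A minimal sketch of how each service might open the shared database; the pragma choices beyond WAL are assumptions, not settled decisions:

```python
import sqlite3

def open_db(path: str = "/srv/data/wikibot.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5s on a write lock
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent readers, one writer
    conn.execute("PRAGMA synchronous=NORMAL")  # safe with WAL, fewer fsyncs
    conn.execute("PRAGMA foreign_keys=ON")     # SQLite defaults this off
    conn.row_factory = sqlite3.Row
    return conn
```

WAL mode is persistent per database file, but setting it on every connection is harmless and keeps the services self-sufficient.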
| + | |
| + | ### Tables |
| + | |
| + | ```sql |
| + | CREATE TABLE users ( |
| + | did TEXT PRIMARY KEY, -- ATProto DID |
| + | handle TEXT NOT NULL, -- ATProto handle (may change) |
| + | display_name TEXT, |
| + | avatar_url TEXT, |
| + | username TEXT UNIQUE NOT NULL, -- platform slug, immutable |
| + | created_at TEXT NOT NULL, -- ISO8601 |
| + | wiki_count INTEGER DEFAULT 0 |
| + | ); |
| + | |
| + | CREATE TABLE wikis ( |
| + | slug TEXT PRIMARY KEY, -- globally unique, URL slug |
| + | owner_did TEXT NOT NULL REFERENCES users(did), |
| + | display_name TEXT NOT NULL, |
| + | repo_path TEXT NOT NULL, -- /srv/wikis/{slug}/repo.git |
| + | mcp_token_hash TEXT NOT NULL, -- bcrypt hash |
| + | is_public INTEGER DEFAULT 0, |
| + | is_paid INTEGER DEFAULT 0, |
| + | payment_status TEXT DEFAULT 'free', -- 'free' | 'active' | 'lapsed' |
| + | created_at TEXT NOT NULL, |
| + | last_accessed TEXT NOT NULL, |
| + | page_count INTEGER DEFAULT 0 |
| + | ); |
| + | |
| + | CREATE TABLE acls ( |
| + | wiki_slug TEXT NOT NULL REFERENCES wikis(slug), |
| + | grantee_did TEXT NOT NULL REFERENCES users(did), |
| + | role TEXT NOT NULL, -- 'owner' | 'editor' | 'viewer' |
| + | granted_by TEXT NOT NULL, |
| + | granted_at TEXT NOT NULL, |
| + | PRIMARY KEY (wiki_slug, grantee_did) |
| + | ); |
| + | |
| + | CREATE TABLE oauth_sessions ( |
| + | id TEXT PRIMARY KEY, -- session ID |
| + | user_did TEXT NOT NULL REFERENCES users(did), |
| + | dpop_private_jwk TEXT NOT NULL, -- DPoP key (encrypted at rest) |
| + | access_token TEXT, |
| + | refresh_token TEXT, |
| + | token_expires_at TEXT, |
| + | created_at TEXT NOT NULL |
| + | ); |
| + | |
| + | CREATE TABLE mcp_oauth_clients ( |
| + | client_id TEXT PRIMARY KEY, -- DCR-issued client ID |
| + | client_name TEXT, |
| + | redirect_uris TEXT NOT NULL, -- JSON array |
| + | client_secret_hash TEXT, -- for confidential clients |
| + | created_at TEXT NOT NULL |
| + | ); |
| + | |
| + | CREATE TABLE reindex_queue ( |
| + | wiki_slug TEXT NOT NULL, |
| + | page_path TEXT NOT NULL, |
| + | action TEXT NOT NULL, -- 'upsert' | 'delete' |
| + | queued_at TEXT NOT NULL, |
| + | PRIMARY KEY (wiki_slug, page_path) |
| + | ); |
| + | ``` |
| + | |
| + | ### Storage layout |
| + | |
| + | ``` |
| + | /srv/ |
| + | wikis/ |
| + | {slug}/ |
| + | repo.git/ # bare git repo |
| + | index.faiss # FAISS vector index |
| + | embeddings.json # page_path → vector mapping |
| + | data/ |
| + | wikibot.db # SQLite database |
| + | signing_key.pem # RS256 private key for JWT signing |
| + | signing_key.pub # RS256 public key |
| + | client_jwk.json # ATProto OAuth confidential client JWK (private) |
| + | client_jwk_pub.json # ATProto OAuth client JWK (public, served at client_id URL) |
| + | static/ |
| + | landing/ # landing page HTML/CSS/JS |
| + | app/ # management SPA |
| + | embeddings/ |
| + | model/ # all-MiniLM-L6-v2 model files |
| + | backups/ # local backup staging |
| + | ``` |
| + | |
| + | --- |
| + | |
| + | ## Compute |
| + | |
| + | ### Otterwiki (WSGI) |
| + | |
| + | Otterwiki runs as a persistent Gunicorn process. The multi-tenant middleware we built for Lambda ports back to WSGI by removing the Mangum wrapper. The middleware: |
| + | |
| + | 1. Extracts the wiki slug from the `Host` header |
| + | 2. Looks up the wiki in SQLite |
| + | 3. Resolves the user from the platform JWT (cookie) or bearer token |
| + | 4. Checks ACL permissions |
| + | 5. Sets Otterwiki proxy headers (`x-otterwiki-email`, `x-otterwiki-name`, `x-otterwiki-permissions`) |
| + | 6. Swaps Otterwiki's config to point at the correct repo path |
| + | 7. Delegates to Otterwiki's Flask app |
| + | |
| + | The config-swapping is the multi-tenancy mechanism we already built. In Lambda, it happened per-invocation; in WSGI, it happens per-request. The difference is negligible — the config is a handful of in-memory variables, not file I/O. |
| + | |
| + | Gunicorn runs with multiple workers (e.g., 4 workers for a small VPS). Each worker handles one request at a time. Git write operations are serialized per-repo by git's own lock file, same as on EFS. |
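Condensed, the middleware steps above look roughly like this as plain WSGI. The class name is illustrative and the wiki/user lookups are stubbed as injected callables; the real middleware also swaps Otterwiki's config rather than passing the repo path through `environ`:

```python
class TenantMiddleware:
    def __init__(self, app, base_domain, resolve_wiki, resolve_user):
        self.app = app                    # Otterwiki's WSGI app
        self.base_domain = base_domain
        self.resolve_wiki = resolve_wiki  # slug -> wiki row or None
        self.resolve_user = resolve_user  # environ -> (email, name, perms) or None

    def __call__(self, environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]
        slug = host.removesuffix("." + self.base_domain)
        wiki = self.resolve_wiki(slug) if slug != host else None
        if wiki is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown wiki"]
        user = self.resolve_user(environ)
        if user is None:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"authentication required"]
        email, name, perms = user
        # Otterwiki trusts these proxy headers for identity and permissions
        environ["HTTP_X_OTTERWIKI_EMAIL"] = email
        environ["HTTP_X_OTTERWIKI_NAME"] = name
        environ["HTTP_X_OTTERWIKI_PERMISSIONS"] = perms
        environ["wikibot.repo_path"] = wiki["repo_path"]  # config-swap hook
        return self.app(environ, start_response)
```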
| + | |
| + | ### MCP sidecar (FastMCP) |
| + | |
| + | FastMCP runs as a separate process serving Streamable HTTP on port 8001. It reads git repos directly from `/srv/wikis/{slug}/repo.git` — same code as the current MCP server, same tools, same return formats. |
| + | |
| + | The sidecar validates MCP OAuth tokens (JWTs signed by our AS) and bearer tokens (bcrypt hash lookup in SQLite). Token validation is the same logic as the Otterwiki middleware, factored into a shared library. |
| + | |
| + | Why a separate process: Otterwiki is a Flask app designed around page rendering. The MCP server is an async protocol handler. Mixing them in one process would require either making Otterwiki async (large refactor) or running FastMCP synchronously (defeats the purpose). Separate processes, same database, same git repos. |
| + | |
| + | ### Platform API (Flask) |
| + | |
| + | A lightweight Flask app handling the management API (wiki CRUD, ACL management, token generation) and the Git smart HTTP protocol. This is the same API surface described in [[Design/Implementation_Phases]], with SQLite queries instead of DynamoDB calls. |
| + | |
| + | The Git smart HTTP endpoints (`/repo.git/info/refs`, `/repo.git/git-upload-pack`, `/repo.git/git-receive-pack`) use dulwich to serve the bare repos on disk. Free tier gets read-only (upload-pack only); premium gets read-write. |
| + | |
| + | ### Auth service (Flask) |
| + | |
| + | Handles both ATProto OAuth (browser login) and the MCP OAuth 2.1 AS. Runs as its own process because the OAuth flows involve redirects and state management that are cleaner in isolation. |
| + | |
| + | This could be merged into the platform API process. Separating it keeps the auth code (which is security-critical and relatively complex) isolated from the CRUD endpoints. If the separation proves to be operationally annoying, merge them — they're both Flask apps talking to the same SQLite database. |
| + | |
| + | --- |
| + | |
| + | ## Semantic Search |
| + | |
| + | The embedding pipeline simplifies dramatically on a VPS. No DynamoDB Streams, no event source mappings, no separate embedding Lambda. MiniLM loads once at process startup and stays in memory. |
| + | |
| + | ### Write path |
| + | |
| + | ``` |
| + | Page write (Otterwiki or MCP) |
| + | → Middleware writes {wiki_slug, page_path, action} to reindex_queue table in SQLite |
| + | → Background worker (in-process thread or separate process) polls the queue: |
| + | 1. Read page content from git repo on disk |
| + | 2. Chunk page |
| + | 3. Embed chunks using MiniLM (already loaded in memory) |
| + | 4. Update FAISS index on disk |
| + | 5. Delete queue entry |
| + | ``` |
| + | |
| + | The background worker can be a simple thread in the Otterwiki process (using Python's `threading` or `concurrent.futures`), a separate `huey` or `rq` worker, or even a cron job that runs every 30 seconds. The latency requirement is loose — research wikis are written by AI agents and searched minutes later. |
| + | |
| + | For simplicity, start with an in-process thread pool. If it causes issues (GIL contention under load, memory pressure from MiniLM in every Gunicorn worker), move to a dedicated worker process that loads MiniLM once and processes the queue. |
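One polling pass of that worker is small either way. A sketch against the `reindex_queue` table, with `embed_and_index` standing in for the read-chunk-embed-update-FAISS steps:

```python
import sqlite3

def drain_reindex_queue(conn: sqlite3.Connection, embed_and_index) -> int:
    """Process every queued page once; return how many entries were handled."""
    rows = conn.execute(
        "SELECT wiki_slug, page_path, action FROM reindex_queue ORDER BY queued_at"
    ).fetchall()
    for slug, path, action in rows:
        embed_and_index(slug, path, action)  # read repo, chunk, embed, update FAISS
        conn.execute(
            "DELETE FROM reindex_queue WHERE wiki_slug=? AND page_path=?",
            (slug, path),
        )
        conn.commit()  # each page is its own transaction; a crash re-queues nothing done
    return len(rows)
```

A dedicated worker would wrap this in a loop with a short `time.sleep()` between passes; a cron variant just calls it once per invocation.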
| + | |
| + | ### Search path |
| + | |
| + | Synchronous, handled by the MCP sidecar or REST API: |
| + | |
| + | 1. MiniLM is loaded at process startup (the MCP sidecar and API processes both load it) |
| + | 2. Embed the query |
| + | 3. Load FAISS index from disk (cached in memory after first load) |
| + | 4. Search, deduplicate, return results |
| + | |
| + | On a VPS, loading the FAISS index is a local disk read (<1ms for a typical wiki). No EFS mount latency, no Lambda cold start loading the model. |
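Semantically, the FAISS query is just this — a pure-Python linear scan as a stand-in. FAISS replaces the scan with an optimized index, but the ranking and deduplication are the same:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, k=3):
    """index: list of (page_path, chunk_vector). Top-k page paths, deduplicated."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    seen, results = set(), []
    for path, _vec in ranked:
        if path not in seen:
            seen.add(path)
            results.append(path)
        if len(results) == k:
            break
    return results
```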
| + | |
| + | ### Model loading strategy |
| + | |
| + | MiniLM (~80MB) loads in ~500ms. On a VPS with persistent processes, this happens once at startup. In the Lambda architecture, it happened on every cold start. This is one of the clearest wins of the VPS approach. |
| + | |
| + | If memory is tight on the shared VPS, only the MCP sidecar needs MiniLM loaded (it handles semantic search). The Otterwiki process and platform API don't need it — they just write to the reindex queue. |
| + | |
| + | --- |
| + | |
| + | ## Backup and Disaster Recovery |
| + | |
| + | ### What we're protecting |
| + | |
| + | | Data | Location | Severity of loss | |
| + | |------|----------|-----------------| |
| + | | Git repos | `/srv/wikis/*/repo.git` | **Critical** — user data | |
| + | | SQLite database | `/srv/data/wikibot.db` | **High** — reconstructable from repos but painful | |
| + | | FAISS indexes | `/srv/wikis/*/index.faiss` | **Low** — rebuildable from repo content | |
| + | | Signing keys | `/srv/data/*.pem`, `/srv/data/*.json` | **High** — loss invalidates all active sessions | |
| + | |
| + | ### Backup strategy |
| + | |
| + | **Git repos:** `rsync` to offsite storage (a second VPS, an S3 bucket, or a Backblaze B2 bucket). Daily, with a cron job. Repos are bare git — rsync handles them efficiently. Also: users can `git clone` their own repos at any time, which is distributed backup by design. |
| + | |
| + | **SQLite:** `.backup` command (online backup, doesn't block writes in WAL mode) to a local snapshot file, then rsync offsite with the git repos. Daily. |
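The stdlib exposes the same online-backup API as the CLI's `.backup` command, so the snapshot step can live directly in the backup script. Paths are from the storage layout above (`backups/` is the local staging directory):

```python
import sqlite3

def snapshot_db(src_path: str = "/srv/data/wikibot.db",
                dst_path: str = "/srv/backups/wikibot.db") -> None:
    """Consistent online copy; readers and writers keep going in WAL mode."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # stdlib wrapper around SQLite's online backup API
    finally:
        dst.close()
        src.close()
```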
| + | |
| + | **Signing keys:** Backed up once at creation time, stored separately from the data backups (e.g., in a password manager or encrypted at rest on a different system). These rarely change. |
| + | |
| + | **FAISS indexes:** Not backed up. Rebuildable from repo content. Loss triggers a one-time re-embedding — seconds per wiki. |
| + | |
| + | ### Recovery |
| + | |
| + | If the VPS dies completely, recovery is: |
| + | |
| + | 1. Provision a new VPS (any provider) |
| + | 2. Install dependencies, deploy application code |
| + | 3. Restore signing keys |
| + | 4. Restore SQLite database from backup |
| + | 5. Restore git repos from backup (or users re-push from their clones) |
| + | 6. Re-embed all wikis (automated script, runs in minutes) |
| + | 7. Update DNS to point to new VPS |
| + | |
| + | RTO: hours (mostly limited by repo restore transfer time). RPO: 24 hours (daily backup cycle). This is acceptable for a free/community service. If tighter RPO is needed, increase backup frequency or add streaming replication to a standby. |
| + | |
| + | --- |
| + | |
| + | ## Deployment |
| + | |
| + | ### Application deployment |
| + | |
| + | Code lives in a Git repo. Deployment is `git pull` + restart services. No Pulumi, no CloudFormation, no CI/CD pipeline required (though one can be added). |
| + | |
| + | ```bash |
| + | ssh vps |
| + | cd /srv/app |
| + | git pull |
| + | pip install -r requirements.txt --break-system-packages  # (a venv avoids this flag) |
| + | sudo systemctl restart wikibot-otterwiki |
| + | sudo systemctl restart wikibot-mcp |
| + | sudo systemctl restart wikibot-api |
| + | sudo systemctl restart wikibot-auth |
| + | # Caddy doesn't need restart for app deploys |
| + | ``` |
| + | |
| + | Or with Docker Compose: |
| + | |
| + | ```bash |
| + | ssh vps |
| + | cd /srv/app |
| + | git pull |
| + | docker compose build |
| + | docker compose up -d |
| + | ``` |
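For the systemd path, each service gets a small unit file. A hypothetical unit for the Otterwiki process — the paths, user, worker count, and `wikibot.otterwiki:app` module path are assumptions; the other three services follow the same pattern:

```ini
[Unit]
Description=Wikibot Otterwiki (Gunicorn)
After=network.target

[Service]
User=wikibot
WorkingDirectory=/srv/app
ExecStart=/srv/app/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 wikibot.otterwiki:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```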
| + | |
| + | ### Initial setup |
| + | |
| + | 1. Provision VPS, install OS packages (Python 3.11+, git, Caddy) |
| + | 2. Configure DNS: `{domain}` and `*.{domain}` pointing to VPS IP |
| + | 3. Configure Caddy with DNS challenge credentials for wildcard TLS |
| + | 4. Generate RS256 signing keypair |
| + | 5. Generate ATProto OAuth client JWK |
| + | 6. Publish client metadata at `https://{domain}/auth/client-metadata.json` |
| + | 7. Initialize SQLite database (run migration script) |
| + | 8. Download MiniLM model to `/srv/embeddings/model/` |
| + | 9. Start services |
| + | |
| + | ### Monitoring |
| + | |
| + | For a community-hosted service, keep monitoring simple: |
| + | |
| + | - **Health checks:** Each service exposes a `/health` endpoint. Caddy or an external monitor (UptimeRobot, free tier) pings them. |
| + | - **Logs:** systemd journal or Docker logs. No ELK stack, no CloudWatch. `journalctl -u wikibot-otterwiki --since "1 hour ago"` is sufficient at this scale. |
| + | - **Disk space:** A cron job that alerts (email or Bluesky DM) when disk usage exceeds 80%. |
| + | - **Backups:** The backup cron job logs success/failure. Alert on failure. |
| + | |
| + | If the service grows, add Prometheus + Grafana. Not before. |
| + | |
| + | --- |
| + | |
| + | ## What changes vs. what stays the same |
| + | |
| + | ### Stays the same |
| + | |
| + | - ACL model (owner/editor/viewer roles, same permission matrix) |
| + | - Otterwiki proxy header mechanism (`x-otterwiki-email`, `x-otterwiki-name`, `x-otterwiki-permissions`) |
| + | - Multi-tenant middleware logic (resolve slug → look up wiki → check ACL → set headers → delegate) |
| + | - MCP tools (read_note, write_note, search, semantic_search, list_notes, etc.) |
| + | - REST API surface (same endpoints, same request/response shapes) |
| + | - URL structure (`{slug}.{domain}/` for wikis, `{domain}/app/` for management) |
| + | - Wiki bootstrap template |
| + | - FAISS + MiniLM semantic search |
| + | - Freemium tier model and limits |
| + | - Lapse policy (read-only + MCP disabled) |
| + | - Git remote access (read-only free, read-write premium) |
| + | - Frontend SPA (same screens, same Svelte app, served by Caddy instead of CloudFront) |
| + | - Otterwiki admin panel disposition (same sections hidden/shown) |
| + | |
| + | ### Changes |
| + | |
| + | | Component | AWS architecture | VPS architecture | |
| + | |-----------|-----------------|-----------------| |
| + | | Hosting | Lambda + EFS + API Gateway | Gunicorn + local disk + Caddy | |
| + | | Database | DynamoDB (on-demand) | SQLite (WAL mode) | |
| + | | Auth provider | WorkOS AuthKit | ATProto OAuth (self-hosted) | |
| + | | MCP OAuth AS | WorkOS (standalone connect) | Self-hosted OAuth 2.1 AS | |
| + | | Identity | OAuth provider sub (Google/GitHub/etc.) | ATProto DID | |
| + | | TLS | ACM + CloudFront | Caddy + Let's Encrypt | |
| + | | Embedding trigger | DynamoDB Streams → Lambda | SQLite queue → background worker | |
| + | | Static hosting | S3 + CloudFront | Caddy file_server | |
| + | | IaC | Pulumi | systemd units or Docker Compose | |
| + | | Secrets | Secrets Manager (Phase 4) | Files on disk (encrypted at rest via LUKS or similar) | |
| + | | Backups | AWS Backup + DynamoDB PITR | rsync + SQLite .backup | |
| + | | Cost at rest | ~$0.50/mo (Phase 0–3), ~$13–18/mo (Phase 4+) | $0 (community server) | |
| + | | Cost at 1K users | ~$15–20/mo | $0 (community server) | |
| + | |
| + | ### What can be reused from existing implementation |
| + | |
| + | - **Multi-tenant middleware** — remove Mangum wrapper, the WSGI middleware is underneath |
| + | - **MCP server tools** — identical, just change the repo path prefix |
| + | - **REST API handlers** — swap DynamoDB calls for SQLite queries |
| + | - **Otterwiki fork** — identical, same proxy header auth mode |
| + | - **Semantic search plugin** — identical |
| + | - **FAISS indexing code** — identical |
| + | - **Frontend SPA** — identical (change `VITE_API_BASE_URL`, remove WorkOS client ID) |
| + | - **Wiki bootstrap template** — identical |
| + | - **ACL checking logic** — swap DynamoDB reads for SQLite reads |
| + | |
| + | --- |
| + | |
| + | ## Open Questions |
| + | |
| + | 1. **ATProto Python OAuth library maturity.** The Bluesky Flask demo uses `authlib` + `joserfc` and is CC-0 licensed. It's a reference implementation, not a maintained library. We'd be copying and adapting it, not importing a package. Is the DPoP/PAR implementation battle-tested enough, or do we need to audit it carefully? |
| + | |
| + | 2. **MCP OAuth AS scope.** Building a spec-compliant OAuth 2.1 AS (with DCR, PKCE, token refresh, JWKS) is a meaningful amount of work. `authlib` has server-side components that can handle some of this. How much can we lean on `authlib` vs. hand-rolling? The Bluesky Flask demo is client-side only. |
| + | |
| + | 3. **Shared VPS resource constraints.** A community server has finite RAM and CPU. MiniLM (~80MB in memory per process that loads it), Gunicorn workers, FAISS indexes, and SQLite all compete for resources. What are the actual resource limits on the OVHcloud community server? This determines how many Gunicorn workers we can run and whether the embedding worker should be in-process or separate. |
| + | |
| + | 4. **Domain name.** The domain appears throughout the architecture (Caddy config, ATProto client metadata, JWT issuer, MCP resource metadata). What domain are we using? The ATProto client metadata URL IS the `client_id` in the protocol — it needs to be stable. Changing the domain later means re-registering the client and invalidating all active sessions. |
| + | |
| + | 5. **Caddy DNS challenge provider.** Wildcard TLS requires DNS API access. Which DNS provider hosts the zone, and does Caddy have a plugin for it? Cloudflare, Route 53, and OVHcloud are all supported. The DNS provider choice should be made before deployment. |
| + | |
| + | 6. **Account creation UX with ATProto.** A new user enters their Bluesky handle and goes through the ATProto OAuth flow. When the PDS redirects them back, they still need to pick a platform username (their wiki slug). The current design has username selection at signup — that still works, but the flow becomes: enter handle → authorize on PDS → pick username → create wiki. Is that smooth enough, or should we default the username to their handle (minus the `.bsky.social` suffix) and let them change it? |
| + | |
| + | 7. **Claude.ai MCP OAuth compatibility.** The self-hosted OAuth 2.1 AS approach should work — Claude.ai's MCP client follows standard OAuth 2.1 discovery. But the actual implementation needs testing against Claude.ai's specific client behavior (which headers it sends, how it handles token refresh, whether it supports DPoP). The GitHub issues around Claude.ai MCP OAuth suggest it can be finicky. Plan for a debugging cycle. |
| + | |
| + | 8. **ATProto scopes.** The ATProto OAuth spec has "transitional" scopes (`transition:generic`). We only need authentication (identity), not authorization to act on the user's PDS. Is there a read-only or identity-only scope, or do we request `transition:generic` and just not use the access token for anything beyond profile fetching? |