Properties
category: reference
tags: [design, prd, architecture, atproto, vps]
last_updated: 2026-03-14
confidence: medium
VPS Architecture (ATProto + robot.wtf)
Status: Active — this is the current plan
Supersedes: Design/Platform_Overview, Design/Auth, Design/Operations, Design/Data_Model (infrastructure and billing sections), Design/Implementation_Phases (phase structure and premium tiers)
Preserves: ACL model, permission headers, MCP tools, Otterwiki multi-tenancy middleware, URL structure, semantic search logic, wiki bootstrap template, REST API surface
Why this exists
The AWS serverless architecture described in Design/Platform_Overview works, but it optimizes for a problem we don't have: elastic scale and zero cost at rest. The tradeoff is complexity — VPC endpoints, Mangum adapters, DynamoDB Streams to avoid SQS endpoint costs, Lambda cold starts, EFS mount latency. All of that machinery exists to make Lambda work, not to make the wiki work.
robot.wtf is a free, volunteer-run wiki service for the ATProto community. No premium tier, no billing, no Stripe. The hosting is a Debian 12 VM on a Proxmox hypervisor with a static IP and generous RAM and disk. The deployment is conventional: persistent processes, local disk, SQLite, Caddy. The application logic — multi-tenant Otterwiki, MCP tools, semantic search, ACL enforcement — ports over from the Lambda implementation with minimal changes. The middleware we already built for Lambda is WSGI middleware with a Mangum wrapper; removing the wrapper gives us back the WSGI middleware.
The ATProto identity system replaces WorkOS as the auth provider. Users sign in with their Bluesky handle (or any ATProto PDS account). Identity is a DID — portable, user-owned, and philosophically aligned with "your wiki is a git repo you can clone." The target audience (developers and researchers using AI agents) overlaps heavily with the ATProto early-adopter community.
Service model
robot.wtf is a free tool, not a business. There is no premium tier and no billing infrastructure.
Every user gets:
- 1 wiki
- 500 pages
- 3 collaborators
- Full-text search + semantic search
- MCP access (Claude.ai OAuth + Claude Code bearer token)
- Read-only git clone
- Public wiki toggle
These are resource management limits, not a paywall. If someone needs more, they clone their repo and self-host — which is the whole point of git-backed storage.
If paid tiers ever make sense, the architecture supports them — the ACL model and schema have room for tier fields. But the billing infrastructure (Stripe, webhooks, lapse enforcement, upgrade/downgrade flows) doesn't get built until someone is actually asking to pay. That decision and all the commercial design work is preserved in the archived design docs (Design/Implementation_Phases, Design/Operations).
Infrastructure
Server
Debian 12 VM running on a Proxmox hypervisor. Static IP, generous RAM and disk allocation. If the VM ever needs to move, the deployment is portable to any Linux box (Hetzner, DigitalOcean, Fly.io, bare metal, or back to AWS on an EC2 instance) — nothing is host-specific.
Process model
Four persistent processes, managed by systemd or Docker Compose:
┌─────────────────────────────────────────────────────────────────┐
│                   Caddy (reverse proxy, TLS)                    │
│                     *.robot.wtf + robot.wtf                     │
│                                                                 │
│  Routes:                                                        │
│    {slug}.robot.wtf/mcp         → MCP sidecar (port 8001)       │
│    {slug}.robot.wtf/api/v1/*    → REST API (port 8002)          │
│    {slug}.robot.wtf/repo.git/*  → Git smart HTTP (port 8002)    │
│    {slug}.robot.wtf/*           → Otterwiki WSGI (port 8000)    │
│    robot.wtf/auth/*             → Auth service (port 8003)      │
│    robot.wtf/api/*              → Management API (port 8002)    │
│    robot.wtf/app/*              → Static files (SPA)            │
│    robot.wtf                    → Static files (landing page)   │
└──────┬─────────────┬─────────────┬─────────────┬────────────────┘
       │             │             │             │
  ┌────▼────┐   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
  │Otterwiki│   │   MCP   │   │Platform │   │  Auth   │
  │  WSGI   │   │ sidecar │   │   API   │   │ service │
  │Gunicorn │   │ FastMCP │   │  Flask  │   │  Flask  │
  │  :8000  │   │  :8001  │   │  :8002  │   │  :8003  │
  └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘
       │             │             │             │
  ┌────▼─────────────▼─────────────▼─────────────▼────┐
  │                 Shared resources                  │
  │  /srv/wikis/{slug}/repo.git     (git)             │
  │  /srv/wikis/{slug}/index.faiss  (vectors)         │
  │  /srv/data/robot.db             (SQLite)          │
  │  /srv/data/embeddings/          (model)           │
  └───────────────────────────────────────────────────┘
Caddy
Caddy handles TLS termination, automatic Let's Encrypt certificates (including wildcard via DNS challenge), and reverse proxy routing. It replaces API Gateway + CloudFront + ACM.
Wildcard TLS requires a DNS-01 challenge. Caddy supports this through DNS provider plugins for common providers (Cloudflare, Route 53, OVHcloud). The DNS zone for robot.wtf needs API credentials configured in Caddy.
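As a sketch, assuming Cloudflare ends up hosting the zone (the provider is still an open question below), the wildcard site's TLS configuration would look like this, with the matching DNS plugin compiled into the Caddy binary (e.g. via xcaddy):

```
# Caddyfile fragment: wildcard cert via the DNS-01 challenge.
# Assumes the caddy-dns/cloudflare plugin is built in and the API
# token is passed to the Caddy process as an environment variable.
*.robot.wtf {
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    # ... handle blocks as below ...
}
```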
Caddy's routing is order-sensitive and matcher-based. The Caddyfile structure:
robot.wtf {
handle /auth/* {
reverse_proxy localhost:8003
}
handle /api/* {
reverse_proxy localhost:8002
}
handle /app/* {
root * /srv/static/app
try_files {path} /app/index.html
file_server
}
handle {
root * /srv/static/landing
file_server
}
}
*.robot.wtf {
@mcp path /mcp /mcp/*
handle @mcp {
reverse_proxy localhost:8001
}
@api path /api/v1/*
handle @api {
reverse_proxy localhost:8002
}
@git path /repo.git/*
handle @git {
reverse_proxy localhost:8002
}
handle {
reverse_proxy localhost:8000
}
}
The {slug} is extracted from the Host header by the downstream services, not by Caddy. Caddy just routes to the right backend; the backend resolves the tenant.
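That tenant-resolution step can be sketched as a small helper, roughly what each backend runs on the incoming Host header (names here are illustrative, not from the codebase):

```python
# Sketch: resolve the wiki slug from the Host header, the way the
# downstream services might. Function and constant names are illustrative.
BASE_DOMAIN = "robot.wtf"

def extract_slug(host):
    """Return the wiki slug for '{slug}.robot.wtf', else None."""
    host = host.lower().split(":")[0]      # normalize case, strip any port
    if host == BASE_DOMAIN:
        return None                        # root domain: landing/app/auth
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        slug = host[: -len(suffix)]
        if slug and "." not in slug:       # reject nested subdomains
            return slug
    return None
```

So `extract_slug("sderle.robot.wtf")` yields `"sderle"`, while the root domain and anything nested deeper resolve to no tenant.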
Why not Nginx
Caddy's automatic TLS (including wildcard via DNS challenge) eliminates certbot, cron renewal, and manual certificate management. For a single-operator deployment where the admin might not be around to fix a cert renewal failure, this matters. Nginx is more configurable but requires more maintenance. If we needed fine-grained caching rules or complex rewrite logic, Nginx would be worth the tradeoff. We don't.
Authentication
Identity model
User identity is an ATProto DID (Decentralized Identifier). A DID is a persistent, portable identifier that survives handle changes and PDS migrations. When a user logs in, we resolve their handle to a DID and store the DID as the primary key.
User {
did: string, // e.g. "did:plc:abc123..." — primary identifier
handle: string, // e.g. "sderle.bsky.social" — display name, may change
display_name: string, // from ATProto profile
avatar_url?: string, // from ATProto profile
username: string, // platform username (URL slug)
created_at: ISO8601,
wiki_count: number,
}
The did is the stable identity. The handle is refreshed from the PDS on each login (handles can change). The username is the platform-local slug used in URLs — immutable after signup.
Username defaulting
When a new user signs up, the platform username defaults to the local part of their ATProto handle. For a handle like sderle.bsky.social, the default is sderle. For a user with a custom domain handle like schuyler.robot.wtf, the default is schuyler (the domain prefix). The user can override this at signup if they want something different, but the default should be right most of the time.
Validation rules are unchanged from the original design: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens, checked against the reserved name blocklist and existing usernames/wiki slugs.
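The defaulting and validation rules can be sketched together (the reserved list is the one from the Namespace rules section; function names are illustrative):

```python
import re

# Reserved names from the namespace rules in this document.
RESERVED = {"api", "auth", "app", "www", "admin", "mcp", "docs", "status",
            "blog", "help", "support", "static", "assets", "null",
            "undefined", "wiki", "robot"}

# Lowercase alphanumeric + hyphens, 3-30 chars, no leading/trailing hyphen.
USERNAME_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{1,28}[a-z0-9])$")

def default_username(handle):
    """Default platform username = first label of the ATProto handle."""
    return handle.split(".")[0].lower()

def is_valid_username(name):
    """Check format and the reserved blocklist (existing-name checks
    would hit the database and are omitted here)."""
    return bool(USERNAME_RE.fullmatch(name)) and name not in RESERVED
```

For example, `default_username("sderle.bsky.social")` is `"sderle"`, and `is_valid_username("api")` is false because the slug is reserved.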
ATProto OAuth (browser login)
robot.wtf is an ATProto OAuth confidential client. The flow:
1. User enters their handle (e.g. "sderle.bsky.social") on the login page
2. robot.wtf resolves the handle to a DID, then resolves the DID to a PDS URL
3. robot.wtf fetches the PDS's Authorization Server metadata
(GET {pds}/.well-known/oauth-authorization-server)
4. robot.wtf sends a Pushed Authorization Request (PAR) to the PDS's AS,
including PKCE code_challenge and DPoP proof
5. User is redirected to their PDS's authorization interface
6. User approves the authorization request
7. PDS redirects back to robot.wtf/auth/callback with an authorization code
8. robot.wtf exchanges the code for tokens (access_token + refresh_token)
with DPoP binding and client authentication (signed JWT)
9. robot.wtf uses the access token to fetch the user's profile (DID, handle,
display name) from their PDS
10. robot.wtf mints a platform JWT, sets it as an HttpOnly cookie on .robot.wtf
11. Redirect to robot.wtf/app/
The platform JWT is signed with our own RS256 key (stored on disk). After step 10, the PDS is not in the runtime path — the platform JWT is self-contained and validated locally. ATProto tokens are stored in the session database for potential future use (e.g., posting to Bluesky on behalf of the user), but they're not needed for wiki operations.
Reference implementation
Bluesky maintains a Python Flask OAuth demo in bluesky-social/cookbook/python-oauth-web-app (CC-0 licensed). It implements the full ATProto OAuth flow as a confidential client using authlib for PKCE and DPoP, with joserfc for JWT/JWK handling. This is the starting point for our auth service. It handles the hard parts: handle-to-DID resolution, PDS Authorization Server discovery, PAR, DPoP nonce management, and token refresh.
Key libraries from the reference implementation:
- `authlib` — PKCE, code challenge, general OAuth utilities
- `joserfc` — JWK generation, JWT signing/verification, DPoP proof creation
- `requests` — HTTP client for PDS communication (the demo includes a hardened HTTP client with SSRF mitigations)
MCP OAuth (Claude.ai)
This is the most architecturally significant auth flow. Claude.ai's MCP client implements standard OAuth 2.1 with Dynamic Client Registration (DCR). It discovers the Authorization Server by fetching /.well-known/oauth-protected-resource from the MCP endpoint. The AS must support DCR, PKCE, and standard token endpoints.
ATProto's OAuth profile is not directly compatible with this — ATProto uses per-user Authorization Servers (each user's PDS), whereas Claude.ai expects a single AS URL from the resource metadata endpoint.
Solution: robot.wtf runs its own OAuth 2.1 Authorization Server for MCP.
1. Claude.ai connects to https://{slug}.robot.wtf/mcp
2. Gets 401, fetches /.well-known/oauth-protected-resource
3. Discovers robot.wtf's AS at https://robot.wtf/auth/oauth
4. Performs Dynamic Client Registration at robot.wtf/auth/oauth/register
5. Redirects user to robot.wtf/auth/oauth/authorize
6. User sees robot.wtf's consent page:
- If already logged in (platform JWT cookie): "Authorize Claude to access {wiki}?"
- If not logged in: "Sign in with Bluesky" → ATProto OAuth flow → then consent
7. User approves, robot.wtf issues authorization code
8. Claude.ai exchanges code for access token at robot.wtf/auth/oauth/token
9. Claude.ai uses access token to make MCP requests
10. MCP sidecar validates token against robot.wtf's JWKS
robot.wtf's MCP OAuth AS is a thin layer. It delegates authentication to ATProto (step 6) and handles authorization itself (does this user have access to this wiki?). The token it issues is a JWT containing the user's DID and the authorized wiki slug, signed with our RS256 key.
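As a hedged sketch, the claims in that token might look like the following. The `iss`, `sub`, `aud`, `iat`, and `exp` claims are standard JWT claims; the `wiki` claim name (and the exact choice of `aud`) is an assumption, not a finalized design:

```
{
  "iss": "https://robot.wtf/auth/oauth",
  "sub": "did:plc:abc123...",
  "aud": "https://sderle.robot.wtf/mcp",
  "scope": "wiki:read wiki:write",
  "wiki": "sderle",
  "iat": 1789996399,
  "exp": 1789999999
}
```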
Required OAuth 2.1 AS endpoints:
| Endpoint | Purpose |
|---|---|
| `/.well-known/oauth-authorization-server` | AS metadata (issuer, endpoints, supported grants) |
| `/auth/oauth/register` | Dynamic Client Registration (RFC 7591) |
| `/auth/oauth/authorize` | Authorization endpoint (consent page) |
| `/auth/oauth/token` | Token endpoint (code exchange, refresh) |
| `/.well-known/jwks.json` | Public key for token validation |
These can be implemented with authlib's server components or hand-rolled (the spec surface is small — DCR, authorization code grant with PKCE, token issuance, JWKS).
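For concreteness, the AS metadata document served at `/.well-known/oauth-authorization-server` would look roughly like this. Field names follow RFC 8414; the exact grants and client auth methods advertised are assumptions pending implementation:

```
{
  "issuer": "https://robot.wtf/auth/oauth",
  "authorization_endpoint": "https://robot.wtf/auth/oauth/authorize",
  "token_endpoint": "https://robot.wtf/auth/oauth/token",
  "registration_endpoint": "https://robot.wtf/auth/oauth/register",
  "jwks_uri": "https://robot.wtf/.well-known/jwks.json",
  "response_types_supported": ["code"],
  "grant_types_supported": ["authorization_code", "refresh_token"],
  "code_challenge_methods_supported": ["S256"]
}
```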
MCP protected resource metadata
Each wiki's MCP endpoint serves its own resource metadata:
// GET https://{slug}.robot.wtf/.well-known/oauth-protected-resource
{
  "resource": "https://{slug}.robot.wtf/mcp",
  "authorization_servers": ["https://robot.wtf/auth/oauth"],
  "scopes_supported": ["wiki:read", "wiki:write"]
}
All wikis point to the same AS. The AS knows which wiki is being authorized because the redirect_uri and resource parameter identify the wiki.
Bearer tokens (Claude Code / API)
Unchanged from the current design. Each wiki gets a bearer token at creation time, stored as a bcrypt hash in the database. The user sees the token once. Claude Code usage:
claude mcp add {slug} \
  --transport http \
  --url https://{slug}.robot.wtf/mcp \
  --header "Authorization: Bearer YOUR_TOKEN"
Cross-subdomain auth
Same approach as Design/Frontend: platform JWT stored as an HttpOnly, Secure, SameSite=Lax cookie on .robot.wtf. Every request to any subdomain includes the cookie. The Otterwiki middleware and MCP sidecar both validate JWTs using the same public key.
Auth convergence
All three paths converge on the same identity and the same ACL check:
Browser     → ATProto OAuth  → platform JWT (cookie)   → resolve DID  → ACL check
Claude.ai   → MCP OAuth 2.1  → MCP access token (JWT)  → resolve DID  → ACL check
Claude Code → Bearer token   → hash lookup in DB       → resolve user → ACL check
All paths   → middleware     → sets Otterwiki proxy headers (or authorizes MCP/API request)
Migration off ATProto
We store the DID as the primary user identifier, not the handle or PDS URL. If ATProto auth needs to be replaced, the migration path is:
- Add alternative OAuth providers (Google, GitHub) alongside ATProto
- Link new provider identities to existing DIDs via an `identity_links` table
- Existing users continue to work; new users can sign up with either method
This is simpler than the WorkOS migration path in the original design because we already own the JWT-issuing layer — we're not migrating off a third-party token issuer.
Data Model
SQLite replaces DynamoDB
The dataset is small even at 1000 users. SQLite on local disk is simpler, faster, and free. The application layer uses SQLAlchemy (or raw sqlite3 — the schema is simple enough). If the deployment ever needs Postgres, the migration is straightforward.
The SQLite database lives at /srv/data/robot.db. Write concurrency is handled by SQLite's WAL mode, which supports concurrent reads with serialized writes. For a wiki platform where writes are infrequent relative to reads, this is more than adequate.
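The startup pragmas each process would run can be sketched as follows (a minimal sketch; the real code would also handle schema migrations):

```python
import os
import sqlite3
import tempfile

def open_db(path):
    """Open the platform database the way each service process might:
    WAL mode for concurrent readers, a busy timeout so writers queue
    instead of failing, and enforced foreign keys."""
    con = sqlite3.connect(path, timeout=5.0)
    con.execute("PRAGMA journal_mode=WAL")   # concurrent reads, serialized writes
    con.execute("PRAGMA foreign_keys=ON")    # enforce REFERENCES clauses
    con.row_factory = sqlite3.Row
    return con

# Demo against a throwaway file (the real path is /srv/data/robot.db).
path = os.path.join(tempfile.mkdtemp(), "robot.db")
con = open_db(path)
mode = con.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # wal
```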
Tables
CREATE TABLE users (
  did TEXT PRIMARY KEY,            -- ATProto DID
  handle TEXT NOT NULL,            -- ATProto handle (may change)
  display_name TEXT,
  avatar_url TEXT,
  username TEXT UNIQUE NOT NULL,   -- platform slug, immutable
  created_at TEXT NOT NULL,        -- ISO8601
  wiki_count INTEGER DEFAULT 0
);

CREATE TABLE wikis (
  slug TEXT PRIMARY KEY,           -- globally unique, URL slug
  owner_did TEXT NOT NULL REFERENCES users(did),
  display_name TEXT NOT NULL,
  repo_path TEXT NOT NULL,         -- /srv/wikis/{slug}/repo.git
  mcp_token_hash TEXT NOT NULL,    -- bcrypt hash
  is_public INTEGER DEFAULT 0,
  created_at TEXT NOT NULL,
  last_accessed TEXT NOT NULL,
  page_count INTEGER DEFAULT 0
);

CREATE TABLE acls (
  wiki_slug TEXT NOT NULL REFERENCES wikis(slug),
  grantee_did TEXT NOT NULL REFERENCES users(did),
  role TEXT NOT NULL,              -- 'owner' | 'editor' | 'viewer'
  granted_by TEXT NOT NULL,
  granted_at TEXT NOT NULL,
  PRIMARY KEY (wiki_slug, grantee_did)
);

CREATE TABLE oauth_sessions (
  id TEXT PRIMARY KEY,             -- session ID
  user_did TEXT NOT NULL REFERENCES users(did),
  dpop_private_jwk TEXT NOT NULL,  -- DPoP key (encrypted at rest)
  access_token TEXT,
  refresh_token TEXT,
  token_expires_at TEXT,
  created_at TEXT NOT NULL
);

CREATE TABLE mcp_oauth_clients (
  client_id TEXT PRIMARY KEY,      -- DCR-issued client ID
  client_name TEXT,
  redirect_uris TEXT NOT NULL,     -- JSON array
  client_secret_hash TEXT,         -- for confidential clients
  created_at TEXT NOT NULL
);

CREATE TABLE reindex_queue (
  wiki_slug TEXT NOT NULL,
  page_path TEXT NOT NULL,
  action TEXT NOT NULL,            -- 'upsert' | 'delete'
  queued_at TEXT NOT NULL,
  PRIMARY KEY (wiki_slug, page_path)
);
Storage layout
/srv/
wikis/
{slug}/
repo.git/ # bare git repo
index.faiss # FAISS vector index
embeddings.json # page_path → vector mapping
data/
robot.db # SQLite database
signing_key.pem # RS256 private key for JWT signing
signing_key.pub # RS256 public key
client_jwk.json # ATProto OAuth confidential client JWK (private)
client_jwk_pub.json # ATProto OAuth client JWK (public, served at client_id URL)
static/
landing/ # landing page HTML/CSS/JS
app/ # management SPA
embeddings/
model/ # all-MiniLM-L6-v2 model files
backups/ # local backup staging
Compute
Otterwiki (WSGI)
Otterwiki runs as a persistent Gunicorn process. The multi-tenant middleware we built for Lambda ports back to WSGI by removing the Mangum wrapper. The middleware:
- Extracts the wiki slug from the `Host` header
- Looks up the wiki in SQLite
- Resolves the user from the platform JWT (cookie) or bearer token
- Checks ACL permissions
- Sets Otterwiki proxy headers (`x-otterwiki-email`, `x-otterwiki-name`, `x-otterwiki-permissions`)
- Swaps Otterwiki's config to point at the correct repo path
- Delegates to Otterwiki's Flask app
The config-swapping is the multi-tenancy mechanism we already built. In Lambda, it happened per-invocation; in WSGI, it happens per-request. The difference is negligible — the config is a handful of in-memory variables, not file I/O.
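The per-request flow can be sketched as WSGI middleware. This is a minimal sketch, not the actual implementation: the JWT/ACL logic is stubbed behind injected `lookup_wiki` and `resolve_user` callables, and the config swap is reduced to stashing the repo path in the environ:

```python
class TenantMiddleware:
    """Sketch of the multi-tenant dispatch described above."""

    def __init__(self, app, lookup_wiki, resolve_user):
        self.app = app                    # Otterwiki's Flask/WSGI app
        self.lookup_wiki = lookup_wiki    # slug -> wiki record (SQLite), or None
        self.resolve_user = resolve_user  # environ -> (email, name, permissions)

    def __call__(self, environ, start_response):
        # Caddy only routes {slug}.robot.wtf here, so the suffix strip
        # is safe; the root domain never reaches this app.
        host = environ.get("HTTP_HOST", "").split(":")[0]
        slug = host.removesuffix(".robot.wtf")
        wiki = self.lookup_wiki(slug)
        if wiki is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such wiki"]
        email, name, perms = self.resolve_user(environ)
        # Otterwiki trusts these proxy headers for identity/permissions.
        environ["HTTP_X_OTTERWIKI_EMAIL"] = email
        environ["HTTP_X_OTTERWIKI_NAME"] = name
        environ["HTTP_X_OTTERWIKI_PERMISSIONS"] = perms
        environ["wiki.repo_path"] = wiki["repo_path"]  # config-swap point
        return self.app(environ, start_response)
```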
Gunicorn runs with multiple workers. The Proxmox VM has generous RAM, so worker count is limited by CPU cores, not memory. Git write operations are serialized per-repo by git's own lock file.
MCP sidecar (FastMCP)
FastMCP runs as a separate process serving Streamable HTTP on port 8001. It reads git repos directly from /srv/wikis/{slug}/repo.git — same code as the current MCP server, same tools, same return formats.
The sidecar validates MCP OAuth tokens (JWTs signed by our AS) and bearer tokens (bcrypt hash lookup in SQLite). Token validation is the same logic as the Otterwiki middleware, factored into a shared library.
Why a separate process: Otterwiki is a Flask app designed around page rendering. The MCP server is an async protocol handler. Mixing them in one process would require either making Otterwiki async (large refactor) or running FastMCP synchronously (defeats the purpose). Separate processes, same database, same git repos.
Platform API (Flask)
A lightweight Flask app handling the management API (wiki CRUD, ACL management, token generation) and the Git smart HTTP protocol. This is the same API surface described in the archived Design/Implementation_Phases, with SQLite queries instead of DynamoDB calls.
The Git smart HTTP endpoints (/repo.git/info/refs, /repo.git/git-upload-pack) use dulwich to serve the bare repos on disk. Read-only (upload-pack only) — users can clone and pull their wikis at any time.
Auth service (Flask)
Handles both ATProto OAuth (browser login) and the MCP OAuth 2.1 AS. Runs as its own process because the OAuth flows involve redirects and state management that are cleaner in isolation.
This could be merged into the platform API process. Separating it keeps the auth code (which is security-critical and relatively complex) isolated from the CRUD endpoints. If the separation proves to be operationally annoying, merge them — they're both Flask apps talking to the same SQLite database.
Semantic Search
The embedding pipeline simplifies dramatically on a VPS. No DynamoDB Streams, no event source mappings, no separate embedding Lambda. MiniLM loads once at process startup and stays in memory.
Write path
Page write (Otterwiki or MCP)
→ Middleware writes {wiki_slug, page_path, action} to reindex_queue table in SQLite
→ Background worker (in-process thread or separate process) polls the queue:
1. Read page content from git repo on disk
2. Chunk page
3. Embed chunks using MiniLM (already loaded in memory)
4. Update FAISS index on disk
5. Delete queue entry
The background worker can be a simple thread in the Otterwiki process (using Python's threading or concurrent.futures), a separate huey or rq worker, or even a cron job that runs every 30 seconds. The latency requirement is loose — research wikis are written by AI agents and searched minutes later.
For simplicity, start with an in-process thread pool. If it causes issues (GIL contention under load, memory pressure from MiniLM in every Gunicorn worker), move to a dedicated worker process that loads MiniLM once and processes the queue.
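The queue-drain step can be sketched against the `reindex_queue` schema. The embedding work is stubbed out as a `reindex` callable; in the real worker it would chunk the page, embed with MiniLM, and update the wiki's FAISS index:

```python
import sqlite3

def drain_queue(con, reindex):
    """Process and clear all pending reindex entries. Returns the count."""
    rows = con.execute(
        "SELECT wiki_slug, page_path, action FROM reindex_queue"
        " ORDER BY queued_at"
    ).fetchall()
    for wiki_slug, page_path, action in rows:
        reindex(wiki_slug, page_path, action)  # upsert/delete vectors (stub)
        con.execute(
            "DELETE FROM reindex_queue WHERE wiki_slug=? AND page_path=?",
            (wiki_slug, page_path),
        )
    con.commit()
    return len(rows)

# Demo against an in-memory database with the reindex_queue schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE reindex_queue ("
    " wiki_slug TEXT NOT NULL, page_path TEXT NOT NULL,"
    " action TEXT NOT NULL, queued_at TEXT NOT NULL,"
    " PRIMARY KEY (wiki_slug, page_path))"
)
con.execute(
    "INSERT INTO reindex_queue VALUES"
    " ('sderle', 'notes/a.md', 'upsert', '2026-01-01T00:00:00Z')"
)
processed = drain_queue(con, lambda w, p, a: None)
print(processed)  # 1
```

Because `(wiki_slug, page_path)` is the primary key, repeated writes to the same page before the worker runs collapse into one queue entry, which is the debouncing behavior we want.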
Search path
Synchronous, handled by the MCP sidecar or REST API:
- MiniLM is loaded at process startup (the MCP sidecar and API processes both load it)
- Embed the query
- Load FAISS index from disk (cached in memory after first load)
- Search, deduplicate, return results
On a VPS, loading the FAISS index is a local disk read (<1ms for a typical wiki). No EFS mount latency, no Lambda cold start loading the model.
Model loading strategy
MiniLM (~80MB) loads in ~500ms. On a VPS with persistent processes, this happens once at startup. In the Lambda architecture, it happened on every cold start. This is one of the clearest wins of the VPS approach.
The Proxmox VM has plenty of RAM, so loading MiniLM in both the MCP sidecar and a dedicated embedding worker is fine. The Otterwiki process and platform API don't need it — they just write to the reindex queue.
Backup and Disaster Recovery
What we're protecting
| Data | Location | Severity of loss |
|---|---|---|
| Git repos | `/srv/wikis/*/repo.git` | Critical — user data |
| SQLite database | `/srv/data/robot.db` | High — reconstructable from repos but painful |
| FAISS indexes | `/srv/wikis/*/index.faiss` | Low — rebuildable from repo content |
| Signing keys | `/srv/data/*.pem`, `/srv/data/*.json` | High — loss invalidates all active sessions |
Backup strategy
Git repos: rsync to offsite storage (a second VPS, an S3 bucket, or a Backblaze B2 bucket). Daily, with a cron job. Repos are bare git — rsync handles them efficiently. Also: users can git clone their own repos at any time, which is distributed backup by design.
SQLite: .backup command (online backup, doesn't block writes in WAL mode) to a local snapshot file, then rsync offsite with the git repos. Daily.
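Python's `sqlite3` module exposes the same online-backup mechanism as the CLI's `.backup` command, which is one way the snapshot step could be scripted (a sketch against throwaway paths; the real source is `/srv/data/robot.db`):

```python
import os
import sqlite3
import tempfile

def snapshot(src_path, dest_path):
    """Copy a live WAL-mode database without blocking readers/writers,
    using sqlite3's online backup API (equivalent to the CLI's .backup)."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    with dest:
        src.backup(dest)
    src.close()
    dest.close()

# Demo: create a tiny live database, snapshot it, read from the copy.
d = tempfile.mkdtemp()
live = os.path.join(d, "robot.db")
copy = os.path.join(d, "robot.db.bak")
con = sqlite3.connect(live)
con.execute("CREATE TABLE users (did TEXT PRIMARY KEY)")
con.execute("INSERT INTO users VALUES ('did:plc:abc')")
con.commit()
snapshot(live, copy)
restored = sqlite3.connect(copy).execute("SELECT did FROM users").fetchone()[0]
print(restored)  # did:plc:abc
```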
Signing keys: Backed up once at creation time, stored separately from the data backups (e.g., in a password manager or encrypted at rest on a different system). These rarely change.
FAISS indexes: Not backed up. Rebuildable from repo content. Loss triggers a one-time re-embedding — seconds per wiki.
Proxmox snapshots: The Proxmox hypervisor can take VM-level snapshots. These are a useful complement to application-level backups — a snapshot captures the entire VM state for rapid rollback after a bad deploy. Not a substitute for offsite backups (snapshots live on the same hardware).
Recovery
If the VM dies completely, recovery is:
- Provision a new VM (on Proxmox or any other host)
- Install Debian 12, install dependencies, deploy application code
- Restore signing keys
- Restore SQLite database from backup
- Restore git repos from backup (or users re-push from their clones)
- Re-embed all wikis (automated script, runs in minutes)
- Update DNS to point to new IP (if it changed)
RTO: hours (mostly limited by repo restore transfer time). RPO: 24 hours (daily backup cycle). This is acceptable for a free community service. If tighter RPO is needed, increase backup frequency or add streaming replication to a standby.
Deployment
Application deployment
Code lives in a Git repo. Deployment is git pull + restart services. No Pulumi, no CloudFormation, no CI/CD pipeline required (though one can be added).
ssh vm
cd /srv/app
git pull
pip install -r requirements.txt --break-system-packages
sudo systemctl restart robot-otterwiki
sudo systemctl restart robot-mcp
sudo systemctl restart robot-api
sudo systemctl restart robot-auth
# Caddy doesn't need restart for app deploys
Or with Docker Compose:
ssh vm
cd /srv/app
git pull
docker compose build
docker compose up -d
Initial setup
- Provision Debian 12 VM on Proxmox, assign static IP
- Install OS packages: Python 3.11+, git, build-essential
- Install Caddy (with DNS challenge plugin for the DNS provider)
- Configure DNS: `robot.wtf` and `*.robot.wtf` → VM's static IP
- Configure Caddy with DNS challenge credentials for wildcard TLS
- Generate RS256 signing keypair (`/srv/data/signing_key.pem`)
- Generate ATProto OAuth client JWK (`/srv/data/client_jwk.json`)
- Publish client metadata at `https://robot.wtf/auth/client-metadata.json`
- Initialize SQLite database (run migration script)
- Download MiniLM model to `/srv/embeddings/model/`
- Start services, verify health checks
Monitoring
For a volunteer-run service, keep monitoring simple:
- Health checks: Each service exposes a `/health` endpoint. An external monitor (UptimeRobot, free tier) pings them.
- Logs: systemd journal or Docker logs. `journalctl -u robot-otterwiki --since "1 hour ago"` is sufficient at this scale.
- Disk space: A cron job that alerts (email or Bluesky DM) when disk usage exceeds 80%.
- Backups: The backup cron job logs success/failure. Alert on failure.
If the service grows, add Prometheus + Grafana. Not before.
URL Structure
Every wiki gets a subdomain: {slug}.robot.wtf. The slug is the wiki's globally unique identifier.
sderle.robot.wtf/           → wiki web UI (Otterwiki)
sderle.robot.wtf/api/v1/    → wiki REST API
sderle.robot.wtf/mcp        → wiki MCP endpoint
sderle.robot.wtf/repo.git/* → git smart HTTP (read-only clone)
For the single-wiki-per-user model, the wiki slug is the username. You sign up as sderle, your wiki lives at sderle.robot.wtf.
The management app and auth live on the root domain:
robot.wtf/                  → landing page
robot.wtf/app/              → management SPA (dashboard)
robot.wtf/app/settings      → wiki settings
robot.wtf/app/collaborators → collaborator management
robot.wtf/app/connect       → MCP connection instructions
robot.wtf/app/account       → account settings
robot.wtf/auth/*            → OAuth flows (ATProto + MCP AS)
robot.wtf/api/*             → management API
Namespace rules
Slugs and usernames are the same thing (each user gets one wiki, the slug IS the username). Reserved names blocked for signup: api, auth, app, www, admin, mcp, docs, status, blog, help, support, static, assets, null, undefined, wiki, robot.
What changes vs. what stays the same
Stays the same
- ACL model (owner/editor/viewer roles, same permission matrix)
- Otterwiki proxy header mechanism (`x-otterwiki-email`, `x-otterwiki-name`, `x-otterwiki-permissions`)
- Multi-tenant middleware logic (resolve slug → look up wiki → check ACL → set headers → delegate)
- MCP tools (read_note, write_note, search, semantic_search, list_notes, etc.)
- REST API surface (same endpoints, same request/response shapes)
- Wiki bootstrap template
- FAISS + MiniLM semantic search
- Otterwiki admin panel disposition (same sections hidden/shown)
Changes
| Component | AWS architecture | VPS architecture |
|---|---|---|
| Domain | wikibot.io | robot.wtf |
| Business model | Freemium SaaS | Free volunteer project |
| Hosting | Lambda + EFS + API Gateway | Gunicorn + local disk + Caddy |
| Host environment | AWS (managed) | Debian 12 VM on Proxmox |
| Database | DynamoDB (on-demand) | SQLite (WAL mode) |
| Auth provider | WorkOS AuthKit | ATProto OAuth (self-hosted) |
| MCP OAuth AS | WorkOS (standalone connect) | Self-hosted OAuth 2.1 AS |
| Identity | OAuth provider sub (Google/GitHub/etc.) | ATProto DID |
| TLS | ACM + CloudFront | Caddy + Let's Encrypt |
| Embedding trigger | DynamoDB Streams → Lambda | SQLite queue → background worker |
| Static hosting | S3 + CloudFront | Caddy file_server |
| IaC | Pulumi | systemd units or Docker Compose |
| Secrets | Secrets Manager | Files on disk |
| Backups | AWS Backup + DynamoDB PITR | rsync + SQLite .backup + Proxmox snapshots |
| Billing | Stripe (planned) | None |
| Cost | ~$13–18/mo at launch | $0 |
What can be reused from existing implementation
- Multi-tenant middleware — remove Mangum wrapper, the WSGI middleware is underneath
- MCP server tools — identical, just change the repo path prefix
- REST API handlers — swap DynamoDB calls for SQLite queries
- Otterwiki fork — identical, same proxy header auth mode
- Semantic search plugin — identical
- FAISS indexing code — identical
- Frontend SPA — identical (change `VITE_API_BASE_URL`, remove WorkOS client ID)
- Wiki bootstrap template — identical
- ACL checking logic — swap DynamoDB reads for SQLite reads
Open Questions
- **ATProto Python OAuth library maturity.** The Bluesky Flask demo uses `authlib` + `joserfc` and is CC-0 licensed. It's a reference implementation, not a maintained library. We'd be copying and adapting it, not importing a package. Is the DPoP/PAR implementation battle-tested enough, or do we need to audit it carefully?
- **MCP OAuth AS scope.** Building a spec-compliant OAuth 2.1 AS (with DCR, PKCE, token refresh, JWKS) is a meaningful amount of work. `authlib` has server-side components that can handle some of this. How much can we lean on `authlib` vs. hand-rolling? The Bluesky Flask demo is client-side only.
- **Caddy DNS challenge provider.** Wildcard TLS requires DNS API access. Which DNS provider hosts the robot.wtf zone? Cloudflare, Route 53, and OVHcloud are all supported by Caddy. The DNS provider choice should be made before deployment.
- **Claude.ai MCP OAuth compatibility.** The self-hosted OAuth 2.1 AS approach should work — Claude.ai's MCP client follows standard OAuth 2.1 discovery. But the actual implementation needs testing against Claude.ai's specific client behavior (which headers it sends, how it handles token refresh, whether it supports DPoP). The GitHub issues around Claude.ai MCP OAuth suggest it can be finicky. Plan for a debugging cycle.
- **ATProto scopes.** The ATProto OAuth spec has "transitional" scopes (`transition:generic`). We only need authentication (identity), not authorization to act on the user's PDS. Is there a read-only or identity-only scope, or do we request `transition:generic` and just not use the access token for anything beyond profile fetching?
- **Docker Compose vs. systemd.** Both work. Docker Compose gives reproducible builds, isolation, and easier migration between hosts. Systemd is lighter, native to Debian, and avoids Docker's overhead. For a Proxmox VM where we control the environment completely, systemd is probably sufficient. Docker adds value if we expect to move the deployment frequently.