---
category: reference
tags: [design, tasks, phases, vps]
last_updated: 2026-03-15
confidence: medium
---

# VPS Implementation Phases

Implementation sequence for robot.wtf on the Debian 12 / Proxmox VM. Phase designations are prefixed with **V** to distinguish them from the archived AWS phases (P0–P4).

See [[Design/VPS_Architecture]] for the architecture these phases build toward. See [[Dev/V3_V5_Risk_Research]] for the auth risk assessment informing the spike phase.

---

## What we're starting with

Phases 1 and 2 of the AWS build produced working, tested components:

- **TenantResolver WSGI middleware** — resolves slug from Host header, authenticates (JWT or bearer token), checks ACL, sets Otterwiki proxy headers, swaps per-wiki config. Currently wraps DynamoDB calls. (P2-5b-7)
- **ManagementMiddleware** — wiki CRUD, ACL management, bearer token generation. WSGI middleware over DynamoDB. (P2-4)
- **Auth middleware** — platform JWT (RS256) validation and user resolution. (P2-2)
- **ACL enforcement** — role-to-permission mapping, public wiki handling. (P2-3)
- **Wiki bootstrap template** — parameterized starter pages. (P2-6)
- **Admin panel hiding** — PLATFORM_MODE flag hides conflicting Otterwiki admin sections. (P2-8)
- **MCP server** — 12 tools, Streamable HTTP, bearer token auth. Working on Lambda.
- **REST API plugin** — full CRUD, search, semantic search, history, links.
- **Semantic search** — FAISS + MiniLM, chunking, embedding, deduplication.
- **Otterwiki fork** — PROXY_HEADER auth mode, all plugins installed.
- **E2E test suite** — 17 tests covering cross-path (API↔MCP) operations.

The port is primarily: DynamoDB → SQLite, EFS paths → local disk paths, Mangum wrapper → Gunicorn, WorkOS → ATProto OAuth.

---

## V0: VM Infrastructure — COMPLETE

**Status:** Complete as of 2026-03-15. Ansible-provisioned VM is live, Caddy serving valid TLS on bare domain and wildcard subdomains.
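For orientation, the bare-domain-plus-wildcard setup can be sketched as a Caddyfile of roughly this shape. This is a hedged sketch, not the deployed config: the Cloudflare DNS plugin, the `CF_API_TOKEN` env var, and the `/srv/www/landing` root are all assumptions — V0-2 notes the actual plugin depends on where the robot.wtf zone is hosted.

```
# Hypothetical V0 Caddyfile shape (plugin and paths are assumptions)
robot.wtf {
	root * /srv/www/landing
	file_server
	tls {
		dns cloudflare {env.CF_API_TOKEN}
	}
}

*.robot.wtf {
	reverse_proxy localhost:8000
	tls {
		dns cloudflare {env.CF_API_TOKEN}
	}
}
```

The wildcard cert requires the DNS challenge; the HTTP challenge cannot issue `*.robot.wtf`.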
**Goal:** Ansible-provisioned Debian 12 VM with Caddy serving valid TLS on bare domain and wildcard.

**Tasks:**

- V0-1: Ansible playbook for base VM provisioning (Python 3.11+, git, build-essential, common tools)
- V0-2: Install Caddy with DNS challenge plugin (Cloudflare, Route 53, or OVHcloud — depends on where the robot.wtf zone is hosted)
- V0-3: Configure DNS: `robot.wtf` A record + `*.robot.wtf` A record → VM static IP
- V0-4: Caddyfile with wildcard TLS via DNS challenge. Bare domain serves a static placeholder page. Wildcard subdomains serve a placeholder.
- V0-5: Generate RS256 signing keypair for platform JWTs (`/srv/data/signing_key.pem`, `/srv/data/signing_key.pub`)
- V0-6: Generate ATProto OAuth confidential client JWK (`/srv/data/client_jwk.json`)
- V0-7: Create `/srv` directory structure per [[Design/VPS_Architecture]] storage layout
- V0-8: Initialize SQLite database with schema from [[Design/VPS_Architecture]]

**Exit criteria:**

- `https://robot.wtf` serves HTTPS with a valid cert
- `https://anything.robot.wtf` serves HTTPS with a valid wildcard cert
- Ansible playbook is idempotent (can be re-run without breaking things)
- SQLite database exists with empty tables

---

## VS: Auth Spikes

**Goal:** Validate both auth flows empirically before building the real thing. Kill unknowns early.

These are throwaway prototypes, not production code. The point is to prove the protocol flows work against real counterparts (bsky.social PDS, Claude.ai MCP client) so that V3 and V5 are pure implementation, not discovery.

### VS-1: ATProto OAuth Spike — COMPLETE

**Status:** Complete as of 2026-03-15. Full ATProto OAuth flow validated end-to-end on robot.wtf.

Deploy the Bluesky cookbook Flask demo (`bluesky-social/cookbook/python-oauth-web-app`) directly on the VM behind Caddy. Minimal adaptation — just enough to run it at `robot.wtf/auth/` with the robot.wtf client JWK.
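For reference, an ATProto OAuth client metadata document of the kind the spike publishes has roughly this shape. Field names follow the ATProto OAuth spec as I understand it; the exact values here (notably the `jwks_uri` path and signing alg) are assumptions, not what the demo actually emits.

```json
{
  "client_id": "https://robot.wtf/auth/client-metadata.json",
  "client_name": "robot.wtf",
  "client_uri": "https://robot.wtf",
  "redirect_uris": ["https://robot.wtf/auth/oauth/callback"],
  "scope": "atproto",
  "grant_types": ["authorization_code", "refresh_token"],
  "response_types": ["code"],
  "application_type": "web",
  "token_endpoint_auth_method": "private_key_jwt",
  "token_endpoint_auth_signing_alg": "ES256",
  "dpop_bound_access_tokens": true,
  "jwks_uri": "https://robot.wtf/auth/jwks.json"
}
```

The document's own URL doubles as the `client_id`, which is why the metadata URL has to be stable.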
**Tasks:**

- VS-1a: Clone the cookbook demo, install deps, configure Caddy to proxy `robot.wtf/auth/*` to the demo on a local port.
- VS-1b: Publish client metadata at `https://robot.wtf/auth/client-metadata.json` with scope `"atproto"` (identity-only) and redirect URI `https://robot.wtf/auth/oauth/callback`.
- VS-1c: Log in with a real Bluesky account. Walk through the full flow: handle entry → PDS redirect → approve → callback → token → DID.
- VS-1d: Test with a non-bsky.social PDS if one is accessible (e.g., a self-hosted PDS on the ATProto community server). This validates that the flow isn't accidentally Bluesky-specific.
- VS-1e: Document what worked, what didn't, and any PDS quirks encountered.

**Exit criteria:**

- A real Bluesky login completes end-to-end on robot.wtf
- The DID and handle are retrieved from the token response
- DPoP nonce handling works without manual intervention
- Findings documented in a Dev summary page

**Expected time:** Half a day. The demo is ready to run.

### VS-2: MCP OAuth AS Stub — COMPLETE

**Status:** Complete as of 2026-03-15. Persistent SQLite OAuth provider deployed; Claude.ai OAuth flow validated end-to-end.

Build a minimal, hard-coded OAuth 2.1 AS that implements the five endpoints Claude.ai needs. No real auth, no database, no ATProto — just the protocol surface with canned responses. Deploy behind Caddy and point Claude.ai at it.
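The stub's protocol surface can be sketched as plain functions, with the Flask routing layer left out. Endpoint paths and the issuer come from the task list below; everything else (claim names, token lifetime, the opaque token standing in for the real RS256 JWT) is an illustrative assumption.

```python
import secrets
import time

ISSUER = "https://robot.wtf"

# code -> client_id, so the token endpoint can check who redeems it
_issued_codes: dict[str, str] = {}


def as_metadata() -> dict:
    """GET /.well-known/oauth-authorization-server — static metadata."""
    return {
        "issuer": ISSUER,
        "authorization_endpoint": f"{ISSUER}/auth/oauth/authorize",
        "token_endpoint": f"{ISSUER}/auth/oauth/token",
        "registration_endpoint": f"{ISSUER}/auth/oauth/register",
        "jwks_uri": f"{ISSUER}/.well-known/jwks.json",
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code"],
        "code_challenge_methods_supported": ["S256"],
    }


def register_client(_dcr_request: dict) -> dict:
    """POST /auth/oauth/register — accept any DCR request, mint credentials."""
    return {
        "client_id": secrets.token_urlsafe(16),
        "client_secret": secrets.token_urlsafe(32),
    }


def authorize(client_id: str, redirect_uri: str, state: str) -> str:
    """GET /auth/oauth/authorize — auto-approve, no consent UI.

    Returns the redirect target carrying a one-time auth code.
    """
    code = secrets.token_urlsafe(24)
    _issued_codes[code] = client_id
    return f"{redirect_uri}?code={code}&state={state}"


def exchange_code(code: str):
    """POST /auth/oauth/token — swap the code for a canned access token."""
    if code not in _issued_codes:
        return None  # unknown or already-used code
    del _issued_codes[code]
    # The real stub mints an RS256 JWT here; an opaque token stands in.
    return {
        "access_token": secrets.token_urlsafe(32),
        "token_type": "Bearer",
        "expires_in": 3600,
        "issued_at": int(time.time()),
    }
```

Single-use codes (deleted on first exchange) are the one bit of real protocol behavior worth keeping even in a throwaway stub, since Claude.ai may retry the token request.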
**Tasks:**

- VS-2a: Implement the stub AS as a single Flask file (~150 lines):
  - `GET /.well-known/oauth-authorization-server` → static JSON metadata
  - `POST /auth/oauth/register` → accept any DCR request, return a `client_id` + `client_secret`
  - `GET /auth/oauth/authorize` → auto-approve (no consent UI), redirect with auth code
  - `POST /auth/oauth/token` → exchange code for a JWT access token (hard-coded claims)
  - `GET /.well-known/jwks.json` → serve the RS256 public key
- VS-2b: Implement the protected resource metadata on a wiki subdomain:
  - `GET https://{slug}.robot.wtf/.well-known/oauth-protected-resource` → point to the robot.wtf AS
- VS-2c: Deploy a minimal MCP endpoint (FastMCP with one dummy tool, e.g., `echo`) behind the protected resource metadata. Wire it to return 401 with a `WWW-Authenticate` header on unauthenticated requests, and to accept the stub JWT on authenticated requests.
- VS-2d: Add the MCP URL in Claude.ai settings. Walk through the flow: Claude.ai discovers the AS → registers → redirects to authorize → gets a token → calls the echo tool.
- VS-2e: If it fails, inspect what Claude.ai actually sent (log all requests) and iterate. Document the exact request shapes, headers, and timing.
- VS-2f: Test the same flow with Claude Code (`claude mcp add --transport http`) to verify OAuth discovery works there too.

**Exit criteria:**

- Claude.ai completes the MCP OAuth flow against the stub and successfully calls a tool
- OR: documented exactly where/why Claude.ai's client diverges from the spec, with a plan to accommodate it
- Claude Code OAuth discovery tested
- Findings documented in a Dev summary page

**Expected time:** 1–2 days, mostly debugging Claude.ai's behavior.

### Why spike before building

The research ([[Dev/V3_V5_Risk_Research]]) assessed V3 as low risk and V5 as medium risk, but both assessments are based on documentation and code reading, not empirical testing.
The spikes turn "should work" into "does work" for a cost of ~2 days, before committing to the full implementation in V3 and V5. If either spike reveals a fundamental incompatibility, we find out before building the real system, not after.

---

## V1: Otterwiki on Caddy — COMPLETE

**Status:** Complete as of 2026-03-15. Single Otterwiki instance running behind Gunicorn/Caddy with multi-tenant slug routing operational.

**Goal:** Single Otterwiki instance running behind Gunicorn behind Caddy, with multi-tenant slug-based routing on wildcard subdomains. No auth yet — test users hardcoded in SQLite.

**Tasks:**

- V1-1: Port TenantResolver from DynamoDB to SQLite. Replace `UserModel.get()`, `WikiModel.get()`, `AclModel.query()` with SQLite queries. The WSGI middleware structure, proxy header injection, and config-swapping logic stay unchanged.
- V1-2: Port the data access layer. Create a thin `db.py` module that wraps SQLite (or SQLAlchemy if preferred). Same interface as the DynamoDB models but backed by SQLite.
- V1-3: Gunicorn configuration. Otterwiki Flask app wrapped in TenantResolver, served by Gunicorn on port 8000. Multiple workers (sized to VM CPU cores).
- V1-4: Caddy routing for wiki subdomains. `*.robot.wtf` routes to Gunicorn on port 8000. Caddy handles TLS, Gunicorn handles WSGI.
- V1-5: Manual smoke test. Insert a test user and wiki directly into SQLite. `git init --bare /srv/wikis/testuser/repo.git`. Commit a Home page. Verify `https://testuser.robot.wtf` serves the wiki.

**Exit criteria:**

- Browsing, editing, and saving wiki pages works at `{slug}.robot.wtf`
- Gunicorn stays up, serves multiple requests, and workers don't crash
- Config-swap per request works (if multiple test wikis exist, each resolves correctly)

---

## V2: Migrate dev.wikibot.io → dev.robot.wtf — PARTIAL

**Status:** Partial as of 2026-03-15. Dev wiki (this wiki) migrated to dev.robot.wtf.
The Third Gulf War wiki (3gw.robot.wtf) remains on the home server via a DNS CNAME exception and has not been migrated.

**Goal:** The existing development wiki (Third Gulf War research + this dev wiki) running on the VPS under robot.wtf, proving the full Otterwiki stack works in the new environment with real data.

**Tasks:**

- V2-1: Export git repos from the existing AWS deployment. `git clone --bare` from the EFS-backed repos (or from the dev.wikibot.io git remote if accessible). Copy to `/srv/wikis/` on the VM.
- V2-2: Create corresponding user and wiki records in SQLite. The existing deployment uses UUIDs and DynamoDB; the VPS uses DIDs and SQLite. For the dev migration, create placeholder user records (the owner's DID can be updated once ATProto auth is wired in V3).
- V2-3: Rebuild FAISS indexes from the migrated repos. Run the existing embedding code against the local repos with MiniLM.
- V2-4: Wire up the MCP server (FastMCP sidecar on port 8001) with bearer token auth against SQLite. Generate new bearer tokens for the migrated wikis.
- V2-5: Configure Caddy to route MCP traffic (`{slug}.robot.wtf/mcp`) to port 8001.
- V2-6: Verify end-to-end: browse the wiki in a browser, connect Claude Code via MCP bearer token, run a few tool calls.
- V2-7: Update the MCP connection config in Claude.ai / Claude Code to point at `dev.robot.wtf` instead of `dev.wikibot.io`.

**Exit criteria:**

- The Third Gulf War wiki is browsable at its new robot.wtf subdomain
- This dev wiki is browsable at its new robot.wtf subdomain
- MCP tools work against the migrated wikis via Claude Code (bearer token)
- Semantic search returns results from the migrated FAISS indexes
- The old AWS deployment can be left running in parallel until confidence is high, then decommissioned

**Note:** Auth at this point is still placeholder/hardcoded.
The migrated wikis use bearer tokens for MCP and don't require browser login (the wiki owner can browse without auth during this phase, or auth can be faked via a test JWT). Real ATProto auth comes in V3.

---

## V3: ATProto OAuth (Browser Login) — COMPLETE

**Status:** Complete as of 2026-03-15. Production auth service live with full signup flow; real Bluesky users can sign in and create accounts.

**Goal:** Real users can sign in with their Bluesky handle and get a platform JWT.

VS-1 proved the flow works. This phase replaces the spike with production code: proper error handling, session management, signup flow, and integration with the TenantResolver.

**Tasks:**

- V3-1: Auth service scaffold. Flask app on port 8003. Caddy routes `robot.wtf/auth/*` to it. (May already exist from VS-1 — evolve the spike or rewrite clean.)
- V3-2: Publish ATProto OAuth client metadata at `https://robot.wtf/auth/client-metadata.json`. This URL becomes the `client_id` in the ATProto OAuth protocol — it must be stable. (May already exist from VS-1.)
- V3-3: Implement the ATProto OAuth flow. Adapt the Bluesky cookbook demo: handle input → DID resolution → PDS discovery → PAR → redirect → callback → token exchange → profile fetch. Store ATProto tokens in the `oauth_sessions` table. Incorporate lessons from VS-1.
- V3-4: Platform JWT issuance. On successful ATProto auth, mint a platform JWT (RS256, same signing key from V0-5) and set it as an HttpOnly cookie on `.robot.wtf`.
- V3-5: Signup flow. First-time users: after ATProto auth, prompt for a platform username. Default to the ATProto handle prefix (e.g., `sderle` from `sderle.bsky.social`, or the domain prefix from a custom handle). Validate against reserved names and existing slugs. Create the user record in SQLite.
- V3-6: Wire TenantResolver to validate platform JWTs from the cookie. Replace the hardcoded test auth from V1.
- V3-7: Handle refresh on login. Update `handle` and `display_name` from the PDS profile on each login (handles can change).
- V3-8: Login/logout UI. Minimal pages served by the auth service: a login page (handle input field) and a logout endpoint (clear cookie). The management SPA comes later in V6; this is just the bare auth flow.

**Exit criteria:**

- A real Bluesky user can visit `robot.wtf/auth/login`, enter their handle, authorize on their PDS, and land back at robot.wtf with a valid platform JWT
- The platform JWT cookie works across subdomains (the user's wiki at `{slug}.robot.wtf` recognizes them)
- First-time signup creates a user record with DID, handle, and chosen username
- Returning users are recognized by DID; handle updates are reflected

---

## V4: Management API + Wiki Lifecycle — COMPLETE

**Status:** Complete as of 2026-03-15. Wiki CRUD, ACL management, tier limits, and git HTTP read all operational.

**Goal:** Authenticated users can create wikis, manage collaborators, and get MCP bearer tokens via the API.

**Tasks:**

- V4-1: Port ManagementMiddleware from DynamoDB to SQLite. Same endpoints, same WSGI middleware pattern. Replace DynamoDB calls with SQLite queries.
- V4-2: Wire the management API at `robot.wtf/api/*`. Caddy routes to the platform API service on port 8002.
- V4-3: Wiki creation flow. `POST /api/wikis` → create SQLite records + `git init --bare` + bootstrap template + generate bearer token. Returns the token (shown once).
- V4-4: Wiki deletion, ACL management, token regeneration — port the remaining endpoints.
- V4-5: Tier limit enforcement. 1 wiki per user, 500 pages, 3 collaborators. Enforced in middleware on write operations.
- V4-6: Git smart HTTP (read-only). Caddy routes `{slug}.robot.wtf/repo.git/*` to the platform API. dulwich serves `git-upload-pack` from the bare repo. Users can `git clone https://{slug}.robot.wtf/repo.git`.
- V4-7: Integration test. Create a user (via the V3 auth flow), create a wiki (via the API), browse it, connect MCP, clone via git.
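The V4-5 quota check is simple enough to sketch against SQLite directly. This is a minimal sketch under assumptions: the `wikis` table and `owner_did` column names are illustrative (the real schema lives in [[Design/VPS_Architecture]]), and the git init / bootstrap / token steps of V4-3 are omitted.

```python
import sqlite3

MAX_WIKIS_PER_USER = 1  # free tier: one wiki per user


def can_create_wiki(db: sqlite3.Connection, owner_did: str) -> bool:
    """Return True if the user is under their wiki quota."""
    (count,) = db.execute(
        "SELECT COUNT(*) FROM wikis WHERE owner_did = ?", (owner_did,)
    ).fetchone()
    return count < MAX_WIKIS_PER_USER


def create_wiki(db: sqlite3.Connection, owner_did: str, slug: str) -> bool:
    """Create the wiki record if quota allows; return False otherwise.

    The real flow also runs `git init --bare`, writes the bootstrap
    template, and mints a bearer token — all omitted in this sketch.
    """
    if not can_create_wiki(db, owner_did):
        return False
    db.execute(
        "INSERT INTO wikis (slug, owner_did) VALUES (?, ?)", (slug, owner_did)
    )
    db.commit()
    return True
```

Counting rows per DID on every write keeps the limit enforceable from middleware with no extra bookkeeping table.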
**Exit criteria:**

- Authenticated user can create a wiki and get a bearer token
- Wiki appears at `{slug}.robot.wtf` with bootstrap template pages
- ACL grants/revocations work
- `git clone https://{slug}.robot.wtf/repo.git` works
- Tier limits enforced (cannot create a second wiki)

---

## V5: MCP OAuth AS (Claude.ai) — COMPLETE

**Status:** Complete as of 2026-03-15. Claude.ai successfully connected to a robot.wtf wiki via OAuth; full MCP OAuth flow operational in production.

**Goal:** Claude.ai can connect to a robot.wtf wiki via its standard MCP OAuth flow.

VS-2 proved (or debugged) the protocol surface. This phase replaces the stub with production code: authlib's `AuthorizationServer`, real DCR persistence, an ATProto-backed consent UI, and a proper token lifecycle.

**Tasks:**

- V5-1: Implement the OAuth 2.1 AS using authlib's `AuthorizationServer` + `AuthorizationCodeGrant` (PKCE) + `ClientRegistrationEndpoint` (RFC 7591). Wire model callbacks against SQLite (`mcp_oauth_clients` table for DCR, authorization codes in a transient table or in-memory store). Incorporate findings from VS-2 about Claude.ai's specific requirements.
- V5-2: MCP protected resource metadata. Each wiki's MCP endpoint serves `/.well-known/oauth-protected-resource` pointing to robot.wtf's AS. Caddy routes this from the wiki subdomain to the MCP sidecar (or a small handler).
- V5-3: Authorization UI. The `/auth/oauth/authorize` endpoint renders a consent page. If the user is already logged in (platform JWT cookie): show "Authorize Claude to access {wiki}?". If not logged in: show "Sign in with Bluesky", which triggers the ATProto flow (V3) and then returns to consent.
- V5-4: Token issuance. The AS issues JWTs containing the user's DID and the authorized wiki slug, signed with the platform RS256 key. The MCP sidecar validates these the same way it validates platform JWTs.
- V5-5: Test against Claude.ai. Full end-to-end: add the wiki MCP URL in Claude.ai, complete the OAuth flow with a real ATProto login, call wiki tools.
- V5-6: Test against Claude Code. Verify `claude mcp add --transport http` with OAuth discovery works (in addition to the existing bearer token path).

**Exit criteria:**

- Claude.ai can discover, authenticate, and use MCP tools on a robot.wtf wiki
- Claude Code can connect via OAuth (in addition to the existing bearer token path)
- Token refresh works (Claude.ai doesn't lose the connection after token expiry)

---

## V6: Frontend + Landing Page — COMPLETE

**Status:** Complete as of 2026-03-15. Landing page live at robot.wtf/; management UI live at robot.wtf/app/*.

**Goal:** Web UI for account management, plus a landing page that explains what robot.wtf is.

**Tasks:**

- V6-1: Landing page. Static HTML/CSS at `robot.wtf/`. Explains what robot.wtf is, shows how to get started, links to sign in. Tone and structure adapted from [[Design/Landing_Page]] but reframed as a free volunteer project, not a product. Served by Caddy's file_server.
- V6-2: Management SPA. Svelte app at `robot.wtf/app/*`. Served by Caddy (try_files fallback to index.html for client-side routing). Screens from the archived [[Design/Frontend]], minus billing:
  - Dashboard (wiki list — for now just one wiki, but the UI should handle the general case)
  - Wiki settings (display name, public/private toggle, link to the Otterwiki admin panel)
  - Collaborators (list, invite by handle/email, change role, revoke)
  - MCP connection instructions (bearer token display/regen, Claude Code command, Claude.ai setup walkthrough)
  - Account settings (username, connected ATProto handle, delete account)
- V6-3: `/api/me` endpoint. Returns current user info from the platform JWT. The SPA uses this to determine login state and populate the UI.
- V6-4: "Logged in" detection on the landing page. A small inline script checks for a companion cookie or probes `/api/me` to swap the "Sign in" link for a "Dashboard" link.
**Exit criteria:**

- A new user can visit `robot.wtf`, understand the service, sign in, create a wiki, and connect MCP — entirely through the web UI
- A returning user can manage collaborators, regenerate tokens, and toggle public access
- The landing page is clear, honest, and matches the documentation style (write like a README, not a pitch deck)

---

## V7: Semantic Search + Operational Hardening — NOT STARTED

**Status:** Not started as of 2026-03-15. ChromaDB deployed and numpy pinned, but the embedding worker and search integration are not yet verified.

**Goal:** Semantic search works. Backups run. The service is ready for real users.

**Tasks:**

- V7-1: Embedding background worker. In-process thread (or dedicated worker process) that polls the `reindex_queue` table in SQLite. On dequeue: read the page from the git repo, chunk, embed with MiniLM, update the FAISS index on disk.
- V7-2: Wire the reindex queue. Otterwiki write operations (via MCP or the REST API) write `{wiki_slug, page_path, action}` to `reindex_queue` on page save/delete.
- V7-3: Wire semantic search into the MCP sidecar and REST API. Load the FAISS index from disk, embed the query with MiniLM, search, deduplicate, return results.
- V7-4: Backup cron. Daily: SQLite `.backup` + `rsync` of `/srv/wikis` and `/srv/data` to offsite storage. Alert on failure.
- V7-5: Health checks. Each service (Otterwiki, MCP, platform API, auth) exposes `/health`. An external monitor (UptimeRobot free tier) pings them.
- V7-6: Log rotation. systemd journal or logrotate configuration so logs don't fill the disk.
- V7-7: Disk space monitoring. Cron job alerts when usage exceeds 80%.
- V7-8: Account deletion. A user can delete their account from the SPA. Deletes the wiki (git repo, FAISS index), SQLite records, and ACL grants. Requires typing the username to confirm.
- V7-9: Announce to the ATProto community. Post on Bluesky; add to ATProto app directories if any exist.
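The queue mechanics of V7-1 and V7-2 can be sketched with stdlib SQLite alone. This is a sketch under assumptions: the `reindex_queue` column names match the `{wiki_slug, page_path, action}` payload from the task list, and the whole read-chunk-embed-index pipeline is reduced to an injected `reindex` callback so the dequeue logic stands on its own.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS reindex_queue (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    wiki_slug TEXT NOT NULL,
    page_path TEXT NOT NULL,
    action    TEXT NOT NULL  -- 'save' or 'delete'
)
"""


def drain_queue(
    db: sqlite3.Connection,
    reindex: Callable[[str, str, str], None],
    batch: int = 50,
) -> int:
    """Process up to `batch` queued jobs; return how many were handled.

    Each row is deleted only after its callback succeeds, so a crash
    mid-batch just re-runs the remaining jobs on the next poll.
    """
    rows = db.execute(
        "SELECT id, wiki_slug, page_path, action FROM reindex_queue "
        "ORDER BY id LIMIT ?",
        (batch,),
    ).fetchall()
    for row_id, slug, path, action in rows:
        reindex(slug, path, action)  # real worker: read git, chunk, embed
        db.execute("DELETE FROM reindex_queue WHERE id = ?", (row_id,))
        db.commit()
    return len(rows)
```

A polling thread would call `drain_queue` in a loop with a short sleep; with a single worker process there is no need for row locking or a claimed/in-progress state.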
**Exit criteria:**

- Semantic search returns relevant results for queries against indexed wikis
- Backups run daily and succeed
- Health checks pass and alert on failure
- A real user who is not the operator can sign up, create a wiki, connect MCP, and use it without assistance

---

## Dependency graph

```
V0 (VM + Caddy + TLS)
├─ VS-1 (ATProto OAuth spike)
├─ VS-2 (MCP OAuth AS stub)
│
└─ V1 (Otterwiki on Caddy)
   ├─ V2 (Migrate dev wikis)
   └─ V3 (ATProto OAuth — informed by VS-1)
      └─ V4 (Management API)
         ├─ V5 (MCP OAuth AS — informed by VS-2)
         └─ V6 (Frontend + Landing)
            └─ V7 (Semantic search + Ops)
```

VS-1, VS-2, and V1 can all run in parallel after V0. The spikes don't block V1 — they're independent experiments on the same VM. V2 and V3 can run in parallel after V1. V4 depends on V3 (it needs real auth). V5 and V6 can run in parallel after V4. V7 is the final polish.

**Minimum viable launch:** After V4, the service is usable — users can sign in with Bluesky, create a wiki, connect via Claude Code (bearer token), and browse the web UI. Claude.ai MCP OAuth (V5), the polished frontend (V6), and semantic search (V7) are improvements on a working service.

---

## What's not in scope

These items were part of the commercial wikibot.io design and are deliberately excluded from robot.wtf:

- Stripe integration, billing, premium tiers, upgrade/downgrade flows, lapse enforcement
- Multiple wikis per user (could be added later as a free feature if resource limits allow)
- Read/write git push (only read-only clone)
- External git sync (bidirectional GitHub/GitLab)
- Custom domains
- CDN read path / fragment caching (not needed — no Lambda cold starts)
- Lambda library mode / lazy loading (not needed — persistent processes)
- WAF (not needed at this scale — Caddy + rate limiting if abuse appears)
- Pulumi / IaC (replaced by Ansible for VM setup, systemd for services)
- CI/CD pipeline (deploy is `git pull` + restart; add GitHub Actions later if it matters)