The AWS serverless architecture described in [[Design/Platform_Overview]] works, but it optimizes for a problem we don't have: elastic scale and zero cost at rest. The tradeoff is complexity — VPC endpoints, Mangum adapters, DynamoDB Streams to avoid SQS endpoint costs, Lambda cold starts, EFS mount latency. All of that machinery exists to make Lambda work, not to make the wiki work.

robot.wtf is a free, volunteer-run wiki service for the ATProto community. No premium tier, no billing, no Stripe. The hosting is a Debian 12 VM on a Proxmox hypervisor with a static IP and generous RAM and disk. The deployment is conventional: persistent processes, local disk, SQLite, Caddy. The application logic — multi-tenant Otterwiki, MCP tools, semantic search, ACL enforcement — ports over from the Lambda implementation with minimal changes. The middleware we already built for Lambda is WSGI middleware with a Mangum wrapper; removing the wrapper gives us back the WSGI middleware.

The ATProto identity system replaces WorkOS as the auth provider. Users sign in with their Bluesky handle (or any ATProto PDS account). Identity is a DID — portable, user-owned, and philosophically aligned with "your wiki is a git repo you can clone." The target audience (developers and researchers using AI agents) overlaps heavily with the ATProto early-adopter community.

---

## Service model
robot.wtf is a free tool, not a business. There is no premium tier and no billing infrastructure.

Every user gets:

- 1 wiki
- 500 pages
- 3 collaborators
- Full-text search + semantic search
- MCP access (Claude.ai OAuth + Claude Code bearer token)
- Read-only git clone
- Public wiki toggle

These are resource management limits, not a paywall. If someone needs more, they clone their repo and self-host — which is the whole point of git-backed storage.

If paid tiers ever make sense, the architecture supports them — the ACL model and schema have room for tier fields. But the billing infrastructure (Stripe, webhooks, lapse enforcement, upgrade/downgrade flows) doesn't get built until someone is actually asking to pay. That decision and all the commercial design work are preserved in the archived design docs ([[Design/Implementation_Phases]], [[Design/Operations]]).

---
### Server
Debian 12 VM running on a Proxmox hypervisor. Static IP, generous RAM and disk allocation. If the VM ever needs to move, the deployment is portable to any Linux box (Hetzner, DigitalOcean, Fly.io, bare metal, or back to AWS on an EC2 instance) — nothing is host-specific.

Caddy handles TLS termination, automatic Let's Encrypt certificates (including wildcard via DNS challenge), and reverse proxy routing. It replaces API Gateway + CloudFront + ACM.

Wildcard TLS requires a DNS challenge. Caddy supports this natively with plugins for common DNS providers (Cloudflare, Route 53, OVHcloud). The DNS zone for robot.wtf needs API credentials configured in Caddy.

Caddy's routing is order-sensitive and matcher-based. The Caddyfile structure:
```
robot.wtf {
handle /auth/* {
reverse_proxy localhost:8003
}
    # ...
}
}
*.robot.wtf {
@mcp path /mcp /mcp/*
handle @mcp {
reverse_proxy localhost:8001
```

```
handle: string, // e.g. "sderle.bsky.social" — display name, may change
display_name: string, // from ATProto profile
avatar_url?: string, // from ATProto profile
username: string, // platform username (URL slug)
created_at: ISO8601,
wiki_count: number,
}
```
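
For orientation, the record maps onto a small Python dataclass. This is a sketch: the class name is illustrative, and the `did` field is included first because, as noted below, it is the record's stable key.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserRecord:
    did: str                    # stable identity, never changes
    handle: str                 # e.g. "sderle.bsky.social", refreshed on login
    display_name: str           # from ATProto profile
    username: str               # platform slug, immutable after signup
    created_at: str             # ISO8601
    wiki_count: int = 0
    avatar_url: Optional[str] = None
```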
The `did` is the stable identity. The `handle` is refreshed from the PDS on each login (handles can change). The `username` is the platform-local slug used in URLs — immutable after signup.
### Username defaulting
When a new user signs up, the platform username defaults to the local part of their ATProto handle. For a handle like `sderle.bsky.social`, the default is `sderle`. For a user with a custom domain handle like `schuyler.robot.wtf`, the default is `schuyler` (the domain prefix). The user can override this at signup if they want something different, but the default should be right most of the time.
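
A minimal sketch of the defaulting rule (the function name is illustrative):

```python
def default_username(handle: str) -> str:
    """Default platform username: the leftmost label of the ATProto handle.

    "sderle.bsky.social" -> "sderle"; "schuyler.robot.wtf" -> "schuyler".
    """
    return handle.split(".", 1)[0].lower()
```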
Validation rules are unchanged from the original design: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens, checked against the reserved name blocklist and existing usernames/wiki slugs.
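
A sketch of those checks (names are illustrative; the blocklist is the reserved-name list from the URL structure section):

```python
import re

RESERVED = {
    "api", "auth", "app", "www", "admin", "mcp", "docs", "status", "blog",
    "help", "support", "static", "assets", "null", "undefined", "wiki", "robot",
}

# Lowercase alphanumeric + hyphens, no leading/trailing hyphen.
USERNAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{1,28}[a-z0-9])?")

def valid_username(name: str, taken: set) -> bool:
    """True if `name` is a usable platform username / wiki slug."""
    return (
        USERNAME_RE.fullmatch(name) is not None
        and len(name) >= 3                 # the regex alone also admits length 1
        and name not in RESERVED
        and name not in taken
    )
```

Here `taken` stands in for the lookup against existing usernames and wiki slugs.
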
### ATProto OAuth (browser login)
robot.wtf is an ATProto OAuth **confidential client**. The flow:
```
1. User enters their handle (e.g. "sderle.bsky.social") on the login page
2. robot.wtf resolves the handle to a DID, then resolves the DID to a PDS URL
3. robot.wtf fetches the PDS's Authorization Server metadata
4. robot.wtf sends a Pushed Authorization Request (PAR) to the PDS's AS,
including PKCE code_challenge and DPoP proof
5. User is redirected to their PDS's authorization interface
6. User approves the authorization request
7. PDS redirects back to robot.wtf/auth/callback with an authorization code
8. robot.wtf exchanges the code for tokens (access_token + refresh_token)
with DPoP binding and client authentication (signed JWT)
9. robot.wtf uses the access token to fetch the user's profile (DID, handle,
display name) from their PDS
10. robot.wtf mints a platform JWT, sets it as an HttpOnly cookie on .robot.wtf
11. Redirect to robot.wtf/app/
```
The platform JWT is signed with our own RS256 key (stored on disk). After step 10, the PDS is not in the runtime path — the platform JWT is self-contained and validated locally. ATProto tokens are stored in the session database for potential future use (e.g., posting to Bluesky on behalf of the user), but they're not needed for wiki operations.
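
A sketch of the claims that make the token self-contained (the claim names and the 7-day TTL are assumptions; signing would go through a JWT library such as PyJWT with `algorithm="RS256"`):

```python
import time

ISSUER = "https://robot.wtf"  # assumption: issuer matches the platform domain

def platform_jwt_claims(did: str, handle: str, username: str,
                        ttl_seconds: int = 7 * 24 * 3600) -> dict:
    """Claims for the platform JWT: everything needed to authorize a
    request without calling back to the PDS."""
    now = int(time.time())
    return {
        "iss": ISSUER,
        "sub": did,            # stable identity
        "handle": handle,      # refreshed from the PDS at login
        "username": username,  # platform slug
        "iat": now,
        "exp": now + ttl_seconds,
    }
```

Every subdomain service validates the resulting token with the shared public key; no per-request network calls.
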
### Reference implementation
ATProto's OAuth profile is not directly compatible with this — ATProto uses per-user Authorization Servers (each user's PDS), whereas Claude.ai expects a single AS URL from the resource metadata endpoint.
**Solution: robot.wtf runs its own OAuth 2.1 Authorization Server for MCP.**
```
1. Claude.ai connects to https://{slug}.robot.wtf/mcp
3. Discovers robot.wtf's AS at https://robot.wtf/auth/oauth
4. Performs Dynamic Client Registration at robot.wtf/auth/oauth/register
5. Redirects user to robot.wtf/auth/oauth/authorize
6. User sees robot.wtf's consent page:
- If already logged in (platform JWT cookie): "Authorize Claude to access {wiki}?"
- If not logged in: "Sign in with Bluesky" → ATProto OAuth flow → then consent
7. User approves, robot.wtf issues authorization code
8. Claude.ai exchanges code for access token at robot.wtf/auth/oauth/token
9. Claude.ai uses access token to make MCP requests
10. MCP sidecar validates token against robot.wtf's JWKS
```
robot.wtf's MCP OAuth AS is a thin layer. It delegates authentication to ATProto (step 6) and handles authorization itself (does this user have access to this wiki?). The token it issues is a JWT containing the user's DID and the authorized wiki slug, signed with our RS256 key.
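
The token's claims can be sketched as follows (the `wiki` and `client_id` claim names are assumptions, as is the 1-hour TTL):

```python
import time

def mcp_token_claims(did: str, wiki_slug: str, client_id: str,
                     ttl_seconds: int = 3600) -> dict:
    """Claims for the MCP access token: the user's DID plus the single
    wiki the user consented to."""
    now = int(time.time())
    return {
        "iss": "https://robot.wtf/auth/oauth",  # the MCP Authorization Server
        "sub": did,
        "wiki": wiki_slug,       # scope of the grant: exactly one wiki
        "client_id": client_id,  # from Dynamic Client Registration
        "iat": now,
        "exp": now + ttl_seconds,
    }

def sidecar_accepts(claims: dict, request_slug: str) -> bool:
    """Check the sidecar runs after verifying the signature against the JWKS."""
    return claims.get("wiki") == request_slug and claims.get("exp", 0) > time.time()
```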
Required OAuth 2.1 AS endpoints:
Each wiki's MCP endpoint serves its own resource metadata:
```json
// GET https://{slug}.robot.wtf/.well-known/oauth-protected-resource
```
Same approach as [[Design/Frontend]]: platform JWT stored as an `HttpOnly`, `Secure`, `SameSite=Lax` cookie on `.robot.wtf`. Every request to any subdomain includes the cookie. The Otterwiki middleware and MCP sidecar both validate JWTs using the same public key.
### Auth convergence
The dataset is small even at 1000 users. SQLite on local disk is simpler, faster, and free. The application layer uses SQLAlchemy (or raw `sqlite3` — the schema is simple enough). If the deployment ever needs Postgres, the migration is straightforward.
The SQLite database lives at `/srv/data/robot.db`. Write concurrency is handled by SQLite's WAL mode, which supports concurrent reads with serialized writes. For a wiki platform where writes are infrequent relative to reads, this is more than adequate.
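
A sketch of the connection setup (the `synchronous=NORMAL` pairing is an assumption; WAL itself is the load-bearing pragma):

```python
import sqlite3

def connect(db_path: str = "/srv/data/robot.db") -> sqlite3.Connection:
    """Open the platform database with WAL mode: concurrent readers,
    serialized writers."""
    conn = sqlite3.connect(db_path, timeout=5.0)  # wait up to 5s on a write lock
    conn.execute("PRAGMA journal_mode=WAL")       # persistent once set on the file
    conn.execute("PRAGMA synchronous=NORMAL")     # durable enough when paired with WAL
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

WAL mode is a property of the database file, so setting it on every connection is cheap and idempotent.
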
### Tables
repo_path TEXT NOT NULL, -- /srv/wikis/{slug}/repo.git
mcp_token_hash TEXT NOT NULL, -- bcrypt hash
is_public INTEGER DEFAULT 0,
created_at TEXT NOT NULL,
last_accessed TEXT NOT NULL,
page_count INTEGER DEFAULT 0
index.faiss # FAISS vector index
embeddings.json # page_path → vector mapping
data/
robot.db # SQLite database
signing_key.pem # RS256 private key for JWT signing
The config-swapping is the multi-tenancy mechanism we already built. In Lambda, it happened per-invocation; in WSGI, it happens per-request. The difference is negligible — the config is a handful of in-memory variables, not file I/O.
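
The per-request swap can be sketched as WSGI middleware (the class name, environ keys, and lookup signature are illustrative):

```python
class TenantMiddleware:
    """Resolve the slug from the Host header, look up the wiki, set
    per-request config, then delegate to the wrapped app."""

    def __init__(self, app, lookup_wiki):
        self.app = app                  # the wrapped WSGI app (Otterwiki)
        self.lookup_wiki = lookup_wiki  # slug -> wiki record dict, or None

    def __call__(self, environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]
        # "{slug}.robot.wtf" has at least two dots; the apex domain has one.
        slug = host.split(".")[0] if host.count(".") >= 2 else None
        wiki = self.lookup_wiki(slug) if slug else None
        if wiki is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown wiki\n"]
        # The in-memory config swap: a handful of variables, no file I/O.
        environ["robotwtf.slug"] = slug
        environ["robotwtf.repo_path"] = wiki["repo_path"]
        return self.app(environ, start_response)
```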
Gunicorn runs with multiple workers. The Proxmox VM has generous RAM, so worker count is limited by CPU cores, not memory. Git write operations are serialized per-repo by git's own lock file.
### MCP sidecar (FastMCP)
### Platform API (Flask)
A lightweight Flask app handling the management API (wiki CRUD, ACL management, token generation) and the Git smart HTTP protocol. This is the same API surface described in the archived [[Design/Implementation_Phases]], with SQLite queries instead of DynamoDB calls.
The Git smart HTTP endpoints (`/repo.git/info/refs`, `/repo.git/git-upload-pack`) use dulwich to serve the bare repos on disk. Read-only (upload-pack only) — users can clone and pull their wikis at any time.
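
The read-only policy reduces to a predicate the route handlers can share (the function name is illustrative; the dulwich wiring is as described above):

```python
from typing import Optional

def git_request_allowed(path: str, service: Optional[str]) -> bool:
    """Read-only smart HTTP: serve upload-pack (clone/fetch), refuse
    receive-pack (push)."""
    if path.endswith("/git-receive-pack"):
        return False                    # push endpoint: always rejected
    if path.endswith("/info/refs"):
        # git sends ?service=... on the ref advertisement request
        return (service or "git-upload-pack") == "git-upload-pack"
    return path.endswith("/git-upload-pack")
```

Push attempts fail at the advertisement step, before any pack data is transferred.
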
### Auth service (Flask)
MiniLM (~80MB) loads in ~500ms. On a VPS with persistent processes, this happens once at startup. In the Lambda architecture, it happened on every cold start. This is one of the clearest wins of the VPS approach.
The Proxmox VM has plenty of RAM, so loading MiniLM in both the MCP sidecar and a dedicated embedding worker is fine. The Otterwiki process and platform API don't need it — they just write to the reindex queue.
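
A load-once pattern keeps the cost at startup or first use. This is a sketch: the `sentence-transformers` package and the `all-MiniLM-L6-v2` checkpoint name are assumptions about the embedding setup.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def embedding_model():
    """Load MiniLM once per process; later calls return the cached instance.
    Processes that never call this never pay the ~80MB memory cost."""
    from sentence_transformers import SentenceTransformer  # deferred heavy import
    return SentenceTransformer("all-MiniLM-L6-v2")
```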

---
| Data | Location | Severity of loss |
|------|----------|-----------------|
| Git repos | `/srv/wikis/*/repo.git` | **Critical** — user data |
| SQLite database | `/srv/data/robot.db` | **High** — reconstructable from repos but painful |
| Signing keys | `/srv/data/*.pem`, `/srv/data/*.json` | **High** — loss invalidates all active sessions |
**FAISS indexes:** Not backed up. Rebuildable from repo content. Loss triggers a one-time re-embedding — seconds per wiki.

**Proxmox snapshots:** The Proxmox hypervisor can take VM-level snapshots. These are a useful complement to application-level backups — a snapshot captures the entire VM state for rapid rollback after a bad deploy. Not a substitute for offsite backups (snapshots live on the same hardware).
### Recovery
If the VM dies completely, recovery is:
1. Provision a new VM (on Proxmox or any other host)
5. Restore git repos from backup (or users re-push from their clones)
6. Re-embed all wikis (automated script, runs in minutes)
7. Update DNS to point to new IP (if it changed)
RTO: hours (mostly limited by repo restore transfer time). RPO: 24 hours (daily backup cycle). This is acceptable for a free community service. If tighter RPO is needed, increase backup frequency or add streaming replication to a standby.

---
Code lives in a Git repo. Deployment is `git pull` + restart services. No Pulumi, no CloudFormation, no CI/CD pipeline required (though one can be added).
10. Download MiniLM model to `/srv/embeddings/model/`
11. Start services, verify health checks
### Monitoring
For a volunteer-run service, keep monitoring simple:
- **Health checks:** Each service exposes a `/health` endpoint. An external monitor (UptimeRobot, free tier) pings them.
- **Logs:** systemd journal or Docker logs. `journalctl -u robot-otterwiki --since "1 hour ago"` is sufficient at this scale.
- **Disk space:** A cron job that alerts (email or Bluesky DM) when disk usage exceeds 80%.
- **Backups:** The backup cron job logs success/failure. Alert on failure.
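
The disk-space check is a few lines of stdlib Python (the threshold and path defaults are illustrative; the alert transport is whatever the cron job already uses):

```python
import shutil

def disk_usage_pct(path: str = "/srv") -> float:
    """Percent of the filesystem containing `path` currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def should_alert(pct_used: float, threshold: float = 80.0) -> bool:
    """Cron-friendly: alert once usage crosses the threshold."""
    return pct_used >= threshold
```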
---
## URL Structure
Every wiki gets a subdomain: `{slug}.robot.wtf`. The slug is the wiki's globally unique identifier.

Slugs and usernames are the same thing (each user gets one wiki, the slug IS the username). Reserved names blocked for signup: `api`, `auth`, `app`, `www`, `admin`, `mcp`, `docs`, `status`, `blog`, `help`, `support`, `static`, `assets`, `null`, `undefined`, `wiki`, `robot`.

---

## What changes vs. what stays the same
### Stays the same
- Multi-tenant middleware logic (resolve slug → look up wiki → check ACL → set headers → delegate)
### What can be reused from existing implementation
2. **MCP OAuth AS scope.** Building a spec-compliant OAuth 2.1 AS (with DCR, PKCE, token refresh, JWKS) is a meaningful amount of work. `authlib` has server-side components that can handle some of this. How much can we lean on `authlib` vs. hand-rolling? The Bluesky Flask demo is client-side only.
3. **Caddy DNS challenge provider.** Wildcard TLS requires DNS API access. Which DNS provider hosts the robot.wtf zone? Cloudflare, Route 53, and OVHcloud are all supported by Caddy. The DNS provider choice should be made before deployment.
4. **Claude.ai MCP OAuth compatibility.** The self-hosted OAuth 2.1 AS approach should work — Claude.ai's MCP client follows standard OAuth 2.1 discovery. But the actual implementation needs testing against Claude.ai's specific client behavior (which headers it sends, how it handles token refresh, whether it supports DPoP). The GitHub issues around Claude.ai MCP OAuth suggest it can be finicky. Plan for a debugging cycle.
5. **ATProto scopes.** The ATProto OAuth spec has "transitional" scopes (`transition:generic`). We only need authentication (identity), not authorization to act on the user's PDS. Is there a read-only or identity-only scope, or do we request `transition:generic` and just not use the access token for anything beyond profile fetching?
6. **Docker Compose vs. systemd.** Both work. Docker Compose gives you reproducible builds, isolation, and easier migration between hosts. Systemd is lighter, native to Debian, and avoids Docker's overhead. For a Proxmox VM where we control the environment completely, systemd is probably sufficient. Docker adds value if we expect to move the deployment frequently.