Commit d105f8

2026-03-15 22:37:48 robot.wtf: Move misplaced pages from default wiki
/dev/null .. Tasks/Disk_Usage_Cap.md
@@ 0,0 1,38 @@
+ ---
+ category: reference
+ tags: [tasks, future, limits]
+ last_updated: 2026-03-15
+ confidence: medium
+ ---
+
+ # Disk Usage Cap Per Wiki
+
+ ## Problem
+ There's no disk space limit per wiki. A user could fill the VPS disk by uploading large attachments or creating thousands of pages. The page count limit (500) doesn't account for attachment size.
+
+ ## Current limits
+ - MAX_PAGES_PER_WIKI = 500 (enforced in ManagementMiddleware on wiki creation)
+ - No disk space enforcement
+
+ ## Proposed cap
+ 50MB per wiki. The dev wiki (65 pages, all markdown) is 1.4MB including git history. 50MB gives plenty of room for growth and moderate attachments.
+
+ ## Enforcement options
+
+ ### Option A: Per-write check
+ Check `du -s` on the wiki repo before each page save or attachment upload. Reject writes that would exceed the cap. Accurate but expensive — `du` on a large git repo can take hundreds of milliseconds.
+
+ ### Option B: Tracked column
+ Add a `disk_usage` column to the wikis table. Update it after each write (or periodically). Check the column value on writes — fast but can drift from reality.
+
+ ### Option C: Periodic cron
+ A cron job checks `du` for all wikis. If over cap, sets a flag that the TenantResolver checks on write requests. Cheap per-request but enforcement is delayed.
+
+ ### Option D: Git hooks
+ A git pre-receive hook checks repo size before accepting a push. Only works for git-based writes, not the web UI or MCP.
+
+ ## Decision needed
+ Which option, or which combination? Option B (tracked column) with Option C (periodic correction) is probably the right balance: a fast per-request check with eventual consistency.
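+
+ A minimal stdlib sketch of the B+C combination, assuming a `disk_usage` column on the `wikis` table and a directory walk standing in for `du` (the column name and helper names are assumptions, not the real schema):
+
+ ```python
+ import os
+ import sqlite3
+
+ CAP_BYTES = 50 * 1024 * 1024  # proposed 50MB cap
+
+ def measure_usage(repo_path: str) -> int:
+     """Sum file sizes under the repo (including .git), standing in for `du -s`."""
+     total = 0
+     for root, _dirs, files in os.walk(repo_path):
+         for name in files:
+             try:
+                 total += os.path.getsize(os.path.join(root, name))
+             except OSError:
+                 pass  # file removed mid-walk
+     return total
+
+ def check_and_record_write(db: sqlite3.Connection, slug: str, incoming: int) -> bool:
+     """Option B: fast per-write check against the tracked column."""
+     row = db.execute("SELECT disk_usage FROM wikis WHERE slug = ?", (slug,)).fetchone()
+     if row is None or row[0] + incoming > CAP_BYTES:
+         return False  # reject the write
+     db.execute("UPDATE wikis SET disk_usage = disk_usage + ? WHERE slug = ?",
+                (incoming, slug))
+     return True
+
+ def correct_usage(db: sqlite3.Connection, slug: str, repo_path: str) -> None:
+     """Option C: periodic cron pass that re-measures and fixes drift."""
+     db.execute("UPDATE wikis SET disk_usage = ? WHERE slug = ?",
+                (measure_usage(repo_path), slug))
+ ```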
+
+ ## Not urgent
+ No external users yet. Can be deferred until pre-launch (V7-9).
/dev/null .. Tasks/MCP_Wiki_Routing.md
@@ 0,0 1,40 @@
+ ---
+ category: reference
+ tags: [tasks, mcp, bug, multi-tenant]
+ last_updated: 2026-03-15
+ confidence: high
+ ---
+
+ # MCP Wiki Routing Bug
+
+ ## Problem
+
+ The MCP sidecar calls the otterwiki REST API at `http://localhost:8000` without a Host header. The TenantResolver on port 8000 treats requests without a wiki subdomain as the default wiki (`_default`). All MCP reads and writes go to the wrong wiki.
+
+ This means:
+ - Pages written via MCP go to `/srv/wikis/_default/`, not the intended wiki
+ - Pages read via MCP return content from `_default`, not the wiki the user connected to
+ - Semantic search queries search the wrong index
+ - The three `Tasks/` pages created today via MCP are in the `_default` wiki, not the dev wiki
+
+ ## Root cause
+
+ The MCP server (otterwiki-mcp) creates a `WikiClient` with `OTTERWIKI_API_URL=http://localhost:8000`. When it makes HTTP requests to the REST API, it doesn't set a Host header. The TenantResolver sees `localhost` and falls through to the default wiki.
+
+ On 3gw (single-tenant), this works because there's only one wiki. On robot.wtf (multi-tenant), the MCP sidecar needs to forward the wiki context.
+
+ ## Fix
+
+ The MCP sidecar receives requests on wiki subdomains (e.g., `dev.robot.wtf/mcp`). It needs to:
+
+ 1. Extract the wiki slug from the incoming request's Host header
+ 2. Pass `Host: {slug}.robot.wtf` on its API calls to `http://localhost:8000`
+
+ The TenantResolver then resolves the correct wiki for each API call.
+
+ The `WikiClient` in `otterwiki_mcp/api_client.py` needs to accept a `host_header` parameter and include it in all requests. The MCP tool handlers need to know the current wiki slug (from the incoming request context) and pass it to the client.
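+
+ A rough sketch of the client side, assuming a `robot.wtf` base domain; the `WikiClient` shape and `host_header` parameter here are illustrative, not the current `api_client.py` API:
+
+ ```python
+ import urllib.request
+
+ BASE_DOMAIN = "robot.wtf"
+
+ def slug_from_host(host: str) -> str | None:
+     """Extract the wiki slug from a Host header, e.g. dev.robot.wtf -> dev."""
+     host = host.split(":")[0]  # drop any port
+     suffix = "." + BASE_DOMAIN
+     if host.endswith(suffix):
+         return host[: -len(suffix)]
+     return None  # bare localhost etc. -> no slug
+
+ class WikiClient:
+     def __init__(self, api_url: str, host_header: str | None = None):
+         self.api_url = api_url.rstrip("/")
+         self.host_header = host_header
+
+     def build_request(self, path: str) -> urllib.request.Request:
+         req = urllib.request.Request(self.api_url + path)
+         if self.host_header:
+             # http.client skips the automatic URL-derived Host header
+             # when an explicit one is supplied
+             req.add_header("Host", self.host_header)
+         return req
+ ```
+
+ The sidecar would build one client per request, feeding `slug_from_host()` of the incoming Host header into `host_header`.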
+
+ ## Complication
+
+ FastMCP's tool handlers are async functions registered with `@mcp.tool()`. They don't have direct access to the HTTP request context (Host header). The wiki slug needs to be threaded through somehow — either via a context variable, or by creating a per-request WikiClient.
+
+ The V5 consent flow already extracts the wiki slug in the `/authorize/callback` handler. A similar pattern can be used for tool calls.
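+
+ The context-variable option can be sketched with stdlib `contextvars` (the function names are illustrative; FastMCP's actual request plumbing may differ):
+
+ ```python
+ import contextvars
+
+ # per-request wiki slug, visible to tool handlers without explicit plumbing
+ current_wiki_slug: contextvars.ContextVar[str] = contextvars.ContextVar("current_wiki_slug")
+
+ def with_wiki(slug: str, func, *args):
+     """Run func with the slug set in an isolated context (one per request)."""
+     ctx = contextvars.copy_context()
+     def _inner():
+         current_wiki_slug.set(slug)
+         return func(*args)
+     return ctx.run(_inner)
+
+ def example_tool_handler() -> str:
+     # inside a tool handler: recover the slug and build the Host header
+     return f"{current_wiki_slug.get()}.robot.wtf"
+ ```
+
+ `contextvars` also plays well with async handlers, since asyncio copies the current context into each task.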
/dev/null .. Tasks/Post_Launch.md
@@ 0,0 1,44 @@
+ ---
+ category: reference
+ tags: [tasks, milestones, launch]
+ last_updated: 2026-03-15
+ confidence: medium
+ ---
+
+ # Post-Launch Milestone
+
+ Improvements after robot.wtf is live with real users. None of these block launch.
+
+ ## Features
+
+ ### Account deletion (V7-8)
+ User can delete their account from the management UI. Deletes wiki (git repo, FAISS index), SQLite records, ACL grants. Requires typing username to confirm.
+
+ ### Git clone auth
+ Read-only git clone works (V4-6) but has no auth — anyone can clone any wiki. Should require bearer token or platform JWT for private wikis.
+
+ ### Multiple wikis per user
+ Currently limited to 1 wiki per user. The data model supports multiple wikis. Needs UI for wiki selection and limit increase.
+
+ ### Bluesky DM alerts
+ Translate health check / disk monitoring alerts into Bluesky DMs via ATProto API. Small webhook relay.
+
+ ### Wiki import
+ Import existing git repos (from GitHub, local, etc.) as a new wiki. Upload or provide a git URL.
+
+ ### Attachment size limits
+ Per-file and per-wiki attachment size enforcement. Currently no limits on uploaded files.
+
+ ## Infrastructure
+
+ ### Proxmox CPU type
+ Change from kvm64 to host to enable numpy 2.4+ and remove the pin. Requires VM reboot. See [[Dev/Proxmox_CPU_Type]].
+
+ ### Semantic search worker process
+ Move embedding from in-process daemon thread to a separate worker process. Queue-based (reindex_queue table). Eliminates FAISS corruption risk on restart. See [[Tasks/Semantic_Search_Architecture]].
+
+ ### CI/CD
+ Currently deploy is `git push` + `ansible-playbook`. Add GitHub Actions for tests on PR, auto-deploy on merge to main.
+
+ ### Monitoring dashboard
+ Grafana or similar for service metrics. Currently health checks are binary (up/down) with no latency or throughput visibility.
/dev/null .. Tasks/Pre_Launch.md
@@ 0,0 1,53 @@
+ ---
+ category: reference
+ tags: [tasks, milestones, launch]
+ last_updated: 2026-03-15
+ confidence: high
+ ---
+
+ # Pre-Launch Milestone
+
+ Work required before opening robot.wtf to the ATProto community. Everything here is either a bug, a missing feature that blocks usability, or a safety issue.
+
+ ## Blocking
+
+ ### MCP wiki routing (bug)
+ MCP sidecar calls the REST API without forwarding the wiki context. All reads/writes go to the default wiki. Fix in progress — see [[Tasks/MCP_Wiki_Routing]].
+
+ ### Multi-tenant semantic search
+ Sync thread only indexes one wiki. Each wiki needs its own FAISS index directory and sync state. See [[Tasks/Semantic_Search_Architecture]] and [[Tasks/Semantic_Search_Multi_Tenant]].
+
+ ### Disk usage cap
+ No per-wiki disk space limit. A user could fill the VPS. See [[Tasks/Disk_Usage_Cap]]. 50MB per wiki proposed.
+
+ ### Management UI usability
+ The dashboard works but needs UX iteration:
+ - Wiki creation flow should default slug to username (or let user pick wiki domain at signup)
+ - MCP connection instructions need to be clearer
+ - Settings page layout needs work
+ - Consider re-enabling Otterwiki's content/editing admin settings (currently hidden by PLATFORM_MODE)
+
+ ### Landing page copy
+ Draft is live at robot.wtf/. Needs tone and content review. Reframe FAQs for ATProto audience. Add actual screenshots once the UI is polished.
+
+ ## Safety
+
+ ### Backup verification
+ Backup cron is running (V7-4) but needs a test restore to verify it works.
+
+ ### SMTP alerts test
+ Health check and disk monitoring alerts are configured but haven't been tested end-to-end. Send a test alert to verify Gmail relay works.
+
+ ### Rate limiting
+ No rate limiting on any endpoint. Caddy can add this, but nothing is configured. Not critical for soft launch with a small community, but needed before wider announcement.
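+
+ A possible Caddyfile sketch, assuming Caddy is built with the `mholt/caddy-ratelimit` module (rate limiting is not in stock Caddy, and non-standard directives need an explicit `order`); the zone name and limits are placeholders:
+
+ ```caddyfile
+ {
+     # non-standard directive, so order it explicitly
+     order rate_limit before reverse_proxy
+ }
+
+ dev.robot.wtf {
+     rate_limit {
+         zone per_ip {
+             key    {remote_host}
+             events 60
+             window 1m
+         }
+     }
+     reverse_proxy localhost:8000
+ }
+ ```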
+
+ ## Not blocking but important
+
+ ### FAISS index corruption risk
+ The sync thread is a daemon thread killed without cleanup on SIGTERM. Mid-write kills could corrupt the index. Recoverable (full reindex) but slow. See [[Tasks/Semantic_Search_Architecture]].
+
+ ### OAuth token refresh
+ Claude.ai token refresh hasn't been tested. If tokens expire after 1 hour and refresh fails, users lose their MCP connection.
+
+ ### Stale pages in wrong wiki
+ Three `Tasks/` pages were written to the `_default` wiki by the MCP routing bug. They need to be moved to the dev wiki after the routing fix.
/dev/null .. Tasks/Semantic_Search_Architecture.md
@@ 0,0 1,32 @@
+ ---
+ category: reference
+ tags: [tasks, semantic-search, architecture]
+ last_updated: 2026-03-15
+ confidence: high
+ ---
+
+ # Semantic Search Architecture Issues
+
+ ## Current state
+ FAISS + ONNX MiniLM embedding, running in-process in the gunicorn worker. Works for single-tenant. 65 pages indexed for dev wiki.
+
+ ## Issues to address
+
+ ### 1. Multi-tenant indexing (blocking)
+ The sync thread watches one wiki (whichever storage was set at startup). TenantResolver swaps storage per-request, but the sync thread holds the original reference. Each wiki needs its own FAISS index directory and its own sync state. The reindex_all function also wipes and rebuilds the entire shared index.
+
+ **Needed:** Per-wiki FAISS directories (`/srv/data/faiss/{slug}/`), per-wiki sync state, sync thread that iterates over all wikis or per-wiki threads.
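+
+ One pass of a multi-wiki sync loop could look like this sketch (the helper names and callback shapes are assumptions):
+
+ ```python
+ from pathlib import Path
+
+ FAISS_ROOT = Path("/srv/data/faiss")
+
+ def index_dir(slug: str) -> Path:
+     """Per-wiki FAISS directory, e.g. /srv/data/faiss/dev/."""
+     return FAISS_ROOT / slug
+
+ def sync_pass(wiki_slugs, head_sha, indexed_sha, reindex) -> list[str]:
+     """Reindex each wiki whose git HEAD moved since its last indexed SHA."""
+     changed = []
+     for slug in wiki_slugs:
+         if head_sha(slug) != indexed_sha(slug):
+             reindex(slug)  # writes only into index_dir(slug)
+             changed.append(slug)
+     return changed
+ ```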
+
+ ### 2. In-process embedding risks
+ The ONNX model (~80MB) loads in the gunicorn worker. The sync thread is a daemon thread — killed without cleanup on SIGTERM. If killed mid-write to the FAISS index, the index could corrupt (recovered by full reindex on next start, but that's slow).
+
+ **Options:**
+ - Separate embedding worker process (like ChromaDB was, but lighter)
+ - Queue-based: page saves write to a queue (SQLite reindex_queue table already in schema), worker process reads and embeds
+ - Graceful shutdown handler in sync thread
+
+ ### 3. Sync frequency
+ Currently every 60 seconds by polling git HEAD SHA. For a multi-tenant setup with many wikis, polling every wiki every 60 seconds doesn't scale. A queue (reindex_queue table triggered by page_saved hook) would be more efficient.
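+
+ A sketch of that queue flow with stdlib `sqlite3`, assuming `reindex_queue(wiki_slug, page_path)` columns (the real schema may differ):
+
+ ```python
+ import sqlite3
+
+ def enqueue(db: sqlite3.Connection, slug: str, page_path: str) -> None:
+     """Called from the page_saved hook instead of polling git HEAD."""
+     db.execute("INSERT INTO reindex_queue (wiki_slug, page_path) VALUES (?, ?)",
+                (slug, page_path))
+
+ def drain(db: sqlite3.Connection, embed_and_index) -> int:
+     """One worker pass: embed each queued page, then delete its row."""
+     rows = db.execute(
+         "SELECT rowid, wiki_slug, page_path FROM reindex_queue "
+         "ORDER BY rowid").fetchall()
+     for rowid, slug, path in rows:
+         embed_and_index(slug, path)
+         db.execute("DELETE FROM reindex_queue WHERE rowid = ?", (rowid,))
+     return len(rows)
+ ```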
+
+ ## Not blocking launch
+ Semantic search works for the dev wiki. Multi-tenant indexing is needed before opening to users with multiple wikis. The in-process risks and sync frequency are optimization concerns for later.
/dev/null .. Tasks/Semantic_Search_Multi_Tenant.md
@@ 0,0 1,63 @@
+ ---
+ category: reference
+ tags: [tasks, semantic-search, chromadb, multi-tenant]
+ last_updated: 2026-03-15
+ confidence: high
+ ---
+
+ # Semantic Search Multi-Tenant Fix
+
+ ## Problem
+
+ The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:
+
+ 1. **One shared ChromaDB collection** (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide.
+
+ 2. **One sync thread** started at boot, tied to the default wiki's storage object. The `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis.
+
+ 3. **`reindex_all` wipes everything** — it calls `backend.reset()` which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index.
+
+ 4. **No auto-index for new wikis** — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages.
+
+ 5. **Embedding model download** — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents.
+
+ ## Immediate status
+
+ - ChromaDB server is running on port 8004
+ - numpy is importable (pinned <2.4.0)
+ - The plugin initializes and connects to ChromaDB
+ - But zero documents are indexed for the dev wiki
+ - Attempted manual `reindex_all` via Python — says "complete" but collection count stays 0
+
+ ## Options
+
+ ### Option A: Fix ChromaDB multi-tenant (more work)
+ - Per-wiki collections: `otterwiki_pages_{slug}`
+ - Per-wiki sync state: `chroma_sync_state_{slug}.json`
+ - `reindex_all` scoped to one collection
+ - Sync thread needs per-wiki awareness or one thread per wiki
+ - Changes to: otterwiki-semantic-search plugin
+
+ ### Option B: Switch back to FAISS (different tradeoffs)
+ - FAISS indexes are per-directory — natural per-wiki isolation
+ - Local MiniLM embedding (no model download issue — bundled)
+ - The wikibot.io Lambda deployment already used FAISS + MiniLM
+ - The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`)
+ - But FAISS needs explicit index management (build, save, load)
+ - And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk
+
+ ### Option C: Hybrid — ChromaDB with explicit embedding function
+ - Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
+ - Solves the model download issue
+ - Still needs multi-tenant collection fix
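+
+ ChromaDB accepts any callable following its `EmbeddingFunction` protocol: `__call__(input)` takes a list of documents and returns a list of float vectors. A minimal sketch of Option C, where the `model` would in practice be an ONNX MiniLM session loaded from local disk (that loading detail is assumed, not shown):
+
+ ```python
+ class LocalEmbeddingFunction:
+     """Callable matching ChromaDB's EmbeddingFunction protocol."""
+
+     def __init__(self, model):
+         self.model = model  # callable: str -> list[float]
+
+     def __call__(self, input):
+         # the parameter is named `input` to match ChromaDB's protocol
+         return [self.model(text) for text in input]
+ ```
+
+ An instance would be passed as `embedding_function=` when creating each collection, so the default ONNX download path is never hit.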
+
+ ## Decision needed
+
+ Which approach to take depends on:
+ - Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first)
+ - Is per-wiki FAISS simpler than per-wiki ChromaDB collections?
+ - Do we want to maintain two backends or pick one?
+
+ ## Related
+ - [[Dev/Proxmox_CPU_Type]] — numpy X86_V2 issue (workaround in place)
+ - [[Design/Async_Embedding_Pipeline]] — original FAISS + MiniLM design (AWS, archived)