Commit d105f8
2026-03-15 22:37:48 robot.wtf: Move misplaced pages from default wiki
| /dev/null .. Tasks/Disk_Usage_Cap.md | |
| @@ -0,0 +1,38 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, future, limits] |
| + | last_updated: 2026-03-15 |
| + | confidence: medium |
| + | --- |
| + | |
| + | # Disk Usage Cap Per Wiki |
| + | |
| + | ## Problem |
| + | There's no disk space limit per wiki. A user could fill the VPS disk by uploading large attachments or creating thousands of pages. The page count limit (500) doesn't account for attachment size. |
| + | |
| + | ## Current limits |
| + | - MAX_PAGES_PER_WIKI = 500 (enforced in ManagementMiddleware on wiki creation) |
| + | - No disk space enforcement |
| + | |
| + | ## Proposed cap |
| + | 50MB per wiki. The dev wiki (65 pages, all markdown) is 1.4MB including git history. 50MB gives plenty of room for growth and moderate attachments. |
| + | |
| + | ## Enforcement options |
| + | |
| + | ### Option A: Per-write check |
| + | Check `du -s` on the wiki repo before each page save or attachment upload. Reject writes that would exceed the cap. Accurate but expensive — `du` on a large git repo can take hundreds of milliseconds. |
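A minimal sketch of the per-write check; the helper names and the 50MB constant follow this page's proposal, not existing code:

```python
# Option A sketch: shell out to `du` before accepting a write.
# `wiki_size_bytes` and `check_write_allowed` are hypothetical helpers.
import subprocess

CAP_BYTES = 50 * 1024 * 1024  # proposed 50MB cap

def wiki_size_bytes(repo_path: str) -> int:
    # `du -sk` reports usage in 1KB blocks, followed by the path
    out = subprocess.run(["du", "-sk", repo_path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.split()[0]) * 1024

def check_write_allowed(repo_path: str, incoming_bytes: int) -> bool:
    # reject any write that would push the wiki over the cap
    return wiki_size_bytes(repo_path) + incoming_bytes <= CAP_BYTES
```

The `du` call is the expensive part here; running it on every save is exactly the cost this option trades for accuracy.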
| + | |
| + | ### Option B: Tracked column |
| + | Add a `disk_usage` column to the wikis table. Update it after each write (or periodically). Check the column value on writes — fast but can drift from reality. |
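A sketch of the tracked-column approach, assuming a SQLite `wikis` table; the `disk_usage` column is this page's proposal, the rest is illustrative:

```python
# Option B sketch: keep a running disk_usage counter per wiki and check it
# on writes instead of calling `du`.
import sqlite3

CAP_BYTES = 50 * 1024 * 1024  # proposed 50MB cap

def record_write(conn, slug, nbytes):
    # update the tracked value after each save/upload; a periodic pass
    # over the real on-disk sizes would correct any drift
    conn.execute("UPDATE wikis SET disk_usage = disk_usage + ? WHERE slug = ?",
                 (nbytes, slug))

def write_allowed(conn, slug, incoming):
    (used,) = conn.execute("SELECT disk_usage FROM wikis WHERE slug = ?",
                           (slug,)).fetchone()
    return used + incoming <= CAP_BYTES
```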
| + | |
| + | ### Option C: Periodic cron |
| + | A cron job checks `du` for all wikis. If over cap, sets a flag that the TenantResolver checks on write requests. Cheap per-request but enforcement is delayed. |
| + | |
| + | ### Option D: Git hooks |
| + | A git pre-receive hook that checks repo size before accepting a push. Only works for git-based writes, not web UI or MCP. |
| + | |
| + | ## Decision needed |
| + | Decide on one option or a combination. Option B (tracked column) plus Option C (periodic correction) is probably the right balance: a fast per-request check with eventual consistency. |
| + | |
| + | ## Not urgent |
| + | No external users yet. Can be deferred until pre-launch (V7-9). |
| /dev/null .. Tasks/MCP_Wiki_Routing.md | |
| @@ -0,0 +1,40 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, mcp, bug, multi-tenant] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # MCP Wiki Routing Bug |
| + | |
| + | ## Problem |
| + | |
| + | The MCP sidecar calls the otterwiki REST API at `http://localhost:8000` without a Host header. The TenantResolver on port 8000 treats requests without a wiki subdomain as the default wiki (`_default`). All MCP reads and writes go to the wrong wiki. |
| + | |
| + | This means: |
| + | - Pages written via MCP go to `/srv/wikis/_default/`, not the intended wiki |
| + | - Pages read via MCP return content from `_default`, not the wiki the user connected to |
| + | - Semantic search queries search the wrong index |
| + | - The three `Tasks/` pages created today via MCP are in the `_default` wiki, not the dev wiki |
| + | |
| + | ## Root cause |
| + | |
| + | The MCP server (otterwiki-mcp) creates a `WikiClient` with `OTTERWIKI_API_URL=http://localhost:8000`. When it makes HTTP requests to the REST API, it doesn't set a Host header. The TenantResolver sees `localhost` and falls through to the default wiki. |
| + | |
| + | On 3gw (single-tenant), this works because there's only one wiki. On robot.wtf (multi-tenant), the MCP sidecar needs to forward the wiki context. |
| + | |
| + | ## Fix |
| + | |
| + | The MCP sidecar receives requests on wiki subdomains (e.g., `dev.robot.wtf/mcp`). It needs to: |
| + | |
| + | 1. Extract the wiki slug from the incoming request's Host header |
| + | 2. Pass `Host: {slug}.robot.wtf` on its API calls to `http://localhost:8000` |
| + | With the Host header forwarded, the TenantResolver then resolves the correct wiki for each API call. |
| + | |
| + | The `WikiClient` in `otterwiki_mcp/api_client.py` needs to accept a `host_header` parameter and include it in all requests. The MCP tool handlers need to know the current wiki slug (from the incoming request context) and pass it to the client. |
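A sketch of the `host_header` change; stdlib `urllib` stands in for whatever HTTP client `WikiClient` actually uses, and the method shape is assumed:

```python
# Sketch: WikiClient accepts a host_header and attaches it to every request,
# so the TenantResolver on :8000 resolves the right wiki instead of _default.
import urllib.request

class WikiClient:
    def __init__(self, api_url, host_header=None):
        self.api_url = api_url.rstrip("/")
        self.host_header = host_header  # e.g. "dev.robot.wtf"

    def _request(self, path):
        req = urllib.request.Request(f"{self.api_url}{path}")
        if self.host_header:
            # Host override: TenantResolver routes on this, not on `localhost`
            req.add_header("Host", self.host_header)
        return req
```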
| + | |
| + | ## Complication |
| + | |
| + | FastMCP's tool handlers are async functions registered with `@mcp.tool()`. They don't have direct access to the HTTP request context (Host header). The wiki slug needs to be threaded through somehow — either via a context variable, or by creating a per-request WikiClient. |
| + | |
| + | The V5 consent flow already extracts the wiki slug in the `/authorize/callback` handler. A similar pattern can be used for tool calls. |
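One way to thread the slug through without giving tool handlers direct HTTP access is a context variable, set where the request (and its Host header) is visible and read inside each tool. All names here are hypothetical:

```python
# Sketch: stash the wiki slug in a contextvar at request time, then read it
# inside @mcp.tool() handlers to build the Host header for API calls.
import contextvars

current_wiki_slug = contextvars.ContextVar("current_wiki_slug", default=None)

def slug_from_host(host):
    # "dev.robot.wtf" -> "dev"
    return host.split(".")[0]

def on_request(host):
    # called where the incoming request's Host header is visible
    current_wiki_slug.set(slug_from_host(host))

def host_header_for_api_call():
    # called from inside a tool handler; None means no wiki context
    slug = current_wiki_slug.get()
    return f"{slug}.robot.wtf" if slug else None
```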
| /dev/null .. Tasks/Post_Launch.md | |
| @@ -0,0 +1,44 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, milestones, launch] |
| + | last_updated: 2026-03-15 |
| + | confidence: medium |
| + | --- |
| + | |
| + | # Post-Launch Milestone |
| + | |
| + | Improvements after robot.wtf is live with real users. None of these block launch. |
| + | |
| + | ## Features |
| + | |
| + | ### Account deletion (V7-8) |
| + | User can delete their account from the management UI. Deletes wiki (git repo, FAISS index), SQLite records, ACL grants. Requires typing username to confirm. |
| + | |
| + | ### Git clone auth |
| + | Read-only git clone works (V4-6) but has no auth — anyone can clone any wiki. Should require bearer token or platform JWT for private wikis. |
| + | |
| + | ### Multiple wikis per user |
| + | Currently limited to 1 wiki per user. The data model supports multiple wikis. Needs UI for wiki selection and limit increase. |
| + | |
| + | ### Bluesky DM alerts |
| + | Translate health check / disk monitoring alerts into Bluesky DMs via ATProto API. Small webhook relay. |
| + | |
| + | ### Wiki import |
| + | Import existing git repos (from GitHub, local, etc.) as a new wiki. Upload or provide a git URL. |
| + | |
| + | ### Attachment size limits |
| + | Per-file and per-wiki attachment size enforcement. Currently no limits on uploaded files. |
| + | |
| + | ## Infrastructure |
| + | |
| + | ### Proxmox CPU type |
| + | Change from kvm64 to host to enable numpy 2.4+ and remove the pin. Requires VM reboot. See [[Dev/Proxmox_CPU_Type]]. |
| + | |
| + | ### Semantic search worker process |
| + | Move embedding from in-process daemon thread to a separate worker process. Queue-based (reindex_queue table). Eliminates FAISS corruption risk on restart. See [[Tasks/Semantic_Search_Architecture]]. |
| + | |
| + | ### CI/CD |
| + | Currently deploy is `git push` + `ansible-playbook`. Add GitHub Actions for tests on PR, auto-deploy on merge to main. |
| + | |
| + | ### Monitoring dashboard |
| + | Grafana or similar for service metrics. Currently health checks are binary (up/down) with no latency or throughput visibility. |
| /dev/null .. Tasks/Pre_Launch.md | |
| @@ -0,0 +1,53 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, milestones, launch] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Pre-Launch Milestone |
| + | |
| + | Work required before opening robot.wtf to the ATProto community. Everything here is either a bug, a missing feature that blocks usability, or a safety issue. |
| + | |
| + | ## Blocking |
| + | |
| + | ### MCP wiki routing (bug) |
| + | MCP sidecar calls the REST API without forwarding the wiki context. All reads/writes go to the default wiki. Fix in progress — see [[Tasks/MCP_Wiki_Routing]]. |
| + | |
| + | ### Multi-tenant semantic search |
| + | Sync thread only indexes one wiki. Each wiki needs its own FAISS index directory and sync state. See [[Tasks/Semantic_Search_Architecture]] and [[Tasks/Semantic_Search_Multi_Tenant]]. |
| + | |
| + | ### Disk usage cap |
| + | No per-wiki disk space limit. A user could fill the VPS. See [[Tasks/Disk_Usage_Cap]]. 50MB per wiki proposed. |
| + | |
| + | ### Management UI usability |
| + | The dashboard works but needs UX iteration: |
| + | - Wiki creation flow should default slug to username (or let user pick wiki domain at signup) |
| + | - MCP connection instructions need to be clearer |
| + | - Settings page layout needs work |
| + | - Consider re-enabling Otterwiki's content/editing admin settings (currently hidden by PLATFORM_MODE) |
| + | |
| + | ### Landing page copy |
| + | Draft is live at robot.wtf/. Needs tone and content review. Reframe FAQs for ATProto audience. Add actual screenshots once the UI is polished. |
| + | |
| + | ## Safety |
| + | |
| + | ### Backup verification |
| + | Backup cron is running (V7-4) but needs a test restore to verify it works. |
| + | |
| + | ### SMTP alerts test |
| + | Health check and disk monitoring alerts are configured but haven't been tested end-to-end. Send a test alert to verify Gmail relay works. |
| + | |
| + | ### Rate limiting |
| + | No rate limiting on any endpoint. Caddy can add this, but nothing is configured. Not critical for soft launch with a small community, but needed before wider announcement. |
| + | |
| + | ## Not blocking but important |
| + | |
| + | ### FAISS index corruption risk |
| + | The sync thread is a daemon thread killed without cleanup on SIGTERM. Mid-write kills could corrupt the index. Recoverable (full reindex) but slow. See [[Tasks/Semantic_Search_Architecture]]. |
| + | |
| + | ### OAuth token refresh |
| + | Claude.ai token refresh hasn't been tested. If tokens expire after 1 hour and refresh fails, users lose their MCP connection. |
| + | |
| + | ### Stale pages in wrong wiki |
| + | Three Tasks/ pages were written to the _default wiki by the MCP routing bug. Need to be moved to the dev wiki after the routing fix. |
| /dev/null .. Tasks/Semantic_Search_Architecture.md | |
| @@ -0,0 +1,32 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, semantic-search, architecture] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Semantic Search Architecture Issues |
| + | |
| + | ## Current state |
| + | FAISS + ONNX MiniLM embedding, running in-process in the gunicorn worker. Works for single-tenant. 65 pages indexed for dev wiki. |
| + | |
| + | ## Issues to address |
| + | |
| + | ### 1. Multi-tenant indexing (blocking) |
| + | The sync thread watches one wiki (whichever storage was set at startup). TenantResolver swaps storage per-request, but the sync thread holds the original reference. Each wiki needs its own FAISS index directory and its own sync state. The reindex_all function also wipes and rebuilds the entire shared index. |
| + | |
| + | **Needed:** Per-wiki FAISS directories (`/srv/data/faiss/{slug}/`), per-wiki sync state, sync thread that iterates over all wikis or per-wiki threads. |
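The layout could look like this; the paths follow the proposal above, and the function names are illustrative:

```python
# Sketch of per-wiki index layout: one FAISS directory and one sync-state
# file per wiki, plus a sync pass that iterates over all wikis.
from pathlib import Path

FAISS_ROOT = Path("/srv/data/faiss")

def index_dir(slug):
    return FAISS_ROOT / slug

def sync_state_path(slug):
    return index_dir(slug) / "sync_state.json"

def sync_pass(slugs):
    # one iteration of a single sync thread covering every wiki
    for slug in slugs:
        index_dir(slug).mkdir(parents=True, exist_ok=True)
        # ...compare git HEAD against sync_state_path(slug), embed changed
        # pages, and write back to this wiki's index only...
```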
| + | |
| + | ### 2. In-process embedding risks |
| + | The ONNX model (~80MB) loads in the gunicorn worker. The sync thread is a daemon thread — killed without cleanup on SIGTERM. If killed mid-write to the FAISS index, the index could corrupt (recovered by full reindex on next start, but that's slow). |
| + | |
| + | **Options:** |
| + | - Separate embedding worker process (like ChromaDB was, but lighter) |
| + | - Queue-based: page saves write to a queue (SQLite reindex_queue table already in schema), worker process reads and embeds |
| + | - Graceful shutdown handler in sync thread |
| + | |
| + | ### 3. Sync frequency |
| + | Currently every 60 seconds by polling git HEAD SHA. For a multi-tenant setup with many wikis, polling every wiki every 60 seconds doesn't scale. A queue (reindex_queue table triggered by page_saved hook) would be more efficient. |
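A sketch of the queue shape, assuming the `reindex_queue` table already in the schema; the column names here are guesses:

```python
# Sketch: the page_saved hook enqueues; a worker process drains and embeds.
import sqlite3

def enqueue(conn, slug, pagepath):
    # called from the page_saved hook instead of polling git HEAD
    conn.execute("INSERT INTO reindex_queue (wiki_slug, pagepath) VALUES (?, ?)",
                 (slug, pagepath))

def drain(conn):
    # worker loop body: embed each queued page, then delete its row
    rows = conn.execute(
        "SELECT id, wiki_slug, pagepath FROM reindex_queue").fetchall()
    for row_id, slug, path in rows:
        # ...embed `path` into the index for `slug` here...
        conn.execute("DELETE FROM reindex_queue WHERE id = ?", (row_id,))
    return rows
```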
| + | |
| + | ## Not blocking launch |
| + | Semantic search works for the dev wiki. Multi-tenant indexing is needed before opening to users with multiple wikis. The in-process risks and sync frequency are optimization concerns for later. |
| /dev/null .. Tasks/Semantic_Search_Multi_Tenant.md | |
| @@ -0,0 +1,63 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, semantic-search, chromadb, multi-tenant] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Semantic Search Multi-Tenant Fix |
| + | |
| + | ## Problem |
| + | |
| + | The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS: |
| + | |
| + | 1. **One shared ChromaDB collection** (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide. |
| + | |
| + | 2. **One sync thread** started at boot, tied to the default wiki's storage object. The `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis. |
| + | |
| + | 3. **`reindex_all` wipes everything** — it calls `backend.reset()` which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index. |
| + | |
| + | 4. **No auto-index for new wikis** — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages. |
| + | |
| + | 5. **Embedding model download** — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents. |
| + | |
| + | ## Immediate status |
| + | |
| + | - ChromaDB server is running on port 8004 |
| + | - numpy is importable (pinned <2.4.0) |
| + | - The plugin initializes and connects to ChromaDB |
| + | - But zero documents are indexed for the dev wiki |
| + | - Running `reindex_all` manually via Python reports "complete", but the collection count stays 0 |
| + | |
| + | ## Options |
| + | |
| + | ### Option A: Fix ChromaDB multi-tenant (more work) |
| + | - Per-wiki collections: `otterwiki_pages_{slug}` |
| + | - Per-wiki sync state: `chroma_sync_state_{slug}.json` |
| + | - `reindex_all` scoped to one collection |
| + | - Sync thread needs per-wiki awareness or one thread per wiki |
| + | - Changes to: otterwiki-semantic-search plugin |
| + | |
| + | ### Option B: Switch back to FAISS (different tradeoffs) |
| + | - FAISS indexes are per-directory — natural per-wiki isolation |
| + | - Local MiniLM embedding (no model download issue — bundled) |
| + | - The wikibot.io Lambda deployment already used FAISS + MiniLM |
| + | - The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`) |
| + | - But FAISS needs explicit index management (build, save, load) |
| + | - And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk |
| + | |
| + | ### Option C: Hybrid — ChromaDB with explicit embedding function |
| + | - Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding |
| + | - Solves the model download issue |
| + | - Still needs multi-tenant collection fix |
| + | |
| + | ## Decision needed |
| + | |
| + | Which approach to take depends on: |
| + | - Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first) |
| + | - Is per-wiki FAISS simpler than per-wiki ChromaDB collections? |
| + | - Do we want to maintain two backends or pick one? |
| + | |
| + | ## Related |
| + | - [[Dev/Proxmox_CPU_Type]] — numpy X86_V2 issue (workaround in place) |
| + | - [[Design/Async_Embedding_Pipeline]] — original FAISS + MiniLM design (AWS, archived) |