Commit d105f8
2026-03-15 22:37:48 robot.wtf: Move misplaced pages from default wiki
| /dev/null .. Tasks/Disk_Usage_Cap.md | |
| @@ -0,0 +1,38 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, future, limits] |
| + | last_updated: 2026-03-15 |
| + | confidence: medium |
| + | --- |
| + | |
| + | # Disk Usage Cap Per Wiki |
| + | |
| + | ## Problem |
| + | There's no disk space limit per wiki. A user could fill the VPS disk by uploading large attachments or creating thousands of pages. The page count limit (500) doesn't account for attachment size. |
| + | |
| + | ## Current limits |
| + | - MAX_PAGES_PER_WIKI = 500 (enforced in ManagementMiddleware on wiki creation) |
| + | - No disk space enforcement |
| + | |
| + | ## Proposed cap |
| + | 50MB per wiki. The dev wiki (65 pages, all markdown) is 1.4MB including git history. 50MB gives plenty of room for growth and moderate attachments. |
| + | |
| + | ## Enforcement options |
| + | |
| + | ### Option A: Per-write check |
| + | Check `du -s` on the wiki repo before each page save or attachment upload. Reject writes that would exceed the cap. Accurate but expensive — `du` on a large git repo can take hundreds of milliseconds. |
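A minimal sketch of the per-write check; the helper names and the 50MB constant follow this page's proposal, not existing code:

```python
# Option A sketch: shell out to `du` before accepting a write.
# `wiki_size_bytes` and `check_write_allowed` are hypothetical helpers.
import subprocess

CAP_BYTES = 50 * 1024 * 1024  # proposed 50MB cap

def wiki_size_bytes(repo_path: str) -> int:
    # `du -sk` reports usage in 1KB blocks, followed by the path
    out = subprocess.run(["du", "-sk", repo_path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.split()[0]) * 1024

def check_write_allowed(repo_path: str, incoming_bytes: int) -> bool:
    # reject any write that would push the wiki over the cap
    return wiki_size_bytes(repo_path) + incoming_bytes <= CAP_BYTES
```

The `du` call is the expensive part here; running it on every save is exactly the cost this option trades for accuracy.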
| + | |
| + | ### Option B: Tracked column |
| + | Add a `disk_usage` column to the wikis table. Update it after each write (or periodically). Check the column value on writes — fast but can drift from reality. |
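A sketch of the tracked-column approach, assuming a SQLite `wikis` table; the `disk_usage` column is this page's proposal, the rest is illustrative:

```python
# Option B sketch: keep a running disk_usage counter per wiki and check it
# on writes instead of calling `du`.
import sqlite3

CAP_BYTES = 50 * 1024 * 1024  # proposed 50MB cap

def record_write(conn, slug, nbytes):
    # update the tracked value after each save/upload; a periodic pass
    # over the real on-disk sizes would correct any drift
    conn.execute("UPDATE wikis SET disk_usage = disk_usage + ? WHERE slug = ?",
                 (nbytes, slug))

def write_allowed(conn, slug, incoming):
    (used,) = conn.execute("SELECT disk_usage FROM wikis WHERE slug = ?",
                           (slug,)).fetchone()
    return used + incoming <= CAP_BYTES
```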
| + | |
| + | ### Option C: Periodic cron |
| + | A cron job checks `du` for all wikis. If over cap, sets a flag that the TenantResolver checks on write requests. Cheap per-request but enforcement is delayed. |
| + | |
| + | ### Option D: Git hooks |
| + | A git pre-receive hook that checks repo size before accepting a push. Only works for git-based writes, not web UI or MCP. |
| + | |
| + | ## Decision needed |
| + | Decide on one option or a combination. Option B (tracked column) plus Option C (periodic correction) is probably the right balance: a fast per-request check with eventual consistency. |
| + | |
| + | ## Not urgent |
| + | No external users yet. Can be deferred until pre-launch (V7-9). |
| /dev/null .. Tasks/MCP_Wiki_Routing.md | |
| @@ -0,0 +1,40 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, mcp, bug, multi-tenant] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # MCP Wiki Routing Bug |
| + | |
| + | ## Problem |
| + | |
| + | The MCP sidecar calls the otterwiki REST API at `http://localhost:8000` without a Host header. The TenantResolver on port 8000 treats requests without a wiki subdomain as the default wiki (`_default`). All MCP reads and writes go to the wrong wiki. |
| + | |
| + | This means: |
| + | - Pages written via MCP go to `/srv/wikis/_default/`, not the intended wiki |
| + | - Pages read via MCP return content from `_default`, not the wiki the user connected to |
| + | - Semantic search queries search the wrong index |
| + | - The three `Tasks/` pages created today via MCP are in the `_default` wiki, not the dev wiki |
| + | |
| + | ## Root cause |
| + | |
| + | The MCP server (otterwiki-mcp) creates a `WikiClient` with `OTTERWIKI_API_URL=http://localhost:8000`. When it makes HTTP requests to the REST API, it doesn't set a Host header. The TenantResolver sees `localhost` and falls through to the default wiki. |
| + | |
| + | On 3gw (single-tenant), this works because there's only one wiki. On robot.wtf (multi-tenant), the MCP sidecar needs to forward the wiki context. |
| + | |
| + | ## Fix |
| + | |
| + | The MCP sidecar receives requests on wiki subdomains (e.g., `dev.robot.wtf/mcp`). It needs to: |
| + | |
| + | 1. Extract the wiki slug from the incoming request's Host header |
| + | 2. Pass `Host: {slug}.robot.wtf` on its API calls to `http://localhost:8000` |
| + | With the Host header forwarded, the TenantResolver then resolves the correct wiki for each API call. |
| + | |
| + | The `WikiClient` in `otterwiki_mcp/api_client.py` needs to accept a `host_header` parameter and include it in all requests. The MCP tool handlers need to know the current wiki slug (from the incoming request context) and pass it to the client. |
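A sketch of the `host_header` change; stdlib `urllib` stands in for whatever HTTP client `WikiClient` actually uses, and the method shape is assumed:

```python
# Sketch: WikiClient accepts a host_header and attaches it to every request,
# so the TenantResolver on :8000 resolves the right wiki instead of _default.
import urllib.request

class WikiClient:
    def __init__(self, api_url, host_header=None):
        self.api_url = api_url.rstrip("/")
        self.host_header = host_header  # e.g. "dev.robot.wtf"

    def _request(self, path):
        req = urllib.request.Request(f"{self.api_url}{path}")
        if self.host_header:
            # Host override: TenantResolver routes on this, not on `localhost`
            req.add_header("Host", self.host_header)
        return req
```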
| + | |
| + | ## Complication |
| + | |
| + | FastMCP's tool handlers are async functions registered with `@mcp.tool()`. They don't have direct access to the HTTP request context (Host header). The wiki slug needs to be threaded through somehow — either via a context variable, or by creating a per-request WikiClient. |
| + | |
| + | The V5 consent flow already extracts the wiki slug in the `/authorize/callback` handler. A similar pattern can be used for tool calls. |
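One way to thread the slug through without giving tool handlers direct HTTP access is a context variable, set where the request (and its Host header) is visible and read inside each tool. All names here are hypothetical:

```python
# Sketch: stash the wiki slug in a contextvar at request time, then read it
# inside @mcp.tool() handlers to build the Host header for API calls.
import contextvars

current_wiki_slug = contextvars.ContextVar("current_wiki_slug", default=None)

def slug_from_host(host):
    # "dev.robot.wtf" -> "dev"
    return host.split(".")[0]

def on_request(host):
    # called where the incoming request's Host header is visible
    current_wiki_slug.set(slug_from_host(host))

def host_header_for_api_call():
    # called from inside a tool handler; None means no wiki context
    slug = current_wiki_slug.get()
    return f"{slug}.robot.wtf" if slug else None
```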
| /dev/null .. Tasks/Post_Launch.md | |
| @@ -0,0 +1,44 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, milestones, launch] |
| + | last_updated: 2026-03-15 |
| + | confidence: medium |
| + | --- |
| + | |
| + | # Post-Launch Milestone |
| + | |
| + | Improvements after robot.wtf is live with real users. None of these block launch. |
| + | |
| + | ## Features |
| + | |
| + | ### Account deletion (V7-8) |
| + | User can delete their account from the management UI. Deletes wiki (git repo, FAISS index), SQLite records, ACL grants. Requires typing username to confirm. |
| + | |
| + | ### Git clone auth |
| + | Read-only git clone works (V4-6) but has no auth — anyone can clone any wiki. Should require bearer token or platform JWT for private wikis. |
| + | |
| + | ### Multiple wikis per user |
| + | Currently limited to 1 wiki per user. The data model supports multiple wikis. Needs UI for wiki selection and limit increase. |
| + | |
| + | ### Bluesky DM alerts |
| + | Translate health check / disk monitoring alerts into Bluesky DMs via ATProto API. Small webhook relay. |
| + | |
| + | ### Wiki import |
| + | Import existing git repos (from GitHub, local, etc.) as a new wiki. Upload or provide a git URL. |
| + | |
| + | ### Attachment size limits |
| + | Per-file and per-wiki attachment size enforcement. Currently no limits on uploaded files. |
| + | |
| + | ## Infrastructure |
| + | |
| + | ### Proxmox CPU type |
| + | Change from kvm64 to host to enable numpy 2.4+ and remove the pin. Requires VM reboot. See [[Dev/Proxmox_CPU_Type]]. |
| + | |
| + | ### Semantic search worker process |
| + | Move embedding from in-process daemon thread to a separate worker process. Queue-based (reindex_queue table). Eliminates FAISS corruption risk on restart. See [[Tasks/Semantic_Search_Architecture]]. |
| + | |
| + | ### CI/CD |
| + | Currently deploy is `git push` + `ansible-playbook`. Add GitHub Actions for tests on PR, auto-deploy on merge to main. |
| + | |
| + | ### Monitoring dashboard |
| + | Grafana or similar for service metrics. Currently health checks are binary (up/down) with no latency or throughput visibility. |
| /dev/null .. Tasks/Pre_Launch.md | |
| @@ -0,0 +1,53 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, milestones, launch] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Pre-Launch Milestone |
| + | |
| + | Work required before opening robot.wtf to the ATProto community. Everything here is either a bug, a missing feature that blocks usability, or a safety issue. |
| + | |
| + | ## Blocking |
| + | |
| + | ### MCP wiki routing (bug) |
| + | MCP sidecar calls the REST API without forwarding the wiki context. All reads/writes go to the default wiki. Fix in progress — see [[Tasks/MCP_Wiki_Routing]]. |
| + | |
| + | ### Multi-tenant semantic search |
| + | Sync thread only indexes one wiki. Each wiki needs its own FAISS index directory and sync state. See [[Tasks/Semantic_Search_Architecture]] and [[Tasks/Semantic_Search_Multi_Tenant]]. |
| + | |
| + | ### Disk usage cap |
| + | No per-wiki disk space limit. A user could fill the VPS. See [[Tasks/Disk_Usage_Cap]]. 50MB per wiki proposed. |
| + | |
| + | ### Management UI usability |
| + | The dashboard works but needs UX iteration: |
| + | - Wiki creation flow should default slug to username (or let user pick wiki domain at signup) |
| + | - MCP connection instructions need to be clearer |
| + | - Settings page layout needs work |
| + | - Consider re-enabling Otterwiki's content/editing admin settings (currently hidden by PLATFORM_MODE) |
| + | |
| + | ### Landing page copy |
| + | Draft is live at robot.wtf/. Needs tone and content review. Reframe FAQs for ATProto audience. Add actual screenshots once the UI is polished. |
| + | |
| + | ## Safety |
| + | |
| + | ### Backup verification |
| + | Backup cron is running (V7-4) but needs a test restore to verify it works. |
| + | |
| + | ### SMTP alerts test |
| + | Health check and disk monitoring alerts are configured but haven't been tested end-to-end. Send a test alert to verify Gmail relay works. |
| + | |
| + | ### Rate limiting |
| + | No rate limiting on any endpoint. Caddy can add this, but nothing is configured. Not critical for soft launch with a small community, but needed before wider announcement. |
| + | |
| + | ## Not blocking but important |
| + | |
| + | ### FAISS index corruption risk |
| + | The sync thread is a daemon thread killed without cleanup on SIGTERM. Mid-write kills could corrupt the index. Recoverable (full reindex) but slow. See [[Tasks/Semantic_Search_Architecture]]. |
| + | |
| + | ### OAuth token refresh |
| + | Claude.ai token refresh hasn't been tested. If tokens expire after 1 hour and refresh fails, users lose their MCP connection. |
| + | |
| + | ### Stale pages in wrong wiki |
| + | Three Tasks/ pages were written to the _default wiki by the MCP routing bug. Need to be moved to the dev wiki after the routing fix. |
| /dev/null .. Tasks/Semantic_Search_Architecture.md | |
| @@ -0,0 +1,32 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, semantic-search, architecture] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Semantic Search Architecture Issues |
| + | |
| + | ## Current state |
| + | FAISS + ONNX MiniLM embedding, running in-process in the gunicorn worker. Works for single-tenant. 65 pages indexed for dev wiki. |
| + | |
| + | ## Issues to address |
| + | |
| + | ### 1. Multi-tenant indexing (blocking) |
| + | The sync thread watches one wiki (whichever storage was set at startup). TenantResolver swaps storage per-request, but the sync thread holds the original reference. Each wiki needs its own FAISS index directory and its own sync state. The reindex_all function also wipes and rebuilds the entire shared index. |
| + | |
| + | **Needed:** Per-wiki FAISS directories (`/srv/data/faiss/{slug}/`), per-wiki sync state, sync thread that iterates over all wikis or per-wiki threads. |
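The layout could look like this; the paths follow the proposal above, and the function names are illustrative:

```python
# Sketch of per-wiki index layout: one FAISS directory and one sync-state
# file per wiki, plus a sync pass that iterates over all wikis.
from pathlib import Path

FAISS_ROOT = Path("/srv/data/faiss")

def index_dir(slug):
    return FAISS_ROOT / slug

def sync_state_path(slug):
    return index_dir(slug) / "sync_state.json"

def sync_pass(slugs):
    # one iteration of a single sync thread covering every wiki
    for slug in slugs:
        index_dir(slug).mkdir(parents=True, exist_ok=True)
        # ...compare git HEAD against sync_state_path(slug), embed changed
        # pages, and write back to this wiki's index only...
```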
| + | |
| + | ### 2. In-process embedding risks |
| + | The ONNX model (~80MB) loads in the gunicorn worker. The sync thread is a daemon thread — killed without cleanup on SIGTERM. If killed mid-write to the FAISS index, the index could corrupt (recovered by full reindex on next start, but that's slow). |
| + | |
| + | **Options:** |
| + | - Separate embedding worker process (like ChromaDB was, but lighter) |
| + | - Queue-based: page saves write to a queue (SQLite reindex_queue table already in schema), worker process reads and embeds |
| + | - Graceful shutdown handler in sync thread |
| + | |
| + | ### 3. Sync frequency |
| + | Currently every 60 seconds by polling git HEAD SHA. For a multi-tenant setup with many wikis, polling every wiki every 60 seconds doesn't scale. A queue (reindex_queue table triggered by page_saved hook) would be more efficient. |
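A sketch of the queue shape, assuming the `reindex_queue` table already in the schema; the column names here are guesses:

```python
# Sketch: the page_saved hook enqueues; a worker process drains and embeds.
import sqlite3

def enqueue(conn, slug, pagepath):
    # called from the page_saved hook instead of polling git HEAD
    conn.execute("INSERT INTO reindex_queue (wiki_slug, pagepath) VALUES (?, ?)",
                 (slug, pagepath))

def drain(conn):
    # worker loop body: embed each queued page, then delete its row
    rows = conn.execute(
        "SELECT id, wiki_slug, pagepath FROM reindex_queue").fetchall()
    for row_id, slug, path in rows:
        # ...embed `path` into the index for `slug` here...
        conn.execute("DELETE FROM reindex_queue WHERE id = ?", (row_id,))
    return rows
```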
| + | |
| + | ## Not blocking launch |
| + | Semantic search works for the dev wiki. Multi-tenant indexing is needed before opening to users with multiple wikis. The in-process risks and sync frequency are optimization concerns for later. |
| /dev/null .. Tasks/Semantic_Search_Multi_Tenant.md | |
| @@ -0,0 +1,63 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, semantic-search, chromadb, multi-tenant] |
| + | last_updated: 2026-03-15 |
| + | confidence: high |
| + | --- |
| + | |
| + | # Semantic Search Multi-Tenant Fix |
| + | |
| + | ## Problem |
| + | |
| + | The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS: |
| + | |
| + | 1. **One shared ChromaDB collection** (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide. |
| + | |
| + | 2. **One sync thread** started at boot, tied to the default wiki's storage object. The `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis. |
| + | |
| + | 3. **`reindex_all` wipes everything** — it calls `backend.reset()` which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index. |
| + | |
| + | 4. **No auto-index for new wikis** — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages. |
| + | |
| + | 5. **Embedding model download** — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents. |
| + | |
| + | ## Immediate status |
| + | |
| + | - ChromaDB server is running on port 8004 |
| + | - numpy is importable (pinned <2.4.0) |
| + | - The plugin initializes and connects to ChromaDB |
| + | - But zero documents are indexed for the dev wiki |
| + | - Running `reindex_all` manually via Python reports "complete", but the collection count stays 0 |
| + | |
| + | ## Options |
| + | |
| + | ### Option A: Fix ChromaDB multi-tenant (more work) |
| + | - Per-wiki collections: `otterwiki_pages_{slug}` |
| + | - Per-wiki sync state: `chroma_sync_state_{slug}.json` |
| + | - `reindex_all` scoped to one collection |
| + | - Sync thread needs per-wiki awareness or one thread per wiki |
| + | - Changes to: otterwiki-semantic-search plugin |
| + | |
| + | ### Option B: Switch back to FAISS (different tradeoffs) |
| + | - FAISS indexes are per-directory — natural per-wiki isolation |
| + | - Local MiniLM embedding (no model download issue — bundled) |
| + | - The wikibot.io Lambda deployment already used FAISS + MiniLM |
| + | - The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`) |
| + | - But FAISS needs explicit index management (build, save, load) |
| + | - And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk |
| + | |
| + | ### Option C: Hybrid — ChromaDB with explicit embedding function |
| + | - Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding |
| + | - Solves the model download issue |
| + | - Still needs multi-tenant collection fix |
| + | |
| + | ## Decision needed |
| + | |
| + | Which approach to take depends on: |
| + | - Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first) |
| + | - Is per-wiki FAISS simpler than per-wiki ChromaDB collections? |
| + | - Do we want to maintain two backends or pick one? |
| + | |
| + | ## Related |
| + | - [[Dev/Proxmox_CPU_Type]] — numpy X86_V2 issue (workaround in place) |
| + | - [[Design/Async_Embedding_Pipeline]] — original FAISS + MiniLM design (AWS, archived) |