---
category: reference
tags: [tasks, semantic-search, chromadb, multi-tenant]
last_updated: 2026-03-15
confidence: high
---

# Semantic Search Multi-Tenant Fix

## Problem

The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:

1. **One shared ChromaDB collection** (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix, so two wikis with a page named "Home" would collide.
2. **One sync thread** started at boot, tied to the default wiki's storage object. `TenantResolver._swap_storage()` patches `_state["storage"]` per request, but the sync thread holds a reference to the original storage and never sees other wikis.
3. **`reindex_all` wipes everything.** It calls `backend.reset()`, which drops and recreates the entire shared collection, so reindexing one wiki destroys the other's index.
4. **No auto-index for new wikis.** Nothing triggers indexing when a wiki is first accessed. The `page_saved` hook catches future saves but won't back-fill existing pages.
5. **Embedding model download.** ChromaDB's default ONNX MiniLM embedding function downloads the model on first use, so the `robot` service user needs a writable cache directory (`HOME=/srv`). Even with that in place, embedding silently fails and produces zero indexed documents.
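The "Home" collision in point 1 goes away once collection names carry the wiki slug (Option A's `otterwiki_pages_{slug}` scheme). A minimal sketch of that derivation — the helper names are hypothetical, not part of the plugin, and the normalization assumes ChromaDB's collection-name constraints (roughly: 3-63 characters, alphanumeric at both ends, a limited character set):

```python
import re

def wiki_collection_name(slug: str, base: str = "otterwiki_pages") -> str:
    """Per-wiki ChromaDB collection name (hypothetical helper).

    ChromaDB constrains collection names, so the slug is normalized to a
    safe lowercase character set and the result is length-capped.
    """
    safe = re.sub(r"[^a-z0-9_-]", "-", slug.lower()).strip("-_") or "default"
    return f"{base}_{safe}"[:63].rstrip("-_")

def sync_state_path(slug: str) -> str:
    """Matching per-wiki sync-state file, e.g. chroma_sync_state_dev.json."""
    safe = re.sub(r"[^a-z0-9_-]", "-", slug.lower()) or "default"
    return f"chroma_sync_state_{safe}.json"
```

With this, a `reindex_all` scoped to `wiki_collection_name(slug)` can drop and rebuild one wiki's collection without touching the others.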
## Immediate status

- ChromaDB server is running on port 8004
- numpy is importable (pinned <2.4.0)
- The plugin initializes and connects to ChromaDB
- But zero documents are indexed for the dev wiki
- A manual `reindex_all` via Python reports "complete", but the collection count stays 0

## Options

### Option A: Fix ChromaDB multi-tenant (more work)

- Per-wiki collections: `otterwiki_pages_{slug}`
- Per-wiki sync state: `chroma_sync_state_{slug}.json`
- `reindex_all` scoped to one collection
- Sync thread needs per-wiki awareness, or one thread per wiki
- Changes to: otterwiki-semantic-search plugin

### Option B: Switch back to FAISS (different tradeoffs)

- FAISS indexes are per-directory — natural per-wiki isolation
- Local MiniLM embedding (no model download issue — bundled)
- The wikibot.io Lambda deployment already used FAISS + MiniLM
- The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`)
- But FAISS needs explicit index management (build, save, load)
- And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk

### Option C: Hybrid — ChromaDB with explicit embedding function

- Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
- Solves the model download issue
- Still needs the multi-tenant collection fix

## Decision needed

Which approach to take. The answer depends on:

- Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first)
- Is per-wiki FAISS simpler than per-wiki ChromaDB collections?
- Do we want to maintain two backends, or pick one?

## Related

- [[Dev/Proxmox_CPU_Type]] — numpy X86_V2 issue (workaround in place)
- [[Design/Async_Embedding_Pipeline]] — original FAISS + MiniLM design (AWS, archived)