Properties
category: reference
tags: [tasks, semantic-search, chromadb, multi-tenant]
last_updated: 2026-03-15
confidence: high
Semantic Search Multi-Tenant Fix
Problem
The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:
- One shared ChromaDB collection (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide.
- One sync thread started at boot, tied to the default wiki's storage object. `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis.
- `reindex_all` wipes everything — it calls `backend.reset()`, which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index.
- No auto-index for new wikis — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages.
- Embedding model download — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents.
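The collision in the first bullet is easy to demonstrate with the document ids the plugin would write. The helper names here (`doc_id`, `scoped_doc_id`) are illustrative, not the plugin's actual API:

```python
def doc_id(path: str) -> str:
    # current behavior: the id is just the page path, shared across all wikis
    return path

def scoped_doc_id(slug: str, path: str) -> str:
    # proposed fix: prefix the wiki slug so ids are unique per tenant
    return f"{slug}/{path}"

# Two wikis, each with a page named "Home":
assert doc_id("Home") == doc_id("Home")                           # collides
assert scoped_doc_id("dev", "Home") != scoped_doc_id("prod", "Home")  # isolated
```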
Immediate status
- ChromaDB server is running on port 8004
- `numpy` is importable (pinned `<2.4.0`)
- The plugin initializes and connects to ChromaDB
- But zero documents are indexed for the dev wiki
- Attempted manual `reindex_all` via Python — says "complete" but the collection count stays 0
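A quick way to confirm the zero count from outside the plugin. This is a sketch assuming the ChromaDB server on port 8004 and the shared collection name from the problem list; `indexed_count` is a hypothetical helper, not part of the plugin:

```python
def indexed_count(host="localhost", port=8004, name="otterwiki_pages"):
    """Count documents in the shared collection; returns 0 if it is missing."""
    import chromadb  # lazy import so the snippet loads even without chromadb installed
    client = chromadb.HttpClient(host=host, port=port)
    try:
        return client.get_collection(name).count()
    except Exception:
        return 0
```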
Options
Option A: Fix ChromaDB multi-tenant (more work)
- Per-wiki collections: `otterwiki_pages_{slug}`
- Per-wiki sync state: `chroma_sync_state_{slug}.json`
- `reindex_all` scoped to one collection
- Sync thread needs per-wiki awareness, or one thread per wiki
- Changes to: otterwiki-semantic-search plugin
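A sketch of the scoped naming and per-wiki reindex under Option A. The helper names (`collection_name`, `reindex_wiki`) and the page-dict shape are assumptions; `delete_collection`/`create_collection` are standard chromadb client calls:

```python
def collection_name(slug: str) -> str:
    return f"otterwiki_pages_{slug}"

def sync_state_path(slug: str) -> str:
    return f"chroma_sync_state_{slug}.json"

def reindex_wiki(client, slug: str, pages) -> None:
    """Rebuild only this wiki's collection — no backend.reset(),
    so other wikis' collections survive."""
    name = collection_name(slug)
    try:
        client.delete_collection(name)
    except Exception:
        pass  # first index for this wiki: nothing to drop
    col = client.create_collection(name)
    col.add(
        ids=[p["path"] for p in pages],      # paths are unique within one wiki
        documents=[p["text"] for p in pages],
    )
```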
Option B: Switch back to FAISS (different tradeoffs)
- FAISS indexes are per-directory — natural per-wiki isolation
- Local MiniLM embedding (no model download issue — bundled)
- The wikibot.io Lambda deployment already used FAISS + MiniLM
- The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`)
- But FAISS needs explicit index management (build, save, load)
- And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk
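What the explicit index management in Option B looks like, as a sketch. The path layout and function names are assumptions; `faiss` and `sentence-transformers` are imported lazily so the module loads even when the backend is disabled, and the MiniLM model is assumed to already be on disk:

```python
import os

def index_path(wiki_data_dir: str, slug: str) -> str:
    # the index lives inside each wiki's own directory -> per-wiki isolation for free
    return os.path.join(wiki_data_dir, slug, "semantic.faiss")

def build_and_save(texts, path, model_name="all-MiniLM-L6-v2"):
    """Embed locally with MiniLM and persist a flat inner-product index."""
    # lazy imports: heavy deps load only when the FAISS backend is active
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)   # local model, no download at query time
    vecs = model.encode(texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])  # cosine similarity via normalized IP
    index.add(np.asarray(vecs, dtype="float32"))
    faiss.write_index(index, path)
    return index
```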
Option C: Hybrid — ChromaDB with explicit embedding function
- Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
- Solves the model download issue
- Still needs multi-tenant collection fix
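Option C in code: a local embedding callable handed to ChromaDB so its default ONNX downloader is never invoked. A minimal sketch; the class name is an assumption, and newer chromadb versions expect an embedding function whose `__call__` takes an `input` argument:

```python
class LocalMiniLM:
    """Sketch of a chromadb-compatible embedding function backed by a
    locally stored sentence-transformers MiniLM model."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model_name = model_name
        self._model = None  # loaded lazily on first call

    def __call__(self, input):
        if self._model is None:
            # local model load; avoids ChromaDB's default ONNX download path
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(self.model_name)
        return self._model.encode(list(input), normalize_embeddings=True).tolist()

# usage (assumed): client.get_or_create_collection(
#     "otterwiki_pages_dev", embedding_function=LocalMiniLM())
```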
Decision needed
Which approach to take. The answer depends on:
- Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first)
- Is per-wiki FAISS simpler than per-wiki ChromaDB collections?
- Do we want to maintain two backends or pick one?
Related
- Dev/Proxmox_CPU_Type — numpy X86_V2 issue (workaround in place)
- Design/Async_Embedding_Pipeline — original FAISS + MiniLM design (AWS, archived)