Properties
category: reference
tags: [tasks, semantic-search, chromadb, multi-tenant]
last_updated: 2026-03-15
confidence: high

Semantic Search Multi-Tenant Fix

Problem

The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:

  1. One shared ChromaDB collection (otterwiki_pages) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide.

  2. One sync thread started at boot, tied to the default wiki's storage object. TenantResolver._swap_storage() patches _state["storage"] per request, but the sync thread holds a reference to the original storage and never sees other wikis.

  3. reindex_all wipes everything — it calls backend.reset() which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index.

  4. No auto-index for new wikis — there's no trigger when a wiki is first accessed. The page_saved hook catches future saves, but won't back-fill existing pages.

  5. Embedding model download — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The robot service user needs a writable cache directory (HOME=/srv). Even with this, the embedding silently fails and produces zero indexed documents.

Immediate status

  • ChromaDB server is running on port 8004
  • numpy is importable (pinned <2.4.0)
  • The plugin initializes and connects to ChromaDB
  • But zero documents are indexed for the dev wiki
  • Attempted manual reindex_all via Python — says "complete" but collection count stays 0
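Rather than trusting the reindex's "complete" message, the collection count can be checked directly against the server. A minimal probe, assuming the server on localhost:8004 and the shared collection name otterwiki_pages from above:

```python
def diagnose(count: int) -> str:
    """Pure helper: turn a raw document count into a one-line status."""
    if count > 0:
        return f"ok: {count} documents indexed"
    return "empty: reindex reported success but nothing was stored"

def collection_count(host: str = "localhost", port: int = 8004,
                     name: str = "otterwiki_pages") -> int:
    """Ask the ChromaDB server how many documents the collection holds."""
    import chromadb  # third-party; imported lazily so diagnose() stands alone
    client = chromadb.HttpClient(host=host, port=port)
    return client.get_or_create_collection(name).count()

# Usage: print(diagnose(collection_count()))
```

If this also reports 0 right after a reindex, the documents are being dropped before they reach the server, which points at the embedding step.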

Options

Option A: Fix ChromaDB multi-tenant (more work)

  • Per-wiki collections: otterwiki_pages_{slug}
  • Per-wiki sync state: chroma_sync_state_{slug}.json
  • reindex_all scoped to one collection
  • Sync thread needs per-wiki awareness or one thread per wiki
  • Changes to: otterwiki-semantic-search plugin
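A rough shape for Option A. The helper names and the pages dict are assumptions for illustration, not the plugin's actual API; the point is that reset and reindex only ever touch one wiki's collection:

```python
def collection_name(slug: str) -> str:
    # One collection per wiki; assumes slug is already identifier-safe
    return f"otterwiki_pages_{slug}"

def sync_state_path(slug: str) -> str:
    # Matching per-wiki sync state file
    return f"chroma_sync_state_{slug}.json"

def reindex_wiki(client, slug: str, pages: dict) -> int:
    """Rebuild only this wiki's collection; other wikis are untouched."""
    name = collection_name(slug)
    try:
        # Scoped reset: drop just this collection, never backend.reset()
        client.delete_collection(name)
    except Exception:
        pass  # collection may not exist yet
    coll = client.create_collection(name)
    for path, text in pages.items():
        coll.add(ids=[path], documents=[text])
    return coll.count()
```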

Option B: Switch back to FAISS (different tradeoffs)

  • FAISS indexes are per-directory — natural per-wiki isolation
  • Local MiniLM embedding (no model download issue — bundled)
  • The wikibot.io Lambda deployment already used FAISS + MiniLM
  • The otterwiki-semantic-search plugin already has a FAISS backend (VECTOR_BACKEND=faiss)
  • But FAISS needs explicit index management (build, save, load)
  • And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk
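The explicit index management in Option B looks roughly like this. The on-disk layout, embed step via sentence-transformers, and function names are assumptions, not the plugin's FAISS backend:

```python
import os

def index_path(data_dir: str, slug: str) -> str:
    # Per-wiki index file under that wiki's data directory:
    # directory-per-wiki is what gives FAISS its natural isolation
    return os.path.join(data_dir, slug, "semantic.faiss")

def build_index(texts, model_name: str = "all-MiniLM-L6-v2"):
    """Build a flat inner-product index over normalized MiniLM embeddings."""
    import faiss  # third-party; lazy so index_path() stands alone
    from sentence_transformers import SentenceTransformer  # third-party
    model = SentenceTransformer(model_name)  # needs the model on disk
    vecs = model.encode(texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])  # cosine via normalized IP
    index.add(vecs)
    return index

def save_index(index, path: str) -> None:
    import faiss
    os.makedirs(os.path.dirname(path), exist_ok=True)
    faiss.write_index(index, path)
```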

Option C: Hybrid — ChromaDB with explicit embedding function

  • Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
  • Solves the model download issue
  • Still needs multi-tenant collection fix
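A sketch of Option C, combined with the per-wiki collection naming from Option A. SentenceTransformerEmbeddingFunction is chromadb's wrapper around sentence-transformers, which sidesteps the default ONNX download path; the slug handling is an assumption:

```python
def collection_for(slug: str) -> str:
    # Per-wiki collection name (assumes slug is identifier-safe)
    return f"otterwiki_pages_{slug}"

def make_collection(client, slug: str):
    """Per-wiki collection with an explicit local embedding function."""
    # Third-party import kept local; this replaces ChromaDB's default
    # ONNX MiniLM embedding with a sentence-transformers-backed one
    from chromadb.utils import embedding_functions
    ef = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2")
    return client.get_or_create_collection(
        name=collection_for(slug), embedding_function=ef)
```

The model still has to be loadable from disk by the robot service user, so the writable-cache requirement from the Problem section applies here too.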

Decision needed

Which approach to take depends on:

  • Is the ChromaDB embedding function the only reason reindex produces 0 results? (Debug this first)
  • Is per-wiki FAISS simpler than per-wiki ChromaDB collections?
  • Do we want to maintain two backends or pick one?
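For the first bullet, the default embedding function can be exercised directly: if the ONNX model download or cache path is the problem, calling it outside the sync thread should surface the exception instead of failing silently. A minimal probe, assuming chromadb's DefaultEmbeddingFunction (the ONNX MiniLM wrapper the plugin currently relies on):

```python
def report(ok: bool, detail: str) -> str:
    # Pure helper: one-line verdict for the embedding probe
    return ("embedding works: " if ok else "embedding broken: ") + detail

def probe_default_embedding() -> str:
    """Run the default embedder on one string and report success or the error."""
    try:
        from chromadb.utils.embedding_functions import DefaultEmbeddingFunction
        vecs = DefaultEmbeddingFunction()(["hello world"])  # triggers model load
        return report(True, f"{len(vecs[0])}-dim vector")
    except Exception as exc:  # surface whatever the sync thread swallows
        return report(False, repr(exc))
```

Run as the robot service user (with the same HOME=/srv environment) so the probe sees the same cache directory the broken sync does.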