The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:
+
All multi-tenant issues are resolved. The FAISS backend with `BackendRegistry` provides per-wiki isolation.
-
1. **One shared ChromaDB collection** (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide.
+
### Implementation
+
- **BackendRegistry** manages per-wiki FAISS backends with lazy initialization
+
- **Per-wiki FAISS indexes** at `/srv/data/faiss/{slug}/` — natural isolation, no slug-prefix hacking
+
- **Lifecycle hooks** (`page_saved`/`page_deleted`/`page_renamed`) via `HookListener` replace the single-tenant sync thread
+
- **`reindex_all` is per-wiki scoped** — reindexing one wiki does not affect others
+
- **Auto-index on first access** — existing pages are back-filled when a wiki is first accessed
+
- **ONNX MiniLM-L6-v2 embeddings** (bundled with ChromaDB), so there are no model download issues
+
- ChromaDB deprecated and disabled
-
2. **One sync thread** started at boot, tied to the default wiki's storage object. The `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis.
+
### Original problems (all resolved)
-
3. **`reindex_all` wipes everything** — it calls `backend.reset()` which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index.
-
-
4. **No auto-index for new wikis** — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages.
-
-
5. **Embedding model download** — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents.
-
-
## Immediate status
-
-
- ChromaDB server is running on port 8004
-
- numpy is importable (pinned <2.4.0)
-
- The plugin initializes and connects to ChromaDB
-
- But zero documents are indexed for the dev wiki
-
- Attempted manual `reindex_all` via Python — says "complete" but collection count stays 0
- Sync thread needs per-wiki awareness or one thread per wiki
-
- Changes to: otterwiki-semantic-search plugin
-
-
### Option B: Switch back to FAISS (different tradeoffs)
-
- FAISS indexes are per-directory — natural per-wiki isolation
-
- Local MiniLM embedding (no model download issue — bundled)
-
- The wikibot.io Lambda deployment already used FAISS + MiniLM
-
- The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`)
-
- But FAISS needs explicit index management (build, save, load)
-
- And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk
-
-
### Option C: Hybrid — ChromaDB with explicit embedding function
-
- Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
-
- Solves the model download issue
-
- Still needs multi-tenant collection fix
-
-
## ~~Decision needed~~ RESOLVED (2026-03-18)
-
-
**Decision: FAISS multi-tenant via `BackendRegistry`.** ChromaDB deprecated and disabled. Per-wiki FAISS indexes at `/srv/data/faiss/{slug}/`. Sync thread replaced by lifecycle hooks. All issues listed above are resolved.
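
The per-wiki isolation described in this decision can be illustrated with a minimal sketch. It is not the plugin's actual code: `InMemoryBackend` is a hypothetical stand-in for the real FAISS backend (no `faiss` import, nothing written to disk), and the method names and signatures on `BackendRegistry`, `HookListener`, and `reindex_all` are assumptions chosen for illustration. Only the `{base}/{slug}/` directory layout, the lazy initialization, and the hook names come from the notes above.

```python
import threading
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


class InMemoryBackend:
    """Hypothetical stand-in for a per-wiki FAISS backend."""

    def __init__(self, index_dir: Path) -> None:
        self.index_dir = index_dir          # e.g. /srv/data/faiss/{slug}
        self.docs: Dict[str, str] = {}      # page path -> page text

    def upsert(self, path: str, text: str) -> None:
        self.docs[path] = text

    def delete(self, path: str) -> None:
        self.docs.pop(path, None)

    def reset(self) -> None:
        self.docs.clear()


class BackendRegistry:
    """Creates one backend per wiki slug, lazily, on first request."""

    def __init__(self, base_dir: str,
                 factory: Callable[[Path], InMemoryBackend] = InMemoryBackend) -> None:
        self._base_dir = Path(base_dir)
        self._factory = factory
        self._backends: Dict[str, InMemoryBackend] = {}
        self._lock = threading.Lock()

    def get(self, slug: str) -> InMemoryBackend:
        # The lock keeps two concurrent requests from creating
        # two backends for the same wiki.
        with self._lock:
            backend = self._backends.get(slug)
            if backend is None:
                backend = self._factory(self._base_dir / slug)
                self._backends[slug] = backend
            return backend


class HookListener:
    """Routes page lifecycle events to the backend of the wiki they belong to."""

    def __init__(self, registry: BackendRegistry) -> None:
        self._registry = registry

    def page_saved(self, slug: str, path: str, text: str) -> None:
        self._registry.get(slug).upsert(path, text)

    def page_deleted(self, slug: str, path: str) -> None:
        self._registry.get(slug).delete(path)

    def page_renamed(self, slug: str, old_path: str, new_path: str, text: str) -> None:
        backend = self._registry.get(slug)
        backend.delete(old_path)
        backend.upsert(new_path, text)


def reindex_all(registry: BackendRegistry, slug: str,
                pages: Iterable[Tuple[str, str]]) -> int:
    """Rebuild the index of a single wiki; other wikis are untouched."""
    backend = registry.get(slug)
    backend.reset()
    for path, text in pages:
        backend.upsert(path, text)
    return len(backend.docs)


registry = BackendRegistry("/srv/data/faiss")
hooks = HookListener(registry)
hooks.page_saved("alpha", "Home", "Alpha home page")
hooks.page_saved("beta", "Home", "Beta home page")
reindex_all(registry, "alpha", [("Home", "Rebuilt alpha home")])
```

Because each slug maps to its own backend object and directory, two wikis can both have a page named "Home" without colliding, and resetting one wiki's index leaves the other's documents in place.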