commit 3d6378

2026-03-20 19:52:56 Claude (MCP): [mcp] Update Semantic Search Multi-Tenant to reflect completed implementation

`Tasks/Semantic_Search_Multi_Tenant.md`
@@ 1,60 1,34 @@
	---
	category: reference
-	tags: [tasks, semantic-search, chromadb, multi-tenant]
-	last_updated: 2026-03-15
+	tags: [tasks, semantic-search, faiss, multi-tenant]
+	last_updated: 2026-03-20
	confidence: high
	---

-	# Semantic Search Multi-Tenant Fix
+	# Semantic Search Multi-Tenant — IMPLEMENTED

-	## Problem
+	## Solution (2026-03-20)

-	The otterwiki-semantic-search plugin is single-tenant. On the robot.wtf VPS:
+	All multi-tenant issues resolved. FAISS backend with `BackendRegistry` provides per-wiki isolation.

-	1. One shared ChromaDB collection (`otterwiki_pages`) for all wikis. Page paths have no wiki slug prefix — two wikis with a page named "Home" would collide.
+	### Implementation
+	- BackendRegistry manages per-wiki FAISS backends with lazy initialization
+	- Per-wiki FAISS indexes at `/srv/data/faiss/{slug}/` — natural isolation, no slug-prefix hacking
+	- Lifecycle hooks (`page_saved`/`page_deleted`/`page_renamed`) via `HookListener` replace the single-tenant sync thread
+	- `reindex_all` is per-wiki scoped — reindexing one wiki does not affect others
+	- Auto-index on first access — existing pages are back-filled when a wiki is first accessed
+	- ONNX MiniLM-L6-v2 embeddings (ChromaDB bundled) — no model download issues
+	- ChromaDB deprecated and disabled

-	2. One sync thread started at boot, tied to the default wiki's storage object. The `TenantResolver._swap_storage()` patches `_state["storage"]` per-request, but the sync thread holds a reference to the original storage and never sees other wikis.
+	### Original problems (all resolved)

-	3. `reindex_all` wipes everything — it calls `backend.reset()` which drops and recreates the entire shared collection. Reindexing one wiki destroys the other's index.
-
-	4. No auto-index for new wikis — there's no trigger when a wiki is first accessed. The `page_saved` hook catches future saves, but won't back-fill existing pages.
-
-	5. Embedding model download — ChromaDB's default ONNX MiniLM embedding function needs to download the model on first use. The `robot` service user needs a writable cache directory (`HOME=/srv`). Even with this, the embedding silently fails and produces zero indexed documents.
-
-	## Immediate status
-
-	- ChromaDB server is running on port 8004
-	- numpy is importable (pinned <2.4.0)
-	- The plugin initializes and connects to ChromaDB
-	- But zero documents are indexed for the dev wiki
-	- Attempted manual `reindex_all` via Python — says "complete" but collection count stays 0
-
-	## Options
-
-	### Option A: Fix ChromaDB multi-tenant (more work)
-	- Per-wiki collections: `otterwiki_pages_{slug}`
-	- Per-wiki sync state: `chroma_sync_state_{slug}.json`
-	- `reindex_all` scoped to one collection
-	- Sync thread needs per-wiki awareness or one thread per wiki
-	- Changes to: otterwiki-semantic-search plugin
-
-	### Option B: Switch back to FAISS (different tradeoffs)
-	- FAISS indexes are per-directory — natural per-wiki isolation
-	- Local MiniLM embedding (no model download issue — bundled)
-	- The wikibot.io Lambda deployment already used FAISS + MiniLM
-	- The otterwiki-semantic-search plugin already has a FAISS backend (`VECTOR_BACKEND=faiss`)
-	- But FAISS needs explicit index management (build, save, load)
-	- And the Lambda deployment used Bedrock for embedding — local MiniLM needs the model on disk
-
-	### Option C: Hybrid — ChromaDB with explicit embedding function
-	- Use ChromaDB but provide our own embedding function (MiniLM loaded locally) instead of relying on ChromaDB's default ONNX embedding
-	- Solves the model download issue
-	- Still needs multi-tenant collection fix
-
-	## ~~Decision needed~~ RESOLVED (2026-03-18)
-
-	Decision: FAISS multi-tenant via `BackendRegistry`. ChromaDB deprecated and disabled. Per-wiki FAISS indexes at `/srv/data/faiss/{slug}/`. Sync thread replaced by lifecycle hooks. All issues listed above are resolved.
+	1. One shared ChromaDB collection — Replaced by per-wiki FAISS directories.
+	2. One sync thread tied to default wiki — Replaced by per-request lifecycle hooks.
+	3. `reindex_all` wipes everything — Now per-wiki scoped.
+	4. No auto-index for new wikis — Auto-reindex on first wiki access.
+	5. Embedding model download — ONNX model bundled, no download needed.

	## Related
+	- [[Tasks/Semantic_Search_Architecture]] — overall architecture
	- [[Dev/Proxmox_CPU_Type]] — numpy X86_V2 issue (workaround in place)
	- [[Design/Async_Embedding_Pipeline]] — original FAISS + MiniLM design (AWS, archived)
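
The per-wiki isolation described in the added "Implementation" list can be illustrated with a minimal sketch. This is not the plugin's code: only the `BackendRegistry` name, the `reindex_all` name, and the `/srv/data/faiss/{slug}/` layout come from the page above. The `WikiBackend` class, the `load_pages` callback, and all method signatures are hypothetical, and a plain dict stands in for a real FAISS index so the example runs without dependencies.

```python
from dataclasses import dataclass, field
from pathlib import Path
from threading import Lock
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class WikiBackend:
    """Hypothetical stand-in for one wiki's FAISS backend (a dict, not a real index)."""
    index_dir: Path
    docs: Dict[str, str] = field(default_factory=dict)

    def upsert(self, path: str, text: str) -> None:
        self.docs[path] = text

    def delete(self, path: str) -> None:
        self.docs.pop(path, None)

    def reset(self) -> None:
        self.docs.clear()


class BackendRegistry:
    """Sketch of a registry that creates one backend per wiki slug on first use."""

    def __init__(self, root: Path,
                 load_pages: Callable[[str], Iterable[Tuple[str, str]]]):
        self._root = Path(root)
        self._load_pages = load_pages  # assumed callback: slug -> (path, text) pairs
        self._backends: Dict[str, WikiBackend] = {}
        self._lock = Lock()

    def get(self, slug: str) -> WikiBackend:
        with self._lock:
            backend = self._backends.get(slug)
            if backend is None:
                # Lazy initialization: each wiki gets its own directory under root.
                backend = WikiBackend(self._root / slug)
                # Auto-index on first access: back-fill the wiki's existing pages.
                for path, text in self._load_pages(slug):
                    backend.upsert(path, text)
                self._backends[slug] = backend
            return backend

    def reindex_all(self, slug: str) -> None:
        # Scoped to one wiki: only this slug's backend is reset and rebuilt.
        backend = self.get(slug)
        backend.reset()
        for path, text in self._load_pages(slug):
            backend.upsert(path, text)


# Two wikis that both contain a page named "Home".
pages = {"alpha": [("Home", "alpha home")], "beta": [("Home", "beta home")]}
registry = BackendRegistry(Path("/srv/data/faiss"), lambda slug: pages.get(slug, []))
registry.get("alpha").upsert("Notes", "draft")  # e.g. triggered by a page_saved hook
registry.reindex_all("beta")                    # leaves the "alpha" backend untouched
```

Because each slug maps to its own backend object and directory, two wikis with a page named "Home" do not collide, and rebuilding one wiki's index leaves the others as they were.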
