FAISS + ONNX MiniLM embedding, running in-process in the gunicorn worker. **Multi-tenant via `BackendRegistry`** — per-wiki FAISS indexes at `/srv/data/faiss/{slug}/`. ChromaDB deprecated and disabled. Sync thread replaced by lifecycle hooks (`page_saved`/`page_deleted`/`page_renamed`). `reindex_all` per-wiki scoped. Auto-reindex on first wiki access.

## Implementation (2026-03-20)

Semantic search is fully implemented and operational. Issues 1–3 below are resolved; issue 4 is deferred as a later optimization.

### Architecture

- **FAISS backend** with `IndexFlatIP`, per-wiki indexes under `/srv/data/faiss/{slug}/`, managed by `BackendRegistry` (see the sketch after this list)
- **ONNX MiniLM model** (~80MB), loaded in-process in the gunicorn worker
- **Lifecycle hooks** (`page_saved`/`page_deleted`/`page_renamed`) update the index synchronously; no background sync thread
- **Auto-reindex** on first wiki access; `reindex_all` scoped to a single wiki
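
For concreteness, here is a minimal sketch of the per-wiki registry pattern. `BackendRegistry`, `IndexFlatIP`, and the `/srv/data/faiss/{slug}/` layout are from this doc; the class internals, method names, and the embedding dimension are assumptions, not the actual implementation.

```python
import os
import threading

import faiss  # faiss-cpu

EMBED_DIM = 384        # assumption: MiniLM-L6-class models emit 384-dim vectors
FAISS_ROOT = "/srv/data/faiss"


class FaissBackend:
    """One FAISS index per wiki, persisted under /srv/data/faiss/{slug}/."""

    def __init__(self, directory: str):
        self.index_path = os.path.join(directory, "index.faiss")
        os.makedirs(directory, exist_ok=True)
        if os.path.exists(self.index_path):
            self.index = faiss.read_index(self.index_path)
        else:
            # Inner product over L2-normalized vectors == cosine similarity.
            self.index = faiss.IndexFlatIP(EMBED_DIM)

    def save(self) -> None:
        faiss.write_index(self.index, self.index_path)


class BackendRegistry:
    """Lazily creates and caches one backend per wiki slug."""

    def __init__(self):
        self._backends: dict[str, FaissBackend] = {}
        self._lock = threading.Lock()

    def get(self, slug: str) -> FaissBackend:
        with self._lock:
            if slug not in self._backends:
                self._backends[slug] = FaissBackend(os.path.join(FAISS_ROOT, slug))
            return self._backends[slug]
```

A request handler resolves the wiki slug and calls `registry.get(slug)`, so each request only ever touches its own wiki's index; this is also what makes per-wiki `reindex_all` scoping and auto-reindex on first access natural.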

### MCP integration

- `semantic_search` MCP tool calls the REST API (see the sketch after this list)
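
A minimal sketch of how such a tool could forward to the REST API, using the official MCP Python SDK. The endpoint route, parameter names, and base URL are assumptions, not the actual API surface.

```python
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("wiki-semantic-search")
BASE_URL = "http://localhost:8000"  # assumption: local wiki API


@mcp.tool()
def semantic_search(wiki: str, query: str, limit: int = 10) -> list[dict]:
    """Search a wiki's pages by semantic similarity."""
    # Hypothetical route; the real REST endpoint may differ.
    resp = httpx.get(
        f"{BASE_URL}/api/{wiki}/search/semantic",
        params={"q": query, "limit": limit},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    mcp.run()  # stdio transport by default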

### Tests

- Tests exist and pass

## Resolved issues

### 1. Multi-tenant indexing — RESOLVED

Previously the sync thread watched a single wiki (whichever storage was set at startup): `TenantResolver` swapped storage per-request, but the sync thread held the original reference, and `reindex_all` wiped and rebuilt the entire shared index. Now each wiki has its own FAISS directory (`/srv/data/faiss/{slug}/`) and its own sync state, managed by `BackendRegistry`, and `reindex_all` rebuilds only the targeted wiki (sketch below).
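
A sketch of the per-wiki scoping, with the storage and backend dependencies injected. Everything here except the `reindex_all` name and the one-wiki-only behavior is an assumption.

```python
from typing import Callable, Iterable, Protocol


class VectorBackend(Protocol):
    """Assumed per-wiki backend surface; real method names may differ."""
    def clear(self) -> None: ...
    def upsert(self, path: str, text: str) -> None: ...
    def save(self) -> None: ...


def reindex_all(
    slug: str,
    backend_for: Callable[[str], VectorBackend],             # e.g. registry.get
    list_pages: Callable[[str], Iterable[tuple[str, str]]],  # (path, text) pairs
) -> None:
    """Rebuild exactly one wiki's index; all other wikis stay untouched."""
    backend = backend_for(slug)
    backend.clear()                 # wipe only this wiki's index
    for path, text in list_pages(slug):
        backend.upsert(path, text)  # chunk, embed, and add
    backend.save()
```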

### 2. In-process embedding risks — RESOLVED

The ONNX model (~80MB) loads in the gunicorn worker, and the old sync thread was a daemon thread, killed without cleanup on SIGTERM; a kill mid-write could corrupt the FAISS index (recoverable by a full reindex on the next start, but slow). The options considered were a separate embedding worker process (like ChromaDB was, but lighter), a queue-based design (page saves writing to the SQLite `reindex_queue` table already in the schema, a worker reading and embedding), and a graceful shutdown handler in the sync thread. Resolved instead by removing the daemon thread entirely: synchronous lifecycle hooks do the index writes, so there is no background writer for SIGTERM to interrupt (sketch below).
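
A sketch of the hook wiring, reusing the hypothetical registry from the architecture sketch above. The hook names are from this doc; the handler signatures and backend methods (`upsert`, `delete_page`, `save`) are assumptions.

```python
def page_saved(slug: str, path: str, text: str) -> None:
    backend = registry.get(slug)   # per-wiki backend from BackendRegistry
    backend.delete_page(path)      # drop any stale chunks for this page
    backend.upsert(path, text)     # chunk, embed, and add synchronously
    backend.save()                 # persist index + sidecar before returning


def page_deleted(slug: str, path: str) -> None:
    backend = registry.get(slug)
    backend.delete_page(path)
    backend.save()


def page_renamed(slug: str, old_path: str, new_path: str, text: str) -> None:
    page_deleted(slug, old_path)
    page_saved(slug, new_path, text)
```

Because the writes happen inside the request, gunicorn's graceful shutdown gives an in-flight save the chance to finish its index write before the worker exits; there is no detached thread to kill mid-write.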

### 3. Sync frequency — RESOLVED

The sync thread polled every wiki's git HEAD SHA every 60 seconds, which does not scale to many wikis. Hook-based updates are immediate on page save/delete/rename; there is no polling at all.

### 4. FAISS sidecar scalability — DEFERRED

Not blocking. The FAISS backend stores all chunk metadata in a JSON sidecar (`embeddings.json`) alongside the binary index; the sidecar is loaded fully into memory on startup and re-serialized on every upsert/delete. The Semantic Search V2 metadata fields (`section`, `section_path`, `page_word_count`, `total_chunks`) add ~160 bytes per chunk, roughly doubling the sidecar size (~140 → ~300 bytes/chunk). Current corpora are well within the estimated ~10K-chunk (~3MB sidecar) threshold at which sidecar I/O is expected to become a bottleneck. If corpus size grows significantly, revisit:

- For multi-tenant deployments where every wiki loads its own sidecar at startup, what is the aggregate memory and startup-time cost?
- Should chunk text be stored in the sidecar at all? It duplicates the embedded data; dropping it would cut sidecar size significantly.
- Alternative: move chunk metadata to SQLite (already present for the `reindex_queue` table) for indexed access instead of full-file load/save (see the sketch after this list).
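
If this is revisited, the SQLite direction might look something like the following. The `chunk_meta` table, its layout, and the helper names are hypothetical; only the V2 field names and the existing `reindex_queue` table come from this doc.

```python
import sqlite3

# Hypothetical per-wiki metadata store (e.g. /srv/data/faiss/{slug}/meta.db).
SCHEMA = """
CREATE TABLE IF NOT EXISTS chunk_meta (
    faiss_id        INTEGER PRIMARY KEY,   -- row position in the FAISS index
    page_path       TEXT NOT NULL,
    section         TEXT,
    section_path    TEXT,
    page_word_count INTEGER,
    total_chunks    INTEGER
);
CREATE INDEX IF NOT EXISTS idx_chunk_meta_page ON chunk_meta (page_path);
"""


def open_meta(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def metadata_for_hits(conn: sqlite3.Connection, faiss_ids: list[int]) -> list[tuple]:
    """Fetch metadata for search hits only, instead of loading a full sidecar."""
    marks = ",".join("?" * len(faiss_ids))
    return conn.execute(
        f"SELECT * FROM chunk_meta WHERE faiss_id IN ({marks})",
        faiss_ids,
    ).fetchall()
```

Reads become indexed lookups on just the search hits, and an upsert touches only the affected page's rows instead of re-serializing the whole `embeddings.json`.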