Properties

category: reference
tags: [tasks, semantic-search, architecture]
last_updated: 2026-03-15
confidence: high

Semantic Search Architecture Issues

Current state

FAISS index with ONNX MiniLM embeddings, running in-process in the gunicorn worker. This works for single-tenant use. 65 pages are indexed for the dev wiki.

Issues to address

1. Multi-tenant indexing (blocking)

The sync thread watches one wiki (whichever storage was set at startup). TenantResolver swaps storage per-request, but the sync thread holds the original reference, so pages saved in any other wiki are never picked up. Each wiki needs its own FAISS index directory and its own sync state. The reindex_all function also wipes and rebuilds the entire shared index, so it cannot be run for one wiki without affecting the others.

Needed: Per-wiki FAISS directories (/srv/data/faiss/{slug}/), per-wiki sync state, sync thread that iterates over all wikis or per-wiki threads.
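
A minimal sketch of the per-wiki layout described above. Only the /srv/data/faiss/{slug}/ path comes from this note. WikiSyncState, SyncRegistry, index_dir_for, and the slug format are hypothetical names and assumptions for illustration, not existing code.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

FAISS_ROOT = Path("/srv/data/faiss")             # root directory from the note above
_SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]*")     # assumed slug format


def index_dir_for(slug: str, root: Path = FAISS_ROOT) -> Path:
    """Return the FAISS directory for one wiki, rejecting unsafe slugs."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid wiki slug: {slug!r}")
    return root / slug


@dataclass
class WikiSyncState:
    """Per-wiki sync bookkeeping (hypothetical shape)."""
    slug: str
    index_dir: Path
    last_indexed_sha: Optional[str] = None


class SyncRegistry:
    """Holds one WikiSyncState per wiki so a single loop can visit them all."""

    def __init__(self, root: Path = FAISS_ROOT) -> None:
        self._root = root
        self._states: dict[str, WikiSyncState] = {}

    def state_for(self, slug: str) -> WikiSyncState:
        if slug not in self._states:
            self._states[slug] = WikiSyncState(slug, index_dir_for(slug, self._root))
        return self._states[slug]

    def wikis_needing_sync(self, head_shas: dict[str, str]) -> list[str]:
        """Return slugs whose current HEAD SHA differs from the last indexed one."""
        return [
            slug
            for slug, sha in sorted(head_shas.items())
            if self.state_for(slug).last_indexed_sha != sha
        ]
```

With this shape, either a single sync thread can loop over wikis_needing_sync() or each wiki can get its own thread holding its own WikiSyncState. The sketch does not decide between the two.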

2. In-process embedding risks

The ONNX model (~80MB) loads in the gunicorn worker. The sync thread is a daemon thread, so it is killed without cleanup on SIGTERM. If it is killed mid-write to the FAISS index, the index could be corrupted (a full reindex on the next start recovers it, but that is slow).

Options:

Separate embedding worker process (like ChromaDB was, but lighter)
Queue-based: page saves write to a queue (SQLite reindex_queue table already in schema), worker process reads and embeds
Graceful shutdown handler in sync thread
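
The graceful-shutdown option could look roughly like the sketch below. It combines a non-daemon thread that stops between sync passes with a write-to-temp-then-rename helper, so an interrupted write never leaves a half-written file at the final path. SyncThread, atomic_write_bytes, and the do_sync callable are hypothetical names. How the stop call gets wired into gunicorn's shutdown is left open here.

```python
import os
import tempfile
import threading


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic on POSIX when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


class SyncThread(threading.Thread):
    """Non-daemon sync loop that exits between passes when asked to stop."""

    def __init__(self, do_sync, interval: float = 60.0) -> None:
        super().__init__(name="faiss-sync", daemon=False)
        self._do_sync = do_sync              # hypothetical callable: one sync pass
        self._interval = interval
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            self._do_sync()
            # wait() returns early once stop() sets the event.
            self._stop_event.wait(self._interval)

    def stop(self, timeout: float = 10.0) -> None:
        """Request shutdown and wait for the current pass to finish."""
        self._stop_event.set()
        self.join(timeout)
```

Because the thread is no longer a daemon, something must call stop() during worker shutdown or the process will wait on it. The atomic write helps even if that call never happens, since a kill mid-write only loses the temp file.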

3. Sync frequency

The sync thread currently polls the git HEAD SHA every 60 seconds. In a multi-tenant setup with many wikis, polling every wiki every 60 seconds doesn't scale, because the work grows with the number of wikis rather than with the number of edits. A queue (reindex_queue table triggered by page_saved hook) would be more efficient.
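
A sketch of the queue-based approach, assuming a reindex_queue table with the columns shown. The real schema is not reproduced in this note, so the column names, on_page_saved, drain_queue, and the embed_page callable are all illustrative assumptions.

```python
import sqlite3
import time

# Assumed columns; the actual reindex_queue schema is not shown in this note.
QUEUE_SCHEMA = """
CREATE TABLE IF NOT EXISTS reindex_queue (
    wiki_slug TEXT NOT NULL,
    page_path TEXT NOT NULL,
    queued_at REAL NOT NULL,
    PRIMARY KEY (wiki_slug, page_path)
)
"""


def on_page_saved(conn: sqlite3.Connection, wiki_slug: str, page_path: str) -> None:
    """Hook body: mark a page for re-embedding. Repeat saves collapse to one row."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO reindex_queue (wiki_slug, page_path, queued_at) "
            "VALUES (?, ?, ?)",
            (wiki_slug, page_path, time.time()),
        )


def drain_queue(conn: sqlite3.Connection, embed_page, batch_size: int = 50) -> int:
    """Embed up to batch_size queued pages, oldest first. Returns pages handled."""
    rows = conn.execute(
        "SELECT wiki_slug, page_path, queued_at FROM reindex_queue "
        "ORDER BY queued_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    for wiki_slug, page_path, queued_at in rows:
        embed_page(wiki_slug, page_path)
        with conn:
            # Match queued_at so a page re-saved during embedding stays queued.
            conn.execute(
                "DELETE FROM reindex_queue "
                "WHERE wiki_slug = ? AND page_path = ? AND queued_at = ?",
                (wiki_slug, page_path, queued_at),
            )
    return len(rows)
```

The worker then does work proportional to the number of saved pages, and an idle wiki costs nothing. A row is deleted only after embed_page returns, so a crash mid-embed leaves the page queued for the next run.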

4. FAISS sidecar scalability

The FAISS backend stores all chunk metadata in a JSON sidecar file (embeddings.json) alongside the binary index. The sidecar is loaded fully into memory on startup and re-serialized on every upsert/delete. With Semantic Search V2, new metadata fields (section, section_path, page_word_count, total_chunks) add ~160 bytes per chunk, roughly doubling the sidecar size (~140 → ~300 bytes/chunk).
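
The per-chunk figures above translate into sidecar sizes as in the following back-of-the-envelope sketch. The byte counts are the approximate values from this note. The one-upsert-per-chunk assumption in total_bytes_rewritten is a deliberate worst case and may not match how upserts are actually batched.

```python
V1_BYTES_PER_CHUNK = 140   # approximate, from the note above
V2_BYTES_PER_CHUNK = 300   # V1 plus ~160 bytes of new metadata fields


def sidecar_bytes(chunks: int, bytes_per_chunk: int = V2_BYTES_PER_CHUNK) -> int:
    """Estimated size of one sidecar file."""
    return chunks * bytes_per_chunk


def aggregate_bytes(chunks_per_wiki: list[int],
                    bytes_per_chunk: int = V2_BYTES_PER_CHUNK) -> int:
    """Estimated sidecar data loaded at startup when every wiki loads its own file."""
    return sum(sidecar_bytes(n, bytes_per_chunk) for n in chunks_per_wiki)


def total_bytes_rewritten(chunks: int,
                          bytes_per_chunk: int = V2_BYTES_PER_CHUNK) -> int:
    """Worst case: bytes written while indexing from empty, one upsert per chunk.

    Each upsert re-serializes the whole file, so the k-th upsert writes
    k * bytes_per_chunk bytes, and the total is b * n * (n + 1) / 2.
    """
    return bytes_per_chunk * chunks * (chunks + 1) // 2


print(sidecar_bytes(10_000))            # 3000000  (~3 MB)
print(total_bytes_rewritten(10_000))    # 15001500000  (~15 GB written in total)
```

The file itself stays small. The quadratic rewrite cost is the part that grows quickly, which is why batching upserts or moving to indexed storage matters more than the raw sidecar size.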

Investigate:

At what corpus size does sidecar I/O become a bottleneck? (Estimated threshold: ~10K chunks / ~3MB sidecar)
For multi-tenant with many wikis, each loading its own sidecar at startup, what is the aggregate memory and startup time cost?
Should chunk text be stored in the sidecar at all? (It duplicates embedded data, and removing it would cut sidecar size significantly)
Alternative: move metadata to a table in SQLite (the schema already uses SQLite for the reindex_queue table) for indexed access instead of full-file load/save
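
A sketch of the SQLite alternative from the list above. The chunk_meta table is hypothetical. Only section, section_path, page_word_count, and total_chunks are field names from this note, while vector_id, page_path, and the helper functions are assumptions for illustration.

```python
import sqlite3

# Hypothetical table; vector_id and page_path are assumed columns.
META_SCHEMA = """
CREATE TABLE IF NOT EXISTS chunk_meta (
    vector_id       INTEGER PRIMARY KEY,
    page_path       TEXT NOT NULL,
    section         TEXT,
    section_path    TEXT,
    page_word_count INTEGER,
    total_chunks    INTEGER
);
CREATE INDEX IF NOT EXISTS chunk_meta_page ON chunk_meta (page_path);
"""


def upsert_chunk(conn, vector_id, page_path, section, section_path,
                 page_word_count, total_chunks):
    """Insert or overwrite one chunk's metadata without touching other rows."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO chunk_meta VALUES (?, ?, ?, ?, ?, ?)",
            (vector_id, page_path, section, section_path,
             page_word_count, total_chunks),
        )


def lookup(conn, vector_ids):
    """Fetch metadata rows for search hits, preserving the hit order."""
    if not vector_ids:
        return []
    placeholders = ",".join("?" * len(vector_ids))
    rows = conn.execute(
        f"SELECT * FROM chunk_meta WHERE vector_id IN ({placeholders})",
        list(vector_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[v] for v in vector_ids if v in by_id]


def delete_page(conn, page_path):
    """Remove all chunk rows for one page. Returns the number of rows deleted."""
    with conn:
        cur = conn.execute("DELETE FROM chunk_meta WHERE page_path = ?", (page_path,))
    return cur.rowcount
```

Each upsert or delete touches only the affected rows, and nothing has to be loaded into memory at startup. The vectors in the FAISS index would still have to be removed separately when a page is deleted.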

Not blocking launch

Semantic search works for the dev wiki. Multi-tenant indexing is needed before opening to users with multiple wikis. The in-process risks, sync frequency, and sidecar scalability are optimization concerns for later.