Properties
category: reference
tags: [tasks, semantic-search, architecture]
last_updated: 2026-03-20
confidence: high

Semantic Search Architecture — IMPLEMENTED

Implementation (2026-03-20)

Semantic search is fully implemented and operational. Issues 1 through 3 below are resolved; issue 4 (FAISS sidecar scalability) is deferred and not blocking.

Architecture

  • FAISS backend with IndexFlatIP, per-wiki indexes under /srv/data/faiss/{slug}/
  • ONNX MiniLM-L6-v2 embeddings (ChromaDB bundled model)
  • Multi-tenant via BackendRegistry — lazy per-wiki index creation
  • Synchronous lifecycle hooks: HookListener registers page_saved, page_deleted, and page_renamed hooks that trigger an immediate FAISS upsert or delete. No background worker or queue.
  • Auto-reindex on first wiki access back-fills existing pages
  • reindex_all is per-wiki scoped
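
To make the search behaviour concrete, here is a minimal pure-Python sketch of what a flat inner-product index does: it scores the query against every stored vector and returns the top k. It is a stand-in for illustration, not the FAISS code used here. The class name FlatInnerProductIndex, the chunk_id keys, and the choice to normalize vectors (so that inner product equals cosine similarity) are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatInnerProductIndex:
    """Hypothetical stand-in for an exhaustive inner-product index."""

    def __init__(self) -> None:
        # chunk_id -> unit-length embedding (assumed normalization)
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, chunk_id: str, vec: List[float]) -> None:
        """Insert a vector, or overwrite the one stored under chunk_id."""
        self._vectors[chunk_id] = normalize(vec)

    def delete(self, chunk_id: str) -> None:
        """Remove a vector; deleting a missing id is a no-op."""
        self._vectors.pop(chunk_id, None)

    def search(self, query: List[float], k: int = 5) -> List[Tuple[str, float]]:
        """Score every stored vector against the query and return the top k."""
        q = normalize(query)
        scored = [
            (chunk_id, sum(a * b for a, b in zip(q, vec)))
            for chunk_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]
```

Because every query is compared against every vector, cost grows linearly with the number of chunks, which is why corpus size matters for this index type.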

REST API

  • GET /api/v1/semantic-search — query semantic search
  • POST /api/v1/reindex — trigger full reindex for a wiki
  • GET /api/v1/reindex/status — check reindex progress
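
As a rough illustration of calling these endpoints, the sketch below builds the request URLs with the Python standard library. Only the three paths come from this page. The base URL and the query parameter names (q, wiki, limit) are hypothetical and should be checked against the actual API before use.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical base URL; substitute the real host.
BASE_URL = "http://localhost:8000"


def build_search_url(base_url: str, query: str, wiki: str, limit: int = 10) -> str:
    """Build a GET URL for the semantic search endpoint.

    The parameter names q, wiki, and limit are assumptions, not confirmed API.
    """
    params = urlencode({"q": query, "wiki": wiki, "limit": limit})
    return urljoin(base_url, "/api/v1/semantic-search") + "?" + params


def build_reindex_request(base_url: str, wiki: str) -> Request:
    """Build (but do not send) a POST request that triggers a full reindex.

    Passing the wiki as a query parameter is an assumption.
    """
    url = urljoin(base_url, "/api/v1/reindex") + "?" + urlencode({"wiki": wiki})
    return Request(url, method="POST")


def build_status_url(base_url: str, wiki: str) -> str:
    """Build a GET URL for checking reindex progress (wiki parameter assumed)."""
    return urljoin(base_url, "/api/v1/reindex/status") + "?" + urlencode({"wiki": wiki})
```

A request built this way can be sent with urllib.request.urlopen; the response format is not described on this page, so parsing is left out.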

MCP integration

  • semantic_search MCP tool calls the REST API

Tests

  • Tests exist and pass

Resolved and deferred issues

1. Multi-tenant indexing — RESOLVED

Per-wiki FAISS directories (/srv/data/faiss/{slug}/), per-wiki state managed by BackendRegistry.
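
The lazy per-wiki pattern can be sketched as follows. This is not the actual BackendRegistry code: the class names BackendRegistrySketch and WikiBackend, the get and loaded_slugs methods, and the in-memory cache are assumptions. Only the /srv/data/faiss/{slug}/ layout comes from this page.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Root directory taken from this page; each wiki gets its own subdirectory.
FAISS_ROOT = PurePosixPath("/srv/data/faiss")


class WikiBackend:
    """Hypothetical holder for one wiki's index state."""

    def __init__(self, slug: str, index_dir: PurePosixPath) -> None:
        self.slug = slug
        self.index_dir = index_dir


class BackendRegistrySketch:
    """Hypothetical registry that creates a backend on first access."""

    def __init__(self, root: PurePosixPath = FAISS_ROOT) -> None:
        self._root = root
        self._backends: Dict[str, WikiBackend] = {}

    def get(self, slug: str) -> WikiBackend:
        """Return the backend for slug, creating it lazily if needed."""
        backend = self._backends.get(slug)
        if backend is None:
            backend = WikiBackend(slug, self._root / slug)
            self._backends[slug] = backend
        return backend

    def loaded_slugs(self) -> List[str]:
        """List the wikis whose backends have been created so far."""
        return sorted(self._backends)
```

Wikis that are never queried never get a backend object, so memory use tracks the set of active wikis instead of the full tenant list.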

2. In-process embedding risks — RESOLVED

Synchronous lifecycle hooks replace the daemon sync thread. No risk of mid-write corruption from SIGTERM killing a background thread.

3. Sync frequency — RESOLVED

Hook-based updates are immediate on page save/delete/rename. No polling.
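
A minimal sketch of the hook-to-index mapping is shown below, assuming each hook receives a page id and, where relevant, the page text. The class names HookListenerSketch and InMemoryIndex and all method signatures are hypothetical. The real HookListener writes to FAISS and works on embedded chunks, not raw text.

```python
from typing import Dict


class InMemoryIndex:
    """Hypothetical stand-in for the per-wiki index; stores text, not vectors."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def upsert(self, page_id: str, text: str) -> None:
        self.docs[page_id] = text

    def delete(self, page_id: str) -> None:
        self.docs.pop(page_id, None)


class HookListenerSketch:
    """Hypothetical listener: each hook updates the index before returning."""

    def __init__(self, index: InMemoryIndex) -> None:
        self._index = index

    def page_saved(self, page_id: str, text: str) -> None:
        # A save covers both new pages and edits, hence upsert.
        self._index.upsert(page_id, text)

    def page_deleted(self, page_id: str) -> None:
        self._index.delete(page_id)

    def page_renamed(self, old_id: str, new_id: str, text: str) -> None:
        # Treat a rename as delete-then-upsert (an assumption).
        self._index.delete(old_id)
        self._index.upsert(new_id, text)
```

Since the hooks run synchronously in the sketch, the index reflects a change as soon as the hook returns, with no queue to drain.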

4. FAISS sidecar scalability — DEFERRED

Not blocking. Current corpus sizes are well within the estimated ~10K chunk threshold. Revisit if the corpus grows toward that threshold.