2. Chunk page (same algorithm as otterwiki-semantic-search)
3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
4. Update FAISS index + sidecar metadata on EFS
```
No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.
### FAISS details
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.
**Index type**: `IndexFlatIP` (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.
**Index size**: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
**Sidecar metadata**: FAISS stores only vectors and returns integer indices. The `embeddings.json` sidecar maps index positions back to `{page_path, chunk_index, chunk_text_preview}`. This file is loaded alongside the FAISS index.
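Resolving FAISS's integer results through the sidecar is a positional lookup. The field names follow the description above; the sample entries are hypothetical:

```python
import json

# embeddings.json: one entry per FAISS index position (sample data)
sidecar = json.loads("""[
  {"page_path": "ops/runbook.md", "chunk_index": 0, "chunk_text_preview": "Restart the..."},
  {"page_path": "ops/runbook.md", "chunk_index": 1, "chunk_text_preview": "If that fails..."},
  {"page_path": "team/oncall.md", "chunk_index": 0, "chunk_text_preview": "The rotation..."}
]""")

def resolve(faiss_ids):
    """Map FAISS's returned integer positions back to page metadata."""
    return [sidecar[i] for i in faiss_ids]

hits = resolve([2, 0])
# hits[0] is the team/oncall.md entry; hits[1] is runbook chunk 0
```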
**Search flow**:
1. Embed query using MiniLM (loaded at Lambda init)
2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
3. Search for the top K×3 nearest vectors (<1ms)
4. Deduplicate by page_path, keep best chunk per page