Commit 6617aa

2026-03-14 20:07:11 Claude (MCP): [mcp] Update Data_Model: replace Bedrock/SQS embedding pipeline with DynamoDB Streams + MiniLM
Design/Data_Model.md ..
@@ 101,35 101,38 @@
---
- ## Semantic Search (Premium)
+ ## Semantic Search
- ### Embedding pipeline
+ Semantic search is available to all users (not tier-gated). See [[Design/Async_Embedding_Pipeline]] for the full architecture.
+
+ ### Embedding pipeline (summary)
```
- Page write (Lambda)
- → SQS message: {user, wiki, page_path, action: "upsert" | "delete"}
- → Embedding Lambda (triggered by SQS):
+ Page write (wiki Lambda, VPC)
+ → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
+ → DynamoDB Streams captures the change
+ → Lambda service polls the stream (polling happens outside the function's VPC, so no endpoint is needed)
+ → Embedding Lambda (VPC, EFS mount):
1. Read page content from EFS repo
- 2. Chunk page (same algorithm as existing otterwiki-semantic-search)
- 3. Call Bedrock titan-embed-text-v2 for each chunk
- 4. Load current FAISS index from EFS
- 5. Update index (remove old vectors for page, add new ones)
- 6. Write updated index to EFS
- 7. Update embeddings.json sidecar (page_path → chunk vectors mapping)
+ 2. Chunk page (same algorithm as otterwiki-semantic-search)
+ 3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
+ 4. Update FAISS index + sidecar metadata on EFS
```
+ No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.
+
### FAISS details
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.
**Index type**: `IndexFlatIP` (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.
- **Index size**: Each vector is 1536 floats × 4 bytes = 6KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~3.6MB index. Trivial to store on EFS and load into Lambda memory.
+ **Index size**: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
**Sidecar metadata**: FAISS stores only vectors and returns integer indices. The `embeddings.json` sidecar maps index positions back to `{page_path, chunk_index, chunk_text_preview}`. This file is loaded alongside the FAISS index.
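The sidecar mapping can be sketched as a position-aligned list: entry *i* describes the vector at FAISS index *i*. The three fields come from the description above; the sample values and the `resolve` helper are illustrative only.

```python
# Illustrative embeddings.json contents: list position == FAISS integer index.
sidecar = [
    {"page_path": "Home.md",       "chunk_index": 0, "chunk_text_preview": "Welcome to the wiki"},
    {"page_path": "Home.md",       "chunk_index": 1, "chunk_text_preview": "Getting started"},
    {"page_path": "Design/API.md", "chunk_index": 0, "chunk_text_preview": "The REST API exposes"},
]

def resolve(faiss_hits: list[int]) -> list[dict]:
    """Map FAISS integer results back to page/chunk metadata."""
    return [sidecar[i] for i in faiss_hits]
```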
**Search flow**:
- 1. Embed query via Bedrock (~100ms)
+ 1. Embed query using MiniLM (loaded at Lambda init)
2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
3. Search top K×3 vectors (~<1ms)
4. Deduplicate by page_path, keep best chunk per page
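Steps 3-4 of the search flow can be sketched with a NumPy stand-in for `IndexFlatIP` (which is exactly brute-force inner product, so the dot product below matches its scoring). The over-fetch factor of 3 and the sidecar field names follow the text above; everything else is an assumption.

```python
import numpy as np

def search(index_vectors: np.ndarray, sidecar: list[dict], query_vec: np.ndarray, k: int = 5) -> list[dict]:
    """Top-k pages by best chunk score, deduplicated by page_path."""
    scores = index_vectors @ query_vec        # inner product: one score per chunk
    order = np.argsort(-scores)[: k * 3]      # over-fetch K×3 chunk hits
    best = {}                                 # page_path -> (score, metadata)
    for i in order:
        meta = sidecar[i]
        if meta["page_path"] not in best:     # first hit per page is its best chunk
            best[meta["page_path"]] = (float(scores[i]), meta)
    ranked = sorted(best.values(), key=lambda t: -t[0])[:k]
    return [{"page_path": m["page_path"], "score": s} for s, m in ranked]
```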
@@ 137,10 140,10 @@
### Cost estimate
- - Embedding a 200-page wiki: ~$0.02 (one-time)
- - Per search query: ~$0.0001 (embed the query)
- - 100 queries/day: ~$0.30/month
- - Re-embedding on page edits: negligible
+ - Embedding a 200-page wiki: effectively $0 (Lambda compute only, a few seconds of runtime)
+ - Per search query: $0 (MiniLM runs locally)
+ - Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
+ - VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)
---