Commit 6617aa

2026-03-14 20:07:11 Claude (MCP): [mcp] Update Data_Model: replace Bedrock/SQS embedding pipeline with DynamoDB Streams + MiniLM
Design/Data_Model.md ..
@@ 101,35 101,38 @@
---
- ## Semantic Search (Premium)
+ ## Semantic Search
- ### Embedding pipeline
+ Semantic search is available to all users (not tier-gated). See [[Design/Async_Embedding_Pipeline]] for the full architecture.
+
+ ### Embedding pipeline (summary)
```
- Page write (Lambda)
- → SQS message: {user, wiki, page_path, action: "upsert" | "delete"}
- → Embedding Lambda (triggered by SQS):
+ Page write (wiki Lambda, VPC)
+ → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
+ → DynamoDB Streams captures the change
+ → Lambda service polls the stream (polling happens outside the function's VPC, so no endpoint is needed)
+ → Embedding Lambda (VPC, EFS mount):
1. Read page content from EFS repo
- 2. Chunk page (same algorithm as existing otterwiki-semantic-search)
- 3. Call Bedrock titan-embed-text-v2 for each chunk
- 4. Load current FAISS index from EFS
- 5. Update index (remove old vectors for page, add new ones)
- 6. Write updated index to EFS
- 7. Update embeddings.json sidecar (page_path → chunk vectors mapping)
+ 2. Chunk page (same algorithm as otterwiki-semantic-search)
+ 3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
+ 4. Update FAISS index + sidecar metadata on EFS
```
+ No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.
+
### FAISS details
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.
**Index type**: `IndexFlatIP` (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.
- **Index size**: Each vector is 1536 floats × 4 bytes = 6KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~3.6MB index. Trivial to store on EFS and load into Lambda memory.
+ **Index size**: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
**Sidecar metadata**: FAISS stores only vectors and returns integer indices. The `embeddings.json` sidecar maps index positions back to `{page_path, chunk_index, chunk_text_preview}`. This file is loaded alongside the FAISS index.
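The sidecar mapping can be sketched as a position-aligned list: entry *i* describes the vector at FAISS index *i*. The three fields come from the description above; the sample values and the `resolve` helper are illustrative only.

```python
# Illustrative embeddings.json contents: list position == FAISS integer index.
sidecar = [
    {"page_path": "Home.md",       "chunk_index": 0, "chunk_text_preview": "Welcome to the wiki"},
    {"page_path": "Home.md",       "chunk_index": 1, "chunk_text_preview": "Getting started"},
    {"page_path": "Design/API.md", "chunk_index": 0, "chunk_text_preview": "The REST API exposes"},
]

def resolve(faiss_hits: list[int]) -> list[dict]:
    """Map FAISS integer results back to page/chunk metadata."""
    return [sidecar[i] for i in faiss_hits]
```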
**Search flow**:
- 1. Embed query via Bedrock (~100ms)
+ 1. Embed query using MiniLM (loaded at Lambda init)
2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
3. Search top K×3 vectors (~<1ms)
4. Deduplicate by page_path, keep best chunk per page
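Steps 3-4 of the search flow can be sketched with a NumPy stand-in for `IndexFlatIP` (which is exactly brute-force inner product, so the dot product below matches its scoring). The over-fetch factor of 3 and the sidecar field names follow the text above; everything else is an assumption.

```python
import numpy as np

def search(index_vectors: np.ndarray, sidecar: list[dict], query_vec: np.ndarray, k: int = 5) -> list[dict]:
    """Top-k pages by best chunk score, deduplicated by page_path."""
    scores = index_vectors @ query_vec        # inner product: one score per chunk
    order = np.argsort(-scores)[: k * 3]      # over-fetch K×3 chunk hits
    best = {}                                 # page_path -> (score, metadata)
    for i in order:
        meta = sidecar[i]
        if meta["page_path"] not in best:     # first hit per page is its best chunk
            best[meta["page_path"]] = (float(scores[i]), meta)
    ranked = sorted(best.values(), key=lambda t: -t[0])[:k]
    return [{"page_path": m["page_path"], "score": s} for s, m in ranked]
```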
@@ 137,10 140,10 @@
### Cost estimate
- - Embedding a 200-page wiki: ~$0.02 (one-time)
- - Per search query: ~$0.0001 (embed the query)
- - 100 queries/day: ~$0.30/month
- - Re-embedding on page edits: negligible
+ - Embedding a 200-page wiki: effectively $0 (Lambda compute only, a few seconds of runtime)
+ - Per search query: $0 (MiniLM runs locally)
+ - Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
+ - VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)
---