Properties
category: spec
tags: [design, semantic-search, mcp]
last_updated: 2026-03-16
confidence: high
Semantic Search V2: Section-Aware Chunking and Targeted Reads
This page describes planned improvements to otterwiki-semantic-search and otterwiki-mcp to make semantic search more useful to agent consumers (Claude.ai, Claude Code). It supersedes the chunking and result format sections of Design/Semantic_Search.
See also: Tasks/Semantic_Search_Architecture (multi-tenant issues), the empirical findings on the 3GW wiki at Meta/Page Size And Search Quality.
Problem
The current semantic search pipeline has three compounding weaknesses for agent use:
- **Chunks straddle topic boundaries.** The chunker splits on paragraph breaks (`\n\n`) with no awareness of markdown headings. A 500-word section under `## Russian Substitution Constraints` gets split into 3-4 chunks, and chunks near section boundaries blend content from adjacent, unrelated sections. The resulting embeddings are diluted and match poorly.
- **Search results are truncated to 150 characters.** Chunks are ~150 words, but the search API truncates the returned snippet to 150 characters, discarding ~80% of the retrieved content. The agent can't evaluate relevance from a sentence fragment, so it almost always follows up with `read_note`, which loads the entire page.
- **`read_note` is all-or-nothing.** Once the agent decides it needs more context than the snippet provides, the only option is loading the full page. A 4,000-word page costs ~5,000 tokens of context window to retrieve a 500-word section.
The net effect: semantic search routes to the right page but the agent pays full context cost anyway. The search step adds latency without saving tokens.
Constraints
MiniLM-L6-v2 has a 256 wordpiece token context window. Input beyond 256 tokens is silently truncated — it does not contribute to the embedding. At ~1.3 tokens per word, this means effective content per chunk is capped at ~190 words. The current TARGET_WORDS = 150 fits within this limit with room for metadata prefixes.
Any chunk size increase requires switching to a model with a longer context window (e.g., E5-small-v2 or BGE-small at 512 tokens). This design keeps MiniLM and works within the 256-token budget.
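As a sanity check, the budget arithmetic works out roughly as follows (a sketch using the ~1.3 tokens-per-word estimate above; real wordpiece counts vary with the text):

```python
MODEL_CONTEXT_TOKENS = 256  # MiniLM-L6-v2 wordpiece window
TOKENS_PER_WORD = 1.3       # rough English wordpiece ratio
TARGET_WORDS = 150          # current chunker target

# Maximum words that fit in the window at all (the "~190" above,
# after rounding down for safety)
effective_cap_words = int(MODEL_CONTEXT_TOKENS / TOKENS_PER_WORD)
# Tokens left over for metadata prefixes at the current target
headroom_tokens = MODEL_CONTEXT_TOKENS - int(TARGET_WORDS * TOKENS_PER_WORD)

print(effective_cap_words, headroom_tokens)  # 196 61
```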
Design
Change 1: Section-aware chunking
In: otterwiki-semantic-search, chunking.py
Replace paragraph-only splitting with heading-aware splitting:
- Strip YAML frontmatter (unchanged).
- Parse the markdown into sections by splitting on heading lines (`^#{1,6}\s`). Track a header stack — the path of headings from the page title down to the current section (e.g., `["Fertilizer Supply Crisis", "Russian Substitution"]`).
- Within each section, apply the existing paragraph-accumulation algorithm (target ~150 words, sentence-boundary fallback for oversized paragraphs).
- Hard rule: Never merge content from different sections into the same chunk. A section boundary is always a chunk boundary, even if the preceding chunk is short.
- Floor: If a section is under ~50 words, merge it with the next section at the same or deeper heading level. This prevents stub headings from producing uselessly small chunks.
- Overlap: Continue the 35-word overlap between chunks within the same section. Do not carry overlap across section boundaries — the header prefix (below) provides sufficient context bridging.
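A minimal sketch of the heading-aware split (hypothetical names — `split_sections`, `Section` — not the real chunking.py API; the paragraph-accumulation step and the ~50-word floor merge are elided):

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")

@dataclass
class Section:
    path: list[str]                        # header stack, page title first
    lines: list[str] = field(default_factory=list)

def split_sections(markdown: str, page_title: str) -> list[Section]:
    """Split markdown into sections; a heading line always starts a new one."""
    stack = [(0, page_title)]              # (level, heading text)
    sections = [Section(path=[page_title])]
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            level, text = len(m.group(1)), m.group(2).strip()
            while stack[-1][0] >= level:   # pop siblings and deeper headings
                stack.pop()
            stack.append((level, text))
            sections.append(Section(path=[t for _, t in stack]))
        else:
            sections[-1].lines.append(line)
    # drop sections with no body text; stub-heading merging happens elsewhere
    return [s for s in sections if any(l.strip() for l in s.lines)]
```

Each `Section` then feeds the existing paragraph accumulator independently, so no chunk ever spans two sections.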
Header prefix: Prepend the header path to each chunk's text before embedding:
```
[Fertilizer Supply Crisis > Russian Substitution] Russia cannot substitute...
```
The bracketed prefix costs ~10-20 wordpiece tokens, reducing effective content to ~130 words per chunk. This is acceptable — a topically coherent 130-word chunk with a descriptive prefix produces a sharper embedding than a topically mixed 150-word chunk without one.
Chunk metadata gains a new field:
```json
{
  "page_path": "Trends/Fertilizer Supply Crisis",
  "chunk_index": 2,
  "section": "Russian Substitution",
  "section_path": ["Fertilizer Supply Crisis", "Russian Substitution"],
  "title": "Fertilizer Supply Crisis",
  "category": "trend",
  "tags": "economics, agriculture"
}
```
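Given that metadata, assembling the embedding input is a one-liner (a sketch; `text_for_embedding` is a hypothetical helper name):

```python
def text_for_embedding(chunk_text: str, section_path: list[str]) -> str:
    """Prepend the bracketed header path, e.g. '[A > B] body text...'."""
    return f"[{' > '.join(section_path)}] {chunk_text}"

print(text_for_embedding(
    "Russia cannot substitute...",
    ["Fertilizer Supply Crisis", "Russian Substitution"],
))
# [Fertilizer Supply Crisis > Russian Substitution] Russia cannot substitute...
```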
Change 2: Return full chunk text in search results
In: otterwiki-semantic-search, index.py
Remove the 150-character snippet truncation. The search API returns the full chunk text (~150 words) as the snippet field. This is small enough to be cheap in context and large enough to evaluate relevance without a follow-up read.
Response format (the new fields are described below):

```json
{
  "query": "Russian fertilizer substitution",
  "results": [
    {
      "name": "Trends/Fertilizer Supply Crisis",
      "path": "Trends/Fertilizer Supply Crisis",
      "snippet": "[Fertilizer Supply Crisis > Russian Substitution] Russia cannot substitute...(full ~150 words)...",
      "distance": 0.42,
      "section": "Russian Substitution",
      "section_path": ["Fertilizer Supply Crisis", "Russian Substitution"],
      "chunk_index": 2,
      "total_chunks": 12,
      "page_word_count": 4188
    }
  ],
  "total": 1
}
```
New fields:
- `section` / `section_path` — where in the page this chunk lives. Gives the agent a handle for targeted reads (Change 4).
- `chunk_index` / `total_chunks` — positional context.
- `page_word_count` — lets the agent estimate the context cost of a full `read_note` and decide whether it's worth it.
Change 3: Configurable per-page deduplication
In: otterwiki-semantic-search, index.py
Currently the search deduplicates to one chunk per page. This is too aggressive — if three sections of a page are relevant, the agent sees only one.
Add a max_chunks_per_page parameter (default 2, max 5) to the search API:
```
GET /api/v1/semantic-search?q=economic+transmission&n=5&max_chunks_per_page=3
```
The deduplication logic changes from "keep best chunk per page" to "keep best N chunks per page." Total results are still capped at n.
Default of 2 balances breadth (seeing multiple pages) against depth (seeing multiple sections of the most relevant page).
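A sketch of the relaxed dedup (hypothetical `dedupe` helper; assumes the candidate list arrives already sorted by ascending distance):

```python
from collections import defaultdict

def dedupe(ranked, n, max_chunks_per_page=2):
    """Keep the best max_chunks_per_page chunks per page, capped at n total.

    `ranked` must be sorted by ascending distance (best match first).
    """
    kept, per_page = [], defaultdict(int)
    for chunk in ranked:
        page = chunk["page_path"]
        if per_page[page] < max_chunks_per_page:
            per_page[page] += 1
            kept.append(chunk)
            if len(kept) == n:
                break
    return kept
```

With `max_chunks_per_page=1` this reduces to the current "best chunk per page" behavior, so the old default is still expressible.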
Change 4: Section-level read via MCP
In: otterwiki-mcp (or otterwiki-api REST plugin)
Add a section parameter to the read_note MCP tool:
```
read_note(path="Trends/Fertilizer Supply Crisis", section="Russian Substitution")
```
Behavior:
- Load the full page content.
- Parse markdown headings into a tree.
- Find the section matching the `section` parameter. Match against heading text, case-insensitive. If ambiguous (multiple headings with the same text), accept a `/`-delimited path: `"Country Dependencies/Pakistan"`.
- Return everything from the matched heading to the next heading at the same or higher level.
- Include the heading itself in the returned content.
- If no match, return an error listing available sections (so the agent can retry with the correct name).
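The slicing step can be sketched as follows (hypothetical `read_section` helper; the `/`-delimited path disambiguation is omitted for brevity):

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+)$", re.MULTILINE)

def read_section(page_text: str, section: str) -> str:
    """Return the matched heading plus everything up to the next heading
    at the same or higher level. Raises with the list of available
    sections when nothing matches."""
    heads = [(m.start(), len(m.group(1)), m.group(2).strip())
             for m in HEADING.finditer(page_text)]
    hits = [i for i, (_, _, text) in enumerate(heads)
            if text.lower() == section.lower()]
    if not hits:
        names = ", ".join(t for _, _, t in heads)
        raise ValueError(f"section {section!r} not found; available: {names}")
    start, level, _ = heads[hits[0]]
    end = next((pos for pos, lvl, _ in heads[hits[0] + 1:] if lvl <= level),
               len(page_text))
    return page_text[start:end]
```

Note the match is case-insensitive and deeper subsections (e.g. an `###` under the matched `##`) are included, per the behavior list above.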
Why in the MCP layer, not the REST API: The REST API serves multiple consumers. Section-level reads are an agent UX optimization — the MCP tool can implement it by fetching the full page from the REST API and slicing locally. This avoids adding complexity to the API surface.
Alternative considered: Returning multiple sections in one call (e.g., sections=["Russian Substitution", "Planting Window"]). Deferred — the common case is one section per call, and multiple calls are cheap.
Agent workflow after these changes
1. `semantic_search("Russian fertilizer constraints")` — returns 5 results with full chunk text, section paths, and page word counts.
2. Agent reads the snippets. Two chunks from `Fertilizer Supply Crisis` are relevant (sections "Russian Substitution" and "Priority Queue"). One chunk from `P4 Economic Transmission` is relevant.
3. For the 500-word sections where the 150-word snippet isn't enough: `read_note("Trends/Fertilizer Supply Crisis", section="Russian Substitution")` — returns ~500 words instead of ~4,200.
4. Agent has the context it needs. Total cost: ~1,500 tokens (5 snippets + 1 section read) vs. ~6,500 tokens today (5 truncated snippets + 1 full page load that's mostly irrelevant).
Implementation scope
| Change | Repo | Files | Complexity |
|---|---|---|---|
| Section-aware chunking | otterwiki-semantic-search | chunking.py, tests | Medium — new heading parser, preserve existing paragraph logic within sections |
| Full chunk text + metadata | otterwiki-semantic-search | index.py, routes.py, tests | Low — remove truncation, add fields to response |
| Configurable dedup | otterwiki-semantic-search | index.py, routes.py, tests | Low — parameterize existing logic |
| Section-level read | otterwiki-mcp | MCP tool definition, markdown parser | Medium — heading tree parser, error handling for ambiguous matches |
All changes are backward-compatible. Existing consumers see richer results but don't break. The section parameter on read_note is optional.
Reindexing: Changes 1-3 require a full reindex after deployment. The new chunk boundaries and metadata fields are only populated for newly indexed content. POST /api/v1/reindex handles this.
What this design does NOT address
- Embedding model upgrade. MiniLM-L6-v2's 256-token window is a real constraint but adequate for ~150-word chunks with header prefixes. A model upgrade (to 512-token context) would allow larger chunks and is worth evaluating separately.
- Multi-tenant indexing. Tracked in Tasks/Semantic_Search_Architecture and Tasks/Semantic_Search_Multi_Tenant. Orthogonal to this work.
- In-process embedding risks. The ONNX model in the gunicorn worker and daemon thread shutdown are operational concerns, not search quality concerns. Tracked in Tasks/Semantic_Search_Architecture.