This page is part of the **wikibot.io PRD** (Product Requirements Document). See also: [[Design/Platform_Overview]], [[Design/Auth]], [[Design/Implementation_Phases]], [[Design/Operations]].

---

## Data Model

> **Superseded.** This page describes the DynamoDB/EFS data model for wikibot.io. See [[Design/VPS_Architecture]] for the current plan (SQLite, local disk). The ACL model and storage layout concepts carry forward; the DynamoDB-specific schema does not.

DynamoDB tables. Partition keys noted in comments.

#### Users

```
User {
  id: string, // platform-generated (UUID)
  email: string,
  display_name: string,
  oauth_provider: string, // "google" | "github" | "microsoft" | "apple"
  oauth_provider_sub: string, // provider-native subject ID (e.g., Google sub claim)
  // GSI on (oauth_provider, oauth_provider_sub) for login lookup
  // Critical: enables migration off WorkOS or any auth provider
  created_at: ISO8601,
  wiki_count: number,
  stripe_customer_id?: string
}
```

Note: the User model is deliberately thin on pricing fields. Under Option A (flat tier), add `tier: "free" | "premium"` and `wiki_limit: number`. Under Option B (per-wiki), no tier field is needed because billing state lives on each Wiki record. See [[Design/Implementation_Phases]] for pricing options.

#### Wikis

```
Wiki {
  owner_id: string, // User.id
  wiki_slug: string, // URL-safe identifier (under user namespace)
  custom_slug?: string, // paid wikis: top-level slug for {slug}.wikibot.io
  display_name: string,
  repo_path: string, // EFS path: /mnt/efs/{user_id}/{wiki_slug}/repo.git
  index_path?: string, // FAISS index location (on EFS alongside repo)
  mcp_token_hash: string, // bcrypt hash of MCP bearer token
  is_public: boolean, // read-only public access
  is_paid: boolean, // whether this wiki requires payment (i.e., not the free wiki)
  payment_status: "active" | "lapsed" | "free",
  // free = the user's one free wiki
  // active = paid and current
  // lapsed = payment failed/canceled → read-only, MCP disabled
  created_at: ISO8601,
  last_accessed: ISO8601,
  page_count: number,
}
```

#### ACLs

```
ACL {
  wiki_id: string, // owner_id + wiki_slug
  grantee_id: string, // User.id
  role: "owner" | "editor" | "viewer",
  granted_by: string,
  granted_at: ISO8601
}
```
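
The role and payment fields above suggest an access check along the following lines. This is a minimal sketch, not the platform's implementation: the `can_access` function, the action names, and the rule that an `editor` may write while a `viewer` may only read are assumptions. The read-only behavior for public and lapsed wikis is taken from the field comments in the Wiki model.

```python
from typing import Optional

# Hypothetical mapping from role to allowed actions (assumed, not from the PRD).
ROLE_ACTIONS = {
    "owner": {"read", "write", "admin"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def can_access(role: Optional[str], action: str, *, is_public: bool, payment_status: str) -> bool:
    """Return True if a caller holding `role` (None = no ACL entry) may perform `action`."""
    # A lapsed wiki is read-only for everyone, including its owner.
    if payment_status == "lapsed" and action != "read":
        return False
    # Public wikis grant read-only access to callers without an ACL entry.
    if role is None:
        return is_public and action == "read"
    return action in ROLE_ACTIONS.get(role, set())
```

For example, `can_access("owner", "write", is_public=False, payment_status="lapsed")` is false under these assumptions, while the same owner can still read the wiki.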

### Storage layout (EFS)

```
/mnt/efs/
  {user_id}/
    {wiki_slug}/
      repo.git/          # bare git repo on the persistent filesystem
      index.faiss        # FAISS vector index
      embeddings.json    # page_path → vector mapping
```
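
The layout above can be expressed as a small path helper. This is a sketch under stated assumptions: `wiki_paths` and the `EFS_ROOT` constant are hypothetical names, and the check against path separators is an added safety measure rather than something the PRD specifies.

```python
from pathlib import PurePosixPath

EFS_ROOT = PurePosixPath("/mnt/efs")  # mount point from the layout above


def wiki_paths(user_id: str, wiki_slug: str) -> dict:
    """Build the per-wiki paths; rejects path separators and dot segments."""
    for part in (user_id, wiki_slug):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    base = EFS_ROOT / user_id / wiki_slug
    return {
        "repo": base / "repo.git",
        "index": base / "index.faiss",
        "sidecar": base / "embeddings.json",
    }
```

Deriving all three paths from one function keeps the `repo_path` and `index_path` fields on the Wiki record consistent with what is actually on disk.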

---

## Git Storage Mechanics

### EFS-backed git repos

Each wiki's bare git repo lives on a persistent filesystem mounted by the compute layer. There is no clone/push cycle, no caching, and no application-level locking; git operations happen directly on disk.

**Read path:**
```
1. Lambda mounts EFS (already attached in VPC)
2. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
3. Read page from repo
```

**Write path:**
```
1. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
2. Commit page change
3. Write reindex record to DynamoDB ReindexQueue table
   (triggers embedding Lambda via DynamoDB Streams; see Semantic Search section)
```
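
**Concurrency**: EFS is an NFS filesystem and handles file-level locking natively. Git's own lock files work correctly on NFS; in a bare repo these are the per-ref and `packed-refs` `.lock` files rather than `index.lock`, since a bare repo has no index. Concurrent reads are unlimited. Concurrent writes to the same repo are serialized by git's lock files. No application-level locking is needed.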
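
The lock-file mechanism relies on exclusive file creation: whichever writer creates the lock file first proceeds, and the others see that it already exists. The following is an illustration of that pattern only, not git's code; `try_lock`, `unlock`, and the `main.lock` file name are hypothetical.

```python
import os
import tempfile


def try_lock(path: str) -> bool:
    """Attempt to take a lock by creating `path` exclusively."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock(path: str) -> None:
    """Release the lock by removing the lock file."""
    os.remove(path)


with tempfile.TemporaryDirectory() as repo_dir:
    lock = os.path.join(repo_dir, "main.lock")  # hypothetical lock file name
    first = try_lock(lock)   # first writer creates the file and proceeds
    second = try_lock(lock)  # second writer finds the file and backs off
    unlock(lock)
    third = try_lock(lock)   # the lock is available again after release
    unlock(lock)
```

A writer that fails to take the lock can simply retry, which is why the design does not need a separate locking service.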

**Consistency**: Writes are immediately visible to all Lambda invocations mounting the same EFS filesystem. No eventual consistency concerns.

### Fallback: S3 clone-on-demand

If Phase 0 testing shows EFS latency or VPC cold starts are unacceptable, fall back to S3-based repos with a DynamoDB write lock + clone-to-/tmp pattern. This adds significant complexity (locking, cache management, /tmp eviction) and is only worth pursuing if EFS fails testing.

---

## Semantic Search

Semantic search is available to all users (not tier-gated). See [[Design/Async_Embedding_Pipeline]] for the full architecture.

### Embedding pipeline (summary)

```
Page write (wiki Lambda, VPC)
  → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
  → DynamoDB Streams captures the change
  → Lambda service polls the stream (outside function's VPC context)
  → Embedding Lambda (VPC, EFS mount):
      1. Read page content from EFS repo
      2. Chunk page (same algorithm as otterwiki-semantic-search)
      3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
      4. Update FAISS index + sidecar metadata on EFS
```

No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.

### FAISS details

FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.

**Index type**: `IndexFlatIP` (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.

**Index size**: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
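
The size estimate can be checked with a few lines of arithmetic. This sketch counts raw vector data only and ignores any file header; `index_bytes` is a hypothetical helper, and the default of 3 chunks per page is the rough figure quoted above.

```python
DIM = 384            # all-MiniLM-L6-v2 embedding dimension
BYTES_PER_FLOAT = 4  # float32


def index_bytes(pages: int, chunks_per_page: int = 3) -> int:
    """Raw size in bytes of a flat index holding one vector per chunk."""
    return pages * chunks_per_page * DIM * BYTES_PER_FLOAT


size_200 = index_bytes(200)    # 921,600 bytes, i.e. 900 KiB
size_1000 = index_bytes(1000)  # 4,608,000 bytes, roughly 4.4 MiB
```

Even at the ~1000-page upper bound mentioned for the flat index, the raw vectors stay in the low megabytes.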

**Sidecar metadata**: FAISS stores only vectors and returns integer indices. The `embeddings.json` sidecar maps index positions back to `{page_path, chunk_index, chunk_text_preview}`. This file is loaded alongside the FAISS index.

**Search flow**:
1. Embed query using MiniLM (loaded at Lambda init)
2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
3. Search top K×3 vectors (<1ms)
4. Deduplicate by page_path, keep best chunk per page
5. Return top K results with page paths and matching chunk snippets
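
Steps 3 to 5 of the search flow can be sketched without FAISS. The pure-Python inner-product scan below is a stand-in for `IndexFlatIP`, not the real library call, and `search` is a hypothetical function; the sidecar field names follow the description above, and the sidecar is assumed to be a list indexed by vector position.

```python
def search(query_vec, vectors, sidecar, k=5):
    """Brute-force inner-product search, then keep the best chunk per page."""
    scored = [
        (sum(q * v for q, v in zip(query_vec, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = scored[: k * 3]  # over-fetch so deduplication can still fill k pages

    results, seen = [], set()
    for score, i in candidates:
        meta = sidecar[i]
        if meta["page_path"] in seen:
            continue  # a better-scoring chunk from this page is already kept
        seen.add(meta["page_path"])
        results.append({"score": score, **meta})
        if len(results) == k:
            break
    return results


# Toy 2-D vectors: two chunks from a.md, one from b.md.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
sidecar = [
    {"page_path": "a.md", "chunk_index": 0, "chunk_text_preview": "first"},
    {"page_path": "a.md", "chunk_index": 1, "chunk_text_preview": "second"},
    {"page_path": "b.md", "chunk_index": 0, "chunk_text_preview": "third"},
]
hits = search([1.0, 0.0], vectors, sidecar, k=2)
```

Fetching K×3 candidates matters because several top-scoring chunks often belong to the same page; without the over-fetch, deduplication could return fewer than K pages.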

### Cost estimate

- Embedding a 200-page wiki: effectively $0 (Lambda compute only, ~seconds)
- Per search query: $0 (MiniLM runs locally)
- Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
- VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)

---

## URL Structure

Each user gets a subdomain: `{username}.wikibot.io`

```
sderle.wikibot.io/                          → user's wiki list (dashboard)
sderle.wikibot.io/third-gulf-war/           → wiki web UI (free wiki, under user namespace)
sderle.wikibot.io/third-gulf-war/api/v1/    → wiki REST API
sderle.wikibot.io/third-gulf-war/mcp        → wiki MCP endpoint
```

### Custom slugs (paid wikis)

Paid wikis get a top-level slug: `{slug}.wikibot.io`. This is a vanity URL that routes directly to the wiki without the username prefix. The slug is chosen at wiki creation time and must be globally unique (same validation rules as usernames: lowercase alphanumeric + hyphens, 3–30 characters, drawn from the same namespace/blocklist).

```
third-gulf-war.wikibot.io/           → wiki web UI (paid wiki, top-level slug)
third-gulf-war.wikibot.io/api/v1/    → wiki REST API
third-gulf-war.wikibot.io/mcp        → wiki MCP endpoint
```

The user-namespace URL (`sderle.wikibot.io/third-gulf-war/`) continues to work as a redirect. This means existing MCP connections and bookmarks survive if a free wiki is later upgraded to paid.

Implementation: the Lambda resolver checks the subdomain against the Wikis table's `custom_slug` GSI first, then falls back to username resolution.
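
The lookup order can be sketched as follows. In-memory collections stand in for the `custom_slug` GSI and the Users table, and the `resolve` function, its return shape, and the `owner/slug` form of `wiki_id` are all assumptions made for illustration.

```python
from typing import Optional

BASE_DOMAIN = "wikibot.io"


def resolve(host: str, custom_slugs: dict, usernames: set) -> Optional[dict]:
    """Map a request host to a wiki or a user namespace; None if unknown."""
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    sub = host[: -len(suffix)].lower()
    # 1. Custom slug lookup first (the `custom_slug` GSI in the design).
    if sub in custom_slugs:
        return {"kind": "wiki", "wiki_id": custom_slugs[sub]}
    # 2. Fall back to username resolution.
    if sub in usernames:
        return {"kind": "user", "username": sub}
    return None


slugs = {"third-gulf-war": "sderle/third-gulf-war"}  # hypothetical wiki_id format
users = {"sderle"}
wiki_hit = resolve("third-gulf-war.wikibot.io", slugs, users)
user_hit = resolve("sderle.wikibot.io", slugs, users)
```

Because custom slugs and usernames are drawn from the same namespace, a subdomain can match at most one of the two lookups, so the order affects cost rather than correctness.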

---

## Usernames

Each user chooses a username at signup (after OAuth). Usernames are URL-critical (`{username}.wikibot.io`) so they must be:

- **URL-safe**: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens
- **Unique**: enforced in DynamoDB
- **Immutable** (MVP): changing usernames means changing URLs, which breaks MCP connections, Git remotes, and bookmarks. Defer username changes (with redirect support) to a future iteration.
- **Reserved**: block names that conflict with platform routes or look official: `admin`, `www`, `api`, `auth`, `mcp`, `app`, `help`, `support`, `billing`, `status`, `blog`, `docs`, `robot`, `wiki`, `static`, `assets`, `null`, `undefined`, etc. Maintain a blocklist.
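
The URL-safety and reserved-name rules above translate into a short validator. This is a sketch: `validate_username` is a hypothetical name, and the `RESERVED` set contains only the names listed above, whereas the real blocklist is expected to be longer. Uniqueness is not checked here because it requires a database lookup.

```python
import re

# First and last characters must be alphanumeric; total length is 3 to 30.
USERNAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")

RESERVED = {
    "admin", "www", "api", "auth", "mcp", "app", "help", "support", "billing",
    "status", "blog", "docs", "robot", "wiki", "static", "assets", "null",
    "undefined",
}


def validate_username(name: str) -> bool:
    """Return True if `name` is URL-safe and not on the blocklist."""
    return USERNAME_RE.fullmatch(name) is not None and name not in RESERVED
```

Since custom slugs follow the same rules and share the same blocklist, the same function could validate them as well.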

### Username squatting

Free accounts cost nothing to create, so squatting is possible. Mitigations:
- Require at least one wiki with at least one page edit within 90 days of signup, or the username is released
- Trademark disputes handled case-by-case (standard UDRP-like process, documented in ToS)
- Not a launch concern; address when it becomes a real problem
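
The 90-day rule in the first mitigation could be evaluated as below. This is only a sketch of one possible reading: `username_releasable` and its parameters are hypothetical, and it assumes the platform records the timestamp of each account's first page edit.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE = timedelta(days=90)


def username_releasable(signup_at: datetime, first_edit_at: Optional[datetime], now: datetime) -> bool:
    """True if the 90-day window has passed with no page edit inside it."""
    deadline = signup_at + GRACE
    if first_edit_at is not None and first_edit_at <= deadline:
        return False  # the account was used within the window
    return now > deadline


signup = datetime(2026, 1, 1, tzinfo=timezone.utc)
idle_after_window = username_releasable(signup, None, signup + timedelta(days=91))
idle_in_window = username_releasable(signup, None, signup + timedelta(days=30))
active_user = username_releasable(signup, signup + timedelta(days=10), signup + timedelta(days=200))
```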