This page is part of the wikibot.io PRD (Product Requirements Document). See also: Design/Platform_Overview, Design/Auth, Design/Implementation_Phases, Design/Operations.
Data Model
DynamoDB tables. Partition keys noted in comments.
Users
User {
  id: string,                  // platform-generated (UUID)
  email: string,
  display_name: string,
  oauth_provider: string,      // "google" | "github" | "microsoft" | "apple"
  oauth_provider_sub: string,  // provider-native subject ID (e.g., Google sub claim)
  // GSI on (oauth_provider, oauth_provider_sub) for login lookup
  // Critical: enables migration off WorkOS or any auth provider
  created_at: ISO8601,
  wiki_count: number,
  stripe_customer_id?: string
}
Note: the User model is deliberately thin on pricing fields. Under Option A (flat tier), add tier: "free" | "premium" and wiki_limit: number. Under Option B (per-wiki), no tier field is needed — billing state lives on each Wiki record. See Design/Implementation_Phases for pricing options.
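The login lookup described in the User comments can be sketched as follows. This is a minimal in-memory illustration, not DynamoDB code: the dict stands in for the GSI on (oauth_provider, oauth_provider_sub), and UserDirectory, register, and find_by_identity are hypothetical names that do not appear in the design.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    email: str
    display_name: str
    oauth_provider: str        # "google" | "github" | "microsoft" | "apple"
    oauth_provider_sub: str    # provider-native subject ID
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # platform-generated


class UserDirectory:
    """In-memory stand-in for the Users table plus its login GSI (hypothetical)."""

    def __init__(self) -> None:
        self._by_id: dict[str, User] = {}
        # Plays the role of the GSI keyed on (oauth_provider, oauth_provider_sub).
        self._by_identity: dict[tuple[str, str], str] = {}

    def register(self, user: User) -> User:
        key = (user.oauth_provider, user.oauth_provider_sub)
        if key in self._by_identity:
            raise ValueError("identity already registered")
        self._by_id[user.id] = user
        self._by_identity[key] = user.id
        return user

    def find_by_identity(self, provider: str, sub: str) -> Optional[User]:
        user_id = self._by_identity.get((provider, sub))
        return self._by_id.get(user_id) if user_id else None
```

Because the key is the provider-native subject rather than an ID issued by an auth broker, the same lookup keeps working if the broker is replaced.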
Wikis
Wiki {
  owner_id: string,        // User.id
  wiki_slug: string,       // URL-safe identifier (under user namespace)
  custom_slug?: string,    // paid wikis: top-level slug for {slug}.wikibot.io
  display_name: string,
  repo_path: string,       // EFS path: /mnt/efs/{user_id}/{wiki_slug}/repo.git
  index_path?: string,     // FAISS index location (on EFS alongside repo)
  mcp_token_hash: string,  // bcrypt hash of MCP bearer token
  is_public: boolean,      // read-only public access
  is_paid: boolean,        // whether this wiki requires payment (i.e., not the free wiki)
  payment_status: "active" | "lapsed" | "free",
    // free = the user's one free wiki
    // active = paid and current
    // lapsed = payment failed/canceled → read-only, MCP disabled
  created_at: ISO8601,
  last_accessed: ISO8601,
  page_count: number
}
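The payment_status values translate into access rules. The sketch below is one possible mapping: the "lapsed" behavior (read-only, MCP disabled) comes from the comments above, while full access for "free" and "active" is an assumption. Capabilities and capabilities_for are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    can_read: bool
    can_write: bool
    mcp_enabled: bool


def capabilities_for(payment_status: str) -> Capabilities:
    """Map a Wiki.payment_status value to what the wiki allows."""
    if payment_status in ("free", "active"):
        # Assumption: free and paid-current wikis are fully usable.
        return Capabilities(can_read=True, can_write=True, mcp_enabled=True)
    if payment_status == "lapsed":
        # Payment failed or was canceled: read-only, MCP disabled.
        return Capabilities(can_read=True, can_write=False, mcp_enabled=False)
    raise ValueError(f"unknown payment_status: {payment_status!r}")
```

Raising on unknown values, rather than defaulting to full access, keeps a typo in stored data from silently unlocking a wiki.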
ACLs
ACL {
  wiki_id: string,     // owner_id + wiki_slug
  grantee_id: string,  // User.id
  role: "owner" | "editor" | "viewer",
  granted_by: string,
  granted_at: ISO8601
}
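A permission check over these roles might look like the sketch below. The design lists the three roles but does not spell out what each may do, so the ordering (owner above editor above viewer) and the action names "read", "write", and "manage" are assumptions made for illustration.

```python
from typing import Optional

# Assumed ordering: each role includes the permissions of the roles below it.
ROLE_RANK = {"viewer": 1, "editor": 2, "owner": 3}

# Assumed minimum role per action; "manage" covers granting and revoking access.
REQUIRED_ROLE = {"read": "viewer", "write": "editor", "manage": "owner"}


def is_allowed(action: str, role: Optional[str], is_public: bool = False) -> bool:
    """Return True if a grantee with `role` (None = no ACL entry) may perform `action`."""
    if action == "read" and is_public:
        return True  # Wiki.is_public grants read-only access to everyone
    if role is None:
        return False
    return ROLE_RANK[role] >= ROLE_RANK[REQUIRED_ROLE[action]]
```

For example, is_allowed("write", "viewer") is False, while is_allowed("read", None, is_public=True) is True.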
Storage layout (EFS)
/mnt/efs/
  {user_id}/
    {wiki_slug}/
      repo.git/          # bare git repo (persistent filesystem)
      index.faiss        # FAISS vector index
      embeddings.json    # page_path → vector mapping
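The layout above can be derived from a user ID and wiki slug with a small helper. This is a sketch: wiki_paths is a hypothetical name, and the component check is an added safety assumption rather than something the design specifies.

```python
from pathlib import PurePosixPath

EFS_ROOT = PurePosixPath("/mnt/efs")


def wiki_paths(user_id: str, wiki_slug: str) -> dict[str, PurePosixPath]:
    """Return the repo, index, and sidecar paths for one wiki."""
    for part in (user_id, wiki_slug):
        # Reject anything that could escape the wiki's directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    base = EFS_ROOT / user_id / wiki_slug
    return {
        "repo": base / "repo.git",
        "index": base / "index.faiss",
        "embeddings": base / "embeddings.json",
    }
```

Centralizing path construction means the wiki Lambda and the embedding Lambda cannot disagree about where a wiki's files live.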
Git Storage Mechanics
EFS-backed git repos
Each wiki's bare git repo lives on a persistent filesystem mounted by the compute layer. There is no clone/push cycle, no caching, and no application-level locking: git operations happen directly on disk.
Read path:
1. Lambda mounts EFS (already attached in VPC)
2. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
3. Read page from repo
Write path:
1. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
2. Commit page change
3. Write reindex record to DynamoDB ReindexQueue table
(triggers embedding Lambda via DynamoDB Streams — see Semantic Search section)
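Concurrency: NFS handles file-level locking natively, and git's own lock files work correctly on NFS. Because the repo is bare, it has no working-tree index, so the relevant locks are git's per-ref and packed-refs .lock files rather than index.lock. Concurrent reads are unlimited. Concurrent writes to the same ref are serialized by git's lock file: a writer that finds the lock held fails and can retry. No application-level locking is needed.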
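Git's lock files rely on exclusive file creation: whoever creates name.lock first holds the lock, and any later attempt fails. The sketch below illustrates that primitive in isolation. It is a simplification: git writes the new content into the lock file and renames it into place, whereas this version just removes the file. exclusive_lock is a hypothetical helper, not part of git or the platform.

```python
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path: str):
    """Hold `path + ".lock"` for the duration of the block, git-style."""
    lock_path = path + ".lock"
    # O_CREAT | O_EXCL raises FileExistsError if the lock file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A second caller that enters exclusive_lock on the same path while the first still holds it gets FileExistsError, which is the point at which a writer would back off and retry.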
Consistency: Writes are immediately visible to all Lambda invocations mounting the same EFS filesystem. No eventual consistency concerns.
Fallback: S3 clone-on-demand
If Phase 0 testing shows EFS latency or VPC cold starts are unacceptable, fall back to S3-based repos with a DynamoDB write lock + clone-to-/tmp pattern. This adds significant complexity (locking, cache management, /tmp eviction) and is only worth pursuing if EFS fails testing.
Semantic Search
Semantic search is available to all users (not tier-gated). See Design/Async_Embedding_Pipeline for the full architecture.
Embedding pipeline (summary)
Page write (wiki Lambda, VPC)
  → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
  → DynamoDB Streams captures the change
  → Lambda service polls the stream (outside function's VPC context)
  → Embedding Lambda (VPC, EFS mount):
      1. Read page content from EFS repo
      2. Chunk page (same algorithm as otterwiki-semantic-search)
      3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
      4. Update FAISS index + sidecar metadata on EFS
No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.
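The first thing the embedding Lambda has to do is work out which pages a stream batch refers to. The sketch below parses the standard DynamoDB Streams event shape (Records, eventName, dynamodb.NewImage with typed attributes). The ReindexQueue attribute names wiki_id and page_path are assumptions, as is a stream view type that includes the new image; reindex_targets is a hypothetical helper.

```python
from typing import Any


def reindex_targets(event: dict[str, Any]) -> list[tuple[str, str]]:
    """Extract unique (wiki_id, page_path) pairs from a DynamoDB Streams batch."""
    targets: list[tuple[str, str]] = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events carry nothing to embed
        image = record.get("dynamodb", {}).get("NewImage", {})
        try:
            # Stream images use DynamoDB's typed attribute format: {"S": "..."}.
            pair = (image["wiki_id"]["S"], image["page_path"]["S"])
        except KeyError:
            continue  # malformed record: skip rather than fail the whole batch
        if pair not in targets:
            targets.append(pair)  # collapse repeated edits to the same page
    return targets
```

Collapsing duplicates within a batch means a burst of edits to one page triggers a single re-embedding instead of several.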
FAISS details
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.
Index type: IndexFlatIP (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.
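To make "brute-force" concrete, the sketch below shows what a flat inner-product search computes, written in plain Python rather than with FAISS itself: score every stored vector against the query and keep the top k. flat_ip_search is a hypothetical name, and the tie-breaking rule is an illustrative choice.

```python
def flat_ip_search(vectors, query, k):
    """Return the k (score, position) pairs with the highest inner product."""
    scored = [
        (sum(v_i * q_i for v_i, q_i in zip(vector, query)), position)
        for position, vector in enumerate(vectors)
    ]
    # Highest score first; ties broken by lower position for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]
```

If the embeddings are normalized to unit length, ranking by inner product is equivalent to ranking by cosine similarity. Whether the pipeline normalizes its vectors is not stated here.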
Index size: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
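The size estimate is simple arithmetic and can be checked directly. The sketch counts raw vector storage only; flat_index_bytes is a hypothetical helper.

```python
DIMENSIONS = 384        # all-MiniLM-L6-v2 output size
BYTES_PER_FLOAT = 4     # float32


def flat_index_bytes(pages: int, chunks_per_page: int) -> int:
    """Raw vector storage for a flat index, ignoring the small file header."""
    return pages * chunks_per_page * DIMENSIONS * BYTES_PER_FLOAT


size = flat_index_bytes(pages=200, chunks_per_page=3)
print(size, size / 1024)  # 921600 bytes, 900.0 KiB
```

By the same formula, a 1000-page wiki at 3 chunks per page needs about 4.4 MiB, still small enough to load into Lambda memory on each search.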
Sidecar metadata: FAISS stores only vectors and returns integer indices. The embeddings.json sidecar maps index positions back to {page_path, chunk_index, chunk_text_preview}. This file is loaded alongside the FAISS index.
Search flow:
- Embed query using MiniLM (loaded at Lambda init)
- Load FAISS index + sidecar from EFS (~5ms, already mounted)
- Search top K×3 vectors (<1 ms)
- Deduplicate by page_path, keep best chunk per page
- Return top K results with page paths and matching chunk snippets
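The deduplication step at the end of the search flow can be sketched as below. It assumes the sidecar is loaded as a list indexed by FAISS position, with the fields named in the sidecar description; top_pages and the result shape are hypothetical.

```python
def top_pages(hits, sidecar, k):
    """Collapse chunk-level hits into the k best pages.

    hits: (score, position) pairs from the top K*3 vector search.
    sidecar: list where sidecar[position] describes the chunk at that position.
    """
    best = {}
    for score, position in hits:
        meta = sidecar[position]
        page = meta["page_path"]
        # Keep only the highest-scoring chunk for each page.
        if page not in best or score > best[page]["score"]:
            best[page] = {
                "page_path": page,
                "score": score,
                "snippet": meta["chunk_text_preview"],
            }
    ranked = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return ranked[:k]
```

Fetching K×3 chunks before deduplicating presumably leaves enough distinct pages to fill K results when several top chunks come from the same page.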
Cost estimate
- Embedding a 200-page wiki: effectively $0 (Lambda compute only, ~seconds)
- Per search query: $0 (MiniLM runs locally)
- Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
- VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)
URL Structure
Each user gets a subdomain: {username}.wikibot.io
sderle.wikibot.io/ → user's wiki list (dashboard)
sderle.wikibot.io/third-gulf-war/ → wiki web UI (free wiki, under user namespace)
sderle.wikibot.io/third-gulf-war/api/v1/ → wiki REST API
sderle.wikibot.io/third-gulf-war/mcp → wiki MCP endpoint
Custom slugs (paid wikis)
Paid wikis get a top-level slug: {slug}.wikibot.io. This is a vanity URL that routes directly to the wiki without the username prefix. The slug is chosen at wiki creation time and must be globally unique (same validation rules as usernames: lowercase alphanumeric + hyphens, 3–30 characters, drawn from the same namespace/blocklist).
third-gulf-war.wikibot.io/ → wiki web UI (paid wiki, top-level slug)
third-gulf-war.wikibot.io/api/v1/ → wiki REST API
third-gulf-war.wikibot.io/mcp → wiki MCP endpoint
The user-namespace URL (sderle.wikibot.io/third-gulf-war/) continues to work as a redirect. This means existing MCP connections and bookmarks survive if a free wiki is later upgraded to paid.
Implementation: the Lambda resolver checks the subdomain against the Wikis table's custom_slug GSI first, then falls back to username resolution.
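The resolution order can be sketched as follows, with a dict and a set standing in for the custom_slug GSI and the username lookup. resolve_host, its return shape, and the handling of ports and nested subdomains are assumptions made for illustration.

```python
from typing import Optional

BASE_DOMAIN = "wikibot.io"


def resolve_host(
    host: str, custom_slugs: dict[str, str], usernames: set[str]
) -> Optional[tuple[str, str]]:
    """Classify a request host as ("wiki", wiki_id), ("user", username), or None."""
    host = host.lower().split(":")[0]  # drop any port
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    if not label or "." in label:
        return None  # nested subdomains are not routed
    if label in custom_slugs:   # custom_slug lookup comes first
        return ("wiki", custom_slugs[label])
    if label in usernames:      # then fall back to username resolution
        return ("user", label)
    return None
```

Because custom slugs and usernames are drawn from the same namespace, a label should match at most one of the two lookups. The ordering only decides which table is queried first.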
Usernames
Each user chooses a username at signup (after OAuth). Usernames are URL-critical ({username}.wikibot.io) so they must be:
- URL-safe: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens
- Unique: enforced in DynamoDB
- Immutable (MVP): changing a username changes its URLs, which breaks MCP connections, Git remotes, and bookmarks. Defer username changes (with redirect support) to a future iteration.
- Reserved: block names that conflict with platform routes or look official:
admin, www, api, auth, mcp, app, help, support, billing, status, blog, docs, robot, wiki, static, assets, null, undefined, etc. Maintain a blocklist.
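The format and reservation rules above can be checked with a short validator. This is a sketch: validate_username is a hypothetical name, RESERVED contains only the names listed above (the real blocklist is expected to be longer), and uniqueness still has to be enforced in DynamoDB.

```python
import re

# Lowercase alphanumeric + hyphens, 3-30 characters, no leading/trailing hyphen.
USERNAME_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{1,28})[a-z0-9]$")

RESERVED = frozenset({
    "admin", "www", "api", "auth", "mcp", "app", "help", "support", "billing",
    "status", "blog", "docs", "robot", "wiki", "static", "assets", "null",
    "undefined",
})


def validate_username(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not USERNAME_RE.fullmatch(name):
        problems.append("must be 3-30 lowercase letters, digits, or hyphens, "
                        "with no leading or trailing hyphen")
    if name in RESERVED:
        problems.append("reserved name")
    return problems
```

The pattern allows consecutive hyphens inside a name, since the rules above do not forbid them. Since custom slugs follow the same validation rules, the same function could cover both.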
Username squatting
Free accounts cost nothing to create, so squatting is possible. Mitigations:
- Require at least one wiki with at least one page edit within 90 days of signup, or the username is released
- Trademark disputes handled case-by-case (standard UDRP-like process, documented in ToS)
- Not a launch concern — address when it becomes a real problem
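If the 90-day rule is ever enforced, the check itself is small. The sketch below uses one reading of the rule: an edit counts only if it happens within 90 days of signup, and a name becomes releasable once that window has passed without one. username_releasable and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=90)


def username_releasable(
    signup: datetime, first_edit: Optional[datetime], now: datetime
) -> bool:
    """Return True if the username may be released under the 90-day rule."""
    deadline = signup + GRACE_PERIOD
    if first_edit is not None and first_edit <= deadline:
        return False  # requirement met inside the window
    return now > deadline


signup = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(username_releasable(signup, None, signup + timedelta(days=91)))  # True
```

How to treat an account whose first edit arrives after the window but before the name is actually released is a policy question this sketch does not settle.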