This page is part of the wikibot.io PRD (Product Requirements Document). See also: Design/Platform_Overview, Design/Auth, Design/Implementation_Phases, Design/Operations.
Data Model
DynamoDB tables. Partition keys noted in comments.
Users
User {
id: string, // platform-generated (UUID)
email: string,
display_name: string,
oauth_provider: string, // "google" | "github" | "microsoft" | "apple"
oauth_provider_sub: string, // provider-native subject ID (e.g., Google sub claim)
// GSI on (oauth_provider, oauth_provider_sub) for login lookup
// Critical: enables migration off WorkOS or any auth provider
tier: "free" | "premium",
created_at: ISO8601,
wiki_count: number,
wiki_limit: number, // 1 for free, 10 for premium
stripe_customer_id?: string
}
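As a minimal sketch of how tier and wiki_count might be checked before creating a wiki: the TIER_LIMITS table and the can_create_wiki name are assumptions for illustration; only the limits of 1 and 10 come from the model above.

```python
# Limits taken from the wiki_limit comment: 1 for free, 10 for premium.
TIER_LIMITS = {"free": 1, "premium": 10}


def can_create_wiki(tier: str, wiki_count: int) -> bool:
    """Return True if a user on `tier` owning `wiki_count` wikis may create another."""
    return wiki_count < TIER_LIMITS[tier]
```

Storing wiki_limit on the User record, as the model does, would let support override the limit per user without changing this table.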
Wikis
Wiki {
owner_id: string, // User.id
wiki_slug: string, // URL-safe identifier
display_name: string,
repo_path: string, // EFS path: /mnt/efs/{user_id}/{wiki_slug}/repo.git
index_path?: string, // FAISS index location
mcp_token_hash: string, // bcrypt hash of MCP bearer token
is_public: boolean, // read-only public access
created_at: ISO8601,
last_accessed: ISO8601,
page_count: number,
semantic_search_enabled: boolean,
custom_domain?: string, // premium: CNAME target
custom_css?: string, // premium: custom styling
external_git_remote?: string // premium: sync target
}
ACLs
ACL {
wiki_id: string, // owner_id + wiki_slug
grantee_id: string, // User.id
role: "owner" | "editor" | "viewer",
granted_by: string,
granted_at: ISO8601
}
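The three roles suggest a simple ordering. The sketch below assumes that each role includes the permissions of the one below it and that is_public grants anonymous read access; the action names ("read", "write", "manage") are hypothetical and not defined by this page.

```python
from typing import Optional

# Assumed ordering: a higher rank includes everything a lower rank can do.
ROLE_RANK = {"viewer": 1, "editor": 2, "owner": 3}

# Hypothetical action names and the minimum role each one requires.
REQUIRED_ROLE = {"read": "viewer", "write": "editor", "manage": "owner"}


def is_allowed(role: Optional[str], action: str, is_public: bool = False) -> bool:
    """Decide whether a caller holding `role` (None if no ACL row) may perform `action`."""
    if action == "read" and is_public:
        return True  # public wikis are readable without any grant
    if role is None:
        return False
    return ROLE_RANK[role] >= ROLE_RANK[REQUIRED_ROLE[action]]
```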
Storage layout (EFS)
/mnt/efs/
{user_id}/
{wiki_slug}/
repo.git/ # bare git repo — persistent filesystem
index.faiss # FAISS vector index
embeddings.json # page_path → vector mapping
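A small helper can derive all three locations from the layout above. This is a sketch: the wiki_paths name and the path-component check are assumptions, added because user_id and wiki_slug end up in filesystem paths.

```python
from pathlib import PurePosixPath

EFS_ROOT = PurePosixPath("/mnt/efs")


def wiki_paths(user_id: str, wiki_slug: str) -> dict:
    """Return the repo, FAISS index, and sidecar paths for one wiki."""
    for part in (user_id, wiki_slug):
        # Reject anything that could escape the per-wiki directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    base = EFS_ROOT / user_id / wiki_slug
    return {
        "repo": base / "repo.git",
        "index": base / "index.faiss",
        "sidecar": base / "embeddings.json",
    }
```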
Git Storage Mechanics
EFS-backed git repos
Each wiki's bare git repo lives on a persistent filesystem mounted by the compute layer. No clone/push cycle, no caching, no application-level locks: git operations happen directly on disk.
Read path:
1. Lambda mounts EFS (already attached in VPC)
2. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
3. Read page from repo
Write path:
1. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
2. Commit page change
3. Write reindex record to DynamoDB ReindexQueue table
(triggers embedding Lambda via DynamoDB Streams — see Semantic Search section)
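Concurrency: NFS handles file-level locking natively. Git's own lock files (for example, a ref's .lock file; a bare repo has no working-tree index.lock) are created with an exclusive create, which works correctly on NFS. Concurrent reads are unlimited. Concurrent writes to the same repo are serialized by git's lock files: a writer that cannot obtain a lock fails cleanly and can retry. No application-level locking needed.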
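To illustrate the lock-file pattern git relies on, here is a simplified sketch using an exclusive create (O_CREAT | O_EXCL). It is not git's implementation (git also renames the lock file over its target on commit); the lock_file helper, its retry count, and its delay are assumptions.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path: str, retries: int = 5, delay: float = 0.05):
    """Hold `<path>.lock` for the duration of the with-block."""
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(delay)  # another writer holds the lock; wait and retry
    else:
        raise TimeoutError(f"could not acquire {lock_path}")
    try:
        yield fd
    finally:
        os.close(fd)
        os.unlink(lock_path)  # release the lock
```

A second writer arriving while the lock is held gets a TimeoutError after its retries run out, which is the point where an application would retry the whole commit.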
Consistency: Writes are immediately visible to all Lambda invocations mounting the same EFS filesystem. No eventual consistency concerns.
Fallback: S3 clone-on-demand
If Phase 0 testing shows EFS latency or VPC cold starts are unacceptable, fall back to S3-based repos with a DynamoDB write lock + clone-to-/tmp pattern. This adds significant complexity (locking, cache management, /tmp eviction) and is only worth pursuing if EFS fails testing.
Semantic Search
Semantic search is available to all users (not tier-gated). See Design/Async_Embedding_Pipeline for the full architecture.
Embedding pipeline (summary)
Page write (wiki Lambda, VPC)
→ DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
→ DynamoDB Streams captures the change
→ Lambda service polls the stream (outside function's VPC context)
→ Embedding Lambda (VPC, EFS mount):
1. Read page content from EFS repo
2. Chunk page (same algorithm as otterwiki-semantic-search)
3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
4. Update FAISS index + sidecar metadata on EFS
No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.
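The embedding Lambda receives batches of DynamoDB Streams records. The sketch below pulls the pages to reindex out of such an event. The Records / eventName / dynamodb.NewImage structure is the standard stream event shape (NewImage is present only when the stream view type includes new images); the wiki_id and page_path attribute names are assumptions about the ReindexQueue schema, which is specified in Design/Async_Embedding_Pipeline.

```python
from typing import List, Tuple


def pages_to_reindex(event: dict) -> List[Tuple[str, str]]:
    """Extract (wiki_id, page_path) pairs from a DynamoDB Streams event."""
    pages = []
    for record in event.get("Records", []):
        # Deletions from the queue table need no embedding work.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record.get("dynamodb", {}).get("NewImage", {})
        try:
            # Stream images use typed attributes, e.g. {"S": "value"} for strings.
            pages.append((image["wiki_id"]["S"], image["page_path"]["S"]))
        except KeyError:
            continue  # skip records that lack the assumed attributes
    return pages
```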
FAISS details
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.
Index type: IndexFlatIP (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.
Index size: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
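The size estimate can be reproduced with a few lines of arithmetic. The function name is illustrative, and the default of 3 chunks per page is the rough figure used above.

```python
DIMENSIONS = 384        # all-MiniLM-L6-v2 embedding width
BYTES_PER_FLOAT32 = 4


def index_size_bytes(pages: int, chunks_per_page: int = 3) -> int:
    """Raw vector storage for a flat index: one float32 vector per chunk."""
    return pages * chunks_per_page * DIMENSIONS * BYTES_PER_FLOAT32


size = index_size_bytes(200)
print(f"{size} bytes = {size / 1024:.0f} KiB")  # 921600 bytes = 900 KiB
```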
Sidecar metadata: FAISS stores only vectors and returns integer indices. The embeddings.json sidecar maps index positions back to {page_path, chunk_index, chunk_text_preview}. This file is loaded alongside the FAISS index.
Search flow:
- Embed query using MiniLM (loaded at Lambda init)
- Load FAISS index + sidecar from EFS (~5ms, already mounted)
- Search top K×3 vectors (<1ms)
- Deduplicate by page_path, keep best chunk per page
- Return top K results with page paths and matching chunk snippets
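The over-fetch and deduplication steps above can be sketched without FAISS. This pure-Python stand-in computes the same brute-force inner product that a flat index would; the search function name is hypothetical, and the sidecar fields follow the embeddings.json description. Inner product matches cosine similarity only if the vectors are normalized, which this sketch assumes.

```python
from typing import Dict, List, Sequence


def search(query: Sequence[float], vectors: List[Sequence[float]],
           sidecar: List[Dict], k: int = 5) -> List[Dict]:
    """Return up to k pages, each represented by its best-scoring chunk."""
    # Brute-force inner product against every stored chunk vector.
    scored = [(sum(q * v for q, v in zip(query, vec)), i)
              for i, vec in enumerate(vectors)]
    scored.sort(reverse=True)
    candidates = scored[: k * 3]  # over-fetch so dedup still leaves k pages

    best_per_page: Dict[str, Dict] = {}
    for score, idx in candidates:
        meta = sidecar[idx]
        # Candidates are in descending score order, so the first chunk
        # seen for a page is that page's best chunk.
        if meta["page_path"] not in best_per_page:
            best_per_page[meta["page_path"]] = {**meta, "score": score}
    return list(best_per_page.values())[:k]
```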
Cost estimate
- Embedding a 200-page wiki: effectively $0 (Lambda compute only, ~seconds)
- Per search query: $0 (MiniLM runs locally)
- Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
- VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)
URL Structure
Each user gets a subdomain: {username}.wikibot.io
sderle.wikibot.io/ → user's wiki list (dashboard)
sderle.wikibot.io/third-gulf-war/ → wiki web UI
sderle.wikibot.io/third-gulf-war/api/v1/ → wiki REST API
sderle.wikibot.io/third-gulf-war/mcp → wiki MCP endpoint
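The URL scheme above maps onto a small routing function. This is a sketch only: the Route type, the parse_route name, and the "dashboard" / "web" / "api" / "mcp" labels are illustrative, not part of the design.

```python
from typing import NamedTuple, Optional

PLATFORM_SUFFIX = ".wikibot.io"


class Route(NamedTuple):
    username: str
    wiki_slug: Optional[str]
    kind: str  # "dashboard", "web", "api", or "mcp"


def parse_route(host: str, path: str) -> Optional[Route]:
    """Map a request's host and path onto the URL structure."""
    host = host.lower()
    if not host.endswith(PLATFORM_SUFFIX):
        return None
    username = host[: -len(PLATFORM_SUFFIX)]
    if not username or "." in username:
        return None  # bare domain or nested subdomain
    parts = [p for p in path.split("/") if p]
    if not parts:
        return Route(username, None, "dashboard")
    slug, rest = parts[0], parts[1:]
    if rest[:2] == ["api", "v1"]:
        return Route(username, slug, "api")
    if rest[:1] == ["mcp"]:
        return Route(username, slug, "mcp")
    return Route(username, slug, "web")
```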
Custom domains (premium)
Premium users can CNAME their own domain to their {username}.wikibot.io subdomain. Implementation: API Gateway custom domain + ACM certificate (free via AWS). The Lambda resolver checks DynamoDB for custom domain → user mapping.
research.mysite.com → CNAME → sderle.wikibot.io
This requires wildcard routing at the API Gateway level (*.wikibot.io) and TLS certificate provisioning per custom domain. ACM's default quota is 2,500 certificates per account, which is fine for early scale. At larger scale, CloudFront with SNI handles this better.
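The resolver step might look like the sketch below. A plain dict stands in for the DynamoDB custom domain → user lookup, and the resolve_username name is an assumption.

```python
from typing import Mapping, Optional

PLATFORM_DOMAIN_SUFFIX = ".wikibot.io"


def resolve_username(host: str, custom_domains: Mapping[str, str]) -> Optional[str]:
    """Return the username that owns `host`, or None if it is unknown."""
    host = host.lower().rstrip(".")
    if host.endswith(PLATFORM_DOMAIN_SUFFIX):
        label = host[: -len(PLATFORM_DOMAIN_SUFFIX)]
        # Only a single-label subdomain identifies a user.
        return label if label and "." not in label else None
    # Otherwise treat the host as a custom domain (stand-in for the DynamoDB lookup).
    return custom_domains.get(host)
```

For example, resolve_username("research.mysite.com", {"research.mysite.com": "sderle"}) yields "sderle", the same user reached via sderle.wikibot.io.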
Usernames
Each user chooses a username at signup (after OAuth). Usernames are URL-critical ({username}.wikibot.io) so they must be:
- URL-safe: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens
- Unique: enforced in DynamoDB
- Immutable (MVP): changing a username changes its URLs, which breaks MCP connections, Git remotes, and bookmarks. Defer username changes (with redirect support) to a future iteration.
- Reserved: block names that conflict with platform routes or look official:
admin, www, api, auth, mcp, app, help, support, billing, status, blog, docs, robot, wiki, static, assets, null, undefined, etc. Maintain a blocklist.
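The format and reserved-name rules above can be expressed as one regular expression plus a set lookup. This is a sketch; the RESERVED set contains only the names listed above (the real blocklist is expected to be longer), and uniqueness still has to be enforced in DynamoDB.

```python
import re

# 3-30 characters, lowercase alphanumeric or hyphen, no leading/trailing hyphen.
USERNAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")

# Only the names listed on this page; the blocklist is open-ended.
RESERVED = frozenset({
    "admin", "www", "api", "auth", "mcp", "app", "help", "support",
    "billing", "status", "blog", "docs", "robot", "wiki", "static",
    "assets", "null", "undefined",
})


def is_valid_username(name: str) -> bool:
    """Check format and reserved-name rules (uniqueness is checked elsewhere)."""
    return USERNAME_RE.fullmatch(name) is not None and name not in RESERVED
```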
Username squatting
Free accounts cost nothing to create, so squatting is possible. Mitigations:
- Require at least one wiki with at least one page edit within 90 days of signup, or the username is released
- Trademark disputes handled case-by-case (standard UDRP-like process, documented in ToS)
- Not a launch concern — address when it becomes a real problem
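If the 90-day rule is ever enforced, the check could be as simple as the sketch below. It reads the rule literally (an edit counts only if it happened within 90 days of signup); the function and parameter names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=90)


def username_is_releasable(signup_at: datetime,
                           first_page_edit_at: Optional[datetime],
                           now: datetime) -> bool:
    """True if the grace period has passed without a qualifying page edit."""
    deadline = signup_at + GRACE_PERIOD
    if first_page_edit_at is not None and first_page_edit_at <= deadline:
        return False  # the account met the activity requirement in time
    return now > deadline
```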