This page is part of the wikibot.io PRD (Product Requirements Document). See also: Design/Platform_Overview, Design/Auth, Design/Implementation_Phases, Design/Operations.


Data Model

DynamoDB tables. Partition keys noted in comments.

Users

User {
  id: string,                  // platform-generated (UUID)
  email: string,
  display_name: string,
  oauth_provider: string,      // "google" | "github" | "microsoft" | "apple"
  oauth_provider_sub: string,  // provider-native subject ID (e.g., Google sub claim)
                               // GSI on (oauth_provider, oauth_provider_sub) for login lookup
                               // Critical: enables migration off WorkOS or any auth provider
  tier: "free" | "premium",
  created_at: ISO8601,
  wiki_count: number,
  wiki_limit: number,          // 1 for free, 10 for premium
  stripe_customer_id?: string
}
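The tier limits in User.wiki_limit imply a check at wiki-creation time. The sketch below shows that check. The helper names are hypothetical; only the 1/10 limits come from the schema above.

```python
# Hypothetical helpers; the limits mirror the User.wiki_limit comment.
WIKI_LIMITS = {"free": 1, "premium": 10}


def wiki_limit_for(tier: str) -> int:
    """Return the wiki_limit to store on a User with the given tier."""
    try:
        return WIKI_LIMITS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def can_create_wiki(wiki_count: int, wiki_limit: int) -> bool:
    """True while the user is below their limit (and False if a downgrade
    leaves wiki_count above the new limit)."""
    return wiki_count < wiki_limit


print(can_create_wiki(0, wiki_limit_for("free")))     # True
print(can_create_wiki(1, wiki_limit_for("free")))     # False
print(can_create_wiki(3, wiki_limit_for("premium")))  # True
```

Two concurrent requests could both read a wiki_count below the limit. For that reason, the production check would more safely run as a single DynamoDB conditional update (for example, incrementing wiki_count with a ConditionExpression of `wiki_count < wiki_limit`) rather than as a read followed by a write.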

Wikis

Wiki {
  owner_id: string,            // User.id
  wiki_slug: string,           // URL-safe identifier
  display_name: string,
  repo_path: string,           // EFS path: /mnt/efs/{user_id}/{wiki_slug}/repo.git
  index_path?: string,         // FAISS index location
  mcp_token_hash: string,      // bcrypt hash of MCP bearer token
  is_public: boolean,          // read-only public access
  created_at: ISO8601,
  last_accessed: ISO8601,
  page_count: number,
  semantic_search_enabled: boolean,
  custom_domain?: string,      // premium: user's own domain, CNAMEd to {username}.wikibot.io
  custom_css?: string,         // premium: custom styling
  external_git_remote?: string // premium: sync target
}

ACLs

ACL {
  wiki_id: string,             // owner_id + wiki_slug
  grantee_id: string,          // User.id
  role: "owner" | "editor" | "viewer",
  granted_by: string,
  granted_at: ISO8601
}
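An access check against this table could look roughly like the sketch below. It makes several assumptions that the PRD does not specify, and each is labeled in the code. The role hierarchy (owner includes editor includes viewer), the action names, and the "#" separator inside wiki_id are all illustrative. Public read access follows Wiki.is_public.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    # Assumed hierarchy: each role includes the permissions of the ones below it.
    VIEWER = 1
    EDITOR = 2
    OWNER = 3


# Hypothetical action names and the minimum role each one requires.
REQUIRED_ROLE = {"read": Role.VIEWER, "write": Role.EDITOR, "manage_acl": Role.OWNER}


def make_wiki_id(owner_id: str, wiki_slug: str) -> str:
    # The "#" separator is an assumption; the PRD only says owner_id + wiki_slug.
    return f"{owner_id}#{wiki_slug}"


def is_allowed(role: Optional[str], action: str, is_public: bool = False) -> bool:
    """role is the ACL.role string for this user and wiki, or None if no entry exists."""
    if action == "read" and is_public:
        return True  # Wiki.is_public grants read-only access to everyone
    if role is None:
        return False
    return Role[role.upper()] >= REQUIRED_ROLE[action]
```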

Storage layout (EFS)

/mnt/efs/
  {user_id}/
    {wiki_slug}/
      repo.git/              # bare git repo — persistent filesystem
      index.faiss            # FAISS vector index
      embeddings.json        # sidecar: index position → {page_path, chunk_index, chunk_text_preview}
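Every wiki's files are derived from the pair (user_id, wiki_slug), so path construction is a natural place to reject values that could escape the per-user directory. The sketch below is a minimal version of that idea. It assumes user_id is a UUID, as the User schema states. The slug rule is a hypothetical stand-in, because the PRD only says "URL-safe identifier".

```python
import posixpath
import re
import uuid

EFS_ROOT = "/mnt/efs"
# Assumed slug rule for illustration; the PRD only says "URL-safe identifier".
SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def wiki_paths(user_id: str, wiki_slug: str) -> dict[str, str]:
    """Return the EFS paths for one wiki, following the storage layout."""
    # Parsing as a UUID rejects path tricks such as "../other-user".
    user_id = str(uuid.UUID(user_id))
    if not SLUG_RE.fullmatch(wiki_slug):
        raise ValueError(f"invalid wiki slug: {wiki_slug!r}")
    base = posixpath.join(EFS_ROOT, user_id, wiki_slug)
    return {
        "repo": posixpath.join(base, "repo.git"),
        "index": posixpath.join(base, "index.faiss"),
        "sidecar": posixpath.join(base, "embeddings.json"),
    }
```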

Git Storage Mechanics

EFS-backed git repos

Each wiki's bare git repo lives on a persistent filesystem (EFS) mounted by the compute layer. There is no clone/push cycle, no cache to manage, and no application-level lock: git operations happen directly on disk.

Read path:

1. Use the function's EFS mount (attached at function configuration; the function runs in the VPC)
2. Open bare repo at /mnt/efs/{user_id}/{wiki_slug}/repo.git
3. Read page from repo

Write path:

1. Open bare repo at /mnt/efs/{user_id}/{wiki_slug}/repo.git
2. Commit page change
3. Write reindex record to DynamoDB ReindexQueue table
   (triggers embedding Lambda via DynamoDB Streams — see Semantic Search section)

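Concurrency: EFS speaks NFSv4, which supports the atomic exclusive file creation (O_CREAT | O_EXCL) that git's lock files rely on, so git's own locking works correctly on EFS. Concurrent reads are unlimited. Concurrent writes to the same repo are serialized by git's lock files. A bare repo has no index, so the relevant lock is the per-ref lock file (e.g. refs/heads/main.lock) taken while the branch is updated. A writer that finds the lock held retries only briefly (core.filesRefLockTimeout, 100 ms by default) and then fails. The write path should therefore retry on failure, and it should update the ref only if the ref still points at the parent it committed on (e.g. `git update-ref <ref> <new> <old>`). No application-level locking is needed.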
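Git's lock protocol creates `<name>.lock` exclusively, writes the new value into it, and then renames it over the original file. The sketch below simulates that protocol with the standard library on a plain file. It combines the protocol with a compare-and-swap on the expected old value, so that a concurrent commit is detected rather than silently overwritten. It is an illustration of the mechanism, not git itself. Real writes would go through git or a git library, and the function names are hypothetical.

```python
import os
import time


def acquire_lock(path: str, timeout_s: float = 0.1, poll_s: float = 0.01) -> str:
    """Take a git-style lock by atomically creating `<path>.lock`.

    O_CREAT | O_EXCL fails if the file already exists, so at most one writer
    holds the lock; others retry until timeout_s elapses (0.1 s mirrors git's
    default core.filesRefLockTimeout of 100 ms).
    """
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            return lock_path
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"lock held: {lock_path}")
            time.sleep(poll_s)


def commit_ref(path: str, expected_old: str, new: str, timeout_s: float = 0.1) -> None:
    """Compare-and-swap an existing ref file: write `new` only if it still holds `expected_old`."""
    lock_path = acquire_lock(path, timeout_s)
    try:
        with open(path) as f:
            current = f.read().strip()
        if current != expected_old:
            raise RuntimeError("ref moved; rebuild the commit on the new parent and retry")
        with open(lock_path, "w") as f:
            f.write(new + "\n")
        os.replace(lock_path, path)  # atomic rename publishes the new value and drops the lock
    except BaseException:
        if os.path.exists(lock_path):
            os.remove(lock_path)  # release the lock on any failure
        raise
```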

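Consistency: Writes are visible to every Lambda invocation mounting the same EFS filesystem as soon as the writing process closes the file. EFS provides NFS close-to-open consistency, and git closes each object and ref file it writes, so there are no eventual-consistency concerns.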

Fallback: S3 clone-on-demand

If Phase 0 testing shows EFS latency or VPC cold starts are unacceptable, fall back to S3-based repos with a DynamoDB write lock + clone-to-/tmp pattern. This adds significant complexity (locking, cache management, /tmp eviction) and is only worth pursuing if EFS fails testing.


Semantic Search

Semantic search is available to all users (not tier-gated). See Design/Async_Embedding_Pipeline for the full architecture.

Embedding pipeline (summary)

Page write (wiki Lambda, VPC)
  → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
  → DynamoDB Streams captures the change
  → Lambda service polls the stream (outside function's VPC context)
  → Embedding Lambda (VPC, EFS mount):
      1. Read page content from EFS repo
      2. Chunk page (same algorithm as otterwiki-semantic-search)
      3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
      4. Update FAISS index + sidecar metadata on EFS
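On the consuming side, the embedding Lambda receives batches of DynamoDB Streams records. The sketch below collapses a batch into the distinct pages to re-embed, so a page edited several times in quick succession is embedded once. It uses the standard stream record shape (eventName, and dynamodb.NewImage with typed attributes such as {"S": "..."}) and assumes the stream view includes NEW_IMAGE. The attribute names wiki_id and page_path on ReindexQueue items are hypothetical.

```python
def pages_to_reindex(event: dict) -> list[tuple[str, str]]:
    """Collect unique (wiki_id, page_path) pairs from a DynamoDB Streams batch.

    Assumes ReindexQueue items carry string attributes named wiki_id and
    page_path (hypothetical names) and that the stream includes NEW_IMAGE.
    """
    seen: dict[tuple[str, str], None] = {}
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # e.g. REMOVE when a processed queue item is deleted
        image = record["dynamodb"].get("NewImage", {})
        key = (image["wiki_id"]["S"], image["page_path"]["S"])
        seen[key] = None  # a dict dedupes while preserving first-seen order
    return list(seen)
```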

No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.

FAISS details

FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.

Index type: IndexFlatIP (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.

Index size: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
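The size arithmetic above can be checked directly. The sketch below counts only the raw float32 vector payload, ignoring the FAISS file header and the JSON sidecar. It also extends the estimate to the ~1000-page ceiling mentioned for flat search, keeping the same assumption of ~3 chunks per page.

```python
DIM = 384            # all-MiniLM-L6-v2 embedding dimension
BYTES_PER_FLOAT = 4  # float32


def index_bytes(pages: int, chunks_per_page: int = 3) -> int:
    """Raw vector payload of a flat index, in bytes."""
    return pages * chunks_per_page * DIM * BYTES_PER_FLOAT


print(DIM * BYTES_PER_FLOAT)  # 1536 bytes per vector (1.5 KiB)
print(index_bytes(200))       # 921600 bytes (~900 KiB)
print(index_bytes(1000))      # 4608000 bytes (~4.4 MiB)
```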

Sidecar metadata: FAISS stores only vectors and returns integer indices. The embeddings.json sidecar maps index positions back to {page_path, chunk_index, chunk_text_preview}. This file is loaded alongside the FAISS index.
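IndexFlatIP scores every stored vector against the query and returns the best integer positions, which the sidecar then translates back into page metadata. The pure-Python sketch below mirrors that computation on toy 2-D vectors. It assumes embeddings are L2-normalized before insertion (e.g. `normalize_embeddings=True` in sentence-transformers' `encode`), so that inner product equals cosine similarity. With the real library, `faiss.IndexFlatIP(384)`, `index.add(...)` and `index.search(...)` do the same work over float32 arrays.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def flat_ip_search(vectors, sidecar, query, k):
    """Exhaustive inner-product search, the computation IndexFlatIP performs.

    vectors[i] and sidecar[i] describe the same chunk; the search yields only
    the position i, and the sidecar turns it back into page metadata.
    """
    scored = [(sum(a * b for a, b in zip(vec, query)), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(score, sidecar[i]) for score, i in scored[:k]]


vectors = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]
sidecar = [
    {"page_path": "Alpha", "chunk_index": 0, "chunk_text_preview": "..."},
    {"page_path": "Beta", "chunk_index": 0, "chunk_text_preview": "..."},
    {"page_path": "Gamma", "chunk_index": 2, "chunk_text_preview": "..."},
]
# Prints "Alpha 0.995", then "Gamma 0.774".
for score, meta in flat_ip_search(vectors, sidecar, normalize([1.0, 0.1]), k=2):
    print(meta["page_path"], round(score, 3))
```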

Search flow:

  1. Embed query using MiniLM (loaded at Lambda init)
  2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
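  3. Retrieve the top K×3 nearest vectors (<1ms); oversampling leaves enough distinct pages after step 4 removes duplicate chunks from the same page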
  4. Deduplicate by page_path, keep best chunk per page
  5. Return top K results with page paths and matching chunk snippets
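Steps 3 to 5 reduce to a small post-processing pass. A single page with several matching chunks can fill the raw hit list, which is why the search oversamples by 3×. If fewer than K distinct pages survive deduplication, the result is simply shorter. The sketch below assumes hits arrive as (score, page_path, chunk_text_preview) tuples; that shape is illustrative.

```python
def top_k_pages(hits, k):
    """Keep the best-scoring chunk per page, then return the top k pages.

    hits: (score, page_path, chunk_text_preview) tuples, e.g. the K*3
    nearest chunks from the flat index.
    """
    best = {}
    for score, page_path, preview in hits:
        if page_path not in best or score > best[page_path][0]:
            best[page_path] = (score, page_path, preview)
    return sorted(best.values(), key=lambda h: h[0], reverse=True)[:k]


hits = [(0.9, "A", "a1"), (0.85, "A", "a2"), (0.8, "B", "b1"),
        (0.7, "A", "a3"), (0.6, "C", "c1"), (0.5, "B", "b2")]
print(top_k_pages(hits, 2))  # [(0.9, 'A', 'a1'), (0.8, 'B', 'b1')]
```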

Cost estimate

  • Embedding a 200-page wiki: effectively $0 (Lambda compute only, ~seconds)
  • Per search query: $0 (MiniLM runs locally)
  • Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
  • VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)

URL Structure

Each user gets a subdomain: {username}.wikibot.io

sderle.wikibot.io/                          → user's wiki list (dashboard)
sderle.wikibot.io/third-gulf-war/           → wiki web UI
sderle.wikibot.io/third-gulf-war/api/v1/    → wiki REST API
sderle.wikibot.io/third-gulf-war/mcp        → wiki MCP endpoint
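Within a user's subdomain, the path alone determines which of the four surfaces handles a request. The sketch below illustrates that dispatch. The Route type, the kind labels, and the slug pattern are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Route(NamedTuple):
    kind: str                 # "dashboard" | "web" | "api" | "mcp"
    wiki_slug: Optional[str]
    rest: str                 # remainder of the path after the route prefix


_WIKI = re.compile(r"/(?P<slug>[a-z0-9-]+)(?P<tail>/.*)?")


def route(path: str) -> Route:
    if path in ("", "/"):
        return Route("dashboard", None, "")
    m = _WIKI.fullmatch(path)
    if not m:
        raise ValueError(f"no route for {path!r}")
    slug, tail = m.group("slug"), m.group("tail") or ""
    if tail == "/mcp":
        return Route("mcp", slug, "")
    if tail == "/api/v1" or tail.startswith("/api/v1/"):
        return Route("api", slug, tail[len("/api/v1"):].lstrip("/"))
    return Route("web", slug, tail.lstrip("/"))
```

In this sketch, `/mcp` and `/api/v1` take precedence over web UI paths, so a wiki page living at either path would be shadowed.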

Custom domains (premium)

Premium users can CNAME their own domain to their {username}.wikibot.io subdomain. Implementation: API Gateway custom domain + ACM certificate (free via AWS). The Lambda resolver checks DynamoDB for custom domain → user mapping.

research.mysite.com  →  CNAME  →  sderle.wikibot.io

This requires wildcard routing at the API Gateway level (*.wikibot.io) and TLS cert provisioning per custom domain. ACM supports up to 2500 certs per account, which is fine for early scale. At larger scale, CloudFront with SNI handles this better.
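The resolver described above has to handle two kinds of Host header: platform subdomains, where the username is the first label, and custom domains, which need a lookup. The sketch below shows that logic. A plain mapping stands in for the DynamoDB custom domain → user table, and the function name is hypothetical.

```python
from typing import Mapping, Optional

PLATFORM_DOMAIN = "wikibot.io"


def resolve_username(host: str, custom_domains: Mapping[str, str]) -> Optional[str]:
    """Map a request Host header to a username, or None if nothing matches.

    custom_domains stands in for the DynamoDB custom domain -> user lookup.
    """
    host = host.split(":", 1)[0].rstrip(".").lower()  # drop port and trailing dot
    suffix = "." + PLATFORM_DOMAIN
    if host.endswith(suffix):
        label = host[: -len(suffix)]
        # Only a single label (sderle.wikibot.io) names a user; deeper names do not.
        return label if label and "." not in label else None
    return custom_domains.get(host)
```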


Usernames

Each user chooses a username at signup (after OAuth). Usernames are URL-critical ({username}.wikibot.io) so they must be:

  • URL-safe: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens
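  • Unique: enforced in DynamoDB with a conditional write (e.g. attribute_not_exists on a username-keyed item), since a GSI does not enforce uniqueness
  • Immutable (MVP): changing usernames means changing URLs, which breaks MCP connections, Git remotes, and bookmarks. Defer username changes (with redirect support) to a future iteration.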
  • Reserved: block names that conflict with platform routes or look official: admin, www, api, auth, mcp, app, help, support, billing, status, blog, docs, robot, wiki, static, assets, null, undefined, etc. Maintain a blocklist.
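The format and blocklist rules above translate into a short validator. The sketch below covers format and reservation only, not uniqueness. The blocklist holds just the names listed in this PRD, and the function name is hypothetical. It uses `fullmatch` rather than `match` with `$`, because `$` also matches before a trailing newline and would accept "abc\n".

```python
import re
from typing import Optional

# 3-30 chars: lowercase letters, digits, hyphens; no leading/trailing hyphen.
USERNAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")

# Names listed in the PRD; the maintained blocklist is longer.
RESERVED = frozenset({
    "admin", "www", "api", "auth", "mcp", "app", "help", "support", "billing",
    "status", "blog", "docs", "robot", "wiki", "static", "assets", "null", "undefined",
})


def validate_username(name: str) -> Optional[str]:
    """Return None if the username is acceptable, else a reason string."""
    if not USERNAME_RE.fullmatch(name):
        return "must be 3-30 lowercase letters, digits, or hyphens, not starting or ending with a hyphen"
    if name in RESERVED:
        return "reserved"
    return None
```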

Username squatting

Free accounts cost nothing to create, so squatting is possible. Mitigations:

  • Require at least one wiki with at least one page edit within 90 days of signup, or the username is released
  • Trademark disputes handled case-by-case (standard UDRP-like process, documented in ToS)
  • Not a launch concern — address when it becomes a real problem