This page is part of the **wikibot.io PRD** (Product Requirements Document). See also: [[Design/Platform_Overview]], [[Design/Auth]], [[Design/Implementation_Phases]], [[Design/Operations]].

---

## Data Model

> **Superseded.** This page describes the DynamoDB/EFS data model for wikibot.io. See [[Design/VPS_Architecture]] for the current plan (SQLite, local disk). The ACL model and storage layout concepts carry forward; the DynamoDB-specific schema does not.

DynamoDB tables. Partition keys noted in comments.

#### Users

```
User {
  id: string,                  // platform-generated (UUID)
  email: string,
  display_name: string,
  oauth_provider: string,      // "google" | "github" | "microsoft" | "apple"
  oauth_provider_sub: string,  // provider-native subject ID (e.g., Google sub claim)
                               // GSI on (oauth_provider, oauth_provider_sub) for login lookup
                               // Critical: enables migration off WorkOS or any auth provider
  created_at: ISO8601,
  wiki_count: number,
  stripe_customer_id?: string
}
```

Note: the User model is deliberately thin on pricing fields. Under Option A (flat tier), add `tier: "free" | "premium"` and `wiki_limit: number`. Under Option B (per-wiki), no tier field is needed — billing state lives on each Wiki record. See [[Design/Implementation_Phases]] for pricing options.

#### Wikis

```
Wiki {
  owner_id: string,            // User.id
  wiki_slug: string,           // URL-safe identifier (under user namespace)
  custom_slug?: string,        // paid wikis: top-level slug for {slug}.wikibot.io
  display_name: string,
  repo_path: string,           // EFS path: /mnt/efs/{user_id}/{wiki_slug}/repo.git
  index_path?: string,         // FAISS index location (on EFS alongside repo)
  mcp_token_hash: string,      // bcrypt hash of MCP bearer token
  is_public: boolean,          // read-only public access
  is_paid: boolean,            // whether this wiki requires payment (i.e., not the free wiki)
  payment_status: "active" | "lapsed" | "free",
                               // free = the user's one free wiki
                               // active = paid and current
                               // lapsed = payment failed/canceled → read-only, MCP disabled
  created_at: ISO8601,
  last_accessed: ISO8601,
  page_count: number,
}
```
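
The `payment_status` comments above imply a small capability table. A minimal sketch follows, assuming that `free` and `active` wikis get full access (the schema only spells out the `lapsed` restrictions); the function name `wiki_capabilities` and the capability keys are hypothetical.

```python
def wiki_capabilities(payment_status: str) -> dict:
    """Map a Wiki.payment_status value to what the wiki currently allows."""
    if payment_status in ("free", "active"):
        # Assumption: free and paid-and-current wikis behave identically.
        return {"read": True, "write": True, "mcp": True}
    if payment_status == "lapsed":
        # From the schema comment: lapsed means read-only, MCP disabled.
        return {"read": True, "write": False, "mcp": False}
    raise ValueError(f"unknown payment_status: {payment_status!r}")
```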

#### ACLs

```
ACL {
  wiki_id: string,             // owner_id + wiki_slug
  grantee_id: string,          // User.id
  role: "owner" | "editor" | "viewer",
  granted_by: string,
  granted_at: ISO8601
}
```
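
As an illustration of how ACL records might be evaluated, here is a minimal sketch. It assumes the three roles are strictly ordered (owner includes editor, editor includes viewer), which this page does not state explicitly. The `/` separator in `make_wiki_id` and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Assumed ordering: a higher rank includes every permission of the lower ranks.
ROLE_RANK = {"viewer": 1, "editor": 2, "owner": 3}


@dataclass(frozen=True)
class ACL:
    wiki_id: str
    grantee_id: str
    role: str
    granted_by: str
    granted_at: str  # ISO8601 timestamp


def make_wiki_id(owner_id: str, wiki_slug: str) -> str:
    # Hypothetical encoding of "owner_id + wiki_slug".
    return f"{owner_id}/{wiki_slug}"


def has_role(acls, wiki_id: str, user_id: str, required: str) -> bool:
    """True if user_id holds `required` or a stronger role on wiki_id."""
    needed = ROLE_RANK[required]
    return any(
        a.wiki_id == wiki_id
        and a.grantee_id == user_id
        and ROLE_RANK[a.role] >= needed
        for a in acls
    )
```

With this shape, a write endpoint would call `has_role(acls, wiki_id, user_id, "editor")`, and a read endpoint on a non-public wiki would require `"viewer"`.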

### Storage layout (EFS)

```
/mnt/efs/
  {user_id}/
    {wiki_slug}/
      repo.git/              # bare git repo — persistent filesystem
      index.faiss            # FAISS vector index
      embeddings.json        # page_path → vector mapping
```
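
The layout above can be derived from two identifiers. The sketch below shows one way to build the paths; the function name `wiki_paths` and the input checks are illustrative assumptions, not part of the design.

```python
from pathlib import PurePosixPath

EFS_ROOT = PurePosixPath("/mnt/efs")


def wiki_paths(user_id: str, wiki_slug: str) -> dict:
    """Return the on-disk locations for one wiki under the EFS root."""
    for part in (user_id, wiki_slug):
        # Refuse anything that could escape the per-wiki directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path segment: {part!r}")
    base = EFS_ROOT / user_id / wiki_slug
    return {
        "repo": base / "repo.git",             # bare git repo
        "index": base / "index.faiss",         # FAISS vector index
        "embeddings": base / "embeddings.json" # sidecar metadata
    }
```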

---

## Git Storage Mechanics

### EFS-backed git repos

Each wiki's bare git repo lives on a persistent filesystem mounted by the compute layer. There is no clone/push cycle, no caching, and no application-level locking; git operations happen directly on disk.

**Read path:**
```
1. Lambda mounts EFS (already attached in VPC)
2. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
3. Read page from repo
```

**Write path:**
```
1. Open bare repo at /mnt/efs/{user}/{wiki}/repo.git
2. Commit page change
3. Write reindex record to DynamoDB ReindexQueue table
   (triggers embedding Lambda via DynamoDB Streams — see Semantic Search section)
```

**Concurrency**: NFS handles file-level locking natively, and git's own lock files are expected to work correctly on it. In a bare repo these are the per-ref `.lock` files rather than `index.lock`, which guards a working tree's index. Concurrent reads are unlimited. Concurrent writes to the same ref are serialized by git's lock files, so no application-level locking is needed.

**Consistency**: Writes are immediately visible to all Lambda invocations mounting the same EFS filesystem. No eventual consistency concerns.

### Fallback: S3 clone-on-demand

If Phase 0 testing shows EFS latency or VPC cold starts are unacceptable, fall back to S3-based repos with a DynamoDB write lock + clone-to-/tmp pattern. This adds significant complexity (locking, cache management, /tmp eviction) and is only worth pursuing if EFS fails testing.

---

## Semantic Search

Semantic search is available to all users (not tier-gated). See [[Design/Async_Embedding_Pipeline]] for the full architecture.

### Embedding pipeline (summary)

```
Page write (wiki Lambda, VPC)
  → DynamoDB write to ReindexQueue table (free gateway endpoint, already deployed)
  → DynamoDB Streams captures the change
  → Lambda service polls the stream (outside function's VPC context)
  → Embedding Lambda (VPC, EFS mount):
      1. Read page content from EFS repo
      2. Chunk page (same algorithm as otterwiki-semantic-search)
      3. Embed chunks using all-MiniLM-L6-v2 (runs locally, no external API)
      4. Update FAISS index + sidecar metadata on EFS
```

No Bedrock, no SQS, no new VPC endpoints. Total fixed cost: $0.

### FAISS details

FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for nearest-neighbor search over dense vectors.

**Index type**: `IndexFlatIP` (flat index, inner product similarity). For wikis under ~1000 pages, brute-force search is fast enough (<1ms) and requires no training or tuning. The index is just a matrix of vectors.

**Index size**: Each MiniLM vector is 384 floats × 4 bytes = 1.5KB. A 200-page wiki with ~3 chunks per page = 600 vectors = ~900KB index. Trivial to store on EFS and load into Lambda memory.
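
The arithmetic above generalizes to any wiki size. A small sketch follows; the function name is hypothetical, and the default of 3 chunks per page is the estimate used in this section.

```python
DIM = 384            # all-MiniLM-L6-v2 embedding width
BYTES_PER_FLOAT = 4  # float32


def flat_index_bytes(pages: int, chunks_per_page: int = 3) -> int:
    """Raw vector storage for a flat index: one float32 row per chunk."""
    return pages * chunks_per_page * DIM * BYTES_PER_FLOAT


size = flat_index_bytes(200)
print(f"{size} bytes = {size / 1024:.0f} KB")  # 921600 bytes = 900 KB
```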

**Sidecar metadata**: FAISS stores only vectors and returns integer indices. The `embeddings.json` sidecar maps index positions back to `{page_path, chunk_index, chunk_text_preview}`. This file is loaded alongside the FAISS index.

**Search flow**:
1. Embed query using MiniLM (loaded at Lambda init)
2. Load FAISS index + sidecar from EFS (~5ms, already mounted)
3. Search top K×3 vectors (<1ms)
4. Deduplicate by page_path, keep best chunk per page
5. Return top K results with page paths and matching chunk snippets
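
Steps 3 to 5 can be sketched without any dependencies. The version below uses plain Python lists and a brute-force inner product in place of FAISS and MiniLM, so it only illustrates the over-fetch and deduplication logic. The function name `search` and the result shape are assumptions.

```python
def search(query_vec, vectors, sidecar, k):
    """Return up to k results, one per page, best chunk first.

    vectors: list of embedding vectors (stand-in for the FAISS index)
    sidecar: list of {"page_path", "chunk_index", "chunk_text_preview"},
             aligned with `vectors` by position
    """
    # Step 3: score every vector by inner product, keep the top K*3.
    scored = sorted(
        ((sum(q * v for q, v in zip(query_vec, vec)), i)
         for i, vec in enumerate(vectors)),
        reverse=True,
    )[: k * 3]

    # Step 4: deduplicate by page_path. Because `scored` is sorted
    # best-first, the first hit seen for a page is its best chunk.
    best = {}
    for score, i in scored:
        meta = sidecar[i]
        if meta["page_path"] not in best:
            best[meta["page_path"]] = {**meta, "score": score}

    # Step 5: dicts preserve insertion order, so this is already ranked.
    return list(best.values())[:k]
```

Fetching K×3 candidates before deduplicating leaves headroom for pages that contribute several high-scoring chunks, so the final list can still reach K distinct pages.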

### Cost estimate

- Embedding a 200-page wiki: effectively $0 (Lambda compute only, ~seconds)
- Per search query: $0 (MiniLM runs locally)
- Re-embedding on page edits: negligible (DynamoDB write + Lambda invocation)
- VPC endpoints: $0 (uses existing DynamoDB gateway endpoint)

---

## URL Structure

Each user gets a subdomain: `{username}.wikibot.io`

```
sderle.wikibot.io/                          → user's wiki list (dashboard)
sderle.wikibot.io/third-gulf-war/           → wiki web UI (free wiki, under user namespace)
sderle.wikibot.io/third-gulf-war/api/v1/    → wiki REST API
sderle.wikibot.io/third-gulf-war/mcp        → wiki MCP endpoint
```

### Custom slugs (paid wikis)

Paid wikis get a top-level slug: `{slug}.wikibot.io`. This is a vanity URL that routes directly to the wiki without the username prefix. The slug is chosen at wiki creation time and must be globally unique (same validation rules as usernames: lowercase alphanumeric + hyphens, 3–30 characters, drawn from the same namespace/blocklist).

```
third-gulf-war.wikibot.io/                  → wiki web UI (paid wiki, top-level slug)
third-gulf-war.wikibot.io/api/v1/           → wiki REST API
third-gulf-war.wikibot.io/mcp              → wiki MCP endpoint
```

The user-namespace URL (`sderle.wikibot.io/third-gulf-war/`) continues to work as a redirect. This means existing MCP connections and bookmarks survive if a free wiki is later upgraded to paid.

Implementation: the Lambda resolver checks the subdomain against the Wikis table's `custom_slug` GSI first, then falls back to username resolution.
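
A minimal sketch of that lookup order, with in-memory dicts standing in for the `custom_slug` GSI and the username lookup. The function name, the return shape, and the sample data are hypothetical.

```python
SUFFIX = ".wikibot.io"


def resolve_subdomain(host, wikis_by_custom_slug, users_by_username):
    """Resolve a request host to ("wiki", record), ("user", record), or None."""
    host = host.lower()
    if not host.endswith(SUFFIX):
        return None
    label = host[: -len(SUFFIX)]
    if not label or "." in label:
        return None  # only single-label subdomains are routed
    # Custom slugs win, then fall back to username resolution.
    if label in wikis_by_custom_slug:
        return ("wiki", wikis_by_custom_slug[label])
    if label in users_by_username:
        return ("user", users_by_username[label])
    return None
```

Because custom slugs and usernames are drawn from the same namespace, a label should never match both tables; the ordering only determines which lookup is tried first.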

---

## Usernames

Each user chooses a username at signup (after OAuth). Usernames are URL-critical (`{username}.wikibot.io`) so they must be:

- **URL-safe**: lowercase alphanumeric + hyphens, 3–30 characters, no leading/trailing hyphens
- **Unique**: enforced in DynamoDB
- **Immutable** (MVP): changing a username changes its URLs, which breaks MCP connections, Git remotes, and bookmarks. Defer username changes (with redirect support) to a future iteration.
- **Reserved**: block names that conflict with platform routes or look official: `admin`, `www`, `api`, `auth`, `mcp`, `app`, `help`, `support`, `billing`, `status`, `blog`, `docs`, `robot`, `wiki`, `static`, `assets`, `null`, `undefined`, etc. Maintain a blocklist.
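
The URL-safety and reserved-name rules can be checked with a single regular expression plus a set lookup. The sketch below is illustrative: `RESERVED` contains only the names listed above (the real blocklist would be longer), and uniqueness still has to be enforced in the datastore.

```python
import re

# 3-30 characters total: an alphanumeric, then 1-28 alphanumerics or
# hyphens, then an alphanumeric (so no leading or trailing hyphen).
USERNAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")

RESERVED = frozenset({
    "admin", "www", "api", "auth", "mcp", "app", "help", "support",
    "billing", "status", "blog", "docs", "robot", "wiki", "static",
    "assets", "null", "undefined",
})


def is_valid_username(name: str) -> bool:
    """Check format and blocklist; does not check uniqueness."""
    return USERNAME_RE.fullmatch(name) is not None and name not in RESERVED
```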

### Username squatting

Free accounts cost nothing to create, so squatting is possible. Mitigations:
- Require at least one wiki with at least one page edit within 90 days of signup, or the username is released
- Trademark disputes handled case-by-case (standard UDRP-like process, documented in ToS)
- Not a launch concern — address when it becomes a real problem
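
If the 90-day rule is ever enforced, the check itself is small. The sketch below reads the rule literally: an edit only counts if it landed within 90 days of signup. How to treat an edit made after the deadline but before release is not specified here, so that branch is an assumption, as are the function and parameter names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=90)


def username_is_releasable(
    signup_at: datetime,
    first_page_edit_at: Optional[datetime],
    now: datetime,
) -> bool:
    """True once the grace period has passed with no qualifying page edit."""
    deadline = signup_at + GRACE_PERIOD
    if now <= deadline:
        return False  # still inside the window
    # Past the deadline: the username is kept only if an edit landed in time.
    return first_page_edit_at is None or first_page_edit_at > deadline
```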