---
status: current
platform: VPS (robot.wtf)
---
> Extracted from the original wikibot.io design. AWS-specific content archived at [[Archive/AWS_Design/Operations]].

See also: [[Design/Platform_Overview]], [[Design/Data_Model]], [[Design/Auth]], [[Design/VPS_Architecture]].

---

## Wiki Bootstrap Template

When a user creates a new wiki, the repo is initialized with a starter page set that teaches Claude how to use the wiki effectively. The user connects MCP, starts a conversation, and Claude already knows the conventions.

### Initial pages

**Home** — Landing page with the wiki's name and purpose (user-provided at creation), links to the guide and any starter pages.

**Meta/Wiki Usage Guide** — Instructions for the AI assistant:
- Available MCP tools and what they do
- Session start protocol (read Home first, then check recent changes)
- Page conventions: frontmatter schema, WikiLink syntax, page size guidance (~250–800 words)
- Commit message format
- When to create new pages vs. update existing ones
- How to use categories and tags
- Gardening responsibilities (orphan detection, stale page review, link maintenance)

**Meta/Page Template** — A reference page showing the frontmatter schema, section structure, and WikiLink usage. Claude can copy this pattern when creating new pages.

### Customization

The bootstrap template is parameterized by:
- Wiki name (provided at creation)
- Wiki purpose/description (optional, provided at creation)
- Category set (default set provided, user can customize later)

The default category set matches the existing schema (`actor`, `event`, `trend`, `hypothesis`, `variable`, `reference`, `index`) but users can define their own categories for different research domains.
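
To make the parameterization concrete, here is a minimal sketch of how the starter pages might be rendered at creation time. The template format, file names, and the `bootstrap_pages` helper are illustrative assumptions, not the real implementation.

```python
# Illustrative sketch: render bootstrap parameters into the wiki's initial
# page set. Template format and file names are assumptions.
from string import Template

HOME_TEMPLATE = Template("""\
---
categories: [index]
---
# $name

$purpose

See [[Meta/Wiki Usage Guide]] for conventions and [[Meta/Page Template]]
for the page skeleton.
""")

DEFAULT_CATEGORIES = ["actor", "event", "trend", "hypothesis",
                      "variable", "reference", "index"]

def bootstrap_pages(name: str, purpose: str = "") -> dict[str, str]:
    """Return {path: content} for the wiki's initial commit."""
    return {
        "Home.md": HOME_TEMPLATE.substitute(name=name, purpose=purpose),
        # Meta/Wiki Usage Guide.md and Meta/Page Template.md would be
        # rendered the same way, with DEFAULT_CATEGORIES injected.
    }
```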

### Custom template repos (premium)

Premium users can create a wiki from any public (or authenticated) Git repo URL. The server clones the template repo, strips its git history, and commits the contents as the wiki's initial state. This enables:

- Shared team templates ("our standard research wiki layout")
- Domain-specific starter kits (e.g., a policy analysis template, a technical due diligence template)
- Community-contributed templates (a future marketplace opportunity)
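
A sketch of that clone-and-strip flow with plain subprocess git; the branch name, commit message, and committer identity are placeholders.

```python
# Sketch: clone a template repo, strip its history, commit the contents as
# the wiki's first commit. Branch name and identity are placeholders.
import shutil
import subprocess
import tempfile

def init_from_template(template_url: str, wiki_repo: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        def git(*args):
            subprocess.run(["git", "-C", tmp, *args], check=True)
        # A shallow clone is enough; the history is discarded anyway
        subprocess.run(["git", "clone", "--depth=1", template_url, tmp],
                       check=True)
        shutil.rmtree(f"{tmp}/.git")          # strip template history
        git("init", "-b", "main")
        git("add", "-A")
        git("-c", "user.name=wikibot", "-c", "user.email=bot@example.org",
            "commit", "-m", "Initialize wiki from template")
        git("push", wiki_repo, "main")        # wiki_repo = path of bare repo
```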

---

## Attachment Storage

Otterwiki stores attachments as regular files in the git repo and serves them directly from the working tree.

### MVP approach

Store attachments in the git repo as-is. Tier limits (50MB free, 1GB premium) keep repo sizes manageable.
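
A pre-upload quota check is enough to enforce this at the MVP stage. A sketch, assuming attachments live in the working tree as described above:

```python
# Hypothetical pre-upload check against tier quotas (numbers mirror the
# tier limits above). Walks the working tree rather than the git odb.
import os

TIER_LIMIT = {"free": 50 * 2**20, "premium": 2**30}  # 50 MB / 1 GB

def upload_allowed(working_tree: str, tier: str, upload_size: int) -> bool:
    used = sum(os.path.getsize(os.path.join(root, name))
               for root, _, files in os.walk(working_tree) for name in files)
    return used + upload_size <= TIER_LIMIT[tier]
```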

### Future optimization: external attachment storage

If large attachments become a problem (disk usage, Git remote clone times), decouple attachment storage from the git repo:

1. On upload: store the attachment externally at a known path, commit only a lightweight reference file to git (similar to Git LFS pointer format)
2. On serve: intercept Otterwiki's attachment serving path, resolve the reference, and serve from external storage
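
A sketch of the pointer scheme, loosely imitating the Git LFS pointer format; the storage root, content-addressed layout, and exact pointer fields are assumptions:

```python
# Hypothetical pointer format and resolution, loosely imitating Git LFS.
# ATTACH_ROOT and the content-addressed layout are assumptions.
import hashlib
import os

ATTACH_ROOT = "/srv/attachments"

def store(data: bytes) -> str:
    """Upload path: write bytes externally, return the pointer to commit."""
    oid = hashlib.sha256(data).hexdigest()
    path = os.path.join(ATTACH_ROOT, oid[:2], oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return f"version 1\noid sha256:{oid}\nsize {len(data)}\n"

def resolve(pointer: str) -> bytes:
    """Serve path: map a committed pointer back to the stored bytes."""
    oid = pointer.splitlines()[1].split("sha256:")[1]
    with open(os.path.join(ATTACH_ROOT, oid[:2], oid), "rb") as f:
        return f.read()
```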

This could be implemented as:
- **Otterwiki plugin** that hooks into the attachment upload/serve lifecycle
- **Upstream patch** to Otterwiki adding a pluggable storage backend for attachments (local filesystem vs. external)

The plugin or upstream patch approach is preferable — it benefits the broader Otterwiki community and keeps our fork minimal.

---

## Git Remote Access

Every wiki's bare repo is directly accessible via Git protocol over HTTPS. This is a core feature, not an afterthought — users should never feel locked in.

### Hosted Git remote

```
https://sderle.robot.wtf/third-gulf-war.git
```

Authentication: OAuth JWT or MCP bearer token via Git credential helper, or a dedicated Git access token (simpler for CLI usage).

**Free tier**: read-only. Users can `git clone` and `git pull` their wiki at any time. This is a data portability guarantee — your wiki is always yours.

**Premium tier**: read/write. Users can `git push` to the hosted remote, enabling workflows like local editing, CI/CD integration, or scripted bulk imports.

### Implementation

An HTTP route (`/{user}/{wiki}.git/*`) in the app implements the Git smart HTTP protocol (`git-upload-pack` for clone/fetch, `git-receive-pack` for push) and operates on the same on-disk repo as the wiki handlers.
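
A minimal sketch of those two endpoints in Flask (Otterwiki's framework). Authentication and the free-tier read-only check are elided; `REPO_ROOT` and the on-disk layout are assumptions consistent with the rest of this doc.

```python
# Minimal Git smart HTTP sketch (Flask). Auth and the free-tier read-only
# check are omitted; REPO_ROOT layout is an assumption.
import os
import subprocess
from flask import Flask, Response, abort, request

app = Flask(__name__)
REPO_ROOT = "/srv/wikis"
SERVICES = ("git-upload-pack", "git-receive-pack")

def repo(user, wiki):
    path = os.path.join(REPO_ROOT, user, f"{wiki}.git")
    if not os.path.isdir(path):
        abort(404)
    return path

def pkt(data: bytes) -> bytes:
    # pkt-line framing: 4 hex digits of total length, then the payload
    return f"{len(data) + 4:04x}".encode() + data

@app.get("/<user>/<wiki>.git/info/refs")
def info_refs(user, wiki):
    service = request.args.get("service")
    if service not in SERVICES:
        abort(400)
    refs = subprocess.run(
        ["git", service[4:], "--stateless-rpc", "--advertise-refs",
         repo(user, wiki)],
        capture_output=True, check=True).stdout
    body = pkt(f"# service={service}\n".encode()) + b"0000" + refs
    return Response(body,
                    content_type=f"application/x-{service}-advertisement")

@app.post("/<user>/<wiki>.git/<service>")
def rpc(user, wiki, service):
    if service not in SERVICES:
        abort(400)
    out = subprocess.run(
        ["git", service[4:], "--stateless-rpc", repo(user, wiki)],
        input=request.get_data(), capture_output=True, check=True).stdout
    return Response(out, content_type=f"application/x-{service}-result")
```

The `# service=` pkt-line prologue on `info/refs` is required by the smart protocol's ref advertisement; it is what distinguishes this from the dumb HTTP protocol.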

### External Git sync (premium, future)

Bidirectional sync with an external remote (GitHub, GitLab, etc.). Triggered on schedule or webhook:

1. Open wiki repo
2. `git fetch` from configured external remote
3. Attempt fast-forward merge (no conflicts → auto-merge)
4. Conflicts → flag for human resolution, do not auto-merge
5. Push merged state to external remote
6. Trigger re-embedding if semantic search enabled
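
A sketch of the job with plain subprocess git, assuming a working-tree clone with the external remote configured as `external` and a `main` branch:

```python
# Sketch of the sync job. Remote name, branch, and working-tree layout
# are assumptions.
import subprocess

def sync_external(checkout: str) -> str:
    def git(*args):
        subprocess.run(["git", "-C", checkout, *args], check=True)
    git("fetch", "external")                        # step 2
    try:
        git("merge", "--ff-only", "external/main")  # step 3: ff or bail out
    except subprocess.CalledProcessError:
        return "conflict"                           # step 4: human resolves
    git("push", "external", "main")                 # step 5
    return "synced"                                 # caller re-embeds (step 6)
```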

---

## Otterwiki Fork Management

The Otterwiki fork is kept as minimal as possible. Every customization takes one of three forms:
1. **Plugins** (preferred) — no core changes needed
2. **Small, upstreamable patches** — carried in `schuyler/otterwiki` and submitted as PRs to the upstream `redimp/otterwiki` project
3. **Platform-specific overrides** — admin panel section hiding, template conditionals (kept in a separate branch or patch set)

### Merge strategy

- Track upstream `redimp/otterwiki` as a remote
- Periodically rebase or merge upstream changes into the fork
- Keep platform-specific changes isolated (ideally a thin layer on top, not interleaved with upstream code)
- Automated CI check: does the fork still pass upstream's test suite after merge?
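
The CI check could be as simple as a merge-then-test script; a sketch, assuming the upstream remote is named `upstream` and upstream's suite runs under pytest:

```python
# Sketch of the post-merge CI gate: merge upstream, run upstream's tests.
# Remote and branch names are assumptions.
import subprocess
import sys

def check_upstream_merge(fork_dir: str) -> int:
    def git(*args):
        subprocess.run(["git", "-C", fork_dir, *args], check=True)
    git("fetch", "upstream")                    # upstream = redimp/otterwiki
    git("merge", "--no-edit", "upstream/main")
    # The fork must still pass upstream's test suite after the merge
    return subprocess.run([sys.executable, "-m", "pytest"],
                          cwd=fork_dir).returncode
```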

### Upstream relationship

We want to support Otterwiki as a project. Contributions go upstream where possible. If the product generates revenue, donate a portion to the upstream maintainer.

---

## Backup and Disaster Recovery

### What we're protecting

| Data | Source of truth | Severity of loss |
|------|----------------|-----------------|
| Git repos (wiki content) | Local disk | **Critical** — user data, irreplaceable |
| Platform DB (users, wikis, ACLs) | SQLite/PostgreSQL | **High** — reconstructable from repos but painful |
| FAISS indexes | Local disk | **Low** — fully rebuildable from repo content |
| Auth provider state | WorkOS (external) | **Low** — managed by vendor |

### Backup strategy

**Git repos**: rsync to off-site storage, daily. See [[Design/VPS_Architecture]] for specifics.

**Platform DB**: Daily dump + rsync. Point-in-time recovery if using PostgreSQL.

**FAISS indexes**: No backup needed. Rebuildable from repo content (MiniLM runs locally, no API cost).
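
Rebuilding is cheap enough to script. A sketch with sentence-transformers and FAISS, where the model name and flat-index choice are assumptions (the doc only specifies "MiniLM"):

```python
# Sketch: re-embed every page and write a fresh FAISS index. Model name
# and index type are assumptions.
import glob
import faiss
from sentence_transformers import SentenceTransformer

def rebuild_index(repo_dir: str, index_path: str) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")   # a MiniLM variant
    pages = sorted(glob.glob(f"{repo_dir}/**/*.md", recursive=True))
    texts = [open(p, encoding="utf-8").read() for p in pages]
    vecs = model.encode(texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(int(vecs.shape[1]))     # cosine via inner product
    index.add(vecs)
    faiss.write_index(index, index_path)
    return pages  # row i of the index corresponds to pages[i]
```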

### Design principle

Git repos are the source of truth. Everything else (platform DB records, FAISS indexes) is either backed up independently or rebuildable from the repos.

---

## Account Lifecycle

### Data retention

User accounts and wiki data are retained indefinitely regardless of activity. Storage cost for an idle wiki is effectively zero. There is no reason to delete inactive accounts — it costs nothing to keep them and deleting user data is irreversible.

### Account deletion

Users can delete their account from the dashboard. This:
1. Deletes all wikis owned by the user (repo, FAISS index, metadata)
2. Removes all ACL grants the user has on other wikis
3. Deletes the user record from the platform DB
4. Does NOT delete the auth provider account (Google/GitHub/etc.) — that's the user's own account

Deletion is permanent and irreversible. Require explicit confirmation ("type your username to confirm").
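
A sketch of the deletion flow against a hypothetical SQLite schema (the `wikis`, `acl`, and `users` tables are assumptions); the ordering mirrors the steps above:

```python
# Hypothetical deletion flow. Schema (wikis, acl, users) is assumed; the
# auth provider account is deliberately untouched (step 4).
import shutil
import sqlite3

def delete_account(conn: sqlite3.Connection, user_id: str) -> None:
    rows = conn.execute(
        "SELECT repo_path, index_path FROM wikis WHERE owner_id = ?",
        (user_id,)).fetchall()
    for repo_path, index_path in rows:                              # step 1
        shutil.rmtree(repo_path, ignore_errors=True)
        shutil.rmtree(index_path, ignore_errors=True)
    conn.execute("DELETE FROM wikis WHERE owner_id = ?", (user_id,))
    conn.execute("DELETE FROM acl WHERE grantee_id = ?", (user_id,))  # step 2
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))        # step 3
    conn.commit()
```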

### GDPR

If serving EU users: account deletion satisfies right-to-erasure. Add a data export endpoint (download all wikis as a zip of git repos) to satisfy right-to-portability — though the Git remote access feature already provides this.
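
A sketch of what the export endpoint could look like: one re-cloneable `git bundle` per owned wiki, zipped. `current_user` and `user_wikis` are assumed helpers, not real API.

```python
# Hypothetical right-to-portability endpoint. A bundle preserves full
# history, so the export stays a faithful, re-cloneable copy.
import io
import os
import subprocess
import tempfile
import zipfile
from flask import Flask, send_file

app = Flask(__name__)

@app.get("/account/export")
def export_account():
    user = current_user()                    # assumed auth helper
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf, \
         tempfile.TemporaryDirectory() as tmp:
        for wiki in user_wikis(user):        # assumed lookup
            bundle = os.path.join(tmp, f"{wiki.slug}.bundle")
            subprocess.run(["git", "-C", wiki.repo_path,
                            "bundle", "create", bundle, "--all"], check=True)
            zf.write(bundle, arcname=f"{wiki.slug}.bundle")
    buf.seek(0)
    return send_file(buf, mimetype="application/zip",
                     download_name="wikis.zip")
```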

---

## MCP Discoverability

MCP tool descriptions must be self-documenting — any MCP-capable client (Claude, GPT, Gemini, open-source agents) should be able to use the wiki tools without reading external documentation.

Each tool's MCP description should include:
- What it does
- Parameter semantics (e.g., "path is like `Actors/Iran`, not a filesystem path")
- What the return format looks like
- Common next actions ("use `list_notes` to find available pages if you don't know the path")
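
As a concrete illustration, a tool declaration might read like this (field names follow the MCP tool schema; the tool itself and its wording are hypothetical):

```python
# Hypothetical MCP tool declaration; the description carries the docs.
READ_NOTE_TOOL = {
    "name": "read_note",
    "description": (
        "Read a wiki page. `path` is a wiki path like 'Actors/Iran', not a "
        "filesystem path. Returns frontmatter plus the markdown body. If you "
        "don't know the path, call `list_notes` first."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string",
                     "description": "Wiki path, e.g. 'Actors/Iran'"},
        },
        "required": ["path"],
    },
}
```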

The bootstrap template's Meta/Wiki Usage Guide provides Claude-specific conventions (session protocol, gardening duties), but the MCP tools themselves should work without it. The guide is an optimization, not a prerequisite.

---

## Rate Limiting and Abuse Prevention

**Launch**: OAuth-only accounts + tier limits (1 wiki, 500 pages, 3 collaborators) provide sufficient abuse prevention at low traffic. Public wiki routes are the only unauthenticated surface — acceptable risk at launch with near-zero users.

**Post-launch (when traffic justifies it)**: IP-based rate limiting via reverse proxy (nginx/Caddy). Geographic blocking, bot control, OWASP Top 10 rule sets via WAF or application-level middleware.

**Per-user rate limiting (premium launch)**: When premium tier ships, add per-user throttling on API and MCP endpoints. Define specific limits when the need materializes.
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9