---
status: current
platform: VPS (robot.wtf)
---

This page is part of the **wikibot.io PRD** (Product Requirements Document). See also: [[Design/Platform_Overview]], [[Design/Data_Model]], [[Design/Auth]], [[Design/Implementation_Phases]].

> Extracted from the original wikibot.io design. AWS-specific content archived at [[Archive/AWS_Design/Operations]].

See also: [[Design/Platform_Overview]], [[Design/Data_Model]], [[Design/Auth]], [[Design/VPS_Architecture]].

---
## Wiki Bootstrap Template

When a user creates a new wiki, the repo is initialized with a starter page set that teaches Claude how to use the wiki effectively. The user connects MCP, starts a conversation, and Claude already knows the conventions.

### Initial pages
### Custom template repos (premium)

Premium users can create a wiki from any public (or authenticated) Git repo URL. The server clones the template repo, strips its git history, and commits the contents as the wiki's initial state. This enables:

- Shared team templates ("our standard research wiki layout")
- Domain-specific starter kits (e.g., a policy analysis template, a technical due diligence template)
- Community-contributed templates (a future marketplace opportunity)

### Implementation

The default template is stored on disk, bundled with the application. On wiki creation, the server copies it to the new repo directory, performs string substitution for wiki name/purpose, and commits as the initial state. For custom templates (premium), the server clones from the provided Git URL instead.

---
## Attachment Storage

Otterwiki stores attachments as regular files in the git repo and serves them directly from the working tree.

### MVP approach

Store attachments in the git repo as-is. Tier limits (50MB free, 1GB premium) keep repo sizes manageable.

If large attachments become a problem (disk usage, Git remote clone times), decouple attachment storage from the git repo:

1. On upload: store the attachment externally at a known path, commit only a lightweight reference file to git (similar to Git LFS pointer format)
2. On serve: intercept Otterwiki's attachment serving path, resolve the reference, and serve from external storage

This could be implemented as:
- **Otterwiki plugin** that hooks into the attachment upload/serve lifecycle
- **Upstream patch** to Otterwiki adding a pluggable storage backend for attachments (local filesystem vs. external)

The plugin or upstream patch approach is preferable — it benefits the broader Otterwiki community and keeps our fork minimal.

---

## Git Remote Access

### Hosted Git remote
```
https://sderle.robot.wtf/third-gulf-war.git
```
Authentication: OAuth JWT or MCP bearer token via Git credential helper, or a dedicated Git access token (simpler for CLI usage).
### Implementation

HTTP route (`/{user}/{wiki}.git/*`) served by the app implementing the Git smart HTTP protocol (`git-upload-pack` for clone/fetch, `git-receive-pack` for push). Accesses the same on-disk repo as the wiki handlers.

This is a well-documented protocol — the app needs to handle only a handful of endpoints (`/info/refs`, `/git-upload-pack`, `/git-receive-pack`). Libraries like dulwich can serve these without shelling out to git.

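Responses on those endpoints are framed as pkt-lines: a 4-hex-digit length prefix (which counts itself) before each payload, with `0000` as a flush packet. A minimal sketch of the framing and the `/info/refs` service advertisement:

```python
def pkt_line(data: bytes) -> bytes:
    """Frame one pkt-line: four hex digits of total length (prefix included), then payload."""
    return f"{len(data) + 4:04x}".encode() + data

FLUSH_PKT = b"0000"  # flush packet: a section delimiter, not a zero-length line

def advertisement_header(service: str) -> bytes:
    """First bytes of a smart /info/refs response body, sent before the ref listing."""
    return pkt_line(f"# service={service}\n".encode()) + FLUSH_PKT
```

The advertisement must also be served with the matching content type (`application/x-git-upload-pack-advertisement` for clones). In practice a library like dulwich handles all of this; the sketch is only to show what's on the wire.
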
### External Git sync (premium, future)

Bidirectional sync with an external remote (GitHub, GitLab, etc.). Triggered on schedule or webhook:

1. Open wiki repo
2. `git fetch` from configured external remote
3. Attempt fast-forward merge (no conflicts → auto-merge)
4. Conflicts → flag for human resolution, do not auto-merge
5. Push merged state to external remote
6. Trigger re-embedding if semantic search enabled
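Steps 2–5 can be sketched with plain git subprocess calls. The remote and branch names are assumptions, and conflict flagging and re-embedding are left to the caller:

```python
import subprocess

def sync_external(repo_dir: str, remote: str = "external", branch: str = "main") -> str:
    """Fast-forward-only sync with the configured external remote.
    Returns 'synced' on success, 'conflict' when human resolution is needed."""
    def git(*args: str) -> subprocess.CompletedProcess:
        return subprocess.run(["git", *args], cwd=repo_dir, capture_output=True, text=True)

    git("fetch", remote)
    # --ff-only refuses to create a merge commit; divergence means a human resolves it.
    if git("merge", "--ff-only", f"{remote}/{branch}").returncode != 0:
        return "conflict"  # flag for resolution, do not auto-merge
    git("push", remote, branch)
    return "synced"
```
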

Credentials for external remotes are stored server-side, per wiki. This is a future feature — the hosted Git remote is the MVP.

---

## Backup and disaster recovery

| Data | Location | Criticality |
|------|----------|-------------|
| Git repos (wiki content) | Local disk | **Critical** — user data, irreplaceable |
| Platform DB (users, wikis, ACLs) | SQLite/PostgreSQL | **High** — reconstructable from repos but painful |
| FAISS indexes | Local disk | **Low** — fully rebuildable from repo content |
| Auth provider state | WorkOS (external) | **Low** — managed by vendor |

### Backup strategy

**Git repos**: rsync to off-site storage, daily. See [[Design/VPS_Architecture]] for specifics.

**Platform DB**: Daily dump + rsync. Point-in-time recovery if using PostgreSQL.

**FAISS indexes**: No backup needed. Rebuildable from repo content (MiniLM runs locally, no API cost).

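A minimal sketch of the dump-and-rsync job, assuming the SQLite case (PostgreSQL would use `pg_dump` and its own PITR tooling); paths and function names are illustrative:

```python
import sqlite3
import subprocess
from pathlib import Path

def snapshot_db(db_path: str, snapshot_path: str) -> None:
    """Write a consistent snapshot of the live SQLite platform DB,
    safe to run while the app is serving requests."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snapshot_path)
    with src:
        src.backup(dst)  # SQLite online backup API
    dst.close()
    src.close()

def nightly_backup(wikis_root: str, db_path: str, remote: str) -> None:
    """Cron entry point: snapshot the DB into the backup tree, then mirror off-site."""
    snapshot_db(db_path, str(Path(wikis_root) / "platform-db.sqlite3.bak"))
    # -a preserves modes and mtimes; --delete mirrors removals; trailing slash sends contents.
    subprocess.run(["rsync", "-a", "--delete", f"{wikis_root}/", remote], check=True)
```

Placing the DB snapshot inside the repos tree means one rsync invocation covers both critical data classes.
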
### Design principle
Git repos are the source of truth. Everything else (platform DB records, FAISS indexes) is either backed up independently or rebuildable from the repos.

---
### Data retention

User accounts and wiki data are retained indefinitely regardless of activity. Storage cost for an idle wiki is effectively zero. There is no reason to delete inactive accounts — it costs nothing to keep them and deleting user data is irreversible.

### Account deletion
Users can delete their account from the dashboard. This:
1. Deletes all wikis owned by the user (repo, FAISS index, metadata)
2. Removes all ACL grants the user has on other wikis
3. Deletes the user record from the platform DB
4. Does NOT delete the auth provider account (Google/GitHub/etc.) — that's the user's own account
Deletion is permanent and irreversible. Require explicit confirmation ("type your username to confirm").
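The deletion steps could look like the sketch below. Table and column names are illustrative — the actual platform DB schema is defined in [[Design/Data_Model]]:

```python
import shutil
import sqlite3
from pathlib import Path

def delete_account(db: sqlite3.Connection, wikis_root: str, user_id: str) -> None:
    """Irreversibly delete a user: owned wikis (repo + FAISS index on disk),
    ACL grants on other wikis, and the user record itself."""
    slugs = [row[0] for row in
             db.execute("SELECT slug FROM wikis WHERE owner_id = ?", (user_id,))]
    for slug in slugs:
        # Repo and FAISS index live under the wiki directory; remove both at once.
        shutil.rmtree(Path(wikis_root) / user_id / slug, ignore_errors=True)
    with db:  # single transaction: all platform-DB rows go together
        db.execute("DELETE FROM wikis WHERE owner_id = ?", (user_id,))
        db.execute("DELETE FROM acl_grants WHERE grantee_id = ?", (user_id,))
        db.execute("DELETE FROM users WHERE id = ?", (user_id,))
```
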

---

**Launch**: OAuth-only accounts + tier limits (1 wiki, 500 pages, 3 collaborators) provide sufficient abuse prevention at low traffic. Public wiki routes are the only unauthenticated surface — acceptable risk at launch with near-zero users.

**Post-launch (when traffic justifies it)**: IP-based rate limiting via reverse proxy (nginx/Caddy). Geographic blocking, bot control, OWASP Top 10 rule sets via WAF or application-level middleware.

**Per-user rate limiting (premium launch)**: When premium tier ships, add per-user throttling on API and MCP endpoints. Define specific limits when the need materializes.
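If the per-user throttle lives in the app layer, a token bucket keyed by user is a minimal starting point. Limits here are illustrative, and a production deployment might prefer the reverse proxy's built-in rate limiting instead:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-user token bucket: refill `rate` tokens/sec up to `burst`; each request costs one."""
    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        # user -> (tokens available, monotonic timestamp of last update)
        self._state: dict[str, tuple[float, float]] = defaultdict(
            lambda: (float(burst), time.monotonic())
        )

    def allow(self, user: str) -> bool:
        tokens, last = self._state[user]
        now = time.monotonic()
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._state[user] = (tokens, now)
            return False  # caller responds 429 Too Many Requests
        self._state[user] = (tokens - 1.0, now)
        return True
```

A single-process deployment can keep this in memory; anything multi-process would move the state into the platform DB or an in-memory store.
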