Properties
category: reference
tags: [design, security, encryption, privacy]
last_updated: 2026-03-14
confidence: high
E-3 Design Spike: Client-Side Encryption / Zero-Knowledge Storage
Status: Spike complete, recommendation below
Relates to: Tasks/Emergent (E-3), Design/CDN_Read_Path, Design/Platform_Overview
Problem
The current privacy claim is "your wiki is private by default", but the operator can read data at rest on EFS. For a product whose pitch is "memory for your agents," the data is inherently sensitive. This spike evaluates five encryption approaches, ranging from filesystem-wide encryption at rest through per-tenant KMS keys to full zero-knowledge.
Storage Layers
Tenant data lives in two places:
- EFS — git repos. Source of truth. Full page content, edit history, metadata.
- S3 — rendered HTML fragments (per Design/CDN_Read_Path). Derived data for CDN serving. Less sensitive (no history, no metadata).
Any encryption strategy must be evaluated against both layers.
Approaches Evaluated
1. EFS Encryption at Rest (Single KMS Key)
Enable EFS's built-in encryption. One key for the entire filesystem.
| Dimension | Assessment |
|---|---|
| Protects against | Physical disk theft at AWS facility |
| Does NOT protect against | Operator access, compromised IAM credentials, AWS insider |
| Effort | Checkbox — near zero |
| Cost | Free (AWS-managed key) or $1/month (CMK) |
| Feature impact | None |
| CDN interaction | None — encryption is transparent to all reads/writes |
Verdict: Necessary baseline. Does not provide tenant isolation.
2. Per-Tenant SSE-KMS on S3 Fragments
Each tenant's S3 fragments encrypted with their own KMS CMK via SSE-KMS.
| Dimension | Assessment |
|---|---|
| Protects against | Cross-tenant S3 access if IAM policy is misconfigured |
| Does NOT protect against | EFS access (the actual source of truth) |
| Effort | Low-medium (key lifecycle management, per-object key selection) |
| Cost | $1/month per CMK × N tenants, plus API call costs |
| Feature impact | None |
| CDN interaction | Assembly Lambda decrypts transparently; CloudFront caches plaintext HTML for 30-60s as designed |
Verdict: Security theater. Fragments are derived data — protecting the derivative while the source (EFS) is unprotected adds complexity without meaningful security gain.
3. Separate EFS Filesystems Per Tenant
Each tenant gets their own EFS filesystem with its own KMS key.
| Dimension | Assessment |
|---|---|
| Protects against | Tenant isolation at infrastructure level; operator access restricted per-key |
| Effort | Very high — per-tenant Lambda config or mux layer, mount target management |
| Cost | Mount targets: ~$0.05/hr/AZ each. At 1,000 tenants × 2 AZs: ~$36K/year in mount targets alone |
| Feature impact | Lambda can only mount one EFS access point per function — requires per-tenant functions or a routing layer |
| CDN interaction | None — read path uses S3 fragments, not EFS directly |
Verdict: Strong isolation but operationally brutal and cost-prohibitive at scale.
4. Application-Level Encryption on EFS
Encrypt file contents before writing to git, decrypt after reading.
| Dimension | Assessment |
|---|---|
| Protects against | Operator reading data at rest |
| Effort | High |
| Cost | KMS API calls for envelope encryption |
| Feature impact | Breaks git. Dulwich can still hash and store encrypted blobs, but they are opaque binary to it: no meaningful diffs, no blame, no readable history. The value of git-backed storage evaporates. |
| CDN interaction | Fragments would be rendered from decrypted content, so no impact on CDN path |
Verdict: Incompatible with git-backed storage model.
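The effect on git can be illustrated with a small, self-contained sketch. It is not taken from the codebase: `toy_encrypt` is a hypothetical stand-in for a real AEAD cipher (and is not secure), and the sample page is invented. The point it shows is that a one-line edit produces a two-line diff in plaintext, while the two ciphertexts share almost no bytes, so git has nothing to diff or delta-compress.

```python
import difflib
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 of 'blob <size>\\0' followed by the content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Illustration only, NOT real cryptography.

    XORs the plaintext with a SHA-256 counter keystream under a fresh random
    nonce, standing in for a real cipher such as AES-GCM.
    """
    nonce = os.urandom(16)
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def changed_lines(old: str, new: str) -> int:
    """Number of removed plus added lines, as a line-oriented diff would count them."""
    matcher = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(), autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def matching_byte_fraction(a: bytes, b: bytes) -> float:
    """Fraction of positions (over the shorter input) where both inputs hold the same byte."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


# Invented sample page with a single-line edit between two versions.
page_v1 = "# Notes\n" + "".join(f"line {i}\n" for i in range(50))
page_v2 = page_v1.replace("line 25\n", "line 25 (edited)\n")

key = os.urandom(32)
enc_v1 = toy_encrypt(key, page_v1.encode())
enc_v2 = toy_encrypt(key, page_v2.encode())

print("plaintext lines changed:", changed_lines(page_v1, page_v2))
print("ciphertext bytes unchanged:", f"{matching_byte_fraction(enc_v1, enc_v2):.1%}")
print("same page, re-encrypted, same blob id:",
      git_blob_id(enc_v1) == git_blob_id(toy_encrypt(key, page_v1.encode())))
```

Because each encryption uses a fresh nonce, even re-saving an unchanged page yields a new blob ID, which is why history, blame, and deduplication stop being useful on encrypted content.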
5. Full Client-Side Encryption (Zero-Knowledge)
Content encrypted in the browser/agent before reaching the server. Server never sees plaintext.
| Dimension | Assessment |
|---|---|
| Protects against | Everything — operator, AWS, infrastructure compromise |
| Effort | Very high — new client-side crypto layer, key management, SPA rewrite |
| Feature impact | Severe: server-side search impossible (can't embed ciphertext), MCP agents need key provisioning (server sees plaintext transiently during sessions), web UI must decrypt in browser via Web Crypto API |
| CDN interaction | Incompatible with CDN caching. CloudFront cannot cache encrypted content that requires per-user decryption keys. Either disable caching (defeats E-2) or cache ciphertext and decrypt client-side (requires SPA, Option D from CDN design). |
Verdict: Architecturally incompatible with current design. Would require rethinking storage (not git), rendering (SPA), search (client-side or encrypted indexes), MCP key management, and CDN strategy. This is a ground-up rebuild, not a feature addition.
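To make the key-management burden concrete, here is a minimal sketch of the client-side half of such a scheme: deriving a content key from a passphrase on the user's device or inside an agent process. The function name, salt size, and iteration count are assumptions for illustration, not a proposed design; only the salt and ciphertext would ever leave the client.

```python
import hashlib
import os


def derive_content_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256.

    The iteration count is an assumed work factor; a real design would tune it
    and might prefer a memory-hard KDF instead.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# Hypothetical usage: the salt is stored server-side next to the ciphertext,
# the passphrase and derived key stay on the client.
salt = os.urandom(16)
key = derive_content_key("correct horse battery staple", salt, iterations=10_000)
print(len(key), "byte key derived")
```

Every MCP agent session would need the same passphrase or derived key delivered to it out of band, which is the provisioning problem noted in the table above.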
CDN Caching Interaction
Regardless of at-rest encryption approach, the CDN read path (per Design/CDN_Read_Path) caches decrypted HTML at CloudFront edge nodes for 30-60s. This is architecturally standard (every encrypted-at-rest + CDN system works this way), but it means:
- AWS CloudFront infrastructure sees plaintext during the cache window
- Any claim of "zero-knowledge" is false while CDN caching is active
- Auth (CloudFront Functions JWT validation) gates who can read the cache, but the content exists in plaintext at the edge
- Per-user KMS does not change this — content is decrypted before it reaches CloudFront regardless of who holds the key
CDN caching and zero-knowledge are fundamentally in tension. You can have fast reads or end-to-end encryption, not both.
KMS Cost Model at Scale
| Item | Cost |
|---|---|
| CMK per tenant | $1/month each |
| 1,000 tenants | $1,000/month for keys alone |
| KMS API calls (SSE-KMS) | $0.03 per 10,000 requests |
| Assembly Lambda: 3 fragment fetches per cache miss | Multiplied by request volume and short TTLs |
S3 Bucket Keys can reduce KMS API calls by up to ~99% within a single CMK, but per-tenant keys mean per-tenant bucket keys, so the savings do not pool across tenants.
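The table can be turned into a rough estimator. The sketch below uses only the unit prices and the three-fetch figure from the table; the function name, the assumption of one KMS request per fragment fetch, and the traffic numbers in the usage example are hypothetical. It ignores any free tier and key-rotation charges.

```python
CMK_USD_PER_MONTH = 1.00          # per customer-managed key
KMS_USD_PER_10K_REQUESTS = 0.03   # SSE-KMS API calls
FRAGMENTS_PER_CACHE_MISS = 3      # Assembly Lambda fragment fetches per miss


def monthly_kms_cost(tenants: int,
                     cache_misses_per_tenant: int,
                     bucket_key_reduction: float = 0.0) -> dict:
    """Estimate monthly KMS spend in USD.

    Assumes one KMS request per fragment fetch on a cache miss, reduced by
    `bucket_key_reduction` (0.0 to 1.0) when S3 Bucket Keys are enabled.
    """
    if not 0.0 <= bucket_key_reduction <= 1.0:
        raise ValueError("bucket_key_reduction must be between 0.0 and 1.0")
    key_cost = tenants * CMK_USD_PER_MONTH
    kms_requests = (tenants * cache_misses_per_tenant * FRAGMENTS_PER_CACHE_MISS
                    * (1.0 - bucket_key_reduction))
    api_cost = kms_requests / 10_000 * KMS_USD_PER_10K_REQUESTS
    return {
        "keys_usd": round(key_cost, 2),
        "api_usd": round(api_cost, 2),
        "total_usd": round(key_cost + api_cost, 2),
    }


# Hypothetical traffic: 1,000 tenants, 100,000 cache misses per tenant per month.
print(monthly_kms_cost(1_000, 100_000))
print(monthly_kms_cost(1_000, 100_000, bucket_key_reduction=0.99))
```

Under these assumed volumes the fixed per-key charge dominates once Bucket Keys are enabled, so the number of tenants drives the bill more than request volume does.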
Recommendation
For Launch
- Enable EFS encryption at rest (single AWS-managed key). Checkbox, zero cost, zero feature impact.
- Enable S3 default encryption (SSE-S3). Also a checkbox.
- CloudTrail logging on all EFS and S3 data access.
- Restrictive IAM policies: Lambda role can access EFS, human roles cannot without break-glass procedure.
- Be honest in the privacy policy: "Data encrypted at rest. Operator access restricted by IAM policy and audit logging. We cannot read your data without a deliberate policy override that is logged."
This is what B2B customers actually evaluate — not zero-knowledge, but "can you prove who accessed my data and when."
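One way the restrictive-IAM item might be expressed is as an EFS file system policy that denies client access to every principal except the Lambda role and a dedicated break-glass role. The sketch below only builds the policy document; all ARNs and the helper name are placeholders, the statement has not been validated against a live account, and the Lambda role would still need its own identity-based allow.

```python
import json


def efs_lockdown_policy(file_system_arn: str,
                        lambda_role_arn: str,
                        break_glass_role_arn: str) -> dict:
    """Build a deny-by-default EFS file system policy (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyClientAccessExceptLambdaAndBreakGlass",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": [
                    "elasticfilesystem:ClientMount",
                    "elasticfilesystem:ClientWrite",
                    "elasticfilesystem:ClientRootAccess",
                ],
                "Resource": file_system_arn,
                # Only these two roles escape the deny; everyone else is blocked.
                "Condition": {
                    "ArnNotEquals": {
                        "aws:PrincipalArn": [lambda_role_arn, break_glass_role_arn]
                    }
                },
            }
        ],
    }


# Placeholder ARNs for illustration only.
policy = efs_lockdown_policy(
    "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-EXAMPLE",
    "arn:aws:iam::111122223333:role/wiki-lambda",
    "arn:aws:iam::111122223333:role/break-glass",
)
print(json.dumps(policy, indent=2))
```

Assuming the break-glass role is itself tightly controlled and alarmed on, each use of it becomes the "deliberate policy override that is logged" that the privacy statement promises.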
Per-User KMS Becomes Viable When
The storage model changes to something that supports per-tenant keys natively:
- Per-tenant S3 buckets for git repos (via `git-remote-s3` or similar): S3 SSE-KMS works naturally per-bucket
- DynamoDB with the DynamoDB Encryption Client for item-level encryption
- Any storage backend where the encryption boundary aligns with the tenant boundary
This is a storage architecture decision, not a bolt-on to the current EFS/git model.
Full Zero-Knowledge: Conditions for Revisiting
Worth pursuing if/when:
- The product has enough traction that trust is a competitive differentiator
- The storage model has moved off EFS
- There's willingness to sacrifice server-side search (or invest in encrypted indexes à la Proton Mail)
- MCP key provisioning has a credible UX (user provisions key per agent session)
Precedents Referenced
- Standard Notes — PBKDF2-derived key, client-side encryption, client-side search only. Simple data model (text blobs, no git).
- Proton Drive/Mail — per-user asymmetric keys, Web Crypto API in browser. Invested heavily in encrypted search indexes.
- git-crypt — encrypts blobs in git repo, decrypts on clone with local key. Works for developer workflows, not web UIs.