Commit 4f2a4d
2026-03-14 19:03:54 Claude (MCP): [mcp] E-3 design spike: encryption approaches evaluated against EFS/S3/CDN architecture
New file: Design/E-3_Encryption_Spike.md (156 lines)
---
category: reference
tags: [design, security, encryption, privacy]
last_updated: 2026-03-14
confidence: high
---

# E-3 Design Spike: Client-Side Encryption / Zero-Knowledge Storage

**Status:** Spike complete — recommendation below
**Relates to:** [[Tasks/Emergent]] (E-3), [[Design/CDN_Read_Path]], [[Design/Platform_Overview]]

## Problem

The current privacy claim is "your wiki is private by default" — but the operator can read data at rest on EFS. For a product whose pitch is "memory for your agents," the data is inherently sensitive. This spike evaluates encryption approaches from per-user KMS keys through full zero-knowledge.

## Storage Layers

Tenant data lives in two places:

1. **EFS** — git repos. Source of truth. Full page content, edit history, metadata.
2. **S3** — rendered HTML fragments (per [[Design/CDN_Read_Path]]). Derived data for CDN serving. Less sensitive (no history, no metadata).

Any encryption strategy must be evaluated against both layers.

## Approaches Evaluated

### 1. EFS Encryption at Rest (Single KMS Key)

Enable EFS's built-in encryption. One key for the entire filesystem.

| Dimension | Assessment |
|---|---|
| Protects against | Physical disk theft at AWS facility |
| Does NOT protect against | Operator access, compromised IAM credentials, AWS insider |
| Effort | Checkbox — near zero |
| Cost | Free (AWS-managed key) or $1/month (CMK) |
| Feature impact | None |
| CDN interaction | None — encryption is transparent to all reads/writes |

**Verdict:** Necessary baseline. Does not provide tenant isolation.
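Operationally this is one flag at filesystem creation. A minimal boto3-oriented sketch (the helper and creation-token value are ours for illustration; `Encrypted` and `KmsKeyId` are the real `create_file_system` parameters):

```python
from typing import Optional

def encrypted_efs_params(creation_token: str, kms_key_id: Optional[str] = None) -> dict:
    """Build kwargs for boto3's efs.create_file_system with encryption at rest.

    With no kms_key_id, AWS uses its managed key (no extra charge); passing
    a CMK ARN (~$1/month) gives control over key policy and rotation.
    Helper name is ours, not a boto3 API.
    """
    params = {"CreationToken": creation_token, "Encrypted": True}
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id
    return params
```

Note that encryption can only be chosen at creation time; an existing unencrypted filesystem has to be migrated.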

### 2. Per-Tenant SSE-KMS on S3 Fragments

Each tenant's S3 fragments encrypted with their own KMS CMK via SSE-KMS.

| Dimension | Assessment |
|---|---|
| Protects against | Cross-tenant S3 access if IAM policy is misconfigured |
| Does NOT protect against | EFS access (the actual source of truth) |
| Effort | Low-medium (key lifecycle management, per-object key selection) |
| Cost | $1/month per CMK × N tenants, plus API call costs |
| Feature impact | None |
| CDN interaction | Assembly Lambda decrypts transparently; CloudFront caches plaintext HTML for 30-60s as designed |

**Verdict:** Security theater. Fragments are derived data — protecting the derivative while the source (EFS) is unprotected adds complexity without meaningful security gain.
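For scale: the per-object key selection itself is trivial; the real cost is key lifecycle. A sketch of what the write path would pass to `s3.put_object`, with a hypothetical bucket name and key-alias convention:

```python
def fragment_put_args(tenant_id: str, fragment_key: str, html: bytes) -> dict:
    """kwargs for s3.put_object writing one rendered fragment under a
    tenant-specific CMK. Bucket name and alias convention are hypothetical;
    the SSE parameters are the real S3 API fields."""
    return {
        "Bucket": "wiki-fragments",
        "Key": f"{tenant_id}/{fragment_key}",
        "Body": html,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": f"alias/tenant-{tenant_id}",
        # Bucket Keys amortize KMS calls, but only within this tenant's CMK.
        "BucketKeyEnabled": True,
    }
```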

### 3. Separate EFS Filesystems Per Tenant

Each tenant gets their own EFS filesystem with its own KMS key.

| Dimension | Assessment |
|---|---|
| Protects against | Tenant isolation at infrastructure level; operator access restricted per-key |
| Effort | Very high — per-tenant Lambda config or mux layer, mount target management |
| Cost | Mount targets: ~$0.05/AZ/day each. At 1,000 tenants × 2 AZs: ~$36.5K/year in mount targets alone |
| Feature impact | Lambda can only mount one EFS access point per function — requires per-tenant functions or a routing layer |
| CDN interaction | None — read path uses S3 fragments, not EFS directly |

**Verdict:** Strong isolation but operationally brutal and cost-prohibitive at scale.

### 4. Application-Level Encryption on EFS

Encrypt file contents before writing to git, decrypt after reading.

| Dimension | Assessment |
|---|---|
| Protects against | Operator reading data at rest |
| Effort | High |
| Cost | KMS API calls for envelope encryption |
| Feature impact | **Breaks git.** Dulwich needs plaintext to compute SHAs, produce diffs, walk history. Encrypted blobs are opaque binary — no diffs, no blame, no log. The value of git-backed storage evaporates. |
| CDN interaction | Fragments would be rendered from decrypted content, so no impact on CDN path |

**Verdict:** Incompatible with git-backed storage model.
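The incompatibility is easy to demonstrate: git identifies a blob by a SHA-1 over its plaintext, so any cipher with a fresh nonce turns every save of unchanged content into a brand-new opaque object. A toy XOR construction stands in for real AES below; everything else is exactly what git/dulwich compute:

```python
import hashlib, os

def git_blob_sha(data: bytes) -> str:
    """Object ID as git/dulwich compute it: SHA-1 over a blob header + content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for AES-GCM: fresh random nonce + keystream XOR. NOT real crypto."""
    nonce = os.urandom(16)
    stream = hashlib.sha256(key + nonce).digest() * (len(plaintext) // 32 + 1)
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

page = b"# Meeting notes\n"
key = b"tenant-data-key"

# Plaintext: saving unchanged content is a no-op (same object ID).
assert git_blob_sha(page) == git_blob_sha(page)

# Ciphertext: every save yields a new object ID, so history, diffs, blame,
# and delta compression all degrade to "binary file changed".
assert git_blob_sha(toy_encrypt(key, page)) != git_blob_sha(toy_encrypt(key, page))
```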

### 5. Full Client-Side Encryption (Zero-Knowledge)

Content encrypted in the browser/agent before reaching the server. Server never sees plaintext.

| Dimension | Assessment |
|---|---|
| Protects against | Everything — operator, AWS, infrastructure compromise |
| Effort | Very high — new client-side crypto layer, key management, SPA rewrite |
| Feature impact | Severe: server-side search impossible (can't embed ciphertext), MCP agents need key provisioning (server sees plaintext transiently during sessions), web UI must decrypt in browser via Web Crypto API |
| CDN interaction | **Incompatible with CDN caching.** CloudFront cannot cache encrypted content that requires per-user decryption keys. Either disable caching (defeats E-2) or cache ciphertext and decrypt client-side (requires SPA, Option D from CDN design). |

**Verdict:** Architecturally incompatible with current design. Would require rethinking storage (not git), rendering (SPA), search (client-side or encrypted indexes), MCP key management, and CDN strategy. This is a ground-up rebuild, not a feature addition.
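To make the search cost concrete, here is the zero-knowledge flow in miniature, with PBKDF2 key derivation (the Standard Notes approach) and a toy XOR stream standing in for real authenticated encryption; iteration count and passphrase are illustrative:

```python
import hashlib, os

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Client-side PBKDF2 key derivation; passphrase and key never leave the client."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 600_000)

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher standing in for AES-GCM. NOT real crypto."""
    nonce = os.urandom(16)
    stream = hashlib.sha256(key + nonce).digest() * (len(plaintext) // 32 + 1)
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """XOR is symmetric: regenerate the keystream from the stored nonce."""
    nonce, body = blob[:16], blob[16:]
    stream = hashlib.sha256(key + nonce).digest() * (len(body) // 32 + 1)
    return bytes(c ^ s for c, s in zip(body, stream))

salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
ciphertext = toy_encrypt(key, b"quarterly revenue projections")

# This opaque blob is all the server ever stores or indexes: substring
# search finds nothing, so server-side full-text search is off the table.
assert b"revenue" not in ciphertext
assert toy_decrypt(key, ciphertext) == b"quarterly revenue projections"
```

Only a holder of the key, in the browser (Web Crypto) or in a key-provisioned agent session, can turn that blob back into searchable text.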

## CDN Caching Interaction

Regardless of at-rest encryption approach, the CDN read path (per [[Design/CDN_Read_Path]]) caches **decrypted HTML** at CloudFront edge nodes for 30-60s. This is architecturally standard (every encrypted-at-rest + CDN system works this way), but it means:

- AWS CloudFront infrastructure sees plaintext during the cache window
- Any claim of "zero-knowledge" is false while CDN caching is active
- Auth (CloudFront Functions JWT validation) gates who can read the cache, but the content exists in plaintext at the edge
- Per-user KMS does not change this — content is decrypted before it reaches CloudFront regardless of who holds the key

CDN caching and zero-knowledge are fundamentally in tension. You can have fast reads or end-to-end encryption, not both.

## KMS Cost Model at Scale

| Item | Cost |
|---|---|
| CMK per tenant | $1/month each |
| 1,000 tenants | $1,000/month for keys alone |
| KMS API calls (SSE-KMS) | $0.03 per 10,000 requests |
| Assembly Lambda: 3 fragment fetches per cache miss | Multiplied by request volume and short TTLs |

S3 Bucket Keys reduce API calls ~99% within a single CMK, but per-tenant keys mean per-tenant bucket keys — the cross-tenant savings don't apply.
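As arithmetic, using the rates quoted above (the cache-miss volume is an illustrative assumption, not a measured number):

```python
def kms_monthly_cost(tenants: int, kms_requests: int) -> float:
    """Monthly KMS spend at list price: $1 per tenant CMK,
    plus $0.03 per 10,000 API requests."""
    return tenants * 1.00 + (kms_requests / 10_000) * 0.03

# 1,000 tenants; assume 10M cache misses/month at 3 fragment fetches
# each -> 30M decrypt calls. Keys dominate at this volume.
keys_only = kms_monthly_cost(1_000, 0)                  # 1000.0
with_traffic = kms_monthly_cost(1_000, 3 * 10_000_000)  # 1090.0
```

At these assumed volumes the per-tenant keys themselves ($1,000/month) cost an order of magnitude more than the decrypt traffic ($90/month); the request side only bites if short TTLs push miss volume much higher.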

## Recommendation

### For Launch

1. **Enable EFS encryption at rest** (single AWS-managed key). Checkbox, zero cost, zero feature impact.
2. **Enable S3 default encryption** (SSE-S3). Also a checkbox.
3. **CloudTrail logging** on all EFS and S3 data access.
4. **Restrictive IAM policies**: the Lambda role can access EFS; human roles cannot without a break-glass procedure.
5. **Be honest in the privacy policy**: "Data encrypted at rest. Operator access restricted by IAM policy and audit logging. We cannot read your data without a deliberate policy override that is logged."

This is what B2B customers actually evaluate — not zero-knowledge, but "can you prove who accessed my data and when."
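Point 4 is the only item above that needs real design work. One possible shape, sketched as an IAM deny statement (the tag name and `Resource` scope are illustrative; the actions are EFS's standard data-plane permissions):

```python
import json

# Sketch: deny EFS data-plane access to human roles unless a logged
# break-glass principal tag is present. Attach to human roles only;
# the Lambda execution role is untouched and keeps its allow policy.
BREAK_GLASS_DENY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyHumanEFSAccessWithoutBreakGlass",
        "Effect": "Deny",
        "Action": [
            "elasticfilesystem:ClientMount",
            "elasticfilesystem:ClientWrite",
            "elasticfilesystem:ClientRootAccess",
        ],
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {"aws:PrincipalTag/break-glass": "true"}
        },
    }],
}

policy_json = json.dumps(BREAK_GLASS_DENY, indent=2)
```

Setting the `break-glass` tag is itself an IAM write that lands in CloudTrail, which is what makes the "deliberate, logged override" claim in point 5 true.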

### Per-User KMS Becomes Viable When

The storage model changes to something that supports per-tenant keys natively:

- **Per-tenant S3 buckets** for git repos (via `git-remote-s3` or similar) — S3 SSE-KMS works naturally per-bucket
- **DynamoDB** with the DynamoDB Encryption Client for item-level encryption
- Any storage backend where the encryption boundary aligns with the tenant boundary

This is a storage architecture decision, not a bolt-on to the current EFS/git model.

### Full Zero-Knowledge: Conditions for Revisiting

Worth pursuing if/when:

- The product has enough traction that trust is a competitive differentiator
- The storage model has moved off EFS
- There's willingness to sacrifice server-side search (or invest in encrypted indexes à la Proton Mail)
- MCP key provisioning has a credible UX (user provisions key per agent session)

## Precedents Referenced

- **Standard Notes** — PBKDF2-derived key, client-side encryption, client-side search only. Simple data model (text blobs, no git).
- **Proton Drive/Mail** — per-user asymmetric keys, Web Crypto API in browser. Invested heavily in encrypted search indexes.
- **git-crypt** — encrypts blobs in git repo, decrypts on clone with local key. Works for developer workflows, not web UIs.