4f2a4d Claude (MCP) 2026-03-14 19:03:54
[mcp] E-3 design spike: encryption approaches evaluated against EFS/S3/CDN architecture
---
category: reference
tags: [design, security, encryption, privacy]
last_updated: 2026-03-14
confidence: high
---

# E-3 Design Spike: Client-Side Encryption / Zero-Knowledge Storage

**Status:** Spike complete; recommendation below
**Relates to:** [[Tasks/Emergent]] (E-3), [[Design/CDN_Read_Path]], [[Design/Platform_Overview]]

## Problem

The current privacy claim is "your wiki is private by default," but the operator can read data at rest on EFS. For a product whose pitch is "memory for your agents," the data is inherently sensitive. This spike evaluates encryption approaches ranging from filesystem-level encryption at rest, through per-tenant KMS keys, to full zero-knowledge.

## Storage Layers

Tenant data lives in two places:

1. **EFS**: git repos. Source of truth. Full page content, edit history, metadata.
2. **S3**: rendered HTML fragments (per [[Design/CDN_Read_Path]]). Derived data for CDN serving. Less sensitive (no history, no metadata).

Any encryption strategy must be evaluated against both layers.

## Approaches Evaluated

### 1. EFS Encryption at Rest (Single KMS Key)

Enable EFS's built-in encryption. One key for the entire filesystem.

| Dimension | Assessment |
|---|---|
| Protects against | Physical disk theft at AWS facility |
| Does NOT protect against | Operator access, compromised IAM credentials, AWS insider |
| Effort | Checkbox at filesystem creation, near zero. An existing unencrypted filesystem must be migrated to a new encrypted one. |
| Cost | Free (AWS-managed key) or $1/month (customer managed key, CMK) |
| Feature impact | None |
| CDN interaction | None; encryption is transparent to all reads/writes |

**Verdict:** Necessary baseline. Does not provide tenant isolation.

### 2. Per-Tenant SSE-KMS on S3 Fragments

Each tenant's S3 fragments are encrypted with that tenant's own KMS CMK via SSE-KMS.

| Dimension | Assessment |
|---|---|
| Protects against | Cross-tenant S3 access if IAM policy is misconfigured |
| Does NOT protect against | EFS access (the actual source of truth) |
| Effort | Low-medium (key lifecycle management, per-object key selection) |
| Cost | $1/month per CMK × N tenants, plus API call costs |
| Feature impact | None |
| CDN interaction | Assembly Lambda decrypts transparently; CloudFront caches plaintext HTML for 30-60s as designed |

**Verdict:** Security theater. Fragments are derived data; protecting the derivative while the source (EFS) has no per-tenant protection adds complexity without meaningful security gain.
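
For reference, "per-object key selection" could look roughly like the sketch below. It only builds the keyword arguments for an S3 upload; the tenant-to-key mapping, the placeholder key ARNs, the bucket name, and the `tenant/page.html` object layout are all assumptions made for illustration, not part of the current design.

```python
# Hypothetical mapping from tenant id to that tenant's KMS key ARN.
# In practice this would come from a tenant registry; the ARNs are placeholders.
TENANT_KEY_ARNS = {
    "acme": "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000001",
    "globex": "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000002",
}


def fragment_put_args(bucket: str, tenant_id: str, page: str, html: bytes) -> dict:
    """Build S3 PutObject arguments that encrypt one fragment with the tenant's key.

    Raises KeyError for an unknown tenant so a fragment is never written
    under a default or shared key by accident.
    """
    key_arn = TENANT_KEY_ARNS[tenant_id]
    return {
        "Bucket": bucket,
        "Key": f"{tenant_id}/{page}.html",  # assumed object layout
        "Body": html,
        "ContentType": "text/html; charset=utf-8",
        "ServerSideEncryption": "aws:kms",  # SSE-KMS rather than SSE-S3
        "SSEKMSKeyId": key_arn,             # per-tenant key chosen per object
        "BucketKeyEnabled": True,           # fewer KMS calls per tenant key
    }
```

With boto3 available, the result would be passed as `boto3.client("s3").put_object(**fragment_put_args(...))`. The sketch shows why the effort is only low-medium: the write path changes by a few parameters, while key creation, rotation, and deletion remain the real work.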

### 3. Separate EFS Filesystems Per Tenant

Each tenant gets their own EFS filesystem with its own KMS key.

| Dimension | Assessment |
|---|---|
| Protects against | Cross-tenant access at the infrastructure level; operator access can be restricted per key |
| Effort | Very high: per-tenant Lambda config or mux layer, mount target management |
| Cost | Mount targets: ~$0.05/hr/AZ each (assumed rate). At 1,000 tenants × 2 AZs: 2,000 × $0.05 × 8,760 h ≈ $876K/year in mount targets alone |
| Feature impact | Lambda can only mount one EFS access point per function, so this requires per-tenant functions or a routing layer |
| CDN interaction | None; read path uses S3 fragments, not EFS directly |

**Verdict:** Strong isolation but operationally brutal and cost-prohibitive at scale.

### 4. Application-Level Encryption on EFS

Encrypt file contents before writing to git, decrypt after reading.

| Dimension | Assessment |
|---|---|
| Protects against | Operator reading data at rest |
| Effort | High |
| Cost | KMS API calls for envelope encryption |
| Feature impact | **Breaks git in practice.** Dulwich can still hash and store ciphertext, but encrypted blobs are opaque binary: no meaningful diffs, no blame, and a log that shows only that a file changed. The value of git-backed storage evaporates. |
| CDN interaction | Fragments would be rendered from decrypted content, so no impact on CDN path |

**Verdict:** Incompatible with git-backed storage model.
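
The feature-impact row can be illustrated with a small sketch. It computes git blob ids the way git does (SHA-1 over a `blob <len>\0` header plus the content) and compares plaintext with ciphertext. The cipher here is a deliberately toy construction standing in for a real AEAD such as AES-GCM, used only so the example runs on the standard library; it is not secure and is not what this design would use.

```python
import difflib
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 of 'blob <len>\\0' + content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """TOY stand-in for a real cipher (NOT secure): random nonce + hash keystream XOR."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


key = os.urandom(32)
v1 = b"# Home\nAgent memory page.\n"
v2 = b"# Home\nAgent memory page, edited.\n"

# Plaintext: a one-line edit produces a readable line-level diff.
plain_diff = list(
    difflib.unified_diff(v1.decode().splitlines(), v2.decode().splitlines(), lineterm="")
)

# Ciphertext: even the SAME page encrypted twice gives two unrelated blobs,
# so blob ids no longer track content and byte-level diffs carry no meaning.
c1 = toy_encrypt(key, v1)
c2 = toy_encrypt(key, v1)
ids_match = git_blob_id(c1) == git_blob_id(c2)
```

Here `plain_diff` contains the changed line, while `ids_match` is false for any fresh pair of nonces. Git keeps working as a byte store, but everything the wiki relies on it for (diff, blame, deduplication) stops being useful.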

### 5. Full Client-Side Encryption (Zero-Knowledge)

Content is encrypted in the browser/agent before reaching the server. Apart from the MCP caveat in the table, the server never sees plaintext.

| Dimension | Assessment |
|---|---|
| Protects against | Operator, AWS, and infrastructure compromise, except where the server handles plaintext transiently (see MCP under Feature impact) |
| Effort | Very high: new client-side crypto layer, key management, SPA rewrite |
| Feature impact | Severe: server-side search is impossible (the server cannot embed ciphertext); MCP agents need key provisioning, and the server sees plaintext transiently during their sessions; the web UI must decrypt in the browser via the Web Crypto API |
| CDN interaction | **Incompatible with CDN caching as designed.** CloudFront cannot serve rendered HTML when decryption requires per-user keys. Either disable caching (defeats E-2) or cache ciphertext and decrypt client-side (requires SPA, Option D from CDN design). |

**Verdict:** Architecturally incompatible with current design. Would require rethinking storage (not git), rendering (SPA), search (client-side or encrypted indexes), MCP key management, and CDN strategy. This is a ground-up rebuild, not a feature addition.

## CDN Caching Interaction

Regardless of at-rest encryption approach, the CDN read path (per [[Design/CDN_Read_Path]]) caches **decrypted HTML** at CloudFront edge nodes for 30-60s. This is architecturally standard (every encrypted-at-rest + CDN system works this way), but it means:

- AWS CloudFront infrastructure sees plaintext during the cache window
- Any claim of "zero-knowledge" is false while CDN caching is active
- Auth (CloudFront Functions JWT validation) gates who can read the cache, but the content exists in plaintext at the edge
- Per-user KMS does not change this: content is decrypted before it reaches CloudFront regardless of who holds the key

CDN caching of rendered HTML and zero-knowledge are fundamentally in tension. With the current read path you can have fast reads or end-to-end encryption, not both.

## KMS Cost Model at Scale

| Item | Cost |
|---|---|
| CMK per tenant | $1/month each |
| 1,000 tenants | $1,000/month for keys alone |
| KMS API calls (SSE-KMS) | $0.03 per 10,000 requests |
| Assembly Lambda: 3 fragment fetches per cache miss | Multiplied by request volume and short TTLs |

S3 Bucket Keys reduce KMS API calls by up to 99% within a single CMK, but per-tenant keys mean per-tenant bucket key generation, so the savings do not pool across tenants.
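
The line items above can be combined into a rough monthly estimate. The sketch below uses the prices from the table and assumes, for simplicity, that every fragment fetch on a cache miss costs one KMS request unless Bucket Keys reduce it; the cache-miss volume and the reduction factor are hypothetical inputs, not measurements.

```python
CMK_MONTHLY_USD = 1.00            # per tenant key, from the table above
KMS_USD_PER_10K_REQUESTS = 0.03   # from the table above
FRAGMENTS_PER_CACHE_MISS = 3      # Assembly Lambda fetches, from the table above


def monthly_kms_cost(
    tenants: int,
    cache_misses_per_month: int,
    bucket_key_reduction: float = 0.0,
) -> float:
    """Estimated monthly KMS spend in USD.

    bucket_key_reduction is the assumed fraction of KMS requests avoided
    by S3 Bucket Keys (0.0 = none, 0.99 = the best case quoted above).
    """
    key_cost = tenants * CMK_MONTHLY_USD
    kms_requests = (
        cache_misses_per_month * FRAGMENTS_PER_CACHE_MISS * (1.0 - bucket_key_reduction)
    )
    request_cost = kms_requests / 10_000 * KMS_USD_PER_10K_REQUESTS
    return key_cost + request_cost
```

For example, `monthly_kms_cost(1000, 10_000_000)` comes to about $1,090: the fixed $1,000 in key fees dominates, and API calls add roughly $90 at that hypothetical volume. Under this model the per-key fee, not request traffic, is the main cost driver until miss volume grows by another order of magnitude.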

## Recommendation

### For Launch

1. **Enable EFS encryption at rest** (single AWS-managed key). Checkbox, zero cost, zero feature impact.
2. **Enable S3 default encryption** (SSE-S3). Also a checkbox.
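3. **CloudTrail logging**: S3 data events, plus EFS and KMS API calls. CloudTrail does not record file-level NFS reads and writes on EFS, so access to the mount itself has to be audited through the IAM and break-glass controls in item 4.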
4. **Restrictive IAM policies**: Lambda role can access EFS, human roles cannot without break-glass procedure.
5. **Be honest in the privacy policy**: "Data encrypted at rest. Operator access restricted by IAM policy and audit logging. We cannot read your data without a deliberate policy override that is logged."

This is what B2B customers actually evaluate: not zero-knowledge, but "can you prove who accessed my data and when."
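
Item 4 in the launch list (restrictive IAM policies) could be expressed as an EFS file system policy along the lines below. This is an untested sketch: the role ARNs are placeholders, the two-role split (application role plus a break-glass role) is an assumption about how the procedure would be set up, and the policy should be validated against AWS documentation before use.

```python
import json


def efs_file_system_policy(
    file_system_arn: str,
    lambda_role_arn: str,
    break_glass_role_arn: str,
) -> dict:
    """Build a resource policy that lets only two roles mount the filesystem."""
    allowed_roles = [lambda_role_arn, break_glass_role_arn]
    client_actions = [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The application role and the break-glass role may mount and write.
                "Sid": "AllowAppAndBreakGlass",
                "Effect": "Allow",
                "Principal": {"AWS": allowed_roles},
                "Action": client_actions,
                "Resource": file_system_arn,
            },
            {
                # Every other principal is explicitly denied, including root access.
                "Sid": "DenyEveryoneElse",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": client_actions + ["elasticfilesystem:ClientRootAccess"],
                "Resource": file_system_arn,
                "Condition": {"ArnNotEquals": {"aws:PrincipalArn": allowed_roles}},
            },
        ],
    }


# Placeholder ARNs for illustration only.
policy_json = json.dumps(
    efs_file_system_policy(
        "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-EXAMPLE",
        "arn:aws:iam::111122223333:role/wiki-lambda",
        "arn:aws:iam::111122223333:role/break-glass",
    ),
    indent=2,
)
```

The audit trail in this arrangement comes from the break-glass role rather than from the filesystem: assuming that role is an API call that CloudTrail records, which is what would back the "deliberate policy override that is logged" wording in item 5.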

### Per-User KMS Becomes Viable When

The storage model changes to something that supports per-tenant keys natively:

- **Per-tenant S3 buckets** for git repos (via `git-remote-s3` or similar), where S3 SSE-KMS works naturally per bucket
- **DynamoDB** with the DynamoDB Encryption Client (since renamed the AWS Database Encryption SDK) for item-level encryption
- Any storage backend where the encryption boundary aligns with the tenant boundary

This is a storage architecture decision, not a bolt-on to the current EFS/git model.

### Full Zero-Knowledge: Conditions for Revisiting

Worth pursuing if/when:

- The product has enough traction that trust is a competitive differentiator
- The storage model has moved off EFS
- There's willingness to sacrifice server-side search (or invest in encrypted indexes à la Proton Mail)
- MCP key provisioning has a credible UX (user provisions key per agent session)

## Precedents Referenced

- **Standard Notes**: PBKDF2-derived key, client-side encryption, client-side search only. Simple data model (text blobs, no git).
- **Proton Drive/Mail**: per-user asymmetric keys, Web Crypto API in browser. Invested heavily in encrypted search indexes.
- **git-crypt**: encrypts blobs in git repo, decrypts on clone with local key. Works for developer workflows, not web UIs.
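
To make the "PBKDF2-derived key" idea from the first precedent concrete, here is a minimal sketch of deriving a 256-bit key from a passphrase with Python's standard library. It is not Standard Notes' actual scheme; the hash choice, salt length, and iteration count are assumptions picked for illustration.

```python
import hashlib
import os


def new_salt() -> bytes:
    """Random per-user salt, stored alongside the account (it is not secret)."""
    return os.urandom(16)


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256.

    The iteration count is an assumed default; it trades login latency
    against the cost of offline guessing.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
```

The point for a zero-knowledge design is that the key is recomputed on the client from something only the user knows, so the server stores the salt and ciphertext but never the key. That is also why a lost passphrase means lost data, and why MCP agents would need the key provisioned to them explicitly.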