Properties
category: reference
tags: [design, security, encryption, privacy]
last_updated: 2026-03-14
confidence: high

E-3 Design Spike: Client-Side Encryption / Zero-Knowledge Storage

Status: Spike complete — recommendation below
Relates to: Tasks/Emergent (E-3), Design/CDN_Read_Path, Design/Platform_Overview

Problem

The current privacy claim is "your wiki is private by default" — but the operator can read data at rest on EFS. For a product whose pitch is "memory for your agents," the data is inherently sensitive. This spike evaluates encryption approaches from per-user KMS keys through full zero-knowledge.

Storage Layers

Tenant data lives in two places:

  1. EFS — git repos. Source of truth. Full page content, edit history, metadata.
  2. S3 — rendered HTML fragments (per Design/CDN_Read_Path). Derived data for CDN serving. Less sensitive (no history, no metadata).

Any encryption strategy must be evaluated against both layers.

Approaches Evaluated

1. EFS Encryption at Rest (Single KMS Key)

Enable EFS's built-in encryption. One key for the entire filesystem.

  • Protects against: Physical disk theft at an AWS facility
  • Does NOT protect against: Operator access, compromised IAM credentials, AWS insiders
  • Effort: Checkbox — near zero
  • Cost: Free (AWS-managed key) or $1/month (CMK)
  • Feature impact: None
  • CDN interaction: None — encryption is transparent to all reads/writes

Verdict: Necessary baseline. Does not provide tenant isolation.

2. Per-Tenant SSE-KMS on S3 Fragments

Each tenant's S3 fragments encrypted with their own KMS CMK via SSE-KMS.

  • Protects against: Cross-tenant S3 access if an IAM policy is misconfigured
  • Does NOT protect against: EFS access (the actual source of truth)
  • Effort: Low-medium (key lifecycle management, per-object key selection)
  • Cost: $1/month per CMK × N tenants, plus API call costs
  • Feature impact: None
  • CDN interaction: Assembly Lambda decrypts transparently; CloudFront caches plaintext HTML for 30-60s as designed

Verdict: Security theater. Fragments are derived data — protecting the derivative while the source (EFS) is unprotected adds complexity without meaningful security gain.

3. Separate EFS Filesystems Per Tenant

Each tenant gets their own EFS filesystem with its own KMS key.

  • Protects against: Cross-tenant access — isolation at the infrastructure level; operator access restricted per-key
  • Effort: Very high — per-tenant Lambda config or mux layer, mount target management
  • Cost: Mount targets ~$0.05/hr/AZ each. At 1,000 tenants × 2 AZs: ~$36K/year in mount targets alone
  • Feature impact: Lambda can only mount one EFS access point per function — requires per-tenant functions or a routing layer
  • CDN interaction: None — read path uses S3 fragments, not EFS directly

Verdict: Strong isolation but operationally brutal and cost-prohibitive at scale.

4. Application-Level Encryption on EFS

Encrypt file contents before writing to git, decrypt after reading.

  • Protects against: Operator reading data at rest
  • Effort: High
  • Cost: KMS API calls for envelope encryption
  • Feature impact: Breaks git. Dulwich needs plaintext to compute SHAs, produce diffs, and walk history. Encrypted blobs are opaque binary — no diffs, no blame, no log. The value of git-backed storage evaporates.
  • CDN interaction: Fragments would be rendered from decrypted content, so no impact on the CDN path

Verdict: Incompatible with git-backed storage model.
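Why this breaks git can be shown directly: git addresses blobs by a SHA-1 over the plaintext, so any encryption scheme with a random nonce (which any sound scheme uses) makes identical content hash differently on every write — no dedup, no stable diffs, no meaningful history. A minimal sketch, using plain hashlib in place of Dulwich's object hashing and a toy cipher standing in for real encryption:

```python
import hashlib
import os

def git_blob_sha(content: bytes) -> str:
    # Git hashes "blob <len>\0<content>" — content-addressing over plaintext.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def mock_encrypt(content: bytes) -> bytes:
    # Stand-in for real encryption: the random nonce makes output
    # non-deterministic, just as AES-GCM or similar would.
    nonce = os.urandom(12)
    return nonce + bytes(b ^ 0x5A for b in content)  # toy cipher, illustration only

page = b"= My Page =\nHello, wiki.\n"

# Plaintext: the same content always hashes to the same blob SHA.
assert git_blob_sha(page) == git_blob_sha(page)

# Ciphertext: every save of the SAME content yields a different blob — git
# sees an opaque binary change, so no diffs, no blame, no dedup.
print(git_blob_sha(mock_encrypt(page)) == git_blob_sha(mock_encrypt(page)))
# → False (the nonces differ, so the SHAs differ)
```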

5. Full Client-Side Encryption (Zero-Knowledge)

Content encrypted in the browser/agent before reaching the server. Server never sees plaintext.

  • Protects against: Everything — operator, AWS, infrastructure compromise
  • Effort: Very high — new client-side crypto layer, key management, SPA rewrite
  • Feature impact: Severe. Server-side search becomes impossible (can't embed ciphertext), MCP agents need key provisioning (the server sees plaintext transiently during sessions), and the web UI must decrypt in the browser via the Web Crypto API
  • CDN interaction: Incompatible with CDN caching. CloudFront cannot cache encrypted content that requires per-user decryption keys. Either disable caching (defeats E-2) or cache ciphertext and decrypt client-side (requires the SPA, Option D from the CDN design)

Verdict: Architecturally incompatible with current design. Would require rethinking storage (not git), rendering (SPA), search (client-side or encrypted indexes), MCP key management, and CDN strategy. This is a ground-up rebuild, not a feature addition.

CDN Caching Interaction

Regardless of at-rest encryption approach, the CDN read path (per Design/CDN_Read_Path) caches decrypted HTML at CloudFront edge nodes for 30-60s. This is architecturally standard (every encrypted-at-rest + CDN system works this way), but it means:

  • AWS CloudFront infrastructure sees plaintext during the cache window
  • Any claim of "zero-knowledge" is false while CDN caching is active
  • Auth (CloudFront Functions JWT validation) gates who can read the cache, but the content exists in plaintext at the edge
  • Per-user KMS does not change this — content is decrypted before it reaches CloudFront regardless of who holds the key

CDN caching and zero-knowledge are fundamentally in tension. You can have fast reads or end-to-end encryption, not both.
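The fork can be made concrete with a toy edge-cache model (names and bodies are illustrative, not the real CDN_Read_Path code): whatever body the origin returns is exactly what sits at the edge for the TTL, so the choice is plaintext-at-edge or ciphertext-plus-client-keys.

```python
import time

class EdgeCache:
    """Toy CloudFront edge: stores whatever the origin returns, for ttl seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}  # path -> (body, expires_at)

    def get(self, path, origin_fetch):
        entry = self.store.get(path)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]          # served from edge; origin not consulted
        body = origin_fetch(path)
        self.store[path] = (body, now + self.ttl)
        return body

# Option 1: origin decrypts, edge caches plaintext (the current read-path design).
plaintext_edge = EdgeCache(ttl=60)
plaintext_edge.get("/page", lambda p: "<h1>secret</h1>")
# The cached body IS the plaintext for the whole TTL window.
assert "secret" in plaintext_edge.store["/page"][0]

# Option 2: edge caches ciphertext — zero-knowledge holds at the edge, but
# every reader now needs the key client-side (the SPA / Option D path).
cipher_edge = EdgeCache(ttl=60)
cipher_edge.get("/page", lambda p: "AES-GCM:9f3a...")  # opaque to the edge
assert "secret" not in cipher_edge.store["/page"][0]
```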

KMS Cost Model at Scale

  • CMK per tenant: $1/month each
  • 1,000 tenants: $1,000/month for keys alone
  • KMS API calls (SSE-KMS): $0.03 per 10,000 requests
  • Assembly Lambda (3 fragment fetches per cache miss): multiplied by request volume and short TTLs

S3 Bucket Keys reduce API calls by ~99% within a single CMK, but per-tenant keys mean per-tenant bucket key generation — the cross-tenant savings don't apply.
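The table above reduces to a quick back-of-envelope calculation. A sketch, where the request volume and cache-miss count are illustrative assumptions rather than measured numbers:

```python
def kms_monthly_cost(tenants: int,
                     cache_misses_per_month: int,
                     fragments_per_miss: int = 3,
                     cmk_per_month: float = 1.00,
                     per_10k_requests: float = 0.03) -> float:
    """Rough monthly KMS spend: per-tenant CMKs plus SSE-KMS decrypt calls
    made by the assembly Lambda on cache misses. Illustrative model only."""
    key_cost = tenants * cmk_per_month
    api_calls = cache_misses_per_month * fragments_per_miss
    api_cost = api_calls / 10_000 * per_10k_requests
    return key_cost + api_cost

# 1,000 tenants, an assumed 10M cache misses/month:
# $1,000 in keys + $90 in decrypt calls — the keys dominate.
print(round(kms_monthly_cost(1_000, 10_000_000), 2))  # → 1090.0
```

Even at high request volumes the per-tenant CMKs, not the API calls, drive the bill — which is why the $1/month/tenant line item is the one that scales badly.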

Recommendation

For Launch

  1. Enable EFS encryption at rest (single AWS-managed key). Checkbox, zero cost, zero feature impact.
  2. Enable S3 default encryption (SSE-S3). Also a checkbox.
  3. CloudTrail logging on all EFS and S3 data access.
  4. Restrictive IAM policies: the Lambda role can access EFS; human roles cannot without a break-glass procedure.
  5. Be honest in the privacy policy: "Data encrypted at rest. Operator access restricted by IAM policy and audit logging. We cannot read your data without a deliberate policy override that is logged."

This is what B2B customers actually evaluate — not zero-knowledge, but "can you prove who accessed my data and when."

Per-User KMS Becomes Viable When

The storage model changes to something that supports per-tenant keys natively:

  • Per-tenant S3 buckets for git repos (via git-remote-s3 or similar) — S3 SSE-KMS works naturally per-bucket
  • DynamoDB with the DynamoDB Encryption Client for item-level encryption
  • Any storage backend where the encryption boundary aligns with the tenant boundary

This is a storage architecture decision, not a bolt-on to the current EFS/git model.

Full Zero-Knowledge: Conditions for Revisiting

Worth pursuing if/when:

  • The product has enough traction that trust is a competitive differentiator
  • The storage model has moved off EFS
  • There's willingness to sacrifice server-side search (or invest in encrypted indexes à la Proton Mail)
  • MCP key provisioning has a credible UX (user provisions key per agent session)

Precedents Referenced

  • Standard Notes — PBKDF2-derived key, client-side encryption, client-side search only. Simple data model (text blobs, no git).
  • Proton Drive/Mail — per-user asymmetric keys, Web Crypto API in browser. Invested heavily in encrypted search indexes.
  • git-crypt — encrypts blobs in git repo, decrypts on clone with local key. Works for developer workflows, not web UIs.
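The Standard Notes pattern — a key derived client-side from the user's password, never sent to the server — can be sketched with stdlib primitives. The parameters and the HMAC stand-in are illustrative (stdlib has no AES, and a real deployment would prefer Argon2 or scrypt plus an AEAD cipher); the point is only what the server does and does not see:

```python
import hashlib
import hmac
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Runs client-side: the server only ever sees the salt,
    # never the password or the derived key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)   # stored server-side in the clear — salts are not secret
key = derive_key("correct horse battery staple", salt)

# Deterministic: the same password + salt re-derives the same key on any device,
# which is what makes multi-device access work without server-held keys.
assert key == derive_key("correct horse battery staple", salt)

# The server stores only an opaque blob; HMAC stands in here for a real
# AEAD ciphertext, so this sketch demonstrates key handling, not encryption.
tag = hmac.new(key, b"page content", hashlib.sha256).hexdigest()
wrong = hmac.new(derive_key("wrong password", salt),
                 b"page content", hashlib.sha256).hexdigest()
assert not hmac.compare_digest(tag, wrong)
```

Note what this costs: without the password there is no key, so there is also no server-side search, no server-side rendering, and no recovery if the user forgets the password — exactly the trade-offs listed under Approach 5.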