Properties
category: reference
tags: [design, security, encryption, privacy]
last_updated: 2026-03-14
confidence: high

E-3 Design Spike: Client-Side Encryption / Zero-Knowledge Storage

Status: Spike complete — recommendation below
Relates to: Tasks/Emergent (E-3), Design/CDN_Read_Path, Design/Platform_Overview

Problem

The current privacy claim is "your wiki is private by default" — but the operator can read data at rest on EFS. For a product whose pitch is "memory for your agents," the data is inherently sensitive. This spike evaluates encryption approaches from per-user KMS keys through full zero-knowledge.

Storage Layers

Tenant data lives in two places:

  1. EFS — git repos. Source of truth. Full page content, edit history, metadata.
  2. S3 — rendered HTML fragments (per Design/CDN_Read_Path). Derived data for CDN serving. Less sensitive (no history, no metadata).

Any encryption strategy must be evaluated against both layers.

Approaches Evaluated

1. EFS Encryption at Rest (Single KMS Key)

Enable EFS's built-in encryption. One key for the entire filesystem.

  • Protects against: Physical disk theft at an AWS facility
  • Does NOT protect against: Operator access, compromised IAM credentials, AWS insiders
  • Effort: Checkbox — near zero
  • Cost: Free (AWS-managed key) or $1/month (CMK)
  • Feature impact: None
  • CDN interaction: None — encryption is transparent to all reads/writes

Verdict: Necessary baseline. Does not provide tenant isolation.

2. Per-Tenant SSE-KMS on S3 Fragments

Each tenant's S3 fragments encrypted with their own KMS CMK via SSE-KMS.

  • Protects against: Cross-tenant S3 access if an IAM policy is misconfigured
  • Does NOT protect against: EFS access (the actual source of truth)
  • Effort: Low-medium (key lifecycle management, per-object key selection)
  • Cost: $1/month per CMK × N tenants, plus API call costs
  • Feature impact: None
  • CDN interaction: Assembly Lambda decrypts transparently; CloudFront caches plaintext HTML for 30-60s as designed

Verdict: Security theater. Fragments are derived data — protecting the derivative while the source (EFS) is unprotected adds complexity without meaningful security gain.

3. Separate EFS Filesystems Per Tenant

Each tenant gets their own EFS filesystem with its own KMS key.

  • Protects against: Cross-tenant access — isolation at the infrastructure level; operator access restricted per-key
  • Effort: Very high — per-tenant Lambda config or mux layer, mount target management
  • Cost: Mount targets ~$0.05/hr/AZ each. At 1,000 tenants × 2 AZs: ~$36K/year in mount targets alone
  • Feature impact: Lambda can only mount one EFS access point per function — requires per-tenant functions or a routing layer
  • CDN interaction: None — read path uses S3 fragments, not EFS directly

Verdict: Strong isolation but operationally brutal and cost-prohibitive at scale.

4. Application-Level Encryption on EFS

Encrypt file contents before writing to git, decrypt after reading.

  • Protects against: Operator reading data at rest
  • Effort: High
  • Cost: KMS API calls for envelope encryption
  • Feature impact: Breaks git. Dulwich needs plaintext to compute SHAs, produce diffs, and walk history. Encrypted blobs are opaque binary — no diffs, no blame, no log. The value of git-backed storage evaporates.
  • CDN interaction: Fragments would be rendered from decrypted content, so no impact on the CDN path

Verdict: Incompatible with git-backed storage model.
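Why this breaks git can be shown directly: git addresses blobs by a SHA-1 over the plaintext, so any encryption scheme with a random nonce (which any sound scheme uses) makes identical content hash differently on every write — no dedup, no stable diffs, no meaningful history. A minimal sketch, using plain hashlib in place of Dulwich's object hashing and a toy cipher standing in for real encryption:

```python
import hashlib
import os

def git_blob_sha(content: bytes) -> str:
    # Git hashes "blob <len>\0<content>" — content-addressing over plaintext.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def mock_encrypt(content: bytes) -> bytes:
    # Stand-in for real encryption: the random nonce makes output
    # non-deterministic, just as AES-GCM or similar would.
    nonce = os.urandom(12)
    return nonce + bytes(b ^ 0x5A for b in content)  # toy cipher, illustration only

page = b"= My Page =\nHello, wiki.\n"

# Plaintext: the same content always hashes to the same blob SHA.
assert git_blob_sha(page) == git_blob_sha(page)

# Ciphertext: every save of the SAME content yields a different blob — git
# sees an opaque binary change, so no diffs, no blame, no dedup.
print(git_blob_sha(mock_encrypt(page)) == git_blob_sha(mock_encrypt(page)))
# → False (the nonces differ, so the SHAs differ)
```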

5. Full Client-Side Encryption (Zero-Knowledge)

Content encrypted in the browser/agent before reaching the server. Server never sees plaintext.

  • Protects against: Everything — operator, AWS, infrastructure compromise
  • Effort: Very high — new client-side crypto layer, key management, SPA rewrite
  • Feature impact: Severe. Server-side search becomes impossible (can't embed ciphertext), MCP agents need key provisioning (the server sees plaintext transiently during sessions), and the web UI must decrypt in the browser via the Web Crypto API
  • CDN interaction: Incompatible with CDN caching. CloudFront cannot cache encrypted content that requires per-user decryption keys. Either disable caching (defeats E-2) or cache ciphertext and decrypt client-side (requires the SPA, Option D from the CDN design)

Verdict: Architecturally incompatible with current design. Would require rethinking storage (not git), rendering (SPA), search (client-side or encrypted indexes), MCP key management, and CDN strategy. This is a ground-up rebuild, not a feature addition.

CDN Caching Interaction

Regardless of at-rest encryption approach, the CDN read path (per Design/CDN_Read_Path) caches decrypted HTML at CloudFront edge nodes for 30-60s. This is architecturally standard (every encrypted-at-rest + CDN system works this way), but it means:

  • AWS CloudFront infrastructure sees plaintext during the cache window
  • Any claim of "zero-knowledge" is false while CDN caching is active
  • Auth (CloudFront Functions JWT validation) gates who can read the cache, but the content exists in plaintext at the edge
  • Per-user KMS does not change this — content is decrypted before it reaches CloudFront regardless of who holds the key

CDN caching and zero-knowledge are fundamentally in tension. You can have fast reads or end-to-end encryption, not both.
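The fork can be made concrete with a toy edge-cache model (names and bodies are illustrative, not the real CDN_Read_Path code): whatever body the origin returns is exactly what sits at the edge for the TTL, so the choice is plaintext-at-edge or ciphertext-plus-client-keys.

```python
import time

class EdgeCache:
    """Toy CloudFront edge: stores whatever the origin returns, for ttl seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}  # path -> (body, expires_at)

    def get(self, path, origin_fetch):
        entry = self.store.get(path)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]          # served from edge; origin not consulted
        body = origin_fetch(path)
        self.store[path] = (body, now + self.ttl)
        return body

# Option 1: origin decrypts, edge caches plaintext (the current read-path design).
plaintext_edge = EdgeCache(ttl=60)
plaintext_edge.get("/page", lambda p: "<h1>secret</h1>")
# The cached body IS the plaintext for the whole TTL window.
assert "secret" in plaintext_edge.store["/page"][0]

# Option 2: edge caches ciphertext — zero-knowledge holds at the edge, but
# every reader now needs the key client-side (the SPA / Option D path).
cipher_edge = EdgeCache(ttl=60)
cipher_edge.get("/page", lambda p: "AES-GCM:9f3a...")  # opaque to the edge
assert "secret" not in cipher_edge.store["/page"][0]
```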

KMS Cost Model at Scale

  • CMK per tenant: $1/month each
  • 1,000 tenants: $1,000/month for keys alone
  • KMS API calls (SSE-KMS): $0.03 per 10,000 requests
  • Assembly Lambda (3 fragment fetches per cache miss): multiplied by request volume and short TTLs

S3 Bucket Keys reduce API calls by ~99% within a single CMK, but per-tenant keys mean per-tenant bucket key generation — the cross-tenant savings don't apply.
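The table above reduces to a quick back-of-envelope calculation. A sketch, where the request volume and cache-miss count are illustrative assumptions rather than measured numbers:

```python
def kms_monthly_cost(tenants: int,
                     cache_misses_per_month: int,
                     fragments_per_miss: int = 3,
                     cmk_per_month: float = 1.00,
                     per_10k_requests: float = 0.03) -> float:
    """Rough monthly KMS spend: per-tenant CMKs plus SSE-KMS decrypt calls
    made by the assembly Lambda on cache misses. Illustrative model only."""
    key_cost = tenants * cmk_per_month
    api_calls = cache_misses_per_month * fragments_per_miss
    api_cost = api_calls / 10_000 * per_10k_requests
    return key_cost + api_cost

# 1,000 tenants, an assumed 10M cache misses/month:
# $1,000 in keys + $90 in decrypt calls — the keys dominate.
print(round(kms_monthly_cost(1_000, 10_000_000), 2))  # → 1090.0
```

Even at high request volumes the per-tenant CMKs, not the API calls, drive the bill — which is why the $1/month/tenant line item is the one that scales badly.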

Recommendation

For Launch

  1. Enable EFS encryption at rest (single AWS-managed key). Checkbox, zero cost, zero feature impact.
  2. Enable S3 default encryption (SSE-S3). Also a checkbox.
  3. CloudTrail logging on all EFS and S3 data access.
  4. Restrictive IAM policies: the Lambda role can access EFS; human roles cannot without a break-glass procedure.
  5. Be honest in the privacy policy: "Data encrypted at rest. Operator access restricted by IAM policy and audit logging. We cannot read your data without a deliberate policy override that is logged."

This is what B2B customers actually evaluate — not zero-knowledge, but "can you prove who accessed my data and when."

Per-User KMS Becomes Viable When

The storage model changes to something that supports per-tenant keys natively:

  • Per-tenant S3 buckets for git repos (via git-remote-s3 or similar) — S3 SSE-KMS works naturally per-bucket
  • DynamoDB with the DynamoDB Encryption Client for item-level encryption
  • Any storage backend where the encryption boundary aligns with the tenant boundary

This is a storage architecture decision, not a bolt-on to the current EFS/git model.

Full Zero-Knowledge: Conditions for Revisiting

Worth pursuing if/when:

  • The product has enough traction that trust is a competitive differentiator
  • The storage model has moved off EFS
  • There's willingness to sacrifice server-side search (or invest in encrypted indexes à la Proton Mail)
  • MCP key provisioning has a credible UX (user provisions key per agent session)

Precedents Referenced

  • Standard Notes — PBKDF2-derived key, client-side encryption, client-side search only. Simple data model (text blobs, no git).
  • Proton Drive/Mail — per-user asymmetric keys, Web Crypto API in browser. Invested heavily in encrypted search indexes.
  • git-crypt — encrypts blobs in git repo, decrypts on clone with local key. Works for developer workflows, not web UIs.
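The Standard Notes pattern — a key derived client-side from the user's password, never sent to the server — can be sketched with stdlib primitives. The parameters and the HMAC stand-in are illustrative (stdlib has no AES, and a real deployment would prefer Argon2 or scrypt plus an AEAD cipher); the point is only what the server does and does not see:

```python
import hashlib
import hmac
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Runs client-side: the server only ever sees the salt,
    # never the password or the derived key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)   # stored server-side in the clear — salts are not secret
key = derive_key("correct horse battery staple", salt)

# Deterministic: the same password + salt re-derives the same key on any device,
# which is what makes multi-device access work without server-held keys.
assert key == derive_key("correct horse battery staple", salt)

# The server stores only an opaque blob; HMAC stands in here for a real
# AEAD ciphertext, so this sketch demonstrates key handling, not encryption.
tag = hmac.new(key, b"page content", hashlib.sha256).hexdigest()
wrong = hmac.new(derive_key("wrong password", salt),
                 b"page content", hashlib.sha256).hexdigest()
assert not hmac.compare_digest(tag, wrong)
```

Note what this costs: without the password there is no key, so there is also no server-side search, no server-side rendering, and no recovery if the user forgets the password — exactly the trade-offs listed under Approach 5.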