Commit aceac6
2026-03-13 19:28:30 Claude (MCP): [mcp] Add Tasks/Emergent for cold start re-instrumentation and CDN caching design
| /dev/null .. Tasks/Emergent.md | |
| @@ 0,0 1,85 @@ | |
| + | --- |
| + | category: reference |
| + | tags: [tasks, emergent] |
| + | last_updated: 2026-03-13 |
| + | confidence: high |
| + | --- |
| + | |
| + | ## How to read this document |
| + | |
| + | - Emergent tasks arise during development but don't belong to a specific phase |
| + | - They may block, inform, or optimize work in any phase |
| + | - Tasks are numbered `E-{sequence}` |
| + | - Priority indicates urgency relative to current phase work |
| + | |
| + | --- |
| + | |
| + | ## Emergent Tasks |
| + | |
| + | ### E-1: Re-Instrument Cold Start INIT Phase |
| + | |
| + | **Priority:** High — blocks architectural decisions about cold start mitigation |
| + | **Discovered during:** Phase 2 (CDN caching design discussion) |
| + | **Relates to:** [[Dev/Phase_0_EFS_Benchmarks]], [[Design/Platform_Overview]] |
| + | |
| + | **Context:** |
| + | Phase 0 benchmarks measured cold starts at ~3,400ms and attributed ~2,400ms to "VPC ENI attach." However, AWS Hyperplane ENI (shipped 2019) pre-creates network interfaces at function creation time, not at invocation time. Current documentation and third-party benchmarks consistently report VPC overhead of at most 50–100ms for properly configured functions. The 2,400ms attribution is almost certainly incorrect.
| + | |
| + | The actual cold start time is likely dominated by Python package initialization: loading dulwich, Flask/Otterwiki, Mangum, aws-xray-sdk, and all transitive dependencies from a 39MB deployment package. EFS mount negotiation (NFS/TLS handshake to the mount target) may also contribute. Without accurate instrumentation, any cold start mitigation strategy (provisioned concurrency, architecture changes, package optimization) is a guess.
| + | |
| + | **Task:** |
| + | Re-run cold start benchmarks with fine-grained tracing of the INIT phase. Break down time spent in: |
| + | |
| + | 1. VPC/ENI setup (should be negligible with Hyperplane) |
| + | 2. EFS mount negotiation |
| + | 3. Python runtime startup |
| + | 4. Module imports (dulwich, Flask, Mangum, Otterwiki, aws-xray-sdk) |
| + | 5. Application initialization (framework setup, config loading) |
| + | |
| + | Use X-Ray subsegments or manual timing around import blocks and init steps. Compare with a minimal VPC Lambda (no EFS, no heavy imports) as a control. |
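A minimal sketch of the "manual timing" option, assuming plain wall-clock measurement with the standard library is acceptable for a first pass. The `InitTimer` class and the `app_init` step name are hypothetical, and the imported modules are stand-ins so the snippet runs anywhere; in the real function they would be the heavy imports listed above.

```python
import importlib
import time


class InitTimer:
    """Collects wall-clock durations for named INIT-phase steps."""

    def __init__(self):
        self.segments = []  # list of (name, milliseconds)

    def measure(self, name, fn):
        """Run fn(), record how long it took, and return its result."""
        start = time.perf_counter()
        result = fn()
        self.segments.append((name, (time.perf_counter() - start) * 1000.0))
        return result

    def timed_import(self, module_name):
        """Import a module by name and record the import time."""
        return self.measure(
            f"import:{module_name}",
            lambda: importlib.import_module(module_name),
        )

    def report(self):
        """Return (name, ms, percent_of_total) rows, slowest first."""
        total = sum(ms for _, ms in self.segments) or 1.0
        rows = [(name, ms, 100.0 * ms / total) for name, ms in self.segments]
        return sorted(rows, key=lambda row: row[1], reverse=True)


timer = InitTimer()

# Stand-in modules (assumption): replace with dulwich, flask, mangum, etc.
for mod in ("json", "decimal", "email.parser"):
    timer.timed_import(mod)

# Placeholder for framework setup and config loading (assumption).
timer.measure("app_init", lambda: time.sleep(0.01))

for name, ms, pct in timer.report():
    print(f"{name:24s} {ms:8.2f} ms {pct:5.1f}%")
```

A module that is already loaded returns from the import cache almost instantly, so each timed import should be the first import of that module in the process. CPython's `-X importtime` option (or the `PYTHONPROFILEIMPORTTIME` environment variable) gives a finer per-module breakdown and may be a useful cross-check.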
| + | |
| + | **Deliverables:** |
| + | - Updated [[Dev/Phase_0_EFS_Benchmarks]] with corrected INIT breakdown |
| + | - Identification of top 2–3 contributors to cold start latency |
| + | - Recommendation: package optimization, lazy imports, memory tuning, or architectural change |
| + | |
| + | **Acceptance criteria:** |
| + | - [ ] INIT phase broken into at least 4 measured segments |
| + | - [ ] Each segment's contribution to total cold start quantified (ms and %) |
| + | - [ ] Control Lambda (minimal VPC, no EFS) measured for baseline comparison |
| + | - [ ] Benchmark page updated with corrected attribution |
| + | |
| + | --- |
| + | |
| + | ### E-2: CDN Caching Layer Design |
| + | |
| + | **Priority:** Medium — improves page load UX independent of cold start fix |
| + | **Discovered during:** Phase 2 (page responsiveness discussion) |
| + | **Relates to:** [[Design/Platform_Overview]], [[Design/Operations]] |
| + | |
| + | **Context:** |
| + | Wiki pages are written infrequently (during Claude sessions via MCP) and read much more often (browsing, reference). CloudFront is already in the architecture for static SPA hosting but is not used to cache wiki page content. Adding a caching layer for page reads would reduce most page loads from ~270ms (warm Lambda) to ~10–50ms (edge serve), and reduce origin load. |
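How much of that latency gain is realized depends on the cache hit ratio, which in turn depends on the TTL and on how often each page is requested. The sketch below is a rough model, not a measurement: it assumes Poisson request arrivals for a single page at a single edge location and a TTL that starts at each miss. The request rates and the 30 ms edge latency are illustrative assumptions; the 270 ms origin figure is the warm Lambda number from the text.

```python
def ttl_hit_ratio(requests_per_second, ttl_seconds):
    """Hit ratio for one page at one edge under the stated assumptions.

    Each miss starts a TTL window; with Poisson arrivals the window sees
    rate * ttl further requests on average, all of them hits.
    """
    expected_hits = requests_per_second * ttl_seconds
    return expected_hits / (1.0 + expected_hits)


def expected_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Mean latency as a hit-ratio-weighted mix of edge and origin serves."""
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * origin_ms


ORIGIN_MS = 270.0  # warm Lambda figure from the text
EDGE_MS = 30.0     # assumed midpoint of the 10-50 ms edge range

for rate in (0.01, 0.1, 1.0):  # hypothetical requests/second per page
    h = ttl_hit_ratio(rate, ttl_seconds=60)
    mean_ms = expected_latency_ms(h, EDGE_MS, ORIGIN_MS)
    print(f"{rate:5.2f} req/s -> hit ratio {h:.2f}, mean latency {mean_ms:.0f} ms")
```

Under this model, rarely requested pages still mostly miss with a short TTL, so the hit-ratio estimate is worth checking against real traffic before relying on the edge latency figure.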
| + | |
| + | **Design decisions needed:** |
| + | |
| + | **Cache freshness strategy:** Short TTL (30–60s) on page HTML, set via `Cache-Control` headers from the origin. No invalidation API calls under normal operation; pages self-expire. Static assets (CSS, JS, fonts) use content-hashed filenames with long TTLs (1 year). Invalidation is reserved for exceptional cases (page deletion, privacy). This avoids the invalidation cost problem: at scale (e.g. 1,000 active wikis × 5 writes/day, about 150,000 paths per 30-day month), path-based invalidation would far exceed the 1,000 paths/month free tier and, at $0.005 per additional path, cost roughly $745/month, growing linearly with write volume.
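The invalidation cost model can be written out as a small calculation. This is a sketch under stated assumptions: one invalidated path per write, a 30-day month, and a price of $0.005 per path beyond the free tier, which should be verified against current CloudFront pricing. The function and constant names are illustrative.

```python
FREE_PATHS_PER_MONTH = 1_000   # free tier cited in the text
PRICE_PER_EXTRA_PATH = 0.005   # assumed USD per path; verify before relying on it


def monthly_invalidation_cost(wikis, writes_per_day, paths_per_write=1, days=30):
    """Return (paths_per_month, usd_cost) for path-based invalidation."""
    paths = wikis * writes_per_day * paths_per_write * days
    billable = max(0, paths - FREE_PATHS_PER_MONTH)
    return paths, billable * PRICE_PER_EXTRA_PATH


paths, cost = monthly_invalidation_cost(wikis=1_000, writes_per_day=5)
print(f"{paths:,} paths/month -> ${cost:,.2f}/month")
```

The result is sensitive to the assumed price and to how many paths each write invalidates, so treat it as an estimate to re-check. Doubling the write volume roughly doubles the billable paths, which is the linear growth the paragraph describes.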
| + | |
| + | **Auth-aware caching for private wikis:** Three options evaluated: |
| + | |
| + | 1. **CloudFront signed cookies:** the most CloudFront-native option. Cookies are set after OAuth login and scoped to the user's subdomain. CloudFront validates the signature at the edge before serving cached content. Signed cookie attributes are excluded from the cache key, so all authenticated users of the same wiki share cached pages.
| + | 2. **CloudFront Functions with JWT validation:** a lightweight JS function on viewer-request validates the JWT at the edge using the built-in crypto module, with KeyValueStore holding the public key. Execution is sub-millisecond and billed per invocation at a low rate. Works well for RS256 if public key verification fits within execution constraints.
| + | 3. **Lambda@Edge:** the most powerful option; it can run full OIDC flows but is heavier and more expensive. Overkill for token validation on cached content.
| + | |
| + | Recommended approach: CloudFront Functions (option 2) for auth validation + short-TTL cache for page content. Needs validation that RS256 signature verification runs within CloudFront Functions execution limits. |
| + | |
| + | **MCP calls are not cached** — POST requests on a separate path pattern, always pass through to Lambda. |
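One way the origin could express the freshness strategy and the MCP pass-through is a single function that picks a `Cache-Control` value per route class. This is a sketch, not the project's implementation: the `/mcp`, `/api/`, and `/static/` prefixes are hypothetical, and using `max-age=0` with `s-maxage=60` (browsers revalidate, the edge keeps pages for 60 seconds) is one possible choice within the 30–60s range.

```python
def cache_control_for(method, path):
    """Pick a Cache-Control header value for a response, by route class."""
    if method not in ("GET", "HEAD"):
        # Writes and MCP POSTs always pass through to the origin.
        return "no-store"
    if path.startswith("/mcp") or path.startswith("/api/"):
        # Hypothetical dynamic prefixes: never cached.
        return "no-store"
    if path.startswith("/static/"):
        # Content-hashed assets: safe to cache for one year.
        return "public, max-age=31536000, immutable"
    # Page HTML: short shared-cache TTL, no browser-side caching.
    return "max-age=0, s-maxage=60"


for method, path in [
    ("GET", "/Home"),
    ("GET", "/static/app.3f2a1c.css"),
    ("POST", "/mcp"),
    ("GET", "/api/search"),
]:
    print(f"{method:4s} {path:28s} {cache_control_for(method, path)}")
```

Keeping the policy in one function makes it straightforward to mirror the same route classes in the CloudFront cache behavior configuration.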
| + | |
| + | **Deliverables:** |
| + | - Design page: `Design/CDN_Caching` |
| + | - CloudFront Functions prototype for JWT validation (validate RS256 fits within execution constraints) |
| + | - Estimate of cache hit ratio for typical wiki usage patterns |
| + | |
| + | **Acceptance criteria:** |
| + | - [ ] Design page documents cache strategy, auth approach, TTL rationale, and cost model |
| + | - [ ] CloudFront Functions JWT validation tested (RS256 performance confirmed or HS256 fallback documented) |
| + | - [ ] Cache behavior configuration specified for page content vs. static assets vs. MCP vs. API routes |