Properties
category: design
tags: [infrastructure, caddy, auth, multi-tenant]
last_updated: 2026-03-19
confidence: medium

Custom Domains

Allow users to serve their wiki from a domain they own (e.g., wiki.example.com) instead of {slug}.robot.wtf.

Scope

  • Subdomains only for v1 (e.g., wiki.example.com). Apex domains (example.com) cannot carry a CNAME record; they would need ALIAS/ANAME records, which are provider-specific and not universally supported.
  • One custom domain per wiki. The schema supports multiple, but the UI enforces one. This can be relaxed later.
  • MCP works unchanged through custom domains (bearer token auth, no cookie dependency).

Database Schema

New custom_domains table in robot.db:

CREATE TABLE IF NOT EXISTS custom_domains (
    domain TEXT PRIMARY KEY,
    wiki_slug TEXT NOT NULL REFERENCES wikis(slug) ON DELETE CASCADE,
    verification_status TEXT NOT NULL DEFAULT 'pending',  -- pending | verified | active
    verification_token TEXT NOT NULL,
    verified_at TEXT,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS ix_custom_domains_slug ON custom_domains(wiki_slug);

Separate table (not a column on wikis) because domain verification has its own lifecycle and metadata.
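
A minimal sketch of how the table might be used from Python, assuming robot.db is SQLite. The helper names (open_db, add_pending_domain, mark_active) and the one-column wikis stand-in are hypothetical, not part of the existing codebase. Note that SQLite only enforces ON DELETE CASCADE when foreign keys are switched on per connection.

```python
import secrets
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
-- Stand-in for the real wikis table (assumption, for a self-contained example).
CREATE TABLE IF NOT EXISTS wikis (slug TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS custom_domains (
    domain TEXT PRIMARY KEY,
    wiki_slug TEXT NOT NULL REFERENCES wikis(slug) ON DELETE CASCADE,
    verification_status TEXT NOT NULL DEFAULT 'pending',
    verification_token TEXT NOT NULL,
    verified_at TEXT,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS ix_custom_domains_slug ON custom_domains(wiki_slug);
"""


def _now() -> str:
    """UTC timestamp as ISO 8601 text, matching the TEXT columns."""
    return datetime.now(timezone.utc).isoformat()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Without this pragma SQLite ignores REFERENCES / ON DELETE CASCADE.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_pending_domain(conn: sqlite3.Connection, domain: str, wiki_slug: str) -> str:
    """Insert a domain in the 'pending' state and return its verification token."""
    token = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO custom_domains (domain, wiki_slug, verification_token, created_at) "
        "VALUES (?, ?, ?, ?)",
        (domain.lower(), wiki_slug, token, _now()),
    )
    conn.commit()
    return token


def mark_active(conn: sqlite3.Connection, domain: str) -> None:
    """Flip a domain to 'active' once DNS verification has passed."""
    conn.execute(
        "UPDATE custom_domains SET verification_status = 'active', verified_at = ? "
        "WHERE domain = ?",
        (_now(), domain.lower()),
    )
    conn.commit()
```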

DNS Verification

User must create two DNS records:

  1. CNAME: wiki.example.com CNAME {slug}.robot.wtf. (routes traffic)
  2. TXT: _robotwtf-verify.wiki.example.com TXT "robotwtf-verify={verification_token}" (proves ownership)

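CNAME alone is insufficient: it shows only that the domain points at the platform, not that the user claiming it controls its DNS, and a CNAME can be pointed temporarily. The TXT prefix _robotwtf-verify avoids collision with other TXT records.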

Verification uses dnspython (new dependency). Flow:

  1. User enters domain in settings UI → server generates token, stores as pending
  2. UI shows required DNS records
  3. User clicks "Verify" → server checks both CNAME and TXT
  4. Both pass → status becomes active

Periodic re-verification (cron) to detect removed CNAME records is desirable but not required for v1.
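
The check in step 3 could look roughly like the sketch below. verify_domain and dns_lookup are hypothetical names. The resolver is injected so the record-matching logic can be tested without network access, and dnspython is imported lazily inside the default lookup. The 5-second lifetime is an assumed value.

```python
from typing import Callable, List

Lookup = Callable[[str, str], List[str]]


def dns_lookup(name: str, rdtype: str) -> List[str]:
    """Default lookup via dnspython; returns records as strings, [] on any DNS error."""
    import dns.exception
    import dns.resolver  # third-party dependency: dnspython

    try:
        answer = dns.resolver.resolve(name, rdtype, lifetime=5.0)
    except dns.exception.DNSException:
        # NXDOMAIN, NoAnswer, timeouts, etc. all count as "not verified yet".
        return []
    if rdtype == "TXT":
        # A TXT record may be split into several byte strings; join them.
        return [b"".join(r.strings).decode("utf-8", "replace") for r in answer]
    return [r.to_text() for r in answer]


def _norm(host: str) -> str:
    """Lowercase and drop the trailing root dot so names compare equal."""
    return host.rstrip(".").lower()


def verify_domain(
    domain: str,
    slug: str,
    token: str,
    platform_domain: str = "robot.wtf",
    lookup: Lookup = dns_lookup,
) -> bool:
    """True only if both the CNAME and the TXT record are in place."""
    expected_target = _norm(f"{slug}.{platform_domain}")
    cname_ok = any(_norm(t) == expected_target for t in lookup(domain, "CNAME"))

    expected_txt = f"robotwtf-verify={token}"
    txt_ok = expected_txt in lookup(f"_robotwtf-verify.{domain}", "TXT")

    return cname_ok and txt_ok
```

Treating every DNS error as "not verified" keeps the handler simple and lets the user retry after propagation.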

TLS (Caddy)

Caddy's on-demand TLS with the existing ask endpoint handles this. Modify /api/internal/check-slug to also accept custom domains:

  1. If domain ends with .{PLATFORM_DOMAIN}, do existing slug lookup
  2. Otherwise, look up domain in custom_domains where verification_status = 'active'
  3. Return 200 if found, 404 if not

Caddy automatically obtains Let's Encrypt certificates for any domain that passes the ask check. No Caddyfile changes are needed beyond ensuring the on-demand TLS block is configured (it may already be).
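
The decision logic for the ask endpoint can be sketched as a pure function. Caddy passes the hostname as a domain query parameter and treats a 2xx response as approval. check_domain and its arguments are hypothetical; in the real endpoint the two containers would be database lookups.

```python
from typing import Container


def check_domain(
    domain: str,
    platform_domain: str,
    slugs: Container[str],
    active_domains: Container[str],
) -> int:
    """Return the HTTP status the ask endpoint should send for `domain`."""
    host = domain.strip().rstrip(".").lower()
    suffix = "." + platform_domain.lower()

    if host.endswith(suffix):
        # Platform subdomain: existing slug lookup.
        slug = host[: -len(suffix)]
        return 200 if slug in slugs else 404

    # Anything else must be a custom domain whose status is 'active'.
    return 200 if host in active_domains else 404
```

Pending domains return 404, so Caddy never requests a certificate for a domain that has not passed verification.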

Tenant Resolution

TenantResolver.__call__() gains a fallback path:

  1. Try _parse_host(host) as today → returns slug for {slug}.robot.wtf
  2. If None, look up host in custom_domains where status is active
  3. If found, use the associated wiki_slug
  4. If neither, 404

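Performance: in-memory cache ({domain: slug} dict) with 60-second TTL, invalidated on domain add/remove. Each gunicorn worker maintains its own cache, so an invalidation only reaches the worker that handled the request; the other workers can serve a stale entry for up to 60 seconds. The short TTL makes this acceptable.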

Set environ['CUSTOM_DOMAIN'] = domain when serving via custom domain so downstream code (auth, link generation) can detect it.
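
A sketch of the per-worker cache and the fallback path, under the assumption that the lookup is a function from domain to slug (or None). DomainCache and resolve_slug are hypothetical names. This version also caches misses, and leaves out any size bound, which a real implementation would probably want.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DomainCache:
    """Per-process {domain: slug} cache with a fixed TTL."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[str]],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        # domain -> (expiry time, slug or None for a cached miss)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, domain: str) -> Optional[str]:
        now = self._clock()
        hit = self._entries.get(domain)
        if hit is not None and hit[0] > now:
            return hit[1]
        slug = self._lookup(domain)
        self._entries[domain] = (now + self._ttl, slug)
        return slug

    def invalidate(self, domain: str) -> None:
        """Call on domain add/remove; only affects this worker's cache."""
        self._entries.pop(domain, None)


def resolve_slug(
    host: str,
    parse_host: Callable[[str], Optional[str]],
    cache: DomainCache,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (slug, custom_domain); custom_domain is None for platform hosts."""
    slug = parse_host(host)
    if slug is not None:
        return slug, None
    slug = cache.get(host)
    if slug is not None:
        return slug, host  # caller would set environ['CUSTOM_DOMAIN'] = host
    return None, None  # caller responds 404
```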

Authentication on Custom Domains

This is the hard part. The platform_token cookie is set on .robot.wtf and won't be sent to wiki.example.com.

Solution: Redirect-based auth relay

Standard pattern used by GitHub Pages, Notion, etc.

  1. Unauthenticated user visits wiki.example.com
  2. Wiki requires auth → redirect to https://robot.wtf/auth/login?return_to=https://wiki.example.com/...
  3. User authenticates on robot.wtf (cookie set on .robot.wtf)
  4. Auth callback detects return_to is a custom domain
  5. Generates a relay token: signed JWT with user claims, domain claim, 60-second expiry, single-use nonce
  6. Redirects to https://wiki.example.com/_auth/relay?token={relay_token}
  7. /_auth/relay handler validates token (signature, expiry, domain match, nonce), sets platform_token cookie scoped to wiki.example.com, redirects to original page

Auth changes required

  • _is_safe_return_url() must accept active custom domains (query custom_domains where verification_status = 'active')
  • Auth callback generates relay token when return_to is a custom domain
  • New /_auth/relay route in resolver (or dedicated handler)
  • TenantResolver._resolve_auth() checks domain-scoped cookie (same name platform_token, browser sends the right one based on domain)
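
One possible shape for the extended return-URL check, as a sketch. The existing _is_safe_return_url() is not shown in this document, so the function below is a hypothetical stand-in that handles only absolute https URLs. is_active_custom_domain is an assumed callable backed by the custom_domains table.

```python
from typing import Callable
from urllib.parse import urlsplit


def is_safe_return_url(
    url: str,
    platform_domain: str,
    is_active_custom_domain: Callable[[str], bool],
) -> bool:
    """Accept https URLs on the platform domain or on an active custom domain."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # already lowercased by urllib
    except ValueError:
        return False

    if parts.scheme != "https" or not host:
        return False
    # Reject userinfo tricks such as https://robot.wtf@evil.com/
    if parts.username is not None or parts.password is not None:
        return False

    host = host.rstrip(".")
    platform = platform_domain.lower()
    if host == platform or host.endswith("." + platform):
        return True
    return is_active_custom_domain(host)
```

Matching on the parsed hostname, not on a string prefix of the URL, is what keeps hosts like robot.wtf.evil.com out.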

Relay token security

  • Signed with the platform's RSA key (same as PlatformJWT)
  • 60-second expiry
  • Single-use: nonce stored in DB, consumed on use
  • Domain-bound: domain claim must match the request's Host header
  • No open redirect: final redirect path embedded in token, validated
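
The properties above can be illustrated with a standard-library sketch. To stay self-contained it signs with HMAC-SHA256 as a stand-in; the design calls for a JWT signed with the platform's RSA key, which this example does not implement. The function names, claim names, and the in-memory nonce set are all hypothetical; the nonce store would be a DB table with an atomic delete.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional, Set


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, payload: str) -> str:
    # Stand-in signature (HMAC), NOT the RSA signature the design specifies.
    return _b64(hmac.new(key, payload.encode("ascii"), hashlib.sha256).digest())


def mint_relay_token(key: bytes, user_id: str, domain: str, path: str,
                     nonces: Set[str], now: Optional[float] = None,
                     ttl: int = 60) -> str:
    """Create a short-lived, single-use token bound to `domain` and `path`."""
    now = time.time() if now is None else now
    nonce = secrets.token_urlsafe(16)
    nonces.add(nonce)  # real code: INSERT into a nonce table
    claims = {"sub": user_id, "dom": domain.lower(), "path": path,
              "exp": now + ttl, "nonce": nonce}
    payload = _b64(json.dumps(claims).encode("utf-8"))
    return f"{payload}.{_sign(key, payload)}"


def redeem_relay_token(key: bytes, token: str, request_host: str,
                       nonces: Set[str], now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is valid, else None. Consumes the nonce."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
        expected = _sign(key, payload)
        if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
            return None
        claims = json.loads(_unb64(payload))
    except ValueError:  # wrong shape, bad base64, bad JSON, non-ASCII payload
        return None

    if claims.get("exp", 0) < now:
        return None  # expired
    if claims.get("dom") != request_host.rstrip(".").lower():
        return None  # token minted for a different domain
    path = claims.get("path", "")
    if not path.startswith("/") or path.startswith("//") or "\\" in path:
        return None  # final redirect must stay on this host
    nonce = claims.get("nonce")
    if nonce not in nonces:
        return None  # already used, or never issued
    nonces.discard(nonce)  # real code: atomic DELETE ... and check the row count
    return claims
```

Consuming the nonce last means a token presented to the wrong host is rejected without being burned.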

Logout

Logging out on robot.wtf clears the .robot.wtf cookie but not the wiki.example.com cookie. Mitigation: set custom domain cookies with a 1-hour max-age (vs 24h for the platform cookie). Stale sessions are short-lived.
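
A sketch of how the two cookie lifetimes might be set, using the standard library's http.cookies. session_cookie_header is a hypothetical helper, and the Secure, HttpOnly and SameSite=Lax attributes are assumptions, not taken from the existing cookie code. Omitting the Domain attribute makes the custom-domain cookie host-only.

```python
from http.cookies import SimpleCookie

CUSTOM_DOMAIN_MAX_AGE = 60 * 60        # 1 hour
PLATFORM_MAX_AGE = 24 * 60 * 60        # 24 hours


def session_cookie_header(token: str, custom_domain: bool) -> str:
    """Build the Set-Cookie header value for platform_token."""
    cookie = SimpleCookie()
    cookie["platform_token"] = token
    morsel = cookie["platform_token"]
    morsel["path"] = "/"
    morsel["secure"] = True
    morsel["httponly"] = True
    morsel["samesite"] = "Lax"  # assumed attribute
    if custom_domain:
        # No Domain attribute: the cookie is scoped to the exact custom host.
        morsel["max-age"] = CUSTOM_DOMAIN_MAX_AGE
    else:
        morsel["max-age"] = PLATFORM_MAX_AGE
        morsel["domain"] = ".robot.wtf"
    return morsel.OutputString()
```

Max-Age only limits how long the browser keeps the cookie. If the token carries its own expiry, that would presumably need to be one hour as well for the mitigation to hold against a copied token.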

Management UI

"Custom Domain" card on wiki_settings.html:

No domain configured:

  • Text input + "Add Domain" button

Pending verification:

  • Show required DNS records (copyable)
  • "Check DNS" button
  • "Remove" button

Active:

  • Domain with green status badge
  • "Remove" button

Backend routes:

  • POST /app/wiki/<slug>/domain — add
  • POST /app/wiki/<slug>/domain/verify — check DNS
  • POST /app/wiki/<slug>/domain/remove — remove
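
The add route would need to normalize and validate the submitted domain before storing it. A rough sketch follows; normalize_custom_domain is a hypothetical helper. The "at least three labels" rule is only a heuristic for the subdomains-only scope: it wrongly accepts apex domains under multi-label suffixes such as example.co.uk, and a public suffix list would be needed to do this properly. Internationalized names are rejected unless already in ASCII (punycode) form.

```python
import re

# One DNS label: 1-63 chars of a-z, 0-9, hyphen; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def normalize_custom_domain(raw: str, platform_domain: str = "robot.wtf") -> str:
    """Return the canonical lowercase hostname or raise ValueError."""
    domain = raw.strip().rstrip(".").lower()
    if not domain or len(domain) > 253:
        raise ValueError("invalid hostname length")

    labels = domain.split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError("invalid hostname")

    if domain == platform_domain or domain.endswith("." + platform_domain):
        raise ValueError("platform subdomains are not custom domains")

    # Heuristic only (see lead-in): treat fewer than three labels as an apex domain.
    if len(labels) < 3:
        raise ValueError("apex domains are not supported in v1")

    return domain
```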

Implementation Phases

  1. Schema + CustomDomainModel + DNS verification logic + tests
  2. Modify check-slug endpoint for Caddy integration
  3. Resolver custom domain lookup + cache
  4. Auth relay (hardest phase)
  5. Management UI

Risks

  • Auth relay is a new attack surface. Must be cryptographically signed, time-limited, single-use, domain-bound.
  • DNS propagation delays. Users may verify before records propagate. UI should explain this and allow re-checking.
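  • Let's Encrypt rate limits. 50 certificates per registered domain per week. The limit is counted against each customer's registered domain (example.com), not against the platform as a whole, so hitting it is unlikely at current scale.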
  • Cache invalidation across workers. Short TTL (60s) is the simplest correct approach.