Blame
|
1 | --- |
||||||
| 2 | category: spec |
|||||||
| 3 | tags: [security, logging, owasp, plan] |
|||||||
| 4 | last_updated: 2026-03-18 |
|||||||
| 5 | confidence: high |
|||||||
| 6 | --- |
|||||||
| 7 | ||||||||
| 8 | # Security Logging Plan |
|||||||
| 9 | ||||||||
| 10 | Addresses OWASP A09 finding: "No audit trail for auth events, ACL changes, wiki deletions." |
|||||||
| 11 | ||||||||
| 12 | ## Events to Log |
|||||||
| 13 | ||||||||
| 14 | ### Auth server (`app/auth_server.py`) |
|||||||
| 15 | ||||||||
| 16 | | Event | Route / Location | Fields | |
|||||||
| 17 | |---|---|---| |
|||||||
| 18 | | `login.initiated` | `POST /auth/login` (after PAR submit) | actor=handle, ip | |
|||||||
| 19 | | `login.success` | `oauth_callback()` after JWT issued | actor=did, handle, ip | |
|||||||
| 20 | | `login.new_user` | `oauth_callback()` → redirect to signup | actor=did, ip | |
|||||||
| 21 | | `signup.success` | `signup()` after `user_model.create()` | actor=did, username, ip | |
|||||||
| 22 | | `consent.granted` | `_handle_consent_post()` action=approve | actor=did, wiki_slug, client_id, ip | |
|||||||
| 23 | | `consent.denied` | `_handle_consent_post()` action=deny | actor=did, wiki_slug, client_id, ip | |
|||||||
| 24 | | `logout` | `oauth_logout()` | actor=did, ip | |
|||||||
| 25 | | `rate_limit.hit` | `ratelimit_handler()` (429) | ip, path | |
|||||||
| 26 | ||||||||
| 27 | ### Management middleware (`app/management/routes.py`) |
|||||||
| 28 | ||||||||
| 29 | | Event | Method | Fields | |
|||||||
| 30 | |---|---|---| |
|||||||
| 31 | | `wiki.created` | `_create_wiki()` return 201 | actor=did, slug | |
|||||||
| 32 | | `wiki.deleted` | `_delete_wiki()` return 200 | actor=did, slug | |
|||||||
| 33 | | `token.regenerated` | `_regenerate_token()` return 200 | actor=did, slug | |
|||||||
| 34 | | `rate_limit.hit` | 429 block in `__call__()` | ip, method, path | |
|||||||
| 35 | ||||||||
| 36 | ### Resolver (`app/resolver.py`) |
|||||||
| 37 | ||||||||
| 38 | | Event | Location | Fields | |
|||||||
| 39 | |---|---|---| |
|||||||
| 40 | | `auth.bearer_invalid` | `_resolve_bearer_token()` raises AuthError 401 | ip, wiki_slug | |
|||||||
| 41 | | `auth.bearer_mismatch` | `_resolve_bearer_token()` raises AuthError 403 | ip, wiki_slug | |
|||||||
| 42 | | `rate_limit.hit` | 429 block in `__call__()` | ip, wiki_slug | |
|||||||
| 43 | ||||||||
| 44 | **Not logged:** ACL flag changes (allow_read, allow_write, is_admin, is_approved). These happen inside otterwiki's admin UI with no current hook point. Deferred — track as a follow-on once otterwiki lifecycle hooks or the per-wiki DB plan is in place. |
|||||||
| 45 | ||||||||
| 46 | ## Log Format |
|||||||
| 47 | ||||||||
| 48 | Structured JSON, one object per line, emitted via Python stdlib `logging` to stdout → systemd journal (already configured). No new log file or rotation needed — journal handles retention (30-day, 500MB cap per `robot-journald.conf`). |
|||||||
| 49 | ||||||||
| 50 | ```json |
|||||||
| 51 | { |
|||||||
| 52 | "ts": "2026-03-18T12:34:56.789Z", |
|||||||
| 53 | "event": "login.success", |
|||||||
| 54 | "actor_did": "did:plc:abc123", |
|||||||
| 55 | "actor_handle": "user.bsky.social", |
|||||||
| 56 | "wiki_slug": null, |
|||||||
| 57 | "client_id": null, |
|||||||
| 58 | "outcome": "success", |
|||||||
| 59 | "ip": "1.2.3.4", |
|||||||
| 60 | "syslog_identifier": "robot-auth" |
|||||||
| 61 | } |
|||||||
| 62 | ``` |
|||||||
| 63 | ||||||||
| 64 | Fields: |
|||||||
| 65 | - `ts` — UTC ISO-8601 |
|||||||
| 66 | - `event` — dot-namespaced string (see tables above) |
|||||||
| 67 | - `actor_did` — DID of the acting user, or null for anonymous/system |
|||||||
| 68 | - `actor_handle` — AT Protocol handle, or null |
|||||||
| 69 | - `wiki_slug` — target wiki, or null for platform-level events |
|||||||
| 70 | - `client_id` — OAuth client_id for consent events, else null |
|||||||
| 71 | - `outcome` — `"success"` | `"failure"` | `"blocked"` |
|||||||
| 72 | - `ip` — client IP (from `request.remote_addr` in Flask, or `get_client_ip(environ)` in WSGI middleware) |
|||||||
| 73 | ||||||||
| 74 | **PII note:** IP addresses are PII. They are logged for security purposes (rate limit forensics, abuse investigation). Journal retention is 30 days — no change needed. Do not log full handles in combination with IPs in any external/forwarded log sink. |
|||||||
| 75 | ||||||||
| 76 | ## Implementation Approach |
|||||||
| 77 | ||||||||
| 78 | ### New module: `app/audit.py` |
|||||||
| 79 | ||||||||
| 80 | A thin wrapper around stdlib logging. No new dependencies. |
|||||||
| 81 | ||||||||
| 82 | ```python |
|||||||
| 83 | import logging, json |
|||||||
| 84 | from datetime import datetime, timezone |
|||||||
| 85 | ||||||||
| 86 | _audit = logging.getLogger("robot.audit") |
|||||||
| 87 | ||||||||
| 88 | def log(event: str, *, actor_did=None, actor_handle=None, |
|||||||
| 89 | wiki_slug=None, client_id=None, outcome="success", ip=None): |
|||||||
| 90 | _audit.info(json.dumps({ |
|||||||
| 91 | "ts": datetime.now(timezone.utc).isoformat(), |
|||||||
| 92 | "event": event, |
|||||||
| 93 | "actor_did": actor_did, |
|||||||
| 94 | "actor_handle": actor_handle, |
|||||||
| 95 | "wiki_slug": wiki_slug, |
|||||||
| 96 | "client_id": client_id, |
|||||||
| 97 | "outcome": outcome, |
|||||||
| 98 | "ip": ip, |
|||||||
| 99 | })) |
|||||||
| 100 | ``` |
|||||||
| 101 | ||||||||
| 102 | The logger name `robot.audit` lets operators filter with `journalctl -u robot-auth SYSLOG_IDENTIFIER=robot-auth | grep robot.audit` or similar. |
|||||||
| 103 | ||||||||
| 104 | ### Call sites |
|||||||
| 105 | ||||||||
| 106 | **`app/auth_server.py`** — add `from app import audit` (or `from app.audit import log as audit_log`) and call `audit_log(...)` at the points noted in the events table. Most insertions are single lines immediately after the decision that determines the outcome. The `ratelimit_handler()` at the bottom of `create_app()` is the single place to cover all rate limit hits on the auth server. |
|||||||
| 107 | ||||||||
| 108 | **`app/management/routes.py`** — `ManagementMiddleware.__call__()` already has the 429 path. `_create_wiki()` and `_delete_wiki()` return tuples — log immediately before `return`. `actor_did` comes from `user.user_did`, `ip` from `get_client_ip(environ)` (already imported). |
|||||||
| 109 | ||||||||
| 110 | **`app/resolver.py`** — `TenantResolver._resolve_bearer_token()` raises `AuthError` on invalid/mismatched tokens — log before raising. The 429 block in `__call__()` already has `client_ip`. No logging in `_permissions_for_user()` — permission derivation is not itself an audit event. |
|||||||
| 111 | ||||||||
| 112 | ### IP extraction |
|||||||
| 113 | ||||||||
| 114 | - Flask routes: `request.remote_addr` (ProxyFix is already wired in `create_app()`) |
|||||||
| 115 | - WSGI middleware: `get_client_ip(environ)` from `app.rate_limit` (already imported in both `routes.py` and `resolver.py`) |
|||||||
| 116 | ||||||||
| 117 | ### No request ID correlation (yet) |
|||||||
| 118 | ||||||||
| 119 | Cross-service correlation (e.g., tracing a consent grant through auth → resolver → MCP) would require propagating a request ID header. Deferred — the audit events are service-scoped and the `actor_did` + `ts` + `wiki_slug` triple is sufficient for forensics at this scale. |
|||||||
| 120 | ||||||||
| 121 | ## Ansible Changes |
|||||||
| 122 | ||||||||
| 123 | 1. **No new service file changes needed.** All services already emit stdout to the journal (`StandardOutput=journal`). The `robot.audit` logger will appear in the same journal unit as the service that emits it. |
|||||||
| 124 | ||||||||
| 125 | 2. **Log level configuration.** Add `AUDIT_LOG_LEVEL=INFO` to `ansible/roles/deploy/templates/robot.env.j2` and wire it in `app/audit.py` (`logging.basicConfig` or the app's existing logging setup). The default `INFO` level is correct; `DEBUG` should never emit audit records. |
|||||||
| 126 | ||||||||
| 127 | 3. **No logrotate role needed.** `ansible/roles/logging/` already configures journald with 30-day retention and 500MB cap — sufficient. |
|||||||
| 128 | ||||||||
| 129 | 4. **Future:** if a SIEM or external log forwarder is added, `journalctl -o json -u robot-auth -u robot-api -u robot-mcp` can be tailed. No Ansible change required at that point beyond a forwarding role. |
|||||||
| 130 | ||||||||
| 131 | ## Out of Scope (Deferred) |
|||||||
| 132 | ||||||||
| 133 | - ACL flag changes via otterwiki admin UI — needs lifecycle hook or per-wiki DB plan |
|||||||
| 134 | - Token issuance/refresh events for the ATProto OAuth session — currently no post-refresh hook; low priority since the platform JWT (24h) is the user-facing credential |
|||||||
| 135 | - Feeding into a monitoring dashboard — tracked separately |
|||||||
| 136 | - Structured log querying / alerting — revisit when user base grows beyond single-digit wikis |
|||||||
