Blame

7402ee Claude (MCP) 2026-03-18 03:26:21
[mcp] Add security logging plan (OWASP A09)
1
---
2
category: spec
3
tags: [security, logging, owasp, plan]
4
last_updated: 2026-03-18
5
confidence: high
6
---
7
8
# Security Logging Plan
9
10
Addresses OWASP A09 finding: "No audit trail for auth events, ACL changes, wiki deletions."
11
12
## Events to Log
13
14
### Auth server (`app/auth_server.py`)
15
16
| Event | Route / Location | Fields |
17
|---|---|---|
18
| `login.initiated` | `POST /auth/login` (after PAR submit) | actor=handle, ip |
19
| `login.success` | `oauth_callback()` after JWT issued | actor=did, handle, ip |
20
| `login.new_user` | `oauth_callback()` → redirect to signup | actor=did, ip |
21
| `signup.success` | `signup()` after `user_model.create()` | actor=did, username, ip |
22
| `consent.granted` | `_handle_consent_post()` action=approve | actor=did, wiki_slug, client_id, ip |
23
| `consent.denied` | `_handle_consent_post()` action=deny | actor=did, wiki_slug, client_id, ip |
24
| `logout` | `oauth_logout()` | actor=did, ip |
25
| `rate_limit.hit` | `ratelimit_handler()` (429) | ip, path |
26
27
### Management middleware (`app/management/routes.py`)
28
29
| Event | Method | Fields |
30
|---|---|---|
31
| `wiki.created` | `_create_wiki()` return 201 | actor=did, slug |
32
| `wiki.deleted` | `_delete_wiki()` return 200 | actor=did, slug |
33
| `token.regenerated` | `_regenerate_token()` return 200 | actor=did, slug |
34
| `rate_limit.hit` | 429 block in `__call__()` | ip, method, path |
35
36
### Resolver (`app/resolver.py`)
37
38
| Event | Location | Fields |
39
|---|---|---|
40
| `auth.bearer_invalid` | `_resolve_bearer_token()` raises AuthError 401 | ip, wiki_slug |
41
| `auth.bearer_mismatch` | `_resolve_bearer_token()` raises AuthError 403 | ip, wiki_slug |
42
| `rate_limit.hit` | 429 block in `__call__()` | ip, wiki_slug |
43
44
**Not logged:** ACL flag changes (allow_read, allow_write, is_admin, is_approved). These happen inside otterwiki's admin UI with no current hook point. Deferred — track as a follow-on once otterwiki lifecycle hooks or the per-wiki DB plan is in place.
45
46
## Log Format
47
48
Structured JSON, one object per line, emitted via Python stdlib `logging` to stdout → systemd journal (already configured). No new log file or rotation needed — journal handles retention (30-day, 500MB cap per `robot-journald.conf`).
49
50
```json
51
{
52
"ts": "2026-03-18T12:34:56.789Z",
53
"event": "login.success",
54
"actor_did": "did:plc:abc123",
55
"actor_handle": "user.bsky.social",
56
"wiki_slug": null,
57
"client_id": null,
58
"outcome": "success",
59
"ip": "1.2.3.4",
60
"syslog_identifier": "robot-auth"
61
}
62
```
63
64
Fields:
65
- `ts` — UTC ISO-8601
66
- `event` — dot-namespaced string (see tables above)
67
- `actor_did` — DID of the acting user, or null for anonymous/system
68
- `actor_handle` — AT Protocol handle, or null
69
- `wiki_slug` — target wiki, or null for platform-level events
70
- `client_id` — OAuth client_id for consent events, else null
71
- `outcome``"success"` | `"failure"` | `"blocked"`
72
- `ip` — client IP (from `request.remote_addr` in Flask, or `get_client_ip(environ)` in WSGI middleware)
73
74
**PII note:** IP addresses are PII. They are logged for security purposes (rate limit forensics, abuse investigation). Journal retention is 30 days — no change needed. Do not log full handles in combination with IPs in any external/forwarded log sink.
75
76
## Implementation Approach
77
78
### New module: `app/audit.py`
79
80
A thin wrapper around stdlib logging. No new dependencies.
81
82
```python
83
import logging, json
84
from datetime import datetime, timezone
85
86
_audit = logging.getLogger("robot.audit")
87
88
def log(event: str, *, actor_did=None, actor_handle=None,
89
wiki_slug=None, client_id=None, outcome="success", ip=None):
90
_audit.info(json.dumps({
91
"ts": datetime.now(timezone.utc).isoformat(),
92
"event": event,
93
"actor_did": actor_did,
94
"actor_handle": actor_handle,
95
"wiki_slug": wiki_slug,
96
"client_id": client_id,
97
"outcome": outcome,
98
"ip": ip,
99
}))
100
```
101
102
The logger name `robot.audit` lets operators filter with `journalctl -u robot-auth SYSLOG_IDENTIFIER=robot-auth | grep robot.audit` or similar.
103
104
### Call sites
105
106
**`app/auth_server.py`** — add `from app import audit` (or `from app.audit import log as audit_log`) and call `audit_log(...)` at the points noted in the events table. Most insertions are single lines immediately after the decision that determines the outcome. The `ratelimit_handler()` at the bottom of `create_app()` is the single place to cover all rate limit hits on the auth server.
107
108
**`app/management/routes.py`**`ManagementMiddleware.__call__()` already has the 429 path. `_create_wiki()` and `_delete_wiki()` return tuples — log immediately before `return`. `actor_did` comes from `user.user_did`, `ip` from `get_client_ip(environ)` (already imported).
109
110
**`app/resolver.py`**`TenantResolver._resolve_bearer_token()` raises `AuthError` on invalid/mismatched tokens — log before raising. The 429 block in `__call__()` already has `client_ip`. No logging in `_permissions_for_user()` — permission derivation is not itself an audit event.
111
112
### IP extraction
113
114
- Flask routes: `request.remote_addr` (ProxyFix is already wired in `create_app()`)
115
- WSGI middleware: `get_client_ip(environ)` from `app.rate_limit` (already imported in both `routes.py` and `resolver.py`)
116
117
### No request ID correlation (yet)
118
119
Cross-service correlation (e.g., tracing a consent grant through auth → resolver → MCP) would require propagating a request ID header. Deferred — the audit events are service-scoped and the `actor_did` + `ts` + `wiki_slug` triple is sufficient for forensics at this scale.
120
121
## Ansible Changes
122
123
1. **No new service file changes needed.** All services already emit stdout to the journal (`StandardOutput=journal`). The `robot.audit` logger will appear in the same journal unit as the service that emits it.
124
125
2. **Log level configuration.** Add `AUDIT_LOG_LEVEL=INFO` to `ansible/roles/deploy/templates/robot.env.j2` and wire it in `app/audit.py` (`logging.basicConfig` or the app's existing logging setup). The default `INFO` level is correct; `DEBUG` should never emit audit records.
126
127
3. **No logrotate role needed.** `ansible/roles/logging/` already configures journald with 30-day retention and 500MB cap — sufficient.
128
129
4. **Future:** if a SIEM or external log forwarder is added, `journalctl -o json -u robot-auth -u robot-api -u robot-mcp` can be tailed. No Ansible change required at that point beyond a forwarding role.
130
131
## Out of Scope (Deferred)
132
133
- ACL flag changes via otterwiki admin UI — needs lifecycle hook or per-wiki DB plan
134
- Token issuance/refresh events for the ATProto OAuth session — currently no post-refresh hook; low priority since the platform JWT (24h) is the user-facing credential
135
- Feeding into a monitoring dashboard — tracked separately
136
- Structured log querying / alerting — revisit when user base grows beyond single-digit wikis