Commit 0820e6

2026-03-17 04:17:05 Claude (MCP): [mcp] Add design doc for otterwiki stats plugin (page count + disk usage)
/dev/null .. Design/Wiki_Stats_Plugin.md
@@ -0,0 +1,113 @@
+ ---
+ category: reference
+ tags: [architecture, plugin, quotas]
+ last_updated: 2026-03-17
+ confidence: high
+ ---
+
+ # Wiki Stats Plugin
+
+ Design for an otterwiki plugin that tracks page count and disk usage per wiki, enabling tier limits and quota enforcement in robot.wtf.
+
+ ## Problem
+
+ The robot.wtf resolver enforces `MAX_PAGES_PER_WIKI = 500` and `QUOTA_BYTES = 50MB` against `page_count` and `disk_usage_bytes` fields in robot.db. Both fields always read 0 because nothing updates them. Tier limits and disk quotas are dead code.
+
+ ## Approach
+
+ An otterwiki pluggy plugin that hooks page lifecycle events to maintain a `stats` table in the per-wiki `wiki.db`. The resolver reads stats from there instead of robot.db.
+
+ ### Why a plugin
+
+ - Otterwiki already has a pluggy-based plugin system with page lifecycle hooks (`otterwiki_after_page_save`, `otterwiki_after_page_delete`)
+ - Keeps stats logic out of otterwiki core and the resolver
+ - The per-wiki DB is already swapped in per-request, so the plugin writes to the correct DB naturally
+ - Can be installed via pip alongside otterwiki-api and otterwiki-semantic-search
+
+ ### Why per-wiki DB (not robot.db)
+
+ The plugin runs inside otterwiki's process, which doesn't have access to robot.db (the platform DB). The per-wiki `wiki.db` is the right place:
+ - Already swapped in by the resolver per-request
+ - No cross-DB access needed
+ - Stats are inherently per-wiki data
+
+ ## Schema
+
+ Add a `stats` table to the per-wiki DB (or use the existing `cache` table):
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS stats (
+     name VARCHAR(64) PRIMARY KEY,
+     value TEXT,
+     updated_at DATETIME
+ );
+ ```
+
+ Keys: `page_count` (integer), `disk_usage_bytes` (integer).
+
+ The resolver's `_init_wiki_db()` should create this table alongside the existing four. Stats rows are seeded on first access.
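
A minimal sketch of that creation-and-seeding step, assuming raw `sqlite3` against the per-wiki `wiki.db`; `ensure_stats_table` is a hypothetical helper name, not existing resolver code:

```python
import sqlite3

def ensure_stats_table(db_path: str) -> None:
    # Hypothetical helper mirroring what _init_wiki_db() would do:
    # create the stats table if missing, then seed both keys at zero.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stats ("
            " name VARCHAR(64) PRIMARY KEY,"
            " value TEXT,"
            " updated_at DATETIME)"
        )
        # INSERT OR IGNORE keeps existing values, so re-running is safe.
        conn.executemany(
            "INSERT OR IGNORE INTO stats (name, value, updated_at)"
            " VALUES (?, ?, datetime('now'))",
            [("page_count", "0"), ("disk_usage_bytes", "0")],
        )
        conn.commit()
    finally:
        conn.close()
```

Because the seeding is idempotent, calling this on every first access costs nothing and lazily covers wikis that predate the plugin.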
+
+ ## Plugin hooks
+
+ ### `otterwiki_after_page_save(pagepath, content, author)`
+
+ 1. Open the current per-wiki DB (via `otterwiki.server.db` or direct `sqlite3`)
+ 2. Count `.md` files in the git repo (recounting, rather than incrementing a counter, stays correct whether the save was a create or an update)
+ 3. Measure disk usage: walk the wiki directory summing `os.path.getsize()`, or keep a cached running total
+ 4. Upsert `page_count` and `disk_usage_bytes` into the `stats` table
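
A sketch of those four steps using raw `sqlite3` (one of the two options weighed under Open questions). The `<wiki_dir>/repo` and `<wiki_dir>/wiki.db` layout is an assumption taken from the cron script in this doc, and `recount_and_upsert` is a hypothetical name:

```python
import os
import sqlite3

def recount_and_upsert(wiki_dir: str) -> tuple[int, int]:
    # Hypothetical hook body: recount from disk on every save.
    # Disk usage here covers the repo only; whether wiki.db and the
    # FAISS index should count is still open (see Open questions).
    repo = os.path.join(wiki_dir, "repo")
    page_count = 0
    disk_bytes = 0
    for root, dirs, files in os.walk(repo):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip git internals
        for name in files:
            disk_bytes += os.path.getsize(os.path.join(root, name))
            if name.endswith(".md"):
                page_count += 1
    conn = sqlite3.connect(os.path.join(wiki_dir, "wiki.db"))
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO stats (name, value, updated_at)"
            " VALUES (?, ?, datetime('now'))",
            [("page_count", str(page_count)),
             ("disk_usage_bytes", str(disk_bytes))],
        )
        conn.commit()
    finally:
        conn.close()
    return page_count, disk_bytes
```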
+
+ ### `otterwiki_after_page_delete(pagepath, author)`
+
+ Same as above — recount and update.
+
+ ### Setup hook
+
+ `otterwiki_setup(app, db, storage)` — register the plugin, create the `stats` table if missing.
+
+ ## Resolver integration
+
+ In `_swap_database()` or immediately after, the resolver reads `stats` from the per-wiki DB:
+
+ ```python
+ import os
+ import sqlite3
+
+ def _get_wiki_stats(wiki_dir: str) -> dict:
+     db_path = os.path.join(wiki_dir, "wiki.db")
+     conn = sqlite3.connect(db_path)
+     try:
+         rows = conn.execute("SELECT name, value FROM stats").fetchall()
+         # Values are stored as TEXT; coerce to int for quota math.
+         return {name: int(value) for name, value in rows}
+     except (sqlite3.Error, ValueError, TypeError):
+         # Missing table or malformed value: treat as "no stats yet".
+         return {}
+     finally:
+         conn.close()
+ ```
+
+ The quota check in the resolver uses these values instead of the (always-zero) robot.db fields.
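
A hedged sketch of that check: the limits are the constants named in the Problem section, and `check_quota` is a hypothetical helper consuming the dict that `_get_wiki_stats()` returns:

```python
MAX_PAGES_PER_WIKI = 500        # from the resolver (per the Problem section)
QUOTA_BYTES = 50 * 1024 * 1024  # 50MB

def check_quota(stats: dict) -> tuple[bool, str]:
    # Hypothetical resolver-side check. Values may arrive as TEXT
    # straight from sqlite, so coerce defensively; missing keys mean
    # the plugin has not run yet, which should not block writes.
    page_count = int(stats.get("page_count", 0))
    disk_bytes = int(stats.get("disk_usage_bytes", 0))
    if page_count >= MAX_PAGES_PER_WIKI:
        return False, "page limit reached"
    if disk_bytes >= QUOTA_BYTES:
        return False, "disk quota exceeded"
    return True, "ok"
```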
+
+ ## Cron backstop
+
+ A periodic job (cron or Ansible task) iterates all wiki directories, counts pages, measures disk, and writes to each `wiki.db`. This catches:
+ - Git-push-based changes (bypasses lifecycle hooks)
+ - Any missed updates from crashes
+ - Initial population for existing wikis
+
+ ```bash
+ for wiki_dir in /srv/data/wikis/*/; do
+     repo="${wiki_dir}repo"
+     db="${wiki_dir}wiki.db"
+     [ -f "$db" ] || continue
+     page_count=$(find "$repo" -type f -name "*.md" | wc -l)
+     disk_bytes=$(du -sb "$wiki_dir" | cut -f1)
+     sqlite3 "$db" "INSERT OR REPLACE INTO stats (name, value, updated_at) VALUES ('page_count', '$page_count', datetime('now')), ('disk_usage_bytes', '$disk_bytes', datetime('now'));"
+ done
+ ```
+
+ ## Open questions
+
+ - **Should the plugin use `otterwiki.server.db` (SQLAlchemy) or raw `sqlite3`?** SQLAlchemy is cleaner but requires defining a model. Raw sqlite3 avoids model coupling. Leaning toward raw sqlite3 since the stats table is simple and the plugin shouldn't depend on otterwiki's model layer.
+ - **Disk usage scope:** Count only the git repo, or the entire wiki directory (including wiki.db, FAISS index)? The quota is meant to limit content, so git repo only seems right. But attachments (stored in git) can be large.
+ - **Frequency of disk measurement:** `du` on every page save is cheap for small wikis but expensive for large ones. Could measure disk only on the cron pass and update page_count on every save.
+
+ ## Not implementing yet
+
+ This is a design doc only. Implementation deferred until after the current batch of work (is_public removal, Phase 2 User Management, backup hardening) lands.