category: reference
tags: [architecture, plugin, quotas]
last_updated: 2026-03-17
confidence: high
Wiki Stats Plugin
Design for an otterwiki plugin that tracks page count and disk usage per wiki, enabling tier limits and quota enforcement in robot.wtf.
Problem
The robot.wtf resolver enforces `MAX_PAGES_PER_WIKI = 500` and `QUOTA_BYTES = 50MB` against the `page_count` and `disk_usage_bytes` fields in robot.db. Both fields always read 0 because nothing updates them. Tier limits and disk quotas are dead code.
Approach
An otterwiki pluggy plugin that hooks page lifecycle events to maintain a stats table in the per-wiki wiki.db. The resolver reads stats from there instead of robot.db.
Why a plugin
- Otterwiki already has a pluggy-based plugin system with page lifecycle hooks (`otterwiki_after_page_save`, `otterwiki_after_page_delete`)
- Keeps stats logic out of otterwiki core and the resolver
- The per-wiki DB is already swapped in per-request, so the plugin writes to the correct DB naturally
- Can be installed via pip alongside otterwiki-api and otterwiki-semantic-search
Why per-wiki DB (not robot.db)
The plugin runs inside otterwiki's process, which doesn't have access to robot.db (the platform DB). The per-wiki wiki.db is the right place:
- Already swapped in by the resolver per-request
- No cross-DB access needed
- Stats are inherently per-wiki data
Schema
Add a stats table to the per-wiki DB (or use the existing cache table):
```sql
CREATE TABLE IF NOT EXISTS stats (
    name       VARCHAR(64) PRIMARY KEY,
    value      TEXT,
    updated_at DATETIME
);
```
Keys: `page_count` (integer), `disk_usage_bytes` (integer).
The resolver's `_init_wiki_db()` should create this table alongside the existing four. Stats rows are seeded on first access.
Plugin hooks
`otterwiki_after_page_save(pagepath, content, author)`
- Open the current per-wiki DB (via `otterwiki.server.db` or direct `sqlite3`)
- Count `.md` files in the git repo (or increment a counter, but counting is safer since saves can be creates or updates)
- Measure disk usage: an `os.path.getsize()` walk over the wiki directory, or cache a running total
- Upsert `page_count` and `disk_usage_bytes` into the `stats` table
`otterwiki_after_page_delete(pagepath, author)`
Same as above: recount and update.
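Both hooks boil down to the same recount. A sketch of the two measurements, with illustrative helper names (not from otterwiki):

```python
import os

def count_pages(repo_dir: str) -> int:
    """Count .md files in the git working tree, skipping the .git directory."""
    total = 0
    for _root, dirs, files in os.walk(repo_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # prune .git in place
        total += sum(1 for f in files if f.endswith(".md"))
    return total

def disk_usage_bytes(wiki_dir: str) -> int:
    """Sum file sizes under the wiki directory (roughly what `du -sb` reports)."""
    total = 0
    for root, _dirs, files in os.walk(wiki_dir):
        for f in files:
            try:
                total += os.path.getsize(os.path.join(root, f))
            except OSError:
                pass  # file vanished mid-walk; skip it
    return total
```

Pruning `.git` keeps deleted-then-committed pages and git internals out of the page count; whether `.git` should count toward disk usage is part of the "disk usage scope" open question below.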
Setup hook
`otterwiki_setup(app, db, storage)`: register the plugin and create the stats table if missing.
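Wired together, the plugin skeleton might look like the following. The hook names come from this doc; the `"otterwiki"` marker name and the standalone `PluginManager` registration are assumptions made so the sketch runs on its own with plain pluggy (otterwiki's real registration path may differ):

```python
import pluggy

# Assumed pluggy project name; otterwiki's actual marker may differ.
hookspec = pluggy.HookspecMarker("otterwiki")
hookimpl = pluggy.HookimplMarker("otterwiki")

class WikiStatsSpec:
    """Hook signatures, mirroring the lifecycle hooks described above."""
    @hookspec
    def otterwiki_after_page_save(self, pagepath, content, author): ...

class WikiStatsPlugin:
    def __init__(self):
        self.saves = []  # stand-in for the recount-and-upsert work

    @hookimpl
    def otterwiki_after_page_save(self, pagepath, content, author):
        # Real plugin: recount pages and disk usage, upsert into stats.
        self.saves.append(pagepath)

pm = pluggy.PluginManager("otterwiki")
pm.add_hookspecs(WikiStatsSpec)
plugin = WikiStatsPlugin()
pm.register(plugin)
# pluggy hook calls take keyword arguments only.
pm.hook.otterwiki_after_page_save(pagepath="Home", content="# hi", author="a")
```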
Resolver integration
In `_swap_database()` or immediately after, the resolver reads stats from the per-wiki DB:
```python
def _get_wiki_stats(wiki_dir: str) -> dict:
    db_path = os.path.join(wiki_dir, "wiki.db")
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name, value FROM stats").fetchall()
        return {name: value for name, value in rows}
    except sqlite3.OperationalError:
        # Stats table missing (wiki predates the plugin): treat as no data.
        return {}
    finally:
        conn.close()
```
The quota check in the resolver uses these values instead of the (always-zero) robot.db fields.
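The check itself can be sketched as below. The limit constants come from this doc; `check_quota` is a hypothetical helper, and defaulting missing keys to 0 ensures an unseeded wiki is never locked out:

```python
MAX_PAGES_PER_WIKI = 500
QUOTA_BYTES = 50 * 1024 * 1024  # 50MB

def check_quota(stats: dict) -> tuple[bool, str]:
    """Return (ok, reason) given the stats dict read from the per-wiki DB.

    The stats table stores values as TEXT, so convert before comparing;
    missing keys default to 0 (unseeded wiki).
    """
    page_count = int(stats.get("page_count", 0))
    disk_usage = int(stats.get("disk_usage_bytes", 0))
    if page_count >= MAX_PAGES_PER_WIKI:
        return False, "page limit reached"
    if disk_usage >= QUOTA_BYTES:
        return False, "disk quota exceeded"
    return True, ""
```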
Cron backstop
A periodic job (cron or Ansible task) iterates all wiki directories, counts pages, measures disk, and writes to each wiki.db. This catches:
- Git-push-based changes (bypasses lifecycle hooks)
- Any missed updates from crashes
- Initial population for existing wikis
```sh
for wiki_dir in /srv/data/wikis/*/; do
    slug=$(basename "$wiki_dir")
    repo="${wiki_dir}repo"
    db="${wiki_dir}wiki.db"
    [ -f "$db" ] || continue
    page_count=$(find "$repo" -name "*.md" | wc -l)
    disk_bytes=$(du -sb "$wiki_dir" | cut -f1)
    sqlite3 "$db" "INSERT OR REPLACE INTO stats (name, value, updated_at) VALUES
        ('page_count', '$page_count', datetime('now')),
        ('disk_usage_bytes', '$disk_bytes', datetime('now'));"
done
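If the backstop runs from cron rather than an Ansible task, a daily entry might look like this; the script path, schedule, and log location are all assumptions:

```
# Recount pages and disk usage for every wiki once a day at 03:17.
17 3 * * *  /usr/local/bin/wiki-stats-recount.sh >> /var/log/wiki-stats-recount.log 2>&1
```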
Open questions
- Should the plugin use `otterwiki.server.db` (SQLAlchemy) or raw `sqlite3`? SQLAlchemy is cleaner but requires defining a model; raw sqlite3 avoids model coupling. Leaning toward raw sqlite3, since the stats table is simple and the plugin shouldn't depend on otterwiki's model layer.
- Disk usage scope: count only the git repo, or the entire wiki directory (including wiki.db and the FAISS index)? The quota is meant to limit content, so git repo only seems right. But attachments (stored in git) can be large.
- Frequency of disk measurement: `du` on every page save is cheap for small wikis but expensive for large ones. Could measure disk only on the cron pass and update `page_count` on every save.
Not implementing yet
This is a design doc only. Implementation deferred until after the current batch of work (is_public removal, Phase 2 User Management, backup hardening) lands.