Properties
category: reference
tags: [architecture, plugin, quotas]
last_updated: 2026-03-17
confidence: high

Wiki Stats Plugin

Design for an otterwiki plugin that tracks page count and disk usage per wiki, enabling tier limits and quota enforcement in robot.wtf.

Problem

The robot.wtf resolver enforces MAX_PAGES_PER_WIKI = 500 and QUOTA_BYTES = 50MB against page_count and disk_usage_bytes fields in robot.db. Both fields always read 0 because nothing updates them. Tier limits and disk quotas are dead code.

Approach

An otterwiki pluggy plugin that hooks page lifecycle events to maintain a stats table in the per-wiki wiki.db. The resolver reads stats from there instead of robot.db.

Why a plugin

  • Otterwiki already has a pluggy-based plugin system with page lifecycle hooks (otterwiki_after_page_save, otterwiki_after_page_delete)
  • Keeps stats logic out of otterwiki core and the resolver
  • The per-wiki DB is already swapped in per-request, so the plugin writes to the correct DB naturally
  • Can be installed via pip alongside otterwiki-api and otterwiki-semantic-search

Why per-wiki DB (not robot.db)

The plugin runs inside otterwiki's process, which doesn't have access to robot.db (the platform DB). The per-wiki wiki.db is the right place:

  • Already swapped in by the resolver per-request
  • No cross-DB access needed
  • Stats are inherently per-wiki data

Schema

Add a stats table to the per-wiki DB (or use the existing cache table):

CREATE TABLE IF NOT EXISTS stats (
    name VARCHAR(64) PRIMARY KEY,
    value TEXT,
    updated_at DATETIME
);

Keys: page_count (integer), disk_usage_bytes (integer).

The resolver's _init_wiki_db() should create this table alongside the existing four tables. Stats rows are seeded to 0 on first access.
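
The create-and-seed step can be sketched as follows. This is a sketch only: `init_stats_table` is a hypothetical helper name (the real logic would live inside `_init_wiki_db()`), and it assumes raw sqlite3 access per the leaning in the open questions below.

```python
import sqlite3

STATS_SCHEMA = """
CREATE TABLE IF NOT EXISTS stats (
    name VARCHAR(64) PRIMARY KEY,
    value TEXT,
    updated_at DATETIME
);
"""

def init_stats_table(db_path: str) -> None:
    """Create the stats table if missing and seed both counters to 0."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(STATS_SCHEMA)
        # INSERT OR IGNORE: existing rows are left untouched on re-init.
        conn.executemany(
            "INSERT OR IGNORE INTO stats (name, value, updated_at) "
            "VALUES (?, ?, datetime('now'))",
            [("page_count", "0"), ("disk_usage_bytes", "0")],
        )
        conn.commit()
    finally:
        conn.close()
```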

Plugin hooks

otterwiki_after_page_save(pagepath, content, author)

  1. Open the current per-wiki DB (via otterwiki.server.db or direct sqlite3)
  2. Count .md files in the git repo (or increment a counter — but counting is safer since saves can be creates or updates)
  3. Measure disk usage: os.path.getsize() walk on the wiki directory, or cache a running total
  4. Upsert page_count and disk_usage_bytes into the stats table
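
The four steps above can be sketched as plain functions. Assumptions are flagged in comments: the `repo` subdirectory layout matches the cron script below, the table already exists, and otterwiki's actual pluggy marker (the decorator that would wrap `update_stats` in the real plugin) is omitted since its exact import path isn't confirmed here.

```python
import os
import sqlite3

def _count_pages(repo_dir: str) -> int:
    """Count .md files in the git repo; recounting handles creates and updates uniformly."""
    total = 0
    for root, _dirs, files in os.walk(repo_dir):
        if ".git" in root.split(os.sep):
            continue  # skip git internals
        total += sum(1 for f in files if f.endswith(".md"))
    return total

def _disk_usage(wiki_dir: str) -> int:
    """Sum file sizes under the wiki directory (alternative: cache a running total)."""
    total = 0
    for root, _dirs, files in os.walk(wiki_dir):
        for f in files:
            try:
                total += os.path.getsize(os.path.join(root, f))
            except OSError:
                pass  # file vanished mid-walk
    return total

# In the real plugin this would be wrapped by otterwiki's pluggy hookimpl
# marker and called from otterwiki_after_page_save / _after_page_delete.
def update_stats(wiki_dir: str) -> None:
    """Upsert page_count and disk_usage_bytes into the per-wiki stats table."""
    repo_dir = os.path.join(wiki_dir, "repo")  # layout assumption, see cron script
    conn = sqlite3.connect(os.path.join(wiki_dir, "wiki.db"))
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO stats (name, value, updated_at) "
            "VALUES (?, ?, datetime('now'))",
            [
                ("page_count", str(_count_pages(repo_dir))),
                ("disk_usage_bytes", str(_disk_usage(wiki_dir))),
            ],
        )
        conn.commit()
    finally:
        conn.close()
```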

otterwiki_after_page_delete(pagepath, author)

Same as above — recount and update.

Setup hook

otterwiki_setup(app, db, storage) — register the plugin, create the stats table if missing.

Resolver integration

In _swap_database() or immediately after, the resolver reads stats from the per-wiki DB:

def _get_wiki_stats(wiki_dir: str) -> dict:
    db_path = os.path.join(wiki_dir, "wiki.db")
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name, value FROM stats").fetchall()
        return {name: value for name, value in rows}
    except sqlite3.OperationalError:
        # stats table missing (plugin not installed or not yet seeded)
        return {}
    finally:
        conn.close()

The quota check in the resolver uses these values instead of the (always-zero) robot.db fields.
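
A hedged sketch of that check, driven by the dict `_get_wiki_stats` returns. `QuotaExceeded` and `check_quota` are illustrative names, not existing resolver code; the constants mirror the limits named in the Problem section, and values are cast from TEXT since that is how the stats table stores them.

```python
MAX_PAGES_PER_WIKI = 500
QUOTA_BYTES = 50 * 1024 * 1024  # 50MB

class QuotaExceeded(Exception):
    """Raised when a wiki has hit its page or disk limit."""

def check_quota(stats: dict) -> None:
    # Values come back as TEXT rows; missing keys default to 0 (fresh wiki).
    page_count = int(stats.get("page_count", 0))
    disk_bytes = int(stats.get("disk_usage_bytes", 0))
    if page_count >= MAX_PAGES_PER_WIKI:
        raise QuotaExceeded(f"page limit reached ({page_count}/{MAX_PAGES_PER_WIKI})")
    if disk_bytes >= QUOTA_BYTES:
        raise QuotaExceeded(f"disk quota reached ({disk_bytes} bytes)")
```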

Cron backstop

A periodic job (cron or Ansible task) iterates all wiki directories, counts pages, measures disk, and writes to each wiki.db. This catches:

  • Git-push-based changes (bypasses lifecycle hooks)
  • Any missed updates from crashes
  • Initial population for existing wikis

for wiki_dir in /srv/data/wikis/*/; do
    slug=$(basename "$wiki_dir")
    repo="${wiki_dir}repo"
    db="${wiki_dir}wiki.db"
    [ -f "$db" ] || continue
    page_count=$(find "$repo" -name "*.md" | wc -l | tr -d ' ')
    disk_bytes=$(du -sb "$wiki_dir" | cut -f1)
    sqlite3 "$db" "INSERT OR REPLACE INTO stats (name, value, updated_at) VALUES ('page_count', '$page_count', datetime('now')), ('disk_usage_bytes', '$disk_bytes', datetime('now'));"
done

Open questions

  • Should the plugin use otterwiki.server.db (SQLAlchemy) or raw sqlite3? SQLAlchemy is cleaner but requires defining a model. Raw sqlite3 avoids model coupling. Leaning toward raw sqlite3 since the stats table is simple and the plugin shouldn't depend on otterwiki's model layer.
  • Disk usage scope: Count only the git repo, or the entire wiki directory (including wiki.db, FAISS index)? The quota is meant to limit content, so git repo only seems right. But attachments (stored in git) can be large.
  • Frequency of disk measurement: du on every page save is cheap for small wikis but expensive for large ones. Could measure disk only on the cron pass and update page_count on every save.

Not implementing yet

This is a design doc only. Implementation deferred until after the current batch of work (is_public removal, Phase 2 User Management, backup hardening) lands.