# Storage & Archival
Bewitch stores metrics in DuckDB with optional data lifecycle management: retention pruning, compaction, and Parquet archival for long-term storage.
## DuckDB Storage
The daemon writes metrics using DuckDB's Appender API for high-performance bulk inserts. The schema is applied automatically on startup (`CREATE TABLE IF NOT EXISTS`). Database writes are decoupled from collection via a buffered channel — the in-memory API cache is always updated immediately, even if the writer falls behind.
### WAL checkpointing
DuckDB uses a write-ahead log (WAL) for crash safety. Checkpoints happen automatically when the WAL exceeds `checkpoint_threshold` (default 16MB). For additional crash safety, set `checkpoint_interval` to force periodic checkpoints:
```toml
[daemon]
checkpoint_threshold = "16MB"  # auto-checkpoint when WAL exceeds this size
checkpoint_interval = "5m"     # force a checkpoint on this interval
```

## Retention Pruning
When retention is configured, the daemon periodically deletes metrics older than the specified duration.
```toml
[daemon]
retention = "30d"      # delete data older than 30 days
prune_interval = "1h"  # run pruning every hour
```

## Compaction
Compaction performs a full database rebuild to reclaim fragmented space. It can run on a schedule or be triggered manually.
```toml
[daemon]
compaction_interval = "7d"  # weekly compaction
```

To trigger compaction manually:

```sh
bewitch -config /etc/bewitch.toml compact
# or remotely
bewitch -addr myserver:9119 -token secret compact
```

During compaction, incoming writes are buffered in memory and flushed on completion. Pruning, compaction, and archiving are mutually exclusive (coordinated via a mutex).
## Parquet Archival
For long-term storage efficiency, metrics older than archive_threshold can be exported to monthly Parquet files compressed with zstd (~10x smaller than DuckDB).
```toml
[daemon]
archive_threshold = "7d"                   # archive data older than this
archive_interval = "6h"                    # how often the archiver runs
archive_path = "/var/lib/bewitch/archive"
retention = "90d"                          # also prunes old Parquet files
```

### How it works
- Data older than `archive_threshold` is exported to monthly Parquet files
- Exported data is deleted from DuckDB to save space
- Dimension tables are snapshotted to Parquet on each archive run
- History API queries automatically combine DuckDB and Parquet data based on the time range
- Old Parquet files are deleted based on the `retention` setting
### Manual archive/unarchive
```sh
# Archive old data to Parquet
bewitch -config /etc/bewitch.toml archive

# Reload all Parquet data back into DuckDB
bewitch -config /etc/bewitch.toml unarchive
```

`unarchive` reloads all Parquet data into DuckDB, removes the Parquet files, and resets the archive state. This is useful for changing strategies or disabling archival.
## Snapshots
Create standalone DuckDB files for offline analysis — complex queries, sharing with colleagues, or use with DBeaver, Jupyter, or the DuckDB CLI.
```sh
# Metrics + dimensions only (default)
bewitch -config /etc/bewitch.toml snapshot /tmp/metrics.duckdb

# Include alerts, preferences, scheduled jobs
bewitch snapshot -with-system-tables /tmp/backup.duckdb
```

Snapshots merge the live database and any archived Parquet data into a single self-contained file. Open it directly with any DuckDB-compatible tool:

```sh
duckdb /tmp/metrics.duckdb "SELECT COUNT(*) FROM cpu_metrics"
```

## Concurrency
The daemon uses a DuckDB connection pool (MaxOpenConns(4)) to allow API handlers to execute concurrently with batch writes. The TUI opens a separate read-only connection. During pruning/compaction, the store buffers incoming writes in memory and flushes them on completion.
## Schema
Tables are defined as a const string in the codebase and applied with `CREATE TABLE IF NOT EXISTS` on startup. Key tables:
- `cpu_metrics` — per-core CPU usage
- `memory_metrics` — memory usage
- `disk_metrics` — disk space and I/O
- `network_metrics` — network throughput
- `temperature_metrics` — sensor temperatures
- `power_metrics` — power consumption
- `process_metrics` — process resource usage
- `process_info` — enriched process metadata
- `dimension_values` — normalized dimension lookups (mount, device, interface, sensor, zone)
- `alert_rules` — alert rule definitions
- `alerts` — fired alerts
- `preferences` — key-value UI preferences
- `archive_state` — archival tracking