Collectors
Bewitch has 9 metric collectors. All implement the `Collector` interface with `Name()` and `Collect()` methods. Collectors run in parallel via goroutines on each tick. The daemon uses a GCD-based tick scheduler to fire each collector at its configured interval.
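A GCD-based scheduler boils down to ticking at the greatest common divisor of every configured interval and firing a collector whenever its interval divides the elapsed time. A minimal sketch of the idea, with hypothetical intervals (illustrative only, not Bewitch's actual scheduler code):

```go
package main

import (
	"fmt"
	"time"
)

// gcd returns the greatest common divisor of two durations.
func gcd(a, b time.Duration) time.Duration {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func main() {
	// Hypothetical per-collector intervals.
	intervals := map[string]time.Duration{
		"cpu":  5 * time.Second,
		"disk": 30 * time.Second,
		"ecc":  60 * time.Second,
	}

	// Tick at the GCD of all intervals (5s here), so every collector
	// can fire exactly on its own schedule.
	var tick time.Duration
	for _, iv := range intervals {
		if tick == 0 {
			tick = iv
		} else {
			tick = gcd(tick, iv)
		}
	}

	var elapsed time.Duration
	for range time.Tick(tick) { // runs forever; fine for a sketch
		elapsed += tick
		for name, iv := range intervals {
			if elapsed%iv == 0 {
				fmt.Println("firing", name) // real code would launch the collector goroutine
			}
		}
	}
}
```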
CPU
Reads per-core CPU usage from /proc/stat. Computes delta percentages between samples. The first sample after startup is discarded (needs a baseline).
- Metrics: per-core usage %, aggregate %
- Storage: `cpu_metrics` table
- Default interval: inherits `default_interval` (5s)
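The per-sample math is a delta of cumulative jiffy counters: busy-time delta over total-time delta. A rough sketch of the aggregate calculation, assuming the standard `/proc/stat` field order (Bewitch's parser may differ in detail):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCPUTotals returns (busy, total) jiffies from the aggregate "cpu" line.
func readCPUTotals() (busy, total uint64, err error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 || fields[0] != "cpu" {
			continue // per-core lines are "cpu0", "cpu1", ...
		}
		var idle uint64
		for i, v := range fields[1:] {
			n, _ := strconv.ParseUint(v, 10, 64)
			total += n
			if i == 3 || i == 4 { // idle + iowait count as not busy
				idle += n
			}
		}
		return total - idle, total, nil
	}
	return 0, 0, fmt.Errorf("no cpu line in /proc/stat")
}

func main() {
	// The first read is only a baseline, which is why the first
	// collector sample after startup is discarded.
	b1, t1, _ := readCPUTotals()
	time.Sleep(time.Second)
	b2, t2, _ := readCPUTotals()
	if t2 > t1 {
		fmt.Printf("aggregate CPU: %.1f%%\n", 100*float64(b2-b1)/float64(t2-t1))
	}
}
```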
Memory
Reads /proc/meminfo for total, free, available, buffers, cached, and swap. Computes used bytes and used percentage.
- Metrics: total, used, free, available, buffers, cached, swap (bytes + percentages)
- Storage: `memory_metrics` table
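For illustration, a tiny sketch of deriving used bytes and used percentage from `/proc/meminfo`; it assumes used = MemTotal - MemAvailable, which may not be Bewitch's exact formula:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	vals := map[string]uint64{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Lines look like "MemTotal:       16323412 kB".
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 {
			kb, _ := strconv.ParseUint(fields[1], 10, 64)
			vals[strings.TrimSuffix(fields[0], ":")] = kb * 1024
		}
	}
	total, avail := vals["MemTotal"], vals["MemAvailable"]
	used := total - avail // assumption: "used" means total minus available
	fmt.Printf("used: %d bytes (%.1f%%)\n", used, 100*float64(used)/float64(total))
}
```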
Disk
Three data sources per mount: space usage (via statfs), I/O rates (via /proc/diskstats), and SMART health (via smartctl or direct device access).
Space
- Metrics: total, used, free bytes; used percentage per mount
- Mount filtering: `/snap/` and `/run/` excluded by default
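The space figures come straight from the statfs syscall; a small sketch of the arithmetic (the `mountUsage` helper name is illustrative):

```go
package main

import (
	"fmt"
	"syscall"
)

// mountUsage returns total, free, and used bytes for a mount point via statfs.
func mountUsage(path string) (total, free, used uint64, err error) {
	var st syscall.Statfs_t
	if err = syscall.Statfs(path, &st); err != nil {
		return
	}
	bsize := uint64(st.Bsize)
	total = st.Blocks * bsize
	free = st.Bfree * bsize
	used = total - free
	return
}

func main() {
	total, free, used, err := mountUsage("/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("/: total=%d used=%d free=%d (%.1f%% used)\n",
		total, used, free, 100*float64(used)/float64(total))
}
```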
I/O
- Metrics: read/write bytes per second per device
- Delta-based: keeps previous reading, computes rate. First sample discarded.
SMART Health
Reads SMART data per physical device (not per partition). Multiple mounts from the same disk share one SMART read. SMART data is live-only — not stored in the database since it changes slowly.
- NVMe: available spare %, percent used, critical warning, temperature, power-on hours, power cycles
- SATA: reallocated sectors, pending sectors, uncorrectable errors, temperature, power-on hours
- Fallback chain: smartctl (preferred) → smart.go library → direct SAT passthrough
- Requires: `CAP_SYS_RAWIO` capability (configured by the Debian package)
```toml
[collectors.disk]
interval = "30s"
smart_interval = "5m"  # min 30s, "0" to disable
exclude_mounts = ["/boot/efi"]
```
Network
Reads per-interface bytes from /proc/net/dev. Computes RX/TX bytes per second. Delta-based with first sample discarded.
- Metrics: rx_bytes/sec, tx_bytes/sec per interface
- Storage: `network_metrics` table with dimension IDs for interface names
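The rate computation keeps the previous cumulative counters and divides the delta by the elapsed time; the disk I/O collector follows the same pattern against `/proc/diskstats`. A sketch against `/proc/net/dev` (parsing simplified):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCounters returns cumulative rx/tx byte counters per interface.
func readCounters() map[string][2]uint64 {
	out := map[string][2]uint64{}
	f, err := os.Open("/proc/net/dev")
	if err != nil {
		return out
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, ":")
		if i < 0 {
			continue // skip the two header lines
		}
		name := strings.TrimSpace(line[:i])
		fields := strings.Fields(line[i+1:])
		if len(fields) < 9 {
			continue
		}
		rx, _ := strconv.ParseUint(fields[0], 10, 64) // received bytes
		tx, _ := strconv.ParseUint(fields[8], 10, 64) // transmitted bytes
		out[name] = [2]uint64{rx, tx}
	}
	return out
}

func main() {
	prev := readCounters() // baseline; a real collector discards this first sample
	interval := 2 * time.Second
	time.Sleep(interval)
	for name, c := range readCounters() {
		p := prev[name]
		secs := interval.Seconds()
		fmt.Printf("%s: rx %.0f B/s, tx %.0f B/s\n",
			name, float64(c[0]-p[0])/secs, float64(c[1]-p[1])/secs)
	}
}
```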
ECC
Reads ECC memory error counts from /sys/devices/system/edac/. Live-only data — not stored in DB. Useful for servers with ECC memory.
- Metrics: correctable and uncorrectable error counts per DIMM
- Default interval: 60s (ECC errors change very infrequently)
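Collecting the counts is a plain sysfs walk. A minimal sketch, assuming the per-memory-controller `ce_count`/`ue_count` files; the exact EDAC layout varies by kernel, and per-DIMM counters live one directory deeper:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	mcs, _ := filepath.Glob("/sys/devices/system/edac/mc/mc*")
	for _, mc := range mcs {
		ce, _ := os.ReadFile(filepath.Join(mc, "ce_count")) // correctable errors
		ue, _ := os.ReadFile(filepath.Join(mc, "ue_count")) // uncorrectable errors
		fmt.Printf("%s: ce=%s ue=%s\n", filepath.Base(mc),
			strings.TrimSpace(string(ce)), strings.TrimSpace(string(ue)))
	}
}
```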
Temperature
Reads hardware sensor temperatures from /sys/class/hwmon/. Caches sensor paths and refreshes every 60 seconds to avoid expensive glob operations.
- Metrics: temperature in °C per sensor
- Storage: `temperature_metrics` table with dimension IDs for sensor names
- Can be disabled via `enabled = false` in config
- Displayed in the Hardware tab's Temperature sub-section
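Sensor values are plain millidegree files under hwmon, and the path cache simply avoids repeating the glob on every tick. A rough sketch (the 60-second cache refresh is omitted and sensor naming is simplified):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

var sensorPaths []string // cached between ticks; refreshed periodically in the real collector

func collectTemps() map[string]float64 {
	if sensorPaths == nil {
		// The expensive glob, done rarely: every tempN_input under every hwmon device.
		sensorPaths, _ = filepath.Glob("/sys/class/hwmon/hwmon*/temp*_input")
	}
	out := map[string]float64{}
	for _, p := range sensorPaths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		milli, err := strconv.ParseFloat(strings.TrimSpace(string(raw)), 64)
		if err != nil {
			continue
		}
		out[p] = milli / 1000 // values are millidegrees Celsius
	}
	return out
}

func main() {
	for path, c := range collectTemps() {
		fmt.Printf("%s: %.1f°C\n", path, c)
	}
}
```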
Power
Reads power consumption from Linux powercap/RAPL zones at /sys/class/powercap/. Delta-based, computes watts from energy counter differences. Caches zone paths (60s refresh).
- Metrics: watts per power zone (package, core, uncore, DRAM)
- Storage: `power_metrics` table with dimension IDs for zone names
- Can be disabled via `enabled = false` in config
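Watts fall out of two reads of the cumulative microjoule counter divided by the elapsed time. A minimal sketch, assuming the `intel-rapl` zone layout and ignoring counter wraparound:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

func readEnergyUJ(zone string) (uint64, error) {
	raw, err := os.ReadFile(filepath.Join(zone, "energy_uj"))
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
}

func main() {
	zones, _ := filepath.Glob("/sys/class/powercap/intel-rapl:*")
	interval := 2 * time.Second

	prev := map[string]uint64{}
	for _, z := range zones {
		if uj, err := readEnergyUJ(z); err == nil {
			prev[z] = uj // baseline read; a real collector discards the first sample
		}
	}
	time.Sleep(interval)
	for _, z := range zones {
		uj, err := readEnergyUJ(z)
		p, ok := prev[z]
		if err != nil || !ok || uj < p { // skip on error or counter wraparound
			continue
		}
		name, _ := os.ReadFile(filepath.Join(z, "name")) // e.g. package-0, dram
		watts := float64(uj-p) / 1e6 / interval.Seconds()
		fmt.Printf("%s: %.2f W\n", strings.TrimSpace(string(name)), watts)
	}
}
```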
GPU
Monitors GPU utilization, frequency, power, and memory. Supports Intel iGPUs via intel_gpu_top (long-lived JSON subprocess) and NVIDIA GPUs via nvidia-smi (point-in-time CSV queries). Both backends auto-detect tool availability at startup; if neither is found, the collector produces empty samples.
Intel iGPU
- Runs `intel_gpu_top -J` as a persistent subprocess streaming JSON
- Detects the i915/xe driver via `/sys/class/drm/`
- Utilization = max engine busy % (Render/3D, Video, etc.)
- First sample discarded (needs prior period for deltas)
- Requires: `CAP_PERFMON` capability and the `intel-gpu-tools` package
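The long-lived subprocess pattern looks roughly like the sketch below: start `intel_gpu_top -J` once and keep decoding samples from its stdout. The JSON shape assumed here (one large array of samples, each with an `engines` map carrying per-engine `busy` percentages) depends on the intel_gpu_top version and is an assumption, not a documented contract:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type igpuSample struct {
	// Assumed shape: each engine entry carries a "busy" percentage.
	Engines map[string]struct {
		Busy float64 `json:"busy"`
	} `json:"engines"`
}

func main() {
	cmd := exec.Command("intel_gpu_top", "-J")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err) // tool missing: a real collector would disable the backend instead
	}
	defer cmd.Process.Kill()

	dec := json.NewDecoder(stdout)
	if _, err := dec.Token(); err != nil { // consume the opening '[' of the sample array
		panic(err)
	}
	for dec.More() {
		var s igpuSample
		if err := dec.Decode(&s); err != nil {
			break
		}
		// Utilization = busiest engine in the period.
		peak := 0.0
		for _, e := range s.Engines {
			if e.Busy > peak {
				peak = e.Busy
			}
		}
		fmt.Printf("iGPU utilization: %.1f%%\n", peak)
	}
}
```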
NVIDIA
- Runs `nvidia-smi --query-gpu=... --format=csv` with a 10s timeout
- Reports utilization, memory used/total, temperature, power, clock speed
- Requires: NVIDIA driver with `nvidia-smi`
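The point-in-time query is a one-shot exec bounded by a deadline. A sketch of that approach; the query fields shown are common nvidia-smi ones and may not match the exact list Bewitch requests:

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

func main() {
	// Bound the call so a wedged driver can't stall the collection cycle.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	out, err := exec.CommandContext(ctx, "nvidia-smi",
		"--query-gpu=name,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw,clocks.sm",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		fmt.Println("nvidia-smi unavailable:", err) // the collector would emit an empty sample
		return
	}
	// One CSV line per GPU, e.g. "NVIDIA GeForce RTX 3080, 12, 1024, 10240, 55, 98.5, 1710".
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Split(line, ", ")
		if len(fields) == 7 {
			fmt.Printf("%s: %s%% @ %s MHz, %s/%s MiB, %s°C, %s W\n",
				fields[0], fields[1], fields[6], fields[2], fields[3], fields[4], fields[5])
		}
	}
}
```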
- Metrics: utilization %, frequency MHz, power watts, memory used/total (NVIDIA), temperature (NVIDIA)
- Storage: `gpu_metrics` table with dimension IDs for GPU names
- Can be disabled via `enabled = false` in config
- Multi-vendor: Intel and NVIDIA backends can be active simultaneously
```toml
[collectors.gpu]
# interval = "5s"
# enabled = true  # Intel iGPU via intel_gpu_top, NVIDIA via nvidia-smi
```
Process
Two-phase collection. Phase 1 cheaply scans all /proc/[pid]/stat files. Phase 2 enriches the top N processes (by CPU/memory) plus pinned processes with expensive data.
Phase 1 (all processes)
- PID, name, state, CPU%, RSS, thread count
- Very fast — reads a single file per process
Phase 2 (enriched processes)
- Command line, UID, FD count, detailed memory breakdown
- Reads `/proc/[pid]/cmdline`, `/proc/[pid]/status`, `/proc/[pid]/fd`
- Default: top 100 processes enriched
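The two-phase shape in miniature: a cheap pass that reads only `/proc/[pid]/stat`, a ranking step, then expensive reads for the winners plus pinned names. A stripped-down sketch (CPU% accounting and most field parsing are omitted; names are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

type proc struct {
	pid  int
	name string
	rss  int64  // pages; a cheap ranking signal from phase 1
	cmd  string // filled in by phase 2 for the top N and pinned processes
}

func main() {
	const maxEnriched = 100
	pinned := map[string]bool{"postgres": true} // always enriched, regardless of rank

	// Phase 1: one read of /proc/[pid]/stat per process.
	var procs []proc
	dirs, _ := filepath.Glob("/proc/[0-9]*")
	for _, d := range dirs {
		pid, _ := strconv.Atoi(filepath.Base(d))
		raw, err := os.ReadFile(filepath.Join(d, "stat"))
		if err != nil {
			continue // process exited between the glob and the read
		}
		// The name sits in parentheses: "1234 (nginx) S ...".
		s := string(raw)
		lp, rp := strings.Index(s, "("), strings.LastIndex(s, ")")
		if lp < 0 || rp < lp {
			continue
		}
		fields := strings.Fields(s[rp+1:])
		if len(fields) < 22 {
			continue
		}
		rss, _ := strconv.ParseInt(fields[21], 10, 64) // rss is field 24 of stat
		procs = append(procs, proc{pid: pid, name: s[lp+1 : rp], rss: rss})
	}

	// Rank (here just by RSS; the real collector also considers CPU%).
	sort.Slice(procs, func(i, j int) bool { return procs[i].rss > procs[j].rss })

	// Phase 2: expensive reads only for the top N plus pinned processes.
	for i := range procs {
		if i < maxEnriched || pinned[procs[i].name] {
			cmdline, _ := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", procs[i].pid))
			procs[i].cmd = strings.ReplaceAll(string(cmdline), "\x00", " ")
		}
	}
	fmt.Printf("scanned %d processes, enriched up to %d\n", len(procs), maxEnriched)
}
```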
Process pinning
Pinned processes always receive Phase 2 enrichment regardless of ranking. Useful for monitoring low-resource but critical services.
```toml
[collectors.process]
max_processes = 100
pinned = ["nginx*", "postgres", "redis-server"]
```
Pins can also be set interactively in the TUI with the `*` key. TUI pins persist in the daemon's preferences database across restarts.
Collector Backoff
When a collector's `Collect()` returns an error, consecutive failures trigger exponential backoff: the collector skips 2^(n-1) intervals before retrying, where n is the consecutive-failure count, capped at 64x. On success, the failure count resets immediately. The first error is always logged; subsequent errors include the attempt count and backoff duration.
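The skip count is simple arithmetic on the consecutive-failure count. A compact sketch of the policy described above (the type and method names are illustrative):

```go
package main

import "fmt"

// backoff tracks consecutive failures for one collector and decides
// how many intervals to skip before the next retry.
type backoff struct {
	failures int
	skip     int // intervals left to skip before retrying
}

func (b *backoff) afterError() {
	b.failures++
	// Skip 2^(n-1) intervals, capped at 64x (reached after the 7th failure).
	n := b.failures - 1
	if n > 6 {
		n = 6
	}
	b.skip = 1 << n
}

func (b *backoff) afterSuccess() {
	// A single success resets the policy immediately.
	b.failures, b.skip = 0, 0
}

func main() {
	var b backoff
	for i := 0; i < 8; i++ {
		b.afterError()
		fmt.Printf("failure %d: skip %d interval(s)\n", b.failures, b.skip)
	}
	b.afterSuccess()
	fmt.Println("after success: failures =", b.failures)
}
```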
Parallel Collection
Collectors due on each tick run concurrently via goroutines, reducing total cycle time. Results are gathered with sync.WaitGroup. The API cache is updated immediately after collection, then samples are enqueued to a buffered channel for asynchronous database writing. If the write channel is full, the batch is dropped with a warning.
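The shape of that cycle, sketched below: one goroutine per due collector, a `sync.WaitGroup` to join them, then a non-blocking send into the write queue so a slow database writer cannot stall collection. Types and channel size are illustrative, not Bewitch's actual code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type sample struct {
	collector string
	takenAt   time.Time
}

func main() {
	collectors := []string{"cpu", "memory", "disk", "network"}
	writeQueue := make(chan []sample, 8) // buffered channel drained by the DB writer

	// Run every due collector concurrently and wait for all of them.
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []sample
	)
	for _, name := range collectors {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			// Stand-in for the real Collect() call.
			s := sample{collector: name, takenAt: time.Now()}
			mu.Lock()
			results = append(results, s)
			mu.Unlock()
		}(name)
	}
	wg.Wait()

	// The API cache would be updated here, before the samples are queued.

	// Non-blocking enqueue: if the writer has fallen behind and the
	// channel is full, drop the batch and log a warning.
	select {
	case writeQueue <- results:
	default:
		fmt.Println("warning: write queue full, dropping batch")
	}
	fmt.Printf("collected %d samples\n", len(results))
}
```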