
SLO & Health Endpoints

Operational endpoints for health checking, readiness probing, and Prometheus metric scraping.

Overview

ZERG exposes three standard operational endpoints: a deep health check (/api/v1/health), a shallow load-balancer probe (/api/v1/ready), and Prometheus-formatted metrics (/api/v1/metrics). All three are exempt from authentication so that load balancers, health checkers, and metric scrapers can call them without credentials. A fourth endpoint, /api/v1/slo, reports current SLO compliance.

Authentication

None of these endpoints requires authentication. They are exempted through the auth_exempt_paths configuration setting.

Endpoints

Health Check

GET /api/v1/health

Deep health check that validates core system components.

json
{
  "status": "ok",
  "version": "0.408.0",
  "uptime_seconds": 7200,
  "components": {
    "mnesia": { "status": "ok", "tables": 35 },
    "zmq": { "status": "ok", "sockets": 3 },
    "providers": { "total": 5, "healthy": 5 },
    "workers": { "active": 12, "idle": 4 }
  }
}

Returns HTTP 503 if any critical component is degraded.
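
A monitoring script usually needs to know which component failed, not just that the status code was 503. The following is a minimal Python sketch, not part of ZERG itself. The helper names unhealthy_components and fetch_health are hypothetical. It assumes the 503 response carries a JSON body in the same shape as the 200 response shown above, which this page does not confirm, and that a degraded component reports a "status" value other than "ok".

```python
import json
import urllib.error
import urllib.request


def unhealthy_components(payload: dict) -> list[str]:
    """Return the names of components whose "status" field is not "ok".

    Components without a "status" field (providers and workers in the
    sample response) are treated as healthy here; that is an assumption.
    """
    components = payload.get("components", {})
    return sorted(
        name
        for name, info in components.items()
        if isinstance(info, dict) and info.get("status", "ok") != "ok"
    )


def fetch_health(base_url: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET /api/v1/health and return (http_status, parsed_body).

    urllib raises HTTPError for a 503, so the error object is read like a
    response. A non-JSON error body yields an empty dict.
    """
    url = base_url.rstrip("/") + "/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        body = err.read()
        try:
            return err.code, json.loads(body)
        except ValueError:
            return err.code, {}
```

Calling fetch_health("http://127.0.0.1:11434") and passing the returned body to unhealthy_components gives a list that can go straight into an alert message.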

Readiness Probe

GET /api/v1/ready

Shallow load-balancer probe. Returns HTTP 200 when the server is accepting connections.

text
OK

Used by nginx, Docker health checks, and deployment scripts. Does not validate downstream components.
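
A deployment script typically polls the readiness probe until the server answers or a deadline passes. The sketch below shows one way to do that in Python. The function names http_ready and wait_until_ready, and the default timeout and interval, are illustrative assumptions rather than anything ZERG ships.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ready(base_url: str, timeout: float = 2.0) -> bool:
    """True if GET /api/v1/ready answers HTTP 200, False on any error."""
    url = base_url.rstrip("/") + "/api/v1/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        # Give up if the next attempt would start after the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

A deploy step could then call wait_until_ready(lambda: http_ready("http://127.0.0.1:11434")) and abort the rollout when it returns False. The sleep and clock parameters exist only so the loop can be exercised without real waiting.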

Metrics

GET /api/v1/metrics

Prometheus-formatted metrics endpoint.

text
# HELP zerg_http_requests_total Total HTTP requests
# TYPE zerg_http_requests_total counter
zerg_http_requests_total{method="GET",path="/api/v1/health",status="200"} 42

# HELP zerg_provider_latency_seconds Provider request latency
# TYPE zerg_provider_latency_seconds histogram
zerg_provider_latency_seconds_bucket{provider="anthropic",le="0.1"} 15
zerg_provider_latency_seconds_bucket{provider="anthropic",le="0.5"} 28
zerg_provider_latency_seconds_bucket{provider="anthropic",le="+Inf"} 35
zerg_provider_latency_seconds_sum{provider="anthropic"} 8.2
zerg_provider_latency_seconds_count{provider="anthropic"} 35

# HELP zerg_circuit_breaker_state Circuit breaker state
# TYPE zerg_circuit_breaker_state gauge
zerg_circuit_breaker_state{provider="anthropic"} 0
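
Prometheus histogram buckets are cumulative: each le bucket counts every observation at or below that bound, and the +Inf bucket equals the total count. The short Python sketch below works through the zerg_provider_latency_seconds sample above. The function names are illustrative, not part of any ZERG or Prometheus API.

```python
def fraction_within(buckets: dict[str, float], le: str) -> float:
    """Fraction of observations at or below the `le` bound.

    `buckets` maps each le label to its cumulative count, so the "+Inf"
    entry is the total number of observations.
    """
    total = buckets["+Inf"]
    return buckets[le] / total if total else 0.0


def mean_latency(total_seconds: float, count: float) -> float:
    """Mean latency from the histogram's _sum and _count samples."""
    return total_seconds / count if count else 0.0


# Values copied from the sample scrape for provider="anthropic".
buckets = {"0.1": 15, "0.5": 28, "+Inf": 35}
print(fraction_within(buckets, "0.5"))   # 0.8
print(round(mean_latency(8.2, 35), 3))   # 0.234
```

In the sample, 80% of provider calls finished within 0.5 seconds and the mean latency was roughly 0.234 seconds.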

Available Metrics:

| Metric | Type | Labels | Description |
|---|---|---|---|
| zerg_http_requests_total | counter | method, path, status | HTTP request count |
| zerg_provider_latency_seconds | histogram | provider | Provider call latency |
| zerg_circuit_breaker_state | gauge | provider | 0=closed, 1=half-open, 2=open |
| zerg_active_workers | gauge | (none) | Currently active ZMQ workers |
| zerg_memory_bytes | gauge | (none) | ETS + process memory |
| zerg_event_total | counter | type | EventBus event count |
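
For ad-hoc scripts that read the scrape output without a full Prometheus client, the text format can be parsed with a couple of regular expressions. The Python sketch below is a deliberately small parser, not a complete implementation of the exposition format. It decodes zerg_circuit_breaker_state using the 0/1/2 mapping from the table; parse_samples and breaker_states are hypothetical helper names.

```python
import re

# One sample line: name, optional {labels}, value, optional integer timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

BREAKER_STATES = {0: "closed", 1: "half-open", 2: "open"}


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition text into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and lines that do not match are skipped.
    Escape sequences inside label values are left as written.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def breaker_states(text: str) -> dict[str, str]:
    """Map each provider to its circuit breaker state name."""
    return {
        labels.get("provider", ""): BREAKER_STATES.get(value, "unknown")
        for name, labels, value in parse_samples(text)
        if name == "zerg_circuit_breaker_state"
    }
```

Applied to the sample scrape above, breaker_states would report the anthropic provider as closed. The float value is looked up directly in the integer-keyed dict, which works because 0.0 and 0 compare and hash equal in Python.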

SLO Endpoint:

GET /api/v1/slo

Returns current SLO compliance status:

json
{
  "latency_p99": { "target": 5000, "current": 3200, "compliant": true },
  "uptime_7d": { "target": 99.9, "current": 99.95, "compliant": true },
  "error_rate": { "target": 1.0, "current": 0.3, "compliant": true }
}
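
A CI gate or alerting hook might want to fail when any objective is out of compliance. The Python sketch below assumes the response always has the shape shown above, with one object per SLO and a boolean "compliant" field. The function name slo_violations is hypothetical.

```python
def slo_violations(payload: dict) -> list[str]:
    """Return the names of SLOs whose "compliant" flag is not true.

    An entry with a missing "compliant" field is reported as a violation,
    on the assumption that an incomplete response should not pass silently.
    """
    return sorted(
        name
        for name, slo in payload.items()
        if isinstance(slo, dict) and not slo.get("compliant", False)
    )


# The sample response from this page: every objective is compliant.
sample = {
    "latency_p99": {"target": 5000, "current": 3200, "compliant": True},
    "uptime_7d": {"target": 99.9, "current": 99.95, "compliant": True},
    "error_rate": {"target": 1.0, "current": 0.3, "compliant": True},
}
print(slo_violations(sample))  # []
```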

Examples:

bash
curl http://127.0.0.1:11434/api/v1/health
curl http://127.0.0.1:11434/api/v1/ready
curl http://127.0.0.1:11434/api/v1/metrics

Released under the MIT License.