Monitoring
Operators need to know that every layer is alive and within its latency budget. Each layer has three probes, all exposed via HTTP /health endpoints and Prometheus.
Health probes
| Layer | Endpoint | What it checks |
|---|---|---|
| L1 | api.useoris.xyz/v1/health/l1 | CCIP-Read gateway reachable, latest block within 5 minutes |
| L2 | api.useoris.xyz/v1/health/l2 | Policy engine cache hit rate, p95 latency under 10 ms |
| L3 | api.useoris.xyz/v1/health/l3 | Veris gRPC ping, sanctions feed freshness |
| L5 | api.useoris.xyz/v1/health/l5 | Tree builder flush cadence, root commit lag |
| L6 | api.useoris.xyz/v1/verify/health | Verifier signature throughput, pubkey availability |
| L7 | api.useoris.xyz/v1/audit/health | Hash chain head age, anchor lag |
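
The endpoints in the table can be polled with a small script. The following is a minimal sketch, not the project's own tooling: it assumes the endpoints are served over `https://` and that a healthy probe answers with HTTP 200, neither of which is stated above. The names `HEALTH_ENDPOINTS`, `http_status`, and `poll_layers` are hypothetical.

```python
from typing import Callable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Host and path for each layer, copied from the health probe table.
HEALTH_ENDPOINTS = {
    "L1": "api.useoris.xyz/v1/health/l1",
    "L2": "api.useoris.xyz/v1/health/l2",
    "L3": "api.useoris.xyz/v1/health/l3",
    "L5": "api.useoris.xyz/v1/health/l5",
    "L6": "api.useoris.xyz/v1/verify/health",
    "L7": "api.useoris.xyz/v1/audit/health",
}


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def poll_layers(fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each layer to True when its probe returns 200 (assumed healthy)."""
    # Assumption: the endpoints are reachable over https://.
    return {
        layer: fetch(f"https://{host_path}") == 200
        for layer, host_path in HEALTH_ENDPOINTS.items()
    }
```

Calling `poll_layers()` with no arguments performs real requests. Passing a stub for `fetch` keeps the function testable without network access.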
SLA targets
| Metric | Target |
|---|---|
| Bundle assembly p95 | < 50 ms |
| Policy evaluation p95 | < 10 ms |
| Veris attestation p50 | 4.4 ms |
| Verifier verdict p95 | < 18 ms |
| Audit anchor lag | < 1 hour |
| Sanctions feed freshness | < 6 hours |
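
One way to evaluate measurements against these targets is sketched below. It is illustrative only: the metric keys, `percentile`, and `sla_breaches` are hypothetical names, and the nearest-rank percentile method is an assumption about how p95 is computed. The Veris attestation row is left out because the table gives a value (4.4 ms) rather than an upper bound.

```python
# Upper bounds from the SLA table, all converted to milliseconds.
# Veris attestation p50 is omitted: the table lists 4.4 ms without a "<".
SLA_TARGETS_MS = {
    "bundle_assembly_p95": 50.0,
    "policy_evaluation_p95": 10.0,
    "verifier_verdict_p95": 18.0,
    "audit_anchor_lag": 1 * 3_600_000.0,          # < 1 hour
    "sanctions_feed_freshness": 6 * 3_600_000.0,  # < 6 hours
}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def sla_breaches(measured_ms: dict[str, float]) -> list[str]:
    """Return the metrics whose measured value is not strictly below target."""
    return [
        name
        for name, value in measured_ms.items()
        if name in SLA_TARGETS_MS and value >= SLA_TARGETS_MS[name]
    ]
```

For example, `sla_breaches({"policy_evaluation_p95": percentile(latencies_ms, 95)})` returns an empty list when the p95 of the hypothetical `latencies_ms` samples is under 10 ms.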
Alert routing
Three severity levels:
- CRIT: any layer down or SLA breach above threshold. Pages on-call via PagerDuty.
- WARN: degraded performance, single source outage, or partial cache miss spike. Posted to Slack.
- INFO: anchor lag, scheduled rotation, planned downtime. Logged only.
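
The severity-to-destination mapping above can be expressed as a lookup table. This is a sketch under stated assumptions: the `Severity` enum, `classify`, `route`, and the destination strings are hypothetical, and the boolean inputs simplify the conditions listed for each level.

```python
from enum import Enum


class Severity(Enum):
    CRIT = "CRIT"
    WARN = "WARN"
    INFO = "INFO"


# Destination per severity, following the list above.
# The string identifiers are placeholders, not real integration names.
ROUTES = {
    Severity.CRIT: "pagerduty",
    Severity.WARN: "slack",
    Severity.INFO: "log",
}


def classify(layer_down: bool, sla_breached: bool, degraded: bool) -> Severity:
    """Pick the highest severity that applies to the observed conditions."""
    if layer_down or sla_breached:
        return Severity.CRIT
    if degraded:
        return Severity.WARN
    return Severity.INFO


def route(severity: Severity) -> str:
    """Return the destination for an alert of the given severity."""
    return ROUTES[severity]
```

Checking the CRIT conditions first means a layer that is both degraded and breaching its SLA pages on-call rather than only posting to Slack.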
Where to go next
- Incident response for the runbook when probes fail.
- Deployment for the layer-by-layer architecture.
- Key rotation for scheduled rotation procedures.