Devii · Cloud · 2026-04-08 · 8 min read


SRE Error Budgets: SLIs, SLOs, And When To Slow Releases

Google's SRE book defines reliability targets in measurable terms; error budgets connect uptime to product risk.

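Site Reliability Engineering defines three terms. An **SLI** (service level indicator) is a measured quantity, such as the fraction of requests served successfully. An **SLO** (service level objective) is a target value for an SLI over a time window. An **SLA** (service level agreement) is a contract, often with external customers, that carries consequences when it is missed. An **error budget** is the unreliability the SLO allows: if your SLO is 99.9% availability per month, the budget is the remaining 0.1%, counted as downtime or as failed requests.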
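To make the arithmetic concrete, here is a minimal Python sketch that converts an SLO into a budget of downtime or of failed requests. The function names `budget_fraction`, `downtime_budget`, and `request_budget` are illustrative, not part of any SRE tool. The SLO is passed as a string such as `"0.999"` so that `fractions.Fraction` keeps it exact and floating-point rounding cannot shave a request off the budget.

```python
import math
from datetime import timedelta
from fractions import Fraction


def budget_fraction(slo: str) -> Fraction:
    """Allowed unreliability for an SLO given as a decimal string, e.g. "0.999" -> 1/1000."""
    target = Fraction(slo)  # exact: Fraction("0.999") == Fraction(999, 1000)
    if not 0 < target < 1:
        raise ValueError("SLO must be strictly between 0 and 1")
    return 1 - target


def downtime_budget(slo: str, window: timedelta) -> timedelta:
    """Downtime allowed in the window for a time-based availability SLO."""
    window_us = window // timedelta(microseconds=1)  # whole microseconds as an int
    # Round down to the microsecond so the budget is never overstated.
    return timedelta(microseconds=math.floor(window_us * budget_fraction(slo)))


def request_budget(slo: str, total_requests: int) -> int:
    """Failed requests allowed for a request-based SLO, rounded down."""
    return math.floor(total_requests * budget_fraction(slo))


if __name__ == "__main__":
    print(downtime_budget("0.999", timedelta(days=30)))  # 0:43:12
    print(request_budget("0.999", 1_000_000))            # 1000
```

For a 30-day window at 99.9%, this gives 43 minutes 12 seconds of downtime, or 1,000 failed requests out of a million.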

When the budget is exhausted, teams prioritize reliability work over feature launches. This is a product decision encoded in metrics, not a rule enforced only through paging.
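A release gate built on that rule can be sketched as follows, assuming you already count good and valid events over the SLO window. `WindowCounts`, `budget_remaining`, and `release_decision` are hypothetical names, and the binary ship/freeze rule is a simplification; a real error budget policy is agreed between product and engineering and may add intermediate steps.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class WindowCounts:
    good: int   # events that met the SLI definition during the SLO window
    valid: int  # all events eligible for the SLI in the same window


def budget_remaining(slo: str, counts: WindowCounts) -> Fraction:
    """Share of the error budget left: 1 = untouched, 0 or below = exhausted."""
    target = Fraction(slo)  # pass the SLO as a string, e.g. "0.999", to keep it exact
    if not 0 < target < 1:
        raise ValueError("SLO must be strictly between 0 and 1")
    if counts.valid <= 0 or not 0 <= counts.good <= counts.valid:
        raise ValueError("need valid > 0 and 0 <= good <= valid")
    allowed_bad = (1 - target) * counts.valid
    actual_bad = counts.valid - counts.good
    return 1 - actual_bad / allowed_bad


def release_decision(slo: str, counts: WindowCounts) -> str:
    """Hypothetical binary policy: ship while budget remains, freeze once it is spent."""
    if budget_remaining(slo, counts) <= 0:
        return "freeze"  # reliability work takes priority over feature launches
    return "ship"


if __name__ == "__main__":
    counts = WindowCounts(good=999_500, valid=1_000_000)
    print(budget_remaining("0.999", counts))  # 1/2
    print(release_decision("0.999", counts))  # ship
```

With 1,000,000 valid requests against a 99.9% SLO, 500 failures leave half the budget; 1,200 failures overspend it, and `release_decision` returns `"freeze"`.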


Choose SLIs that reflect user pain: the proportion of HTTP requests that succeed, of queries that return correct results, or of jobs that complete within their deadline. Avoid vanity metrics that do not track what users actually experience.
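A common way to express such an SLI is as the ratio of good events to valid events. The sketch below assumes a hypothetical `Request` record with a status code and a latency. It treats every recorded request as valid, and counts a request as good if it did not return a 5xx and finished within a hypothetical 300 ms threshold. Which status codes count as failures, and where the latency line sits, are choices to make for your own service.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code returned to the user
    latency_ms: float  # time to serve the request, in milliseconds


def request_sli(requests: Iterable[Request], latency_threshold_ms: float = 300.0) -> float:
    """Good events / valid events over one window.

    Assumptions: every recorded request is valid; a request is good if it did not
    return a 5xx status and finished within latency_threshold_ms.
    """
    valid = 0
    good = 0
    for r in requests:
        valid += 1
        if r.status < 500 and r.latency_ms <= latency_threshold_ms:
            good += 1
    if valid == 0:
        raise ValueError("no valid events in window; SLI is undefined")
    return good / valid


if __name__ == "__main__":
    window = [
        Request(200, 120.0),  # good
        Request(404, 40.0),   # good: client error, not a server failure under this assumption
        Request(503, 10.0),   # bad: server error
        Request(200, 900.0),  # bad: too slow
    ]
    print(request_sli(window))  # 0.5
```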

Primary references: the Google SRE books at `sre.google`, in particular the "Implementing SLOs" chapter of The Site Reliability Workbook. Adapt targets to your scale and instrumentation; a three-nines target you cannot measure is useless.
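One concrete way a target becomes unmeasurable is low traffic. The sketch below, an illustration rather than guidance taken from the SRE books, computes how many events a window needs before the budget tolerates even one failure, and how much of the budget a single failure consumes at a given volume. Both function names are hypothetical.

```python
import math
from fractions import Fraction


def min_events_for_one_failure(slo: str) -> int:
    """Smallest event count in a window at which the budget allows one failure."""
    budget = 1 - Fraction(slo)  # e.g. "0.999" -> 1/1000
    if not 0 < budget < 1:
        raise ValueError("SLO must be strictly between 0 and 1")
    return math.ceil(1 / budget)


def one_failure_budget_share(slo: str, events: int) -> Fraction:
    """How much of the window's budget one failed event consumes (1 = all of it)."""
    budget = 1 - Fraction(slo)
    if not 0 < budget < 1 or events <= 0:
        raise ValueError("need 0 < SLO < 1 and events > 0")
    return 1 / (budget * events)


if __name__ == "__main__":
    print(min_events_for_one_failure("0.999"))     # 1000
    print(one_failure_budget_share("0.999", 200))  # 5
```

At 99.9%, a service handling 200 requests a month spends five times its budget on a single failed request, so the SLO says little until volume reaches about 1,000 events per window.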