What are DORA metrics? Definition + example

DORA metrics are four research-backed measures of software delivery performance: deployment frequency, lead time for changes, change failure rate, and mean time to restore service. Together they describe how fast a team ships and how reliable those changes are, making delivery health concrete instead of based on opinion.

How are DORA metrics measured?

Each of the four keys is a question with a definition the team agrees on up front, so the numbers can be compared month over month:

Deployment frequency — how often code reaches production. Count successful production deploys per day, per week or per month.
Lead time for changes — how long it takes a commit to ship. Measure wall-clock time from the first commit on a change to that change running in production.
Change failure rate — what percentage of deploys cause a degraded service: an incident, a rollback, or a hotfix landing within a defined window after the deploy.
Mean time to restore service (MTTR) — when a deploy does break production, how long it takes to get back to a healthy state, measured from incident open to incident resolved.
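
The four definitions above reduce to a few lines of arithmetic once the events are in one place. The following is a minimal sketch, not a reference implementation: the Deploy record shape, the field names and the sample timestamps are all assumptions made for illustration, and it uses the mean for both lead time and restore time, although some teams prefer the median.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class Deploy:
    """One successful production deploy (hypothetical record shape)."""
    first_commit_at: datetime                        # first commit on the change
    deployed_at: datetime                            # change running in production
    incident_opened_at: Optional[datetime] = None    # set if the deploy degraded service
    incident_resolved_at: Optional[datetime] = None  # set once service was healthy again


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def deployment_frequency(deploys: List[Deploy], days: int) -> float:
    """Successful production deploys per day over the reporting window."""
    return len(deploys) / days


def lead_time_hours(deploys: List[Deploy]) -> float:
    """Mean wall-clock hours from first commit to running in production."""
    return mean([hours(d.first_commit_at, d.deployed_at) for d in deploys])


def change_failure_rate(deploys: List[Deploy]) -> float:
    """Share of deploys that were linked to an incident."""
    failed = [d for d in deploys if d.incident_opened_at is not None]
    return len(failed) / len(deploys)


def mttr_hours(deploys: List[Deploy]) -> float:
    """Mean hours from incident open to incident resolved.

    Raises statistics.StatisticsError if no incident has been resolved yet.
    """
    restored = [d for d in deploys if d.incident_opened_at and d.incident_resolved_at]
    return mean([hours(d.incident_opened_at, d.incident_resolved_at) for d in restored])


# Made-up sample: four deploys in a seven-day window, two of them linked to incidents.
deploys = [
    Deploy(datetime(2025, 3, 3, 9), datetime(2025, 3, 3, 15)),
    Deploy(datetime(2025, 3, 4, 10), datetime(2025, 3, 4, 12),
           datetime(2025, 3, 4, 12, 30), datetime(2025, 3, 4, 13, 30)),
    Deploy(datetime(2025, 3, 5, 9), datetime(2025, 3, 6, 9)),
    Deploy(datetime(2025, 3, 6, 8), datetime(2025, 3, 6, 12),
           datetime(2025, 3, 6, 12, 10), datetime(2025, 3, 6, 15, 10)),
]

print(round(deployment_frequency(deploys, days=7), 2))  # deploys per day
print(lead_time_hours(deploys))      # 9.0
print(change_failure_rate(deploys))  # 0.5
print(mttr_hours(deploys))           # 2.0
```

Keeping each key in its own small function makes it easy to swap in the definition your team has agreed on, for example a median instead of a mean, without touching the other three.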

The raw events almost always live in two systems you already operate. Your CI/CD platform owns deploy timestamps, build IDs and commit SHAs (with commit timestamps available from the repository it builds from); your incident tracker owns the open/close timestamps and a link from each incident back to the offending deploy. Joining those two streams on a commit, build or release ID is enough to compute all four keys, with no specialised application instrumentation and no extra agent on the host.
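
As a rough illustration of that join, the sketch below attaches incidents to deploys by commit SHA. The dictionary keys (commit, build_id, deployed_at, id, opened_at, resolved_at) and the sample values are hypothetical; a real export from your CI/CD platform or incident tracker will use its own field names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def join_on_commit(deploys: List[dict], incidents: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Attach to each deploy the incidents that name its commit.

    Returns (joined deploys, incidents that match no known deploy).
    """
    by_commit: Dict[str, List[dict]] = defaultdict(list)
    for incident in incidents:
        by_commit[incident["commit"]].append(incident)

    joined = [{**d, "incidents": by_commit.get(d["commit"], [])} for d in deploys]

    deployed = {d["commit"] for d in deploys}
    orphans = [i for i in incidents if i["commit"] not in deployed]
    return joined, orphans


# Made-up sample data standing in for the two exports.
deploys = [
    {"commit": "a1b2c3d", "build_id": 101, "deployed_at": "2025-03-03T15:00:00+00:00"},
    {"commit": "e4f5a6b", "build_id": 102, "deployed_at": "2025-03-04T12:00:00+00:00"},
]
incidents = [
    {"id": "INC-1", "commit": "e4f5a6b",
     "opened_at": "2025-03-04T12:30:00+00:00", "resolved_at": "2025-03-04T13:30:00+00:00"},
    {"id": "INC-2", "commit": "9999999",
     "opened_at": "2025-03-05T08:00:00+00:00", "resolved_at": None},
]

joined, orphans = join_on_commit(deploys, incidents)
for d in joined:
    print(d["commit"], [i["id"] for i in d["incidents"]])
print("unmatched incidents:", [i["id"] for i in orphans])
```

Returning the unmatched incidents separately is a deliberate choice here: an incident that points at no known deploy usually means a missing link in the tracker, and it is worth surfacing rather than silently dropping.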

A subtlety worth pinning down: define "production deploy" before you start counting. A pipeline that runs migrations and then publishes a versioned artifact and one that flips a feature flag in production can both count, but only if your team agrees they count every time. Inconsistent definitions are a common reason two teams' DORA dashboards disagree.
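
One way to keep that definition from drifting is to write it down as a single predicate that every report calls. The sketch below is only an example of the idea: the event fields (kind, environment, outcome) and the set of accepted kinds are assumptions, to be replaced with whatever your team actually agrees on.

```python
# Hypothetical, team-agreed list of event kinds that count as a production deploy.
PRODUCTION_DEPLOY_KINDS = frozenset({"artifact_release", "feature_flag_flip"})


def counts_as_production_deploy(event: dict) -> bool:
    """Return True only for successful production events of an agreed kind."""
    return (
        event.get("environment") == "production"
        and event.get("kind") in PRODUCTION_DEPLOY_KINDS
        and event.get("outcome") == "success"
    )


# Made-up events covering the cases a team typically argues about.
events = [
    {"kind": "artifact_release", "environment": "production", "outcome": "success"},
    {"kind": "feature_flag_flip", "environment": "production", "outcome": "success"},
    {"kind": "artifact_release", "environment": "staging", "outcome": "success"},
    {"kind": "artifact_release", "environment": "production", "outcome": "failed"},
    {"kind": "config_reload", "environment": "production", "outcome": "success"},
]

counted = [e for e in events if counts_as_production_deploy(e)]
print(len(counted))  # 2
```

Changing the definition then means changing one constant in one place, and the change shows up in version control where both teams can see it.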

Why do DORA metrics matter?

Before the DORA research, "we ship faster" and "we're more stable" were rival claims with no shared definition. The State of DevOps studies produced a counter-intuitive finding that has held up across years of data: teams that deploy more often are also more reliable. Speed and stability move together — they do not trade off. That gives the four keys two practical uses:

A shared scoreboard. Engineering, product and leadership can talk about delivery health with the same four numbers instead of gut feel. The metrics are simple enough to fit in a status email.
A pointer to bottlenecks. A long lead time usually points at slow branching, review or test stages. A high change failure rate points at flaky tests, thin pre-prod environments or rushed releases. A long MTTR suggests observability and rollback are weak. Whichever metric you are worst at is usually where the next investment pays back.

A few honest cautions, though. The four keys are deliberately narrow: they say nothing about code quality, security posture, accessibility, or developer wellbeing. Don't tie them to individual performance reviews, don't treat the elite bands as a target to clear at any cost, and pair them with qualitative signals (SPACE, DevEx surveys, retros) so you are measuring the system, not the people inside it. A team that hits "elite" on paper while burning out is not a high-performing team.

How do popular CI/CD tools handle DORA metrics?

Most teams already have the underlying events; the question is which tool joins, stores and visualises them.

Specialised DORA platforms — Sleuth, LinearB, Jellyfish and Faros AI plug into your VCS, CI/CD and incident tracker, then produce polished, org-wide DORA dashboards with team-level drilldowns and trend bands. If you need a board-ready DORA dashboard across many teams, a specialist like Sleuth or LinearB is the better fit — that focus is their entire product, and they will give you finer-grained analysis (e.g. lead-time histograms by service) than a general CI tool ever will.
GitHub Actions + GitHub — the deployments API plus open-source DORA Actions can compute the four keys from workflow runs and the Issues data already in GitHub. A strong, low-friction choice when your world is mostly GitHub-native and the team is small enough to read a single repo's chart.
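GitLab CI — built-in DORA metrics and Value Stream Analytics surface deployment frequency, lead time, change failure rate and time to restore directly, with no extra glue. The DORA views sit on GitLab's higher paid tiers (Ultimate at the time of writing, so check what your plan includes). For teams already paying for that tier, this is the cheapest path to a credible dashboard.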
Jenkins — there is no native DORA view, but the long-running build history is a perfectly good event source for a homegrown pipeline, especially in regulated environments where Jenkins already owns the audit trail and security review.
Argo CD / Argo Rollouts — emits rich deployment events into Kubernetes; combined with Prometheus and a community Grafana DORA dashboard, it gives Kubernetes-heavy shops accurate, real-time numbers without leaving the cluster.
Buddy is one solid option when you want to generate the underlying events cleanly rather than buy a separate analytics product. Every pipeline run carries a timestamped outcome, a commit SHA and a target environment, and an HTTP-request action can forward that payload to whatever metrics endpoint or DORA dashboard you have chosen — on success and on failure — without bolting an extra agent into your services. The visualisation still happens elsewhere; Buddy's role is to be a reliable, structured event source for it.

There is no single "right" tool on this list. Pick whichever already owns your pipeline data, and only add a dedicated dashboard layer once the four numbers stop fitting in a shared spreadsheet.

Example

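A small Buddy pipeline that ships every push to main and forwards a structured deployment event to a DORA collector, separately on success and on failure. That hook alone covers deployment frequency, and pairing each deployed commit with its commit timestamp from version control covers lead time. Change failure rate and MTTR come from joining the commit field to your incident tracker; a failed pipeline run is a useful signal, but it is not necessarily a change failure in the DORA sense, which is about deploys that degrade production.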

# .buddy/buddy.yml - record a DORA event on every production deploy
- pipeline: "deploy-production"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  variables:
    - key: "SERVICE"
      value: "checkout-api"
    - key: "DORA_ENDPOINT"
      value: "https://metrics.example.com/dora"
  actions:
    - action: "Build & deploy"
      type: "BUILD"
      docker_image_name: "node"
      docker_image_tag: "20"
      execute_commands:
        - "npm ci"
        - "npm run build"
        - "./scripts/deploy.sh"

    - action: "Record deployment (success)"
      type: "HTTP_REQUEST"
      method: "POST"
      url: "${DORA_ENDPOINT}/deployments"
      content_type: "application/json"
      body: |
        {
          "service":  "${SERVICE}",
          "commit":   "${BUDDY_EXECUTION_REVISION}",
          "started":  "${BUDDY_EXECUTION_START_DATE}",
          "finished": "${BUDDY_EXECUTION_FINISH_DATE}",
          "outcome":  "success"
        }
      trigger_conditions:
        - condition: "ON_PREVIOUS_ACTION_SUCCESS"

    - action: "Record deployment (failure)"
      type: "HTTP_REQUEST"
      method: "POST"
      url: "${DORA_ENDPOINT}/failures"
      content_type: "application/json"
      body: |
        {
          "service": "${SERVICE}",
          "commit":  "${BUDDY_EXECUTION_REVISION}",
          "ts":      "${BUDDY_EXECUTION_START_DATE}",
          "reason":  "pipeline_failed"
        }
      trigger_conditions:
        - condition: "ON_PREVIOUS_ACTION_FAILURE"

From there your incident tracker closes the loop. When a responder marks an incident as "caused by deploy <commit>" and later resolves it, you have everything needed for change failure rate and MTTR — without instrumenting application code. The four keys become a side-effect of how the pipeline already runs, not a separate project no one has time to build.
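
On the collector side, closing that loop can be quite small. The sketch below assumes the deployment events look like the success payload posted by the pipeline above, that incidents carry a free-text note containing "caused by deploy <commit>", and that timestamps are ISO 8601 strings; the incident field names (note, opened, resolved) and the sample data are hypothetical. It also assumes the tracker and the pipeline record the SHA in the same form, short or full.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches the responder's note, e.g. "caused by deploy e4f5a6b".
CAUSED_BY = re.compile(r"caused by deploy ([0-9a-f]{7,40})", re.IGNORECASE)


def failing_commit(note: str) -> Optional[str]:
    """Extract the commit SHA a responder blamed, if any."""
    match = CAUSED_BY.search(note)
    return match.group(1).lower() if match else None


def failure_stats(deploy_events: List[dict], incidents: List[dict]) -> Tuple[float, Optional[float]]:
    """Return (change failure rate, mean hours to restore) for the given events."""
    deployed = {e["commit"].lower() for e in deploy_events if e.get("outcome") == "success"}
    failed = set()
    restore_hours = []
    for incident in incidents:
        commit = failing_commit(incident["note"])
        if commit not in deployed:
            continue  # not attributed to a known production deploy
        failed.add(commit)
        if incident.get("resolved"):
            opened = datetime.fromisoformat(incident["opened"])
            resolved = datetime.fromisoformat(incident["resolved"])
            restore_hours.append((resolved - opened).total_seconds() / 3600)
    cfr = len(failed) / len(deployed) if deployed else 0.0
    mttr = sum(restore_hours) / len(restore_hours) if restore_hours else None
    return cfr, mttr


# Made-up sample data.
deploy_events = [
    {"service": "checkout-api", "commit": "a1b2c3d", "outcome": "success"},
    {"service": "checkout-api", "commit": "e4f5a6b", "outcome": "success"},
    {"service": "checkout-api", "commit": "0c9d8e7", "outcome": "success"},
    {"service": "checkout-api", "commit": "7d7d7d7", "outcome": "success"},
]
incidents = [
    {"note": "Checkout errors, caused by deploy e4f5a6b",
     "opened": "2025-03-04T12:30:00+00:00", "resolved": "2025-03-04T14:00:00+00:00"},
    {"note": "Disk full on build agent",
     "opened": "2025-03-05T08:00:00+00:00", "resolved": "2025-03-05T09:00:00+00:00"},
]

cfr, mttr = failure_stats(deploy_events, incidents)
print(cfr)   # 0.25
print(mttr)  # 1.5
```

Incidents that name no deploy are skipped rather than counted, which keeps infrastructure problems out of the change failure rate; whether that is the right call is one more definition for the team to agree on.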

Frequently asked questions

Who came up with DORA metrics?

The DevOps Research and Assessment (DORA) group, led by Dr. Nicole Forsgren, Jez Humble and Gene Kim. They published the underlying research in the annual State of DevOps reports and the 2018 book "Accelerate", which remains the standard reference for the four keys.

What counts as "elite" performance?

Recent State of DevOps reports put elite teams at on-demand deployment (multiple production deploys per day), lead time for changes under a day (some earlier editions set the bar at under an hour), a change failure rate around 5%, and time to restore service under an hour. The exact thresholds shift from year to year, so treat the bands as direction, not a leaderboard: the trend matters more than the absolute number.

Are DORA metrics still the right yardstick in 2026?

They are still the most widely cited delivery benchmark, but most mature teams now pair them with the SPACE framework or developer-experience surveys so they also capture quality, collaboration and human factors that the four keys deliberately ignore.

Can DORA metrics be gamed?

Yes. Shipping empty or artificially split deploys to inflate deployment frequency, or quietly reclassifying incidents to flatter change failure rate, both produce prettier numbers without better delivery. Treat the four keys as a system-level trend signal, never as a target tied to an individual's performance review.
