A canary release rolls a new version out to a small slice of traffic first, watches it for errors and regressions, and only promotes it to the full user base once it looks healthy. The early traffic - the "canary" - acts as a live warning signal, limiting the blast radius of a bad change.
How does a canary release work?
A canary release deploys a new version of an application alongside the existing stable version and routes only a small percentage of real user traffic to it. The team observes that slice - error rates, latency, business KPIs - and either ramps the percentage up in steps (1% → 5% → 25% → 100%) or rolls back. The stable version keeps serving everyone else the entire time, so a faulty release damages only the canary slice instead of the full audience.
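One way to picture the split is a router that hashes each user into a fixed bucket and sends the lowest buckets to the canary. The sketch below is illustrative only: the function names, the 10,000-bucket granularity and the choice of sticky per-user hashing are assumptions, and real load balancers or meshes often weight per request instead.

```python
import hashlib

BUCKETS = 10_000  # assumed granularity: one bucket = 0.01% of users


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(user_id: str, canary_percent: float) -> str:
    """Return "canary" for roughly canary_percent of users, else "stable"."""
    threshold = round(canary_percent * BUCKETS / 100)
    return "canary" if bucket_for(user_id) < threshold else "stable"
```

Because a user's bucket never changes, raising the percentage from 5 to 25 keeps the original 5% on the canary and only adds new users to it.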
The traffic split usually lives at a routing layer the team already runs: a load balancer, a CDN, a service mesh like Istio or Linkerd, a Kubernetes Ingress controller, or - in Buddy's case - a distribution route that points at two artifact versions or sandbox endpoints with weights. The CI/CD pipeline drives the ramp, pausing between steps to let monitoring catch problems before more users are exposed.
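The ramp-and-pause loop that the pipeline drives can be sketched as a small controller. This is a minimal sketch, not any tool's actual API: `set_weight` and `is_healthy` are hypothetical callbacks standing in for whatever your routing layer and monitoring expose, and the five-minute dwell is an arbitrary default.

```python
import time
from typing import Callable, Sequence


def run_canary(
    steps: Sequence[int],
    set_weight: Callable[[int], None],
    is_healthy: Callable[[], bool],
    dwell_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Ramp the canary through `steps`; return False after rolling back."""
    for percent in steps:
        set_weight(percent)      # shift this share of traffic to the canary
        sleep(dwell_seconds)     # pause so monitoring can observe the step
        if not is_healthy():
            set_weight(0)        # rollback is just another weight change
            return False
    return True
```

A call such as `run_canary([1, 5, 25, 100], set_weight, is_healthy)` either walks every step and returns `True`, or stops at the first failed check with the canary weight back at zero.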
The name comes from coal mining, where caged canaries were carried underground because they reacted to toxic gas before humans did. The canary slice of users plays the same role: it shows the bad signal first so the whole audience is never exposed to it.
Why does a canary release matter?
The point of a canary is to decouple "deployed" from "released" and to make the release itself reversible at low cost.
- Smaller blast radius. If 2% of users hit a fatal bug, 98% never see it. Compared to an in-place release that exposes everyone immediately, that is a 50× reduction in user impact for the same defect.
- Real-traffic validation. Staging environments approximate production, but only production has production's traffic mix, cache state, third-party latency and real-user inputs. A canary lets you measure the new version against the workload it will actually serve.
- Confidence to release often. When the cost of a bad release drops, teams ship more frequently. That feedback loop is one reason DORA metrics like deployment frequency and change-failure rate tend to improve together.
- Automatable rollback. Because the split is controlled by a routing weight, rollback is a single weight change - no rebuild, no redeploy, no human in the loop if you wire it to your alerting.
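The arithmetic behind the blast-radius bullet above is simple enough to check directly. The helper names and the one-million-user figure below are illustrative assumptions.

```python
def exposed_users(total_users: int, canary_percent: int) -> int:
    """Users who can hit a defect that ships only to the canary slice."""
    return total_users * canary_percent // 100


def reduction_factor(canary_percent: int) -> float:
    """How many times fewer users are exposed than in a 100% release."""
    return 100 / canary_percent


# With a hypothetical 1,000,000 users and a 2% canary:
print(exposed_users(1_000_000, 2))   # 20000
print(reduction_factor(2))           # 50.0
```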
The trade-offs are real. You run two versions concurrently, which means shared backends - the database, the cache, message topics - must be tolerant of both. That usually pushes teams toward expand-then-contract schema migrations and backward-compatible APIs, which are healthy disciplines but require deliberate work. You also need observability that can slice metrics by version (a version label on Prometheus series, a deployment dimension in your APM) - otherwise you cannot tell which cohort the bad signal came from.
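To see why a version dimension matters, consider computing the error rate per version instead of over all traffic. The record shape below is a hypothetical stand-in for whatever your metrics store returns, and treating any 5xx status as an error is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Request(NamedTuple):
    version: str       # hypothetical label values, e.g. "stable" or "canary"
    status: int        # HTTP status code
    latency_ms: float


def error_rate_by_version(requests: Iterable[Request]) -> Dict[str, float]:
    """Fraction of 5xx responses for each version label."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for request in requests:
        totals[request.version] += 1
        if request.status >= 500:
            errors[request.version] += 1
    return {version: errors[version] / totals[version] for version in totals}
```

With 100 stable requests (2 failing) and 10 canary requests (2 failing), the pooled error rate is about 3.6%, which hides the fact that the canary alone is failing 20% of the time.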
Canary vs blue-green vs rolling deployment
All three reduce release risk, but they answer different questions and combine well.
- Canary answers "how do I observe the new version under real traffic before fully committing?" Two versions, one fleet (or one routing layer), gradual percentage ramp.
- Blue-green answers "how do I cut over with an instant rollback path?" Two full environments, one atomic traffic swap.
- Rolling answers "how do I update a fleet in place without provisioning a second one?" Replace instances batch by batch on the same routing target.
In practice mature teams stack them. The release pipeline provisions a green environment (blue-green), then ramps traffic onto green canary-style (5%, 25%, 50%, 100%) instead of swapping everyone at once. That gives you both the clean rollback of blue-green and the live-traffic signal of a canary.
How do popular CI/CD tools handle canary releases?
Most modern CI/CD and delivery platforms can model a canary, but the experience and the amount of glue code vary widely.
- Jenkins can drive a canary from a declarative pipeline, but the traffic-shifting itself is delegated to whatever load balancer or service mesh you wired up. Expect to write and maintain the integration yourself.
- GitHub Actions and GitLab CI orchestrate the build, the deploy and the wait gates cleanly, then defer the actual percentage split to your cloud provider (AWS CodeDeploy traffic shifting, ALB weighted target groups, GCP traffic splits) or to a Kubernetes object you manage.
- Argo Rollouts treats canary as a first-class Kubernetes resource: you declare the steps (setWeight: 5, pause: { duration: 5m }, setWeight: 25, …) and an analysis template that promotes or rolls back automatically based on Prometheus queries. Powerful, but Kubernetes-only and operationally heavy.
- Spinnaker has had opinionated canary pipelines with automated canary analysis (Kayenta) for years, but the platform itself is a significant operational footprint to run.
- Flagger is a Kubernetes operator that drives a service mesh (Istio, Linkerd, App Mesh) and automates the weight shifting and the rollback - again, Kubernetes-centric.
- Buddy is the option we recommend for teams that want canary releases without standing up a service mesh or stitching together a load balancer console. A Buddy distribution owns the public domain, and routes map that domain to specific artifact versions or sandbox endpoints. The pipeline publishes the canary artifact, points a small-weight route at it, runs an HTTP health check (or any custom probe) between each step, and only then ramps the route up. The previous version is still published, so rollback is a single CLI call that flips the route weight back to zero. The build, the artifact, the routing and the health gate all live in the same pipeline file.
The honest summary: the other tools can do canary releases, but most of them require you to operate a separate routing or mesh layer in addition to your CI. Buddy collapses build, publish, route and health-gated promotion into one coherent pipeline, which is exactly what a canary workflow needs to stay routine instead of artisanal.
Example
The pipeline below builds a new version, publishes it as an artifact, points a small-weight route at it, health-checks the canary, ramps to 25%, health-checks again, and only then promotes the canary to 100%. Each step is a separate action, so a failed health check stops the promotion in place while the prior version keeps serving the rest of the traffic.
# .buddy/buddy.yml - canary release via distribution routing
- pipeline: "release-canary"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  actions:
    - action: "Build canary artifact"
      type: "BUILD"
      docker_image_name: "node"
      docker_image_tag: "20"
      execute_commands:
        - "npm ci"
        - "npm run build"
    - action: "Publish canary"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact publish web-app:canary-${execution.id} ./dist --create"
    - action: "Route 5% to canary"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy distro route update prod-distro --domain=example.com
          --target=artifact=web-app:stable@95
          --target=artifact=web-app:canary-${execution.id}@5"
    - action: "Health-check canary at 5%"
      type: "HTTP_REQUEST"
      url: "https://example.com/healthz"
      expected_status_code: 200
      retries: 10
      retry_delay: 30
    - action: "Ramp canary to 25%"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy distro route update prod-distro --domain=example.com
          --target=artifact=web-app:stable@75
          --target=artifact=web-app:canary-${execution.id}@25"
    - action: "Health-check canary at 25%"
      type: "HTTP_REQUEST"
      url: "https://example.com/healthz"
      expected_status_code: 200
      retries: 10
      retry_delay: 30
    - action: "Promote canary to 100%"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy distro route update prod-distro --domain=example.com
          --target=artifact=web-app:canary-${execution.id}"
    - action: "Tag canary as new stable"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact tag web-app:canary-${execution.id} stable"
The previous stable artifact is still published the entire time, so if the health check fails at any step - or if a human watching the dashboards spots a regression - rollback is a single bdy distro route update call sending 100% of traffic back to web-app:stable. No rebuild, no redeploy, no all-hands incident. That is the whole point of a canary release: the rollback path is shorter than the time it takes to write the post-mortem.
Frequently asked questions
What is the difference between a canary release and a blue-green deployment?
A canary release ramps traffic gradually onto a new version on the same fleet - typically 1%, 5%, 25%, 100% - so a bad release affects only the canary slice. Blue-green keeps two complete environments and swaps 100% of traffic at once, with rollback by flipping back. Canary trades the instant cutover for richer real-traffic signal before full promotion.
How big should the canary slice be?
Big enough to produce statistically meaningful signal, small enough that a regression hurts only a fraction of users. A common pattern is 1% for a few minutes to catch obvious crashes, then 5-10% for long enough to see error rates, latency and business metrics, then 25-50%, then 100%. For very low-traffic services, time-based dwell matters more than the percentage.
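The interaction between slice size and dwell time can be estimated with a rough calculation: how long must a step last before the canary has served a target number of requests? The function below is a back-of-the-envelope sketch; `min_canary_requests` is a hypothetical target you would choose yourself, not a standard threshold.

```python
def dwell_seconds(
    requests_per_second: float,
    canary_percent: float,
    min_canary_requests: int,
) -> float:
    """Seconds needed for the canary slice to serve min_canary_requests."""
    canary_rps = requests_per_second * canary_percent / 100
    if canary_rps <= 0:
        raise ValueError("canary must receive some traffic")
    return min_canary_requests / canary_rps
```

At 200 requests per second, a 1% slice collects 1,000 requests in about 500 seconds. At 2 requests per second, even a 5% slice needs roughly 10,000 seconds for the same sample, which is why dwell time dominates for low-traffic services.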
What metrics should gate a canary promotion?
The "four golden signals" - error rate, latency, traffic and saturation - compared between the canary and the stable baseline. Many teams add domain-specific guardrails (checkout conversion, sign-up rate, payment success) because a release can pass technical health checks and still hurt the business.
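A promotion gate built on those signals might look like the sketch below. Every threshold here is an illustrative assumption rather than a recommended value, and the `Signals` fields are hypothetical names for metrics you would pull from your own monitoring. Domain-specific guardrails could be added as further conditions in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    requests: int           # traffic: requests observed in the window
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # latency
    saturation: float       # e.g. CPU utilisation, 0.0 to 1.0


def should_promote(
    canary: Signals,
    baseline: Signals,
    min_requests: int = 500,          # assumed minimum sample size
    max_error_delta: float = 0.005,   # assumed allowed error-rate increase
    max_latency_ratio: float = 1.2,   # assumed allowed p95 slowdown
    max_saturation: float = 0.8,      # assumed saturation ceiling
) -> bool:
    """Promote only if the canary is no worse than baseline within the limits."""
    return (
        canary.requests >= min_requests
        and canary.error_rate - baseline.error_rate <= max_error_delta
        and canary.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
        and canary.saturation <= max_saturation
    )
```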
When should I not use a canary release?
Avoid canaries when you cannot serve two versions at once - for example, irreversible database migrations, sticky stateful protocols, or releases whose correctness depends on every client running the new version. In those cases blue-green or a maintenance-window release is safer than a partial rollout.