A smoke test is a small, fast set of checks run right after a build or deployment to confirm the system starts up and its critical paths work. It is not a full test suite - just a sanity check that the change did not break anything obvious before more expensive testing or real user traffic begins.
How does a smoke test work?
A smoke test is a small, fast suite of checks that runs immediately after a build or a deployment to confirm the system is alive and its most critical paths work. The name comes from electronics and plumbing: you switch the device on, wait a moment, and if you do not see smoke, it is at least worth running the proper diagnostics on it. In CI/CD the metaphor is identical - you are not trying to prove the release is correct, only that it is not so obviously broken that going any further would be pointless.
A useful smoke test has four properties, and a check that lacks any of them is doing a different job:
- Tiny scope. Five to fifteen assertions at most, covering only the paths a working system cannot fail: the app responds on its health endpoint, the homepage returns 200, login accepts a known test account, the database is reachable, a single job round-trips through the queue. If a smoke test starts asserting business correctness, it is creeping into integration-test territory and will get flaky.
- Fast and unconditional. Seconds, not minutes. It runs on every build and every deploy, no exceptions and no "skip on docs-only changes". The moment a smoke test takes long enough that someone is tempted to skip it, it has stopped being a smoke test.
- Runs against a real, started instance. Not against mocks. The whole point is to catch the failures that unit tests cannot see: a misnamed env var, a missing migration, a Dockerfile that builds but does not boot, a wrong CDN origin, an expired secret. So a smoke test always speaks to the live process over its real network interface.
- A hard gate, not a notification. A failing smoke test must stop the pipeline - or, post-deploy, trigger an automatic rollback. If a red smoke test just files a ticket, the next stage will happily promote a broken release before anyone looks.
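The four properties above can be sketched as a very small runner. This is a minimal illustration, not a reference implementation: `BASE_URL`, the `/healthz` path, the 30-second budget and the function names are all assumptions made for the example. It talks to a real started instance over HTTP, keeps the check list tiny, enforces a time budget, and returns a non-zero code so the pipeline can treat a failure as a hard gate.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Hypothetical target; in a real pipeline this would come from the environment.
BASE_URL = "https://example.com"

Check = Tuple[str, Callable[[], bool]]


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET the URL on the live instance; return its status code, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def default_checks() -> List[Check]:
    """A deliberately tiny list: only paths a working system cannot fail."""
    return [
        ("health endpoint", lambda: http_status(f"{BASE_URL}/healthz") == 200),
        ("homepage", lambda: http_status(f"{BASE_URL}/") == 200),
    ]


def run_smoke(checks: List[Check], budget_s: float = 30.0) -> int:
    """Return 0 only if every check passes within the time budget, else 1."""
    started = time.monotonic()
    for name, check in checks:
        ok = check()
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        if not ok:
            return 1  # hard gate: the first red check fails the whole run
        if time.monotonic() - started > budget_s:
            print("FAIL  smoke suite exceeded its time budget")
            return 1
    return 0
```

In a pipeline step the result would typically become the process exit code, for example `sys.exit(run_smoke(default_checks()))`, so that a red check stops the job rather than just logging.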
In a modern pipeline you usually see two smoke tests, not one. The post-build smoke runs in an ephemeral environment against the freshly-built artifact - it is your "the image actually starts" gate, and it catches things like missing dependencies and broken CMDs before any traffic is ever routed to the change. The post-deploy smoke runs against the real target environment after the deploy step - it catches the failures that only the real environment reveals (DNS not updated, an upstream service unreachable, secrets missing, the wrong feature flag on). The post-deploy smoke is the one you wire to your rollback automation: it is the last line of defence before users meet the change.
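One way to picture the two stages is as the same checks pointed at two different targets, with a different consequence on failure. The sketch below assumes hypothetical names (`Stage`, `PATHS`, `decide`) and takes the HTTP probe as a parameter so the logic stays independent of any particular client: before anything is live, a red result only needs to stop the pipeline; once users can reach the change, it needs to trigger a revert.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical critical paths shared by both smoke stages.
PATHS = ["/healthz", "/"]


@dataclass(frozen=True)
class Stage:
    name: str       # "post-build" or "post-deploy"
    base_url: str   # where the started instance is reachable
    live: bool      # True once real users can reach the change


def decide(stage: Stage, probe: Callable[[str], bool]) -> str:
    """Run the shared checks against the stage's target and pick the next action."""
    failed = [path for path in PATHS if not probe(stage.base_url + path)]
    if not failed:
        return "continue"
    # Post-build: nothing is exposed yet, so stopping is enough.
    # Post-deploy: the change is already live, so it has to be reverted.
    return "rollback" if stage.live else "stop"
```

Here `probe` stands in for whatever performs the request (a `curl` wrapper, an HTTP client call) and returns `True` when the response is healthy.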
Why does it matter?
Smoke tests are the cheapest, highest-leverage safety check in a deployment pipeline. They cost almost nothing to run and they catch the failure mode that hurts the most: a deploy that succeeds technically (the artifact is published, the route is updated, the orchestrator reports green) while the application does not actually work for users. Without a smoke test the pipeline trusts the deployment tool's "OK"; with one, the pipeline trusts a real HTTP 200 from the real URL.
That distinction matters more the faster you deploy. A team releasing once a month can afford to discover a broken deploy by hand - someone is watching, the change is fresh, the whole team is in the room. A team practising continuous deployment cannot: the pipeline that just shipped will ship again in fifteen minutes, on top of the breakage, unless a smoke test stops it. In that world the smoke test is what makes automatic rollback safe to wire up - the rollback has to fire on something, and "the smoke test came back red" is the most reliable signal you can give it. The same is true of a canary release: the decision to promote canary traffic from 5% to 100% is, fundamentally, a smoke test running against the canary instance and either passing or not.
Smoke tests also de-risk the human side of incidents. When a deploy goes wrong at 02:17 and the on-call is paged, the first question is always "is the new release the problem?" - and the smoke test result in the pipeline run is the fastest way to answer it: a red result means "yes, it was, and we already rolled back" instead of "we have no idea, start bisecting". They are not glamorous, they will not catch a subtle business-logic regression, and they are not a substitute for a proper test pyramid - but they are the one type of test whose absence you feel immediately on every release.
Smoke test vs sanity test vs regression test vs full integration suite
The four terms get tangled, especially in older QA literature where they sometimes mean opposite things. In modern CI/CD practice the distinction is along two axes - scope (broad vs narrow) and depth (shallow vs deep) - and it is worth being precise about which one a given check is doing:
- Smoke test - broad and shallow. A handful of checks across the critical paths of the system, run after every build or deploy. Answers: "is this thing alive at all?"
- Sanity test - narrow and shallow. A targeted check around a specific change or bug fix. Answers: "did my fix actually do the thing I wanted?"
- Regression test - broad and deep. The growing suite of cases that re-asserts every previously-known bug stays fixed. Answers: "did this change quietly break something I was not thinking about?"
- Full integration / E2E suite - narrow scenarios, deep coverage of each. End-to-end user journeys run against a near-production environment. Answers: "does the real flow work end-to-end across service boundaries?"
A healthy pipeline runs them in roughly that order: smoke on every build (seconds), sanity on the specific change (seconds), regression on every merge to mainline (minutes), full integration on a scheduled cadence or before significant deploys (longer). A red smoke test should never let any of the later stages run; a red regression test should never let a deploy reach production. They are not interchangeable, and trying to make one do the job of another is how you end up either with a slow pipeline everyone skips or a fast pipeline that misses everything.
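The ordering rule can be expressed as a fail-fast loop. The following is a small sketch under the assumption that each stage is a callable returning `True` for green; `run_pipeline` and the stage names are illustrative, not part of any CI tool.

```python
from typing import Callable, List, Tuple


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> Tuple[List[str], bool]:
    """Run stages in order; stop at the first red one.

    Returns the names of the stages that actually ran and whether all were green.
    """
    ran: List[str] = []
    for name, stage in stages:
        ran.append(name)
        if not stage():
            return ran, False  # later, more expensive stages never start
    return ran, True
```

With stages listed as smoke, sanity, regression and full integration, a red smoke test means the returned list contains only `"smoke"`: nothing slower was started.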
How do popular CI/CD tools handle smoke tests?
Smoke tests are simple enough that every CI/CD platform supports them - the differentiator is how cleanly the smoke check, its failure handling, and the rollback that follows are modelled in one place rather than glued together across config files.
- GitHub Actions runs smoke tests as ordinary `run:` steps - usually `curl -fsS https://example.com/healthz || exit 1`, or one of the dozens of HTTP-check actions on the marketplace. It is flexible and free for public repos, but the failure-to-rollback wiring is something you build yourself across two or three workflow files. Great when your team is happy maintaining that YAML and lives in GitHub anyway.
- GitLab CI/CD also runs smoke tests as `script:` steps, with the bonus that the `environment:` keyword surfaces the smoke status on the merge-request UI and the `auto_rollback` mechanism can revert a `deployment:` if a follow-up job fails. If your team is end-to-end on GitLab, GitLab CI's environment-and-deployment model is the better fit here - nothing else integrates the smoke result, the deploy and the rollback view into the merge request as tightly.
- Jenkins has the HTTP Request plugin and a thousand other ways to assert a URL, and the declarative pipeline syntax handles the `post { failure { ... } }` rollback hook cleanly. The cost is the usual Jenkins cost: you operate the controller and keep the plugins current.
- CircleCI runs smoke tests as steps and pairs naturally with cloud-target CLIs (`aws`, `gcloud`, `kubectl`) for the deploy and revert; the parallelism makes splitting "post-build smoke" and "post-deploy smoke" into two cheap jobs very natural.
- Argo Rollouts treats the smoke check as a first-class `AnalysisTemplate` resource: it can query Prometheus, hit an HTTP endpoint, or call a custom job, and the rollout pauses or aborts based on the result without any external orchestration. If your smoke checks need to gate a Kubernetes rollout on real SLI thresholds, Argo Rollouts is purpose-built for that and is the better fit - the analysis-template model is more powerful than anything a generic CI/CD platform offers for this specific job.
- Kubernetes itself offers readiness, liveness and startup probes, which are a form of always-on smoke test the cluster runs for you; many teams treat a good readiness probe as their post-deploy smoke and skip a separate check entirely.
- Buddy is one of the options we'd recommend when the goal is to keep the whole smoke-and-rollback loop in a single pipeline file rather than wired across plugins. Buddy ships a first-class `HTTP_REQUEST` action with built-in `expected_status_code`, `retries` and `retry_delay` fields, and a `run_only_on_first_failure: true` flag on any subsequent action - so a failing smoke test triggers the rollback action directly, with no scripting in between. That makes it a reasonable fit for teams that want the smoke check, the failure handler and the revert to live next to each other in one `.buddy/buddy.yml` reviewed alongside the code change. It is not the right pick if you need deep, metric-driven rollout analysis tied to Kubernetes - that is Argo Rollouts territory.
The honest summary: any modern platform can run a smoke test. The platforms worth picking are the ones where the smoke check, the deploy and the rollback feel like one workflow rather than three.
Example
The pipeline below shows the canonical shape: deploy the new artifact, smoke-test the live URL with a short retry budget, promote if it is healthy, and automatically roll back to the previous stable artifact if it is not. The post-deploy smoke test is the only thing standing between a bad release and the next merge piling on top of it.
# .buddy/buddy.yml - deploy with a post-deploy smoke test and auto-rollback
- pipeline: "deploy-with-smoke"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  actions:
    - action: "Build artifact"
      type: "BUILD"
      docker_image_name: "node"
      docker_image_tag: "20"
      execute_commands:
        - "npm ci"
        - "npm run build"
    - action: "Publish versioned artifact"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact publish web-app:$BUDDY_RUN_ID ./dist --create"
    - action: "Deploy to production"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy distro route update prod-distro
          --domain=example.com
          --target=artifact=web-app:$BUDDY_RUN_ID"
    - action: "Smoke test - health endpoint"
      type: "HTTP_REQUEST"
      url: "https://example.com/healthz"
      method: "GET"
      expected_status_code: 200
      retries: 6
      retry_delay: 10
    - action: "Smoke test - critical user path"
      type: "HTTP_REQUEST"
      url: "https://example.com/api/v1/login"
      method: "POST"
      headers:
        Content-Type: "application/json"
      content: '{"user":"smoke","password":"$SMOKE_PASSWORD"}'
      expected_status_code: 200
      retries: 3
      retry_delay: 5
    - action: "Promote to stable"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact tag web-app:$BUDDY_RUN_ID stable"
    - action: "Auto-rollback on smoke failure"
      type: "BUDDY_CLI"
      run_only_on_first_failure: true
      execute_commands:
        - "echo 'smoke test failed - reverting prod-distro to previous stable'"
        - "bdy distro route update prod-distro
          --domain=example.com
          --target=artifact=web-app:stable"
Three things make this a real smoke test, not just a curl someone tacked onto the end of a deploy. First, it has a short, bounded retry budget (`retries: 6`, `retry_delay: 10`) - long enough to ride out the seconds it takes the new instance to come up, short enough that a genuine failure stops the pipeline after about a minute of retrying. Second, it covers two layers: a cheap health endpoint that proves the process is up, and a real critical-path call (login) that proves the application and its downstream dependencies actually work. Third, the rollback action uses `run_only_on_first_failure: true`, so the smoke test does not just report a failure - it triggers the revert, automatically and before any human gets paged. That last property is what turns a smoke test from a logging exercise into the safety net that makes high-frequency deploys safe to run unattended. See the Buddy HTTP request action reference for the full set of fields available on the smoke-check action.
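The retry budget and the promote-or-rollback decision can be sketched outside any particular CI tool. The code below is an illustration only: it assumes `retries` counts re-attempts after the first try (how Buddy itself counts them is not stated here), and `probe`, `promote` and `rollback` are hypothetical callables standing in for the HTTP check and the two CLI steps. Under that assumption the worst-case wait is `retries * retry_delay`, which is 6 x 10 = 60 seconds for the health check.

```python
import time
from typing import Callable


def smoke_with_retries(
    probe: Callable[[], bool],
    retries: int = 6,
    retry_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then re-try up to `retries` times, waiting `retry_delay` between tries."""
    for attempt in range(retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(retry_delay)  # bounded: at most retries * retry_delay in total
    return False


def deploy_gate(
    probe: Callable[[], bool],
    promote: Callable[[], None],
    rollback: Callable[[], None],
    **retry_options,
) -> str:
    """Promote the release if the smoke check turns green in time, else revert it."""
    if smoke_with_retries(probe, **retry_options):
        promote()
        return "promoted"
    rollback()
    return "rolled back"
```

Passing `sleep` as a parameter keeps the waiting behaviour replaceable, which makes the retry budget easy to verify without actually waiting a minute.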
Frequently asked questions
What is the difference between a smoke test and a sanity test?
They overlap, but the intent is different. A **smoke test** runs after a *build or deployment* and asks "does this thing come up and do its most basic job at all?" - it is broad and shallow, covering the few paths that *must* work for any further testing to be meaningful (homepage loads, login succeeds, the queue accepts a job). A **sanity test** runs after a *small change or bug fix* and asks "does the specific thing I just touched behave the way I expect?" - it is narrow and targeted, often a handful of cases around the change. In practice many teams use the words interchangeably, and that is fine; what matters is having both a "is the build alive?" gate and a "did my fix actually fix the bug?" gate somewhere in the pipeline.
What is the difference between a smoke test and a regression test?
Scope and timing. Smoke tests are a thin slice - usually a few seconds, a minute at most - run on every build or deploy to catch *show-stopper* failures fast. Regression tests are the broad, exhaustive suite that re-runs previously-fixed bug scenarios to make sure they have not come back; they take longer (minutes to hours), run less often (typically on every merge to mainline or nightly), and produce the bulk of your test signal. A smoke test that passes does not mean the release is good - only that it is *not so obviously broken that the rest of the pipeline should be skipped*. The regression suite is what actually tells you the change is safe; the smoke test just tells you it is worth running the regression suite at all.
How long should a smoke test take?
Short enough that you are willing to run it on every build and every deploy without a second thought - typically under 60 seconds, and ideally under 10 for a post-deploy production check. The whole point of a smoke test is to be a cheap, blocking gate: too slow and teams start skipping it; too shallow and it stops catching anything. A useful heuristic is "the time it takes the on-call to grab a coffee" - if it is longer than that for a *production* smoke test, the rollback window has already gotten too wide, and you should split the deep checks into a separate post-deploy verification stage that runs in parallel with traffic.
Where should smoke tests run in the pipeline?
In two places. First, right after the **build** stage, against a freshly-started instance of the new artifact in an ephemeral environment - this catches the worst failures (missing env vars, broken Dockerfile, container that does not boot) before anything is deployed. Second, right after the **deploy** stage, against the *real* environment the change has just been released to - this catches the failures only production reveals (DNS not switched, secrets missing, downstream dependency unreachable). The post-deploy smoke test is the one that should be wired directly to automatic rollback: if it fails, the pipeline reverts the route before paging a human.