Immutable infrastructure is a delivery pattern where servers, containers and runtime environments are never modified after they are provisioned - to change anything you build a new image and replace the old instance whole. Configuration drift becomes impossible, every running version maps to a known build, and rollback is just bringing the previous image back.
What is immutable infrastructure and how does it work?
Immutable infrastructure is the discipline of never changing a running server, container or environment after it has been deployed. If the application, the OS package list, an environment variable or a config file needs to change, you don't SSH in and edit - you build a new image with the change baked in, provision a fresh instance from that image, send traffic to it and destroy the old one. The instance you can log into is treated as a result, not a resource: the only thing you edit is the source code, in a git repo, that produces the next image.
Mechanically there are three moving parts:
- A baked image. A versioned, immutable artifact that contains everything an instance needs to boot and run the application - the OS, the runtime, the dependencies, the application binary, the configuration. In a VM world this is an AMI or a QCOW2 baked by Packer; in containers it is a Docker/OCI image; in NixOS it is a system closure. The image is built once, addressed by a content hash or version tag, and never modified.
- A declarative provisioner. Something that turns "I want N instances of image X behind load balancer Y" into actual running resources - Terraform, CloudFormation, Pulumi, a Kubernetes Deployment, an Auto Scaling Group launch template. The provisioner never reaches into a running instance; it only creates and destroys them.
- A replace-not-patch rollout. When a new image is published, the rollout creates new instances from the new image, drains the old ones and destroys them. The old instances are not upgraded - they are replaced. A blue-green swap, a canary ramp or a rolling replacement all fit; an in-place `yum update` does not.
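The three parts above can be modelled in a few lines of Python. This is a toy in-memory sketch, not a real provisioner: `Image`, `Instance`, `Fleet` and `rolling_replace` are hypothetical names invented for illustration, and the version numbers are made up. It only shows the shape of the idea: instances are frozen, and the provisioner can create or destroy them but never edit them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Image:
    """A baked, versioned artifact. frozen=True blocks field reassignment."""
    name: str
    version: str


@dataclass(frozen=True)
class Instance:
    """A running instance, bound to exactly one image for its whole life."""
    instance_id: int
    image: Image


@dataclass
class Fleet:
    """Hypothetical provisioner: it can only create and destroy instances."""
    instances: list[Instance] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, image: Image) -> Instance:
        inst = Instance(next(self._ids), image)
        self.instances.append(inst)
        return inst

    def destroy(self, inst: Instance) -> None:
        self.instances.remove(inst)


def rolling_replace(fleet: Fleet, new_image: Image) -> None:
    """Replace every instance not running new_image, one at a time."""
    for old in [i for i in fleet.instances if i.image != new_image]:
        fleet.create(new_image)  # bring up the replacement first
        fleet.destroy(old)       # then retire the old instance


fleet = Fleet()
v1 = Image("web-app", "1.41.0")  # made-up versions
v2 = Image("web-app", "1.42.0")
for _ in range(3):
    fleet.create(v1)

rolling_replace(fleet, v2)
print([(i.instance_id, i.image.version) for i in fleet.instances])
# [(4, '1.42.0'), (5, '1.42.0'), (6, '1.42.0')]
```

Note that the instance IDs change across the rollout: nothing that existed before the rollout is still running afterwards, which is the difference between replacing and upgrading.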
The defining property is the build-once-run-the-same-bytes-everywhere chain: the bytes that booted in your sandbox are the same bytes that boot in production. Nothing edits them along the way. This is the same idea that makes build artifacts valuable, extended from "the application package" to "everything around the application package".
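One way to check the "same bytes everywhere" property is to compare content digests. The sketch below assumes the artifact is available as raw bytes and uses a `sha256:<hex>` string, the digest style used by OCI registries; the `digest` and `verify` helpers and the sample bytes are illustrative, not part of any particular tool.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return a content address in the 'sha256:<hex>' style."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def verify(artifact: bytes, expected: str) -> bool:
    """True only if the bytes are exactly the ones that were published."""
    return digest(artifact) == expected


built = b"app binary + runtime + config"  # stand-in for real image bytes
published = digest(built)                  # recorded once, at build time

sandbox_copy = built                       # what the sandbox booted
tampered_copy = built + b" hotfix"         # someone edited it along the way

print(verify(sandbox_copy, published))     # True
print(verify(tampered_copy, published))    # False
```

Any edit, however small, changes the digest, so an environment that pins images by digest rather than by a movable tag can detect that it is not running the published bytes.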
Why does immutable infrastructure matter?
The whole pattern exists to kill one of the oldest and most expensive operations problems: configuration drift. When servers are mutable, every manual fix, every emergency patch, every "I'll just SSH in real quick" accumulates as undocumented state. Two boxes provisioned from the same script three months apart end up subtly different, and the difference only surfaces during an incident at 3 a.m. when one of them works and the other does not. Immutable infrastructure makes the drift physically impossible: there is no way for instance A and instance B to diverge if neither of them can be edited.
Four concrete capabilities fall out of that one property:
- Deterministic environments. A sandbox spun up from image `v1.42.0` is identical to a production node running `v1.42.0`. "Works on my machine" stops being a debugging answer because everyone's machine is, by construction, the same machine.
- Trivial rollback. Previous images are still in the registry. Rolling back is a routing or scaling-group change to a known-good image version - seconds, no rebuild, no hot-patch. Compare to a mutable rollback, where you have to remember every change you applied since and undo them in the right order.
- Auditable provenance. Every running instance is traceable back to a single image, the image to a single build, the build to a single commit. This is the kind of traceability that supply-chain frameworks such as SLSA and regulations such as the EU Cyber Resilience Act (CRA) push toward: being able to say exactly which bytes were running when, and why.
- Horizontal scaling without surprises. Adding the tenth or hundredth instance is the same as adding the first because they are all stamped from the same image. The "snowflake server" - the one box nobody dares to touch - cannot exist.
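The provenance chain in the list above (instance to image to build to commit) amounts to a pair of lookups. The following is a minimal sketch with hypothetical records: the instance IDs, build ID and commit hash are invented, and a real system would read them from a registry and a CI server rather than from in-memory dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    build_id: str
    commit: str


@dataclass(frozen=True)
class ImageRecord:
    tag: str
    build: Build


# Hypothetical records; every identifier here is made up for illustration.
builds = {"build-118": Build("build-118", "3f9c2ab")}
images = {"web-app:v1.42.0": ImageRecord("web-app:v1.42.0", builds["build-118"])}
running = {
    "i-prod-01": "web-app:v1.42.0",
    "i-sandbox-07": "web-app:v1.42.0",
}


def provenance(instance_id: str) -> tuple[str, str, str]:
    """Trace a running instance back to (image tag, build id, commit)."""
    image = images[running[instance_id]]
    return image.tag, image.build.build_id, image.build.commit


print(provenance("i-prod-01"))
# ('web-app:v1.42.0', 'build-118', '3f9c2ab')
print(provenance("i-prod-01") == provenance("i-sandbox-07"))  # True
```

The lookup is only trustworthy because instances cannot be edited after launch; on a mutable server the recorded image would say what the box started as, not what it is now.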
The trade-offs are real and worth being honest about. Image builds cost CPU time and registry storage. Every config change, even a one-line tweak, goes through a build-and-rollout cycle instead of an instant edit. Stateful services need a clear split between the immutable compute layer and the persistent data layer, and getting that wrong is painful. For teams that genuinely run a handful of long-lived servers and never change them, the discipline can feel like overkill - though the moment "never change them" becomes false, the payoff returns.
Immutable infrastructure vs mutable infrastructure: what's the difference?
| | Mutable (traditional) | Immutable |
|---|---|---|
| Unit of change | A command run on a live server | A new image, replacing the old instance |
| Tooling shape | Configuration management (Ansible, Chef, Puppet, SSH) | Image builders + declarative provisioners (Packer, Docker, Terraform, K8s) |
| Drift | Accumulates silently | Cannot accumulate |
| Rollback | Reverse the changes in order | Point at the previous image |
| Debugging | Inspect the running box | Inspect the build inputs |
| Best fit | Small fleets, hand-tuned servers | Fleets you scale, CI/CD-driven delivery |
The two are not mutually exclusive in practice. Real systems often run an immutable application layer on top of a managed-but-mutable platform (a hosted database, a managed Kubernetes control plane) and that is fine - the rule is that the boundary is explicit, not that everything in the stack must be immutable.
How do popular CI/CD and platform tools handle immutable infrastructure?
Most modern delivery tools can support the pattern; what differs is how much of it is the default and how much you have to assemble yourself.
- HashiCorp Packer is the canonical image-baking tool. It builds AMIs, Azure images, GCE images, Docker images and QCOW2 disks from the same template, so a single source produces the artifacts every cloud's immutable workflow needs. If you want a polyglot image factory, Packer is still the cleanest answer.
- Terraform / OpenTofu / Pulumi handle the declarative provisioning side - "this AMI, this many instances, behind this load balancer". They will happily do mutable updates too if you let them; using them immutably is a discipline (treat `user_data` and AMI swaps as the only knobs that touch the instance), not an enforced default.
- Kubernetes is the strongest "immutable by default" environment in widespread use. Pods are not edited in place - a new ReplicaSet is rolled out and the old Pods are deleted. If you are already running on Kubernetes, the platform enforces immutability for free, and a standalone immutable-infrastructure workflow is largely redundant. That's the honest concession: K8s is the better fit if containers are your delivery unit.
- Argo CD / Flux layer GitOps on top of Kubernetes, reconciling cluster state against a git repo so the immutability story extends from images to manifests. Excellent if your whole estate is already in clusters.
- NixOS / nixos-rebuild push the idea down to the operating system: every system configuration is a content-addressed closure, and switching versions is atomic and reversible. Niche but unmatched for OS-level immutability without containers.
- Jenkins / GitHub Actions / GitLab CI / CircleCI are build-side tools. They can build immutable images perfectly well, but the immutable-rollout pipeline - bake, publish, swap, drain - is something you stitch together from steps and external services (Packer, Terraform, your cloud's API). Powerful and unopinionated; not turnkey.
- Buddy is one option we recommend when your delivery unit is an application image or static bundle rather than a Kubernetes manifest, because its model is immutable by construction: `bdy artifact publish` writes a version that cannot be edited, `bdy sandbox` runs from a recorded image, and `bdy distro route` swaps which version traffic hits without touching the running instance. The build, the immutable artifact and the routing all live in one system, so you are not gluing Packer + Terraform + a registry + a load balancer to get the same property.
To repeat the concession plainly: if your stack is already on Kubernetes, you have immutable infrastructure already and the right tools are the K8s-native ones (Argo, Flux, Helm). Buddy earns its keep when you want the same discipline without standing up a cluster - typical web apps, static sites, sandboxed Node/Python services - where assembling Packer + Terraform + a registry is more rope than the problem deserves.
Example
The pipeline below bakes a new image on every push to main, publishes it as an immutable artifact, restarts a sandbox onto the new image to verify it, and only then routes the public domain at that exact version. Nothing about the running sandbox or the production route is mutated in place - the rollout is a swap of which immutable artifact is being served.
```yaml
# .buddy/buddy.yml - immutable rollout: bake, publish, swap
- pipeline: "immutable-rollout"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  actions:
    - action: "Build the application"
      type: "BUILD"
      docker_image_name: "node"
      docker_image_tag: "20"
      execute_commands:
        - "npm ci"
        - "npm run build"
        - "npm test"
    - action: "Publish as an immutable artifact (content-addressed)"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact publish web-app:$BUDDY_RUN_COMMIT_SHORT ./dist --create"
    - action: "Restart the sandbox onto the new artifact - never patched in place"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy sandbox set-artifact preview web-app:$BUDDY_RUN_COMMIT_SHORT"
        - "bdy sandbox restart preview"
    - action: "Smoke-test the immutable sandbox"
      type: "HTTP_REQUEST"
      url: "https://preview.example.com/healthz"
      expected_status_code: 200
      retries: 6
      retry_delay: 10
    - action: "Swap production routing to the same immutable version"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy distro route update prod-distro --domain=example.com --target=artifact=web-app:$BUDDY_RUN_COMMIT_SHORT"
    - action: "Keep the previous artifact addressable for instant rollback"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy artifact tag web-app:$BUDDY_RUN_COMMIT_SHORT current"
```
If the smoke test fails, the run stops before the production route is touched. If production starts misbehaving an hour later, recovery is a single `bdy distro route update --target=artifact=web-app:<previous-sha>` - no SSH, no patch, no rebuild under pressure. The previous image is still published, still addressable and bit-for-bit identical to what was running yesterday. That ability to round-trip - new version forward, old version back, both untouched on disk - is the operational payoff of treating infrastructure as something you replace, not something you edit.
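The recovery path described above is a pointer swap, which can be sketched independently of any tool. The `Router` class below is a hypothetical in-memory model, not how Buddy or any load balancer is implemented, and the version strings are invented. It assumes published versions are write-once and that routing only ever changes which version is live.

```python
from __future__ import annotations


class Router:
    """Toy router: a domain points at one published version at a time."""

    def __init__(self) -> None:
        self.published: set[str] = set()
        self.history: list[str] = []  # append-only log of routing decisions

    def publish(self, version: str) -> None:
        if version in self.published:
            raise ValueError(f"{version} already published; versions are write-once")
        self.published.add(version)

    def route(self, version: str) -> None:
        if version not in self.published:
            raise KeyError(f"{version} was never published")
        self.history.append(version)

    @property
    def live(self) -> str:
        return self.history[-1]

    def rollback(self) -> str:
        """Point traffic back at the version that was live before this one."""
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        previous = self.history[-2]
        self.route(previous)  # recorded as a new routing event, nothing is rebuilt
        return previous


router = Router()
router.publish("web-app:a1b2c3d")  # made-up short SHAs
router.publish("web-app:e4f5a6b")
router.route("web-app:a1b2c3d")
router.route("web-app:e4f5a6b")    # new release goes live

print(router.rollback())           # web-app:a1b2c3d
print(router.live)                 # web-app:a1b2c3d
print(sorted(router.published))    # ['web-app:a1b2c3d', 'web-app:e4f5a6b']
```

Both versions remain published after the rollback, so rolling forward again once a fix is understood is the same operation in the other direction.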
Frequently asked questions
What is the difference between immutable infrastructure and configuration management?
Configuration management (Ansible, Chef, Puppet) converges a running server toward a desired state by changing it in place - install this package, edit that file, restart that service. Immutable infrastructure refuses the in-place edit: if the desired state changes, you build a new image and throw the old machine away. Configuration management answers "make this server look like X"; immutable infrastructure answers "this server cannot become X, so we replace it with one that already is".
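The contrast can be made concrete with a small sketch. It models a server as a dictionary of package versions, which is a deliberate oversimplification; `converge` and `replace` are hypothetical helpers, not the API of Ansible, Chef or Puppet, and the package names are placeholders.

```python
from types import MappingProxyType


def converge(server: dict, desired: dict) -> None:
    """Configuration-management style: mutate the live server toward desired state."""
    server.update(desired)  # only touches the keys the tool knows about


def replace(desired: dict) -> MappingProxyType:
    """Immutable style: build a fresh, read-only instance from the desired state."""
    return MappingProxyType(dict(desired))


# A long-lived server that picked up a manual change along the way.
server = {"nginx": "1.24", "debug_tool": "installed-by-hand"}
desired = {"nginx": "1.26"}

converge(server, desired)
print(server)       # {'nginx': '1.26', 'debug_tool': 'installed-by-hand'}

fresh = replace(desired)
print(dict(fresh))  # {'nginx': '1.26'}
```

After convergence the hand-installed tool is still there, because nothing in the desired state mentions it. The replacement contains only what the desired state declares, and attempting to assign to `fresh` raises a `TypeError`.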
Does immutable infrastructure require containers or Kubernetes?
No - the pattern predates containers. AWS AMIs baked with Packer and rolled out through Auto Scaling Groups have been doing immutable infra since the early 2010s, and NixOS does it at the operating-system layer. Containers and Kubernetes make immutability the default (Pods are recreated, not patched), which is why the two ideas are often discussed together, but a fleet of plain VMs that you rebuild instead of SSH-ing into is just as immutable.
How do you handle stateful data with immutable infrastructure?
You separate the disposable layer from the persistent layer. The compute layer - the application image, the OS, the runtime - is immutable and replaced on every change. State - databases, object storage, persistent volumes - lives outside that layer and survives replacements. The rule of thumb is "if losing this instance loses data, the data is in the wrong place". Managed databases, EBS volumes, S3 buckets and Kubernetes PersistentVolumes all exist so the compute can stay throwaway.
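A minimal sketch of that split, assuming a toy `PersistentStore` standing in for a managed database or volume and a disposable `AppInstance`; both classes and the sample order are hypothetical. Data written to the store survives a replacement, while data kept on the instance does not.

```python
from __future__ import annotations

from typing import Optional


class PersistentStore:
    """Stand-in for a managed database or volume; it outlives any instance."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class AppInstance:
    """Disposable compute: replaced whole on every change."""

    def __init__(self, image: str, store: PersistentStore) -> None:
        self.image = image
        self.store = store
        self.local_scratch: dict[str, str] = {}  # dies with the instance

    def save_order(self, order_id: str, payload: str) -> None:
        self.local_scratch[order_id] = payload   # fine for caches, wrong for data
        self.store.put(order_id, payload)        # durable copy lives outside


store = PersistentStore()
old = AppInstance("web-app:1.41.0", store)       # made-up versions
old.save_order("order-1", "2 widgets")

new = AppInstance("web-app:1.42.0", store)       # replacement from a new image
del old                                          # old instance destroyed

print(new.store.get("order-1"))  # 2 widgets
print(new.local_scratch)         # {}
```

The empty `local_scratch` on the new instance is the rule of thumb in action: anything that only lived on the old instance is gone after the swap.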
Is immutable infrastructure slower to deploy than patching in place?
Per change, yes - building a fresh image and provisioning a new instance takes longer than running `apt upgrade` on an existing one. Per incident, the maths flips. Immutable rollouts cache base layers and only rebuild what changed, so a typical image build is minutes; rollback is seconds because the previous image is still there. Mutable patching is fast on the way out and painfully slow on the way back when something goes wrong, which is when speed actually matters.