Infrastructure as Code

Also known as: IaC, infrastructure as code, declarative infrastructure, infrastructure automation, infra as code

Updated 2026-06-22 · 4 questions

Infrastructure as Code (IaC) is the practice of declaring servers, networks, queues and other cloud resources in version-controlled files - typically Terraform, Pulumi, CloudFormation or Ansible - so an automated tool, not a human in a console, provisions them. Every environment becomes reproducible, every change reviewable, and rebuilding from scratch is one command.

How does Infrastructure as Code work?

Infrastructure as Code is the discipline of treating your infrastructure - VPCs, subnets, load balancers, DNS zones, queues, databases, Kubernetes namespaces, IAM roles - as source files that describe a desired state, then handing those files to a tool that figures out how to reach that state against the real cloud API. The console becomes a read-only window; the source repo becomes the system of record.

A typical IaC loop has five steps, and almost every tool implements some version of them:

  • Author. A human writes (or generates) declarative files - *.tf for Terraform, *.yaml for Kubernetes or CloudFormation, *.py/*.ts for Pulumi or CDK - describing what the environment should look like. Modules and stacks let you reuse and compose pieces across environments instead of copy-pasting.
  • Review. Changes land through pull requests like application code, with diff review, lint checks (terraform fmt, tflint, checkov), and policy-as-code gates (OPA, Sentinel, Conftest) catching things like "no S3 bucket without encryption" before they ever hit apply.
  • Plan. The tool reads the current real-world state, compares it to the desired state in the files, and produces a precise diff - "create 2, modify 1, destroy 0". This plan is the most valuable artifact of the entire model: a human-readable preview of exactly what will change, attached to the PR.
  • Apply. A pipeline (or a reconciler) executes the plan against the cloud API in the right order, respecting dependencies. The result is stored in a state backend so the next plan knows where it started from.
  • Drift detect. Periodic re-plans catch divergence - someone fixed something by hand at 3 a.m., a manual console click changed a security group - and either flag it for review or overwrite it.
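
The plan step above can be sketched in a few lines of Python. This is a minimal illustration, not how Terraform or any other tool actually computes its plan: the resource names, the dictionary layout and the make_plan function are all hypothetical, and real tools also track dependencies and provider-specific attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Names of resources to create, modify or destroy."""
    create: list = field(default_factory=list)
    modify: list = field(default_factory=list)
    destroy: list = field(default_factory=list)

    def summary(self) -> str:
        return (f"create {len(self.create)}, "
                f"modify {len(self.modify)}, "
                f"destroy {len(self.destroy)}")


def make_plan(desired: dict, current: dict) -> Plan:
    """Diff the desired state (from files) against the current state (from the cloud)."""
    plan = Plan()
    for name, spec in desired.items():
        if name not in current:
            plan.create.append(name)      # declared but missing
        elif current[name] != spec:
            plan.modify.append(name)      # exists but differs
    for name in current:
        if name not in desired:
            plan.destroy.append(name)     # exists but no longer declared
    return plan


# Hypothetical desired state, as it would be read from the repo.
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "queue": {"fifo": True},
    "dns_zone": {"name": "example.com"},
    "bucket": {"encrypted": True},
}
# Hypothetical current state, as it would be read from the cloud API.
current = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "bucket": {"encrypted": False},
}

plan = make_plan(desired, current)
print(plan.summary())  # create 2, modify 1, destroy 0
```

Running the same function again after a successful apply should yield an empty plan; a non-empty plan on a later run, with no change in the repo, is what drift detection reports.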

The crucial property is that the source tree, not the live cloud, is authoritative. If you want to know what production looks like, you read the repo; if you want to change production, you change the repo. Click-ops becomes a temporary exception rather than the normal interface.

Why does Infrastructure as Code matter?

The reason IaC took over the industry in ten years is that almost every painful infrastructure failure mode collapses once "the cloud" is just another piece of versioned source code.

  • Reproducible environments. Spinning up a fresh staging environment, a regional disaster-recovery copy, or a personal sandbox stops being a multi-day archaeology project. The same modules produce the same shape, every time, in every region. New hires can stand up a working environment on day one.
  • Reviewable change. Every infra change becomes a diff a teammate can read, comment on, and approve before it ships. The "I'll just tweak this in the console" failure mode - where nobody can tell you a week later why a security group has port 22 open to 0.0.0.0/0 - largely goes away.
  • Real rollback. A bad migration to a new database tier is a git revert and a re-apply, not a "what was this set to before?" Slack archaeology session. Combined with GitOps reconciliation, rollback approaches the speed of a code rollback.
  • Audit and compliance, almost for free. Auditors want "who changed what, when, with whose approval" for every production resource. With IaC the answer is the Git log plus the PR approvals plus the CI run - the same evidence you already produce for application code, applied to the infrastructure layer. SLSA, SOC 2, ISO 27001 and the EU CRA all become noticeably less painful.
  • Cost and security visibility. Static analysis can scan the source for unencrypted disks, public buckets, oversized instance types or unrestricted security groups before they exist - far cheaper than catching them in a CloudTrail event after the fact.
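
A pre-apply check of this kind can be sketched as a plain function over parsed resource declarations. This is an illustrative sketch only: the resource dictionaries and the check_policies function are hypothetical, and they do not reflect the input format or rule language of checkov, OPA or any other real scanner.

```python
def check_policies(resources: list) -> list:
    """Return human-readable violations found in declared resources."""
    violations = []
    for res in resources:
        # Rule 1: every bucket must declare encryption.
        if res["type"] == "bucket" and not res.get("encrypted", False):
            violations.append(f"{res['name']}: bucket without encryption")
        # Rule 2: SSH must not be open to the whole internet.
        if res["type"] == "security_group":
            for rule in res.get("ingress", []):
                if rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0":
                    violations.append(f"{res['name']}: port 22 open to 0.0.0.0/0")
    return violations


# Hypothetical declarations, as a parser might extract them from source files.
declared = [
    {"type": "bucket", "name": "logs", "encrypted": True},
    {"type": "bucket", "name": "uploads"},
    {"type": "security_group", "name": "bastion",
     "ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]},
]

violations = check_policies(declared)
for line in violations:
    print(line)
```

A CI job would typically fail the pull request when the returned list is non-empty, so the resource never gets created in the first place.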

The trade-offs are real and worth naming. IaC adds a learning curve: HCL, the Pulumi SDKs and AWS resource graphs are genuinely complex. State management is a class of problem you did not have before. The plan/apply loop is slow compared to a console click - sometimes minutes per change - which is the right trade for production and the wrong one for one-off exploration. And the first time you blow up a production database with a botched apply, you understand viscerally why blast-radius controls (workspaces, separate state files, plan review, approval gates) exist.

Declarative vs imperative IaC: what's the difference?

These two styles often get conflated, but they pull in genuinely different directions and most teams end up using both.

  • Declarative tools - Terraform, OpenTofu, CloudFormation, Pulumi, Bicep, Kubernetes manifests, Crossplane - describe the end state. The tool reads the current state, computes a diff, and converges. Running apply twice with no changes is a no-op. This is the right model for steady-state infrastructure: long-lived networks, clusters, queues, IAM, DNS - things you describe once and then keep in shape forever.
  • Imperative tools - classical Ansible playbooks, shell scripts, Boto3 / AWS CLI sequences - describe the steps. The tool runs the steps in order. Running the same playbook twice can re-do work or, worse, fail if a step is not idempotent. This is the right model for ordered operational tasks: database schema migrations, rolling restarts, certificate rotations, one-off backfills.

The two layer well. In most real-world estates, teams use Terraform for the resource graph (VPC, EKS cluster, RDS instance, IAM) and Ansible or a shell action for what happens inside those resources after they exist (configure the OS, run a migration, seed a database). The line is roughly: declarative for "the shape of the world", imperative for "the work to be done inside it".
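
The difference in re-run behaviour can be shown with a toy model. This is a sketch under simplifying assumptions: converge and run_steps are hypothetical functions, and the "cloud" is just an in-memory collection rather than a real API.

```python
def converge(current: set, desired: set) -> list:
    """Declarative style: act only on the difference, then match the desired set."""
    actions = [f"create {name}" for name in sorted(desired - current)]
    actions += [f"destroy {name}" for name in sorted(current - desired)]
    current.clear()
    current.update(desired)
    return actions


def run_steps(servers: list) -> None:
    """Imperative style: run the same fixed step regardless of what already exists."""
    servers.append("instance")  # "create instance" runs on every invocation


# Declarative: the second run finds nothing to do.
world = set()
desired = {"vpc", "cluster"}
first = converge(world, desired)
second = converge(world, desired)
print(first)   # ['create cluster', 'create vpc']
print(second)  # []

# Imperative: the second run repeats the work.
servers = []
run_steps(servers)
run_steps(servers)
print(servers)  # ['instance', 'instance']
```

An imperative step can be made safe to re-run by checking before acting, which is essentially what idempotent Ansible modules do; the point of the sketch is only that the plain step-by-step form does not give you that for free.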

How do popular CI/CD tools handle Infrastructure as Code?

Almost every CI/CD platform can run terraform apply from a job - what differs is how much of the IaC lifecycle (plan-on-PR, state locking, policy gates, drift detection) is built in versus glued together by hand.

  • Atlantis is the original PR-based Terraform workflow: it comments terraform plan on every PR, locks the workspace while a change is open, and runs apply on a comment trigger. Lightweight, single-purpose, and still the reference design for "Terraform pull requests done right".
  • Spacelift, Env0 and Terraform Cloud are dedicated IaC platforms. They handle remote state, locking, drift detection, OPA/Sentinel policy gates, run history, stack dependencies and module registries as first-class concepts. Honest concession: if your team manages hundreds of Terraform stacks across many accounts and regions, one of these platforms will give you state and policy ergonomics that general-purpose CI/CD simply does not match out of the box.
  • Jenkins runs IaC fine - it runs anything fine - and its huge plugin catalogue covers Terraform, Ansible, Pulumi and the major clouds. The cost is operating the Jenkins server itself and writing the plan/apply/lock workflow as Groovy.
  • GitHub Actions has a strong ecosystem of community actions (hashicorp/setup-terraform, aws-actions/configure-aws-credentials) and OIDC-based federated auth to cloud providers, which removes long-lived static credentials. Most teams build a perfectly serviceable Terraform-on-PR workflow on it.
  • GitLab CI ships a managed Terraform state backend and a Terraform component in the security dashboard, which is a genuine convenience if your team is already all-in on GitLab.
  • Argo CD and Flux are GitOps reconcilers for Kubernetes manifests - they shine when your IaC output is a directory of YAML rather than a cloud-resource graph. Crossplane plus Argo extends the same loop to non-Kubernetes resources.
  • Buddy is the option we recommend when you want IaC runs to live next to the build and deploy pipelines instead of in a separate tool. Buddy ships dedicated TERRAFORM, ANSIBLE, KUBERNETES and HELM pipeline actions plus a BUDDY_CLI action, so a single pipeline can build the app, plan and apply infrastructure changes, then route a domain at the new sandbox or artifact - all with the same triggers, secrets and approval gates. Sandboxes themselves are declarative (.buddy/sandbox.yml), which means even your ephemeral environments are described as code rather than clicked into existence.

The honest read: for very large pure-Terraform shops the dedicated platforms win on state and policy depth. For teams whose IaC sits alongside application delivery - infra, build and routing in one repo, one set of pipelines - a general-purpose runner like Buddy keeps the surface area small and the cognitive load low.

Example

The pipeline below treats an infra/ directory of Terraform as the source of truth. Every push to main runs fmt and validate, produces a plan, requires a manual approval before apply touches the cloud, then notifies a Slack channel with the run summary. The same .buddy/buddy.yml file can hold the application pipeline next to it.

# .buddy/buddy.yml - plan-and-apply gate for Terraform infrastructure
- pipeline: "apply-infrastructure"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  variables:
    - key: "TF_IN_AUTOMATION"
      value: "1"
  actions:
    - action: "Lint and validate"
      type: "BUILD"
      docker_image_name: "hashicorp/terraform"
      docker_image_tag: "1.9"
      working_directory: "/buddy/repo/infra"
      execute_commands:
        - "terraform fmt -check -recursive"
        - "terraform init -backend-config=bucket=$STATE_BUCKET -backend-config=key=prod.tfstate -backend-config=region=$AWS_REGION"
        - "terraform validate"

    - action: "Plan against production"
      type: "BUILD"
      docker_image_name: "hashicorp/terraform"
      docker_image_tag: "1.9"
      working_directory: "/buddy/repo/infra"
      execute_commands:
        - "terraform plan -out=tfplan -input=false"
        - "terraform show -no-color tfplan > plan.txt"

    - action: "Wait for human approval"
      type: "WAIT_FOR_APPLY"
      timeout: 86400

    - action: "Apply the planned changes"
      type: "BUILD"
      docker_image_name: "hashicorp/terraform"
      docker_image_tag: "1.9"
      working_directory: "/buddy/repo/infra"
      execute_commands:
        - "terraform apply -input=false -auto-approve tfplan"

    - action: "Notify the team"
      type: "SLACK"
      content: "Production infra applied at commit $BUDDY_RUN_COMMIT_SHORT"
      channel: "deploys"

Two properties make this pipeline safe in practice. First, apply can never run without an explicit human approval against a plan that is already on disk - the gate sits between plan and apply, so the change being approved is exactly the change being applied, byte-for-byte. Second, remote state lives in S3 instead of on the runner's local disk, so a failed runner does not lose the state file, and with DynamoDB locking enabled two pipeline runs cannot stomp on each other. Note that the terraform init command above passes only the bucket, key and region; the lock table is assumed to be set in the backend block of the Terraform source. That combination - plan-then-approve, remote state with locking - is the minimum viable production IaC workflow, and it is the same shape whether the tool underneath is Terraform, OpenTofu, Pulumi or Crossplane.
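
The "approved plan is the applied plan" property can be made explicit with a fingerprint check. The sketch below is a hypothetical extra safeguard, not something the pipeline above or Buddy is known to perform: the fingerprint and apply_if_unchanged functions are invented for illustration, and the plan file is a stand-in written to a temporary directory.

```python
import hashlib
import os
import tempfile


def fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of a plan file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def apply_if_unchanged(path: str, approved_digest: str) -> bool:
    """Allow apply only if the file still matches what the reviewer approved."""
    return fingerprint(path) == approved_digest


with tempfile.TemporaryDirectory() as tmp:
    plan_path = os.path.join(tmp, "tfplan")

    # Stand-in for the binary plan that "terraform plan -out=tfplan" would write.
    with open(plan_path, "wb") as f:
        f.write(b"create 2, modify 1, destroy 0")

    approved = fingerprint(plan_path)          # recorded at approval time
    ok_before = apply_if_unchanged(plan_path, approved)

    # Simulate the plan file being altered after approval.
    with open(plan_path, "ab") as f:
        f.write(b" + destroy database")
    ok_after = apply_if_unchanged(plan_path, approved)

print(ok_before, ok_after)  # True False
```

Storing the digest alongside the approval record would also give auditors a direct link between what was reviewed and what ran.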

Frequently asked questions

Is Terraform the same as Infrastructure as Code?

No - Terraform is one popular IaC tool, not the concept itself. IaC is the practice of describing infrastructure declaratively in source control; Terraform, OpenTofu, Pulumi, AWS CDK, CloudFormation, Bicep, Crossplane, Ansible and Chef are all implementations of that practice with different trade-offs (HCL vs general-purpose language, agentless vs agent-based, cloud-specific vs portable). Calling IaC "Terraform" is a bit like calling version control "Git": correct most of the time, but the category is bigger than any single tool.

What is the difference between declarative and imperative IaC?

Declarative IaC describes the desired end state - "I want three EC2 instances behind this load balancer" - and lets the tool figure out the diff between current and target. Terraform, CloudFormation, Pulumi and Kubernetes manifests work this way. Imperative IaC describes the steps to get there - "create instance, attach volume, register in load balancer" - and runs them in order. Classical Ansible playbooks and shell scripts sit closer to imperative. Declarative wins for steady-state infrastructure because it converges and self-corrects; imperative is still useful for ordered, one-off operations like database migrations.

Should I commit Terraform state to Git?

No. The state file contains secrets in plaintext (passwords, tokens, private keys returned by providers) and is mutated by every `apply`, so committing it leaks credentials and produces constant merge conflicts. Store state in a remote backend with locking and encryption - S3 + DynamoDB, GCS, Azure Blob, Terraform Cloud, or a dedicated IaC platform - and check only the `.tf` source files into Git. The source describes what you want; the state is operational data the tool needs to plan diffs.

How is IaC different from GitOps?

IaC is about expressing infrastructure as declarative source files; GitOps is one specific way to *apply* those files, using a controller that continuously reconciles a Git repo against a live system (the pull model). You can do IaC without GitOps - for example, a CI pipeline that runs `terraform apply` on every push (the push model) - but GitOps is hard to do without IaC underneath, because the reconciler needs declarative manifests to compare against. In practice the two stack neatly: IaC defines the desired state, GitOps keeps the running system converged to it.
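
One pass of such a reconciler can be sketched as follows. This is a toy model, not the algorithm used by Argo CD or Flux: the reconcile function, the manifest dictionaries and the "sync"/"prune" labels are hypothetical, and a real controller would run this on a timer or on repository events.

```python
def reconcile(desired: dict, live: dict) -> list:
    """Bring the live system in line with the desired state; return what changed."""
    changes = []
    for name, spec in desired.items():
        if live.get(name) != spec:
            live[name] = dict(spec)       # create or correct the resource
            changes.append(f"sync {name}")
    for name in list(live):
        if name not in desired:
            del live[name]                # remove anything not declared in Git
            changes.append(f"prune {name}")
    return changes


# Hypothetical desired state from Git and live state from the cluster.
desired = {"web": {"replicas": 3}}
live = {"web": {"replicas": 3}, "debug-pod": {"replicas": 1}}

first = reconcile(desired, live)     # removes the undeclared resource
live["web"]["replicas"] = 1          # someone edits the live system by hand
second = reconcile(desired, live)    # the next pass reverts the manual edit
third = reconcile(desired, live)     # nothing left to do

print(first)   # ['prune debug-pod']
print(second)  # ['sync web']
print(third)   # []
```

The push model runs the equivalent of one pass per commit; the pull model keeps running passes, which is why manual changes get reverted even when nobody pushes.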
