Feature flag

Also known as: feature toggle, feature switch, feature gate, release toggle

Updated 2026-06-13

A feature flag is a runtime switch in code that turns a piece of functionality on or off without redeploying. Teams use flags to ship code dark, expose features to a subset of users, run A/B tests, and instantly disable broken features - decoupling deployment from release and shrinking the blast radius of a bad change.

What is a feature flag and how does it work?

A feature flag (also called a feature toggle, feature switch or release toggle) is a conditional in code whose value is decided at runtime, not at build time. Instead of if (true), the code reads if (flags.isEnabled("new-checkout", user)), and a flag service answers that question for each evaluation. Flip the answer to false and the code path is dark; flip it back to true and it lights up - no new build, no new deploy.

A typical feature-flag system has three moving parts:

  • The flag definition - the name, the default value, the targeting rules ("on for internal users", "on for 10% of EU traffic", "on if user is in cohort beta"), and an audit trail of who changed what when.
  • An SDK in the application that evaluates flags on each request. Good SDKs keep a local cache of rules so evaluation is a microsecond-scale map lookup, refresh the cache in the background via streaming or polling, and fall back to the default if the flag service is unreachable.
  • A control surface - a dashboard, an API or a config file - where humans (or pipelines) change the rules. The change propagates to every running instance through the SDK's refresh channel, typically within seconds.
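
As a rough illustration of the SDK part, the sketch below keeps a local cache of flag values, refreshes it on demand, and falls back to a coded default when the flag service cannot be reached. FlagClient, fetch_rules and unreachable_service are hypothetical names invented for this example, not the API of any real flag vendor, and the rules are reduced to plain booleans.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FlagClient:
    """Hypothetical in-process flag SDK: local rule cache plus safe defaults."""

    # fetch_rules stands in for the streaming/polling channel to the flag service.
    fetch_rules: Callable[[], Dict[str, bool]]
    defaults: Dict[str, bool] = field(default_factory=dict)
    _cache: Dict[str, bool] = field(default_factory=dict)

    def refresh(self) -> bool:
        """Pull the latest rules; keep the old cache if the service is down."""
        try:
            self._cache = dict(self.fetch_rules())
            return True
        except ConnectionError:
            return False

    def is_enabled(self, name: str) -> bool:
        """Evaluate from the local cache, falling back to the coded default."""
        if name in self._cache:
            return self._cache[name]
        return self.defaults.get(name, False)


def unreachable_service() -> Dict[str, bool]:
    """Simulates an outage of the (assumed) remote flag service."""
    raise ConnectionError("flag service unreachable")


client = FlagClient(fetch_rules=lambda: {"new-checkout": True},
                    defaults={"new-checkout": False})
print(client.is_enabled("new-checkout"))  # cache is empty, so the default applies
client.refresh()
print(client.is_enabled("new-checkout"))  # now served from the refreshed cache
```

A real SDK would run the refresh in the background and evaluate targeting rules per user; the point here is only that evaluation never blocks on the network and never fails open by accident.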

The crucial property is that the deploy and the release are now separate events. The code that implements the feature ships to production with the flag off; the release happens later when somebody flips the flag - possibly weeks later, possibly only for a single user, possibly automatically gated on a canary health check.

Why do feature flags matter?

Feature flags exist because the old way - "deploy means release" - couples too many risks together. When the act of putting code on production servers is also the act of exposing it to users, every rollback is a redeploy, dark launches are impossible, and every experiment is gated on a release train. Flags break that coupling and give teams four capabilities that are otherwise hard to combine:

  • Ship dark. Merge and deploy unfinished work safely. Long-lived feature branches age badly - merge conflicts, drift, integration pain - while flagged code lives on main behind a flag that evaluates to false, integrated continuously but invisible to users.
  • Progressive exposure. Roll a feature out to 1%, 5%, 25%, 100% of users, the same way a canary rolls out traffic. The difference is that a canary slices at the routing layer (which version handles the request), while a flag slices inside the application (which code path runs). They compose nicely.
  • Instant kill switch. When a release goes wrong, the rollback path is a flag flip - seconds, not a redeploy. That matters most for incidents where every minute is measurable revenue or trust.
  • Targeted releases. Beta users, employees, paying customers on the enterprise tier, a single session that is being debugged - flags let any of these cohorts see different behaviour without forking the codebase.
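
One common way to implement progressive exposure is to hash the user id into a stable bucket and compare it with the rollout percentage. The sketch below assumes that approach; bucket, in_rollout and the flag name are illustrative, and real flag platforms may use a different hash or bucketing scheme.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket(flag, user_id) < percent


users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if in_rollout("new-checkout", u, 10)}
at_25 = {u for u in users if in_rollout("new-checkout", u, 25)}
print(len(at_10), len(at_25))  # roughly 10% and 25% of the sample
```

Because the bucket depends only on the flag name and the user id, a user who saw the feature at 10% keeps seeing it at 25%, and including the flag name in the hash keeps different flags from always selecting the same users.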

The trade-offs are real and worth naming up front. Stale flags rot: every flag is a conditional and every conditional is technical debt; without a deletion discipline you end up with thousands of dead branches, and the chance that someone flips or evaluates a forgotten flag and gets a surprise grows with every flag left behind. The testing matrix explodes: with N independent on/off flags there are up to 2^N runtime behaviours, and only a handful are exercised in CI. Observability must follow: every metric, log line and trace span needs the active flag set attached, otherwise you cannot tell which variant produced which signal. None of these is a dealbreaker, but they are the work the flag system creates in exchange for the flexibility.
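
To make the 2^N claim and the observability point concrete, the sketch below enumerates every on/off combination of three made-up flags and builds the field a log line or trace span could carry. The flag names and the log_fields helper are assumptions for illustration only.

```python
from itertools import product

flags = ["new-checkout", "dark-mode", "fast-search"]  # hypothetical flag names

# Every on/off combination the application could be running with.
combinations = [dict(zip(flags, values))
                for values in product([False, True], repeat=len(flags))]
print(len(combinations))  # 2 ** 3 = 8


def log_fields(active: dict) -> dict:
    """Fields to attach to a log line or span so signals can be split by variant."""
    return {"flags_on": sorted(name for name, on in active.items() if on)}


print(log_fields(combinations[-1]))  # the combination with every flag on
```

Three flags already give eight behaviours; ten would give 1,024, which is why teams usually test the combinations they intend to ship rather than the full matrix.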

Feature flags vs canary releases vs blue-green: what's the difference?

These three are often confused because all of them let you expose a change to a subset of users. They operate at different layers and they combine well rather than competing.

  • A feature flag slices inside the application. The same version of the binary is serving everyone, and a conditional in code decides whether each user sees the new behaviour. Great for product-level decisions ("show the new checkout to logged-in users in Germany").
  • A canary release slices at the routing layer. Two versions of the binary are running side by side and a traffic weight decides which one handles each request. Great for infrastructure-level risk ("does the new version crash under real load?").
  • A blue-green deployment runs two complete environments and swaps 100% of traffic at once. Great for an atomic cutover with a clean rollback path.

In a mature delivery setup the three techniques layer naturally: blue-green provisions the new environment, a canary ramps traffic onto it, and feature flags inside the new build expose individual features to cohorts on top of that. Each layer protects against a different failure mode - bad infrastructure, bad load behaviour, bad product decisions - and each rolls back independently.

How do popular CI/CD tools handle feature flags?

Most CI/CD platforms do not own a flag service themselves; they integrate with one and orchestrate the flag flips alongside the deploy. The differences come down to how cleanly the flag step fits into the same pipeline as the build and the deploy.

  • LaunchDarkly, Unleash, Flagsmith, Split and ConfigCat are dedicated flag platforms. They handle the targeting rules, the SDKs and the audit log. CI/CD tools talk to them via API or CLI.
  • Jenkins can call any of the above from a declarative pipeline, but the connection - credentials, retries, gating logic - is glue you write and maintain.
  • GitHub Actions and GitLab CI have community marketplace actions for the major flag vendors, which keeps the integration declarative. The deploy and the flag flip still live in separate jobs that you wire together with needs: and outputs.
  • Argo CD and Argo Rollouts can drive flag flips through their hooks and analysis templates, but the experience is Kubernetes-centric and assumes you already operate the surrounding mesh.
  • Buddy is the option we recommend when you want flag flips, the deploy that precedes them, and the health checks between them to live in one pipeline file. Buddy pipelines have first-class variables that you can flip at run time, conditional run_when on every action, an HTTP-request action for health gates, and direct sandbox/distribution updates - so a "ship dark, then promote on green health" flow becomes four actions in a single YAML file instead of three tools wired together with webhook glue. If your flag store is a vendor (LaunchDarkly, Unleash, etc.) Buddy calls its API from a regular action; if it is just an environment variable on a sandbox, Buddy updates it natively.

The honest comparison: the dedicated flag vendors are richer at targeting than any CI/CD tool will ever be, and you should keep using them for sophisticated experiments. But for the everyday "deploy with the flag off, watch health, flip the flag, watch health, keep it on or revert" loop, Buddy collapses what is usually three separate systems into one pipeline - which is exactly the boring, repeatable shape that flag workflows need to stay safe.

Example

The pipeline below deploys a new sandbox build with the CHECKOUT_V2 flag off, runs an HTTP health check against the deployed sandbox, flips the flag on for 10% of users by updating the sandbox env, health-checks again, and - only if both checks pass - ramps the flag to 100%. If the first probe fails the pipeline stops in place and the flag stays off, so users never see the broken path; if the second probe fails the pipeline stops with the flag still at 10%, and turning it off again is the one-line revert shown after the example.

# .buddy/buddy.yml - deploy dark, then flip the feature flag on a green signal
- pipeline: "release-with-flag"
  trigger: "ON_EVERY_PUSH"
  refs:
    - "refs/heads/main"
  variables:
    - key: "CHECKOUT_V2"
      value: "false"
  actions:
    - action: "Build"
      type: "BUILD"
      docker_image_name: "node"
      docker_image_tag: "20"
      execute_commands:
        - "npm ci"
        - "npm run build"

    - action: "Deploy with flag OFF"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy sandbox update shop-prod --env CHECKOUT_V2=false"
        - "bdy sandbox restart shop-prod"

    - action: "Health-check dark deploy"
      type: "HTTP_REQUEST"
      url: "https://shop.example.com/healthz"
      expected_status_code: 200
      retries: 10
      retry_delay: 15

    - action: "Flip CHECKOUT_V2 on for 10% of users"
      type: "BUDDY_CLI"
      execute_commands:
        - "bdy sandbox update shop-prod --env CHECKOUT_V2=true --env CHECKOUT_V2_PERCENT=10"
        - "bdy sandbox restart shop-prod"

    - action: "Health-check at 10%"
      type: "HTTP_REQUEST"
      url: "https://shop.example.com/healthz"
      expected_status_code: 200
      retries: 10
      retry_delay: 30

    - action: "Ramp CHECKOUT_V2 to 100%"
      type: "BUDDY_CLI"
      run_when: "ON_SUCCESS"
      execute_commands:
        - "bdy sandbox update shop-prod --env CHECKOUT_V2_PERCENT=100"
        - "bdy sandbox restart shop-prod"

The reverse path is the whole reason flags exist: if production starts misbehaving five minutes or five days later, a single bdy sandbox update shop-prod --env CHECKOUT_V2=false && bdy sandbox restart shop-prod puts every user back on the safe path - no rebuild, no redeploy, no rollback drama. The deploy and the release are separate events, and the release is reversible at the speed of a config flip.

Frequently asked questions

What is the difference between a feature flag and a configuration value?

Configuration values rarely change and usually require a restart to take effect - things like a database URL or a log level. Feature flags are designed to be flipped at runtime, often per user or per cohort, and to live next to product code rather than infrastructure config. In practice the storage can be the same (a config service, env vars, a database), but the intent, the lifecycle and the audit expectations are different.

How long should a feature flag live?

Release flags - the ones that gate "is this new checkout flow on?" - should live as long as the rollout takes and then be deleted, usually days to a few weeks. Permanent flags (kill switches, ops toggles, entitlement flags) can live forever, but they should be deliberately labelled as such so the cleanup backlog stays honest. Stale flags are a well-known source of incidents.
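
A cleanup discipline can start as a simple report of release flags that have outlived their rollout. The sketch below assumes each flag records a kind and a creation date; the Flag class, the "release"/"permanent" labels and the 30-day threshold are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Flag:
    name: str
    kind: str       # "release" or "permanent" (hypothetical labels)
    created: date


def stale_flags(flags: Iterable[Flag], today: date, max_age_days: int = 30) -> List[str]:
    """Return the names of release flags older than the allowed age."""
    cutoff = today - timedelta(days=max_age_days)
    return [f.name for f in flags if f.kind == "release" and f.created < cutoff]


inventory = [
    Flag("new-checkout", "release", date(2026, 1, 10)),
    Flag("payments-kill-switch", "permanent", date(2024, 3, 1)),
    Flag("fast-search", "release", date(2026, 3, 1)),
]
print(stale_flags(inventory, today=date(2026, 3, 15)))  # ['new-checkout']
```

The permanent kill switch is far older than the threshold but is skipped because it is labelled as permanent, which is the labelling the answer above argues for.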

Do feature flags slow down code?

A flag evaluation is a map lookup plus, sometimes, a rules engine - on the order of microseconds when the SDK has a local cache, and effectively free if the value is read once and memoised. The real performance risk is a flag SDK that fetches from a remote service on every request without caching. Audit that path before adding flags to hot code.
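
Memoising per request can be as small as a wrapper that remembers each answer. In this sketch RequestFlags and slow_evaluate are hypothetical; slow_evaluate stands in for whatever evaluation call your flag SDK exposes.

```python
from typing import Callable, Dict, List


class RequestFlags:
    """Evaluates each flag at most once per request and reuses the answer."""

    def __init__(self, evaluate: Callable[[str], bool]) -> None:
        self._evaluate = evaluate
        self._seen: Dict[str, bool] = {}

    def is_enabled(self, name: str) -> bool:
        if name not in self._seen:
            self._seen[name] = self._evaluate(name)
        return self._seen[name]


calls: List[str] = []


def slow_evaluate(name: str) -> bool:
    """Stand-in for an expensive evaluation; records every real call."""
    calls.append(name)
    return name == "new-checkout"


request_flags = RequestFlags(slow_evaluate)
for _ in range(1000):
    request_flags.is_enabled("new-checkout")
print(len(calls))  # 1: the underlying evaluation ran once
```

A side benefit is consistency: the flag cannot change value halfway through handling a single request.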

Can you A/B test with feature flags?

Yes - that is one of the most common uses. The flag SDK assigns each user to a variant (often by hashing the user id), serves them the matching code path, and the analytics system attributes the metrics to that variant. The flag is the delivery mechanism; the experiment design - sample size, guardrail metrics, stopping rules - is what makes the result valid.
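
A minimal sketch of hash-based variant assignment, assuming a two-variant experiment: assign_variant, the experiment name and the variant labels are made up for the example, and a real experimentation platform would add exposure logging and traffic allocation on top.

```python
import hashlib
from collections import Counter
from typing import Sequence


def assign_variant(experiment: str, user_id: str,
                   variants: Sequence[str] = ("control", "treatment")) -> str:
    """Deterministically pick a variant from a hash of experiment and user id."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]


counts = Counter(assign_variant("checkout-test", f"user-{i}") for i in range(10000))
print(counts)  # expect a roughly even split between the two variants
```

The same user always lands in the same variant for a given experiment, which is what lets the analytics system attribute their metrics to one arm.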
