Canary-release your API with a workflow config change — no Kubernetes required

Most canary deployment guides assume Kubernetes and a service mesh. Here's how to do percentage-based API traffic splitting with a workflow branch — no mesh, no kubectl, no sprint-long infrastructure project.

  • api-management
  • workflows
  • deployment
  • developer-experience
Zerq team

Every guide to canary deployments says the same thing: configure your service mesh, set up a VirtualService or HTTPRoute, adjust the weight field, apply the manifest, watch the metrics, roll forward or back.

That is a reasonable workflow if you already have Kubernetes and Istio in production and a platform team to support them. It is a two-sprint infrastructure project if you do not.

There is a simpler model: put the traffic splitting logic in the API gateway workflow, not in the infrastructure layer. Change a number, publish, done.

What a canary release actually requires

Strip away the infrastructure complexity and a canary release has three requirements:

  1. Split incoming traffic between old backend and new backend at a configurable percentage
  2. Observe the canary — errors, latency, success rate — before committing to 100%
  3. Roll back instantly if the canary performs badly, without a deployment

That is it. The mechanism for splitting traffic is less important than the ability to control the split and roll it back fast.

The workflow-based approach

In a gateway workflow, you model the canary as a conditional branch. The first node evaluates an incoming request and routes it: 95% goes to the stable backend, 5% goes to the canary.

A simplified version looks like this:

[HTTP Trigger]
    |
[Branch: canary?]
    |--- yes (5%) ---> [Call: new-backend] ---> [Response]
    |--- no  (95%) --> [Call: stable-backend] -> [Response]

The branch condition can be:

  • Percentage-based — random assignment, seeded per request
  • Header-based — X-Canary: true routes testers to the new backend
  • Partner-based — specific client IDs hit the canary; others hit stable
  • Combined — 5% of traffic overall, plus beta users who always hit the canary regardless of the percentage

The backends are just different upstream URLs. The workflow calls whichever one the branch selects and returns its response.
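The branch conditions above can be sketched in plain code. This is an illustrative Python model of the decision the branch node makes, not Zerq's actual workflow syntax; the URLs, `CANARY_PERCENT`, and `canary_clients` are assumptions for the sketch:

```python
import random

STABLE_BACKEND = "https://stable.internal.example/api"   # illustrative upstream URLs
CANARY_BACKEND = "https://canary.internal.example/api"
CANARY_PERCENT = 5  # the number you edit in the workflow; 0 disables the canary

def choose_backend(headers, client_id=None, canary_clients=frozenset(),
                   canary_percent=CANARY_PERCENT):
    """Mirror the branch node: pick an upstream for one incoming request."""
    # Header-based: testers opt in explicitly.
    if headers.get("X-Canary") == "true":
        return CANARY_BACKEND
    # Partner-based: specific client IDs always hit the canary.
    if client_id in canary_clients:
        return CANARY_BACKEND
    # Percentage-based: random assignment per request.
    if random.random() * 100 < canary_percent:
        return CANARY_BACKEND
    return STABLE_BACKEND
```

Note that rollback falls out of the same model: setting the percentage to 0 (and clearing the opt-in lists) sends every request to the stable backend on the next published revision.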

What rollback looks like

Rollback is editing the branch condition and publishing the workflow. Change 5% to 0%, or point the branch to the stable backend for all traffic. No kubectl rollout undo. No manifest revert. No infrastructure coordination.

The time from "the canary is bad" to "all traffic is on stable" is the time it takes to edit a number and click publish.

What you can observe during the canary

Because both the stable and canary paths go through the same gateway, you get unified observability without extra instrumentation:

  • Per-path metrics: request volume, error rate, and latency for each branch — visible in the same dashboard
  • Structured logs: every request logs which branch was taken, which backend was called, and the response status and latency
  • Audit trail: the workflow change that introduced the canary — who changed it, when, from what to what — is in the audit log

When you are ready to promote to 100%, you have the data to justify it. When you need to explain a rollback, you have the data to explain what happened.
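The structured-log point can be made concrete. A minimal sketch of one such per-request log line follows; the field names are illustrative, not Zerq's actual log schema:

```python
import json
import time

def log_request(branch, backend, status, latency_ms):
    """Emit one structured log line recording which branch a request took."""
    entry = {
        "ts": time.time(),          # request timestamp
        "branch": branch,           # "canary" or "stable"
        "backend": backend,         # upstream that was called
        "status": status,           # HTTP status returned by the upstream
        "latency_ms": latency_ms,   # upstream response time
    }
    return json.dumps(entry)
```

Filtering these lines by `branch` is what gives you per-path error rate and latency without adding any instrumentation to either backend.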

The case for header-based canary in addition to percentage

Percentage-based canary is good for production validation. Header-based canary is good for controlled testing before you open it to random traffic.

A useful pattern: start with X-Canary: true routing a specific group (internal users, beta partners, your own team). Once that group validates the new backend, switch to a percentage split for broader exposure. Once the percentage validates cleanly, go to 100%.

The workflow handles all three phases without a redeployment:

  1. Branch on X-Canary: true header → new backend
  2. Add a 5% random split → new backend for random traffic
  3. Remove the canary branch entirely → all traffic on new backend
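The three phases can be sketched as a single routing function, where promoting the release only changes the `phase` value in the workflow config. This is a hedged illustration; `phase` and the return values are assumptions, not Zerq syntax:

```python
import random

def route(phase, headers, canary_percent=5.0):
    """Three-phase rollout mirroring the numbered steps above.

    Phase 1: only X-Canary: true requests hit the new backend.
    Phase 2: header opt-in plus a random percentage split.
    Phase 3: canary branch removed; all traffic hits the new backend.
    """
    if phase >= 3:
        return "new"
    if headers.get("X-Canary") == "true":
        return "new"
    if phase == 2 and random.random() * 100 < canary_percent:
        return "new"
    return "stable"
```

Moving between phases is a config edit and a publish, never a redeployment of either backend.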

When to use the service mesh instead

The workflow approach is the right choice when:

  • You need to move fast without infrastructure work
  • The traffic split is per-API or per-product (not cluster-wide)
  • Your team owns the gateway workflow but not the Kubernetes infra

A service mesh is the right choice when:

  • You need canary at the infrastructure level, across multiple services simultaneously
  • Your platform team has Kubernetes and Istio expertise and capacity
  • You need integration with cluster-level observability like Prometheus/Grafana at the pod level

These are not mutually exclusive. Many teams use gateway-level canary for API products and service mesh for infrastructure-level traffic management. The point is that you should not need a service mesh to canary a single API endpoint.


Zerq's workflow designer lets you model canary releases, A/B tests, and header-based routing as configurable branches — no infrastructure changes required. See conditional routing and canary in the use cases or request a demo to walk through your release workflow.