Why Zerq Is Built in Go — And Why That Matters for API Gateway Performance
The gateway core is a single Go binary with no vendor runtime lock-in. Goroutines, sub-millisecond GC pauses, and a tiny memory footprint change what is operationally possible at the data plane. Here is what that means in practice.
- architecture
- performance
- go
- platform
- operations
Language choice for a gateway core is not a preference question. A gateway sits on the critical path of every API call your business serves — to partners, to internal services, to AI agents. The runtime characteristics of that binary directly affect your p99 latency, your memory budget per replica, your cold-start time in auto-scaling events, and whether you need a separate warmup phase before sending traffic.
This post explains why Zerq's gateway core is written in Go, what that means concretely for deployment and performance, and what it does not solve.
What the gateway actually does on each request
To understand why runtime characteristics matter, it helps to trace what happens on the hot path. For a typical API request through Zerq:
- Accept the TCP connection (or reuse an existing one from a connection pool)
- Parse the HTTP/1.1 or HTTP/2 frame
- Look up the route against the collection and proxy configuration
- Evaluate the auth policy: validate the token signature, check scopes, enforce rate limits
- If a workflow is attached: fan out to upstream calls, wait for responses, apply transformations
- Forward to the upstream or return the constructed response
- Write the structured audit record
Each of these steps needs to happen in microseconds to milliseconds, in parallel for hundreds of concurrent requests. The runtime you choose determines how efficiently you can do that work and what the tail latency looks like when the system is under load.
Why Go fits the gateway model
Goroutines: concurrency without threads
Go's concurrency model is built around goroutines — lightweight user-space threads managed by the Go runtime scheduler, not by the OS kernel. Each goroutine starts with a stack of roughly 2KB (growing dynamically as needed), compared to the typical 1-8MB reserved for OS threads in Java or Python.
For a gateway handling 10,000 concurrent connections, the difference is concrete:
- OS threads at 1MB each: ~10GB of stack memory for concurrency primitives alone
- Go goroutines at ~2KB each: ~20MB for the same concurrent connection count
This is not a theoretical benchmark. It determines how many concurrent requests you can handle on a given memory budget, which directly affects the size of the instances you need and the cost of scaling out.
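The per-goroutine figure is easy to check empirically. This sketch parks a batch of goroutines and reads the runtime's own stack accounting — a rough illustration, not a rigorous benchmark:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkedStackBytes launches n goroutines that block on a channel,
// then returns the approximate stack memory each one holds, taken
// from runtime.MemStats.StackInuse.
func parkedStackBytes(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	block := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-block // park until released
		}()
	}
	started.Wait() // every goroutine has started

	runtime.ReadMemStats(&after)
	close(block) // release the parked goroutines
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("~%d KB of stack per parked goroutine\n", parkedStackBytes(10000)/1024)
}
```

On a typical current Go toolchain this reports a low single-digit number of kilobytes per parked goroutine.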
The Go scheduler uses M:N threading — many goroutines multiplexed onto a smaller number of OS threads. Network I/O, which dominates gateway workloads, is handled with non-blocking syscalls under the scheduler. A goroutine waiting for an upstream response parks efficiently without blocking an OS thread.
GC characteristics: predictable latency under load
Garbage collection pauses are the enemy of consistent API latency. A gateway that adds 2ms of latency on average but occasionally adds 50ms during a GC cycle has a p99 that is unusable for latency-sensitive integrations.
Go's garbage collector has been tuned over many releases for low-pause concurrent collection. Current Go versions (1.21+) routinely achieve sub-millisecond GC stop-the-world pauses for typical gateway-scale heap sizes. The collector runs concurrently with application goroutines for most of its work.
Contrast with the JVM: Java's GC options have improved dramatically, but default configurations on common distributions often produce multi-millisecond pauses under load. Tuning JVM GC for gateway workloads requires expertise and ongoing maintenance as heap sizing and traffic patterns change. Go's GC is substantially simpler to operate at consistent latency targets.
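One way to see pause behaviour directly is to force a collection and read the pause history the runtime keeps. Production monitoring would use the runtime/metrics package or GODEBUG=gctrace=1; this one-shot sketch is only for illustration:

```go
package main

import (
	"fmt"
	"runtime"
)

// measureLastPause forces one GC cycle and returns the duration of
// its stop-the-world pause in nanoseconds, read from runtime.MemStats.
func measureLastPause() uint64 {
	runtime.GC() // force a full collection
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	// PauseNs is a circular buffer; the most recent pause is at
	// index (NumGC+255)%256.
	return ms.PauseNs[(ms.NumGC+255)%256]
}

func main() {
	fmt.Printf("last GC pause: %dµs\n", measureLastPause()/1000)
}
```

For gateway-scale heaps this typically prints a pause measured in tens or hundreds of microseconds, well under the millisecond budget.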
Single binary: no runtime installation, no classpath, no warmup
Go compiles to a single statically-linked binary. Deploying a new gateway binary means:
- Build the binary: go build -o zerq-gateway
- Copy it to the target host or container
- Start it
No JDK version to manage. No node_modules directory to install. No classpath configuration. No native library to compile. No warmup period before the binary can serve traffic at full performance.
This matters more than it might seem for:
Auto-scaling events. When an HPA spins up a new gateway replica in response to a traffic spike, a Go binary is ready to serve traffic within seconds of the container starting. A JVM-based gateway might require 30-60 seconds of warmup before JIT compilation has optimised the hot paths sufficiently to serve production latency targets. During that warmup window, the new replica is nominally up but adding latency to requests routed to it.
Rolling deployments. Zero-downtime rolling updates work cleanly when new replicas reach full performance immediately. The readiness probe passes, traffic is shifted, old pods are terminated. With a runtime that has a warmup curve, the readiness probe passing does not mean the replica is performing at full capacity.
Air-gapped deployments. In environments where network access is restricted, a single binary is vastly easier to distribute and verify than a runtime ecosystem with transitive dependencies. Copy one file. Verify its checksum. Done.
Memory footprint: right-sizing replicas
Go programs at rest use significantly less memory than equivalent Java or Node.js programs serving the same traffic. A Go gateway process that handles 1,000 requests per second might use 50-150MB of RSS. A Java equivalent is more likely to start at 256MB and grow depending on heap configuration.
For Kubernetes deployments, this means:
- Smaller resource requests/limits per pod
- More replicas on the same node count, or smaller nodes for the same replica count
- Lower cost at scale for the same throughput
For on-prem deployments in resource-constrained environments — particularly relevant for regulated industries where you cannot simply add cloud capacity — a smaller memory footprint means the gateway competes less with other services on the same hosts.
No vendor runtime dependency
The gateway binary has no dependency on a proprietary managed runtime, a cloud-hosted control plane, or a vendor-operated service mesh agent. It runs on any host that can execute a Linux binary. That includes:
- Your existing VMs
- Your Kubernetes cluster
- A bare-metal server in a classified environment
- A Docker container on a developer laptop
- An ARM-based edge node
The operational dependency graph for the gateway data plane is: the binary, your MongoDB instance (for config and audit), and optional Redis (for distributed rate limit state). No agent. No call-home. No license validation over the network.
What Go does not solve
Being honest about the limitations:
Workflow-heavy requests involve more allocations. When a request triggers a multi-step workflow — fan-out, transformation, conditional branching — the Go runtime is allocating and collecting objects. The GC handles this efficiently, but it is not free. Very complex workflows on high-throughput paths will produce more GC pressure than simple proxy requests.
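A minimal sketch of the fan-out step, with fetch standing in for a real upstream call. Every closure, channel, and per-request result slice here is a heap allocation the GC must eventually reclaim, which is the pressure described above:

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut calls each upstream concurrently and collects results in
// order. fetch is a hypothetical stand-in for an upstream HTTP call.
func fanOut(upstreams []string, fetch func(string) string) []string {
	results := make([]string, len(upstreams))
	var wg sync.WaitGroup
	for i, u := range upstreams {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			results[i] = fetch(u) // each goroutine writes its own slot
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	out := fanOut([]string{"users", "orders"}, func(u string) string {
		return "resp:" + u
	})
	fmt.Println(out) // → [resp:users resp:orders]
}
```

Per fan-out, that is one slice, one WaitGroup, and one closure per upstream — cheap individually, but multiplied by tens of thousands of requests per second it becomes the GC pressure worth watching.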
Hot code reload requires a restart. Go compiles to native code and does not support swapping code into a running process. Any change that requires a new binary must go through a rolling deployment — a different operational model from some JVM-based systems, where certain code changes can be applied without a restart. Zerq's Management API applies policy and routing changes live, without restarts; only binary updates require a rolling deploy.
Profiling tooling is different. If you are debugging a performance issue, Go's pprof is excellent but different from Java's profiling ecosystem. Platform teams coming from a Java-heavy background will need to learn the tooling.
The practical summary
The Go choice for Zerq's gateway core produces a deployment artifact that:
- Starts cold in seconds, not minutes — clean behaviour during auto-scaling and rolling deploys
- Handles high concurrent connection counts efficiently on modest memory budgets
- Produces consistent low-latency responses without JVM-style warmup curves or GC tuning requirements
- Deploys as a single binary with no runtime installation — works in restricted and air-gapped environments
- Carries no dependency on a proprietary vendor runtime at the data plane
These are not abstract advantages. They affect how your platform team sizes instances, how your on-call engineers debug tail latency, and whether your air-gapped deployment scenario is feasible at all.
See Zerq's architecture page for the full tech stack overview, or request a demo to discuss your specific performance and deployment requirements.