Your API gateway is probably logging the wrong things. Here's what your compliance team actually needs.
Most teams log API requests for debugging. Compliance teams need something different — a filterable audit trail that can answer "who accessed what, when" on demand. Here's the gap, and how to close it.
- observability
- compliance
- api-management
- security
When a security incident happens or a regulator asks questions, your engineering team opens the logs and discovers the same problem: the data is there, but it cannot answer the question being asked.
Request logs are full of HTTP status codes, latency percentiles, and error rates. The compliance team needs to know which partner accessed which API product, on which dates, and what operations they performed.
These are different questions. They need different logs.
The two-layer model most teams are missing
Effective API observability has two distinct layers, and most teams only build one of them.
Layer 1: Operational metrics and request logs. This is what engineers use for debugging and capacity planning. It includes:
- Request volume, error rate, latency by endpoint and time window
- Which upstream services are slow or returning errors
- Which clients are hitting rate limits
- Infrastructure health — pod readiness, memory, CPU
This layer answers: is the system working correctly right now?
Layer 2: Audit logs. This is what compliance teams, security teams, and regulators use. It includes:
- Every API request, attributed to a specific authenticated identity (client + profile)
- Every configuration change — who changed what, from what value, to what value, at what time
- Access control decisions — what was permitted, what was denied, and why
This layer answers: who did what, and when?
The gap between these two layers is where most compliance failures start. Teams have Layer 1. Auditors ask Layer 2 questions. The data exists, but it cannot be queried in the way auditors need.
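A Layer 2 record can be as simple as one structured JSON line per request. The sketch below shows one possible shape; the field names (`client_id`, `profile`, `decision`, and so on) are illustrative assumptions, not a fixed schema:

```python
import json
import sys
from datetime import datetime, timezone

def emit_audit_event(client_id, profile, product, operation, resource,
                     decision, out=sys.stdout):
    """Write one structured audit record per request.

    Field names are illustrative assumptions -- adapt them to your
    gateway's identity model.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "client_id": client_id,   # authenticated caller, not just an IP
        "profile": profile,       # which credential/profile was used
        "product": product,       # API product, e.g. "payments"
        "operation": operation,   # e.g. "POST /v1/transfers"
        "resource": resource,     # the specific resource touched
        "decision": decision,     # permitted or denied, and why
    }
    out.write(json.dumps(event) + "\n")

emit_audit_event("partner-x", "prod-key-1", "payments",
                 "POST /v1/transfers", "/transfers/tx-123",
                 {"result": "permitted", "rule": "payments-write"})
```

The point is attribution: every record carries the authenticated identity and the access decision, so the log can later be filtered the way an auditor asks questions.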
What auditors actually ask
In practice, a compliance review or security investigation surfaces questions like:
- "Show me all API calls made by Partner X to the payments product in March."
- "When did the workflow configuration for this endpoint last change, and who changed it?"
- "Was any API call made to this endpoint outside business hours last month?"
- "Which clients accessed patient data resources between these two dates?"
These questions share a structure: they are filtered by identity, product, and time range. The answer needs to be producible quickly — not after a multi-day log aggregation project.
This means your audit data needs to be:
- Per-partner, not just per-endpoint: aggregated logs by endpoint tell you traffic volume; you need logs filterable by the authenticated caller identity
- Structured and queryable: JSON format, indexed by client, product, and timestamp
- Separated from debug noise: a compliance query should not wade through connection pool logs and health check pings
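With records in that shape, a question like "all API calls made by Partner X to the payments product in March" reduces to a filter. A minimal sketch over JSON-line audit records, with the same assumed field names:

```python
import json
from datetime import datetime, timezone

def calls_by_partner(lines, client_id, product, start, end):
    """Filter JSON-line audit records by caller identity, product, and
    time range. Assumes illustrative fields client_id, product, and an
    ISO-8601 timestamp."""
    matches = []
    for line in lines:
        rec = json.loads(line)
        ts = datetime.fromisoformat(rec["timestamp"])
        if (rec["client_id"] == client_id
                and rec["product"] == product
                and start <= ts < end):
            matches.append(rec)
    return matches

log = [
    '{"timestamp": "2024-03-05T10:00:00+00:00", "client_id": "partner-x", '
    '"product": "payments", "operation": "POST /v1/transfers"}',
    '{"timestamp": "2024-04-01T10:00:00+00:00", "client_id": "partner-x", '
    '"product": "payments", "operation": "POST /v1/transfers"}',
    '{"timestamp": "2024-03-09T08:30:00+00:00", "client_id": "partner-y", '
    '"product": "payments", "operation": "GET /v1/balance"}',
]
march = calls_by_partner(
    log, "partner-x", "payments",
    datetime(2024, 3, 1, tzinfo=timezone.utc),
    datetime(2024, 4, 1, tzinfo=timezone.utc),
)
# only the March call from partner-x matches
```

In production this filter would run in your SIEM or log store rather than in application code, but the query shape — identity, product, time range — is the same.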
Why Prometheus metrics alone are not enough
Prometheus is excellent for operational visibility. Error rates, latency histograms, request throughput by product — all of this is essential for running a reliable gateway.
But Prometheus metrics are aggregated. By design, a counter metric answers "how many requests happened" — not "which specific requests happened and who made them." When a regulator asks for a filtered report of access events, Prometheus cannot produce it.
The right model is: Prometheus for operational metrics, structured JSON audit logs for compliance queries. Both layers, side by side.
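The distinction fits in a few lines. This sketch uses plain Python stand-ins for a Prometheus-style counter and an audit stream; the names are illustrative:

```python
from collections import Counter

# Layer 1: an aggregated counter, the shape of a Prometheus metric.
# It collapses individual requests into per-label totals.
request_total = Counter()

# Layer 2: an append-only stream keeping each request's identity.
audit_events = []

def record_request(client_id, product, status):
    # Aggregation step: identity is deliberately NOT a metric label
    # (per-client labels would explode cardinality in a real system).
    request_total[(product, status)] += 1
    # Audit step: the fully attributed event is kept separately.
    audit_events.append(
        {"client_id": client_id, "product": product, "status": status})

record_request("partner-x", "payments", 200)
record_request("partner-y", "payments", 200)
record_request("partner-x", "payments", 403)

# The counter answers "how many": 2 successful payments calls...
print(request_total[("payments", 200)])
# ...but only the audit events can answer "who was denied": partner-x
print([e["client_id"] for e in audit_events if e["status"] == 403])
```

Once a request is folded into the counter, its identity is gone — which is exactly why the counter is cheap to store and exactly why it cannot serve an audit query.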
Configuration change audit: the layer teams forget
Request logging captures what clients did. What is often missing is a log of what operators did.
When a workflow definition changes, a rate limit is updated, or a partner's access is modified — these are configuration events that carry compliance weight. If a breach happens and the investigation shows an access control rule was changed three hours earlier, "who changed that rule" is one of the first questions asked.
Configuration audit should capture:
- The identity of the operator who made the change (not just "admin")
- The resource that changed (which collection, which proxy, which workflow)
- The before and after values
- The timestamp
This is separate from — and complementary to — request logging. Together they give a complete picture of both what clients did and what operators did.
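The four points above translate directly into a record shape. A sketch, with assumed field names:

```python
from datetime import datetime, timezone

def audit_config_change(operator, resource, before, after):
    """Build a configuration-change audit record covering operator
    identity, resource, before/after values, and timestamp. Field
    names are illustrative assumptions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,  # a specific identity, never just "admin"
        "resource": resource,  # e.g. which workflow or rate-limit policy
        "before": before,      # prior value, for investigation
        "after": after,        # new value
    }

change = audit_config_change(
    operator="alice@example.com",
    resource="rate-limit/payments",
    before={"requests_per_minute": 600},
    after={"requests_per_minute": 1200},
)
```

Capturing the before value matters as much as the after value: an investigator reconstructing the three hours before a breach needs to know what the rule said previously, not only what it says now.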
What to measure for partner SLAs
Observability is not only for compliance. It is also the evidence base for partner conversations about SLAs.
When a partner reports that your API was slow or unavailable last week, you need per-partner, per-product latency and error data — not aggregate gateway metrics. If the problem was on their network, your per-partner data shows that their calls succeeded. If the problem was on your side, the data shows when and which endpoints were affected.
Per-partner metrics — request volume, p95 latency, error rate — segmented by product and time window are the foundation for honest SLA conversations.
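Those figures fall out of the same attributed events. A sketch using Python's statistics module; the event shape and field names are assumptions:

```python
import statistics

def partner_sla(events, client_id, product):
    """Per-partner, per-product SLA figures derived from attributed
    request events (illustrative field names)."""
    relevant = [e for e in events
                if e["client_id"] == client_id and e["product"] == product]
    if not relevant:
        return None
    latencies = [e["latency_ms"] for e in relevant]
    errors = sum(1 for e in relevant if e["status"] >= 500)
    return {
        "requests": len(relevant),
        # quantiles(n=20) yields 19 cut points at 5% steps;
        # index 18 is the 95th percentile
        "p95_latency_ms": statistics.quantiles(latencies, n=20)[18],
        "error_rate": errors / len(relevant),
    }

# Nineteen fast successes and one slow 503 for partner-x:
events = [{"client_id": "partner-x", "product": "payments",
           "latency_ms": 40 + i, "status": 200} for i in range(19)]
events.append({"client_id": "partner-x", "product": "payments",
               "latency_ms": 900, "status": 503})
sla = partner_sla(events, "partner-x", "payments")
```

Because the input is filtered by `client_id` first, the result describes that partner's experience specifically — which is what settles the "your API was slow last week" conversation.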
The operational observability checklist
Before your next compliance review, verify:
- Can you produce a list of every API call made by a specific partner to a specific product in a given date range?
- Can you show the full history of changes to a specific workflow definition — who changed it, when, and what changed?
- Are your audit logs in a structured format that your SIEM can ingest without transformation?
- Are per-partner request logs separate from aggregate gateway metrics?
- Can you answer "were any unusual access patterns detected for this client last month" without a multi-day data engineering project?
If any of these require significant manual effort, the observability architecture needs attention before the next audit makes it urgent.
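Checklist questions like "were calls made outside business hours" become mechanical once events are attributed and timestamped. A sketch with an assumed 09:00–17:00 UTC weekday window:

```python
from datetime import datetime

def outside_business_hours(events, open_hour=9, close_hour=17):
    """Flag attributed events whose timestamp falls outside a business
    window. The window (UTC weekday hours) is an assumption; adjust
    for your organisation."""
    flagged = []
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"])
        if not (open_hour <= ts.hour < close_hour) or ts.weekday() >= 5:
            flagged.append(e)
    return flagged

events = [
    {"timestamp": "2024-03-04T10:15:00+00:00",
     "client_id": "partner-x"},  # Monday, within hours
    {"timestamp": "2024-03-04T23:40:00+00:00",
     "client_id": "partner-x"},  # Monday, late night
    {"timestamp": "2024-03-09T11:00:00+00:00",
     "client_id": "partner-y"},  # Saturday
]
flagged = outside_business_hours(events)
# the late-night and weekend calls are flagged; the Monday-morning call is not
```

This is the difference the checklist is probing for: with structured, attributed audit logs the answer is a filter, not a data engineering project.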
Zerq provides structured request logs and configuration audit logs, filterable by partner, product, and time range — alongside Prometheus metrics for operational visibility. See how observability works in Zerq or request a demo to walk through your compliance reporting requirements.