Certificate Rotation, Vault Integration, and Zero Secrets in Config — A Security Checklist for API Platforms

An API gateway platform has more credential surfaces than most teams track systematically. There are the obvious ones — client API keys, OAuth tokens, upstream service credentials — and the less obvious ones: TLS certificates for mTLS enforcement, the database credentials the gateway uses to connect to MongoDB, the Redis credentials for rate limit state, the connection details in the management UI's environment config.

Most security reviews cover some of these. This checklist is designed to cover all of them, in a format you can run against your current deployment to identify gaps before an auditor or an incident does it for you.

Checklist area 1: mTLS client certificates

What it covers: Client certificates used for mutual TLS authentication between API consumers and your gateway.

☐ Certificate inventory exists

You have a list of every mTLS certificate issued to a client, with: subject, issuer, expiry date, the client identity it is bound to, and the last time it was used. This list is reviewed quarterly.

Why it matters: An mTLS certificate for a decommissioned partner that has never been revoked is an active access path. If the certificate was ever shared or extracted from the partner's environment, it can be used to authenticate after the partnership ended.

☐ Certificate expiry monitoring is automated

Certificates expiring within 30 days trigger an alert to the certificate owner and the platform team. Certificates expiring within 7 days trigger an escalation.

Why it matters: An expired mTLS certificate causes authentication failures for the client. If you do not have advance warning, you discover this when a partner calls you with a production outage.
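
As a rough sketch of the thresholds above, the following Python classifies a certificate by the time remaining before its expiry date. The function name and status labels are illustrative assumptions; only the 30-day and 7-day thresholds come from the checklist.

```python
from datetime import datetime, timedelta, timezone

ALERT_DAYS = 30     # alert the certificate owner and the platform team
ESCALATE_DAYS = 7   # escalate


def expiry_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate by the time left until its notAfter date."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=ESCALATE_DAYS):
        return "escalate"
    if remaining <= timedelta(days=ALERT_DAYS):
        return "alert"
    return "ok"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(expiry_status(now + timedelta(days=20), now))  # alert
```

A scheduled job would run this over every entry in the certificate inventory and route "alert" and "escalate" results to the appropriate channel.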

☐ Certificate rotation is overlap-safe

When rotating a certificate, you issue the new certificate before revoking the old one. There is an explicit overlap window (typically 24-48 hours) during which both are valid. The old certificate is revoked only after confirming the client is using the new one.

Why it matters: Revoking the old certificate before the client has deployed the new one causes an outage. An overlap window turns rotation into two separately verifiable steps: first confirm the client is authenticating with the new certificate, then revoke the old one.
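
The revocation rule can be sketched as a small state check. This is a minimal illustration, assuming the gateway can report the first time it saw the new certificate in use; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CertRotation:
    new_issued_at: datetime
    # First time the gateway saw the client authenticate with the new certificate.
    new_first_seen_at: Optional[datetime] = None
    overlap: timedelta = timedelta(hours=48)  # upper end of the 24-48 hour window


def may_revoke_old(rotation: CertRotation) -> bool:
    """The old certificate is revoked only once the client is confirmed on the new one."""
    return rotation.new_first_seen_at is not None


def overlap_overdue(rotation: CertRotation, now: datetime) -> bool:
    """True when the overlap window has passed without the client switching over."""
    return (rotation.new_first_seen_at is None
            and now > rotation.new_issued_at + rotation.overlap)
```

An overdue overlap is a signal to contact the client, not to revoke automatically, since revoking at that point would cause the outage the overlap is meant to prevent.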

☐ Revocation is tested

You have verified that revoking a certificate prevents it from authenticating at your gateway within one revocation-check interval (the CRL refresh period or the OCSP response cache TTL). This is tested in a staging environment, not assumed.

Why it matters: Certificate revocation mechanisms (CRL, OCSP) have historically been unreliable in many implementations. If revocation does not work in practice, your ability to respond to a compromised certificate is theoretical.

Checklist area 2: upstream TLS certificates

What it covers: TLS certificates on the upstream services your gateway proxies traffic to.

☐ Gateway validates upstream certificates

Your gateway verifies the TLS certificate of upstream services when forwarding requests. It does not accept self-signed certificates in production without explicit trust store configuration that has been reviewed.

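Why it matters: A gateway that does not validate upstream certificates is vulnerable to man-in-the-middle attacks between the gateway and the upstream service. The gateway-to-backend connection may still be encrypted, but without validation the gateway cannot tell whether it is talking to the real backend, so an attacker on that path can read or alter traffic that the client believes is protected.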
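
To make the difference concrete, here is a sketch using Python's standard ssl module as a stand-in for a gateway's TLS client settings. The helper names are hypothetical; the point is that both chain verification and hostname checking must be enabled.

```python
import ssl
from typing import Optional


def upstream_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that verifies the upstream's chain and hostname.

    ca_file is an optional PEM bundle for a private CA; when omitted, the
    system trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def verifies_peer(ctx: ssl.SSLContext) -> bool:
    """Audit helper: True only if both chain and hostname checks are enabled."""
    return ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname
```

Trusting a self-signed upstream then becomes an explicit, reviewable choice: its CA file is passed in, instead of verification being switched off.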

☐ Upstream certificate expiry is monitored

Your monitoring covers upstream service certificates, not just client-facing certificates. Alerts fire at 30 days and 7 days before expiry.

Why it matters: An upstream service's certificate expiring causes 502s or TLS errors for all clients of that proxy. Upstream certificate management is often owned by a different team than the gateway team, creating gaps in monitoring coverage.

☐ Certificate pinning policy is documented

For high-security upstream services (payment processors, identity providers), you either pin the expected certificate fingerprint or explicitly document why certificate validation without pinning is acceptable.

Why it matters: Certificate pinning prevents an attacker who has compromised a certificate authority from issuing a fraudulent certificate for your upstream. It adds operational complexity (pins must be updated when certificates rotate) but is appropriate for the highest-risk upstream relationships.
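
A pin check can be as small as a fingerprint comparison. The sketch below assumes SHA-256 fingerprints over the DER-encoded certificate and allows more than one pin so that the next certificate can be pinned before rotation; the function names are illustrative.

```python
import hashlib
from typing import Iterable


def _normalise(fingerprint: str) -> str:
    """Accept 'AA:BB:...' or 'aabb...' forms of a hex fingerprint."""
    return fingerprint.replace(":", "").strip().lower()


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pins: Iterable[str]) -> bool:
    """True if the certificate's fingerprint is one of the pinned values.

    Pinning both the current and the next certificate lets a rotation
    happen without an outage.
    """
    return cert_fingerprint(der_cert) in {_normalise(p) for p in pins}
```

In a Python client, the DER bytes of the peer certificate are available from ssl.SSLSocket.getpeercert(binary_form=True) after the handshake.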

Checklist area 3: platform database credentials

What it covers: The credentials the Zerq gateway and management services use to connect to MongoDB and Redis.

☐ Database credentials are not in source control

MongoDB connection strings and Redis passwords are not in .env files committed to your code repository, not in Docker Compose files in version control, and not in Kubernetes manifest files without a secrets management layer.

Why it matters: Credentials committed to source control are exposed to everyone with repository access and to any CI/CD system that clones the repository. Source control exposure is one of the most common routes for credential leakage.

Target state: MongoDB URI and Redis credentials are in a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) and injected into the runtime environment without appearing in code or config files.
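
One way to check the current state is to scan tracked files for connection strings that carry a literal password. The following is a heuristic sketch, not a replacement for a dedicated secret scanner; the regular expression and the placeholder rule are assumptions.

```python
import re
from typing import List, Tuple

# scheme://[user]:password@ for MongoDB and Redis connection strings
_URI_WITH_PASSWORD = re.compile(
    r"\b(mongodb(?:\+srv)?|rediss?)://([^\s:/@]*):([^\s@/]+)@"
)


def find_embedded_credentials(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, scheme) for each URI carrying a literal password.

    Passwords that look like template placeholders (starting with '$' or '<')
    are skipped. The password itself is never returned or logged.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _URI_WITH_PASSWORD.finditer(line):
            scheme, _user, password = match.groups()
            if not password.startswith(("$", "<")):
                hits.append((lineno, scheme))
    return hits
```

Run against the contents of .env files, Compose files, and manifests in the repository, an empty result is the expected outcome.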

☐ Database credentials are rotated on a schedule

MongoDB and Redis credentials for platform services are rotated at least annually, and immediately on any suspected compromise or team member departure with access to the credentials.

Why it matters: Long-lived static credentials accumulate exposure over time. Every person who has ever had access to the MongoDB URI could potentially still use it if the credential has not been rotated since they had access.

☐ Database credential access is minimal

The MongoDB user account used by the gateway service has only the permissions it needs: read and write on the Zerq database, no administrative permissions, no access to other databases on the same MongoDB instance.

Why it matters: If the gateway's MongoDB credential is compromised, a minimally-privileged credential limits what the attacker can do. An administrative credential gives the attacker full control of the MongoDB instance, including other databases.
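
A privilege audit can compare the roles granted to the gateway user against an allow-list. The sketch assumes the roles are supplied as role/db pairs, the shape MongoDB's usersInfo command reports, and that the platform database is named zerq; both the database name and the helper are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Assumption: the gateway needs readWrite on a database named "zerq" and nothing else.
ALLOWED_ROLES: Set[Tuple[str, str]] = {("readWrite", "zerq")}


def excess_roles(
    roles: List[Dict[str, str]],
    allowed: Set[Tuple[str, str]] = ALLOWED_ROLES,
) -> List[Dict[str, str]]:
    """Return every granted role that is not on the allow-list."""
    return [r for r in roles if (r["role"], r["db"]) not in allowed]


granted = [
    {"role": "readWrite", "db": "zerq"},
    {"role": "dbAdmin", "db": "zerq"},
    {"role": "readWriteAnyDatabase", "db": "admin"},
]
for role in excess_roles(granted):
    print("remove:", role["role"], "on", role["db"])
```

Any output from this check is a role to remove before the credential can be called minimally privileged.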

Checklist area 4: client credential lifecycle

What it covers: API keys, client secrets, and tokens issued to API consumers through your gateway.

☐ Every credential has an owner and an expiry

Every active client credential can be mapped to: the organisation it was issued to, the individual who requested it, the approval that authorised it, and an expiry date. Credentials with no owner or no expiry are flagged in access reviews.

Why it matters: Credentials with no owner are abandoned credentials. No one will notice if they are used anomalously. No one will revoke them when the associated project ends.
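
The four required attributes map naturally onto a record type that an access review can check mechanically. This is an illustrative sketch; the field names and flag wording are hypothetical and do not describe any particular gateway's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ClientCredential:
    credential_id: str
    organisation: Optional[str]    # who it was issued to
    requested_by: Optional[str]    # the individual who requested it
    approval_ref: Optional[str]    # the approval that authorised it
    expires_at: Optional[datetime]


def review_flags(cred: ClientCredential, now: datetime) -> List[str]:
    """Return the reasons this credential should be raised in an access review."""
    flags = []
    if not cred.organisation or not cred.requested_by:
        flags.append("no owner")
    if not cred.approval_ref:
        flags.append("no approval record")
    if cred.expires_at is None:
        flags.append("no expiry")
    elif cred.expires_at <= now:
        flags.append("expired but still active")
    return flags
```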

☐ Credential rotation uses a grace period

When a client rotates their credential, you issue the new credential before the old one is revoked. The overlap window allows the client to deploy the new credential without an outage. The old credential is revoked after the overlap window regardless of whether the client has confirmed migration.

Why it matters: Clients may not confirm migration in a timely fashion. A grace period with a hard end date balances operational safety with security: clients get time to migrate, but the old credential is not extended indefinitely.

☐ Compromised credential revocation is tested and fast

You have a documented and tested process for revoking a client credential immediately on compromise notification. "Immediately" means within one hour of notification, not "within the next business day."

Why it matters: The time between credential compromise and revocation is the window of active breach. A revocation process that requires a ticket to be filed and processed during business hours means a weekend compromise has a 60+ hour window.
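
The weekend figure follows from simple date arithmetic. As a worked example, assume the notification arrives on a Friday at 18:00 and the ticket is processed the following Monday at 09:00; the specific dates are illustrative.

```python
from datetime import datetime, timedelta

REVOCATION_SLA = timedelta(hours=1)  # "immediately" as defined in this item


def exposure(notified_at: datetime, revoked_at: datetime) -> timedelta:
    """Time during which a known-compromised credential remained usable."""
    return revoked_at - notified_at


friday_evening = datetime(2025, 1, 3, 18, 0)  # a Friday, after close of business
monday_morning = datetime(2025, 1, 6, 9, 0)   # start of the next business day

window = exposure(friday_evening, monday_morning)
print(window.total_seconds() / 3600)  # 63.0
print(window <= REVOCATION_SLA)       # False
```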

☐ Access reviews cover all active credentials

Quarterly (or more frequent) access reviews include every active client credential, not just human user accounts. The review asks: is this credential still needed? Is the owning organisation still a partner? Is the scope still appropriate?

Why it matters: Access reviews that cover only human users leave the service account and API key estate unaudited. Most AI agent credentials are service-account-class, not user-class — they will be missed.

Checklist area 5: Vault integration for dynamic secrets

What it covers: Using HashiCorp Vault (or equivalent) to eliminate long-lived static secrets from your platform.

☐ Gateway authenticates to Vault via AppRole or Kubernetes auth

The gateway service authenticates to Vault using a platform-appropriate method (AppRole for VM-based deployments, Kubernetes auth for K8s-based deployments) rather than a static Vault token.

Why it matters: A static Vault token is itself a credential that needs to be managed, rotated, and protected. Kubernetes auth and cloud IAM auth remove the static bootstrap credential by authenticating with platform identity (pod service account, cloud IAM role). AppRole still relies on a secret-id, so that secret-id must be delivered securely and rotated like any other credential.
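
For the Kubernetes case, the login is a single HTTP call that exchanges the pod's service account token for a short-lived Vault token. The sketch below builds that request with the standard library only and does not send it. It assumes the auth method is mounted at its default path (auth/kubernetes); the Vault address and role name in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Tuple

# Standard location of the service account token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_k8s_login(vault_addr: str, role: str, jwt: str,
                    mount: str = "kubernetes") -> urllib.request.Request:
    """Build (but do not send) the login request for Vault's Kubernetes auth."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_login_response(body: bytes) -> Tuple[str, int]:
    """Extract the short-lived Vault token and its TTL in seconds."""
    auth = json.loads(body)["auth"]
    return auth["client_token"], auth["lease_duration"]
```

Inside a pod, the jwt argument would be read from SA_TOKEN_PATH and the request sent with urllib.request.urlopen, for example build_k8s_login("https://vault.example.internal:8200", "zerq-gateway", jwt). No Vault token is stored anywhere before this call.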

☐ Upstream credentials are fetched dynamically at startup

Credentials the gateway needs to call upstream services (API keys for third-party services, database credentials) are fetched from Vault at service startup and refreshed before TTL expiry. They are not stored in environment variables or config files.

Why it matters: Dynamic credential fetching means a gateway binary or container image never contains a static credential. If the image is extracted, there is nothing in it to steal, and any credential recovered from a running instance expires at the end of its TTL. The blast radius of image extraction is near-zero.

☐ Dynamic secret TTLs are appropriate to the secret type

Database credentials fetched from Vault have a TTL short enough to limit exposure (4-24 hours for most use cases). The gateway has lease renewal logic that refreshes credentials before TTL expiry, not after. Lease renewal failures trigger alerting.

Why it matters: A dynamic credential that is never revoked is functionally equivalent to a static credential. Short TTLs with active renewal create a credential lifecycle that is automatically managed, with revocation on service termination via Vault's lease system.
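
Renewing before expiry comes down to picking a point inside the TTL and acting on it. The sketch below assumes renewal at three quarters of the TTL, which is an arbitrary illustrative choice, and uses hypothetical function names.

```python
from datetime import datetime, timedelta

RENEW_FRACTION = 0.75  # assumption: renew once three quarters of the TTL has elapsed


def renew_at(issued_at: datetime, ttl: timedelta,
             fraction: float = RENEW_FRACTION) -> datetime:
    """Point in time at which the lease should be renewed."""
    return issued_at + ttl * fraction


def lease_action(now: datetime, issued_at: datetime, ttl: timedelta) -> str:
    """Decide what the gateway should do with a lease right now."""
    if now >= issued_at + ttl:
        return "expired"  # renewal did not happen in time: alert and fetch a new credential
    if now >= renew_at(issued_at, ttl):
        return "renew"
    return "wait"
```

With a 4-hour TTL this leaves a one-hour margin in which a failed renewal can be retried and alerted on before the credential stops working.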

☐ Vault audit log is monitored

Vault's audit log is enabled and monitored. Every credential fetch, renewal, and revocation is recorded. Anomalous access patterns (unexpected credential fetches from new sources, unusual fetch frequency) trigger alerts.

Why it matters: Vault's audit log is your evidence trail for "who fetched which credential when." Without it, use of a compromised Vault token is invisible: you cannot tell which credentials were fetched, when, or from where.
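
A first monitoring rule can be written directly against the audit log, which Vault's file audit device writes as one JSON object per line. The sketch below flags credential fetches from source addresses that are not on a known list. It assumes dynamic database credentials are served under the database/creds/ path; the allow-list and function name are hypothetical.

```python
import json
from typing import Iterable, Iterator, Set, Tuple


def unexpected_fetches(
    lines: Iterable[str],
    known_sources: Set[str],
    path_prefix: str = "database/creds/",
) -> Iterator[Tuple[str, str, str]]:
    """Yield (identity, source_address, path) for fetches from unknown sources."""
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Each operation is logged as a request and a response; count it once.
        if entry.get("type") != "response":
            continue
        request = entry.get("request") or {}
        path = request.get("path", "")
        source = request.get("remote_address", "")
        if path.startswith(path_prefix) and source not in known_sources:
            identity = (entry.get("auth") or {}).get("display_name", "unknown")
            yield identity, source, path
```

Fetch-frequency anomalies can be layered on the same parsing loop by counting entries per identity and time bucket.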

Checklist area 6: zero secrets in environment variables

What it covers: Eliminating static secrets from process environment variables in all deployment environments.

☐ Production has no static secrets in environment variables

In your production Kubernetes manifests or Docker Compose files, secrets (connection strings, API keys, passwords) are sourced from Kubernetes Secrets (or equivalent), not from literal values in environment blocks.

Why it matters: Kubernetes Secrets are only base64-encoded and are not encrypted at rest by default (encryption at rest, optionally with KMS envelope encryption, can be enabled), but they are still better practice than literals in manifests: they can be rotated without manifest changes, access to them is controlled by RBAC, and kubectl describe pod shows a reference to the Secret rather than the value itself.

Target state: External Secrets Operator (or equivalent) synchronises secrets from your secrets manager into Kubernetes Secrets, with automatic rotation. No human ever handles the actual secret value.
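
A manifest lint can catch literal values before they are deployed. The sketch below inspects one container spec, already parsed into a dictionary, and flags secret-looking variables that use a literal value instead of a valueFrom reference. The name pattern is a heuristic assumption and will need tuning for real naming conventions.

```python
import re
from typing import Any, Dict, List

# Heuristic: variable names that usually hold secrets.
_SECRET_NAME = re.compile(r"PASSWORD|SECRET|TOKEN|API_?KEY|_URI$|_URL$", re.IGNORECASE)


def literal_secret_envs(container: Dict[str, Any]) -> List[str]:
    """Return names of secret-looking env vars that are set to literal values."""
    flagged = []
    for env in container.get("env", []):
        if "value" in env and _SECRET_NAME.search(env["name"]):
            flagged.append(env["name"])
    return flagged


container = {
    "name": "gateway",
    "env": [
        {"name": "MONGO_URI", "value": "mongodb://app:example@db:27017/zerq"},
        {"name": "REDIS_PASSWORD",
         "valueFrom": {"secretKeyRef": {"name": "redis-auth", "key": "password"}}},
        {"name": "LOG_LEVEL", "value": "info"},
    ],
}
print(literal_secret_envs(container))  # ['MONGO_URI']
```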

☐ CI/CD secrets are scoped and short-lived

Secrets used by CI/CD pipelines (deploy credentials, registry tokens) are scoped to the specific job that needs them and expire after the job completes. They are not stored as long-lived CI/CD variables that any pipeline run can access.

☐ Local development uses a local secrets injection method

Developers running the platform locally use a defined secrets injection method (Vault dev mode, .env files explicitly gitignored, or a developer-specific secrets manager entry) rather than checking credentials into .env.local files that might accidentally be committed.

Checklist area 7: rotation and review schedule

The final area is not a technical control — it is the operational discipline that keeps the other six areas current.

Credential type	Rotation trigger	Review frequency
mTLS client certificates	90 days before expiry or immediately on compromise	Quarterly inventory audit
Upstream TLS certificates	Vendor-managed, monitored for expiry	Monthly expiry report
MongoDB/Redis platform credentials	Annual or on team member departure	Semi-annual
Client API keys and secrets	Client-initiated with grace period	Quarterly access review
Vault AppRole role-id/secret-id	Annual or on compromise	Quarterly
CI/CD deploy credentials	Job-scoped, auto-expiry	Quarterly job audit

If any cell in this table does not have a defined process and an owner, that is a gap to close before your next compliance assessment.

This checklist is designed to be run as an annual security review exercise, not as a one-time implementation guide. The threat landscape changes, team members change, and integrations are added and removed. A checklist that was complete last year may have gaps this year.

Zerq's client credential management, per-client audit trail, and MongoDB-backed credential store provide the foundation for implementing this checklist. See Zerq's security architecture or request a demo to review your current credential posture against these controls.