When the Internet Blinks: What Cloudflare’s Outage Teaches Us About Standing Privileges

Gabriel Avner

November 20, 2025


If you were online yesterday, you probably noticed that a surprising amount of the internet simply wasn’t there. Uber, X, Canva, ChatGPT, and dozens of others all began returning internal server errors. For a few hours, it looked like the web had taken the afternoon off.

As usual, the immediate assumption was that someone must be attacking the internet. Even Cloudflare initially suspected a large-scale DDoS event. When many unrelated services break at once, it often signals malicious activity.

But not this time. In this case, the outage came from inside Cloudflare’s own infrastructure.

What Actually Happened

Cloudflare later released a detailed postmortem, but here’s the high-level version:

A permissions change was made to an internal Cloudflare database system.
That change altered how a recurring query generated a Bot Management “feature file.”
The database began outputting duplicate, unintended entries into that file.
The file grew to nearly double its intended size and exceeded a built-in limit.
The oversized file propagated across Cloudflare’s network.
Systems that loaded the file began failing immediately.
Some DB nodes produced valid files while others produced invalid ones, causing the environment to oscillate before eventually locking into the failed state everywhere.
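To make the failure mode concrete, here is a minimal Python sketch of a metadata query that filters by table but not by database, followed by a validation step that rejects a bad file before it propagates. Everything in it is an illustrative assumption: the `ColumnRow` type, the `build_feature_file` and `validate_feature_file` helpers, the database and table names, the 120-feature sample, and the 200-feature `MAX_FEATURES` cap are hypothetical and are not taken from Cloudflare's code. The sketch only models the general pattern of a permissions change making extra rows visible to a query that does not filter them out.

```python
from dataclasses import dataclass

# Hypothetical hard cap on features the consuming service will load.
# This value is illustrative, not Cloudflare's actual configuration.
MAX_FEATURES = 200


@dataclass(frozen=True)
class ColumnRow:
    database: str
    table: str
    name: str


def build_feature_file(rows, table):
    """Mimic a metadata query that filters on table name but not on database."""
    return [row.name for row in rows if row.table == table]


def validate_feature_file(features, max_features=MAX_FEATURES):
    """Reject a generated file before it is propagated, instead of letting consumers fail on load."""
    duplicates = len(features) - len(set(features))
    if duplicates:
        raise ValueError(f"{duplicates} duplicate feature entries")
    if len(features) > max_features:
        raise ValueError(f"{len(features)} features exceed the limit of {max_features}")
    return features


feature_names = [f"feature_{i}" for i in range(120)]
before = [ColumnRow("primary", "bot_features", n) for n in feature_names]
# After a hypothetical permissions change, the same columns also become visible in a second database.
after = before + [ColumnRow("secondary", "bot_features", n) for n in feature_names]

validate_feature_file(build_feature_file(before, "bot_features"))  # passes: 120 unique features
try:
    validate_feature_file(build_feature_file(after, "bot_features"))
except ValueError as err:
    print(err)  # 120 duplicate feature entries
```

The interesting part is where the check runs: rejecting the file at generation time would let consumers keep running on the last good version, rather than each one discovering the problem when it loads the oversized file.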

Restoring service meant halting file generation, deploying a known-good version of the file, and restarting global proxy services.

No attacker. No malware. Just a permissions change that rippled far further than expected.

What This Incident Reveals About Access Controls

It’s easy to treat IAM issues as problems that only matter when attackers are involved. We usually picture overly broad S3 bucket policies, stale permissions, exposed API keys, or wildcard roles.

Cloudflare’s outage shows something different: standing privileges can cause operational harm even when everything is functioning “as designed.”

Somewhere in the workflow, a human or service account had the ability to modify an internal system in a way that impacted global production. That doesn’t automatically mean the privilege was wrong, but it does raise the question: was this level of access truly necessary for the task being performed?

This isn’t just about human users. Non-human identities such as service accounts, automation pipelines, and long-lived API keys often hold broad and permanent permissions. They interact with highly sensitive infrastructure and rarely get the same level of scrutiny as user accounts.
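A first step toward asking whether a given level of access is truly necessary is an inventory pass that flags standing grants: permissions with no expiry that include high-impact actions, whether the holder is a person or a service account. The sketch below is a minimal, hypothetical Python version. The `Grant` shape, the `standing_privileges` function, the `SENSITIVE_ACTIONS` set, and the sample identities are assumptions you would replace with data exported from your own IAM systems.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical actions treated as high-impact for this illustration.
SENSITIVE_ACTIONS = frozenset({"write", "admin", "alter_permissions"})


@dataclass(frozen=True)
class Grant:
    identity: str
    identity_type: str              # "human" or "service"
    resource: str
    actions: frozenset
    expires_at: Optional[datetime]  # None means the grant never expires


def standing_privileges(grants):
    """Return grants that are always-on (no expiry) and include at least one sensitive action."""
    return [g for g in grants
            if g.expires_at is None and g.actions & SENSITIVE_ACTIONS]


grants = [
    Grant("ci-pipeline", "service", "prod-db", frozenset({"read", "alter_permissions"}), None),
    Grant("alice", "human", "prod-db", frozenset({"write"}),
          datetime(2025, 11, 20, 18, 0, tzinfo=timezone.utc)),
    Grant("reporting-bot", "service", "prod-db", frozenset({"read"}), None),
]

for g in standing_privileges(grants):
    print(g.identity, g.identity_type, sorted(g.actions & SENSITIVE_ACTIONS))
# ci-pipeline service ['alter_permissions']
```

Note that the flagged grant belongs to a non-human identity. A fuller check might also flag time-bound grants whose windows are far longer than the task requires.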

The takeaway is simple. Your privilege environment is more sensitive than you think, and the more permanent access you leave in place, the more fragile your systems become.

This leads naturally to an approach designed specifically to prevent situations like this.

Reducing Risk with a Zero Standing Privileges Approach

Zero Standing Privileges (ZSP) is built on a straightforward idea: no identity should carry broad, always-on access to sensitive systems. Instead:

Access is granted Just-in-Time
Permissions are scoped to only what’s needed
Access expires automatically
Each elevation has context and an audit trail
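The four properties above can be sketched as a small, hypothetical just-in-time broker in Python. It is not any product's API: `JitBroker`, `Elevation`, the one-hour `max_duration` policy, and the ticket-style reason string are illustrative assumptions. Each request is scoped to specific actions on one resource, must carry a reason, expires on its own, and is recorded in an audit log.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Elevation:
    """A single time-bound, scoped grant (hypothetical model)."""
    id: str
    identity: str
    resource: str
    actions: frozenset
    reason: str
    expires_at: datetime


class JitBroker:
    """Hypothetical just-in-time access broker; not a real product API."""

    def __init__(self, max_duration=timedelta(hours=1)):
        self.max_duration = max_duration  # policy cap on any single elevation
        self.active = {}                  # elevation id -> Elevation
        self.audit_log = []               # (timestamp, event, identity, resource, actions, reason)

    def _log(self, event, elevation, now):
        self.audit_log.append((now.isoformat(), event, elevation.identity,
                               elevation.resource, sorted(elevation.actions), elevation.reason))

    def _expire(self, now):
        # Remove and audit every elevation whose window has closed.
        for eid, elevation in list(self.active.items()):
            if now >= elevation.expires_at:
                del self.active[eid]
                self._log("expired", elevation, now)

    def request(self, identity, resource, actions, reason, duration, now):
        """Grant exactly the requested actions on one resource for a bounded time."""
        if not reason:
            raise ValueError("every elevation needs a reason")
        if duration > self.max_duration:
            raise ValueError("requested duration exceeds the policy maximum")
        elevation = Elevation(str(uuid.uuid4()), identity, resource,
                              frozenset(actions), reason, now + duration)
        self.active[elevation.id] = elevation
        self._log("granted", elevation, now)
        return elevation

    def is_allowed(self, identity, resource, action, now):
        """Check access against active, unexpired, in-scope elevations only."""
        self._expire(now)
        return any(e.identity == identity and e.resource == resource and action in e.actions
                   for e in self.active.values())


# Example: a 30-minute, single-action elevation tied to a hypothetical ticket.
t0 = datetime(2025, 11, 20, 12, 0, tzinfo=timezone.utc)
broker = JitBroker()
broker.request("alice", "analytics-db", {"alter_permissions"},
               "TICKET-123: adjust replica grants", timedelta(minutes=30), t0)

print(broker.is_allowed("alice", "analytics-db", "alter_permissions", t0 + timedelta(minutes=10)))  # True
print(broker.is_allowed("alice", "analytics-db", "drop_table", t0 + timedelta(minutes=10)))         # False: out of scope
print(broker.is_allowed("alice", "analytics-db", "alter_permissions", t0 + timedelta(minutes=31)))  # False: expired
print([entry[1] for entry in broker.audit_log])                                                     # ['granted', 'expired']
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test; a real system would read a trusted clock and persist the audit trail outside the broker process.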

Most teams view ZSP primarily as a defense against attackers, and it is one. But the 2025 Verizon DBIR highlighted that even as malicious actions such as social engineering remain dominant access vectors and stolen credentials and API keys are still everywhere, misconfiguration continues to be one of the top error categories.

Many operational incidents, including Cloudflare’s, stem from misconfiguration or from overbroad access that lets a single change have global impact. If privileges had been granted only when needed, and only at the right scope, the chances of this issue occurring, or spreading, could have been significantly reduced.

ZSP gives teams a safety net. It limits how much damage can be done by accident, not just by attackers.

Apono’s Role in Strengthening Privilege Boundaries

Cloudflare’s outage shows that privilege design affects reliability just as much as security. A permissions change shouldn’t have the power to ripple across global infrastructure. The best way to prevent that is to ensure identities get only the access they need, only when they need it, and only for as long as required.

Apono helps organizations put these guardrails in place in a way that works across cloud environments, databases, Kubernetes, and the many non-human identities that keep modern infrastructure running. Apono enables teams to:

Replace long-lived access with temporary, on-demand privileges so sensitive rights aren’t left active longer than necessary.
Enforce fine-grained controls that determine exactly what an identity can modify rather than just whether it can access a resource.
Streamline Just-in-Time elevation with context, approvals, and automatic expiration.
Centralize logging and visibility so privilege changes are easy to trace during investigations or audits.
Apply the same guardrails to non-human identities by rightsizing service accounts, API keys, and automation tokens.
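One common way to approach rightsizing a non-human identity is to compare what it holds with what it has actually used. The following Python sketch assumes you can export current grants and an access log of `(timestamp, identity, permission)` records; the `rightsize` function, the 90-day lookback, and the `deploy-bot` sample are hypothetical and do not represent Apono's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def rightsize(granted, access_log, now, lookback=timedelta(days=90)):
    """Suggest which permissions each identity should keep or lose, based on observed use.

    granted:    {identity: set of permissions currently held}
    access_log: iterable of (timestamp, identity, permission) records
    """
    cutoff = now - lookback
    used = defaultdict(set)
    for ts, identity, permission in access_log:
        if ts >= cutoff:  # ignore activity older than the lookback window
            used[identity].add(permission)

    report = {}
    for identity, perms in granted.items():
        keep = perms & used[identity]
        report[identity] = {"keep": sorted(keep), "revoke": sorted(perms - keep)}
    return report


now = datetime(2025, 11, 20, tzinfo=timezone.utc)
granted = {"deploy-bot": {"read_config", "write_config", "alter_permissions"}}
access_log = [
    (now - timedelta(days=3), "deploy-bot", "read_config"),
    (now - timedelta(days=10), "deploy-bot", "write_config"),
    (now - timedelta(days=200), "deploy-bot", "alter_permissions"),  # outside the window
]
print(rightsize(granted, access_log, now))
# {'deploy-bot': {'keep': ['read_config', 'write_config'], 'revoke': ['alter_permissions']}}
```

Usage-based trimming can miss rare but legitimate operations, such as an occasional permissions change; those are better served by just-in-time elevation than by leaving them in place as standing rights.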

The result is a privilege model that reduces the risk of breaches and also lowers the likelihood that a routine change will cascade into a widespread outage. It’s a practical way to make modern infrastructure safer and more resilient without slowing teams down.

Download the Standing Privilege Risk Checklist

Get a quick snapshot of your access risk with a checklist built for modern cloud teams. It’s an easy way to validate whether your privilege model is aligned with Zero Standing Privileges and identify the blind spots that matter most.