Automating Cloudflare Under Attack Mode for 24/7 resilience

It was the eve of a major campaign. Nothing was stirring except the downtime alerts and the furrowed brow of the developer on call. Outside, the night was pitch black; inside, the performance dashboard was turning a deep, alarming shade of red.

Response times had stretched so far that even the most determined visitor would have abandoned ship. Then connectivity dropped out altogether.

This was the reality facing our long-standing client, Patient Safety Learning, earlier this year. Their renowned community platform, the hub, came under immense pressure from automated bot attacks. Despite a robust server infrastructure, content delivery networks, and well-optimised caches, a firehose of malicious traffic depleted the pool of available connections and ground the site to a halt.

Reviewing the logs, we could see thousands of requests probing for a viable entry point, rotating rapidly through a lengthy dictionary of known vulnerabilities. While the system was hardened against the exploits themselves, the relentless requests from distributed sources still overwhelmed the site. We watched in real time as traffic spiked and server resources vanished.

The challenge of fighting back

Managing these attacks within your own infrastructure is increasingly difficult. You can block traffic sources at the firewall level, but every request still has to arrive and be inspected before it is rejected, and that work consumes resources. It is all too easy for a system to be overwhelmed by sheer volume.

Then there is the human cost. Even the most ardent developer needs to sleep, eat, and disconnect. Given that the source of many bots lies outside of Western timezones, alerts often trigger in the dead of night. For many organisations, maintaining a 24/7 response team simply isn’t financially viable—especially given the unpredictable nature of these spikes.

Our initial move was to deploy the leading service for protection against large-scale bot attacks: Cloudflare. We recommended Patient Safety Learning set up a Web Application Firewall (WAF) and, over the following weeks, we adjusted settings to minimise disruption to legitimate users while blocking malicious traffic.

This was largely successful, yet the issue of out-of-hours downtime remained.

The dilemma of ‘Under Attack mode’

Cloudflare operates at varying levels of stringency. By default, you want to block known bad actors without imposing barriers on genuine users or ‘good’ bots, such as search engines.

When an attack ramps up, Cloudflare offers ‘Under Attack Mode’. This introduces additional protections, such as an interstitial challenge page that checks each visitor’s browser before letting them through, and heightened scrutiny of request metadata. It keeps the lights on, but at the expense of user experience.
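Under Attack Mode does not have to be switched on by hand in the dashboard. In Cloudflare's v4 API it is one of the values of a zone's security level setting. The following is a minimal sketch of how such a request could be built, not the code used in this project. The function name and the placeholder zone ID and token are illustrative assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_security_level_request(
    zone_id: str, api_token: str, level: str
) -> urllib.request.Request:
    """Build (but do not send) a request that sets a zone's security level.

    Passing level="under_attack" corresponds to Under Attack Mode.
    """
    body = json.dumps({"value": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/settings/security_level",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder credentials: substitute a real zone ID and a scoped API token.
    req = build_security_level_request(
        "ZONE_ID_PLACEHOLDER", "API_TOKEN_PLACEHOLDER", "under_attack"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

A GET request to the same URL returns the current level, which is worth recording before any change so that it can be restored afterwards.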

This created a dilemma for out-of-hours attacks:

Option A: Leave the site to fend for itself until the team comes back online (risking downtime).
Option B: Leave restrictive measures enabled permanently or for longer than necessary (risking user frustration).

Engineering a smarter solution

We felt neither option was good enough. At Brew Digital, we don’t just accept platform limitations; we look for engineering solutions.

We developed a lightweight utility installed on our client’s servers that bridges the gap between the server status and Cloudflare’s protections. Here is how it works:

Intelligent Monitoring: The tool monitors CPU, memory, and network activity against configured thresholds to minimise false positives.
Automated Trigger: When it senses the server is under too much pressure, it automatically calls the Cloudflare API to enable Under Attack Mode.
Smart Deactivation: Crucially, it detects when the server has cooled down. Once metrics stay within acceptable limits for a succession of checks, indicating the attack has subsided, it resets Cloudflare to its previous state.
Zero Churn: We programmed logic to avoid "flapping" (switching on and off rapidly), ensuring stability.
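The trigger, deactivation and anti-flapping behaviour described above can be sketched as a small state machine. This is an illustration under stated assumptions rather than the utility itself: the class names, the threshold values, and the rule of requiring several consecutive breaches before enabling (and several consecutive calm checks before disabling) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # Illustrative limits only; real values depend on the server.
    cpu_percent: float = 90.0
    memory_percent: float = 90.0
    connections: int = 1000


@dataclass
class Sample:
    cpu_percent: float
    memory_percent: float
    connections: int


class AttackModeController:
    """Decide when to enable or disable Under Attack Mode.

    Requiring consecutive breaches before enabling and consecutive calm
    checks before disabling prevents rapid on/off switching (flapping).
    """

    def __init__(self, thresholds: Thresholds, trip_after: int = 3, clear_after: int = 10):
        self.thresholds = thresholds
        self.trip_after = trip_after
        self.clear_after = clear_after
        self.active = False
        self._hot_streak = 0
        self._cool_streak = 0

    def _over_threshold(self, s: Sample) -> bool:
        t = self.thresholds
        return (
            s.cpu_percent > t.cpu_percent
            or s.memory_percent > t.memory_percent
            or s.connections > t.connections
        )

    def observe(self, sample: Sample) -> Optional[str]:
        """Record one check and return "enable", "disable", or None."""
        if self._over_threshold(sample):
            self._hot_streak += 1
            self._cool_streak = 0
        else:
            self._cool_streak += 1
            self._hot_streak = 0

        if not self.active and self._hot_streak >= self.trip_after:
            self.active = True
            return "enable"
        if self.active and self._cool_streak >= self.clear_after:
            self.active = False
            return "disable"
        return None


if __name__ == "__main__":
    controller = AttackModeController(Thresholds(), trip_after=2, clear_after=3)
    hot = Sample(cpu_percent=97.0, memory_percent=60.0, connections=4000)
    calm = Sample(cpu_percent=20.0, memory_percent=40.0, connections=50)
    for sample in [hot, hot, calm, hot, calm, calm, calm]:
        print(controller.observe(sample))
```

In the demo sequence, a single calm reading in the middle of the attack does not switch protection off; only an unbroken run of calm checks does. The caller would map "enable" and "disable" to the corresponding Cloudflare API calls.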

We can follow the activity through a Slack webhook integration, which reports when thresholds are breached and when the attack is over. Whether the spike lasts 10 minutes or 10 hours, the system reacts immediately without human intervention.
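As a rough sketch of what such a notification might look like, Slack incoming webhooks accept a JSON body with a "text" field sent by POST to the webhook URL. The function name, message format, and URL below are assumptions for illustration, and the request is built but not sent.

```python
import json
import urllib.request


def build_slack_notification(
    webhook_url: str, event: str, detail: str
) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook message."""
    payload = {"text": f"[{event}] {detail}"}
    return urllib.request.Request(
        url=webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder URL: a real one is issued by Slack when the webhook is created.
    req = build_slack_notification(
        "https://example.invalid/slack-webhook",
        "Threshold breached",
        "CPU above limit, Under Attack Mode enabled",
    )
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```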

Building true resilience

A core pillar of our services is resilience. When the lights are going out elsewhere, we believe our customers’ digital presence should continue to shine brightly. This is why we maintain high standards through our ISO 27001 certification, and why our customers receive best-in-class levels of monitoring and backup redundancy. It is also why we don’t settle for “out of the box” solutions if they don’t fully solve the problem—even with a platform as comprehensive as Cloudflare. By automating this defence mechanism, we used technology to protect Patient Safety Learning’s mission. This affords them, and us, total peace of mind, and exemplifies our drive to use technology for the betterment of our customers and their missions.
