How the 2024 CrowdStrike outage revealed glaring gaps in risk and incident management


The CrowdStrike outage of 2024 confronted businesses around the world with a stark reality: they were woefully unprepared for their systems being disabled. It underscored the crucial importance of proper risk and incident management processes. This blog will explore how the outage occurred, as well as the lessons that any business can learn from this event.

Understanding the Outage

On 19 July 2024, the American cyber security company CrowdStrike pushed an update to its security software that caused major disruption to IT systems around the world. The flawed update led to machines entering ‘boot loops’ and displaying the infamous ‘Blue Screen of Death’, leaving them essentially inoperable. This issue primarily affected Microsoft Windows environments, causing system failures in data centers, airports, hospitals, financial institutions, and government agencies.

While CrowdStrike quickly identified and halted the faulty update, the damage was already widespread, affecting around 8.5 million devices globally. Furthermore, businesses that relied solely on CrowdStrike for endpoint security suddenly found themselves in a vulnerable position, with no quick remediation available to them: affected machines generally had to be recovered by hand, one at a time.

The Widespread Disruptions in Cybersecurity

The fallout from the CrowdStrike outage was significant, highlighting how a single faulty update in a widely used piece of software can have devastating implications.

1. Aviation and Transportation

Flights were grounded, with major airlines experiencing check-in system failures.

Airports struggled with IT infrastructure issues, leading to cancellations and delays.

2. Financial Institutions

Banks and stock exchanges reported outages due to endpoint protection failures.

ATMs and online banking services experienced disruptions, frustrating customers.

3. Healthcare Services

Hospitals and clinics lost access to patient records, delaying urgent care.

Medical devices and systems relying on Windows were impacted, causing operational slowdowns.

4. Cloud and Enterprise IT

Data centers dependent on CrowdStrike’s Falcon platform suffered downtime, affecting hosted services.

Businesses relying on a single security vendor lacked backup response plans, leading to prolonged outages.

What Compliance Frameworks Say About Risk Management & Incident Response

Aside from businesses struggling to continue critical operations with inoperable devices, the CrowdStrike outage also raised critical questions regarding risk management and business continuity planning. Businesses suddenly became acutely aware that they lacked a comprehensive plan for when their systems were no longer accessible.

As it happens, risk management and business continuity planning are two key areas covered by cybersecurity compliance frameworks like ISO 27001, SOC 2, and NIST CSF, all of which are commonly implemented and followed by businesses worldwide.

ISO 27001: Risk Management & Business Continuity

ISO 27001, a globally recognised information security standard, requires businesses to:

Assess and manage third-party risks

Develop robust incident response plans

Implement business continuity measures to minimise downtime

In the case of CrowdStrike, organisations dependent on a single security vendor may not have adequately followed ISO 27001’s vendor risk assessment guidelines, leaving them unprepared for widespread failures.

SOC 2: Availability & Incident Response

SOC 2 compliance focuses on the availability, integrity, and security of systems. The outage highlighted the need for:

Redundancy and failover strategies to maintain service availability.

Incident response planning to quickly mitigate vendor-related risks.

Monitoring vendor dependencies to identify potential security gaps before they cause disruptions.

For organisations with SOC 2 compliance in place, having a multi-layered security strategy could have reduced their exposure to this outage.

NIST CSF: Third-Party Risk & Resilience

The NIST Cybersecurity Framework (CSF) provides clear guidelines for third-party risk management. It recommends:

Continuous monitoring of security vendors to detect and mitigate risks early.

Incident recovery planning to ensure rapid response and system restoration.

Testing backup and disaster recovery solutions to minimise downtime.

Businesses that followed NIST’s guidance on supply chain security would have had better failover mechanisms in place, reducing the impact of the outage.

While it’s easy to point at a potential lack of preparation among businesses that comply with these standards, it’s also important to acknowledge the extraordinary circumstances of the CrowdStrike outage. That being said, businesses should still have processes and procedures in place to counter any potential risk, however unlikely or ridiculous.

Lessons Learned – How Businesses Can Prepare for Similar Incidents

As mentioned consistently throughout this article, the CrowdStrike outage highlighted some fundamental failures within businesses of all sizes and markets, and underscored the importance of comprehensive risk assessment and incident management strategies.

Let’s examine some measures that businesses can implement to better prepare for similar circumstances in the future:

1. Diversify Security Vendors & Tools

Avoid relying solely on a single cyber security provider.

Implement redundant security solutions so that the failure of one widely used tool does not leave the business unprotected.

2. Conduct Regular Vendor Risk Assessments

Periodically audit third-party security providers to ensure they follow best practices.

Establish contingency plans for vendor failures and software issues, such as maintaining regular backups, so that the business can recover if a similar incident occurs.

3. Strengthen Incident Response & Business Continuity Planning

Develop detailed incident response playbooks for service disruptions, and make sure employees are aware of their responsibilities.

Test disaster recovery and failover strategies to ensure rapid system restoration.
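As a rough illustration of what a basic vendor risk review could look like in practice, the Python sketch below flags every business function that depends on exactly one vendor and has no tested fallback. Everything in it is an assumption made for the example: the VendorDependency record, the single_vendor_risks helper, and the sample inventory are hypothetical and do not come from any of the frameworks discussed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorDependency:
    """One row of a hypothetical vendor inventory."""
    function: str          # business or security function, e.g. "endpoint protection"
    vendor: str            # supplier that provides it
    fallback_tested: bool  # has a documented fallback actually been exercised?


def single_vendor_risks(deps):
    """Return functions served by exactly one vendor with no tested fallback."""
    by_function = defaultdict(list)
    for dep in deps:
        by_function[dep.function].append(dep)

    flagged = []
    for function, entries in by_function.items():
        vendors = {entry.vendor for entry in entries}
        has_tested_fallback = any(entry.fallback_tested for entry in entries)
        if len(vendors) == 1 and not has_tested_fallback:
            flagged.append(function)
    return sorted(flagged)


# Hypothetical inventory, for illustration only.
inventory = [
    VendorDependency("endpoint protection", "Vendor A", fallback_tested=False),
    VendorDependency("email filtering", "Vendor B", fallback_tested=False),
    VendorDependency("email filtering", "Vendor C", fallback_tested=False),
    VendorDependency("backup", "Vendor D", fallback_tested=True),
]

print(single_vendor_risks(inventory))  # ['endpoint protection']
```

A real assessment would weigh far more than vendor count, such as contractual terms, update controls, and recovery time objectives, but even a simple inventory like this makes single points of failure visible before an incident does.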

Conclusion – The Importance of Proactive Risk Management

The CrowdStrike outage of 2024 was a wake-up call for businesses worldwide. It demonstrated that even leading security providers can introduce risks, and that businesses must be prepared for vendor-related failures. Even organisations that comply with standards such as ISO 27001, SOC 2, and NIST CSF were still caught unawares and struggled to manage the incident.

Going forward, businesses should ensure that they have comprehensive risk and incident management processes in place, even for the unlikeliest of circumstances. Compliance with these standards isn’t just about passing audits and showing off a certification; it’s about ensuring and maintaining long-term security, effectively managing risks, and being better equipped to handle future disruptions.