CrowdStrike releases preliminary Post Incident Review into global Microsoft outage

Review identifies a defect in the Rapid Response Content that went undetected during validation checks.

A preliminary Post Incident Review (PIR) by CrowdStrike into the global Microsoft outage has identified the cause as a defect in the Rapid Response Content that went undetected during validation checks.

CrowdStrike is now adopting enhanced software testing procedures and independent reviews of its end-to-end quality processes, from development through deployment, to prevent such an outage from happening again.

Other initiatives identified in the PIR include:

  • Improved Rapid Response Content testing
  • Introduction of additional validation checks in the Content Validator
  • Enhanced resilience and recoverability
  • Strengthening error handling mechanisms in the Falcon sensor
  • Adoption of a staggered deployment strategy
  • Enhanced monitoring of sensor and system performance during the staggered content deployment
  • Providing customers with greater control over the delivery of Rapid Response Content updates
  • Providing notifications of content updates and timing
  • Conducting multiple independent third-party security code reviews

According to the PIR, a content configuration update impacted the Falcon Sensor and caused the Windows operating system to crash with a Blue Screen of Death (BSOD).

“By regularly updating, security products can quickly adapt to emerging threats, ensuring robust protection for users and their systems,” the report says.

Outlining the sequence of events, the PIR confirms that on July 19, 2024, at 04:09 UTC, a Rapid Response Content update for the Falcon Sensor was published to Windows hosts running sensor version 7.11 and above.

Such updates are a regular part of the dynamic protection mechanisms of the Falcon platform.

The July 19 update was intended to gather telemetry on new threat techniques observed by CrowdStrike, but it triggered crashes (BSOD) on systems that were online between 04:09 and 05:27 UTC.

The problematic Rapid Response Content configuration update resulted in a Windows system crash.

Mac and Linux hosts were not impacted. Windows hosts that were not online, or did not connect during this period, were not impacted.

The PIR attributes the crashes to a defect in the Rapid Response Content that went undetected during validation checks. When the Falcon Sensor loaded the content, the defect caused an out-of-bounds memory read, which led to the Windows crashes.

George Kurtz, CrowdStrike Founder and CEO, again apologized for the outage, saying everyone at CrowdStrike understood the gravity and impact of the situation.

“We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority.”

CrowdStrike, said Kurtz, is operating normally, and the issue does not affect Falcon platform systems.

Nor, he said, is there any impact on protection if the Falcon Sensor is installed, and Falcon Complete and Falcon OverWatch services are not disrupted.

“We are working closely with impacted customers and partners to ensure that all systems are restored,” Kurtz said.

Warnings about ‘bad actors’ attempting to exploit the outage remain in place.

Intelligent CIO Middle East