‘Worst over’ as worldwide Windows outage winds down

The worst seems to be over as the worldwide outage disrupting Microsoft Windows recedes – with attention now focused on clearing backlogs, getting systems back up and identifying scammers.

Microsoft was hit by a faulty software update from the cybersecurity specialist CrowdStrike on Thursday, July 18.

According to Microsoft, the preliminary root cause appeared to be a configuration change in a portion of Azure backend workloads.

This caused an interruption between storage and compute resources, resulting in connectivity failures that affected downstream Microsoft 365 services dependent on these connections.

CrowdStrike has reported BSODs (blue screens of death) being observed in multiple locations and says the cause is under investigation. According to a support notice, the issue is related to its Falcon Sensor product and engineering teams are working to resolve it.

CrowdStrike founder and CEO George Kurtz has ruled out a cyberattack as the cause of the outages.

A round-up of the tech sector response so far:

This was a developing story, updated as events unfolded.

DAY TWO

With the worst over, CrowdStrike and Microsoft issue statements

CrowdStrike has published a detailed statement on its company blog outlining the events of July 19th and its analysis of what went wrong.

It reads: “On July 19, 2024 at 04:09 UTC, as part of ongoing operations, CrowdStrike released a sensor configuration update to Windows systems. Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform. This configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.

The sensor configuration update that caused the system crash was remediated on Friday, July 19, 2024 05:27 UTC.

This issue is not the result of or related to a cyberattack.

Impact

Customers running Falcon sensor for Windows version 7.11 and above, that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC, may be impacted. 

Systems running Falcon sensor for Windows 7.11 and above that downloaded the updated configuration from 04:09 UTC to 05:27 UTC – were susceptible to a system crash.

Configuration File Primer

The configuration files mentioned above are referred to as ‘Channel files’ and are part of the behavioral protection mechanisms used by the Falcon sensor. Updates to Channel Files are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike. This is not a new process; the architecture has been in place since Falcon’s inception.

Technical Details

On Windows systems, Channel Files reside in the following directory:

C:\Windows\System32\drivers\CrowdStrike\

and have a file name that starts with “C-”. Each channel file is assigned a number as a unique identifier. The impacted Channel File in this event is 291 and will have a filename that starts with “C-00000291-” and ends with a .sys extension. Although Channel Files end with the SYS extension, they are not kernel drivers.
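
For illustration only, the short Python sketch below enumerates files in that directory and flags the naming pattern described above. The directory path and the C-00000291- prefix come from CrowdStrike’s statement; the function name and output format are hypothetical, not CrowdStrike tooling.

from pathlib import Path

# Directory and filename pattern taken from CrowdStrike's statement above.
CHANNEL_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
IMPACTED_PREFIX = "C-00000291-"  # Channel File 291, the file implicated in the incident

def list_channel_files(directory: Path = CHANNEL_DIR) -> list[Path]:
    """Return all channel files (C-*.sys) found in the CrowdStrike driver directory."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob("C-*.sys"))

if __name__ == "__main__":
    for f in list_channel_files():
        marker = "  <-- impacted Channel File 291" if f.name.startswith(IMPACTED_PREFIX) else ""
        print(f.name + marker)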

Channel File 291 controls how Falcon evaluates named pipe execution on Windows systems. Named pipes are used for normal, interprocess or intersystem communication in Windows.
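
As background on the mechanism referred to here, the minimal Python sketch below performs a single message exchange over a Windows named pipe using only the standard library. The pipe name is an arbitrary example and the code has nothing to do with Falcon’s detection logic; it simply shows what a named pipe is.

import threading
from multiprocessing.connection import Client, Listener

PIPE_ADDRESS = r"\\.\pipe\demo_pipe"  # Windows named-pipe namespace; arbitrary example name

def send_message() -> None:
    # Connect to the pipe and send a single message.
    with Client(PIPE_ADDRESS, family="AF_PIPE") as conn:
        conn.send("hello over a named pipe")

if __name__ == "__main__":
    # 'AF_PIPE' selects Windows named pipes as the transport (Windows only).
    with Listener(PIPE_ADDRESS, family="AF_PIPE") as listener:
        threading.Thread(target=send_message).start()
        with listener.accept() as conn:
            print("received:", conn.recv())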

The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks. The configuration update triggered a logic error that resulted in an operating system crash. 

Channel File 291

CrowdStrike has corrected the logic error by updating the content in Channel File 291. No additional changes to Channel File 291 beyond the updated logic will be deployed. Falcon is still evaluating and protecting against the abuse of named pipes. 

This is not related to null bytes contained within Channel File 291 or any other Channel File. 

Remediation

The most up-to-date remediation recommendations and information can be found on our blog or in the support portal. 

We understand that some customers may have specific support needs and we ask them to contact us directly.

Systems that are not currently impacted will continue to operate as expected, continue to provide protection, and have no risk of experiencing this event in the future.

Systems running Linux or macOS do not use Channel File 291 and were not impacted. 

Root Cause Analysis

We understand how this issue occurred and we are doing a thorough root cause analysis to determine how this logic flaw occurred. This effort will be ongoing. We are committed to identifying any foundational or workflow improvements that we can make to strengthen our process. We will update our findings in the root cause analysis as the investigation progresses.”

David Weston, Vice President, Enterprise and OS Security, Microsoft, has released a statement outlining and updating the recovery work the company has been undertaking in response to the outage.

It reads: “On July 18, CrowdStrike, an independent cybersecurity company, released a software update that began impacting IT systems globally. Although this was not a Microsoft incident, given it impacts our ecosystem, we want to provide an update on the steps we’ve taken with CrowdStrike and others to remediate and support our customers. 

Since this event began, we’ve maintained ongoing communication with our customers, CrowdStrike and external developers to collect information and expedite solutions. We recognize the disruption this problem has caused for businesses and in the daily routines of many individuals. Our focus is providing customers with technical guidance and support to safely bring disrupted systems back online.

Steps taken have included: 

  • Engaging with CrowdStrike to automate their work on developing a solution. CrowdStrike has recommended a workaround to address this issue and has also issued a public statement. Instructions to remedy the situation on Windows endpoints were posted on the Windows Message Center. 
  • Deploying hundreds of Microsoft engineers and experts to work directly with customers to restore services.  
  • Collaborating with other cloud providers and stakeholders, including Google Cloud Platform (GCP) and Amazon Web Services (AWS), to share awareness on the state of impact we are each seeing across the industry and inform ongoing conversations with CrowdStrike and customers. 
  • Quickly posting manual remediation documentation and scripts.
  • Keeping customers informed of the latest status on the incident through the Azure Status Dashboard.

We’re working around the clock and providing ongoing updates and support. Additionally, CrowdStrike has helped us develop a scalable solution that will help Microsoft’s Azure infrastructure accelerate a fix for CrowdStrike’s faulty update. We have also worked with both AWS and GCP to collaborate on the most effective approaches.   

While software updates may occasionally cause disturbances, significant incidents like the CrowdStrike event are infrequent. We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services. 

This incident demonstrates the interconnected nature of our broad ecosystem — global cloud providers, software platforms, security vendors and other software vendors, and customers. It’s also a reminder of how important it is for all of us across the tech ecosystem to prioritize operating with safe deployment and disaster recovery using the mechanisms that exist. As we’ve seen over the last two days, we learn, recover and move forward most effectively when we collaborate and work together. We appreciate the cooperation and collaboration of our entire sector, and we will continue to update with learnings and next steps.”

CrowdStrike Intelligence has confirmed receiving reports of malicious activity leveraging the outage event as a lure theme, with threat actors conducting the following activity:

  • Sending phishing emails posing as CrowdStrike support to customers
  • Impersonating CrowdStrike staff in phone calls
  • Posing as independent researchers, claiming to have evidence the technical issue is linked to a cyberattack and offering remediation insights
  • Selling scripts purporting to automate recovery from the content update issue

The company has released a list of domains identified on July 19, 2024, that impersonate CrowdStrike’s brand – with the proviso that some domains in the list are not currently serving malicious content and could be intended to amplify negative sentiment.

However, these sites may, the company warns, support future social-engineering operations.

The domains are:

crowdstrike.phpartners[.]org
crowdstrike0day[.]com
crowdstrikebluescreen[.]com
crowdstrike-bsod[.]com
crowdstrikeupdate[.]com
crowdstrikebsod[.]com
www.crowdstrike0day[.]com
www.fix-crowdstrike-bsod[.]com
crowdstrikeoutage[.]info
www.microsoftcrowdstrike[.]com
crowdstrikeodayl[.]com
crowdstrike[.]buzz
www.crowdstriketoken[.]com
www.crowdstrikefix[.]com
fix-crowdstrike-apocalypse[.]com
microsoftcrowdstrike[.]com
crowdstrikedoomsday[.]com
crowdstrikedown[.]com
whatiscrowdstrike[.]com
crowdstrike-helpdesk[.]com
crowdstrikefix[.]com
fix-crowdstrike-bsod[.]com
crowdstrikedown[.]site
crowdstuck[.]org
crowdfalcon-immed-update[.]com
crowdstriketoken[.]com
crowdstrikeclaim[.]com
crowdstrikeblueteam[.]com
crowdstrikefix[.]zip
crowdstrikereport[.]com
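
For illustration only, the Python sketch below shows how a defender might screen observed hostnames against this published list once re-fanged (the defensive [.] brackets restored to dots). The set shown is abbreviated and the matching helper is hypothetical, not a CrowdStrike tool.

# Abbreviated, re-fanged subset of the impersonation domains listed above.
PUBLISHED_IOCS = {
    "crowdstrikebluescreen.com",
    "crowdstrike-bsod.com",
    "crowdstrikefix.com",
    "microsoftcrowdstrike.com",
    # ... remaining entries from the list above
}

def is_known_impersonation(domain: str) -> bool:
    """Return True if the domain, or any parent domain, appears in the published list."""
    parts = domain.lower().rstrip(".").split(".")
    # Check e.g. www.crowdstrikefix.com and then crowdstrikefix.com
    return any(".".join(parts[i:]) in PUBLISHED_IOCS for i in range(len(parts) - 1))

if __name__ == "__main__":
    for seen in ("www.crowdstrikefix.com", "support.example.com"):
        print(seen, "->", "match" if is_known_impersonation(seen) else "no match")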

CrowdStrike Intelligence recommends that organizations ensure they are communicating with CrowdStrike representatives through official channels and that they adhere to the technical guidance CrowdStrike support teams have provided.

Scammers are attempting to use the outage to steal from SMEs with offers of fake fixes, the Australian government has warned.

Home Affairs Minister Clare O’Neil said government agencies were picking up attempts to conduct phishing off the back of the outage – with SMEs in particular receiving emails, purportedly from CrowdStrike or Microsoft, seeking bank details in order to access a supposed reboot that would fix the error.

O’Neil said: “I ask Australians to be really cautious over the next few days about attempts to use this for scamming or phishing. If you see an email, if you see a text message that looks a little bit funny, that indicates something about CrowdStrike or IT outages, just stop. Don’t put any details.”

She said if people receive calls along those lines they should hang up and if people do hand over their banking information then to contact their bank immediately to report it.

CrowdStrike has warned of an eCrime actor targeting LATAM-based customers with a malicious ZIP archive in the wake of the outage.

On its company blog, CrowdStrike wrote: “CrowdStrike Intelligence has since observed threat actors leveraging the event to distribute a malicious ZIP archive named crowdstrike-hotfix.zip. The ZIP archive contains a HijackLoader payload that, when executed, loads RemCos. Notably, Spanish filenames and instructions within the ZIP archive indicate this campaign is likely targeting Latin America-based (LATAM) CrowdStrike customers.”
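
For illustration only, the Python sketch below flags archives matching the reported lure filename and prints their SHA-256 so they can be compared against indicators published through CrowdStrike’s official channels. The scanned directory, helper name and output format are assumptions; the filename is the only detail taken from the report above.

import hashlib
from pathlib import Path

LURE_NAME = "crowdstrike-hotfix.zip"  # filename reported by CrowdStrike Intelligence

def find_lure_archives(root: Path) -> list[tuple[Path, str]]:
    """Return (path, sha256) for any file under root whose name matches the reported lure."""
    hits = []
    for path in root.rglob("*.zip"):
        if path.name.lower() == LURE_NAME:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hits.append((path, digest))
    return hits

if __name__ == "__main__":
    for path, digest in find_lure_archives(Path.home() / "Downloads"):
        print(f"Suspicious archive: {path} sha256={digest}")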

CrowdStrike recommends that: “Organizations ensure they are communicating with CrowdStrike representatives through official channels and adhere to technical guidance the CrowdStrike support teams have provided”.

George Kurtz, CrowdStrike founder and chief executive, previously warned of “bad actors” exploiting the outage event, saying: “We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives. Our blog and technical support will continue to be the official channels for the latest updates.”

CrowdStrike’s stock shows signs of rallying after a fall of nearly 15% over the outage.

CrowdStrike stock, which opened at $345 per share on Thursday (July 18), plunged to $294 per share by Friday morning.

From there, however, the price began slowly rebounding, trading at around $312 per share by late morning Friday – coinciding with the company’s announcement that the ‘issue’ causing the outage had been identified and isolated, with a fix deployed.

Prof Ciaran Martin, former chief executive of the UK National Cyber Security Centre, has urged governments and the tech sector to ‘get together’ on designing out system flaws to lessen the chances of another global outage on the scale of the CrowdStrike/Microsoft incident.

But much now depends on regulation decided in the US, he said, with the world having to ‘learn to cope’ with such outages in future.

Speaking to Sky News, Prof Martin said he was confident the worst of the CrowdStrike/Microsoft outage was over – with the focus now on getting systems back up again.

Prof Martin said: “The worst of this is over because the nature of the crisis was such that it went very badly wrong, very quickly. It was spotted quite quickly, and essentially, it was turned off.”

“Until governments and the industry get together and work out how to design out some of these flaws, I’m afraid we are likely to see more of these again.

“Within countries like the UK and elsewhere in Europe, you can try and build up that national resilience to cope with this. But ultimately, a lot of this is going to be determined in the US.

“If there’s going to be regulation to try and iron out these flaws, it’ll probably have to come from the US and there’s not a great deal that we can do about that.

“So unless and until the structure of the way we do tech changes, we’re going to have to learn to cope with these things, rather than eliminate them.”

DAY ONE

Mark Jow, Security Evangelist EMEA, Gigamon:

“This Microsoft IT outage demonstrates the need for more robust and resilient solutions so that when these issues do arise, they can be resolved quickly without causing such widespread customer chaos and security risk. Preparedness is key – every IT and security vendor must have a robust system in place across its software development lifecycle to test upgrades before they are rolled out to ensure that there are no security flaws within the updates.”

Alexey Lukatsky, Managing Director, Cybersecurity Business Consultant, Positive Technologies:

“This case reminds us of the importance of secure development, since it was most likely the lack of update checking – both on the side of the manufacturer, CrowdStrike, and on the side of consumers who automatically installed all the updates that reached them – that led to a massive global outage, with the exception of those countries that are not using infosec products from this American corporation.

In addition, this story shows us how firmly information technologies have become embedded in people’s lives and in various business processes, and how catastrophic the consequences of an accidental or unauthorized, malicious impact on the IT infrastructure can be. That is, in other words, businesses are faced with the task of assessing those non-tolerable events with catastrophic consequences that can occur in their activities due to the impact on the IT infrastructure.

And this is not the only case of a similar scale. There have already been incidents of this kind – for example, the McAfee antivirus update in 2010. A similar problem occurred with updates to the Windows operating system itself, as well as its Microsoft Defender protections, which left users unable to perform normal functions. This problem is therefore of a general nature: it is not connected with the country of origin of any particular software, and it simply raises once again the question of how far the impact of IT infrastructure on business can lead to certain non-tolerable events.

At the moment, the root cause – based on the scale of the disaster and the way the incident manifested itself – appears to be a failure to follow safe development practices. But there is one scenario that cannot be ruled out: it has not yet found any confirmation, but we, as experts in the field of cybersecurity, cannot completely dismiss it. This is the intrusion of attackers into the software development process at CrowdStrike, which could have led to the introduction of malicious functionality into the update – ultimately leading to this kind of massive failure.

Everyone remembers the story with SolarWinds, also an American company, which suffered from such an incident a couple of years ago when attackers penetrated the development process and introduced malicious functionality into an update that was rolled out to the computers of almost 20 thousand SolarWinds customers.

The only thing suggesting that these are unlikely to be the malicious actions of cybercriminals who have intruded into the development process is that, in stories of this type, the attackers’ task is usually to remain undetected for as long as possible, in order to penetrate the networks of companies in which the software products carrying malicious payloads are installed.

In this case, the update almost instantly led to computer inoperability, which is often not the goal of most APT-groups, whose task is not to disable systems, but to obtain either data that can then be sold, or blackmail the victim’s company, or perform some kind of other functions related to cyber espionage.”

Darren Anstee, Chief Technology Officer for Security, NETSCOUT:

“The worldwide IT outage currently affecting airlines, media, banks and much more appears to have been caused by a faulty software update which was automatically applied, and not a cyberattack. This is another demonstration of how dependent we are on both our IT infrastructure, and the supply chains that deliver tightly integrated capabilities within it.  

“There will undoubtedly be a huge fallout from this, with a lot of questions set to be raised around how to balance the need for regular security updates for defence, compliance etc, with the risk of applying unqualified updates to systems. Most enterprise software goes through testing and controlled roll-out before it is pushed to a whole population, but this doesn’t seem to be the case in this instance.”

This critical event serves as a wake-up call for businesses globally to reassess their IT infrastructure and the processes they have in place for software updates and security measures. 

Speaking on NBC’s Today Show in the US, CrowdStrike founder and CEO George Kurtz ruled out a cyberattack as being behind the catastrophic ongoing outages.

Kurtz attributed the outages to a bug in a single update.

 “We identified this very quickly and remediated the issue.”

Kurtz also said there had been a “negative interaction” between the update and Microsoft’s operating system, which had then caused computers to crash.

Asked how one faulty update could cause such global chaos, Kurtz said: “We have to go back and see what happened here, our systems are always looking for the latest attacks from adversaries that are out there.”

Although maintaining that the problem had been identified and a fix issued, Kurtz warned it could be some time before some systems return to normal – stressing that they would not “just automatically recover”.

Tim Grieveson, Senior Vice President and Global Cyber Risk Advisor, Bitsight:

“Today’s global outages are the result of a software update from CrowdStrike’s Falcon Identity Threat Protection. It may have been prevented by following Information Technology Infrastructure Library (ITIL) fundamentals including processes around change management, incident management, release management and regression testing. Additionally, a Software Bill of Materials (SBOM) is crucial to understand all the components, functional testing needs to be performed before release, and a code review must be undertaken to ensure all code operates as it should before deployment.

The widespread disruption proves the importance of the supply chain in delivering an organisation’s critical services, as well as the impact of not properly assessing supply chain risk or understanding a vendor’s value to your organisation. Today’s events highlight the importance of incident reporting and handling. Having processes in place for misconfiguration, assessment and roll back is vital, as is the proper testing of business resilience and business continuity scenarios to help prepare for such an incident. 

Crowdstrike must now consider its communication to restore reputation and maintain consumers’ confidence in services, software and capabilities being delivered by a third party.”

George Kurtz, CrowdStrike President, has said on X: “CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed.

“We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.”

Al Lakhani, CEO, IDEE, says the outages reinforce a need for the tech sector to prioritise agentless solutions.

“This incident underscores the importance of businesses thoroughly researching and vetting their cybersecurity solutions before implementation.

“(An) approach which relies on a single agent focused on detection might seem good at first glance, but as we can see, it can create significant issues. For instance, agents require installation and maintenance of software on multiple different OSes, adding layers of complexity and potential points of failure. Moreover, agents can become a single point of failure, as a bad update can compromise the entire network.

“The lesson here is blindingly obvious: investing in cybersecurity is not just about acquiring the latest or most popular tools but ensuring those tools are reliable and resilient.”

Omer Grossman, CIO, CyberArk, has responded to the outages, saying:

“The current event appears – even in July – likely to be one of the most significant cyber issues of 2024. The damage to business processes at the global level is dramatic. The glitch is due to a software update of CrowdStrike’s EDR product. This is a product that runs with high privileges and protects endpoints. A malfunction in this can, as we are seeing in the current incident, cause the operating system to crash.

There are two main issues on the agenda: The first is how customers get back online and regain continuity of business processes. It turns out that because the endpoints have crashed – the Blue Screen of Death – they cannot be updated remotely and this problem must be solved manually, endpoint by endpoint. This is expected to be a process that will take days.

The second is what caused the malfunction. The possibilities range from human error to the complex and intriguing scenario of a deep cyberattack, prepared ahead of time and involving an attacker activating a “doomsday command” or “kill switch”. CrowdStrike’s analysis and updates in the coming days will be of the utmost interest.”

Alois Reitbauer, Chief AI Strategist, Dynatrace:

“Given the increasing complexity of software, all software developers and organizations are susceptible to outages. When outages do occur, organizations need the capability to pinpoint root cause and remediate immediately. AI-driven approaches have become essential for complex IT operations to deploy, as manual processes cannot keep up. A power of 3 approach to AI – leveraging predictive, causal and generative AI – is increasingly critical to help organizations deliver the highest availability and performance of software, as well as minimize disruption to end user experience.”

John Atkinson, Director Solutions Engineering, EMEA, Riverbed:

“When technology works as it should, it’s the route to more efficient operations and a great digital experience for employees and customers. When it doesn’t, the effects can be painful for all. Today’s global IT outage has shown first-hand the consequences – from grounding planes and shaking up stock exchanges to retailers unable to take payments. It’s more important than ever that IT teams have an overview of all operations and can get ahead of any issues like this. 

“Operating with such complex IT estates, it’s now too difficult and stressful for IT teams to keep operations running smoothly without additional tools. With networks’ intense interconnectedness, teams need observability so they can better understand topology and see when a seemingly insignificant issue could lead to a full-blown outage. This, paired with AIOps, enables the automation of network monitoring and issue remediation, troubleshooting problems before IT teams are even aware of them – ensuring no drop in employee or customer digital experience.”

Keiron Holyome, VP UK & Emerging Markets, Blackberry Cybersecurity: 

“Given this outage is impacting some of the most critical systems, networks and applications in the world, the response must be met with speed, accuracy, and accountability. Here, a critical event management (CEM) solution can provide real-time visibility to ensure a quick and informed response as the crisis evolves. It is too early to say the exact root cause; however, this is likely another example of legacy cybersecurity practices in play, with complex EDR and heavy endpoint agents posing a major infrastructure risk and adding unnecessary complexity. Using a lightweight AI on the endpoint can avoid these types of outages, as it protects your environment without heavy agents and regular updates that put your operations at risk.”

“More broadly, today’s global IT outage serves as a stark reminder that the best defence is a good offence. Understanding your vulnerabilities and risks through regular testing is paramount, not only when deploying new software but consistently over time. To protect against potential threat actors who seek to take advantage of IT outages, a combination of AI-enabled internal and external penetration testing assessments remains vital. These reveal how an outside threat actor with authorised access, or one starting from within the internal network, could compromise assets through ever-evolving tactics, techniques and procedures. The performance and security of your systems is only as good as its least secure hardware and software components, so blind spots need to be addressed as a priority to keep companies operating as usual.”

Ilkka Turunen, Field CTO, Sonatype: “The widespread outages across the world affecting Microsoft Windows are due to a botched update to a piece of CrowdStrike software.

In terms of technical details, the update causes a BSOD loop on any Windows machine, essentially making it boot and crash in an infinite loop. Making it worse is the fact that the update was auto-installed overnight on a significant number of Windows machines. There are workarounds that their customers will apply, but it seems to be very manual.

It’s definitely a supply chain style incident – what it shows is that one popular vendor botching an update can have a huge impact on its customers and how far a single well-orchestrated update can spread in a single night. It’s not yet clear if the contents were due to malicious reasons – but it shows how quickly targeted attacks on popular vendors could spread.”

Mark Grindey, CEO, Zeus Cloud: “It’s clear that adequate testing for updates should be done in a safe environment before issuing them company-wide. Companies should never have auto-updates set in a live environment and always test an update in a safe environment before releasing it live to minimise potential risks. This global outage highlights the need for businesses to not blindly trust their suppliers when it comes to updates before testing first.

The only fix now is to reboot in safe mode and remove the erroneous file; unfortunately, this can’t be done remotely. It could so easily have been a security incident or cyber-attack and this manual intervention required to get back up and running opens the door for other potential security risks and vulnerabilities. The only course of action now is to manually and safely reboot the thousands of computers affected – a task that will undoubtedly be challenging and time-consuming.”

Maxine Holt, Senior Director, Cybersecurity, Omdia:

“The global IT outage crisis is escalating, and organizations everywhere are in full scramble mode, desperately implementing workarounds to keep their businesses afloat. Microsoft has pointed fingers at a third-party software update, while CrowdStrike admits to a “defect found in a single content update for Windows hosts” and is working feverishly with affected customers. Omdia analysts connect the dots: this isn’t a cyberattack, but it’s unquestionably a cybersecurity disaster.

Cybersecurity’s role is to protect and ensure uninterrupted business operations. Today (July 19th 2024) many organizations are failing to operate, proving that even non-malicious cybersecurity failures can bring businesses to their knees. The workaround, involving booting into safe mode, is a nightmare for cloud customers. Cloud-dependent businesses are facing severe disruptions.

Omdia’s Cloud and Data Center analysts have long warned about over-reliance on cloud services. Today’s outages will make enterprises rethink moving mission-critical applications off-premises. The ripple effect is massive, hitting CrowdStrike, Microsoft, AWS, Azure, Google and beyond. CrowdStrike’s shares have plummeted by more than 20% in unofficial pre-market trading in the US, translating to a staggering $16 billion loss in value.

Looking forward, there’s a shift towards consolidating security tools into integrated platforms. However, as one CISO starkly put it: “Consolidating with fewer vendors means that any issue has a huge operational impact. Businesses must demand rigorous testing and transparency from their vendors.”

CrowdStrike’s testing procedures will undoubtedly be scrutinized in the aftermath. For now, the outages continue to rise and the tech world watches as the fallout unfolds.”

Kevin Beaumont, cybersecurity researcher, posted on X that he has seen a copy of the CrowdStrike update that was issued and says the file isn’t properly formatted and “causes Windows to crash every time.”

In further posts, Beaumont says that it appears there isn’t an automated way to fix the issues, at least currently. This may mean that impacted machines need to be manually rebooted before they can come back online – a process that could take hours or days, depending on the impacted entity.

Brody Nisbet, Director of Overwatch, CrowdStrike, posted on X that the workaround fix the company had issued involves booting Windows machines into safe mode, finding a file called “C-00000291*.sys”, deleting it and then rebooting the machine normally. “There is a fix of sorts, so some devices in between BSODs should pick up the new channel file and remain stable,” Nisbet posted.

Widespread 911 outages across three US states are not being linked to the ongoing Windows outage.

Lumen Technologies, which provides services to Nebraska, Nevada and South Dakota, has confirmed that a disruption to its network caused a two-and-a-half-hour 911 outage on Wednesday (July 17), which has since been rectified.

Several cities in Texas also reported outages around the same time. But Lumen does not service Texas – leaving the cause of those outages unclear.

Police departments in the affected states reported 911 calls as down on cellular carriers and in some areas, on landlines – but many areas could still contact 911 by text.

The outages were apparently caused by the third-party installation of a light pole that disrupted the Lumen network.

Tesserent, cyber solutions by Thales Australia, has just issued this update:

CrowdStrike have deployed a new content update which resolves the previously erroneous update and subsequent host issues. As your devices receive this update you may need to reboot for the changes to take effect and for the blue screen (BSOD) issues to be resolved.

Tesserent SOC says it will continue monitoring the situation and relay any pertinent updates.

Microsoft’s Service Health Status updates have identified the preliminary root cause of the outage as a “configuration change in a portion of our Azure backend workloads (that has) caused interruption between storage and compute resources, and which (has) resulted in connectivity failures.” This has affected “downstream (and dependent)” Microsoft 365 services.

In an X thread, Microsoft said it is “investigating an issue impacting users’ ability to access various Microsoft 365 apps and services”.

“We’re working on rerouting impacted traffic to alternate systems to alleviate impact in a more expedient fashion. We’re still observing a positive trend in service availability while we continue to redirect impacted traffic. We still expect users will continue to see gradual relief as we continue to mitigate the issue.”

Services were seeing “continuous improvements while we continue to take mitigation actions.”  

CrowdStrike Engineering has already identified a content deployment related to the outage and reverted those changes.

The company posted a resolution update for affected Windows users:

    1. Boot Windows into Safe Mode or the Windows Recovery Environment

    2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory

    3. Locate the file matching C-00000291*.sys and delete it (see the sketch after this list)

    4. Boot the host normally.
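
As a purely illustrative aid, the Python sketch below mirrors steps 2 and 3 above, locating and removing the impacted channel file. It defaults to a dry run; steps 1 and 4 (booting into Safe Mode or the Recovery Environment and rebooting) remain manual, and on real systems CrowdStrike’s official guidance should be followed rather than this sketch.

from pathlib import Path

# Directory and filename pattern taken from the remediation steps above.
CHANNEL_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")

def remove_impacted_channel_file(directory: Path = CHANNEL_DIR, dry_run: bool = True) -> list[Path]:
    """Delete files matching C-00000291*.sys; with dry_run=True, only report what would be removed."""
    matches = list(directory.glob("C-00000291*.sys"))
    for path in matches:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            path.unlink()
            print(f"Deleted: {path}")
    return matches

if __name__ == "__main__":
    remove_impacted_channel_file(dry_run=True)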

Tesserent, cyber solutions by Thales, is reporting that CrowdStrike has confirmed the outages as a Falcon sensor issue.

Though Tesserent acknowledges the limited information available, it says its Security Operations Centre is continuing to monitor the situation and provide updates to managed services clients, including resolution plans once these become available.

Tesserent said: “Currently, our Security Operation Centre have our engineering teams testing rollback as a potential solution and have a pilot underway. If this resolution has been confirmed we will work with our clients to rollout this fix.”
