“Largest Outage in History” Caused by CrowdStrike Might Take Weeks to Resolve

by Manish Raj Malik - July 20, 2024, 5:05 am

A catastrophic IT failure, described as the “largest outage in history,” wreaked havoc worldwide on Friday, disrupting airports, healthcare services, and businesses. The disruption resulted from a botched software update, issued by US cybersecurity company CrowdStrike, that affected machines running Microsoft’s Windows operating system. Flights were grounded, hospital appointments cancelled, payroll systems seized up, and TV channels went off air as users encountered the infamous “blue screen of death.” Experts suggest that every affected PC may require a manual fix, further complicating recovery efforts.

In response, the UK government’s crisis committee, Cobra, coordinated efforts to tackle the fallout, with ministers engaging with affected sectors. Transport Secretary Louise Haigh reported working “at pace with industry” to address disruptions affecting trains and flights. Microsoft acknowledged the issue, attributed it to an update to a third-party software platform, and said a resolution was forthcoming. CrowdStrike confirmed that the problem originated from one of its product updates and was not a cyber-attack. Founder and CEO George Kurtz expressed deep regret for the impact, citing a “negative interaction” between the update and Windows. CrowdStrike’s stock price fell by as much as 13% during the day.

The outage caused significant disruptions across various industries. Govia Thameslink Railway (GTR) warned passengers to expect delays. Major UK services, including Visa, BT, supermarkets, banks, online gaming platforms, and media outlets, reported issues. Sky News and CBBC channels temporarily went off air in the UK, while Australia’s ABC was also affected. Metro Bank and Santander experienced problems with phone lines and card payments. Monzo reported customer issues, and some JP Morgan bankers were unable to log in, while the London Stock Exchange faced news service disruptions.

The healthcare sector was notably impacted, with UK GP practices reporting an inability to access patient records or book appointments. Some surgeries were unable to use the EMIS Web system. The Royal Surrey NHS Trust declared a critical incident, cancelling radiotherapy appointments. The National Pharmacy Association confirmed potential service disruptions. The Israeli health ministry reported the outage affected 16 hospitals, and the Schleswig-Holstein University Hospital in Germany cancelled all planned operations.

The aviation industry faced widespread chaos. Ryanair advised passengers to arrive at airports three hours early due to potential disruptions. Heathrow Airport worked to minimize passenger impact and advised travellers to check with airlines. US flights were grounded, affecting American Airlines, Delta, and United Airlines. Berlin Airport temporarily halted all flights. Aviation analytics company Cirium reported 5,078 flight cancellations globally on Friday.

Cybersecurity experts emphasized the unprecedented scale of the IT failure. Troy Hunt called it the largest IT outage in history, comparing it to what had been feared from the Y2K bug, which in the event passed largely without incident. Adam Leon Smith of BCS indicated that recovery could take days or weeks, depending on the severity of problems such as blue screens and endless loops. Alan Woodward from the University of Surrey noted the challenge of manually rebooting affected machines, especially for organizations with large numbers of PCs.

Woodward explained that the outage was caused by CrowdStrike Falcon, a product that monitors large PC networks. Steven Murdoch from University College London highlighted the difficulty of implementing fixes remotely. However, Ciaran Martin, former chief executive of the National Cyber Security Centre, suggested the situation would stabilize quickly because the solution had already been identified.

While the immediate impact of the outage is severe, experts believe that recovery efforts are underway and that the most critical disruptions should be resolved within a week. The incident underscores the importance of robust IT management and contingency planning to mitigate future risks.