Millions of Americans woke up to the news of a global IT crisis impacting Windows PCs, causing widespread disruptions in banks, airports, and other critical sectors. The crisis has raised concerns about whether it is safe to turn on affected systems. It is being described as a major blackout of Windows machines, leading to significant operational challenges and potential security risks.
CrowdStrike issued a flawed update, causing millions of Windows PCs worldwide to become inoperable and display the Blue Screen of Death. Although the fault has been identified and a fix released, each affected PC will need to be manually repaired, a potentially slow process given the large number of computers and remote workers.
The CrowdStrike software failure that caused Windows computers to show blue screen errors began around July 19, 2024, according to multiple sources. CrowdStrike has said in a promotional video that it is used by over half of “Fortune 500” companies.
A “Modern” Problem That Is Worsening
The issue appears to be tied to an errant update provided by CrowdStrike. I’ve seen this before: Software Development in the “Agile” age. Folks, it’s time to get used to it. Nowadays, inexperienced programmers write your software without knowing how it really works and are guided by an even less knowledgeable management team.
Yes! Of Course! Software development is another industry affected by DEI.
Key Takeaways
- A global IT crisis is currently affecting Windows PCs running CrowdStrike anti-virus software, causing major disruptions.
- Some 300 of the Fortune 500 companies use CrowdStrike.
- Critical sectors such as banks and airports are significantly impacted.
- There is widespread concern about the safety of turning on Windows PCs during this crisis.
- The issue is being referred to as a major blackout of Windows operating systems.
- Users are advised to stay informed and follow official guidance on using their systems safely.
The issue lies not with Windows itself but with Windows computers running CrowdStrike’s software. Home users and PCs running macOS, Linux, or Chrome OS are unaffected. Most users should allow automatic updates.
If you have a company-supplied Windows PC, check with your IT department before turning on the device to avoid potential issues. A fix has been released, but it must be applied manually to each machine, and it might take time for companies to validate and apply it.
This incident highlights our reliance on Big Tech, as affected systems disrupt access to critical services. Businesses that leave PCs running overnight for updates may find those systems already affected by the flaw when staff arrive at work.
What Should Businesses Do?
Businesses can implement several strategies to protect themselves from vendor errors, such as problematic software updates. Here are some key measures:
- Regular Backups:
- Ensure that all critical data is regularly backed up. This includes using both on-site and off-site backups to recover data in case of severe issues.
- Testing in Sandbox Environments:
- Before deploying updates or changes to the entire enterprise, test them in a sandbox or staging environment. A sandbox is an isolated testing space that mimics production settings, so new software versions can be tested and potential issues identified without affecting the live system or sensitive data.
- Patch Management Policies:
- Develop and follow strict patch management policies. A simple policy is to schedule software updates regularly and test them in a controlled environment before deploying them to production systems. This allows critical patches to be applied promptly while minimizing operational disruptions.
- Monitor Vendor Updates:
- Stay informed about vendor updates. Subscribe to their newsletters, security updates, and bulletins to be aware of upcoming changes and potential risks.
- Network Segmentation:
- Use network segmentation to isolate systems. Network segmentation is the practice of dividing a larger network into smaller, firewalled subnetworks and controlling the traffic that flows between them. This limits the spread of threats and other issues and makes it easier to manage and correct a specific segment without affecting the entire network.
- Vendor SLAs and Contracts:
- Negotiate strong Service Level Agreements (SLAs) with vendors that clearly define their responsibilities during disruptions. Ensure there are clear terms related to support, response times, and remediation.
- Incident Response Plan:
- Have a robust incident response plan in place. This should include clear steps for identifying, responding to, and mitigating the effects of vendor errors. The plan should define roles, procedures, and communication strategies so that the response is swift and effective, damage is minimized, and recovery is quick.
- Multi-Vendor Strategy:
- When appropriate, avoid relying on a single vendor for critical services. Use a multi-vendor strategy to ensure that if one vendor fails, another can take over. This would not have helped with the CrowdStrike failure.
- Employee Training:
- Train employees on best practices and policies for software updates and patch management. Ensuring that IT staff and end users follow proper procedures is crucial for minimizing risks.
- Third-Party Evaluations:
- Employ security audits and evaluations from third parties to assess the reliability and security of vendor products and updates.
- Controlled Rollout:
- Implement updates in controlled phases rather than all at once. Start with non-critical systems, monitor the impact, and then proceed with a wider rollout if no issues are found.
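The controlled rollout idea in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CrowdStrike or any particular patch management product works: the `Ring` class, the `staged_rollout` function, the `apply_update` and `is_healthy` callbacks, and all host names are hypothetical. In a real environment the callbacks would call whatever deployment and monitoring tools the business already uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Ring:
    """A hypothetical deployment ring: a named group of hosts."""
    name: str
    hosts: list[str]


def staged_rollout(
    rings: Iterable[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> list[str]:
    """Apply an update ring by ring and stop at the first unhealthy ring.

    Returns the names of the rings that were updated and passed the check.
    """
    completed = []
    for ring in rings:
        # Update every host in the current ring only.
        for host in ring.hosts:
            apply_update(host)

        # Check the ring before any later ring is touched.
        failures = [h for h in ring.hosts if not is_healthy(h)]
        failure_rate = len(failures) / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            break  # halt the rollout; later rings keep the old version
        completed.append(ring.name)
    return completed


# Example with made-up hosts: least critical systems go first.
rings = [
    Ring("test-lab", ["lab-01", "lab-02"]),
    Ring("non-critical", ["kiosk-01", "kiosk-02", "kiosk-03"]),
    Ring("critical", ["teller-01", "gate-01"]),
]
updated = set()         # stands in for "hosts that received the update"
broken = {"kiosk-02"}   # pretend this host fails its health check

result = staged_rollout(rings, updated.add, lambda host: host not in broken)
print(result)  # ['test-lab']
```

In this example the bad update reaches the test lab and the kiosks, but the rollout stops before the critical ring, so the teller and gate machines are never touched. The default `max_failure_rate` of 0.0 is deliberately strict; a business might choose to tolerate a small failure rate in larger rings.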
By putting these measures in place, businesses can significantly reduce the impact of vendor errors and maintain system stability and security.