Meltdown in an over-networked world
The incident occurred on Friday, July 19, 2024. It affected thousands of flights and millions of people across dozens of nations on several continents. It has led to total costs of over $5 billion. “It” was an IT outage.
This disruption, one of the biggest ever, was not the result of a pernicious virus, malicious hackers employed by an underground criminal cartel, or even dubious governments with unscrupulous agendas. Ironically, the cause of this global IT chaos was a faulty security update as part of a cybersecurity protocol, a development that caught many by surprise.
CrowdStrike, the company that develops and deploys that protocol, advertises its talents in the corporate claim “Stop breaches. Drive business.” Because of its reputation for robustness, CrowdStrike’s security software is widely used on corporate computers running Microsoft Windows worldwide, and those were the machines the faulty update disabled.
IT services such as Microsoft programs and the cloud are network business models. While the CrowdStrike disruption affected several other businesses, a focus on one example epitomizes the problem: the air traffic control system. Airports and airlines are also network business models. A mistake in one network exacerbates disruptions in the other, showcasing the intricate and interdependent nature of the IT ecosystem and massively affecting millions of enterprises and people. This well-intentioned blunder caused damage on par with cyberwarfare.
Connectivity through the digital and tangible
Network business models, or network industries in general, are, at heart, about connectivity. They link people and companies, providing a platform for interaction. Meta’s social media platforms are a typical example: The Facebook and Instagram apps allow people to instantly share moments of their lives, opinions or moods. The same programs also enable advertisers to reach very specific customer demographics. Facebook and Instagram connect both people and advertisers with other people. Connectivity is the core – it’s not just a feature.
Similarly, Microsoft’s programs and cloud services enable users to share texts, calculations and presentations. If we all use the same programs, we can exchange files easily. Again, connectivity is the core of the product. More importantly, Microsoft’s software also connects infrastructures, facilitating the automated exchange of information between them and therefore their coordination.
With the ascent of network or platform business models, we became conditioned to imagine networked industries in the digital world. Of course, Amazon, Uber, Airbnb, X, Facebook and Microsoft are examples of successful digital networks. However, not all networked industries use digital infrastructure.
The highway system, telephone lines, water supply and distribution, and the air traffic control system are examples of networks. Their business, too, is to connect people. They are as extensive as digital networks and show the same degree of interconnectedness. An important aspect of infrastructure is that it is, as such, a network. Infrastructure also relies on other networks, such as IT services, to function. Together, they create a compound network.
Network effects, concentration and damages
The simple usefulness of connectivity is why we seek it. The more connectivity is created, the more networks benefit people. A highway system with 300 roads is better than one with a dozen. An airline connecting five continents is better than one with domestic flights only. A computer system connecting all airlines and airports is better than a proprietary program running in some regional airports.
“Better” here means more beneficial. Benefits are conceived in terms of what people primarily want to do with the network, that is, drive or fly to their final destination or keep services running smoothly.
Network vulnerability
Most networks seem to increase the benefits they provide by increasing their size or the number of users. This is what economists refer to as the “network effect.” But many economists fear that networks lead to market concentration due to their inherent expansionist character. That is, they tend to expand until their span is as wide as the entire market.
This is a very plausible explanation for what happened in July 2024. Microsoft’s programs and, with them, the CrowdStrike defense protocol, had expanded to reach virtually all airports. This is where network vulnerability strikes. The other side of the coin regarding network effects is network disruptions. As the benefits of the network increase with its size, so do the potential damages in cases of outage.
Millions of people experienced firsthand the compound effect of a disruption in one network, unsettling another network and rippling through those and subsequent networks’ branches. The integration of IT and infrastructure is in itself a form of market concentration. Two network industries layered upon each other in concentrated markets turned a relatively small coding error into a worldwide tribulation.
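The claim that benefits and potential damages grow together can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that a network's usefulness is proportional to the number of distinct pairs of nodes it can connect. That assumption, the function name pairwise_links and the airport counts are hypothetical and not taken from the analysis above.

```python
def pairwise_links(nodes: int) -> int:
    """Number of distinct pairs that a network of `nodes` members can connect."""
    return nodes * (nodes - 1) // 2


# Illustrative assumption: each possible connection is one unit of benefit,
# and a network-wide outage puts every one of those units at stake.
for airports in (10, 100, 1000):
    at_stake = pairwise_links(airports)
    print(f"{airports:>5} airports: {at_stake:>7} connections gained, and lost in a full outage")
```

Under this assumption, a tenfold increase in size multiplies both the benefit and the exposure roughly a hundredfold, which is the two-sided nature of the network effect described above.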
Derisking, decentralization and defense
As always, commentators, academics and politicians called for more regulation after the July outage. European Union officials immediately started talking about derisking. The dangers of network industries were condemned worldwide. What these populists and their knee-jerk reactions miss, however, is what their regulations and derisking would actually consist of. First, air traffic will continue to be a network industry because that is the core of that business. Second, IT will also maintain its character as a connecting platform because that is the source of its usefulness. And third, the compound network of infrastructure and IT will not change, simply because they were designed to be compounded.
However, there are ways of curbing risks in networks. Regulators want either to increase their grip over networks or to break them up, whereas promising ways to increase networks’ robustness and resilience take decentralization to the next level. Ironically, regulators themselves have deterred the development and implementation of decentralized systems. Here are just two examples: decentralized network management and decentralized programming.
Decentralized network management is inspired by blockchain: it creates nodes that each validate the system. When a majority of nodes are in consensus, a decision is taken. This system’s advantage is that it continues to work even if some nodes are disrupted. The disadvantages are a slower decision-making process and increased costs.
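The majority rule described above can be sketched in a few lines. This is a minimal illustration, not a real consensus protocol: the function name majority_decision, the vote values and the convention that None marks a disrupted node are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_decision(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Return the value backed by a strict majority of all nodes, else None.

    A vote of None stands for a node that is down or unreachable.
    """
    total_nodes = len(votes)
    live_votes = [v for v in votes if v is not None]
    if not live_votes:
        return None
    value, count = Counter(live_votes).most_common(1)[0]
    # Require more than half of ALL nodes, not just of those that answered.
    return value if count > total_nodes / 2 else None


# Five hypothetical nodes; two are disrupted, three still agree.
print(majority_decision(["approve", "approve", None, "approve", None]))  # approve
# Three of five are disrupted: no majority, so no decision is taken.
print(majority_decision(["approve", None, None, "approve", None]))  # None
```

The first call shows the advantage named above: the system still decides with two nodes down. The second shows the limit: once a majority of nodes is disrupted, it stops deciding rather than deciding wrongly.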
Similarly, decentralized programming uses nodes but requires them to use different algorithms while maintaining their functional equivalence. In other words, the programs are supposed to do the same thing and lead to the same outcome, but their specific code differs. Its advantage is that errors in one program do not disrupt other nodes. Its disadvantages are redundancy and the demands on computational capacity.
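The idea resembles what software engineering calls N-version programming. The sketch below is a hedged illustration under simple assumptions: three hypothetical implementations of the same task (summing the integers from 1 to n), one of which carries a deliberate bug standing in for a faulty update, and a voter that accepts the result most versions agree on. All function names are invented for the example.

```python
from collections import Counter
from typing import Callable, List


def sum_loop(n: int) -> int:
    """Version A: add the numbers one by one."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    """Version B: closed-form formula, a different algorithm for the same task."""
    return n * (n + 1) // 2


def sum_faulty(n: int) -> int:
    """Version C: deliberate off-by-one bug standing in for a faulty update."""
    return sum(range(1, n))


def run_versions(versions: List[Callable[[int], int]], n: int) -> int:
    """Run every version, tolerate crashes, and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(n))
        except Exception:
            # A crashing version loses its vote instead of stopping the system.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    if count <= len(versions) / 2:
        raise RuntimeError("no majority among versions")
    return value


print(run_versions([sum_loop, sum_formula, sum_faulty], 10))  # 55
```

The faulty version is simply outvoted, which illustrates the advantage named above. The cost is also visible: the same work is done three times.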
While these concepts are still nascent, they show how embracing networks and their benefits while decentralizing at the same time might be the best defense against disruptions. They also showcase how much development is impeded by populist regulators – a network of their own. Concepts of decentralization are associated with investment losses. However, decentralization pays off through lower costs for security and for disruptions, as well as lower management costs thanks to better network management. Crucially, decentralization also increases security.
Scenarios
What may follow CrowdStrike’s massive IT outage can be summarized in three scenarios. Keeping in mind that the disruption – let’s call it a security event – was caused by a relatively small error and not by a malicious cyberattack or anything similar, these scenarios focus on how the world at large is likely to approach networks and connectivity going forward.
Most likely: Continued use of concentrated networks
The most likely scenario is a continuation of the status quo. This means that the concentration of IT services, both generally and for managing infrastructures specifically, will continue. It also involves the increased connectivity of air traffic and the increased grip of regulators. The outcome of this likely scenario is the increased magnitude, and potentially frequency, of disruptive incidents.
In this scenario, the core issue of the disruptive security event is not addressed. Therefore, it can and will most likely happen again. However, due to the momentum of network extension, the concentration of interconnectedness and the lack of competition between regulatory regimes, when future events strike, they will ripple through still more extensive networks. Homogeneous regulation leads to homogeneous protocols, programming and deployment, easing and accelerating the spread of disruption through the network.
Possible: Decentralization of networks
This scenario is predicated on the willingness of both regulators and market participants (networks and their operators) to embrace decentralization, and on regulators allowing them to do so. Decentralized network management and decentralized programming are just two examples of how this can be done.
There are several more tools available to support decentralization, such as introducing security layers and quantum computing. The initial costs of these mechanisms will fall as adoption increases and as their deployment in expanding networks becomes routine. This scenario addresses the underlying cause of the security event without jeopardizing the benefits of ever-expanding networks. It also decreases both the likelihood and magnitude of security events.
Unlikely: Network disconnect
In this case, regulators will increase their grip over the basic infrastructure – in our example, air traffic control – and force industry participants to use different IT services, leading to network fragmentation and effectively disconnecting its parts. In a world with fragmented networks, the benefits of the network industry are smaller, and there is a higher cost associated with switching between the different networks. In this scenario, the underlying problem is not addressed either. The probability of the security event occurring remains the same, but disconnection curbs both its magnitude and the potential benefits from more extensive networks.
This report was originally published here: https://www.gisreportsonline.com/r/crowdstrike-networks/