The last weekend in May was a complete PR and logistics nightmare for British Airways. Hardware damage, caused by a power surge in one of the company’s datacenters, led to a widespread IT outage.
Lesson #1: Always have a backup plan
Businesses must overcome risks in order to become successful, and these risks tend to vary from one sector to another. Thus, the first thing a company must do is recognize all the potential risks and scenarios it needs to protect itself against. This is where a lot of organizations go wrong. Why? Because they remain inactive.
Ever heard the saying “failing to plan is planning to fail”? Well, nothing’s more appropriate for businesses in this situation. The truth is, what happened to British Airways should serve as a wake-up call to other companies, prompting them to make provisions for similarly disruptive occurrences.
This is why a business-continuity plan is so integral to the success of a company. If your organization has yet to create one, there’s no time like the present. It could be that you’ve no idea where to begin, or you simply do not have the time or experience to plan for future situations – in that case, seek professional help, though certainly not from Saul Silver of the movie “Pineapple Express.” He is neither sharp nor reliable!
Consider all the various risks and scenarios facing your business. However, instead of getting too specific, think about their impact on the company. The trick is to ensure your plans cover the general issues, and only go into the nitty-gritty when required.
Lesson #2: Perfect your business continuity plan
You already have a contingency plan in place? Terrific! But what often works in theory doesn’t translate well into practice. And that is exactly what happened with British Airways. According to a spokesperson for the company, they already had backup systems but they did not come online when required.
Avoid something like this at all costs – and the only way to do so is to test your backup plans regularly. This will confirm whether systems, procedures, and processes will work as expected in a crisis. There are different types of tests, but some of the testing needs to include an actual transfer of processing to the backup systems. It is also critical to use tools such as desktop exercises, in which the team talks through its response to a simulated incident.
Lesson #3: Schedule frequent disaster recovery tests
All it takes is one mistake. Your organization might be impenetrable, but even a single point of failure can cause the whole system to collapse. The scary part is, that single point can be anything – it could be part of the infrastructure or systems, or one of the vendors or people. This is why it’s crucial to have a backup plan in case those key resources fail.
The same thing happened in the case of British Airways. There was only a single point of failure somewhere within their system, which caused the backup systems to falter. And this lone mistake cost them dearly.
You need to understand that nobody expects you to have a physical duplicate computer setup in a different datacenter. Your systems, however, should be in a position to respond quickly and effectively whenever a key infrastructure component or system is lost. You should aspire to have a contingency plan so seamless that your customers won’t even know an incident occurred, because the backup kicked in and plugged the gap as soon as it detected a problem.
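To make the idea of a backup that kicks in on its own more concrete, here is a minimal Python sketch of a failover check. It is only an illustration under stated assumptions: the system names, the health states, and the pick_active helper are all hypothetical and say nothing about how British Airways' systems actually work.

```python
from typing import Callable, Optional, Sequence


def pick_active(systems: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first system that passes its health check, or None."""
    for name in systems:
        if is_healthy(name):
            return name
    return None


# Hypothetical health state: the primary has lost power, the backup is fine.
status = {"primary-dc": False, "backup-dc": True}

# Systems are listed in order of preference, so the backup is used
# only when the primary fails its check.
active = pick_active(["primary-dc", "backup-dc"], lambda name: status[name])
print(active)
```

In a real setup the health check would be a probe against the live systems, and a result of None (nothing healthy) is the case your incident response plan has to cover.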
Most passengers who got stuck because of British Airways’ goof-up shared one common complaint – nobody could tell them what exactly went wrong.
Every company’s business-continuity plan must have an incident response plan. This ensures that those involved at least understand what went wrong, and what needs to be done so that the business can resume normal operations quickly. This could mean the formation of a dedicated incident response team capable of overseeing and managing the process of returning a company to normal operations.
You cannot fix the problem unless you know what it is, right?
One of the most basic aspects of an incident-response plan is communication. This ensures that every stakeholder, from investors to customers, is given timely updates and the reassurance that the organization is doing everything it can to regain control of the incident.
Lesson #4: Proper focus on resiliency will cost you less than the alternatives
Management is often blamed for an organization’s poor resiliency. British Airways CEO Alex Cruz, for example, faced flak from customers for his handling of the situation. What’s worse, even the board members questioned his and his team’s capability.
Nobody remains with an organization forever; it’s natural for people to leave. However, it is vital that key people do not leave the company during a critical period without someone ready to step in. This is why organizations must have a solid succession plan in place. Such plans, which cover senior members of the company or public department and those identified as crucial to business operations, improve the company’s resiliency.
Lesson #5: Active data replication
The longer an IT outage lasts, the greater its impact. Years ago, 15 minutes of IT service loss wasn’t that big of a deal; but now, in the world of Big Data, every minute counts. An outage that runs too long could topple even mighty organizations such as British Airways. This is the reason companies should consider active data replication for syncing primary and secondary datacenters. Once enabled, it should make an IT outage all but undetectable to users.
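As a rough illustration of what active replication means, the Python sketch below keeps two in-memory copies of every record and reads from the replica when the primary is marked as down. Everything in it is an assumption made for the example: the ReplicatedStore class, the booking key, and the use of plain dictionaries in place of real datacenters.

```python
class ReplicatedStore:
    """Toy key-value store that writes every record to two sites."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.primary_up = True

    def write(self, key, value):
        # Synchronous replication: both sites receive the record
        # before the write is treated as complete.
        self.primary[key] = value
        self.secondary[key] = value

    def read(self, key):
        # Serve from the primary when it is up, otherwise from the replica.
        site = self.primary if self.primary_up else self.secondary
        return site[key]


store = ReplicatedStore()
store.write("booking-123", "LHR to JFK")  # hypothetical record

store.primary_up = False  # simulate losing the primary datacenter
print(store.read("booking-123"))
```

Because the secondary already holds the same data, the read still succeeds after the simulated failure. Real replication also has to deal with network delay and with writes that arrive while one site is down, which this sketch ignores.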
If you have seen “The Social Network,” you can see how worried Mark Zuckerberg was when his business partner, Eduardo Saverin, threatened to cut financing so they could not pay their IT or server bills.
Lesson #6: Implementation of artificial intelligence
AI and machine learning are becoming commonplace within enterprises. Organizations are developing and using AI-based solutions more and more often to better understand and manage their technology.
But how does this benefit IT companies? Well, within an IT environment, AI-powered, automated tools are capable of looking at every relevant field and highlighting the issues that require immediate attention. For example, a smart monitoring tool implemented in the IT framework of British Airways might have flagged the issues before they escalated. Users would also be pointed toward the correct course of action. Thus, artificial intelligence can help achieve results with far less effort and keep the business safe instead of wasting thousands of employee hours.
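To give a feel for what such a monitoring tool does, here is a small Python sketch that flags readings far outside the recent norm. It uses a simple statistical rule rather than real machine learning, and the flag_anomalies function, the threshold, and the sample voltage readings are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, threshold=3.0, window=5):
    """Return indexes of readings far from the average of the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        spread = stdev(recent)
        # A perfectly flat window has zero spread; skip it to avoid dividing by zero.
        if spread > 0 and abs(readings[i] - mean(recent)) / spread > threshold:
            flagged.append(i)
    return flagged


# Hypothetical supply-voltage readings with one sudden surge at index 6.
voltages = [230, 231, 229, 230, 232, 231, 290, 230]
print(flag_anomalies(voltages))
```

Only the surge is flagged, since every other reading sits close to the average of the five readings before it. A production tool would watch many signals at once and learn what normal looks like, but the principle of comparing new data against recent behavior is the same.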
The snafu at British Airways exposed many of the IT threats that companies across the world might be facing due to some negligence or oversight on their part. However, it’s not too late – you can still use what you learned from the incident to rectify any IT problems you may have in your own organization. This will help you keep operating smoothly and efficiently, with as few disruptions as possible.
Photo credit: British Airways