Building an Effective Recovery and Continuity Solution - Leaning Towards Availability and Integrity
It is easy to confuse Disaster Recovery (DR), Business Continuity Planning (BCP, sometimes just BC), and related things like high availability and backups. In this article I will describe each component, the differences between them, and what your organisation needs to build to be fully protected.
For more than ten years I have worked in large corporate environments, overseeing and designing disaster recovery and business continuity solutions. Before virtualization became mainstream I was working with IBM and with some large corporate companies that knew that virtualization was the future. They were not brave enough to use VM technology for live infrastructure, so it found its place in the DR centre as replica machines. Using manual processes, an organisation could restore data and configurations into the VMs for Disaster Recovery purposes. Disaster Recovery of this kind covers controls like antivirus, backups, disk mirrors, data mirrors, UPS, fire prevention and other similar technical controls.
Many companies figured out that the best approach was to configure a DR machine similar to their live solution and then restore the backup and configuration data to get it as close as they could to the live solution.
So What is the Difference Between DR and BCP?
The logical way to look at this is simple: DR (Disaster Recovery) is a short-term, reactive recovery strategy in the event of a disaster. Disasters come in many forms; even a mail server going down can be classed as a disaster. Disaster Recovery is therefore a strategy for the first 24 to 72 hours. It is part of the larger process that we call BCP, or Business Continuity Planning, and its aim is to reduce loss in the event of a disaster. Disaster Recovery typically gets invoked right after a major disruptive event, like 9/11 or the earthquake in Haiti.
Business Continuity is more long-term and proactive. BC (or Business Continuity Planning, BCP) is about keeping the business running whilst anomalies are attempting to affect it. BC focuses more on long-term planning and involves things like replication to other sites and resilience to avoid disaster; it’s like a Disaster Avoidance Plan. It is what is required to stay in business, and it takes the broader approach of including Disaster Recovery as one of its elements. The whole process requires a lot more analysis and focuses on crisis management. DR is part of risk management and includes a risk assessment and risk calculation that help in understanding the organisation’s risk profile in order to ascertain the spend on the risk mitigation strategy. One of the analyses that needs to be performed as part of BCP is the BIA (Business Impact Analysis). It deals with the broader business and not only IT, involves understanding the MTPD (Maximum Tolerable Period of Disruption), and goes into more detail when it comes to threat analysis.
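To make the MTPD idea concrete, here is a minimal sketch of one check a BIA might include: comparing how long each business function is expected to take to recover against how long the business can tolerate it being down. The function names, the hour values and the field names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    mtpd_hours: float          # Maximum Tolerable Period of Disruption
    est_recovery_hours: float  # how long recovery is currently expected to take


def recovery_gaps(functions):
    """Return the functions whose expected recovery time exceeds their MTPD,
    ordered with the tightest MTPD first."""
    gaps = [f for f in functions if f.est_recovery_hours > f.mtpd_hours]
    return sorted(gaps, key=lambda f: f.mtpd_hours)


# Hypothetical example data, not taken from any real BIA.
functions = [
    BusinessFunction("email", mtpd_hours=24, est_recovery_hours=48),
    BusinessFunction("payroll", mtpd_hours=72, est_recovery_hours=24),
    BusinessFunction("order processing", mtpd_hours=4, est_recovery_hours=12),
]

for f in recovery_gaps(functions):
    print(f"{f.name}: recovery {f.est_recovery_hours}h exceeds MTPD {f.mtpd_hours}h")
```

Any function that shows up in this list is a gap the continuity plan needs to close, either by speeding up recovery or by adding resilience so the function does not go down in the first place.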
A good example of when Business Continuity needs to be looked at is when longer-term events like the Swine Flu pandemic emerge. Strategies then need to be implemented that keep the business running over a longer term, in contrast to a sudden event like an earthquake.
This is more in line with the questions: how often do these threats occur, and what is the likelihood that this would happen to us? If it is going to happen, how can we better prepare for it, and how much should we spend to help us lessen the blow? Imagine how useful this would have been to the legal firms that were based in the Twin Towers, some of which were only backing up using line-of-sight technology from one tower to the other.
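One common way to put numbers on these questions is the annualised loss expectancy: the expected cost of a single event multiplied by how often it is expected per year. The sketch below is a simplified illustration of that calculation, not a full risk model, and every figure in it is made up for the example.

```python
def annualised_loss_expectancy(single_loss, occurrences_per_year):
    """Expected yearly loss: cost of one event times its yearly frequency."""
    return single_loss * occurrences_per_year


def mitigation_worthwhile(ale_before, ale_after, annual_cost):
    """A mitigation pays for itself if the yearly loss it removes
    is larger than what it costs per year."""
    return (ale_before - ale_after) > annual_cost


# Hypothetical figures: an outage costing 200,000 expected once every ten years.
ale_before = annualised_loss_expectancy(200_000, 0.1)

# Assume offsite replication would cut the loss per event to 50,000.
ale_after = annualised_loss_expectancy(50_000, 0.1)

print(f"Yearly loss without mitigation: {ale_before:,.0f}")
print(f"Yearly loss with mitigation:    {ale_after:,.0f}")
print("Worth spending 10,000 a year?",
      mitigation_worthwhile(ale_before, ale_after, annual_cost=10_000))
```

The point of the exercise is the comparison, not the precision: it gives the business a defensible upper bound on what is worth spending to lessen the blow.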
The bottom line is that you need both. Some elements of each overlap, and this is why it is so confusing. In short, DR is for recovering from a disaster in the 24 to 72 hour time frame, and BC/BCP helps keep things going over the longer term.
The Steps and How to Keep Things Going
Start by creating a Disaster Recovery plan, because this is what you need to get in place first. Once it is in place and has become process and procedure, mature the solution by adding the Business Continuity element as a Disaster Recovery gap analysis and mitigation strategy to keep the systems up and running. BCP normally takes a lot longer to build and put into action, and it is a much more long-term solution. The idea is to aim never to have to invoke DR, because your BCP is working so well and the system is so resilient that it does not go down. This is becoming more of a reality as technology improves. With the rise and maturation of virtualization it will also become more affordable for the mainstream.
BCP and DR plans are less useful if they are not updated with current, tested information. If the document is not kept alive, then when it is required you suffer the wrath of obsolescence. Details about key contacts, operating system versions and updates, and the latest configurations all form part of keeping the document alive. There are so many elements that need to be kept up to date that, without periodic testing, the document will drift out of sync with the business and impede the recovery, if not cause it to fail. Full testing, processes and procedures need to form part of maintaining the document.
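Keeping the document alive can be partly automated. As a minimal sketch, assuming you record the date each section of the plan was last reviewed, a short script can flag the sections that have gone stale. The section names, dates and the one-year threshold below are assumptions made for the example.

```python
from datetime import date, timedelta


def stale_items(items, today, max_age_days=365):
    """Return the names of plan items not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [name for name, last_reviewed in items.items() if last_reviewed < cutoff]


# Hypothetical review log: plan section -> date it was last checked.
plan_items = {
    "key contacts": date(2023, 1, 10),
    "operating system versions": date(2024, 3, 1),
    "server configurations": date(2022, 11, 5),
}

for name in stale_items(plan_items, today=date(2024, 6, 1)):
    print(f"Needs review: {name}")
```

A check like this does not replace testing the plan, but it does stop a section from quietly ageing between tests.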
Things to do
- Start with a “to do” list. It forms the basis of the plan: it will help generate the overall plan and define the tasks and the approach over time.
- Test the whole plan at least once a year and practice with all business units to ensure all circumstances are covered. The more you practice the better you get and the more bases you and your organisation will cover.
- Invest in an offsite location that your organisation is aware of and can regroup at. This will form a base in times of crisis, help you to restructure and serve as a command centre when you need one.
- Fill in the gaps in your BIA (Business Impact Analysis). This step should have been performed right at the beginning of your journey, and hiring someone experienced can help fast-track the gap analysis process.
- Build a comprehensive awareness program that will inform and keep your staff updated and trained, so that in time of crisis they can react accordingly.
- Ensure that all vital service information is captured, such as fire stations, ambulance and other emergency services. It’s a good idea to visit these entities and become accustomed to their procedures and processes.
- Keep testing and perfecting the plan. This will ensure you are as prepared as you can be. Remember to involve the whole company.
It is important to keep the plan current and evolving with the changes in the business. Taking a multi-pronged approach will ensure that you cover as many bases as possible. The only way that you will reveal and accommodate the changes is by actively testing and refining the plan. Remember that Disaster Recovery is reactive and short term and that Business Continuity is proactive and long term. Know that you will need the full support of the business to move this programme forward; without it, the programme will fail. Through this article I hope to have covered the fundamentals of what you should be looking for, and I hope the information has been helpful.