- Chapter 1: Disaster Recovery Tactics that Ensure Business Continuity
- Chapter 2: Disaster Recovery Target
- Chapter 3: Formulation of the Business Continuity Plan
- Chapter 4: Disaster Recovery Objectives and Milestones
- Chapter 5: Building Preparation
Business Continuity Plan Requirements
When formulating a Business Continuity Plan (BCP), it is essential to include elements that comply with business requirements. Some of the business requirements that need to be addressed are listed below:
1. Time windows
Acceptable timelines need to be established and negotiated before the continuity plans are formulated. The steps that follow depend on these time windows, both to define the correct actions and to arrive at the technology solution.
2. Formulation of actions
The actions to be taken in the event of a disaster need to be formulated, and the documentation of any installation and configuration needs to be managed. This action plan is incremental and needs to be a document that grows within the organization. It should be refined every time the recovery of operations is tested.
A good way of formulating an action plan is to first attempt a recovery onsite, in an isolated area or room, and document the steps as you go along.
Measures need to be put in place to ensure the availability of both IT-related and non-IT-related services. Because non-IT services such as telephone networks and other office technologies and amenities are closely integrated with IT, it is vital that the correct business people are involved when deciding which services need to be available.
4. Hardware selection
The hardware you choose today has a direct impact on whether you will be able to recover in the future. This is especially evident in archive strategies. DLT (Digital Linear Tape) devices were the recommended choice a few years back, but the norm is now LTO (Linear Tape-Open). If you need to restore media that is a few years old, you may find it challenging if your equipment has been decommissioned, and especially if it is no longer supported by the manufacturer. It is therefore important to ensure that your data is regularly backed up onto a current media type. Apart from having a live server, you may also need an archive server that your organization can use to back up onto more modern media. Not all media that has data on it needs to be sent offsite.
5. Implementation of protection mechanisms on critical business systems and processes
Personal firewalls and antivirus software need to be installed and maintained on critical workstations to avoid data loss. Security on these machines needs to be high, because they are critical and their data is at risk. A level of protection needs to be formulated for each critical service.
6. Preparation needs to be meticulous and all documentation needs to be sent offsite after being updated.
7. Documentation must be current and offsite
8. Change control involves consistent updating of change control documentation and is a necessary component of disaster recovery that needs to happen without fail. Good change control processes help mitigate risk and help track system changes that could have led to a disaster. Disaster recovery plans and procedures also need to be maintained on a regular basis, and changes need to be noted as soon as they happen. The procedure is: decide on the change, test it, quality assure it, document it, and then implement it. The document should then be sent offsite. This procedure will ensure that the documentation stays up to date.
9. Backup media integrity. If the backup software reports that the backup was successful you may take some comfort, but the only conclusive way of testing a backup is to recover the data. In this way you can test the integrity of your backup tapes. These days you can use virtualization technologies such as Virtual PC and Virtual Server to test semi-live environments. A good example would be to run the backup on your live LAN at a scheduled time, and then schedule a restore the next day that recovers the live LAN's backed-up data onto a separate, virtualized semi-live network. This verification mechanism is relatively inexpensive and can be implemented fairly quickly by adding two network cards to your backup server and scheduling the backups and the restores to use different network cards.
Please note that magnetic media is susceptible to data loss, and there is no way of tracking its integrity once it has been sent offsite. Tapes and other magnetic media can be damaged if they are dropped or exposed to temperatures outside their operational specifications. This includes tapes that are flown in from another state: they travel in the aircraft fuselage, where they could be exposed to low temperatures, and may then require acclimatization to avoid condensation. This occurrence is rare but needs to be pointed out, because small risks are often the ones that cause disasters.
Figure 1: The diagram above depicts a possible restore solution
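The restore test described in item 9 can be partly automated. The following is a minimal sketch in Python, not a feature of any particular backup product: it assumes that a checksum manifest is recorded when the backup is taken and that the scheduled restore lands in a directory on the semi-live network. The function names `build_manifest` and `verify_restore` are hypothetical and used for illustration only.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(live_root: Path) -> dict:
    """Record a checksum for every file under live_root at backup time.

    Keys are paths relative to live_root; values are SHA-256 digests.
    """
    return {
        path.relative_to(live_root).as_posix(): file_digest(path)
        for path in sorted(live_root.rglob("*"))
        if path.is_file()
    }


def verify_restore(manifest: dict, restored_root: Path) -> dict:
    """Check a restored directory tree against the backup-time manifest.

    Returns the relative paths that are missing from the restore and
    those whose content no longer matches the recorded checksum.
    """
    missing, corrupt = [], []
    for relative, expected in sorted(manifest.items()):
        restored = restored_root / relative
        if not restored.is_file():
            missing.append(relative)
        elif file_digest(restored) != expected:
            corrupt.append(relative)
    return {"missing": missing, "corrupt": corrupt}
```

Comparing against a manifest taken at backup time, rather than against the live data, avoids false alarms for files that legitimately changed between the backup and the next day's restore. An empty result in both lists indicates that every file in the manifest was recovered intact.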
12. Disaster levels
13. Regular meetings at scheduled intervals
Full disaster: This follows the total destruction of all operations and production systems. You will need all offsite resources to recover, and an offsite recovery is mandatory because the workplace has been destroyed.
Partial disaster: Certain portions of the operational and production systems have been destroyed or impeded. This will result in a partial recovery taking place, which may not necessitate an offsite recovery. A partial disaster could comprise one non-critical machine going down, with the restore occurring from backup or from an online networked source.
Minimal disaster: Only small, non-critical portions of the operational environment have been harmed. Virus outbreaks and file deletion may cause these small disasters. They can usually be recovered from easily using undelete software or backup tape restores.
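If the disaster levels are to be referenced from runbooks or tooling, it can help to write them down as data. The sketch below is one possible encoding in Python. The names `DisasterLevel`, `Response` and `response_for` are hypothetical, and the "not expected" value for the minimal level is an assumption inferred from the descriptions above, which only state that an offsite recovery is mandatory for a full disaster and may not be needed for a partial one.

```python
from dataclasses import dataclass
from enum import Enum


class DisasterLevel(Enum):
    FULL = "full"
    PARTIAL = "partial"
    MINIMAL = "minimal"


@dataclass(frozen=True)
class Response:
    offsite_recovery: str    # "mandatory", "possible" or "not expected"
    recovery_sources: tuple  # where the restored data comes from


# One entry per level, paraphrasing the descriptions in the text.
RESPONSES = {
    DisasterLevel.FULL: Response(
        "mandatory", ("offsite resources",)),
    DisasterLevel.PARTIAL: Response(
        "possible", ("backup", "online networked source")),
    DisasterLevel.MINIMAL: Response(
        "not expected",  # assumption: the text does not say this explicitly
        ("undelete software", "backup tape restore")),
}


def response_for(level: DisasterLevel) -> Response:
    """Look up the planned response for a declared disaster level."""
    return RESPONSES[level]
```

For example, `response_for(DisasterLevel.FULL).offsite_recovery` evaluates to `"mandatory"`. Keeping the mapping in one table makes it easy to review alongside the DRP document when the plan changes.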
1. Establish Disaster Recovery team.
A disaster recovery team should be headed by a team leader who will drive the project, who has authority over the process, and who has influence with the company's management. This helps get the plan ratified and ensures that the policy is accepted.
2. Document DR team and participants' contact details.
This step is important because when a disaster strikes, everyone involved in the recovery needs to be contacted. The contact list should include suppliers, IT professionals, support staff, team leaders, senior members of staff and all DR personnel.
3. Establish a DR pack.
A Disaster Recovery pack needs to be established with all of the essential documentation and respective contact details of each DR member.
4. Formulate a plan.
5. Establish what needs to be available.
Start with the physical network cabling and connectivity, the way the recovered computers will communicate, and the infrastructure and switching fabric used.
- Remote access
Do not forget the non-IT infrastructure that overlaps with IT's area of responsibility. Items such as telephone lines and access controls can become a disaster of their own if they are not adequately planned and provided for. This is why it is vital to involve the whole business and get the participation of all relevant parties to ensure the completeness of such a plan.
6. Ensure that reserve personnel are available.
In the event of the organization's staff being harmed, alternate staff need to be brought in to resolve the issues associated with the disaster. For this reason it is important to have detailed and up-to-date documentation, so that staff unfamiliar with the environment can restore complex, customized systems. Some systems are highly dynamic and are incredibly challenging to document and restore. Remember, though, that moon landings and space science have been documented. There is no reason for documentation to fail if the system is acting as intended. There is also no substitute for experience, and for this reason the reserve staff should be carefully selected.
7. Make sure procedure documentation is updated.
A document that is not updated can be considered useless, because information required to restore the environment may be missing. It should be the responsibility of the DR team leader to ensure that the DRP document, any annexures and any other pertinent information are kept up to date. There can be no compromise on this: it is high risk, and if this process fails then your DRP is compromised.
8. Make sure configuration management documents are updated.
Configuration documentation is a fallback in case a change causes a disaster. This is why the change control form is filled out before the change is made and signed off after the change is made. It is also the reason that changes are first implemented on a test system. When updating the documentation, it may help to use track changes, to save a new version each time, or to use a version control solution.
9. Ensure a copy of all documentation is offsite and updated.
All documentation must be sent offsite once it has been updated. When a disaster happens it is highly likely that the onsite documents will also be destroyed. It is therefore important to ensure offsite storage of all of your DRP documentation. Ensure that the documents are stored in a safe place that ensures their integrity and confidentiality, as they are likely to contain sensitive information.
10. Ensure that all information is protected and stored in a secure location.
A daily backup of all data should be taken to tape and stored offsite in a safe environment. On a weekly basis, documentation should be updated and uploaded to the offsite storage or sent physically to the remote location. Please note that a remote location means a building that is not just across the road. History has taught us that towers across the road can also be destroyed.
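The weekly documentation update is easy to let slip, so a simple staleness check can be run alongside the daily backup. The following is a minimal sketch in Python, assuming the date each document was last updated is already known, for example from a document register. The function name `stale_documents` and the document names in the usage example are hypothetical.

```python
from datetime import date, timedelta


def stale_documents(last_updated: dict, today: date,
                    max_age: timedelta = timedelta(days=7)) -> list:
    """Return the names of documents not updated within max_age.

    last_updated maps a document name to the date of its last update.
    The default of seven days reflects the weekly cycle described above.
    """
    return sorted(
        name for name, updated in last_updated.items()
        if today - updated > max_age
    )
```

A call such as `stale_documents({"drp.doc": date(2024, 1, 14), "contacts.doc": date(2024, 1, 1)}, today=date(2024, 1, 15))` would flag only `contacts.doc`. The resulting list can be handed to the DR team leader, who is responsible for keeping the documentation current.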
In part three of the DR series, hardware selection, potential solutions and recovery strategies and centralization of information storage have been covered. The different types of disasters have also been covered. Preparing for these at different levels will help organizations to become effective when a true disaster occurs.