Despite automation, scripts, and Disaster Recovery as a Service (DRaaS), it still takes more than pushing a button to get your business back up and running in an emergency. Make sure your operations team knows what they need to do by creating a comprehensive disaster recovery plan that spells out the step-by-step procedures to be followed.
The plan should define the goals of the recovery process, the recovery time objectives, and the end-to-end procedure: how to locate and retrieve backups, the steps to restore them, how to verify that the recovery and restarts succeeded, and what to do if they did not.
Document Who Performs the Recovery
One of the most important parts of your DR plan concerns not the technology but the people responsible for it. Make sure you have current contact information for every employee who is part of the DR team, and note their responsibilities. There should be at least one backup person for each role. If vendors and third parties will be part of your recovery process, keep a copy of your contract agreement with them as an appendix to your plan.
Document What Will Be Recovered
Create an inventory of hardware, software, and data assets that need to be recovered, along with the network design. You need to track both physical and virtual instances, whether local in your data center or running in the cloud. You should have a record of the specific configuration of each physical and virtual instance, including hardware components, memory size, operating system, and system configuration settings. Network dependencies and security requirements are also important.
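As an illustration only, an inventory record of this kind could be kept in a structured form so that incomplete entries are easy to spot. The sketch below is a minimal example, not a prescribed schema: the `Asset` class, its field names, and the `missing_fields` helper are all hypothetical, and unknown values are assumed to be stored as `None`.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Asset:
    """One inventory record (hypothetical field names, for illustration)."""
    name: str
    kind: str                      # "physical" or "virtual"
    location: str                  # e.g. "datacenter" or "cloud"
    memory_gb: Optional[int] = None
    operating_system: Optional[str] = None
    config_settings: Optional[dict] = None
    network_dependencies: Optional[List[str]] = None
    security_requirements: Optional[List[str]] = None


def missing_fields(asset: Asset) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    return [f.name for f in fields(asset) if getattr(asset, f.name) is None]


# Example record with some details still undocumented.
db01 = Asset(
    name="db01",
    kind="physical",
    location="datacenter",
    memory_gb=64,
    config_settings={"raid": "10"},
    security_requirements=["encrypted at rest"],
)
gaps = missing_fields(db01)  # fields someone still needs to document
```

Running a check like this over the whole inventory before a DR test is one way to find records that would leave the recovery team guessing.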
Applications should be prioritized to define a restart sequence; application dependencies that impact the restart sequence also need to be documented.
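One way to turn documented priorities and dependencies into a restart sequence is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the application names, the priority numbers, and the `restart_sequence` function are assumptions made for the example. Lower priority numbers are assumed to mean "restart sooner".

```python
import heapq
from graphlib import TopologicalSorter


def restart_sequence(dependencies, priority):
    """Order apps so each starts only after everything it depends on.

    dependencies maps an app to the set of apps it needs running first.
    Among apps that are ready to start, the lowest priority number wins.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready = []        # heap of (priority, app) whose dependencies are all up
    order = []
    while sorter.is_active():
        for app in sorter.get_ready():
            heapq.heappush(ready, (priority.get(app, 99), app))
        _, app = heapq.heappop(ready)
        order.append(app)
        sorter.done(app)  # may unlock apps that depend on this one
    return order


# Hypothetical applications and priorities.
DEPENDENCIES = {
    "db": set(),
    "auth": {"db"},
    "app": {"db", "auth"},
    "web": {"app"},
    "reporting": {"db"},
}
PRIORITY = {"db": 1, "auth": 1, "app": 2, "web": 2, "reporting": 5}

sequence = restart_sequence(DEPENDENCIES, PRIORITY)
# sequence == ["db", "auth", "app", "web", "reporting"]
```

A circular dependency makes `prepare()` raise an error, which is a useful signal in itself: it means the documented dependencies cannot be satisfied by any restart order and need to be reviewed.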
Document the Recovery Process Design
It's likely your recovery process doesn't aim to bring up an exact mirror of the production environment. In order to avoid confusion and make sure the personnel executing the procedures understand how the recovery environment differs from production, document the differences between the two environments and the reasons for those differences.
Document the Recovery Process
Each step of the recovery process should be documented in detail, including the specific command line to be entered. Each task should be assigned to a specific person on the DR team. The dependencies between tasks should be highlighted, and the plan should include checkpoints to verify that recovery is proceeding properly. The plan should include instructions on what to do when recovery steps encounter an error and identify the triggers and contacts for escalation.
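The elements described above (a command per step, an owner, a checkpoint, and an escalation contact) can also be captured in a machine-readable form. The following is a rough sketch under stated assumptions, not a recommended tool: the `Step` structure, the `run_plan` function, and the names in the example plan are hypothetical, and the placeholder commands simply exit with success or failure so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One documented recovery step (hypothetical structure)."""
    name: str
    owner: str            # DR team member assigned to the task
    command: List[str]    # exact command line to run
    verify: List[str]     # checkpoint command; exit code 0 means healthy
    escalate_to: str      # contact when the step or its checkpoint fails


def run_plan(steps):
    """Run steps in order and stop at the first failed command or checkpoint."""
    log = []
    for step in steps:
        for phase, argv in (("command", step.command), ("verify", step.verify)):
            result = subprocess.run(argv, capture_output=True, text=True)
            if result.returncode != 0:
                log.append(
                    f"FAILED {step.name} ({phase}): escalate to {step.escalate_to}"
                )
                return False, log
        log.append(f"OK {step.name} (owner: {step.owner})")
    return True, log


# Placeholder commands standing in for real restore and check commands.
OK = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

plan = [
    Step("restore database", "alice", OK, OK, "dr-lead"),
    Step("restart app tier", "bob", OK, FAIL, "dr-lead"),
    Step("switch DNS", "carol", OK, OK, "dr-lead"),
]
succeeded, log = run_plan(plan)
# The second step's checkpoint fails, so "switch DNS" is never attempted.
```

Stopping at the first failure reflects the dependency between tasks: later steps are not attempted until the escalation contact has decided how to proceed.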
Document the Business Process
DR isn't just a technical process; business users will be impacted as well. The impact may be as trivial as accessing an alternate website or as complex as verifying and reentering data for transactions that were only partially completed. Those steps should be documented. In addition, it's important that business users understand the priority of their specific applications in the recovery process and whether each application will be fully recovered or will run with limited functionality.
Document the Failback Process
Eventually you'll want to return your operations from disaster mode to normal production mode. The steps for transitioning back to the production servers need the same level of documentation as the steps for failing over to the DR site.
Test Your Documentation
When you run a disaster recovery test (and you should run one at least once per year), test your documentation as well. Start by making sure that it's accessible to the employees who need it when they aren't at their primary work site. Then make sure they're able to work by following its instructions; any points that are confusing, incorrect, or incomplete should be reported and corrected.