Define Disaster Recovery Procedures and Test Plan

To define procedures for disaster recovery, including the use of:

·         Services provided by third-party disaster recovery service providers

·         In-house resources

Procedure

1.       Together with project leads and the project steering committee, determine the maximum acceptable downtime in the production system.

-         Check that the hardware already installed, together with any hardware still to be installed, can achieve the downtime figures determined for live operation.

-         Check your service level agreements with hardware partners:

·         How quickly can you get replacement hardware?

·         Do you need more contractual cover from your hardware partner?

·         Do you need more service contracts?

2.       Make a formal record of decision and escalation paths. Include escalation procedures for hardware and software partners, and take account of internal and external SLAs. When making the record, consider the following questions:

-         When do disaster recovery conditions apply?

-         Who decides when disaster recovery conditions apply in a particular case?

-         To what extent is SAP operation affected by a recovery?

-         How is disaster recovery performed?

-         Who monitors the performance and success of disaster recovery?

-         Which decision paths lead to escalation and in what timeframe?

3.       Determine which application procedures require a description of transitional operation.

-         Document the transitional operation.

-         Determine any other potential impact, such as with interfaces and output devices.

4.       Document verifiable conditions for determining that disaster recovery is successful. Add detailed descriptions of procedures for verifying application-specific considerations. Describe separately the following activities:

-         Integration of transitional operation

-         Subsequent reworking on data interfaces

-         Subsequent reworking on printed output (fax, EDI)

-         Transition to regular operation:

·         When is regular operation re-established?

·         How is the transition to regular operation recorded, and how are people notified?

·         De-escalation procedure

5.       Formulate a plan for responding to disk failure:

-         Replace disk with spare disk: where can it be found, who is responsible?

-         Restore backup of data files: where are the backups, how old are they, who is responsible?

-         Recover data files: where are the log files (still on disk or already on tape), and who is responsible?
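The downtime check in step 1 and the disk failure plan in step 5 can be combined into a rough estimate. The following is a minimal sketch only: the phase durations, owners, and the maximum downtime figure are hypothetical placeholders, and the names RecoveryPhase, total_recovery_hours, and fits_downtime are introduced here for illustration. It assumes the phases run one after another.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPhase:
    name: str          # what has to be done
    owner: str         # who is responsible for this phase
    est_hours: float   # estimated duration in hours (placeholder value)


def total_recovery_hours(phases):
    """Sum the estimated durations, assuming the phases run sequentially."""
    return sum(p.est_hours for p in phases)


def fits_downtime(phases, max_downtime_hours):
    """Return True if the sequential estimate fits the agreed maximum downtime."""
    return total_recovery_hours(phases) <= max_downtime_hours


# Hypothetical figures for illustration only; replace with measured values
# from your own recovery tests.
disk_failure_plan = [
    RecoveryPhase("Replace disk with spare disk", "hardware team", 1.0),
    RecoveryPhase("Restore backup of data files", "database administrator", 3.0),
    RecoveryPhase("Recover data files from log files", "database administrator", 2.5),
]

# Assumed value, as agreed with project leads and the steering committee.
MAX_DOWNTIME_HOURS = 8.0

print("Estimated recovery time (hours):", total_recovery_hours(disk_failure_plan))
print("Within maximum acceptable downtime:", fits_downtime(disk_failure_plan, MAX_DOWNTIME_HOURS))
```

If the estimate exceeds the agreed maximum, revisit the service level agreements with your hardware partner or shorten the restore and recovery phases, for example with more frequent backups.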

·         Occasionally, small hardware parts fail. It is wise to have spare parts available. For a production system, it is essential to have support contracts with 24-hour support for software and hardware.

·         Unless you use a disk system that allows online exchange and recovery of a damaged disk, such as RAID 5, you need time to exchange the disk and to recover the database.

·         Ensure that recovery of the database is tested before you go live.

·         Keep all support numbers, such as those for hardware suppliers, SAP, and the hotline, readily accessible.

·         If your system must run 24 x 7 all year, consider high availability solutions provided by your hardware partner or external companies.

·         For more information, see the guide System Administration Made Easy.
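When discussing 24 x 7 operation and the maximum acceptable downtime, it can help to translate an availability target into hours of downtime per year. The following is a minimal sketch of that arithmetic. The function name allowed_downtime_hours and the sample targets are assumptions for illustration, and the calculation ignores leap years and planned maintenance windows.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours of downtime per year permitted at a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Sample targets for illustration only.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Compare the result with the recovery time you measured when testing database recovery. If a single disk failure would already use up the yearly allowance, a high availability solution is worth evaluating.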