Site Recovery is the disaster recovery feature included in Nakivo Backup & Replication. If an outage occurs at the Production site, it provides an effective DR strategy to keep all IT services running at the DR site.
Site Recovery lets you create an effective Disaster Recovery Plan to reduce business downtime and potential data loss (picture from Nakivo).
Blog series
Nakivo Site Recovery: Job creation - pt.1
Nakivo Site Recovery: VMs Re-protect - pt.2
Nakivo Site Recovery: Failback - pt.3
Benefits of using Nakivo Site Recovery
- Easy management - managing the activities to perform during a Disaster Recovery event is much easier with Site Recovery, since backups, replicas, DR workflows, failovers and failbacks can be managed centrally from a single pane of glass.
- Limited downtime - with a DR workflow, IT services can be maintained during an unplanned outage.
- All in one solution - a single product provides the features needed to manage data protection, disaster recovery orchestration and IT infrastructure monitoring.
- Cost-effective - you don't need to purchase additional software or licenses: Site Recovery is included in the Nakivo Backup & Replication Enterprise Edition at no extra cost.
- Ransomware recovery - if you suffer a ransomware attack and your main site is down, with Site Recovery you can run a pre-configured failover sequence in your DR Site to quickly restore the services.
- Sandbox environment - to minimize data loss and disruptions, workload functionality, patches and new releases can be tested in an isolated environment without affecting production.
Site Recovery key features
- Disaster Recovery Orchestration - to reduce downtime and service disruption in your production site, Site Recovery allows you to configure a DR Plan with replicas and disaster recovery sequences you can execute with a single click.
- Efficient replication - replicas for mission critical workloads can be created for the supported platforms. To reduce the load on the production environment, a replica can be created from both source VMs and existing VM backups.
- Consistency - critical applications and databases, such as SQL Server, Exchange and Active Directory, remain consistent after being processed as a replica. No additional configuration is required once a replica is powered on, and all the provided services can be used immediately.
- Roles and permissions - specific roles and permissions can be assigned to IT staff to manage all DR tasks (workflows, testing, powering on replicas, failover, failback, etc.).
- Custom workflows - different DR sequences can be created for each scenario such as failover to a secondary site, failback, planned failover and planned data center migration.
- Notifications - VM status is checked on a regular basis, and a notification can be sent if a VM cannot be reached, enabling timely action before a disaster strikes.
- DR Testing - replicas should be tested to ensure they are usable in case of a disaster. Tests can be run on a schedule or on demand without affecting the production environment.
Create a disaster recovery workflow
To create an effective DR Plan, we need to have a clear overview of our infrastructure and to know the required actions to restore the services at DR Site.
Knowing in advance the correct power-on sequence of the mission-critical machines is essential to restore the various services quickly and limit downtime.
For this reason it is crucial to have a DR workflow in place, that is, a sequence of actions with a defined order of execution.
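One way to reason about the power-on order is to write down which machine depends on which and let a topological sort produce the sequence. The following is only a planning sketch in Python and is not part of Nakivo Backup & Replication; the VM names and the dependencies mapping are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each VM lists the VMs that must be up before it.
dependencies = {
    "domain-controller": [],
    "file-server": ["domain-controller"],
}


def power_on_order(deps):
    """Return VM names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


print(power_on_order(dependencies))
# prints ['domain-controller', 'file-server']
```

If two machines depend on each other, TopologicalSorter raises a CycleError, which is a useful early warning that the planned sequence cannot be executed as written.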
Available Site Recovery actions
Every action required by the DR Plan must be supported by Nakivo Site Recovery.
The following actions are currently available on the supported platforms:
- Failover - initiates failover to replica.
- Failback - returns workloads from the VM replica to the source VM. The changes made in the VM replica since the point of failover are written to the source VM when the failback operation is performed. The VMs are synchronized and the source VM is in the actual production state again.
- Start - starts VM.
- Stop - stops running VM.
- Run job - runs a supported job type (backup job, replication job, site recovery job, backup copy job, or Flash VM Boot job).
- Stop jobs - stops a job.
- Run script - runs a script.
- Attach repository - attaches a backup repository used by Nakivo.
- Detach repository - detaches a backup repository.
- Send email - sends an email to defined recipients.
- Wait - waits for the designated period of time before proceeding to the next action.
- Check condition - one of the following conditions is checked based on your input: the resource exists, the resource is running, IP/Hostname is reachable.
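Before opening the wizard, it can help to check a draft plan against this list. The sketch below is a hypothetical planning aid in Python: the action names are copied from the list above, while the unsupported_steps helper and the draft plan are assumptions for illustration and do not call any Nakivo API.

```python
# Action names taken from the list above; the helper itself is hypothetical.
SUPPORTED_ACTIONS = {
    "Failover", "Failback", "Start", "Stop", "Run job", "Stop jobs",
    "Run script", "Attach repository", "Detach repository",
    "Send email", "Wait", "Check condition",
}


def unsupported_steps(plan):
    """Return the steps of a draft plan that have no matching action."""
    return [step for step in plan if step not in SUPPORTED_ACTIONS]


# Draft plan with one step that is not a built-in action.
draft = ["Failover", "Wait", "Check condition", "Update DNS"]
print(unsupported_steps(draft))
# prints ['Update DNS']
```

A step flagged this way has to be expressed through one of the listed actions, for example by wrapping it in a script and using Run script.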
Example environment
To test Site Recovery, an example infrastructure has been used. The environment is composed of the following components:
- Domain Controller
- File Server
If the Production site fails, specific actions are needed to restore the services at the DR site. For example, to access the File Server we must be sure the DC is up and running to authenticate the users who want to access the server.
The Domain Controller will therefore be the first server to fail over.
The sequence to configure in Site Recovery will be the following:
- DC failover to replica
- Wait until the DC is up
- Check if DC is reachable
- FS failover to replica
- Wait until the FS is up
- Check if FS is reachable
- Configure the network (if Prod and DR have different VLANs)
- Configure the Re-IP rule (if Prod and DR have different subnets)
- Optionally, configure on-demand testing (useful to ensure the job can be run successfully)
- Configure options.
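The action part of this sequence follows the same pattern for each VM: failover, wait, check. A minimal Python sketch of that pattern is shown below; the Step class, the build_sequence helper and the short VM names are assumptions used only to make the ordering explicit, not objects exposed by the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow action applied to one VM (hypothetical model)."""
    action: str
    target: str


def build_sequence(vms_in_order):
    """Expand an ordered VM list into failover / wait / check steps."""
    steps = []
    for vm in vms_in_order:
        steps.append(Step("Failover", vm))
        steps.append(Step("Wait", vm))
        steps.append(Step("Check condition", vm))
    return steps


# DC first, then FS, as in the example environment.
sequence = build_sequence(["DC", "FS"])
for number, step in enumerate(sequence, start=1):
    print(number, step.action, step.target)
```

Keeping the VM order in a single list makes it easy to add a third server later without rethinking the whole workflow.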
Site Recovery configuration
Log in to Nakivo Backup & Replication.
Access the Jobs area, click the + icon then select Site recovery job under Site Recovery Job to run the configuration wizard.
The first action to configure is the DC failover. Click the Failover VMware VMs link.
Select the Domain Controller and click Next.
Make sure that Power off source VMs is enabled to avoid IP conflicts and click Save.
Now click Wait as the next action to configure.
Set Time to wait to 2 to 3 minutes to ensure the DC completes the boot process. Click Save.
Click Check Condition action to verify the status of the powered on VM.
Set Condition type as Resource is running and specify the VM details. Click Save.
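The logic behind the Wait and Check Condition pair can be sketched in a few lines of Python. This is an illustration only: the wait_and_check function, the 180-second default (chosen to match the wait time suggested above) and the stand-in state dictionary are assumptions, and the sketch simply assumes the workflow should stop when the check fails.

```python
import time


def wait_and_check(vm_name, is_running, wait_seconds=180, sleep=time.sleep):
    """Wait a fixed time, then fail loudly if the VM is not running."""
    sleep(wait_seconds)
    if not is_running(vm_name):
        raise RuntimeError(f"{vm_name} is not running after {wait_seconds} s")
    return True


# Stand-in functions so the sketch runs instantly without real VMs.
fake_state = {"DC": True, "FS": False}


def no_sleep(seconds):
    """Replacement for time.sleep that returns immediately."""
    return None


print(wait_and_check("DC", fake_state.get, sleep=no_sleep))
# prints True
```

A fixed wait has to be sized for the slowest boot you expect, which is why the check that follows it matters: it confirms the VM actually came up before the next failover starts.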
Now repeat the same steps for the second VM (File Server).
- Failover VMware VMs
- Wait
- Check Condition
When all actions have been configured, click Next.
If Production and DR sites use different VLANs, tick the Enable network mapping option to map the correct network during failover. The mapping used for the replication is automatically presented and can be used accordingly. Click Create a new mapping if you need to use a different network instead. Click Next.
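Conceptually, a network mapping is a lookup from a source network to the network the replica should use at the DR site. The Python sketch below illustrates the idea; the network names and the map_network helper are hypothetical, and falling back to the source network when no mapping exists is an assumption of the sketch.

```python
# Hypothetical network names; real ones come from your own inventory.
network_mapping = {
    "Prod-VLAN10": "DR-VLAN110",
}


def map_network(source_network, mapping):
    """Return the DR network for a source network, or the same one if unmapped."""
    return mapping.get(source_network, source_network)


print(map_network("Prod-VLAN10", network_mapping))
# prints DR-VLAN110
```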
If Production and DR sites use different subnets, tick the Enable Re-IP option. Click Select VMs to specify the VMs to configure.
Select the VMs to configure and set the credentials to use. Click Manage Credentials if you need to create new credentials. Click X to exit.
Click Add existing rule if you have an already configured subnet.
Select the subnet to use and click Close.
Click Next.
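To picture what a Re-IP rule does, the following Python sketch translates an address from a production subnet to a DR subnet while keeping the host part unchanged. The subnets, the re_ip function and the choice to preserve the host part are assumptions made for illustration; the actual rule behaviour is defined in the product's Re-IP settings.

```python
import ipaddress


def re_ip(address, source_subnet, target_subnet):
    """Move an IPv4 address to the target subnet, keeping its host part."""
    src = ipaddress.ip_network(source_subnet)
    dst = ipaddress.ip_network(target_subnet)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        # Addresses outside the source subnet are left untouched.
        return address
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must have the same size")
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical Production and DR subnets.
print(re_ip("192.168.10.25", "192.168.10.0/24", "10.20.30.0/24"))
# prints 10.20.30.25
```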
To ensure replicas are usable, it is recommended to test the Site Recovery Job on a regular basis. Specify the desired Test Schedule and click Next.
Specify the Job name and, if needed, the Job priority. Click Finish to save the Job configuration.
The DR workflow has been configured and you can now run the Site Recovery Job when needed.
To minimize data loss and keep services available, there are additional configurations to do in Nakivo Backup & Replication once the failover action has been triggered:
- Re-protect VMs - VMs (replicas) running at DR site should be protected against a possible DR site outage while Production site is down.
- Failback - when you restore VMs back to the Production site, any change or new data in the VM replica must be transferred back to the source VM to avoid data loss.
Part 2 will cover testing the Site Recovery Job and configuring VMs Re-protect to protect the active replicas against possible failures.