Disaster recovery (DR) is the process of protecting workloads running on the primary site by replicating them to another site, known as the disaster recovery site. This keeps the application workloads up and running if a natural or any other kind of disaster occurs. Disaster recovery is like insurance for the data center: it doesn’t hurt to spend part of the IT budget to make sure end customers have continuity in the service provided to them. Businesses invest a lot of time and money in a disaster recovery plan to make sure they don’t lose customers or revenue when a disaster actually happens.
Though the purpose of DR remains the same, the approaches can vary. DR is achieved by replicating the workloads from the primary site to the secondary site. Replication can be a manual process, where an administrator copies the VM from one site to another, or it can be completely automated, where a click of a button results in a switchover to the DR site. The customer chooses the solution based on how quickly the VMs that went down on the primary site need to be brought back up on the disaster recovery site. This can be broken down into two things: the frequency of replication, and the time required to bring up a machine that the end user can access after the DR process. The frequency of replication determines how much data can be lost, and the maximum acceptable loss is called the RPO (Recovery Point Objective). The time taken for the replicated VM(s) to be powered on and prepped for user access is known as the RTO (Recovery Time Objective). Shorter RPO values translate to reduced data loss.
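To make the RPO/RTO trade-off concrete, here is a minimal Python sketch. The function names and the sample numbers are hypothetical and are not part of any product. It assumes the worst case is a disaster striking just before a replication run finishes, so the newest usable replica is one full interval plus one run duration old.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta,
                         replication_duration: timedelta) -> timedelta:
    """Oldest the newest complete replica can be when disaster strikes.

    Assumption: the replica reflects the source as of the start of a run,
    and a run that has not finished yet cannot be used for recovery.
    """
    return replication_interval + replication_duration


def meets_objectives(replication_interval: timedelta,
                     replication_duration: timedelta,
                     power_on_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> tuple[bool, bool]:
    """Return (rpo_met, rto_met) for a given replication setup."""
    loss = worst_case_data_loss(replication_interval, replication_duration)
    return loss <= rpo, power_on_time <= rto


if __name__ == "__main__":
    # Hypothetical numbers: nightly replication that takes one hour,
    # and a replica that needs 15 minutes to become usable.
    rpo_met, rto_met = meets_objectives(
        replication_interval=timedelta(hours=24),
        replication_duration=timedelta(hours=1),
        power_on_time=timedelta(minutes=15),
        rpo=timedelta(hours=24),
        rto=timedelta(minutes=30),
    )
    print(f"RPO met: {rpo_met}, RTO met: {rto_met}")
```

With these sample values the RTO is met but the RPO is not, which is the usual argument for replicating more often than the RPO alone suggests.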
One way to carry out replication between two sites is to use software that periodically replicates the changes that take place within a VM, without manual intervention. NAKIVO Backup & Replication (NBR) is one such tool. It automates disaster recovery by replicating the VMs on the primary site to the DR site.
Using NBR, the administrator can create a replication job for a single VM or a group of VMs that need to be protected. The screenshots below show how this is done.
Deploy NBR in the vSphere environment using the OVA file provided by NAKIVO. Configure the networking (IP configuration and DNS). Wait for the services to start, then point the browser to https://:4443. It will ask you to set a password for the admin user and then bring you to the screen below.
The first task is to create a new replication job. Click Create > VMware vSphere replication job.
This brings up the replication job wizard, which is broken down into 4 simple steps. The 1st step is to choose the VM that has to be replicated to the DR site. I’m choosing my DC machine, as it is crucial for me to have one or more copies of it at all times.
The 2nd step is to choose the replication location. I’m choosing a resource pool, but this can be a different cluster or another VC running on the DR site, where the replica of this particular VM will be created. The datastore and network also have to be chosen in this step. If the replica needs to be kept on an isolated VLAN, this is where it can be chosen.
Step 3 is where the job schedule is defined. The replication job can be scheduled to run daily (on all days of the week or on chosen days), weekly, or monthly. We can define a particular time for this job to run. This is useful, as most replications are scheduled for off hours to minimize the impact on the network.
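As an illustration of what a "daily on chosen days at a fixed time" schedule means in practice, the sketch below computes the next start time. It is plain Python and does not reflect how NBR implements scheduling. The function name `next_run` and its parameters are hypothetical, and time zones are ignored.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, run_at: time, weekdays: set[int]) -> datetime:
    """Return the first scheduled start strictly after `now`.

    `weekdays` uses Python's convention: Monday is 0 and Sunday is 6.
    """
    if not weekdays:
        raise ValueError("at least one weekday must be selected")
    # Look at today and the following seven days; one of them must match.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), run_at)
        if candidate > now and candidate.weekday() in weekdays:
            return candidate
    raise ValueError("weekdays must contain values between 0 and 6")


if __name__ == "__main__":
    working_days = {0, 1, 2, 3, 4}          # Monday to Friday
    off_hours = time(23, 0)                 # 11 PM, outside business hours
    friday_noon = datetime(2024, 1, 5, 12, 0)
    print(next_run(friday_noon, off_hours, working_days))
```

Asking again after Friday's 11 PM slot has passed returns the following Monday, since the weekend days are not selected.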
The 4th and last step is where we define the options for the job. The first is whether the job should use change tracking and, if so, whether it should use VMware’s built-in CBT (Changed Block Tracking) or NBR’s own block tracking. Tracking can also be turned off entirely. There is a setting called app-aware mode which, when enabled, quiesces the disk while taking the snapshot of the VM. There is a separate set of options for the recovery point retention period, which can be changed as required.
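The idea behind a retention period can be sketched in a few lines of Python. This is only an assumed policy for illustration (keep every recovery point younger than the retention period, and never delete the newest one). The function name `apply_retention` is hypothetical, and NBR's actual retention options may behave differently.

```python
from datetime import datetime, timedelta


def apply_retention(points: list[datetime],
                    now: datetime,
                    keep_for: timedelta) -> list[datetime]:
    """Return the recovery points that survive the retention period.

    Assumed policy: drop points older than `keep_for`, but always keep
    the most recent point so that the replica stays recoverable.
    """
    if not points:
        return []
    ordered = sorted(points)
    cutoff = now - keep_for
    kept = [point for point in ordered if point >= cutoff]
    return kept or [ordered[-1]]


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    points = [datetime(2024, 1, 1), datetime(2024, 1, 5), datetime(2024, 1, 9)]
    for point in apply_retention(points, now, keep_for=timedelta(days=7)):
        print(point.date())
```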
Once all the options are selected, we can either finish and wait for the job to run at its next scheduled time or run it immediately. The first run will be a full replication, which makes a copy of the entire VM on the DR site. The subsequent scheduled runs will write only the incremental changes to the replica VM.
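The difference between the first full run and later incremental runs can be sketched as follows. This is a toy model, not how VMware CBT or NBR tracks changes: here changed blocks are found by comparing per-block hashes of two in-memory byte strings, whereas real change tracking records modified blocks as they are written. All names and the tiny block size are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a disk image into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(source: bytes, replica: bytes) -> dict[int, bytes]:
    """Map block index -> new content for every block that differs."""
    source_blocks = split_blocks(source)
    replica_blocks = split_blocks(replica)
    changes = {}
    for index, block in enumerate(source_blocks):
        is_new = index >= len(replica_blocks)
        if is_new or (hashlib.sha256(block).digest()
                      != hashlib.sha256(replica_blocks[index]).digest()):
            changes[index] = block
    return changes


def apply_changes(replica: bytes, changes: dict[int, bytes],
                  new_length: int) -> bytes:
    """Write the changed blocks into the replica and fix up its length."""
    blocks = split_blocks(replica)
    for index, block in sorted(changes.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    source = b"AAAABBBBCCCC"
    # First run: the replica is empty, so every block is transferred.
    full = changed_blocks(source, b"")
    replica = apply_changes(b"", full, len(source))
    print(f"full run sent {len(full)} blocks")

    # Later run: only the block that changed on the source is transferred.
    source = b"AAAAXXXXCCCC"
    delta = changed_blocks(source, replica)
    replica = apply_changes(replica, delta, len(source))
    print(f"incremental run sent {len(delta)} block(s)")
```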
If you choose to run the job immediately, you can see its progress on the dashboard. This might take a while, depending on the bandwidth and the size of the VM being replicated.
Once the job is complete, the dashboard shows its stats, such as the average transfer speed and the total amount of data transferred.
These stats are shown for multiple jobs, which is very useful. The admin gets to know how much changed (delta) data has been replicated in every cycle.
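For readers who want to sanity-check such numbers, the relationship between transferred data, duration and average speed is simple arithmetic. The sketch below uses a hypothetical `JobRun` record with made-up figures. It is not NBR's data model, and the dashboard may use different units.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One replication cycle (hypothetical record, values in bytes/seconds)."""
    transferred_bytes: int
    duration_seconds: float


def average_speed_mbit(run: JobRun) -> float:
    """Average transfer speed of a single run in megabits per second."""
    return run.transferred_bytes * 8 / run.duration_seconds / 1_000_000


def total_transferred_gb(runs: list[JobRun]) -> float:
    """Total data moved across all runs, in gigabytes (10^9 bytes)."""
    return sum(run.transferred_bytes for run in runs) / 1_000_000_000


if __name__ == "__main__":
    runs = [
        JobRun(transferred_bytes=40_000_000_000, duration_seconds=4_000),  # full run
        JobRun(transferred_bytes=1_000_000_000, duration_seconds=100),     # delta
    ]
    for number, run in enumerate(runs, start=1):
        print(f"run {number}: {average_speed_mbit(run):.1f} Mbit/s")
    print(f"total: {total_transferred_gb(runs):.1f} GB")
```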
NBR also gives us the capability to recover the replicated VM in case of corruption, or to fail back once the primary site is up again after the disaster. To recover the replicated machine, go to Recover > VMs from replica.
Choose the replica copy that you want to recover. There is an option that says “Always use the latest recovery point”. This can also be changed to a specific recovery point from the list of retained copies that are maintained during replication jobs.
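The choice between the latest recovery point and a specific one can be expressed as a small selection rule. The Python sketch below is illustrative only. The function name `pick_recovery_point` is hypothetical, and recovery points are modelled simply as timestamps.

```python
from datetime import datetime
from typing import Optional


def pick_recovery_point(points: list[datetime],
                        requested: Optional[datetime] = None) -> datetime:
    """Return the recovery point to restore from.

    With no explicit request the newest point wins, mirroring the
    "always use the latest" behaviour; otherwise the requested point
    must be one of the retained copies.
    """
    if not points:
        raise ValueError("the replica has no recovery points")
    if requested is None:
        return max(points)
    if requested not in points:
        raise ValueError(f"no recovery point retained for {requested}")
    return requested


if __name__ == "__main__":
    retained = [datetime(2024, 1, 7, 23), datetime(2024, 1, 8, 23),
                datetime(2024, 1, 9, 23)]
    print(pick_recovery_point(retained))                            # latest
    print(pick_recovery_point(retained, datetime(2024, 1, 7, 23)))  # specific
```

Picking an older point is what helps when the corruption itself has already been replicated into the latest copy.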
Select the recovery location. This step only allows you to change the network; the host and datastore remain the same as the source used during the replication process.
In the 3rd and last step, give the recovery job a name and run the recovery.
This completes the entire process of DR replication and recovery. Besides the capability to perform VM replication, NBR also adds features like network acceleration, which gives 2X better performance over WAN, and Direct SAN access, which makes replication faster as it is handed over to the SAN. Give this a try and see how DR becomes as easy as it sounds.