Clustered Continuous Replication Failover with Standby Continuous Replication (Part 2)

In part one of this three-part article, we covered an overview of the lab setup required for the process of moving an Exchange 2007 Clustered Mailbox Server (CMS) from one Clustered Continuous Replication (CCR) environment to a standby cluster via Standby Continuous Replication (SCR). We then saw how to enable SCR to one of the standby cluster nodes, but found that the actual database file was missing from the SCR target folder. In this part we will discuss why that is, then move on to the process required to recover the CMS onto the standby cluster nodes, and finally bring those nodes up as the new CCR environment.

Seed the SCR Target

Let's now look at the process of seeding the SCR target. Although this can be achieved manually by dismounting and copying the databases, I am going to use the Exchange Management Shell (EMS) to achieve the desired result. This process centers on several EMS cmdlets, most notably the Update-StorageGroupCopy cmdlet, and we will be running them from the SCR target node, NH-W2K3-SRV01.

Before re-seeding a database, the storage group replication process must be suspended by running the Suspend-StorageGroupCopy cmdlet. Since we are working with two storage groups, both copies will be suspended, as shown in Figure 4 below. The cmdlets to use are:

Suspend-StorageGroupCopy "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01

Suspend-StorageGroupCopy "CCREX01\Second Storage Group" -StandbyMachine NH-W2K3-SRV01

Figure 4: Suspending the Storage Group Copy Process

With storage group replication suspended, it is now possible to remove any existing database files from NH-W2K3-SRV01. If you look again at Figure 3 from part one of this article, you can see that enabling SCR has already resulted in several transaction log files appearing in the storage group folder on NH-W2K3-SRV01, but no database file. This is because SCR only creates the target database once at least 50 transaction log files have been copied over from the SCR source and the period specified in the -ReplayLagTime parameter has elapsed. The value of 50 transaction log files is hard-coded and cannot be changed. As you saw from the cmdlets used when we enabled SCR, the -ReplayLagTime value was set to 0, which effectively means the databases will only be created once 50 transaction logs have been shipped from the SCR source. Seeding the database now, however, creates it immediately.
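Incidentally, although it is not a required step, you can check the state of the SCR target copy at any point with the Get-StorageGroupCopyStatus cmdlet and its -StandbyMachine parameter. For example, running something along these lines should report whether the copy is currently suspended or healthy, along with its copy and replay queue lengths:

Get-StorageGroupCopyStatus -Identity "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01 | Format-List SummaryCopyStatus,CopyQueueLength,ReplayQueueLength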

Let us get back to removing the existing database files. It is now possible to safely remove any .EDB, .LOG, .JRS and .CHK files from the folders containing the copies of the storage groups on NH-W2K3-SRV01. Once this has been done, the databases can be seeded onto NH-W2K3-SRV01 by running the following two cmdlets:

Update-StorageGroupCopy "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01

Update-StorageGroupCopy "CCREX01\Second Storage Group" -StandbyMachine NH-W2K3-SRV01

The results of running these cmdlets are shown in Figure 5, where you can see the database seeding process in action.

Figure 5: Database Reseeding in Progress

The cmdlets used above will automatically resume replication to the SCR target, so there is no need to use the Resume-StorageGroupCopy cmdlet at this time.

Site Failover Process

At this point SCR has been configured, and any transaction logs created on the active node of the CCR environment are replicated not only to the CCR passive node but also to the SCR target server NH-W2K3-SRV01. Thus, assuming the CCR environment is in the production datacenter, the SCR environment is in the backup datacenter, and the other required services also exist in the backup datacenter, a site-resilient solution has been created. These other required services include Active Directory domain controllers, Hub Transport servers, Client Access servers, DNS, and so on.

To simulate a failure of the production datacenter, and thus the production CCR environment, I will simply shut down both the active and passive nodes of the CCR environment, namely NH-W2K3-SRV03 and NH-W2K3-SRV04. At this point we need to recover the CMS, CCREX01, so that it runs on the standby cluster. There are quite a few steps to perform to achieve this, the first being to activate the storage group copies on the standby cluster via the Restore-StorageGroupCopy cmdlet. You will remember that when we first enabled the storage group copy, we specified the target server as NH-W2K3-SRV01, so in this example I am going to run the Restore-StorageGroupCopy cmdlets from this standby cluster node. The two cmdlets to run are:

Restore-StorageGroupCopy -Identity "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01 -Force

Restore-StorageGroupCopy -Identity "CCREX01\Second Storage Group" -StandbyMachine NH-W2K3-SRV01 -Force

One thing to note from the above cmdlets is the use of the -Force parameter. This is used when the SCR source, in this case the CCR environment consisting of NH-W2K3-SRV03 and NH-W2K3-SRV04, is no longer available, which will be the case if you have lost the production datacenter. If the original SCR source were still available, the -Force parameter would not be used, as any outstanding transaction logs would first be copied over from the SCR source. The results of running these cmdlets are shown in Figure 6.
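For comparison, if the CCR environment were still reachable, the equivalent cmdlet would simply omit the -Force parameter so that any outstanding logs are copied across before the storage group copy is activated, for example:

Restore-StorageGroupCopy -Identity "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01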

Figure 6: Restoring the Storage Groups to the Standby Cluster Node

Recover the CMS

Once the storage groups have been prepared for mounting via the Restore-StorageGroupCopy cmdlets, the next thing to do is to recover the CMS. This is achieved easily by running the Exchange 2007 setup.com program on the target server NH-W2K3-SRV01. The setup.com program has a special switch called /RecoverCMS, which requires that you specify the name of the CMS you are recovering as well as its IP address. One thing to remember here is that if you are recovering the CMS in a different datacenter, you will likely be specifying a new IP address for the CMS, since the disaster recovery datacenter will probably be on a different IP subnet. This is why, in the example below, a different IP address of 172.16.6.153 is used rather than the one originally owned by the CMS (172.16.6.80) when it ran on nodes NH-W2K3-SRV03 and NH-W2K3-SRV04. This is perfectly normal. The setup.com command to use in my example is:

setup.com /RecoverCMS /CMSName:CCREX01 /CMSIPAddress:172.16.6.153

In Figure 7, you can see the results of running the setup.com program.


Figure 7: Recovering the CMS

Once the recovery of the CMS has been performed, the databases can be mounted either via the Exchange Management Console (EMC) or via the EMS. Since we have been focusing on management shell cmdlets and command-line programs so far, let us continue down this route and use the management shell to mount the two databases that we have in this system. This can be achieved by using the Mount-Database cmdlet. Since we have two databases to mount, there are two cmdlets to run as follows:

Mount-Database -Identity "CCREX01\First Storage Group\Mailbox Database"

Mount-Database -Identity "CCREX01\Second Storage Group\Public Folder Database"

Assuming the databases mount correctly, the EMS prompt simply returns without any error messages. I then verified that I could access my mailbox as normal.
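If you would rather confirm the mounted state from the shell than rely on the silent return, the -Status parameter of Get-MailboxDatabase and Get-PublicFolderDatabase exposes a Mounted property; a quick check along these lines should suffice:

Get-MailboxDatabase -Identity "CCREX01\First Storage Group\Mailbox Database" -Status | Format-List Name,Mounted

Get-PublicFolderDatabase -Identity "CCREX01\Second Storage Group\Public Folder Database" -Status | Format-List Name,Mounted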

Re-create CCR Environment

We have now successfully recovered the CMS to NH-W2K3-SRV01 and mounted the databases, so everything is looking good so far. However, since the production datacenter had a CCR configuration, it is desirable for the backup datacenter to have a similar configuration, particularly if there are plans to operate out of this datacenter for some time. Therefore, we now have to ensure that the databases are seeded onto the other cluster node in the backup datacenter, namely NH-W2K3-SRV05. As a result of this process, the original standby cluster running in the backup datacenter will become a full CCR environment running the original CMS, CCREX01. You have already seen detailed information on the database seeding process earlier in this article, so I will not repeat it in full here.
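Briefly, the only difference worth noting is that the -StandbyMachine parameter is no longer used, because NH-W2K3-SRV05 is now the CCR passive node rather than an SCR target. Assuming any pre-existing database and log files have been cleared from the copy folders, running cmdlets along the following lines from NH-W2K3-SRV05 should suspend and then seed each storage group copy (the First Storage Group is shown here; the Second Storage Group is handled in the same way):

Suspend-StorageGroupCopy -Identity "CCREX01\First Storage Group"

Update-StorageGroupCopy -Identity "CCREX01\First Storage Group"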

As a final test of the configuration of the standby cluster called E2K7CLU02, it is prudent to ensure that the CMS and other cluster resources can be correctly moved between the two cluster nodes NH-W2K3-SRV01 and NH-W2K3-SRV05. Since the resources are currently running on NH-W2K3-SRV01, we need to move them to NH-W2K3-SRV05 and test for correct functionality. The default cluster group that contains resources such as the Majority Node Set can be moved easily by right-clicking the group called Cluster Group in Cluster Administrator and choosing Move from the context menu as shown in Figure 8.

Figure 8: Moving the Cluster Group
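If you prefer the command line, the same move can normally be performed with the cluster.exe utility; assuming the default group is still named Cluster Group, a command along these lines, run on one of the cluster nodes, should work:

cluster group "Cluster Group" /moveto:NH-W2K3-SRV05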

The CMS resources themselves have to be moved using the Move-ClusteredMailboxServer cmdlet. The full cmdlet to use is shown below, where the -TargetMachine parameter specifies the cluster node to which you want to move the resources, and the -MoveComment parameter specifies a reason for the move, which is recorded in the Application event log.

Move-ClusteredMailboxServer CCREX01 -TargetMachine NH-W2K3-SRV05 -MoveComment "Test move after CMS recovery"

The results of running this cmdlet should be that all CMS resources are taken offline, moved to NH-W2K3-SRV05 and then brought online again. At this point you need to test that access to the CMS is still possible and that you are confident that the CMS can safely operate on both cluster nodes of the standby CCR environment.
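A simple way to confirm where the CMS is now running, and that it is in a healthy state, is the Get-ClusteredMailboxServerStatus cmdlet, for example:

Get-ClusteredMailboxServerStatus -Identity CCREX01

The output includes the state of the CMS and the cluster nodes on which it can operate.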

Summary

In part two of this article we have successfully recovered the CMS so that it is now running in a different CCR environment. Of course, since the CMS has a new IP address, there will also be DNS updates to consider, but overall you should see that Microsoft has made plenty of progress in ensuring that moving a CMS is as painless as possible. In the final part of this article we will look at the steps required to move back to the production datacenter.
