Clustered Continuous Replication Failover with Standby Continuous Replication (Part 1)

Arguably the most important feature of Exchange 2007 Service Pack 1 is Standby Continuous Replication (SCR). In a nutshell, SCR replicates Exchange database information from your production servers to a standby server that can be brought online should the production servers be lost. Although existing Exchange 2007 technologies such as Clustered Continuous Replication (CCR) offer high availability, site resilience is currently best achieved via SCR. This is because CCR can be problematic to implement across datacenters that have different IP subnets: when Windows 2003 is the operating system, the members of the CCR cluster must be in the same subnet. Although the networking team can sometimes address this requirement, many organizations are choosing to implement SCR in the backup datacenter and to activate the SCR servers manually in the event of a disaster at the production datacenter. Quite often, manual intervention to bring up the Exchange system at the backup datacenter is preferred over an automated process.

In this three-part article, we are going to look at the process of implementing SCR between two CCR environments. My motivation was to work out the outline procedure for moving a Clustered Mailbox Server (CMS) from one CCR environment to another and then back again. In the real world these CCR environments would be in separate datacenters, but for the purposes of this article all servers are virtual servers configured on the same network. For clarity, I will use the terms production datacenter and backup datacenter to indicate which CCR environment we are dealing with at the time. In this article we will go through the process of:

  • Enabling SCR between the two CCR environments. Strictly speaking, the CCR environment in the backup datacenter is a standby cluster: a pair of passive nodes ready to take ownership of and run the CMS.
  • Simulating the loss of the production CCR environment and therefore producing the need to bring the CMS up on the standby cluster in the backup datacenter.
  • Moving the CMS back to the CCR environment in the production datacenter once this datacenter is available again.

Server Configuration

Let us have a look at the five servers that I have in my virtual environment that will be used to construct and test the SCR scenario. They are:

  • NH-W2K3-SRV02, a combined domain controller, Client Access Server and Hub Transport server.
  • NH-W2K3-SRV03, initially the active node of the production CCR environment.
  • NH-W2K3-SRV04, initially the passive node of the production CCR environment.
  • NH-W2K3-SRV01, the first passive node of a standby cluster.
  • NH-W2K3-SRV05, the second passive node of a standby cluster.

Since the server names are numbered sequentially, it would have been tidier not to have NH-W2K3-SRV01 and NH-W2K3-SRV05 as the standby cluster. Unfortunately, I had already built the existing CCR environment up to server NH-W2K3-SRV04 and did not want to reinstall the entire environment. In fact, NH-W2K3-SRV01 used to be an Edge Transport server, which is why the combined domain controller, Client Access Server and Hub Transport server is NH-W2K3-SRV02.

There are some other important names to identify:

  • The actual production cluster name is E2K7CLU01.
  • The standby cluster name for the backup datacenter is E2K7CLU02.
  • The CMS name is CCREX01. This is the name that the Outlook clients actually connect to.

You will note that there is only a single domain controller, Hub Transport server and Client Access Server within this setup. In the real world, the backup datacenter would contain additional domain controllers, Hub Transport servers and Client Access servers that would automatically be used by the CCR environment at the backup datacenter. As the focus of this article is about the recovery of the CMS to a new CCR environment using SCR, I shall be using the same domain controller, Hub Transport Server and Client Access Server for both the production and backup CCR environments. This keeps things simple for this article but of course in any real site resilience situation these additional servers should be considered.

One additional thing to note with this article is that all servers are running Windows 2003 and therefore the steps in this article relate to Windows 2003 and not Windows 2008. There are several different steps required if your servers are running on Windows 2008 that will not be included in this article. Maybe that will be the topic of a future article here on msexchange.org as Windows 2008 starts to be deployed.

Standby Cluster Installation

As I have already alluded to within this article, it is important to note the difference between the production CCR environment and the standby cluster in the backup datacenter. The production CCR environment is installed as detailed in Henrik Walther’s article, Installing, Configuring and Testing an Exchange 2007 CCR Based Mailbox Server on MSExchange.org. The standby cluster is installed slightly differently, since it is not designed to run a CMS from the outset. Broadly speaking, the main difference is that instead of installing the Active Clustered Mailbox Role on one cluster node and the Passive Clustered Mailbox Role on the other cluster node as is the case with CCR, both standby cluster nodes will be installed with the Passive Clustered Mailbox Role only. Therefore, in my example network, servers NH-W2K3-SRV01 and NH-W2K3-SRV05 are both configured with the Passive Clustered Mailbox Server role. This selection is made during the Exchange 2007 setup routine as you can see from Figure 1.


Figure 1: Passive Clustered Mailbox Server Installation

Because SCR will be used, one key consideration when installing the standby cluster is that the paths for the database and log files must be the same on the SCR source and SCR target machines. In other words, if the CCR environment is configured to place all database files into E:\Databases, then the databases on the standby cluster nodes will also be located in E:\Databases when SCR is enabled.
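Before enabling SCR, it can help to record the paths in use on the source so that the same drive letters and folders can be confirmed on the standby nodes. The following is a minimal sketch run from the Exchange Management Shell, assuming the CMS name CCREX01 used in this article; the choice of properties to display is mine and not part of the original procedure.

```powershell
# List the log and system file paths for each storage group on the source CMS.
Get-StorageGroup -Server CCREX01 |
    Format-List Name,LogFolderPath,SystemFolderPath

# List the database file path for each mailbox database on the source CMS.
Get-MailboxDatabase -Server CCREX01 |
    Format-List Name,EdbFilePath
```

Each path reported here needs a matching volume on NH-W2K3-SRV01 and NH-W2K3-SRV05.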

Activate SCR

Since this is an article on using SCR to achieve a failover between two CCR environments, the first thing to do is to enable SCR for both storage groups on the CMS. This is done using the Enable-StorageGroupCopy cmdlet, which has the important -StandbyMachine parameter. Since the SCR target is a standby cluster consisting of two passive nodes, either node can be specified in the -StandbyMachine parameter; that node will ultimately become the active node running the CMS when it is recovered later. In this article, I am going to choose NH-W2K3-SRV01 as the SCR target server. The Enable-StorageGroupCopy cmdlet has also been updated to include the -ReplayLagTime parameter, which specifies how long to wait before the log files that have been replicated to the SCR target are actually replayed into the database. This is useful in situations such as logical corruption of the databases in the production CCR environment, since it gives you time to ensure that the corruption does not make its way into the databases on the SCR target server. By default, the value for the -ReplayLagTime parameter is 1 day, so I am going to override this in the test environment and configure a value of 0. The value is given in the format days.hours:minutes:seconds. The cmdlets required to enable SCR for both storage groups are therefore as follows:

Enable-StorageGroupCopy "CCREX01\First Storage Group" -StandbyMachine NH-W2K3-SRV01 -ReplayLagTime 0.0:0:0

Enable-StorageGroupCopy "CCREX01\Second Storage Group" -StandbyMachine NH-W2K3-SRV01 -ReplayLagTime 0.0:0:0

The running of these cmdlets is shown in Figure 2.

Figure 2: Enable-StorageGroupCopy cmdlets
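Once SCR has been enabled, one way to confirm that log files are reaching the target is to query the copy status for each storage group against the standby machine. This is a sketch rather than a step from the original walkthrough: it assumes the same CMS and target names used above, and the $standby variable and the selection of properties are illustrative choices.

```powershell
# SCR target chosen earlier in this article.
$standby = "NH-W2K3-SRV01"

# Query the SCR copy status of every storage group on the CMS
# and show the health summary plus the copy and replay queues.
Get-StorageGroup -Server CCREX01 | ForEach-Object {
    Get-StorageGroupCopyStatus -Identity $_.Identity -StandbyMachine $standby
} | Format-List Name,SummaryCopyStatus,CopyQueueLength,ReplayQueueLength
```

Note that even with -ReplayLagTime set to 0, a non-zero ReplayQueueLength is to be expected, because SCR holds back a fixed number of the most recent log files (50, as I understand it) before replaying them on the target.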

Once the above cmdlets have been executed, a copy of the two storage groups is created on the target machine NH-W2K3-SRV01. In Figure 3 below, you can see the contents of the First Storage Group folder on NH-W2K3-SRV01. Since in my lab environment I have chosen to keep the database and transaction log files in the same folder location, you may notice that there is at least one key file missing from this folder: the actual database file. There is a reason for this which will be explained in part two of this article.


Figure 3: Contents of the First Storage Group Folder

Summary

In part one of this article we have looked at the initial steps required to enable SCR from a source CCR environment to a target standby cluster. In part two of this article, we will look at the steps required to move the CMS from the production datacenter to the backup datacenter.
