Deploying Exchange 2007 Multi-site CCR Clusters – Do’s and Don’ts (part 3)

If you would like to be notified of when Henrik Walther releases the next part in this article series please sign up to our MSExchange.org Real-Time Article Update newsletter.

 


Introduction

 

In part 2 of this four-part article series, we took a look at stretched Active Directory site strategies as well as the recommended network latency and heartbeat timeout values.

 

In this part 3, we will continue where we left off in part 2. We will look at transport dumpster strategies as well as placement and configuration of the file-share witness.

 

Transport Dumpster Strategy

 

If disaster strikes the primary datacenter, all servers there become unavailable, and so do the messages held in the transport dumpster of the Hub Transport (HT) servers in that datacenter. Obviously, those messages cannot be re-submitted to the clustered mailbox server (CMS) after a (lossy) failover to the passive cluster node in the backup datacenter. Depending on how many log files were lost for each storage group during the failover, data loss will occur, and end-user complaints are the likely result. Note, though, that it may be possible to move an HT server’s queue to an HT server in the backup datacenter and have the content of the queue re-submitted. Timing is all-important here: to have any messages held in the transport dumpster re-submitted, the queues must be moved before the cluster submits the transport dumpster flush. Otherwise, the messages in the transport dumpster are lost and cannot be re-submitted.

 

For more information on how to move message queues to another HT server, take a look at the following link.

 

Placement of the File Share Witness

 

A CCR-based cluster uses the Node Majority with File Share Witness (FSW) quorum model. This basically means that although a CCR cluster has only two cluster nodes, there is a third voter, referred to as the file share witness. This is typically a share on an HT server in the same AD site as the CCR cluster nodes. In this type of quorum model, two nodes alone are not enough to sustain the failure of any cluster node: there must be at least three voting devices, so that a majority remains when one of them is lost. The FSW acts as the third device in a two-node cluster, which means that this type of cluster can sustain the failure of a single cluster node. In addition, the FSW protects against the cluster “split brain” syndrome and a problem known as a “partition in time”.

In a multi-site deployment, this means that the FSW must be available when a failover occurs to the passive node in the backup datacenter. Otherwise, you cannot bring the CMS online until you have created a new FSW on an HT server in the backup datacenter and re-configured the Windows failover cluster to point to this HT server. This complicates the failover process significantly. If possible, another solution would be to place the FSW in a third datacenter. This does not mean that you must deploy an additional Exchange 2007 HT server in that third datacenter; using a Hub Transport server as the FSW is just a best practice recommendation when you deploy both CCR nodes in the same datacenter. For instance, you could use a file server in that third datacenter if you like.
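To make the voting arithmetic concrete, here is a minimal Python sketch of the majority rule described above. It is an illustration only, not how the Cluster service is implemented: the voter names (node-a, node-b, fsw), the site labels and the helper functions are all hypothetical, and the model only considers the loss of a whole datacenter, not network partitions or a forced quorum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    """A quorum voter: a cluster node or the file share witness (hypothetical model)."""
    name: str
    site: str


def has_quorum(voters, failed_sites):
    """Return True if a strict majority of all voters is in a surviving site."""
    alive = [v for v in voters if v.site not in failed_sites]
    return len(alive) > len(voters) // 2


def ccr_voters(fsw_site):
    """Two CCR nodes, one per datacenter, plus an FSW placed in fsw_site."""
    return [
        Voter("node-a", "primary"),
        Voter("node-b", "backup"),
        Voter("fsw", fsw_site),
    ]


if __name__ == "__main__":
    # Compare the three possible FSW placements against the loss of each datacenter.
    for fsw_site in ("primary", "backup", "third"):
        for lost in ("primary", "backup"):
            ok = has_quorum(ccr_voters(fsw_site), {lost})
            print(f"FSW in {fsw_site:7} | {lost:7} datacenter lost | quorum kept: {ok}")
```

Under these assumptions, only the placement in a third datacenter keeps a majority regardless of which of the two main datacenters is lost, which is why that placement avoids the manual steps described in the rest of this section.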

 

You may have heard that using a CNAME record to point to the FSW is a good idea, since this makes it much simpler to point a CCR cluster to another FSW. All that needs to be done is to have the FSW folder share pre-created with appropriate permissions and then update the CNAME record after the disaster has hit the primary datacenter. But even though Microsoft used to support this method, the guidance was revised (read more in this blog post). Today, the recommended method for re-provisioning an FSW share on another server is to use the Cluster service’s built-in “force quorum” capabilities.

 

So when the FSW is located in the primary datacenter and a disaster takes down all servers in the primary datacenter, the failover to the passive node in the backup datacenter will not happen automatically. This is because two of the three votes must be available when a failover occurs in a majority node set (MNS) based cluster. To bring the cluster resources online on the passive node, first create a new FSW share on an HT server in the backup datacenter (if you have not pre-created it, that is) and follow these instructions. Then open a command prompt on the cluster node in the backup datacenter and type:

 

NET START CLUSSVC /forcequorum

 

This will force the cluster service to start on this cluster node. Now, open the Failover Cluster console and select the cluster name in the left pane. Then click More Actions in the Action pane and choose Configure Cluster Quorum Settings in the context menu as shown in Figure 1.

 

Note:
In Windows Server 2003, the /forcequorum switch was a maintenance mode switch. Basically, we told Windows to get the cluster service started. With Windows Server 2008, this is no longer so. When using the /forcequorum switch with Windows Server 2008 Failover Clusters, you are telling the cluster that the configuration on this node is now the master. This, in turn, means that when the other cluster node comes back up and converges with the cluster, it will replicate the configuration information back from this node. This is an important change!

 


Figure 1: Selecting Configure Cluster Quorum Settings

 

Click Next.

 


Figure 2: Configure Cluster Quorum Wizard’s welcome page

 

Select Node and File Share Majority (for clusters with special configurations) as shown in Figure 3 and click Next.

 


Figure 3: Choosing the right quorum model

 

Click the Browse button and enter the name of the HT server on which you created the FSW share. Click Show Shared Folders (Figure 4) and select the FSW share. Click OK.

 


Figure 4: Selecting the new FSW share

 

Click Next twice, then click Finish.

 


Figure 5: Cluster Resources online on cluster node in the backup datacenter

 

When the primary datacenter is up again, re-create the FSW share on the HT server that originally hosted it. Then follow the steps above again to point the cluster to this FSW share.

 

When all involved servers in the primary datacenter are online again, you can move the CMS back to the cluster node in the primary datacenter.

 

Note:
Since Exchange 2007 RTM’d, I have received many questions regarding the FSW. One of them was whether it is possible to use the FSW in combination with DFS, so that you could have multiple servers acting as FSWs. Although it’s an interesting idea, this is unsupported territory.

 

We have reached the end of part 3, but fear not: part 4 will be released soon. Stay tuned!

 

 
