Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 11)

If you would like to read the other parts of this article series please go to:

Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 1)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 2)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 3)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 4)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 5)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 6)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 7)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 8)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 9)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 10)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 12)
Planning, Deploying, and Testing an Exchange 2010 Site-Resilient Solution sized for a Medium Organization (Part 13)

Introduction

In part 10 of this multi-part article, we simulated a server level failure. That is, we failed EX01 so that a failover occurred on both the database and client access array level. We then had a look at how this affected the three most popular Exchange clients: Outlook 2007/2010, Outlook Web App (OWA) and Exchange ActiveSync devices.

In this part 11, we’ll simulate a Client Access server failure and see how the clients are affected by the resulting failover. Then we will simulate a failure of the database disks in EX01 and EX03, so that all databases fail over to a server in the failover datacenter, and see how this affects the Exchange clients.

Important:
The client access array level failovers described in this article may differ from the results you see during your own testing, as this depends heavily on the load balancer solution used in the respective Exchange 2010 environment. As mentioned earlier in this multi-part article, I use a load balancer solution based on LoadMaster devices from KEMP Technologies in each datacenter. The primary datacenter uses a physical LoadMaster device and the secondary datacenter uses two Hyper-V based virtual LoadMasters.

Simulating a Client Access Server Failure

Okay, let’s simulate a Client Access server level failure on server “EX01” in the primary datacenter. As can be seen in Figure 1, mailbox database 1 through 6 are currently active on this server.

Figure 1: Mailbox Database 1 through 6 active on EX01

When looking at the statistics page on our load balancer solution in the primary datacenter, we have several Outlook MAPI, Outlook Anywhere, Outlook Web App and Exchange ActiveSync client connections to the CAS array (Figure 2).

Figure 2: Current connections to the CAS array in the primary datacenter

The connections have been load balanced across EX01 (192.168.2.221) and EX03 (192.168.2.222) as shown in Figure 3.

Figure 3: Client Connections to CAS Array are Load Balanced across EX01 and EX03

Bear in mind that even though a user mailbox is located in a database that’s active on, let’s say, EX01, it doesn’t mean that the client opening this mailbox will make an RPC or SSL connection to that server. The connection can land on any server in the CAS array configured for the primary datacenter, depending on the persistence method used on the load balancer.

We can verify this using several methods. In this article, I’ll show you how to verify this using the “About” page in OWA. In OWA we can click on the question mark in the upper right corner and then “About” in the dropdown menu (Figure 4).

Figure 4: About option in Outlook Web App

The “About” page shows all kinds of useful information, such as which Exchange Client Access server in the CAS array OWA is connected to. It also shows the name of the mailbox server that holds the active copy of the database in which the mailbox is stored. Figure 5 shows the “About” page for a user who has an OWA session against EX01 and also has his mailbox in a database that is currently active on EX01.

Figure 5: User connected to OWA via EX01 and Mailbox stored in database currently active on EX01

Alrighty, it’s time to fail the CAS role related services on server “EX01”. This can be accomplished using several different methods. An easy way to do so is to stop the “Default Web Site” in the IIS Manager (Figure 6) as well as stopping the “Microsoft Exchange Address Book” and “Microsoft Exchange RPC Client Access” services (Figure 7).

Figure 6: Stopping the Default Web Site in the IIS Manager

Figure 7: Stopping the Address Book and RPC Client Access Services
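The same steps can also be performed from a PowerShell prompt on EX01 instead of using the GUI tools. The following is a minimal sketch, assuming the WebAdministration module that ships with IIS 7.5 is available on the server and that the two Exchange services use their default short names (MSExchangeAB and MSExchangeRPC).

```powershell
# Run locally on EX01 in an elevated PowerShell session.
# Load the IIS cmdlets (assumes IIS 7.5 / Windows Server 2008 R2).
Import-Module WebAdministration

# Stop the web site that hosts OWA, EAS, EWS, Autodiscover and the RPC proxy.
Stop-Website -Name "Default Web Site"

# Stop the Address Book and RPC Client Access services (default service names).
Stop-Service -Name MSExchangeAB, MSExchangeRPC

# Confirm the result before looking at the load balancer.
Get-Website -Name "Default Web Site"
Get-Service -Name MSExchangeAB, MSExchangeRPC
```

When the test is done, Start-Website and Start-Service with the same names should bring the CAS role on EX01 back.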

Switching over to the load balancer solution in the primary datacenter, we can see that although the virtual services (except one) are up, there’s now only one real server (target server) available for each virtual service, which is EX03.

Note:
Some of you might wonder why one of the virtual services is down. Well, this is because the load balancer has reverse SSL (SSL bridging) enabled. When using reverse SSL, it’s necessary to create a back-end virtual service for each real server (target server) so that the load balancer can inspect the content of the HTTPS packets. The back-end virtual service that belongs to EX01 has no other real server to fall back on, so it goes down together with EX01.

Figure 8: Current status of the virtual services on the load balancer after EX01 has been turned off

Client Behaviour

So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when a Client Access server in the CAS array fails and the connections are moved to EX03?

Outlook:
If an Outlook MAPI client has established a connection to EX01 in the CAS array, Outlook will disconnect and prompt the end user to enter his password when failing over and establishing a new session to EX03 in the CAS array. Outlook Anywhere clients will not be prompted for credentials when failed over to EX03. Put another way, an Outlook Anywhere user will not notice a failover to another CAS server in the CAS array.

Figure 9: End user using an Outlook MAPI client prompted for password after a failover on the client access level

Figure 10: Outlook Anywhere Clients stay connected

Outlook Web App:

End users with OWA sessions against EX01 will lose their existing SSL session (unlike Outlook Anywhere) and a new session needs to be established against EX03. This will result in the user being taken back to the FBA logon page as shown in Figure 11.

Figure 11: End user is taken back to the FBA logon page after a CAS array level failover

Exchange ActiveSync devices

Users with Exchange ActiveSync devices will not notice a CAS level failover in the primary datacenter no matter if they are connected to the CAS array via server “EX01” or “EX03”.

Simulating a Multiple Database Copies Failure

It’s time to take a look at how a failure of the disks storing the databases on server EX01 and EX03, which are both located in the primary datacenter, affects the clients.

As you should know by now, database 1 through 6 are active on server “EX01” and database 7 through 12 are active on server “EX03”. Both servers are located in the primary datacenter.

Figure 12: Databases active on EX01 and EX03

To force the databases to activate on a server in the failover datacenter, we’ll use the same approach as we did back in part 9 where we had the database disk in server “EX01” fail by taking it offline using the Disk Management tool in the Server Manager.

Figure 13: Taking the Database Disk Offline

When the database disk is offline in both EX01 and EX03, it’s no longer visible in Windows Explorer. Because of this, Exchange 2010, or more specifically Active Manager, will initiate a database failover to EX02 in the failover datacenter, as the database copies on this server have an activation preference set to “3”. Remember, though, that Active Manager doesn’t look only at the activation preference, but also at the content index state, copy queue length and replay queue length of any available passive database copies. This means that one or more of the databases could potentially be activated on EX04 instead.

Okay, in this example the database copies on EX02 are all in a healthy state, and as can be seen in Figure 14, all databases are now active on EX02. The copy status for the database copies on EX01 and EX03 is, as expected, “Failed and Suspended”, as the disks holding the databases are gone.

Figure 14: Databases have now been activated on EX02 and database copies on EX01 and EX03 are failed and suspended
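The copy status, queue lengths and content index state discussed above can also be checked from the Exchange Management Shell. This is a minimal sketch that assumes the four DAG members are named EX01 through EX04 as in this article; the column selection is just one reasonable choice.

```powershell
# List every database copy on each DAG member together with the values
# that matter when a copy is considered for activation.
"EX01", "EX02", "EX03", "EX04" | ForEach-Object {
    Get-MailboxDatabaseCopyStatus -Server $_
} | Sort-Object Name |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

After the disk failure you would expect the copies on EX01 and EX03 to show a status of FailedAndSuspended and the copies on EX02 to show Mounted.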

Note:
Let’s say you have 33 active databases on both EX01 and EX03. In this case you probably wouldn’t want to have all 66 databases activated on EX02, but rather have 33 activated on EX02 and 33 activated on EX04. This can be accomplished by configuring a limit for how many databases can be activated on a DAG member server. To set the limit you can use “Set-MailboxServer -Identity "EX02" -MaximumActiveDatabases 33”. Be careful when setting restrictions like this one. If you have lost the database disks in EX01 and EX03 and then also lose the disks in EX04, EX02 will not activate more than 33 databases, which leaves the remaining databases dismounted.
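As a sketch of the limit described in the note, the following sets the same cap on both servers in the failover datacenter and then lists the result. The value 33 and the server names come from the example above; adjust both to your own environment.

```powershell
# Cap the number of databases each failover datacenter server may have active.
Set-MailboxServer -Identity "EX02" -MaximumActiveDatabases 33
Set-MailboxServer -Identity "EX04" -MaximumActiveDatabases 33

# Verify the configured limit on all mailbox servers.
# An empty value means no limit has been set on that server.
Get-MailboxServer | Format-Table Name, MaximumActiveDatabases -AutoSize
```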

Client Behaviour

So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when the databases are activated on a DAG member server in the failover datacenter?

Outlook

Outlook MAPI clients will stay connected regardless of whether they are connected to the CAS array in the primary datacenter via EX01 or EX03. This is because the current RPC connections to the CAS array aren’t affected by a database level failover to the failover datacenter.

Figure 15: Outlook MAPI Clients stay connected

Outlook Anywhere (RPC over HTTP) clients connect using mail.exchangeonline.dk as the RPC Proxy endpoint and also have this same FQDN specified in the “msstd” box (remember we used the Set-OutlookProvider cmdlet in a previous part of this multi-part article). If we had Outlook 2003 clients, they would continue to connect to the original RPC Proxy endpoint (mail.exchangeonline.dk). This is because they do not support autodiscover. Outlook 2007 clients will receive new connection information (EWS URL and RPC Proxy endpoint) but will ignore the RPC Proxy endpoint URL (failover.exchangeonline.dk) they receive. Since the RPC Client Access service on EX01 and EX03 in the primary datacenter is still available, Outlook 2007 clients will be able to connect. Outlook 2010 will accept the new connection settings and connect to failover.exchangeonline.dk.

Figure 16: Failover.exchangeonline.dk URLs from autodiscover after database failover to failover datacenter

Figure 17: Failover.exchangeonline.dk URLs from autodiscover after database failover to failover datacenter

Since the “RpcClientAccessServer” attribute on the databases doesn’t change when a database failover or switchover (*over) occurs to another datacenter with another CAS array, you will also not see the Outlook client change the RPC endpoint.

Figure 18: RpcClientAccessServer attribute not updated after database failover to failover datacenter

Figure 19: Outlook will connect to FQDN of CAS array in Primary Datacenter after database failover to failover datacenter
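To see this for yourself, you can compare where each database is mounted with the CAS array it points to. A minimal sketch using the Exchange Management Shell; the -Status switch is included so that the MountedOnServer value is populated.

```powershell
# Show the server each database is currently mounted on next to the
# RpcClientAccessServer value stamped on the database.
Get-MailboxDatabase -Status |
    Sort-Object Name |
    Format-Table Name, MountedOnServer, RpcClientAccessServer -AutoSize
```

After the failover, MountedOnServer should point to a server in the failover datacenter while RpcClientAccessServer still holds the FQDN of the CAS array in the primary datacenter.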

Be aware that the WAN traffic between the two datacenters will increase significantly because of the cross-site RPC traffic between CAS and Mailbox servers.

Read more about these architectural changes in another multi-part article I wrote here on MSExchange.org.

Outlook Web App (OWA)

The external URL configured for OWA is mail.exchangeonline.dk. When a database failover to the failover datacenter has occurred, the CAS servers (EX01 and EX03) in the CAS array configured for the primary datacenter will tell the OWA client that it went to the wrong AD site and tell the user to instead go to failover.exchangeonline.dk.

When the database failover has completed and the OWA session is refreshed, the end user will be presented with the information shown in Figure 20.

Figure 20: Database failover to other datacenter initiates a redirect

When clicking “Connect”, the browser will redirect to the FQDN (failover.exchangeonline.dk) of the failover datacenter as shown in Figure 21.

Figure 21: User must enter his credentials again after a database failover to the failover datacenter

When the user enters his credentials and clicks “Sign in”, the user will connect to his mailbox again.

Figure 22: Access to mailbox via OWA restored

Exchange ActiveSync devices

The external URL configured for EAS is mail.exchangeonline.dk. When a database level failover to the failover datacenter has occurred, mobile devices will get an HTTP 451 from the CAS servers in the CAS array in the primary datacenter, and be told to instead use failover.exchangeonline.dk. So as long as the mobile device supports an HTTP 451 (redirect) which is the case for most of the newer devices, the user will be able to synchronize his mailbox with a mobile device after a database level failover to the failover datacenter.

Now that we have simulated a database level failure to the failover datacenter, let’s bring the disks in EX01 and EX03 back online and then update the database copies by right-clicking on each database copy on EX01 and EX03 followed by selecting “Update Database”. If you have many databases in the environment, I recommend you instead use the Update-MailboxDatabaseCopy cmdlet.
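As a rough sketch of the cmdlet-based approach, the loop below reseeds every copy on EX01 and EX03 that is still in the “Failed and Suspended” state. The filter on the status value is my own assumption about how you would want to select the copies; review the list before running the update in production.

```powershell
# Reseed all failed and suspended database copies on the two
# primary datacenter servers once their disks are back online.
foreach ($server in "EX01", "EX03") {
    Get-MailboxDatabaseCopyStatus -Server $server |
        Where-Object { $_.Status -eq "FailedAndSuspended" } |
        ForEach-Object {
            # $_.Name has the form "<database>\<server>", which is the
            # identity format the cmdlet expects.
            Update-MailboxDatabaseCopy -Identity $_.Name
        }
}
```

If the old database and log files are still present on the disk, the update may refuse to start; in that case the -DeleteExistingFiles switch of Update-MailboxDatabaseCopy can be added, at the cost of a full reseed.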

When the database copies have been updated, you can redistribute the active mailbox databases across EX01 and EX03 using the RedistributeActiveDatabases script I showed you back in part 7 of this multi-part article.

Figure 23: Redistributing Active Mailbox Databases across EX01 and EX03
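For reference, a typical invocation of the script looks something like the sketch below. The DAG name “DAG01” is a placeholder and not taken from this article; replace it with the name of your own DAG.

```powershell
# The script ships with Exchange 2010 SP1 in the Scripts folder,
# which the Exchange Management Shell exposes as $exscripts.
cd $exscripts

# "DAG01" is a hypothetical DAG name used for illustration only.
.\RedistributeActiveDatabases.ps1 -DagName "DAG01" -BalanceDbsByActivationPreference -ShowFinalDatabaseDistribution
```

Balancing by activation preference moves each database back to the copy with the lowest activation preference number, which in this setup means the servers in the primary datacenter.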

As you can see from the above, a database level failover to a DAG member server in the failover datacenter is fully automatic and almost invisible to end users. We did not go through a database level switchover in this article, but it has the same effect on the end user clients as a failover.

We have now reached the end of part 11. See you in the near future.

About The Author

Henrik Walther
