Designing a Site Resilient Exchange 2010 Solution (Part 2)

Introduction

In part 1 of this article series, we had a look at how the high availability and site resilience story looked in Exchange 2007. From there I moved on and described the first Exchange 2010 site resilient scenario, which was an active/passive datacenter model where the same namespace was used in both datacenters. I also explained how a database *over (a switchover or failover) and a complete site failover affect Outlook Anywhere client versions, Exchange ActiveSync (EAS), and Outlook Web Access (OWA).

In this part 2, I’ll describe scenario 2, which is an active/passive datacenter model with a different namespace for each datacenter.

Credits:
I would like to give a special thanks to Greg Taylor and Ross Smith IV, both Senior Program Managers on the Exchange Customer Experience team in the Exchange Product group at Microsoft. Both have provided me with invaluable information around Client Access Server (CAS) and Database Availability Group (DAG) high availability and site resiliency over the last couple of years. Without their help this multi-part article wouldn’t exist. Thanks guys!

Scenario 2: Active/Passive model with different namespaces

As I mentioned earlier, the datacenter model we’re going to delve into in part 2 is, just like the one I talked about in part 1, an active/passive model, but this time we will use a different namespace for each datacenter. Using a different namespace for each datacenter makes it a little more complex to design a resilient Exchange 2010 solution consisting of two datacenters compared to the single-namespace model described in part 1.

The model is depicted in the following illustration.

Figure 1: Scenario 2: Active/Passive model

When using an active/passive model with different namespaces, you only have active users (active database copies) in one datacenter and use a unique namespace (for instance mail.domain.com and failover.domain.com) in each datacenter. It’s important to note that from a user distribution perspective the failover datacenter is passive, but from a namespace perspective both datacenters are active.

Unless you have LAN quality communication between the two datacenters, it’s recommended to use a separate AD site for each datacenter instead of spanning a single AD site. To avoid unnecessary broadcast traffic etc. between the datacenters, it’s also recommended to use different subnets for each. We don’t have LAN quality communication between the two datacenters in this specific scenario so we have an AD site in each datacenter.

Client Access Server Infrastructure

The scenario depicted in Figure 1 includes 2 CAS servers in each datacenter. A redundant hardware load balancer and an RPC Client Access array have been deployed in each datacenter and are used to distribute client traffic evenly between the CAS servers.

As you can also see in the figure, the same SAN certificate is installed on all four CAS servers. The SAN certificate holds the following FQDNs:

Mail.exchangelabs.dk (certificate principal name)
Failover.exchangelabs.dk
Autodiscover.exchangelabs.dk

Note:
Using the same certificate on all CAS servers keeps the certificate costs down, but there’s also another benefit of using the same SAN certificate versus using a different certificate (with a unique certificate principal name) for each datacenter. I’ll talk about this a little later.

All internal and external web service URLs (OWA, ECP, EWS, OAB) on the CAS servers in the primary datacenter point to mail.exchangelabs.dk, which resolves to the virtual IP address (VIP) of the load balancer in this datacenter. The internal and external web service URLs on the CAS servers in the failover datacenter point to failover.exchangelabs.dk, which in turn resolves to the VIP of the load balancer in that datacenter.
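As a rough sketch, the URLs could be set from the Exchange Management Shell along these lines. The server names (DC1-CAS01 and so on) and the Set-NamespaceUrl helper are made up for illustration, and the virtual directory paths assume the Exchange 2010 defaults, so adjust them to your own environment.

```powershell
# Hypothetical CAS server names; replace them with your own.
$primaryCas  = "DC1-CAS01", "DC1-CAS02"
$failoverCas = "DC2-CAS01", "DC2-CAS02"

# Illustrative helper (not a built-in cmdlet): points the internal and
# external URLs of the web services on each server at one namespace.
function Set-NamespaceUrl {
    param(
        [string[]]$Server,
        [string]$Namespace
    )
    foreach ($cas in $Server) {
        Get-OwaVirtualDirectory -Server $cas |
            Set-OwaVirtualDirectory -InternalUrl "https://$Namespace/owa" -ExternalUrl "https://$Namespace/owa"
        Get-EcpVirtualDirectory -Server $cas |
            Set-EcpVirtualDirectory -InternalUrl "https://$Namespace/ecp" -ExternalUrl "https://$Namespace/ecp"
        Get-WebServicesVirtualDirectory -Server $cas |
            Set-WebServicesVirtualDirectory -InternalUrl "https://$Namespace/EWS/Exchange.asmx" -ExternalUrl "https://$Namespace/EWS/Exchange.asmx"
        Get-OabVirtualDirectory -Server $cas |
            Set-OabVirtualDirectory -InternalUrl "https://$Namespace/OAB" -ExternalUrl "https://$Namespace/OAB"
    }
}

# Primary datacenter uses mail.exchangelabs.dk, failover datacenter uses failover.exchangelabs.dk.
Set-NamespaceUrl -Server $primaryCas  -Namespace "mail.exchangelabs.dk"
Set-NamespaceUrl -Server $failoverCas -Namespace "failover.exchangelabs.dk"
```

Running the matching Get- cmdlets afterwards (for example Get-OwaVirtualDirectory piped to Format-List Server, InternalUrl, ExternalUrl) is a simple way to confirm that each datacenter ended up with its own namespace.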

Since we typically only have active users connecting to the primary datacenter (at least the majority of the users connect to this datacenter unless a site failover occurs), the autodiscover record (autodiscover.exchangelabs.dk) in external DNS points to the load balancer in the primary datacenter. The internal “AutoDiscoverServiceInternalUri” on the CAS servers in the primary datacenter has been configured with a value of https://mail.exchangelabs.dk/autodiscover/autodiscover.xml, and the same goes for the CAS servers in the failover datacenter. Now you could choose to point the “AutoDiscoverServiceInternalUri” on the CAS servers in the failover datacenter at “https://failover.exchangelabs.dk/autodiscover/autodiscover.xml”, but you can easily end up in a situation where the service connection points (SCPs) aren’t reachable during a site failover. Also, the cross-site traffic caused by Autodiscover has only a minor impact on the WAN link since autodiscover requests consist of small XML-based text files.
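A minimal sketch of that Autodiscover configuration, assuming the four CAS servers in this scenario are the only CAS servers in the organization (otherwise filter the list before piping it on):

```powershell
# Point the Autodiscover SCP of every CAS server at the primary namespace.
Get-ClientAccessServer |
    Set-ClientAccessServer -AutoDiscoverServiceInternalUri "https://mail.exchangelabs.dk/autodiscover/autodiscover.xml"

# Confirm that all servers now publish the same URI.
Get-ClientAccessServer | Format-Table Name, AutoDiscoverServiceInternalUri -AutoSize
```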

The RPC Client Access arrays have been configured with different values. This is because you can only have one RPC Client Access array per AD site, and when you have multiple RPC CA arrays they cannot be configured with the same FQDN. So an RPC CA array has been created for each datacenter, named outlook-1.exchangelabs.dk and outlook-2.exchangelabs.dk respectively. The FQDN of each RPC CA array is only used by internal Outlook clients and doesn’t need to be included on the SAN list in the certificate, as RPC traffic doesn’t use or require a certificate for it to be encrypted.
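Creating the two arrays could look roughly like this. The AD site names “Datacenter-1” and “Datacenter-2” are assumptions for the example, since the site names aren’t given in this article, and each FQDN must also have a matching record in internal DNS pointing at the load balancer VIP in its datacenter.

```powershell
# One RPC Client Access array per AD site (site names are hypothetical).
New-ClientAccessArray -Name "outlook-1.exchangelabs.dk" -Fqdn "outlook-1.exchangelabs.dk" -Site "Datacenter-1"
New-ClientAccessArray -Name "outlook-2.exchangelabs.dk" -Fqdn "outlook-2.exchangelabs.dk" -Site "Datacenter-2"

# List the arrays to verify the FQDN and site of each.
Get-ClientAccessArray | Format-Table Name, Fqdn, Site -AutoSize
```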

As I mentioned earlier, we use the same SAN certificate on all CAS servers, which means that the certificate principal name (mail.exchangelabs.dk) is the same for all CAS servers. By default this will break Outlook connectivity when either a database *over or a site failover occurs. This is because when Outlook Anywhere is enabled for a CAS server, the FQDN (Outlook Proxy Endpoint) specified will be used as the “msstd” value as well (Figure 2). If the “msstd” value doesn’t match the certificate principal name, Outlook Anywhere clients will not be able to connect.

Figure 2: Default Exchange Proxy Settings

Then why not use a different certificate for each datacenter, where the certificate principal name is mail.exchangelabs.dk in datacenter 1 and failover.exchangelabs.dk in datacenter 2? Wouldn’t this fix the Outlook connectivity problem that occurs during cross-site *overs and site failovers? Nope, unfortunately it won’t, or at least not for all Outlook client versions.

However, as long as you use the approach described in this article, Outlook clients will connect just fine. What we do here is use the same certificate in both datacenters. In addition, we configure the “msstd” value to be the same in both datacenters. In order to do so, we can use the “Set-OutlookProvider” cmdlet with the “CertPrincipalName” parameter, so that the “msstd” value is configured identically for CAS servers in both datacenters. In this example, we would use the following command:

Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.exchangelabs.dk

The Outlook Provider setting is global, so the command only needs to be run once in the organization.
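To check that the change has taken effect, something along these lines should be enough. It only reads the setting back, so it can be run at any time:

```powershell
# Show the certificate principal name handed out to Outlook Anywhere clients.
# The value set in this scenario is msstd:mail.exchangelabs.dk.
Get-OutlookProvider -Identity EXPR | Format-List Name, CertPrincipalName
```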

Note:
Some of you might wonder why the RPC CA array isn’t named the same as the external namespace (mail.domain.com) so that complexity is reduced a little. There’s a very good reason why it isn’t. If you use the same FQDN, Outlook Anywhere clients will experience a delay of approximately 30 seconds every time they connect, since Outlook by default tries to connect using TCP/IP before trying HTTP. Said another way: don’t name the RPC CA array the same as the external namespace or anything else that’s resolvable in external DNS.

Hub Transport Infrastructure

The scenario depicted in Figure 1 includes 2 Hub Transport (HT) servers in each datacenter. Traffic coming from external SMTP servers and from internal LOB applications goes to the hardware load balancer which distributes it evenly among the HT servers. Inbound messages only go to the primary datacenter. In case of a site failover to the failover datacenter, inbound mail flow is routed to the failover datacenter.

There’s often a lot of confusion around whether it’s supported to load balance SMTP traffic going to Exchange 2007/2010 Hub Transport servers. The answer is in the details. It’s supported to load balance HT servers using a hardware load balancer (HLB) or Windows Network Load Balancing (WNLB), but it isn’t supported to load balance connections between HT servers on your internal corporate production network using an HLB or WNLB. You may only load balance inbound SMTP connections from applications (such as LOB applications, MOSS, and SCOM 2007) and other non-Exchange sources, as well as client connections. Steps on how to do this can be found in this previous article of mine.

Database Availability Group Design

The scenario depicted in Figure 1 includes 2 Mailbox servers in each datacenter. Only a single DAG is used and the DAG is stretched between the datacenters. We only have active database copies in the primary datacenter and each database has three copies.

Because we have an even number of DAG members, the witness server is configured in the primary datacenter. The reason is that if the network between the two datacenters fails, the mailbox databases will stay mounted in the primary datacenter since it holds the majority (2 DAG members plus a witness = 3 votes, versus 2 DAG members = 2 votes in the failover datacenter).
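The witness placement and the vote arithmetic can be sketched as follows. The DAG name (DAG1), the witness server (DC1-HT01) and the witness directory are hypothetical names picked for the example and aren’t taken from the scenario.

```powershell
# Place the witness in the primary datacenter (all three names are hypothetical).
Set-DatabaseAvailabilityGroup -Identity "DAG1" -WitnessServer "DC1-HT01" -WitnessDirectory "C:\DAG1FSW"
Get-DatabaseAvailabilityGroup -Identity "DAG1" -Status | Format-List Name, WitnessServer, WitnessDirectory

# Vote arithmetic for this scenario if the WAN link between the datacenters fails.
$primaryVotes  = 2 + 1      # two DAG members plus the witness
$failoverVotes = 2          # two DAG members
$totalVotes    = $primaryVotes + $failoverVotes
$majority      = [math]::Floor($totalVotes / 2) + 1    # 3 of 5 votes

"Primary datacenter keeps quorum:  $($primaryVotes -ge $majority)"
"Failover datacenter keeps quorum: $($failoverVotes -ge $majority)"
```

With three of the five votes, only the primary datacenter reaches a majority on its own, which is why the databases stay mounted there.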

The “RpcClientAccessServer” value for all mailbox databases has been configured to: outlook-1.exchangelabs.dk.
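As a small sketch, and assuming every mailbox database in the organization belongs to this DAG, the property could be set and verified like this:

```powershell
# Point all mailbox databases at the RPC CA array in the primary datacenter.
Get-MailboxDatabase |
    Set-MailboxDatabase -RpcClientAccessServer "outlook-1.exchangelabs.dk"

# Verify the value on each database.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize
```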

Database switch & fail-overs

When an active database copy is moved from one DAG member server in the primary datacenter to another DAG member server in the failover datacenter, the “RpcClientAccessServer” property configured on the mailbox database isn’t changed. In addition, as long as the CAS servers are available in the primary datacenter, clients will continue to connect to the CAS servers in this datacenter and the CAS servers will then connect directly to the DAG member servers in the failover datacenter hosting the active database copies using RPC.
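This behavior is easy to observe in a lab. In the sketch below, DB1 and DC2-MBX01 are hypothetical names for a database and a DAG member in the failover datacenter:

```powershell
# Activate the copy of DB1 hosted in the failover datacenter.
Move-ActiveMailboxDatabase "DB1" -ActivateOnServer "DC2-MBX01" -Confirm:$false

# Check where the copies are now mounted and healthy.
Get-MailboxDatabaseCopyStatus -Identity "DB1" | Format-Table Name, Status -AutoSize

# The Server value follows the active copy, but RpcClientAccessServer
# still shows the value it had before the move.
Get-MailboxDatabase "DB1" | Format-List Name, Server, RpcClientAccessServer
```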

How does this affect the miscellaneous Exchange clients?

Outlook – Outlook Anywhere connects using mail.exchangelabs.dk as the RPC Proxy Endpoint and also has this FQDN specified in the “msstd” box. Outlook 2003 clients will continue to connect to the original RPC Proxy Endpoint (mail.exchangelabs.dk), since they don’t support the autodiscover service. Outlook 2007 clients will receive new connection information (EWS URLs and RPC proxy endpoint) but will ignore the RPC proxy endpoint URL (failover.exchangelabs.dk) they receive; since the CAS servers in the primary datacenter are still available, Outlook 2007 clients will still be able to connect. Outlook 2010 will accept the new connection settings and connect to failover.exchangelabs.dk. Also be aware that the WAN traffic between the two datacenters will increase significantly because of the cross-site RPC traffic between CAS and Mailbox servers.
Mobile devices (EAS) – The external URL configured for EAS is mail.exchangelabs.dk. When an *over has occurred, mobile devices will get an HTTP 451 from the CAS servers in the primary datacenter, and be told to use failover.exchangelabs.dk instead. So as long as the mobile device supports an HTTP 451 (redirect), things will be fine.
OWA – The external URL configured for OWA is mail.exchangelabs.dk. When an *over has occurred, the CAS server will tell the OWA client that it went to the wrong site and tell the user to go to failover.exchangelabs.dk. So things are fine for OWA clients as well.

As you can see, the way an *over affects Exchange clients in this scenario is acceptable, although each Outlook client version reacts differently to an *over. By the way, the Outlook 2007 issue described above will be fixed via an Outlook update in the near future.

Complete Site failover

If the primary datacenter is destroyed or for some other reason is unavailable, and a site failover to the failover datacenter is performed by repointing DNS to this datacenter, how will this affect Exchange clients?

Outlook – Outlook Anywhere connects using mail.exchangelabs.dk as the RPC Proxy Endpoint and also has this FQDN specified in the “msstd” box. Outlook 2003/2007/2010 will connect just fine when DNS has been repointed to the other datacenter since the certificate principal name matches the msstd value. The FQDN of the RPC Client Access array changes, but that doesn’t affect Outlook clients as long as the RpcClientAccessServer property on the mailbox databases isn’t changed, which it normally shouldn’t be during a site failover in this scenario (changing it will create other problems that you don’t want to deal with during a site failover).
Mobile devices (EAS) – The external URL configured for EAS is mail.exchangelabs.dk. Mobile devices will connect just fine when DNS has been repointed to the other datacenter since failover.exchangelabs.dk is included on the SAN list of the certificate. Keep in mind though that some old devices expect the certificate principal name to match the external URL specified for Exchange ActiveSync.
OWA – The external URL configured for OWA is mail.exchangelabs.dk. OWA clients will connect just fine when DNS has been repointed to the other datacenter since failover.exchangelabs.dk is included on the SAN list of the certificate.

One important thing to keep in mind during a complete site failover is DNS delays. DNS updates can take from minutes to several hours to reach clients, depending on the topology and the TTL values specified for the DNS records used by Exchange. To reduce the delays, it’s important that you configure the internal and external DNS records used for Exchange with a low TTL value (five minutes is a good best practice).
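One quick way to see which TTL clients are actually being handed is to query the records with nslookup in debug mode. This is only a sketch: the output format varies a little between Windows versions, and a caching DNS server reports the remaining TTL rather than the configured one.

```powershell
# The debug output of nslookup includes a "ttl = ..." line for each answer record.
# A five-minute TTL corresponds to 300 seconds.
nslookup -debug mail.exchangelabs.dk | Select-String "ttl"
nslookup -debug autodiscover.exchangelabs.dk | Select-String "ttl"
```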

Okay, scenario 2 has now been covered, and as you can see this scenario is quite good despite the different behavior among the Outlook client versions. Scenario 2 is recommended over scenario 1, which we covered in the last article.

We’ve reached the end of part 2, but you can look forward to part 3, which covers a scenario with active users in both datacenters.

About The Author

Henrik Walther
