Transport High Availability in Exchange 2013 (Part 3)
If you would like to read the other parts in this article series please go to:
Safety Net is the enhanced version of Transport Dumpster. Introduced in Exchange 2007, Transport Dumpster provided redundant copies of e-mails in case of a lossy failover in Cluster Continuous Replication (CCR) or Local Continuous Replication (LCR) environments: Hub Transport servers in the same AD site automatically redelivered the e-mails they were holding in their transport dumpster queue. Transport Dumpster in Exchange 2010 provided the same level of protection, but for e-mails that had not yet been replicated to the passive database copies in a DAG. If a failure required an out-of-date database copy to be activated, e-mails were automatically resubmitted from the Transport Dumpster to the new active copy.
In Exchange 2013, this functionality has been improved and is now called Safety Net.
There are some similarities and some differences between Transport Dumpster in Exchange 2010 and the new Safety Net.
Similarities:
- Safety Net is still a queue associated with a Mailbox server's Transport service that keeps copies of the e-mails the server has processed successfully;
- You can still specify how long Safety Net keeps copies of successfully processed e-mails before they expire and are deleted (2 days by default).
Differences:
- DAGs are not required for Safety Net. If a Mailbox server is not a DAG member, Safety Net stores a copy of the delivered e-mails on another Mailbox server in the same AD site;
- Safety Net is no longer a single point of failure. The Primary Safety Net and the Shadow Safety Net provide resiliency for this feature: if the Primary Safety Net is unavailable for more than 12 hours, e-mails are redelivered from the Shadow Safety Net;
- Safety Net works together with Shadow Redundancy in a DAG environment. Shadow Redundancy no longer needs to keep another copy of the delivered e-mail in a shadow queue while waiting for it to replicate to the database's passive copies, because a copy is already kept in Safety Net and can be resubmitted from there if required;
- Transport high availability is no longer a best-effort feature: Exchange 2013 tries to ensure e-mail redundancy. For this reason, you can no longer set a maximum size for Safety Net, only how long e-mails are kept before they are automatically deleted;
- Safety Net also applies to Public Folders.
How Safety Net Works
Shadow Redundancy preserves a redundant copy of an e-mail while it is in transit; Safety Net preserves a redundant copy of the e-mail after it has been processed successfully. In other words, Safety Net begins where Shadow Redundancy ends. Safety Net uses the same concepts as Shadow Redundancy: the boundary of transport high availability, primary e-mails, primary servers, shadow e-mails and shadow servers.
Figure 3.1: Exchange 2013 Transport High Availability
The Primary Safety Net, seen in the figure above, is located on the server that was holding the primary e-mail before it was processed successfully by the Transport service. This is not necessarily the destination Mailbox server, as the e-mail could have passed through a Mailbox server in an AD site configured as a hub site. After the primary server processes the primary e-mail, it moves the e-mail from the active queue to the Primary Safety Net on the same server.
The Shadow Safety Net, also seen in the figure above, is located on the server that was holding the shadow e-mail. When the shadow server determines that the e-mail was processed successfully, it moves the shadow e-mail from the shadow queue to the Shadow Safety Net on the same server.
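The two moves described above can be pictured with a small Python model. It is only an illustration: the `TransportServer` class, its queue attributes and its method names are hypothetical and do not correspond to any Exchange API, and real queues live in the transport queue database rather than in Python dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TransportServer:
    """Hypothetical model of the queues on one Mailbox server's Transport service."""
    name: str
    active_queue: dict = field(default_factory=dict)        # message ID -> primary e-mail
    shadow_queue: dict = field(default_factory=dict)        # message ID -> shadow copy held for another server
    primary_safety_net: dict = field(default_factory=dict)  # message ID -> (copy, time it entered Safety Net)
    shadow_safety_net: dict = field(default_factory=dict)   # message ID -> (copy, time it entered Safety Net)

    def complete_primary(self, message_id: str, now: datetime) -> None:
        """Primary server processed the e-mail: active queue -> Primary Safety Net."""
        self.primary_safety_net[message_id] = (self.active_queue.pop(message_id), now)

    def complete_shadow(self, message_id: str, now: datetime) -> None:
        """Shadow server learned the e-mail was processed: shadow queue -> Shadow Safety Net."""
        self.shadow_safety_net[message_id] = (self.shadow_queue.pop(message_id), now)


# Example: MBX1 holds the primary e-mail, MBX2 holds the shadow copy (names are illustrative).
now = datetime(2013, 1, 1, 10, 0)
mbx1, mbx2 = TransportServer("MBX1"), TransportServer("MBX2")
mbx1.active_queue["msg-1"] = "message body"
mbx2.shadow_queue["msg-1"] = "message body"

mbx1.complete_primary("msg-1", now)
mbx2.complete_shadow("msg-1", now)
print(sorted(mbx1.primary_safety_net), sorted(mbx2.shadow_safety_net))  # ['msg-1'] ['msg-1']
```

Each server only ever moves e-mails between its own queues, which mirrors the point made above that both Safety Nets sit on the same server as the queue the e-mail came from.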
Because Safety Net and Shadow Redundancy are closely linked, Shadow Redundancy must be enabled (as it is by default) for the Shadow Safety Net to work.
The following Set-TransportConfig parameters are used by Safety Net:
- ShadowRedundancyEnabled enables ($True) or disables ($False) Shadow Redundancy for all transport servers. Remember that Shadow Redundancy needs to be enabled for a redundant Safety Net;
- SafetyNetHoldTime specifies how long (2 days by default) successfully processed e-mails are kept in the Primary Safety Net and how long acknowledged shadow e-mails are kept in the Shadow Safety Net. You can also set this value in the EAC by selecting More options in the Receive connectors pane. Shadow e-mails that are not acknowledged expire from the Shadow Safety Net after SafetyNetHoldTime + MessageExpirationTimeout (4 days with the default values). When using lagged database copies, SafetyNetHoldTime must be equal to or greater than the ReplayLagTime set with Set-MailboxDatabaseCopy to prevent data loss during Safety Net resubmissions.
The MessageExpirationTimeout parameter on Set-TransportService specifies how long an e-mail remains in a queue before expiring (2 days by default).
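The expiry rules above reduce to simple date arithmetic. The Python sketch below uses the default values quoted in the text; the function names are hypothetical, and it assumes the hold time is measured from the moment the e-mail enters Safety Net, which the text does not state explicitly.

```python
from datetime import datetime, timedelta

# Defaults quoted in the text (both are configurable in Exchange).
SAFETY_NET_HOLD_TIME = timedelta(days=2)        # Set-TransportConfig -SafetyNetHoldTime
MESSAGE_EXPIRATION_TIMEOUT = timedelta(days=2)  # Set-TransportService -MessageExpirationTimeout


def primary_safety_net_expiry(entered: datetime,
                              hold_time: timedelta = SAFETY_NET_HOLD_TIME) -> datetime:
    """Successfully processed e-mails expire after SafetyNetHoldTime."""
    return entered + hold_time


def shadow_safety_net_expiry(entered: datetime, acknowledged: bool,
                             hold_time: timedelta = SAFETY_NET_HOLD_TIME,
                             expiration_timeout: timedelta = MESSAGE_EXPIRATION_TIMEOUT) -> datetime:
    """Acknowledged shadow e-mails expire after SafetyNetHoldTime;
    unacknowledged ones after SafetyNetHoldTime + MessageExpirationTimeout."""
    if acknowledged:
        return entered + hold_time
    return entered + hold_time + expiration_timeout


entered = datetime(2013, 1, 1, 10, 0)
print(primary_safety_net_expiry(entered))                     # 2013-01-03 10:00:00
print(shadow_safety_net_expiry(entered, acknowledged=False))  # 2013-01-05 10:00:00
```

With the defaults, an unacknowledged shadow e-mail therefore stays in the Shadow Safety Net for 4 days, twice as long as an acknowledged one.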
Note that the output of Get-TransportConfig still shows the MaxDumpsterSizePerDatabase and MaxDumpsterTime parameters:
Figure 3.2: Legacy Dumpster Parameters
However, both parameters are used only by Exchange 2010, not by Exchange 2013. MaxDumpsterSizePerDatabase has no replacement in Exchange 2013, while MaxDumpsterTime has been replaced by the SafetyNetHoldTime parameter discussed above.
E-mail Resubmission from Safety Net
Active Manager is the component responsible for initiating e-mail resubmission from Safety Net, so no manual action is required. E-mails are resubmitted from Safety Net in two basic scenarios:
- After a failover (either automatic or manual) of a database in a DAG;
- After a lagged database copy is activated.
The only difference between these two scenarios is how far back in time Safety Net goes to resubmit e-mails. After a DAG failover, the new active database copy is typically only minutes to a few hours behind the old active copy, whereas activating a lagged copy can require Safety Net to go back several days.
As we saw in the previous section, for Safety Net to successfully resubmit e-mails for a lagged copy, e-mails must be kept in Safety Net for at least as long as the lag time of that copy. Therefore, SafetyNetHoldTime must be greater than or equal to the ReplayLagTime of the lagged copy.
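This rule can be checked mechanically. The Python sketch below takes a SafetyNetHoldTime value and the ReplayLagTime of each lagged copy and lists the copies whose lag exceeds the hold time, that is, the copies for which a Safety Net resubmit might not reach back far enough. The database names and the function are hypothetical; in practice the values would be read from the Exchange configuration.

```python
from datetime import timedelta


def copies_at_risk(safety_net_hold_time: timedelta,
                   replay_lag_times: dict) -> list:
    """Return lagged copies whose ReplayLagTime is greater than SafetyNetHoldTime.

    For those copies, Safety Net may already have deleted the oldest e-mails
    the lagged copy is missing by the time it is activated.
    """
    return sorted(name for name, lag in replay_lag_times.items()
                  if lag > safety_net_hold_time)


# Hypothetical lagged copies and their replay lag.
lagged = {"DB01": timedelta(days=1), "DB02": timedelta(days=2), "DB03": timedelta(days=7)}
print(copies_at_risk(timedelta(days=2), lagged))  # ['DB03']
```

DB02 is not flagged because a ReplayLagTime equal to SafetyNetHoldTime satisfies the "greater than or equal to" rule.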
E-mail Resubmission from Shadow Safety Net
Resubmission of e-mails from the Shadow Safety Net is also completely automated, and no manual intervention is required.
When Active Manager requests that e-mails for a specific time period be resubmitted from Safety Net, the request is sent to the Mailbox servers whose Primary Safety Net holds e-mail copies for that period. In large deployments, the required e-mails are commonly kept in Safety Net on several Mailbox servers, especially when the requested time period is long.
Resubmitting e-mails from Safety Net is likely to produce a large number of duplicate e-mails. Duplicate e-mail detection prevents internal users from receiving these duplicates in their mailboxes. However, it only works within the Exchange organization: when the resubmitted e-mails are sent to external recipients, duplicate detection does not apply. To cater for this scenario, resubmission of e-mails from Safety Net is optimized to reduce duplicate e-mail delivery.
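To see why duplicate detection protects internal recipients but not external ones, consider the following Python toy model. It assumes, purely for illustration, that a mailbox recognises a duplicate by its message ID; the `Mailbox` class is hypothetical, and the text does not describe how Exchange's duplicate detection actually works. The external recipient is modelled as a plain list because an external system has no knowledge of what Exchange has already delivered.

```python
class Mailbox:
    """Hypothetical internal mailbox with duplicate detection keyed on message ID."""

    def __init__(self) -> None:
        self.seen_ids = set()
        self.items = []

    def deliver(self, message_id: str, body: str) -> bool:
        if message_id in self.seen_ids:
            return False  # duplicate suppressed inside the organization
        self.seen_ids.add(message_id)
        self.items.append(body)
        return True


internal = Mailbox()
external_inbox = []  # outside the organization: no duplicate detection applies

# Original delivery, then a resubmission of the same e-mail from Safety Net.
for _attempt in ("original delivery", "Safety Net resubmission"):
    internal.deliver("<msg-1@contoso.com>", "hello")
    external_inbox.append("hello")

print(len(internal.items), len(external_inbox))  # 1 2
```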
If, for some reason, Active Manager is unable to communicate with the Primary Safety Net, it keeps trying to contact it for 12 hours. After this 12-hour period, a broadcast is sent to all Mailbox servers within the boundary of transport high availability, looking for other Safety Nets that contain e-mails for the target database for the required time period. A Shadow Safety Net then responds and resubmits the e-mails matching these criteria.
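The retry-then-broadcast behaviour can be expressed as a small decision function. This Python sketch assumes that the 12-hour window is measured from the first failed attempt to reach the Primary Safety Net; the function and the action strings it returns are hypothetical, not Active Manager internals.

```python
from datetime import datetime, timedelta

PRIMARY_RETRY_WINDOW = timedelta(hours=12)  # period quoted in the text


def next_resubmit_action(first_failure: datetime, now: datetime,
                         primary_reachable: bool) -> str:
    """Decide where a Safety Net resubmit request should go next."""
    if primary_reachable:
        return "resubmit from Primary Safety Net"
    if now - first_failure < PRIMARY_RETRY_WINDOW:
        return "keep retrying Primary Safety Net"
    # After 12 hours: broadcast within the transport high availability boundary
    # so that a Shadow Safety Net holding matching e-mails can respond.
    return "broadcast to find a Shadow Safety Net"


t0 = datetime(2013, 1, 1, 8, 0)
print(next_resubmit_action(t0, t0 + timedelta(hours=3), primary_reachable=False))
# keep retrying Primary Safety Net
print(next_resubmit_action(t0, t0 + timedelta(hours=13), primary_reachable=False))
# broadcast to find a Shadow Safety Net
```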
The following scenario shows how Safety Net copes when part of its data is lost. Let us assume the following:
- The queue database holding the Primary Safety Net became corrupted, and a new one was created at 15:00. As a result, all the primary e-mails kept in the Primary Safety Net from 10:00 to 15:00 are lost, but from 15:00 onwards the server again keeps copies of successfully delivered e-mails in Safety Net;
- Active Manager requests e-mails to be resubmitted from Safety Net for a particular database from 10:00 to 17:00;
- The Primary Safety Net resubmits all the e-mails it has for the requested time period which, because its queue database was recreated, covers only 15:00 to 17:00;
- The Primary Safety Net then sends a broadcast to all Mailbox servers within the boundary of transport high availability, looking for other Safety Nets that contain e-mails for the target database for 10:00 to 15:00, the period for which it has no e-mails. The Shadow Safety Net then generates a second resubmit request on behalf of the Primary Safety Net, so that the shadow e-mails for the target database for 10:00 to 15:00 are resubmitted.
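The key step in this scenario is working out which part of the requested window the Primary Safety Net cannot cover. A minimal Python sketch of that interval arithmetic follows; the `uncovered` function is hypothetical and treats time ranges as half-open (start, end) pairs, an assumption made for illustration.

```python
from datetime import datetime


def uncovered(requested, covered):
    """Return the parts of the requested (start, end) range not covered by any
    (start, end) range in `covered`. All ranges are half-open: [start, end)."""
    start, end = requested
    gaps = []
    cursor = start  # earliest time not yet known to be covered
    for c_start, c_end in sorted(covered):
        if c_end <= cursor:
            continue  # this range ends before the uncovered part begins
        if c_start >= end:
            break  # this and all later ranges start after the request ends
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = c_end  # c_end > cursor here, so coverage extends to c_end
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def at(hour):
    return datetime(2013, 1, 1, hour, 0)  # arbitrary example date


requested = (at(10), at(17))      # Active Manager's request
primary_has = [(at(15), at(17))]  # what survived in the Primary Safety Net
for gap_start, gap_end in uncovered(requested, primary_has):
    # This is the range a Shadow Safety Net is asked to resubmit.
    print(f"{gap_start:%H:%M}-{gap_end:%H:%M}")  # 10:00-15:00
```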
In this third part of the article series, we looked at Safety Net, the new version of Transport Dumpster. Next, we will explore how to achieve high availability for inbound and outbound e-mail flow.