Need to recover a failed member of a database availability group? Read this guide

Sponsored by Stellar Data Recovery

A database availability group (DAG) is a core component of the mailbox server high availability and site resilience framework in Exchange Server, first introduced in Exchange 2010. A DAG is a group of up to 16 mailbox servers (members), each of which can host copies of databases from other members in the DAG. The number of databases a member can host depends on the edition: Exchange Server 2016 Standard Edition supports up to five mounted databases per server (active and passive copies combined), whereas Enterprise Edition supports up to 100. The purpose of a DAG is to keep services available when a database, server, or network fails.

This is what a basic DAG setup looks like, comprising at least two member mailbox servers and a file share witness for high availability deployment of Exchange.

Basic DAG schematic comprising two mailbox servers and a witness server

Each of the DAG members in the setup can host active and passive copies of the database to support the switchover/failover mechanism, based on the underlying Windows Failover Cluster.

High availability mechanism of DAG – An overview

Exchange Server DAG relies on Active Manager, a component that runs inside the Microsoft Exchange Replication service (MSExchangeRepl.exe) on all mailbox servers in the DAG, to manage high-availability through automatic switchovers and failovers. Active Manager underpins the mechanism that decides the active and passive copies of a database in the DAG, and it also determines the copy of the database to be mounted in case of a failure.

All members of the DAG are engaged in a continuous replication process to update their passive database copies by replaying the transaction log data shipped from the DAG member hosting the corresponding active copy. The following image illustrates the replication of mailbox databases in a DAG.

Replication of mailbox database copies in DAG
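The two stages described above, shipping log files and then replaying them, can be pictured with a toy model. The Python sketch below is purely illustrative and is not how Exchange implements replication: the `PassiveCopy` class, its method names, and the batch sizes are made up for this example. It mirrors two counters that Get-MailboxDatabaseCopyStatus reports for each copy, CopyQueueLength (logs not yet copied) and ReplayQueueLength (logs copied but not yet replayed).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PassiveCopy:
    """Toy model of one passive database copy (hypothetical, not Exchange code)."""
    server: str
    last_log_copied: int = 0     # highest log generation shipped to this server
    last_log_replayed: int = 0   # highest log generation replayed into the copy

    def ship(self, active_generation: int, batch: int) -> None:
        # Copy up to `batch` closed log files from the active copy.
        self.last_log_copied = min(active_generation, self.last_log_copied + batch)

    def replay(self, batch: int) -> None:
        # Replay up to `batch` already-copied logs into the passive database.
        self.last_log_replayed = min(self.last_log_copied, self.last_log_replayed + batch)

    def queues(self, active_generation: int) -> Tuple[int, int]:
        # Returns (copy queue length, replay queue length).
        copy_queue = active_generation - self.last_log_copied
        replay_queue = self.last_log_copied - self.last_log_replayed
        return copy_queue, replay_queue


active_generation = 10                      # the active copy has closed 10 log files
copy = PassiveCopy("Exchange2019SRV1")
copy.ship(active_generation, batch=6)       # 6 of 10 logs shipped
copy.replay(batch=4)                        # 4 of those 6 replayed
print(copy.queues(active_generation))       # (4, 2)
```

A passive copy is fully caught up when both queues reach zero, which is what the verification step at the end of this guide looks for.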

In the case of a member server failure, the concept of quorum, used in Windows failover clusters, determines whether a DAG will remain online or go offline. Quorum uses a “consensus of voters,” i.e., a shared view of members and resources, to determine the state of the DAG at any given point in time. There are two quorum models, and the one that applies depends on the number of members in the DAG:

  1. Node and File Share Majority: This model is used when the DAG has an even number of members. In this case, the witness server is included in the quorum to determine a majority and prevent split-brain scenarios.

For example, a four-member DAG has five votes: four members plus the file share witness. If one member fails, three members and the witness are still online (4 of 5 votes), so quorum is maintained and the DAG keeps functioning. If two members fail, the two remaining members alone do not form a majority; the file share witness serves as a tie-breaker to establish a 3/5 majority, allowing the DAG to remain online.

  2. Node Majority: This model applies to a DAG that has an odd number of members. In the Node Majority model, the file share witness does not take part in the quorum vote. For example, if one member fails in a three-member DAG, two members are still online, establishing a clear majority for the DAG to stay online. However, if two members fail, only one member is online, so the majority is lost and the DAG goes offline.
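The vote arithmetic behind both models can be sketched in a few lines of Python. This is a simplified model rather than the cluster service's actual logic: the function name `has_quorum` and its parameters are illustrative, and it assumes one vote per member, a witness vote only for an even-sized DAG, and no vote adjustments such as the dynamic quorum feature of newer Windows Server versions.

```python
def has_quorum(total_members: int, online_members: int, witness_online: bool = True) -> bool:
    """Return True if the surviving voters form a strict majority.

    Simplified model (assumption): one vote per DAG member; the file share
    witness votes only when the DAG has an even number of members.
    """
    if not 0 <= online_members <= total_members:
        raise ValueError("online_members must be between 0 and total_members")

    uses_witness = total_members % 2 == 0   # Node and File Share Majority
    total_votes = total_members + (1 if uses_witness else 0)
    online_votes = online_members + (1 if uses_witness and witness_online else 0)

    # A strict majority means more than half of all configured votes.
    return online_votes * 2 > total_votes


# Four-member DAG: two members plus the witness hold 3 of 5 votes.
print(has_quorum(4, 2))                          # True
# Same DAG with the witness unreachable: only 2 of 5 votes.
print(has_quorum(4, 2, witness_online=False))    # False
# Three-member DAG (Node Majority): 2 of 3 stays online, 1 of 3 does not.
print(has_quorum(3, 2), has_quorum(3, 1))        # True False
```

The second call shows why the witness matters in an even-sized DAG: without its vote, losing half of the members is enough to take the DAG offline.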

What happens when a DAG goes offline?

If the DAG goes offline, all of the databases in the DAG are dismounted, causing downtime that requires manual intervention to restore DAG operations. It is therefore crucial to recover the failed member as early as possible, both to restore the high availability of the Exchange mailbox databases and to prevent the DAG from going offline if a subsequent failure causes a loss of quorum.

How to recover a failed DAG member?

This section provides step-by-step instructions for recovering a failed DAG member server, based on the Microsoft documentation. We begin with a few preparatory steps that are necessary for the procedure to succeed:

Step 1: Remove the Database Copies from the Failed Member

This step removes the database copies from the failed member server of the DAG. You can remove a mailbox database copy by using either the Exchange Admin Center (EAC) or the Exchange Management Shell.

Prerequisites:

Before removing the last passive copy of a database, ensure that continuous replication circular logging (CRCL) is disabled for that database.

Method 1: Using the Exchange Admin Center (EAC)

  1. Type https://<ServerFQDN>/ecp in your web browser to access the EAC
  2. Next, go to Servers > Databases
  3. Select the mailbox database whose copy you want to remove.
  4. Locate the passive copy of the database in the Details pane, and click Remove.
  5. Click ‘Yes’ on the warning dialog box to confirm the removal process.
  6. Manually delete any database and transaction log files from the server.

Method 2: Using the Exchange Management Shell

You can also use the Remove-MailboxDatabaseCopy cmdlet to remove a passive copy of a mailbox database. This cmdlet is available only in on-premises Exchange. Here’s an example that illustrates the syntax of Remove-MailboxDatabaseCopy cmdlet.

Example:

PowerShell
Remove-MailboxDatabaseCopy -Identity DB5\Exchange2019SRV1

The above command removes a copy of mailbox database DB5 from the mailbox server Exchange2019SRV1

Step 2: Remove the Failed Server from the DAG Configuration

You can remove a mailbox server from a database availability group by using the Remove-DatabaseAvailabilityGroupServer cmdlet. This cmdlet is available only in on-premises Exchange. The following example illustrates the syntax of the cmdlet:

Example:

PowerShell
Remove-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer Exchange2019SRV1

The above command removes the mailbox server Exchange2019SRV1 from the DAG DAG1.

How to Remove an Offline Server Member from the DAG?

To remove an offline DAG member that cannot be brought online, add the ConfigurationOnly switch to the above command, as follows:

Example:

PowerShell
Remove-DatabaseAvailabilityGroupServer -Identity DAG2 -MailboxServer Exchange2019SRV1 -ConfigurationOnly

The above command removes the offline mailbox server Exchange2019SRV1 from the configuration of the DAG DAG2 in Active Directory.

In the case of an offline mailbox server, you would also need to manually evict the node from the cluster using the Remove-ClusterNode cmdlet, as follows:

Example:

PowerShell
Remove-ClusterNode -Name node9

The above command removes the node named node9 from the cluster.

PowerShell
Remove-ClusterNode -Name node9 -Force

The above command removes the node named node9 from the cluster without any confirmation prompt.

Step 3: Reset the Server’s Computer Account in the Active Directory

You can reset a computer account in AD by using the Windows interface or the command line, provided you meet the following prerequisites:

Prerequisites:

You must be a member of the Account Operators group, Domain Admins group, or Enterprise Admins group in Active Directory Domain Services (AD DS), or you must have been delegated the appropriate authority.

Note: The following methods apply to Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012

Method 1: Using Windows Interface

  1. Click Start, click Control Panel, and then double-click Administrative Tools.
    • To access Active Directory Users and Computers in Windows Server 2012, click Start, and then type dsa.msc.
  2. Double-click Active Directory Users and Computers.
  3. In the console tree, click Computers.
  4. Right-click the computer in the details pane, and then click Reset Account.

Method 2: Using command line

  1. Click Start, click Run, type cmd and then click OK to open a command prompt
  2. Type the following command

dsmod computer <ComputerDN> -reset

  3. Press ENTER

Note: <ComputerDN> specifies the distinguished name of the computer object that you want to reset
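To make the shape of a distinguished name concrete, here is a minimal Python sketch that assembles one from its parts. Everything in it is an assumption for illustration: the helper name `computer_dn`, the `contoso.com` domain, and the default `CN=Computers` container (your server object may live in a different OU). It also does not escape special characters in names, which real DNs may require.

```python
def computer_dn(name: str, domain: str, container: str = "CN=Computers") -> str:
    """Build a distinguished name for a computer object.

    Assumes the object sits directly under `container` in the given DNS
    domain; names containing commas or other special characters are not
    escaped by this sketch.
    """
    domain_part = ",".join(f"DC={label}" for label in domain.split("."))
    return f"CN={name},{container},{domain_part}"


dn = computer_dn("Exchange2019SRV1", "contoso.com")
print(dn)                                  # CN=Exchange2019SRV1,CN=Computers,DC=contoso,DC=com
print(f'dsmod computer "{dn}" -reset')     # the command line to run, with the DN quoted
```

Quoting the DN on the command line is a sensible habit, since container and OU names often contain spaces.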

Step 4: Prepare the new server and join it to the AD

This step involves installation of a new Windows Server using the same computer name and preferably the same IP address as the failed server. After preparing the server, join it to the Active Directory domain and verify that all the drives are attached with the same drive letters that previously existed.

The configuration of the new server should match the failed server and the other DAG members in terms of network and storage. For example, it should have the same Service Packs and patches, and use the same hardware configuration, etc.

Step 5: Perform a Recovery Install of Exchange Server

The next step is to install Exchange Server for recovery. First, you would need to verify the exact build of Exchange to install. Use the Get-ExchangeServer cmdlet from another working member in the DAG to find out the build number.

Running the Get-ExchangeServer cmdlet without any parameters returns the properties of all the servers in the Exchange organization.

PowerShell
Get-ExchangeServer | Format-List

The above command returns the full property list, including the AdminDisplayVersion value that contains the build number, for every Exchange server in the organization.
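If you want to compare build numbers mechanically rather than by eye, the following Python sketch parses a version string and checks two of them for equality. It assumes the value has the form `Version <major>.<minor> (Build <build>.<revision>)`, which is how AdminDisplayVersion is typically displayed; the function names and the sample strings are illustrative only.

```python
import re
from typing import Tuple

# Assumed display format, e.g. "Version 15.2 (Build 858.5)".
_VERSION_PATTERN = re.compile(r"Version (\d+)\.(\d+) \(Build (\d+)\.(\d+)\)")


def parse_admin_display_version(text: str) -> Tuple[int, int, int, int]:
    """Split a version string into (major, minor, build, revision)."""
    match = _VERSION_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, build, revision = (int(part) for part in match.groups())
    return major, minor, build, revision


def same_build(healthy_member: str, recovery_media: str) -> bool:
    """True when the setup media matches the build on a healthy DAG member."""
    return parse_admin_display_version(healthy_member) == parse_admin_display_version(recovery_media)


# Sample strings for illustration only.
print(parse_admin_display_version("Version 15.2 (Build 858.5)"))                 # (15, 2, 858, 5)
print(same_build("Version 15.2 (Build 858.5)", "Version 15.2 (Build 792.3)"))    # False
```

A `False` result here would mean the setup media does not match the surviving members and a different build should be obtained before running the recovery install.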

You can refer to Microsoft documentation for Exchange build numbers. Next, follow these steps to do a recovery installation of the failed Exchange Server.

A) Steps to Install Exchange in the Default Location:

  1. Open File Explorer on the target server.
  2. Right-click on the Exchange ISO image file and mount it.
    • Take note of the assigned virtual DVD drive letter.
  3. Press Windows + ‘R’ to open the Run box. Type cmd.exe, and click OK.
  4. Use the following syntax to install Exchange on the default location.

Console
<Virtual DVD drive letter>:\Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:RecoverServer

Example:

PowerShell
E:\Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:RecoverServer

The above command uses the Exchange installation files on virtual drive E: to install Exchange in the default location (%ProgramFiles%\Microsoft\Exchange Server\V15)

B) Steps to Install Exchange in Another Location:

To install Exchange at a location other than the default location, you would need to specify the location of the Exchange program files by using /TargetDir: switch, as per the following syntax:

Console
<Virtual DVD drive letter>:\Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:RecoverServer [/TargetDir:<Path>]

Example:

PowerShell
E:\Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:RecoverServer /TargetDir:"D:\Program Files\Exchange-2019-SRV-1"

If you don’t know the install location, follow these steps to determine it:

  1. Press Windows + 'R'. Type cmd.exe, and click OK.
  2. Type adsiedit.msc and press Enter to start the ADSI Edit configuration tool.
  3. Navigate to the following location:

CN=ExServerName,CN=Servers,CN=First Administrative Group,CN=Administrative Groups,CN=ExOrg Name,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=DomainName,DC=Com

  4. Right-click the Exchange server object, and then click Properties.
  5. Locate the msExchInstallPath attribute. This attribute stores the current installation path.
  6. Click Cancel and close ADSI Edit.

After completing the recovery installation of Exchange, add the recovered server to the DAG.

Step 6: Add the Recovered Server to the DAG & Reconfigure Mailbox Database Copies

Use the Add-DatabaseAvailabilityGroupServer cmdlet to add a mailbox server to a DAG. This cmdlet is available only in on-premises Exchange. The following example illustrates the syntax of the cmdlet:

Example:

PowerShell
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer Exchange2019SRV1

This command adds the mailbox server Exchange2019SRV1 to the DAG DAG1.

After adding the server to the DAG, the next step is to reconfigure the mailbox database copies (i.e., replicas) in the recovered mailbox server.

You can reconfigure the mailbox database copies by using the Add-MailboxDatabaseCopy cmdlet.

Example:

PowerShell
Add-MailboxDatabaseCopy -Identity DB01 -MailboxServer Exchange2019SRV1

This command adds a copy of the database DB01 to the DAG member server Exchange2019SRV1.

If you need to configure database copies that had replay lag or truncation lag, you can use the ReplayLagTime and TruncationLagTime parameters to reconfigure those settings, as follows:

PowerShell
Add-MailboxDatabaseCopy -Identity DB02 -MailboxServer Exchange2019SRV1 -ReplayLagTime 4.00:00:00
Add-MailboxDatabaseCopy -Identity DB03 -MailboxServer Exchange2019SRV1 -ReplayLagTime 4.00:00:00 -TruncationLagTime 4.00:00:00
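The lag values above use the time span format `dd.hh:mm:ss`, so `4.00:00:00` means four days. As a quick sanity check before running the cmdlet, a value can be parsed and compared against the documented maximum of 14 days. The Python helper below is a sketch written for this purpose (the name `parse_lag` is illustrative and it is not part of any Exchange tooling).

```python
import re
from datetime import timedelta

MAX_LAG = timedelta(days=14)   # documented upper limit for replay and truncation lag


def parse_lag(value: str) -> timedelta:
    """Parse a 'dd.hh:mm:ss' lag value such as '4.00:00:00'.

    The leading day component is optional, so '12:00:00' is also accepted.
    """
    match = re.fullmatch(r"(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})", value)
    if match is None:
        raise ValueError(f"not a dd.hh:mm:ss value: {value!r}")
    days, hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


lag = parse_lag("4.00:00:00")
print(lag)               # 4 days, 0:00:00
print(lag <= MAX_LAG)    # True
```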

The reconfigured database copy automatically begins seeding: it copies the database and log files from the active copy and then starts the replication process. The seeding duration depends on the database size and the network bandwidth.
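For planning purposes, the seeding time can be roughly estimated from those two figures. The Python sketch below is a back-of-the-envelope calculation only: the function name, the decimal unit conventions, and especially the default `efficiency` factor of 0.7 (the share of link speed assumed to be usable) are assumptions, and real seeding speed also depends on disk performance and server load.

```python
def estimate_seed_hours(database_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Rough seeding time in hours.

    database_gb: database size in gigabytes (10**9 bytes)
    link_mbps:   nominal link speed in megabits per second
    efficiency:  assumed usable fraction of the link (hypothetical default)
    """
    if database_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive; efficiency must be in (0, 1]")
    total_bits = database_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1000**2 * efficiency
    return total_bits / usable_bits_per_second / 3600


# A 500 GB database over a 1 Gbps link at 70% usable throughput.
print(round(estimate_seed_hours(500, 1000), 2))   # 1.59
```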

Step 7: Verify the Recovery of the DAG Member

How do you know that you have recovered the DAG member successfully? Run the Test-ReplicationHealth and Get-MailboxDatabaseCopyStatus cmdlets to verify the health and status of the recovered DAG member.

Example:

PowerShell
Test-ReplicationHealth -Identity Exchange2019SRV1

This command tests the health of replication for the mailbox server Exchange2019SRV1

Example:

PowerShell
Get-MailboxDatabaseCopyStatus -Identity DB1 | Format-List

This command returns the status of all copies of the database DB1.

Example:

PowerShell
Get-MailboxDatabaseCopyStatus -Server Exchange2019SRV1 | Format-List

This command returns the status of all database copies on the mailbox server Exchange2019SRV1.

Example:

PowerShell
Get-MailboxDatabaseCopyStatus -Identity DB1\Exchange2019SRV1 | Format-List

This command returns the status of the copy of database DB1 on the mailbox server Exchange2019SRV1

These cmdlets should report a healthy status for the replication process and the database copies. However, the Get-MailboxDatabaseCopyStatus cmdlet sometimes reports the content index in a failed state (typically seen with Exchange Server 2013). You can resolve this issue by reseeding the content index catalog from another DAG member by using the Update-MailboxDatabaseCopy cmdlet, as follows:

PowerShell
Update-MailboxDatabaseCopy -Identity DB1\Exchange2019SRV1 -CatalogOnly
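When a server hosts many copies, it can help to check the status output programmatically. The Python sketch below works on plain dictionaries, for instance rows exported from Get-MailboxDatabaseCopyStatus. It is a hedged illustration: the field names follow the cmdlet's Name, Status, CopyQueueLength, ReplayQueueLength, and ContentIndexState properties, while the function name, the queue threshold of 10, and the sets of values treated as healthy are assumptions to adjust for your environment.

```python
HEALTHY_STATUSES = {"Mounted", "Healthy"}            # active copy / passive copy
HEALTHY_INDEX_STATES = {"Healthy", "NotApplicable"}  # assumed acceptable index states


def find_problems(copies, max_queue=10):
    """Return human-readable findings for copies that need attention."""
    problems = []
    for copy in copies:
        name = copy["Name"]
        if copy["Status"] not in HEALTHY_STATUSES:
            problems.append(f"{name}: status is {copy['Status']}")
        if copy["CopyQueueLength"] > max_queue:
            problems.append(f"{name}: copy queue is {copy['CopyQueueLength']}")
        if copy["ReplayQueueLength"] > max_queue:
            problems.append(f"{name}: replay queue is {copy['ReplayQueueLength']}")
        if copy["ContentIndexState"] not in HEALTHY_INDEX_STATES:
            problems.append(f"{name}: content index is {copy['ContentIndexState']}")
    return problems


# Sample rows for illustration only.
sample = [
    {"Name": "DB1\\Exchange2019SRV1", "Status": "Healthy",
     "CopyQueueLength": 0, "ReplayQueueLength": 2, "ContentIndexState": "Failed"},
    {"Name": "DB2\\Exchange2019SRV1", "Status": "Healthy",
     "CopyQueueLength": 0, "ReplayQueueLength": 0, "ContentIndexState": "Healthy"},
]
for finding in find_problems(sample):
    print(finding)       # DB1\Exchange2019SRV1: content index is Failed
```

Note that a copy configured with replay lag is expected to show a large replay queue, so the threshold should be relaxed or skipped for lagged copies.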

Further, you may sometimes come across a replication issue after configuring the mailbox database copies, which can happen due to a hardware failure, a network problem, or log file corruption.

In such cases, you can use Exchange database recovery software to extract the database copy from the active server and import it into the recovered server to facilitate the database seeding process. Stellar Repair for Exchange is a widely used Exchange recovery tool that is recommended by Microsoft MVPs and IT administrators; the tool can export the EDB file from one server directly to another server, keeping the mailbox data intact.

Ending notes

A DAG is a widely used setup that provides high availability and site resilience to ensure business continuity. However, it does not guarantee that the databases remain in a healthy state. Databases can become corrupted and fail to mount for various reasons, such as a dirty shutdown caused by a failed update, malware, ransomware, a sudden power outage, or human error. It is therefore recommended to keep an up-to-date backup of each database and to have third-party Exchange recovery software at hand, so that you can deal with database corruption errors and recover databases with minimal downtime. This holds true for database availability groups as well!

Featured image: Shutterstock

Bharat Bhushan

Bharat Bhushan is an experienced technical marketer working at Stellar Data Recovery, with expertise in data care. He is skilled in Microsoft Exchange database and MSSQL database troubleshooting and in data warehousing. He holds a postgraduate degree in management, has a strong grip on technology, and is certified in SAP-SD, Oracle 10g, and Informatica PowerCenter 9.1.
