Before we discuss the new Loose Truncation feature, let us take a quick high-level look at what transaction logs are, what log truncation is, and how it works in Exchange 2013 before Service Pack 1 (which behaves much like Exchange 2010).
Database transaction logs record all changes to an Exchange database. These logs currently have a fixed size of exactly 1MB. When a transaction log is full, the transaction log is renamed with a numeric sequence number and a new current log is generated.
Over time, these log files accumulate and use all the available disk space if they are not periodically removed from the hard disk. Exchange automatically removes unnecessary log files by using one of the following methods:
- If circular logging is enabled, Exchange removes transaction logs soon after they have been written to the database file. This is slightly more complex in a Database Availability Group [DAG] scenario, where Continuous Replication Circular Logging is used instead of “basic” circular logging, but that is outside the scope of this article;
- If circular logging is disabled, Exchange removes excess logs after a full or incremental backup.
This is known as Log Truncation, the process whereby unwanted transaction logs are deleted. But how exactly is this performed? For simplicity, let us assume a DAG scenario without lagged database copies. Truncation behavior is determined by the replay lag time and truncation lag time settings of each database copy; with both left at their default value of 0 (no lag), the following criteria must be met for a database copy’s log file to be truncated:
- The log file must have been successfully backed up, or circular logging must be enabled;
- The log file must be below the checkpoint (the minimum log file required for recovery) for the database;
- All other copies must have replayed the log file.
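The three criteria can be read as a single yes/no check per log file. The following Python sketch is only an illustration of that logic, not how Exchange implements it: the function and parameter names are hypothetical, and log files are represented simply by their generation numbers.

```python
def can_truncate(log_generation, backed_up, circular_logging,
                 checkpoint_generation, replayed_by_copy):
    """Return True if a log file meets all three truncation criteria.

    All names are hypothetical; Exchange does not expose such a function.
    replayed_by_copy maps each other copy's name to the highest log
    generation that copy has replayed.
    """
    # Criterion 1: the log was backed up, or circular logging is enabled.
    protected_by_backup = backed_up or circular_logging
    # Criterion 2: the log is below the checkpoint.
    below_checkpoint = log_generation < checkpoint_generation
    # Criterion 3: every other copy has replayed the log.
    replayed_everywhere = all(
        highest >= log_generation for highest in replayed_by_copy.values()
    )
    return protected_by_backup and below_checkpoint and replayed_everywhere


# Hypothetical copies: MBX3 is far behind, so log 1000 must be kept.
copies = {"MBX2": 1500, "MBX3": 900}
print(can_truncate(1000, True, False, 1200, copies))
print(can_truncate(800, True, False, 1200, copies))
```

In this made-up example the first call is rejected because one copy has not yet replayed log 1000, while log 800 satisfies all three criteria.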
This means that each database copy keeps logs that need to be shipped to other database copies until all copies of a database confirm they have replayed those log files. However, log truncation does not occur on an active mailbox database copy when one or more passive copies are suspended or offline. When this happens, the other database copies stop log truncation so that they can assist in the re-sync process of the failed copy, thus preventing the failed copy from needing a full reseed to return to a healthy state when it comes back online.
This means that when performing maintenance activities that take an extended period of time (for example, several days), administrators are likely to experience considerable log file buildup. As such, in order to prevent the log drive from filling up with transaction logs, it is recommended to remove the affected passive database copy instead of simply suspending it. An alternative is to temporarily enable circular logging, but this is only recommended when there are 3 or more database copies.
This non-truncation behavior is fine as long as the database volumes have plenty of free disk space, so that transaction logs can build up without causing any issues until the failed database copy (or copies) comes back online. However, in environments with a limited amount of free space, this behavior can cause problems. To help with this scenario, Loose Truncation was introduced.
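To get a feel for how quickly log buildup can become a problem, a rough estimate helps. The sketch below assumes 1MB per log file (as stated earlier) and a steady log generation rate. The free space and rate used in the example are made-up inputs, so measure your own environment before relying on any result.

```python
LOG_SIZE_MB = 1  # each transaction log is 1MB


def days_until_full(free_space_gb, logs_per_day):
    """Estimate days until the log volume fills if nothing is truncated."""
    if logs_per_day <= 0:
        raise ValueError("logs_per_day must be positive")
    free_space_mb = free_space_gb * 1024
    return free_space_mb / (logs_per_day * LOG_SIZE_MB)


# Hypothetical example: 300GB free, 50,000 logs generated per day.
print(round(days_until_full(300, 50_000), 1))  # 6.1
```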
Loose Truncation in SP1
Loose Truncation is a new feature that was introduced in Exchange 2013 Service Pack 1. Its purpose is to prevent the disk space issues that can occur in DAG environments when one or more copies of a database are offline for an extended period of time, the scenario we discussed previously. When enabled, loose truncation changes the “normal” truncation behavior. Each database copy tracks its own free disk space and starts to truncate transaction log files independently if the available disk space falls below a threshold configured by the administrator.
For the active copy, the oldest straggler (the passive database copy that is farthest behind in log replay) is ignored, and truncation respects the oldest of the remaining passive copies. The active database copy is where global truncation is calculated, and all the passive copies will attempt to respect the truncation decision made on the active copy. Despite the implication of the name MinCopiesToProtect (discussed in the next section), Exchange will only ignore the oldest known straggler at the time truncation is run.
For a passive copy, if space gets low, it will independently truncate its log files using the configured parameters also described in the next section.
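As a rough mental model of the behavior just described, the following Python sketch shows a copy checking its own free space against the threshold, and the active copy ignoring the single farthest-behind passive copy. This is an interpretation of the description above, not Exchange's actual code; every function and parameter name is hypothetical.

```python
def loose_truncation_triggered(free_space_mb, threshold_mb,
                               min_copies_to_protect):
    """A copy starts loose truncation only when the feature is enabled
    (MinCopiesToProtect is not 0) and free space is below the threshold."""
    return min_copies_to_protect != 0 and free_space_mb < threshold_mb


def oldest_generation_to_respect(passive_replayed):
    """On the active copy, ignore the oldest straggler and return the lowest
    replayed log generation among the remaining passive copies.

    passive_replayed maps copy name -> highest log generation replayed.
    Returns None when no passive copy remains to be respected.
    """
    if len(passive_replayed) <= 1:
        return None
    straggler = min(passive_replayed, key=passive_replayed.get)
    remaining = [generation for name, generation in passive_replayed.items()
                 if name != straggler]
    return min(remaining)


# Hypothetical state: copy "A" has been offline for days.
print(loose_truncation_triggered(150 * 1024, 200 * 1024, 1))
print(oldest_generation_to_respect({"A": 100, "B": 5000, "C": 4800}))
```

In this made-up example, copy "A" is the straggler and is ignored, so truncation on the active copy only has to respect copy "C".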
If loose truncation has been triggered and has done its job, the failed or suspended passive copy will be in a FailedAndSuspended state once it comes back online. In this scenario, if AutoReseed is configured, the affected copy will be automatically reseeded (for more information please refer to the Exchange 2013 Automatic Reseed article on MSExchange.org). If not, the database copy will need to be reseeded manually.
The required number of healthy copies, the free disk space threshold, and the number of logs to keep are all configurable parameters. By default, the free disk space threshold is 200GB, and the number of logs to keep is 100,000 (100GB) for passive copies, and 10,000 (10GB) for active copies.
Enabling Loose Truncation
To enable and configure loose truncation (which is disabled by default), we need to edit the Windows Registry on each DAG member. My guess is that as this was a late addition to SP1, the Exchange product team did not have enough time to add and adequately test a new PowerShell cmdlet for this feature. However, I would not be surprised if this gets introduced in the next Cumulative Update.
The three settings I mentioned earlier have their own registry values that can be configured, all of them stored under HKLM\Software\Microsoft\ExchangeServer\v15\BackupInformation. The BackupInformation key and all the following DWORD values do not exist by default so they need to be manually created:
- LooseTruncation_MinCopiesToProtect – this value represents the number of passive copies to protect from loose truncation on the active copy of a database. It is also used to enable loose truncation, so setting it to 0 disables loose truncation;
- LooseTruncation_MinDiskFreeSpaceThresholdInMB – this value sets the threshold of available disk space (in MB) for triggering loose truncation. If free disk space falls below this value, loose truncation is triggered. If this registry value is not configured, the default used by loose truncation is 200GB (204,800MB);
- LooseTruncation_MinLogsToProtect – this value stipulates the minimum number of log files to retain on healthy copies whose logs are being truncated. If this registry value is configured, it applies to both active and passive copies. If it is not configured, default values of 100,000 for passive database copies and 10,000 for active database copies are used.
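Because the same values have to be created on each DAG member, it can be convenient to generate a .reg file once and review it before importing it. The following Python sketch builds such a file using the key path and value names listed above. The helper name build_reg_file and the numbers passed to it are purely illustrative assumptions, not recommendations.

```python
KEY_PATH = (r"HKEY_LOCAL_MACHINE\Software\Microsoft"
            r"\ExchangeServer\v15\BackupInformation")


def build_reg_file(min_copies_to_protect, min_free_space_mb,
                   min_logs_to_protect):
    """Return the text of a .reg file that sets the three DWORD values."""
    values = {
        "LooseTruncation_MinCopiesToProtect": min_copies_to_protect,
        "LooseTruncation_MinDiskFreeSpaceThresholdInMB": min_free_space_mb,
        "LooseTruncation_MinLogsToProtect": min_logs_to_protect,
    }
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY_PATH}]"]
    for name, value in values.items():
        # A DWORD is an unsigned 32-bit integer.
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD")
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


# Illustrative values only: protect 1 copy, 200GB threshold, 100,000 logs.
print(build_reg_file(1, 204800, 100000))
```

Note that the .reg format stores DWORD values in hexadecimal, so a threshold of 204800 (MB) appears as dword:00032000 in the generated file.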
Please be aware that the behavior of LooseTruncation_MinLogsToProtect (if used) is different for active and passive database copies. For the active database copy, this specifies the number of extra logs that are retained preceding those that are required by the protected passive copies and the required range of the active copy.
On passive database copies, this specifies the number of logs maintained from the latest available log. One tenth of this number is also used to maintain logs prior to the required range of this passive copy. The two limits are in place to ensure that lagged database copies do not take up too much space, since their required range is typically very large.
In the following screenshot we can see these settings configured:
Loose truncation gives Exchange administrators an alternative to circular logging during extended maintenance or failover scenarios where one or more database copies are kept offline or in a failed state. In these situations, administrators traditionally either enable circular logging to prevent logs from building up, or remove the offline or failed database copy (or copies) from the DAG.
However, it is important to keep backups in mind. Some environments back up passive database copies while others back up active database copies. In both scenarios, it is possible that backup jobs will fail until the environment, or the particular database, is brought back into a “normal” operating state.