16 Tips to Optimize Exchange 2013 (Part 3)

9. Virtualization

Although fully supported with Exchange Server 2013, it seems that Microsoft is discouraging its use, as stated in The Preferred Architecture blog post:

Virtualization adds an additional layer of management and complexity, which introduces additional recovery modes that do not add value, as Exchange provides equivalent functionality out of the box.

Nevertheless, and since virtualization is widely used nowadays, here are a few guidelines and recommendations:

  • Supported hypervisor technology:
  • Use Windows Server 2012 R2 Hyper-V for the best experience, since it’s the most tested hypervisor for Exchange.
  • Size for physical resources, then add ~12% CPU overhead for the hypervisor. The best way to size for this is to use the Role Requirements Calculator.
  • Don’t oversubscribe resources. Use reservation options to ensure that Exchange gets the resources it needs.
  • Disable dynamic memory. Configure static memory for all Exchange VMs.
  • Host-based failover clustering and migration technologies are supported, but must result in cold boot or use technology similar to Hyper-V Live Migration for online migrations.
  • Snapshots are NOT supported.
  • Maximum of 2:1 virtual to physical CPU ratio, 1:1 recommended.
  • Must use block level storage, because Exchange 2013 doesn’t support the use of network attached storage (NAS) volumes, other than in a SMB 3.0 scenario.
  • VHDs must be fixed size. Differencing/delta disks are not supported.
  • Storage used by Exchange should be hosted in dedicated disk spindles. This storage can be virtual storage of a fixed size (for example, fixed VHDs in a Hyper-V environment), SCSI pass-through storage, or Internet SCSI (iSCSI) storage.
  • Hyper-V replica is not supported.
  • Exchange server virtual machines (including DAG nodes) can be combined with host-based failover clustering and migration technology as long as the virtual machines don’t save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.
  • Jetstress testing in guests is supported on supported Windows hypervisors or ESX 4.1 (or newer)
  • Placing multiple mailbox DB copies or other Exchange dependencies on the same infrastructure impacts availability.
  • Important technical documentation: Exchange 2013 Virtualization.
  • For more detailed and specific guidelines for the 2 most used virtualization technologies, follow the recommendations on these whitepapers:
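The CPU guidelines in the list above come down to simple arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the 12% overhead and the 2:1 / 1:1 ratios are the figures quoted in the list. It does not replace the Role Requirements Calculator.

```python
HYPERVISOR_OVERHEAD_PCT = 12   # "~12%" CPU overhead quoted in the list above
MAX_VCPU_RATIO = 2             # 2:1 virtual-to-physical is the stated maximum
RECOMMENDED_VCPU_RATIO = 1     # 1:1 is the recommended ratio


def cores_with_overhead(physical_cores_required):
    """Cores to provision once hypervisor overhead is added, rounded up.

    Integer math is used to avoid floating-point rounding surprises.
    """
    return (physical_cores_required * (100 + HYPERVISOR_OVERHEAD_PCT) + 99) // 100


def vcpu_ratio_status(total_vcpus, host_physical_cores):
    """Classify the ratio of all vCPUs assigned on a host to its physical cores."""
    if total_vcpus <= host_physical_cores * RECOMMENDED_VCPU_RATIO:
        return "recommended"
    if total_vcpus <= host_physical_cores * MAX_VCPU_RATIO:
        return "supported"
    return "unsupported"


if __name__ == "__main__":
    # Hypothetical workload sized at 16 physical cores, on a 16-core host
    print(cores_with_overhead(16))
    print(vcpu_ratio_status(total_vcpus=24, host_physical_cores=16))
```

The ratio check assumes that `total_vcpus` counts the virtual processors of every guest on the host, not only the Exchange virtual machines.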

10. Performance and Scalability

I don’t mind repeating myself when I say that Exchange Server keeps getting better and better. For instance, regarding storage needs, Exchange 2013 shows a reduction of up to 50% in IOPS when compared to Exchange 2010. And let us not forget all the self-diagnosis and healing magic included in the product.

Nevertheless, since there is no such thing as one size fits all, sizing, tweaking and tuning some of the more critical elements, such as the hypervisor or the storage system, can have a significant impact on performance.

These are just a few hints and recommendations that can help you:

  • Sizing is absolutely critical. Don’t take shortcuts, don’t guess your users’ needs, don’t try to save money by cutting back on hardware (later, that will probably cost you more), don’t follow unsupported configurations, don’t trust just any tool available on the internet (except, of course, the tremendous Exchange 2013 Server Role Requirements Calculator), and don’t underestimate future growth. For more guidance, there is a great MEC 2014 session available online: Plan it the right way – Exchange Server 2013 sizing scenarios.
  • If sizing is critical, validation is crucial. Sizing is a theoretical exercise, so please validate your environment with tools like JetStress before putting servers into production.
  • Turn off hyper-threading, because Exchange Server will not take advantage of it. Actually, there can be a significant impact to memory utilization on Exchange servers when hyper-threading is enabled due to the way the .NET server garbage collector allocates heaps.
  • Install KB 2803754 on Windows Server 2008 R2 or KB 2803755 on Windows Server 2012 (Windows Server 2012 R2 doesn’t require it). Applying this hotfix reduces memory consumption in each store worker and decreases CPU spent in .NET garbage collector.
  • Network: Much of the network interface subsystem is tuned automatically. Server-based network adapters are capable of detecting the type and level of traffic passing through the network interface, and they self-tune to reflect this information. Ensure that the latest device drivers are maintained on the server.
    Enable RSS, since it helps to scale CPU utilization, particularly on 10GbE ports, by running the following command: netsh interface tcp set global rss=enabled
  • Use a fixed-size pagefile equal to the size of RAM + 10MB, capped at 32778MB (32GB + 10MB).
  • Storage is one of the most common causes of performance bottlenecks. Please make sure you follow the guidelines available in Exchange 2013 storage configuration options. Here is a handful of tips:
    • On Mailbox servers (including multi-role), configure DAS storage controllers for 100% write cache. On CAS roles consider 25% write/75% read.
    • Partition allocation unit size for Mailbox servers – When formatting the volumes that will host Exchange databases, use an NTFS allocation unit size of 64KB. This recommendation is based on the performance improvements seen with large sequential read operations, such as streaming backup and some eseutil tasks.
    • Disk defragmentation is supported, but neither required nor recommended for Mailbox servers.
    • The Data Deduplication feature available in Windows Server 2012 is not supported.
  • Active Directory can also represent a cause of Exchange performance issues. For Exchange 2013, use a ratio of 1 Active Directory global catalog processor core for every 8 Mailbox role processor cores handling active load, assuming 64-bit global catalog servers.
  • Multi-role servers: Multi-role is not only supported, but encouraged by Microsoft. Start with the simplest (and cheapest) solution: “commodity” 2U multi-role servers with JBOD storage. Just make sure that your processor utilization isn’t over 39% (this is just a safe number, since CPU utilization for a single role should not exceed 80%).
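Two of the rules in the list above are plain arithmetic: the pagefile size and the global catalog core ratio. Here is a small Python sketch of both. The function names are hypothetical, and rounding the global catalog core count up is my assumption, not something the guidance states.

```python
PAGEFILE_EXTRA_MB = 10
PAGEFILE_CAP_MB = 32778          # 32GB (32768MB) + 10MB, the cap quoted above
MAILBOX_CORES_PER_GC_CORE = 8    # 1 global catalog core per 8 Mailbox cores


def pagefile_size_mb(ram_mb):
    """Fixed pagefile size: RAM + 10MB, never more than the cap."""
    return min(ram_mb + PAGEFILE_EXTRA_MB, PAGEFILE_CAP_MB)


def gc_cores_needed(active_mailbox_cores):
    """Global catalog cores for the given Mailbox cores, rounded up (assumption)."""
    return -(-active_mailbox_cores // MAILBOX_CORES_PER_GC_CORE)


if __name__ == "__main__":
    print(pagefile_size_mb(16 * 1024))   # 16GB server: RAM + 10MB
    print(pagefile_size_mb(96 * 1024))   # 96GB server: hits the cap
    print(gc_cores_needed(20))           # 20 Mailbox cores handling active load
```

Use the resulting pagefile value for both the initial and the maximum size, so that the pagefile really is fixed.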

11. High-Availability and Monitoring

Like it or not, Exchange Server 2013 changed the way monitoring and operations were done. And that’s because the Exchange product team decided to incorporate in Exchange Server the kind of automated mechanisms necessary to operate and maintain a mission critical infrastructure, like the one that powers Office 365.

Exchange 2013 server roles now include a new monitoring and high availability feature known as Managed Availability. Managed Availability includes three main asynchronous components that are constantly doing work:

  • Probe Engine: Responsible for taking measurements on the server and collecting the data; results of those measurements flow into the monitor.
  • Monitor: Contains business logic used by the system to determine whether something is healthy, based on the data that is collected and the patterns that emerge from all collected measurements.
  • Responder Engine: Responsible for recovery actions. When something is unhealthy, the first action is to attempt to recover that component via multi-stage recovery actions that can include:
    • Restarting an application pool
    • Restarting a service
    • Restarting a server; and
    • Removing a server from service

Figure 22: Managed Availability in Exchange Server 2013
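To make the probe, monitor and responder flow more concrete, here is a toy Python model. It is not how Exchange implements Managed Availability. The class names, the "three consecutive failures" rule and the strict escalation order are assumptions made for illustration; only the list of recovery actions comes from the text above, and the final escalation to a human is described later in this section.

```python
from collections import deque

# Recovery actions listed above, tried in this order (ordering is an assumption)
RECOVERY_ACTIONS = [
    "restart application pool",
    "restart service",
    "restart server",
    "remove server from service",
]


class Monitor:
    """Declares a component unhealthy after N consecutive failed probes."""

    def __init__(self, failures_to_trip=3):
        self.results = deque(maxlen=failures_to_trip)

    def record(self, probe_succeeded):
        self.results.append(probe_succeeded)

    def is_unhealthy(self):
        window_full = len(self.results) == self.results.maxlen
        return window_full and not any(self.results)


class Responder:
    """Hands out one recovery action per unhealthy verdict, then escalates."""

    def __init__(self):
        self.stage = 0

    def next_action(self):
        if self.stage >= len(RECOVERY_ACTIONS):
            return "escalate to administrator"
        action = RECOVERY_ACTIONS[self.stage]
        self.stage += 1
        return action


def run(probe_results):
    """Feed probe results through the monitor and return the actions taken."""
    monitor, responder, actions = Monitor(), Responder(), []
    for succeeded in probe_results:
        monitor.record(succeeded)
        if monitor.is_unhealthy():
            actions.append(responder.next_action())
            monitor.results.clear()  # start a fresh window after each action
    return actions


if __name__ == "__main__":
    # One good probe followed by six failures triggers two recovery stages
    print(run([True] + [False] * 6))
```

The point of the sketch is the separation of concerns: probes only measure, the monitor only decides, and the responder only acts.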

Here are some Managed Availability tips:

  • In Exchange 2013 SP1, the default threshold for the low volume space monitor is 200 GB. In Exchange 2013 Cumulative Update 6 and later, the default threshold is 180 GB. In SP1 and later, you can configure the threshold by adding the following DWORD registry value (in MB) on each Mailbox server that you want to customize: HKEY_LOCAL_MACHINE\Software\Microsoft\ExchangeServer\v15\Replay\Parameters\SpaceMonitorLowSpaceThresholdInMB
  • Use the Get-ServerHealth and Get-HealthReport PowerShell cmdlets to quickly create a status report for a particular server or a group of servers.
  • Recovery actions, results and related events are logged and can be viewed using Windows Event Viewer > Applications and Services Logs > Microsoft > Exchange > Managed Availability.
  • To control the amount of disk space consumed by Managed Availability logging in non-production environments, edit Microsoft.Exchange.Diagnostics.Service.exe.config and add the following to the appSettings section:
    <add key="MaxDailyPerformancelogDirectorySize" value="1024"/>
  • For more information about overrides, please read Customizing Managed Availability.
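Because the low-space defaults are quoted in GB while the registry value is expressed in MB, a quick conversion helps avoid typing the wrong number. This Python sketch assumes that 1 GB is counted as 1024 MB; the function name is hypothetical.

```python
DWORD_MAX = 0xFFFFFFFF  # largest value a DWORD registry entry can hold


def threshold_gb_to_mb(threshold_gb):
    """Convert a threshold in GB to the MB figure for the DWORD registry value."""
    threshold_mb = threshold_gb * 1024  # assumption: 1 GB = 1024 MB
    if not 0 < threshold_mb <= DWORD_MAX:
        raise ValueError("threshold does not fit in a DWORD")
    return threshold_mb


if __name__ == "__main__":
    print(threshold_gb_to_mb(200))  # the SP1 default, in MB
    print(threshold_gb_to_mb(180))  # the CU6 default, in MB
    print(threshold_gb_to_mb(100))  # a hypothetical custom threshold
```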

Managed Availability has taken over most of the monitoring procedures that used to be handled by System Center Operations Manager (SCOM). Although the SCOM management pack no longer provides the same kind of functionality, Managed Availability now provides a more filtered set of data that can be used to build and report the overall health of the Exchange servers. SCOM is still a crucial monitoring element, because if recovery actions are unsuccessful, Managed Availability escalates the issue to a human through event log notifications that can then be picked up by the SCOM agent, triggering an alert.

There are many tools available to monitor Exchange Server, ranging from free tools and scripts, to more advanced solutions, but the one I still like to recommend is System Center Operations Manager 2012 with the Exchange Server 2013 Management Pack.

There’s no doubt that proactive monitoring can have a significant impact on overall availability, but there are also other high-availability improvements built into Exchange 2013 that are worth mentioning:

  • Reduction in IOPS over Exchange 2010
  • New Managed Store written in C#
  • Support for multiple databases per disk
  • AutoReseed
  • Automatic recovery from storage failures
  • Lagged copy enhancements
  • Single copy alert enhancements
  • DAG network auto-configuration

But even with all these investments in the technology, high availability is still strongly dependent on a lot of external and human factors, so here are a few tips:

  • Agree on an SLA for the Exchange infrastructure. Based on the availability target, carefully plan the necessary architecture. A good place to start is reading Planning for high availability and site resilience.
  • If possible, consider following the Preferred Architecture guidelines.
  • Build DAGs with Windows Server 2012 or later, since it includes some technical enhancements (Dynamic Quorum for example).
  • Enable loose truncation in Exchange 2013 SP1. Loose truncation is a new feature designed to ensure that disks holding the transaction logs waiting to be replayed into database copies don’t run out of space. To enable loose truncation, follow the procedures described in Managing mailbox database copies.
  • Install the hotfix provided in KB 2779069 on Windows Server 2012 to help you determine which cluster node is blocking a GUM update.
  • Use the CollectOverMetrics.ps1 and the CollectReplicationMetrics.ps1 scripts, included in the Scripts folder of an Exchange 2013 installation, to gather useful information about the DAG environment.

12. Advanced Troubleshooting

Your life wouldn’t be so interesting if you didn’t have to go through the mind-boggling process of troubleshooting a failed Exchange system, right? Even the best planned and best managed Exchange organization will face a challenging problem every now and then. Exchange 2013 can even BSOD under the following conditions:

  • Failure to schedule threads (Hung Threads)
  • Excessive I/O latency
  • Replay service working set
  • Managed Availability (Force Reboot)

Fortunately there are some great tools that come to the rescue of the intrepid IT Administrator, such as:

  • Office 365 Outlook Connectivity Guided Walkthrough – Whether you use Exchange Online, on-premises, or some combination of both, you will inevitably have an issue with Outlook performance, connectivity, profile corruption, or some other unknown Outlook disease before retirement. To assist you with these issues, there is the Office 365 Outlook Connectivity Guided Walk Through.
  • Microsoft Remote Connectivity Analyzer Tool – The Remote Connectivity Analyzer website enables IT Administrators to pinpoint connectivity issues by simulating connectivity from a location outside the customer environment. There is also a companion tool, Microsoft Connectivity Analyzer, which can be run locally and lets IT administrators run the same tests within the user’s environment.
  • Exchange Server Forums – The forum provides a place to discuss Exchange with users and Exchange Team members.
  • Office 365 Forums – The forum provides a place to discuss Office 365 issues.
  • Guided walkthroughs for Exchange, Lync, SharePoint, and Office 365 – This article lists the guided walkthroughs that are available for Microsoft Exchange Server, Lync Server, SharePoint Server, and Office 365 (including Exchange Online, Lync Online, and SharePoint Online).
  • System Center Operations Manager – Using the Exchange Server 2013 Management Pack for troubleshooting
  • Sysinternals Process Monitor – can be quite handy for checking the various Exchange processes, and unexpected I/O in particular.
  • Windows Performance Monitor – this is actually the mother (and father) of all tools, crucial for detecting CPU, memory and storage problems. There are many, many dedicated Exchange counters. To get a list, run the following PowerShell cmdlet: Get-Counter -ListSet *msexchange* | Select-Object -ExpandProperty Counter
    There is one particular counter which is usually a good indicator of user-related performance problems: RPC Average Latency

    Performance Counter                                   Value
    \MSExchangeIS Store(*)\RPC Average Latency            < 100ms
    \MSExchangeIS Client Type(*)\RPC Average Latency      < 100ms
    \MSExchangeIS Store(*)\RPC Operations/sec
    \MSExchangeIS Client Type(*)\RPC Operations/sec

Table 4: RPC Average Latency counter
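
Once you have collected samples for the RPC Average Latency counters, checking them against the 100ms threshold from Table 4 is easy to script. The Python sketch below assumes you have already exported the samples into a dictionary keyed by counter instance. The function name and the sample data are hypothetical, and averaging the samples per instance is my choice, not an official rule.

```python
from statistics import mean

RPC_LATENCY_LIMIT_MS = 100  # healthy values are below this (Table 4)


def slow_instances(samples):
    """Return the instances whose mean RPC Average Latency reaches the limit.

    `samples` maps an instance name to a list of latency readings in ms.
    Instances without readings are skipped.
    """
    return sorted(
        name
        for name, values in samples.items()
        if values and mean(values) >= RPC_LATENCY_LIMIT_MS
    )


if __name__ == "__main__":
    # Hypothetical readings for two database instances
    readings = {
        "db01": [20, 35, 40],
        "db02": [90, 150, 130],
    }
    print(slow_instances(readings))
```

Read the result alongside RPC Operations/sec: high latency under a very low operation rate usually deserves a different investigation than high latency under heavy load.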

Summary

This concludes part 3 of this article. In the next and last part we’ll cover the “better-together” story, migration to the latest version of Exchange, publishing Exchange services securely and we’ll finish with a bunch of loose tips.
