Windows Disk Timeouts and Exchange Server 2010
A few months ago, Bruce Langworthy wrote an excellent article on new recommendations for setting the Windows Disk Timeout value: http://blogs.msdn.com/b/san/archive/2011/08/15/the-windows-disk-timeout-value-understanding-why-this-should-be-set-to-a-small-value.aspx
This post got me thinking about Exchange and how we deal with I/O problems. If you haven't read Bruce's article, it explains that with the default disk timeout of 60 seconds, Windows will not report a hung I/O for 60 seconds and won't retry the I/O for 8 minutes. Eight minutes is far too long to wait before retrying a hung I/O, so Microsoft is releasing new guidance that recommends changing the Windows Disk Timeout setting to a value that aligns with your storage architecture.
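To see what a given server is currently using, the value can be read from the registry. The following is a minimal sketch rather than a supported tool. It assumes the setting lives in the `TimeOutValue` DWORD under `HKLM\SYSTEM\CurrentControlSet\Services\Disk` (the registry location is not named in this post), and it assumes that a missing value means the 60 second default mentioned above applies. The function names are hypothetical.

```python
import sys

# Assumed registry location of the Windows disk timeout (not stated in this post).
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"

# Default cited in the post; used here as an assumed fallback when no value is set.
DEFAULT_TIMEOUT_SECONDS = 60


def read_disk_timeout():
    """Return the configured disk timeout in seconds, or None if it cannot be read."""
    if sys.platform != "win32":
        return None  # The registry only exists on Windows.
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        return None  # Key or value is missing, or access was denied.
    return int(value) if value_type == winreg.REG_DWORD else None


def effective_timeout(configured):
    """Fall back to the assumed 60 second default when nothing is configured."""
    return DEFAULT_TIMEOUT_SECONDS if configured is None else configured


if __name__ == "__main__":
    seconds = effective_timeout(read_disk_timeout())
    print(f"Effective disk timeout: {seconds} seconds")
```

The script only reads the value. Whether to change it on an Exchange server is the question the rest of this post addresses.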
The question in my mind for Exchange was simple: how does this disk timeout behavior affect Exchange DAG deployments? More specifically, should I reduce the Windows Disk Timeout on my Exchange servers as per the new recommendations, or leave things alone?
To answer this question, I approached some of our ESE developers to get their thoughts. This is what came from that discussion:
- The Windows Disk Timeout value is mainly intended for event logging and I/O retry.
- Prior to Exchange Server 2010, Exchange did not take any action for slow I/O other than reporting it in the event log.
- Exchange Server 2010 RTM introduced pre-emptive page patching (clean page overwrite) for pages affected by slow I/O.
- Exchange Server 2010 SP1 is the first version of Exchange to include intelligence for dealing with hung I/O and will actively fail (bugcheck) the server if the hung I/O is affecting active databases on a DAG node.