If you are familiar with Exchange, you know that when it comes to storage and performance, Exchange should not have to compete with other disk-intensive applications such as Microsoft SQL Server. Fortunately, the Exchange JetStress tool can help you verify this (we will get to it a bit later). With each version of Exchange, Microsoft releases an updated Exchange calculator that lets you plan your storage based on specific criteria. The Exchange calculator is an Excel spreadsheet into which you enter values that describe your environment.
To download this calculator, head over to this Microsoft web page.
You will notice it is for Exchange 2019. Each section lets you enter or select a value. Based on your input, the calculator populates the tabs at the bottom, such as:
- Role requirements
- Volume requirements
- Backup requirements
- Replication requirements
- Mailbox space modeling
- Storage design
- Version changes
Not a one-size-fits-all approach
Each environment has different requirements, so no single design fits everyone. For example, your organization may want four copies of the data, which changes your backup requirements and the amount of storage you need. Creating and maintaining the spreadsheet is out of scope for this article, but you can read more about it at this link.
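To see why the number of copies matters so much, here is a minimal Python sketch of the arithmetic. It is not the formula the Exchange calculator uses. The function name `raw_capacity_tb`, the 20% overhead figure, and the mailbox numbers are all assumptions chosen purely for illustration.

```python
def raw_capacity_tb(mailboxes: int, mailbox_gb: float, copies: int,
                    overhead: float = 0.2) -> float:
    """Rough total storage in TB across all database copies.

    Hypothetical helper: the real calculator models many more factors
    (logs, content indexes, free space, growth). `overhead` is an
    assumed flat allowance on top of the raw mailbox data.
    """
    if mailboxes < 0 or mailbox_gb < 0 or overhead < 0:
        raise ValueError("sizes and overhead must be non-negative")
    if copies < 1:
        raise ValueError("at least one database copy is required")
    per_copy_gb = mailboxes * mailbox_gb * (1 + overhead)
    # Every additional copy needs its own full set of storage.
    return per_copy_gb * copies / 1024


if __name__ == "__main__":
    # Illustrative inputs only: 2,000 mailboxes of 10 GB each.
    for copies in (2, 3, 4):
        total = raw_capacity_tb(2000, 10, copies)
        print(f"{copies} copies: about {total:.1f} TB")
```

Storage scales linearly with the copy count, so going from two copies to four doubles what you have to buy before any backup requirement is considered.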
Once you have everything populated, you are happy with the design, and you have purchased your storage, you need to put it to the test. Microsoft provides a JetStress tool that works with Exchange 2013 and 2016. One of the requirements is that the correct ESE DLL files from your version of Exchange are placed in the JetStress install folder. There has been no mention of an update for Exchange 2019, but you should still be able to validate the storage and its performance. If you have a new storage area network (SAN) dedicated to Exchange, you should be able to run the tool hard. But if you are sharing the SAN with other applications, consider running JetStress after hours so you do not impact them. Here is the link to download the JetStress tool.
Exchange JetStress scenarios
Once you have downloaded the file and run the install, it is time to kick off the first simulation. One customer of mine purchased storage for Exchange 2016/2019, and we started testing with the scenario below:
- 4 databases, 1 copy — 2 threads
The tests ran for an hour at a time, and we captured the output in an Excel spreadsheet so we could compare the values. We then moved to the next test:
- 4 databases, 2 copies — 2 threads
The tests again ran for an hour and we captured the output in our spreadsheet. We moved on to the next test:
- 4 databases, 3 copies — 2 threads
The tests again ran for an hour and the output was captured. We were happy with the databases and copies so we started increasing the threads. The next test had four threads:
- 4 databases, 3 copies — 4 threads
Over the course of the day, we increased the threads to 10 (still with four databases and three copies), and we crashed a controller on the storage. This was because of a software bug from the vendor. We kept running the tests and, eventually, we crashed the entire storage unit.
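The progression described above (add copies first, then add threads) can be written down as a small test plan so that every run is recorded against the same matrix. This is only a sketch: the `Scenario` class and `build_plan` function are hypothetical names, and the thread step of 2 is an assumption, since only the 2, 4, and 10 thread runs are stated above. The script does not drive JetStress itself. It only lists the runs to perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    databases: int
    copies: int
    threads: int


def build_plan(databases: int = 4, max_copies: int = 3,
               start_threads: int = 2, max_threads: int = 10,
               thread_step: int = 2) -> list[Scenario]:
    """Return scenarios in the order they should be run.

    Phase 1 raises the copy count at a fixed, low thread count.
    Phase 2 keeps the maximum copy count and raises the threads.
    """
    plan = [Scenario(databases, copies, start_threads)
            for copies in range(1, max_copies + 1)]
    plan += [Scenario(databases, max_copies, threads)
             for threads in range(start_threads + thread_step,
                                  max_threads + 1, thread_step)]
    return plan


if __name__ == "__main__":
    for number, s in enumerate(build_plan(), start=1):
        print(f"Test {number}: {s.databases} databases, "
              f"{s.copies} copies, {s.threads} threads (1 hour)")
```

Changing one variable at a time makes it easier to tell which change pushed the storage over the edge when a run fails.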
The vendor had a new release, so we applied the firmware and started the same tests again. The tests ran for a few days before we had a controller crash on the storage unit. After a back-and-forth with the vendor, they reproduced the issue on their side and released new firmware.
The tests went on for months until we reached a point where all the bugs were ironed out and we were able to run our tests successfully with four databases and three copies using 10 threads. The unit was placed into production. The first Exchange 2016 pair was introduced, and the storage was added. Exchange 2016 storage was configured with one copy on this SAN and the other two split across two other SANs.
One day, the underlying hypervisor failed over onto the new storage, and we started getting alerts for a database being down. It turned out that if you live-migrate a machine with this storage, it formats the underlying storage. Wait, what? Yep, a major flaw in the hardware. It took us a couple of hours to get the server recovered and back online, seeding again.
We were able to reproduce the issue with a test machine and figured out that if you shut down the machine and move it, you don’t experience the loss of data. Back to the drawing board for the vendor.
Time well spent
An emergency patch was released and applied. This brings me to this conclusion: You need to test your storage over and over again and make sure that you are happy with it before it goes into production. Our tests took a year because we wanted to make sure that we had everything covered and did not have storage crashing on us. Hopefully, what I am writing will save you the heartache of losing production servers because you did not test your storage properly.
If you are not happy with the storage, delay your project delivery until everything is working 100% for you. Issues like the storage formatting the disk may show up only in production, because JetStress does not test failover of virtual machines.