What’s New for Exchange 2013 Storage?
By: Brien M. Posey
Many of Exchange Server 2013’s most noteworthy improvements are behind-the-scenes architectural changes rather than new product features. Perhaps nowhere is this more true than in Exchange Server’s storage architecture. Once again, Microsoft invested heavily in Exchange’s storage subsystem in an effort to drive down overall storage costs while improving performance and reliability. This article outlines some of the most significant storage-related improvements in Exchange Server 2013.
Lower IOPS on Passive Database Copies
In failure situations, failover from an active mailbox database to a passive database copy needs to happen as quickly as possible. In Exchange Server 2010, Microsoft expedited the failover process by maintaining a low checkpoint depth (5 MB) on the passive database copy. Microsoft’s reason for doing this was that failing over from an active to a passive database copy required the database cache to be flushed. Having a large checkpoint depth would have increased the amount of time that it took to flush the cache, thereby causing the failover process to take longer to complete.
The problem was that maintaining a low checkpoint depth came at a cost. The server hosting the passive database copy had to do a lot of work in terms of pre-read operations in order to keep pace with demand while still maintaining a minimal checkpoint depth. The end result was that a passive database copy produced nearly the same level of IOPS as its active counterpart.
In Exchange Server 2013, Microsoft made a simple decision that greatly reduced IOPS for passive database copies, while also reducing the database failover time. Because much of the disk I/O activity on the passive database copy was related to maintaining a low checkpoint depth and because the checkpoint depth had a direct impact on the failover time, Microsoft realized that the best way to improve performance was to change the way that the caching process worked.
In Exchange 2013, the cache is no longer flushed during a failover. Instead, the cache is treated as a persistent object. Because the cache no longer has to be flushed, the size of the cache has little bearing on the amount of time that it takes to perform the failover. As such, Microsoft designed Exchange 2013 to have a much larger checkpoint depth (100 MB). Having a larger checkpoint depth means that the passive database copy doesn’t have to work as hard to pre-read data, which drives down the IOPS on the passive database copy by about half. Furthermore, failovers normally occur in about 20 seconds.
Although the idea of driving down IOPS for passive database copies might sound somewhat appealing, some might question the benefit. After all, passive database copies are not actively being used, so driving down the IOPS should theoretically have no impact on the end user experience.
One of the reasons why reducing the IOPS produced by passive database copies is so important has to do with another architectural change that Microsoft has made in Exchange Server 2013. Unlike previous versions of Exchange Server, Exchange Server 2013 allows active and passive database copies to be stored together on the same volume.
If an organization does choose to use a single volume to store a mixture of active and passive database copies, then reducing the IOPS produced by the passive database copies will have a direct impact on the performance of the active databases that share the volume.
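The effect on a shared volume can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the function name, the per-database IOPS figure, and the copy counts are hypothetical, and the ratios of 1.0 and 0.5 are rough stand-ins for "nearly the same IOPS as the active copy" and "about half".

```python
def volume_iops(active_copies, passive_copies, iops_per_active, passive_ratio):
    """Estimate the total IOPS placed on one volume.

    passive_ratio is the IOPS of a passive copy relative to an active copy.
    All inputs are illustrative assumptions, not measured Exchange figures.
    """
    active_load = active_copies * iops_per_active
    passive_load = passive_copies * iops_per_active * passive_ratio
    return active_load + passive_load


# Hypothetical volume: 1 active copy and 3 passive copies, 100 IOPS per active copy.
older_behavior = volume_iops(1, 3, 100, passive_ratio=1.0)  # passive ~ same as active
newer_behavior = volume_iops(1, 3, 100, passive_ratio=0.5)  # passive ~ half of active

print(older_behavior)  # 400.0
print(newer_behavior)  # 250.0
```

Under these assumed numbers, the same disk carries noticeably less total load, which leaves more headroom for the active database on that volume.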
This new architecture also makes it easier to recover from disk failures within a reasonable amount of time. Exchange Server 2013 supports volume sizes of up to 8 TB. With that in mind, imagine what would happen if a disk failed and needed to be reseeded. Assuming that the majority of the space on the volume was being used, it would normally take a very long time to regenerate the contents of the failed disk.
Part of the reason for this has to do with the sheer volume of data that must be copied, but there is more to it than that. Passive database copies are normally reseeded from an active database copy. If all of the active database copies reside on a common volume, then that volume’s performance will be the limiting factor affecting the amount of time that it takes to rebuild the failed disk.
In Exchange Server 2013, however, volumes can contain a mixture of active and passive database copies. This means that the active database copies likely reside on different volumes (typically on different servers), so the data that is necessary for rebuilding the failed volume will be pulled from a variety of sources. As such, the data source is no longer the limiting factor in the amount of time that it takes to reseed the disk. Assuming that the disk that is being reseeded can keep pace, the reseeding process can occur much more quickly than it would if all of the data were coming from a single source.
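The reasoning about reseed time can be made concrete with a simple throughput model. This is a rough sketch under stated assumptions: the function name and the throughput figures are hypothetical, the model ignores network and log replay overhead, and it simply treats the slower of the combined sources and the target disk as the bottleneck.

```python
def reseed_hours(data_tb, source_mb_per_s, target_mb_per_s):
    """Estimate reseed time in hours for a failed volume.

    source_mb_per_s: list of read throughputs, one per source volume (assumed).
    target_mb_per_s: write throughput of the disk being reseeded (assumed).
    """
    # The effective rate is capped by whichever side is slower.
    effective_mb_per_s = min(sum(source_mb_per_s), target_mb_per_s)
    total_mb = data_tb * 1_000_000  # decimal terabytes to megabytes
    return total_mb / effective_mb_per_s / 3600


# Hypothetical 8 TB volume, target disk able to write 200 MB/s.
single_source = reseed_hours(8, [50], 200)          # one source volume at 50 MB/s
many_sources = reseed_hours(8, [50, 50, 50, 50], 200)  # four source volumes

print(round(single_source, 1))  # 44.4
print(round(many_sources, 1))   # 11.1
```

With a single source, the source volume limits the rate. With several sources, the limit shifts to the disk being reseeded, which matches the "assuming that the disk can keep pace" condition described above.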
In addition, Exchange Server 2013 periodically performs an integrity check of passive database copies, looking for any copy that has a status of FailedAndSuspended. If such a database copy is found, Exchange checks to see whether any spare disks are available. If a valid spare is found, Exchange Server automatically remaps the spare and initiates an automatic reseeding process.
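The decision flow described above can be sketched in a few lines. This is a conceptual model only, not Exchange code or its API: the `DatabaseCopy` class, the `auto_reseed` function, the volume names, and the "Seeding" status used after remapping are all hypothetical placeholders. Only the FailedAndSuspended status comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseCopy:
    name: str
    status: str
    volume: str


def auto_reseed(copies: List[DatabaseCopy], spares: List[str]) -> List[str]:
    """Walk the copies, remap a spare to each failed copy, and log what happened."""
    actions = []
    for copy in copies:
        if copy.status != "FailedAndSuspended":
            continue  # healthy copies are left alone
        if not spares:
            actions.append(f"{copy.name}: no spare disk available")
            continue
        spare = spares.pop(0)    # take the next available spare
        copy.volume = spare      # remap the copy to the spare
        copy.status = "Seeding"  # placeholder status for a reseed in progress
        actions.append(f"{copy.name}: remapped to {spare}, reseed started")
    return actions


copies = [
    DatabaseCopy("DB1", "Healthy", "Vol1"),
    DatabaseCopy("DB2", "FailedAndSuspended", "Vol2"),
    DatabaseCopy("DB3", "FailedAndSuspended", "Vol3"),
]
for line in auto_reseed(copies, spares=["Spare1"]):
    print(line)
```

In this hypothetical run, only one spare exists, so the first failed copy is remapped and the second is reported as still waiting for a spare.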
As you can see, Microsoft has made a tremendous number of improvements to the way that Exchange Server manages storage in DAG environments. Passive database copies generate fewer IOPS, and failovers happen more quickly than ever before. Furthermore, Exchange Server can even use spare disks to quickly recover from certain types of disk failures.