The Pros and Cons of Using Database Availability Groups
Guest Post By: Brien M. Posey
Database Availability Groups (DAGs) are Microsoft’s go-to solution for providing high availability for Exchange 2010 (and Exchange 2013) mailbox servers. Even so, it is critically important for administrators to consider whether a DAG is the most appropriate high availability solution for their organization.
The primary advantage offered by DAGs is that of high availability for mailbox servers within an Exchange Server organization. DAGs make use of failover clustering. As such, the failure of a DAG member results in any active mailbox databases failing over to another DAG member.
At first, this behavior likely seems ideal, but depending on an organization’s needs, DAGs can leave a lot to be desired. One of the first considerations that administrators must take into account is the fact that DAGs only provide high availability for mailbox databases. This means that administrators must find other ways to protect the other Exchange Server roles and any existing public folder databases. Incidentally, Exchange Server 2013 adds high availability for public folders through DAGs, but DAGs cannot be used to protect any additional Exchange Server components.
In spite of the limitations that were just mentioned, DAGs have historically proven to be an acceptable high availability solution for medium-sized organizations. While it is true that DAGs fail to protect the individual server roles, Exchange stores all of its configuration information in Active Directory, which means that entire servers can be rebuilt by following these steps:
- Reset the Active Directory account for the failed server (reset the account, do not delete it).
- Install Windows onto a new server and give it the same name as the failed server.
- Install onto the new server any Windows patches or service packs that were running on the failed server.
- Join the server to the Active Directory domain.
- Create an Exchange Server installation DVD that contains the same service pack level that was used on the failed server.
- Insert the Exchange installation media that you just created and run Setup /m:RecoverServer
The method outlined above can be used to recreate a failed Exchange Server. The only things that are not recreated using this method are the databases, but databases are protected by DAGs. As such, these two mechanisms provide relatively comprehensive protection against a disaster. Even so, the level of protection afforded by these mechanisms often proves to be inadequate for larger organizations.
One of the reasons for this has to do with the difficulty of rolling a database back to an earlier point in time. Microsoft allows database copies hosted on DAG members to be configured as lagged copies. This means that transaction logs are not committed to the lagged copy as quickly as they would otherwise be. This lag gives administrators the ability to activate an older version of the database if necessary. The problem is that activating a lagged copy is not an intuitive process. Furthermore, because an older version of the database lacks whatever was written after that point, activating a lagged copy always results in data loss.
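To make the trade-off concrete, here is a minimal sketch of the idea in Python. It is a simplified model, not a description of how Exchange implements log replay: the function names, the hourly log timestamps, and the four-hour lag are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def split_logs_by_lag(log_times, now, replay_lag):
    """Split log timestamps into (replayed, pending) for a lagged copy.

    Simplified model: a log is replayed into the lagged copy only once
    it is at least `replay_lag` old; newer logs are held back.
    """
    cutoff = now - replay_lag
    replayed = [t for t in log_times if t <= cutoff]
    pending = [t for t in log_times if t > cutoff]
    return replayed, pending


def logs_lost_on_rollback(log_times, restore_point):
    """Return the logs written after the chosen restore point.

    Rolling back to `restore_point` means these logs are never applied,
    which is the data loss described above.
    """
    return [t for t in log_times if t > restore_point]


# Hypothetical example: one log per hour from 00:00 to 11:00.
now = datetime(2013, 1, 1, 12, 0)
logs = [datetime(2013, 1, 1, hour, 0) for hour in range(12)]

replayed, pending = split_logs_by_lag(logs, now, timedelta(hours=4))
lost = logs_lost_on_rollback(logs, datetime(2013, 1, 1, 9, 30))

print(len(replayed), "logs replayed,", len(pending), "held back")
print(len(lost), "logs discarded by rolling back to 09:30")
```

The held-back logs are what make an earlier point in time reachable. Everything written after the chosen restore point is what the organization gives up.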
The other reason why DAGs are not always an adequate solution for larger organizations has to do with the difficulty of providing off-site protection. Exchange Server 2010 supports the creation of stretched DAGs, which are DAGs that span multiple datacenters. Although being able to fail over to an off-site datacenter sounds like a true enterprise-class feature, the reality of the situation is that architectural limitations often prevent organizations from being able to achieve such functionality.
The most common barriers to implementing a stretched DAG are network latency and Active Directory design. Stretched DAGs are only supported on networks with a maximum round-trip latency of 500 milliseconds. Additionally, DAGs cannot span multiple Active Directory domains, which means that the domain in which the DAG members reside must span datacenters.
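These two constraints lend themselves to a quick planning check. The sketch below is a hypothetical helper written in Python, not part of any Exchange tooling. The function name, the server names, and the domain names are invented for illustration, and the measured latency is assumed to come from whatever network testing the organization already performs.

```python
MAX_RTT_MS = 500  # round-trip latency limit cited above


def stretched_dag_blockers(rtt_ms, member_domains):
    """Return a list of reasons a planned stretched DAG fails the checks.

    rtt_ms         -- measured round-trip latency between sites, in ms
    member_domains -- dict mapping each planned DAG member to its domain
    """
    blockers = []
    if rtt_ms > MAX_RTT_MS:
        blockers.append(
            f"round-trip latency {rtt_ms} ms exceeds {MAX_RTT_MS} ms"
        )
    # Domain names are case-insensitive, so normalize before comparing.
    domains = {domain.lower() for domain in member_domains.values()}
    if len(domains) > 1:
        blockers.append(
            "members span multiple domains: " + ", ".join(sorted(domains))
        )
    return blockers


# Hypothetical design: two sites, three servers, one stray domain.
plan = {
    "MBX1": "corp.example.com",
    "MBX2": "corp.example.com",
    "MBX3": "dr.example.com",
}
for problem in stretched_dag_blockers(650, plan):
    print("Blocker:", problem)
```

An empty list means only that the design passes these two checks. It says nothing about quorum, which is a separate problem.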
Even if an organization is able to meet the criteria outlined above, it must construct the DAG in a way that will ensure continued functionality both in times of disaster and during minor outages. In order for a DAG to function, it must maintain quorum. This means that a majority (more than half) of the total number of DAG members must be functional in order for the DAG to remain online. This requirement is relatively easy to meet in a single-datacenter deployment, but is quite challenging in stretched DAG environments.
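The arithmetic behind quorum is simple enough to sketch in a few lines of Python. This is a simplified model that counts only DAG members, as the description above does, and ignores any witness server. The function names are made up for illustration.

```python
def votes_needed(total_members):
    """Smallest number of members that is more than half of the total."""
    return total_members // 2 + 1


def has_quorum(total_members, online_members):
    """True if the members still online form a majority of the DAG."""
    return online_members >= votes_needed(total_members)


# How many failures can DAGs of various sizes tolerate?
for total in range(2, 8):
    tolerated = total - votes_needed(total)
    print(f"{total} members: need {votes_needed(total)} online, "
          f"can lose {tolerated}")
```

Note that under this model, adding a member to an odd-sized DAG does not increase the number of failures it can tolerate: a four-member DAG and a three-member DAG can each lose only one member.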
One of the issues that must be considered when building a stretched DAG is that Exchange cannot tell the difference between a WAN failure and the failure of the Exchange servers on the other side of the WAN link. As such, the primary site must have enough DAG members to maintain quorum even in the event of a WAN failure. Ideally, the primary site should have enough DAG members to retain quorum during a WAN failure and still be able to absorb the failure of at least one member in the primary site.
Another problem with stretched DAGs is that the requirement for the primary site to have enough DAG members to always maintain quorum means that the DAG will never fail over to the remote site automatically, even if the entire primary datacenter is destroyed. The secondary site lacks enough DAG members to achieve quorum unless an administrator manually evicts nodes from the DAG.
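The site-sizing dilemma can be worked through with the same majority rule. The following Python sketch is a simplified model under the same assumptions as the discussion above: only DAG members vote and no witness server is considered. The function names and the example member counts are hypothetical.

```python
def votes_needed(total_members):
    """Smallest number of members that is more than half of the total."""
    return total_members // 2 + 1


def sites_with_quorum(primary, secondary):
    """Which site can keep the DAG online when the sites lose contact?

    From either side, a WAN failure looks the same as the loss of the
    other datacenter, so each site must reach a majority on its own.
    """
    need = votes_needed(primary + secondary)
    return {"primary": primary >= need, "secondary": secondary >= need}


def min_primary_members(secondary, spare_failures=0):
    """Smallest primary-site member count that keeps quorum with the WAN
    down and `spare_failures` primary-site members also offline."""
    primary = 1
    while primary - spare_failures < votes_needed(primary + secondary):
        primary += 1
    return primary


# Hypothetical design with two members in the secondary site.
print(min_primary_members(2))                    # survive a WAN failure
print(min_primary_members(2, spare_failures=1))  # plus one local failure
print(sites_with_quorum(primary=3, secondary=2))
```

With three members in the primary site and two in the secondary, the primary site keeps quorum when the link drops, but the secondary site never can. Because the secondary site cannot distinguish a destroyed primary datacenter from a failed WAN link, the result is the same in both cases, which is why manual intervention is required.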
As you can see, DAGs tend to deliver an acceptable level of functionality in single-datacenter environments, but the limitations that are inherent in stretched DAGs make them impractical for use in multi-datacenter deployments. Larger organizations are typically better off implementing other types of redundancy rather than depending on DAGs. One possible solution, for example, is to virtualize an organization’s Exchange servers and then replicate the virtual machines to a standby datacenter. This approach will usually make the process of failing over to an alternate datacenter much simpler and more efficient.