Windows Geoclusters, Stretch-Clusters, and RecoverPoint/CE Failover

Taking a page out of Chief EMC Blogger Chuck Hollis’ playbook, I’m attaching the graphics from the PPT file that I thought would be important to highlight for this blog and its readers.  Some of the graphics didn’t fit the page as well as I thought they would (I need to shrink them further). So if you like what you see, you can download the whole PPT right here: RecoverPointCE-MSfailoverclusterPPT

In a nutshell, EMC’s RecoverPoint/Cluster Enabler extends a Microsoft cluster across two sites.  A Microsoft cluster normally provides local site “HA” or high availability of server nodes, and RecoverPoint/CE adds “DR” or disaster recovery (AFTER) by stretching the second node to anywhere outside of your primary datacenter.  This presentation walks you through the basics behind that simple idea and provides some additional background.   Slide building credit goes to Gary Archer, a great guy who is always keeping me sharp on RecoverPoint’s latest features.

Recovery Time Objective: Targeted amount of time to restart a business service after a disaster event

Recovery Point Objective: Amount of data lost in a failure, measured as the span of time leading up to the disaster event whose writes cannot be recovered
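To make the two definitions concrete, here is a minimal sketch that computes the recovery point and recovery time actually achieved in a single incident. The timestamps and function names are made up for illustration and are not from the slides.

```python
from datetime import datetime, timedelta


def recovery_point(last_replicated_write: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: time between the last write that reached
    the remote site and the disaster event."""
    return disaster - last_replicated_write


def recovery_time(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the disaster event and the moment
    the business service is running again."""
    return service_restored - disaster


# Hypothetical incident timeline (illustrative values only).
disaster = datetime(2011, 1, 10, 14, 0, 0)
last_write = datetime(2011, 1, 10, 13, 59, 30)
restored = datetime(2011, 1, 10, 14, 12, 0)

achieved_rpo = recovery_point(last_write, disaster)
achieved_rto = recovery_time(disaster, restored)

print("Data-loss window:", achieved_rpo)
print("Downtime:", achieved_rto)
```

Comparing these achieved values against the targets is what tells you whether a given DR approach meets its objectives.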

Various approaches for DR and their RTO rankings

Microsoft Failover Clusters (formerly MSCS, or Wolfpack if you go back really far) provide local HA, not DR across sites.  For that, you need to S-T-R-E-T-C-H your cluster. EMC’s Cluster Enabler is one way to do it, and using RecoverPoint with it would be like having your iPhone on Verizon.  Not the best analogy, but you get my point, I hope!

Basic requirements – use SYNCHRONOUS or ASYNCHRONOUS replication – distance is not the limiting factor, latency is: up to 400 ms for ASYNC and up to 4 ms for SYNC
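As a rough illustration of that latency rule, the sketch below reports which replication modes a measured link latency would allow. The 4 ms and 400 ms figures come from the slide; treating them as inclusive upper bounds, and the function and constant names, are my assumptions.

```python
from typing import List

# Limits taken from the slide; treating them as inclusive is an assumption.
SYNC_MAX_LATENCY_MS = 4.0
ASYNC_MAX_LATENCY_MS = 400.0


def supported_modes(latency_ms: float) -> List[str]:
    """Return the replication modes a link with this latency could support."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    modes = []
    if latency_ms <= SYNC_MAX_LATENCY_MS:
        modes.append("synchronous")
    if latency_ms <= ASYNC_MAX_LATENCY_MS:
        modes.append("asynchronous")
    return modes


for latency in (2.0, 50.0, 500.0):
    print(latency, "ms ->", supported_modes(latency) or "not supported")
```

A low-latency link qualifies for both modes, so the choice there comes down to how much data loss you can tolerate.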

Leverages majority node set clustering.  If you have 2 nodes/servers on Site A and 2 nodes/servers on Site B, you will need a “tiebreaker” to decide which side remains online after a failure – the most common method for this tiebreaker is a File Share Witness.  Many articles can give you additional background on majority node set clustering – it’s a good thing to know – and I will point you to the blog of an old friend of mine, John Toner, who writes about geographically dispersed clusters.
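The tiebreaker idea is easy to see with a little vote counting. This is a simplified sketch of majority voting, assuming one vote per node plus one vote for the witness; it is not how the cluster service itself is implemented, and the function name is hypothetical.

```python
def has_quorum(reachable_nodes: int, total_nodes: int,
               witness_configured: bool = False,
               witness_reachable: bool = False) -> bool:
    """True if this partition holds a strict majority of all votes.

    Simplified model: each node has one vote and a configured
    witness adds one more vote to the total.
    """
    total_votes = total_nodes + (1 if witness_configured else 0)
    my_votes = reachable_nodes
    if witness_configured and witness_reachable:
        my_votes += 1
    return 2 * my_votes > total_votes


# 2 + 2 nodes, inter-site link down, no witness: neither side has a majority.
print(has_quorum(2, 4))
# Same failure with a witness that only Site A can still reach.
print(has_quorum(2, 4, witness_configured=True, witness_reachable=True))   # Site A
print(has_quorum(2, 4, witness_configured=True, witness_reachable=False))  # Site B
```

With an even node count and no witness, a link failure leaves both sites at exactly half the votes, which is why the extra vote matters.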

The architecture. 

What each piece does:  CE is a filter driver that “catches” Microsoft Cluster failure events and lets the RecoverPoint-managed disk systems know to fail over as appropriate.  Very sophisticated logic is built in to prevent cluster split-brain – scenarios where the link is down and the application (such as a SQL Server database) doesn’t know which node is the correct owner of the disk resources.

See if you can spot what is happening above – AUTOMATIC FAILOVER.

Integrates with and supports Hyper-V

Works with the latest features like Live Migration – so you can Live Migrate workloads locally for HA and fail over remotely for DR.  You can control whether to fail over locally before failing over across sites.
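The "local first, then remote" preference can be pictured as a simple ordering of candidate nodes. The sketch below is purely illustrative: the `Node` type, the `pick_failover_target` function and the `prefer_local` flag are hypothetical names, not Cluster Enabler or Failover Clustering settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    healthy: bool


def pick_failover_target(failed: Node, nodes: List[Node],
                         prefer_local: bool = True) -> Optional[Node]:
    """Pick the next owner for a workload whose current node failed.

    With prefer_local=True, healthy nodes at the failed node's site are
    tried before nodes at the other site. Otherwise the list order is used.
    """
    candidates = [n for n in nodes if n.healthy and n.name != failed.name]
    if not candidates:
        return None
    if prefer_local:
        # Stable sort: same-site nodes (key False) come before remote ones.
        candidates.sort(key=lambda n: n.site != failed.site)
    return candidates[0]


a1 = Node("A1", "SiteA", healthy=False)
nodes = [Node("B1", "SiteB", True), a1, Node("A2", "SiteA", True)]

print(pick_failover_target(a1, nodes).name)                      # local node first
print(pick_failover_target(a1, nodes, prefer_local=False).name)  # list order
```

If no local node is healthy, the same call falls through to the remote site, which is the DR case.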

Self-explanatory – the failover steps in detail.

More detail of Live Migration support – note synchronous requirement.

Multi-array support.  We can create consistency groups with storage devices from multiple arrays in the same group.  This allows for a lot of interesting failover implementations (fail over locally first, not remotely, for example) and lets you keep components grouped together… like an entire SharePoint farm.

Hey, it works with Oracle on Windows too.

Recap of the benefits – hopefully it makes sense, and it’s the reason customers love this integration.  With RecoverPoint/CE you get more control, lower bandwidth requirements (3-12x savings on bandwidth as reported by RP customers), and integration with Microsoft Clusters that enables seamless failover.

Now that is a cool product.

