Getting More from Scale-out Storage: Site-to-Site Replication
Here at Coho, our primary focus is on protecting your data and keeping it available in the face of the unexpected. We protect data from local failures such as hardware outages, media corruption, and network or node loss, to mention just a few scenarios. Moreover, our DataStream 2.0 release adds a new snapshot implementation that lets you take thousands of immutable snapshots for quick recovery without impacting performance. Scriptable, flexible snapshot APIs open up several other interesting use cases, such as quickly attaching and indexing or analyzing snapshots of production VM data. A recent blog post explains some of our latest offerings in this area.
Today, I am very excited to introduce the next phase in Coho’s DataStream data availability: site-to-site replication.
Site-to-site replication, sometimes called disaster recovery replication in the industry, lets you keep a snapshot-consistent copy of the data you want at a remote site. There is no distance limitation; the other site can be as far away as your business requires. In our testing we have replicated data between sites from 800 miles to nearly 8,000 miles apart, both within North America and between continents. Because long distances introduce higher latency, we have engineered the product to take latency into account and remain as resilient as possible.
How we engineered it
The Coho DataStream scale-out storage architecture already performs synchronous object replication between nodes (as opposed to less efficient and less flexible techniques such as volume mirroring). Coho’s new site replication system extends our current replication implementation with an asynchronous path that replicates scheduled snapshots to the remote site, configurable at per-VM granularity. A granular data availability mechanism is critical for scale-out enterprise storage because, as with other data services in the DataStream stack, it allows administrators to focus on the specific needs of each of their users and each of their workloads.
When you configure replication and select the workloads (virtual machines) you want to protect, there is no performance impact: Asynchronous replication moves bulk data transfer off of the live data path and allows data to be shipped to the remote site in the background.
We also added the ability for you to control when replication happens, on a per-workload basis. We don’t believe that all data is created equal, and some workloads have recovery time objectives (RTOs) that require them to be replicated more frequently than others.
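To make per-workload scheduling concrete, here is a minimal sketch of what a per-VM replication policy could look like. All names (`ReplicationPolicy`, `vms_due`, the VM names and intervals) are hypothetical illustrations, not Coho's actual API:

```python
from dataclasses import dataclass

# Hypothetical per-VM replication policy: each workload gets its own
# schedule, so critical VMs can be replicated more often than others.
@dataclass
class ReplicationPolicy:
    vm_name: str
    interval_minutes: int  # replication frequency, driven by the VM's recovery objectives

def vms_due(policies, minutes_since_last_run):
    """Return the VMs whose replication interval has elapsed."""
    return [p.vm_name for p in policies
            if minutes_since_last_run[p.vm_name] >= p.interval_minutes]

policies = [
    ReplicationPolicy("sql-prod", 15),     # tight objectives: every 15 minutes
    ReplicationPolicy("file-share", 240),  # looser objectives: every 4 hours
]
elapsed = {"sql-prod": 20, "file-share": 60}
print(vms_due(policies, elapsed))  # only sql-prod is due
```

The point of the sketch is simply that replication frequency is a property of each workload, not of the whole cluster.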
Scheduled replication is directly integrated with VMware vSphere to provide application-consistent snapshots (VSS on Windows, sync on Linux) as desired. An application-consistent snapshot typically takes longer, because OS and application write buffers need to be flushed to disk. However, consistent snapshots recover more quickly and provide a clearer specification of the data that is actually protected in a given snapshot. Coho’s snapshots are storage-based snapshots (not VMware-specific), meaning that the mechanisms can be applied to arbitrary data stored on the DataStream, and that snapshot operations may be triggered at both the UI and API level outside the VMware toolset. The result of this clean decoupling is to further simplify the environment and to reduce potential performance issues on VMware vSphere hosts.
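A toy model illustrates why application-consistent snapshots take longer but recover more cleanly: buffered writes must be flushed before the snapshot is cut. This is an illustration of the general principle only; `ToyApp` and the snapshot helpers are invented for the example and say nothing about Coho's internals:

```python
# Toy model: an app that buffers writes in memory before they hit disk.
class ToyApp:
    def __init__(self):
        self.disk = []
        self.buffer = []

    def write(self, record):
        self.buffer.append(record)   # write lands in a buffer first

    def flush(self):
        self.disk.extend(self.buffer)  # quiesce: push buffers to disk
        self.buffer.clear()

def crash_consistent_snapshot(app):
    return list(app.disk)            # whatever happens to be on disk right now

def app_consistent_snapshot(app):
    app.flush()                      # flush first (the role VSS/sync plays)
    return list(app.disk)

app = ToyApp()
app.write("txn-1"); app.flush()
app.write("txn-2")                       # still sitting in the buffer

crash = crash_consistent_snapshot(app)       # misses the buffered write
consistent = app_consistent_snapshot(app)    # captures it, but costs a flush
```

The crash-consistent copy silently misses `txn-2`; the application-consistent copy pays the flush cost but gives a precise picture of what is protected.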
The replication stream is kept secure on the wide area network by encrypting replication traffic over an SSL connection using a 256-bit AES CBC cipher (TLS 1.0). Replication traffic can also be configurably placed on a separate VLAN for additional isolation and manageability.
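For readers curious what pinning a channel to an AES-256-CBC cipher suite looks like in practice, here is a small sketch using Python's standard `ssl` module. This is not Coho's code; `AES256-SHA` is the OpenSSL name for the TLS RSA AES-256-CBC-SHA suite, and whether it is selectable depends on the local OpenSSL build and security level:

```python
import ssl

# Sketch only: restrict a client-side TLS context to an AES-256-CBC
# cipher suite, in the spirit of the encrypted replication stream
# described above. Connecting to a real peer is out of scope here.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.set_ciphers("AES256-SHA")  # OpenSSL name for TLS_RSA_WITH_AES_256_CBC_SHA

# Inspect which cipher suites the context will actually offer.
enabled = [c["name"] for c in ctx.get_ciphers()]
print(enabled)
```

A context configured this way refuses to negotiate anything outside the pinned suite, which is the property you want on an untrusted WAN link.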
Our replication mechanism is based on tracking “deltas”: it ships only the data that has changed in the window of time between snapshots. Additionally, replication traffic is compressed in-line to minimize the bandwidth used.
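The delta-plus-compression idea can be sketched in a few lines: diff two snapshots block by block, then compress only the changed blocks before shipping them. The fixed 4 KiB block size and the helper names are assumptions for the example, not details of Coho's implementation:

```python
import zlib

BLOCK = 4096  # assumed fixed block size for this sketch

def delta_blocks(prev, curr, block=BLOCK):
    """Yield (offset, data) for each block that changed between snapshots."""
    for off in range(0, len(curr), block):
        if curr[off:off + block] != prev[off:off + block]:
            yield off, curr[off:off + block]

# Two toy "snapshots": only the middle block differs.
snap1 = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK
snap2 = b"A" * BLOCK + b"X" * BLOCK + b"C" * BLOCK

changed = list(delta_blocks(snap1, snap2))
payload = zlib.compress(b"".join(data for _, data in changed))  # in-line compression
print(len(changed), len(payload))  # one changed block, compressed far below 4096 bytes
```

Instead of shipping the full 12 KiB snapshot, only one compressed block crosses the wire, which is why delta tracking and in-line compression together keep WAN usage low.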
For customers that have limited site-to-site link bandwidth, the system can enforce limits on replication traffic to control how much data is flowing and to ensure optimal use of physical links, while meeting the desired recovery point objective (RPO). The system provides both bandwidth throttling and alerting in the face of link overload and deadline misses.
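Bandwidth throttling of this kind is commonly built on a token bucket: send budget accrues at the configured rate, bursts are capped, and a sender blocks when the budget is exhausted. The sketch below shows the general technique under assumed parameters; it is not Coho's throttling code:

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle for bulk transfer traffic (sketch)."""
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes          # start with a full burst allowance
        self.last = time.monotonic()

    def consume(self, nbytes):
        """Block until nbytes of send budget is available, then spend it."""
        while True:
            now = time.monotonic()
            # Refill tokens for the time elapsed, capped at the burst size.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Assumed example rate: 1 MB/s link budget with a 64 KiB burst.
bucket = TokenBucket(rate_bytes_per_sec=1_000_000, burst_bytes=64 * 1024)
start = time.monotonic()
for _ in range(4):
    bucket.consume(64 * 1024)   # each 64 KiB chunk waits for budget
elapsed = time.monotonic() - start
print(f"sent 256 KiB in {elapsed:.2f}s")  # roughly 0.2s at 1 MB/s after the burst
```

The first chunk rides the burst allowance; the rest pace themselves to the configured rate, which is how a throttle keeps replication from starving production traffic on a shared link.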
Finally, we assume the customer wants to use the remote site as a production site rather than leaving that hardware as a passive, disaster-only storage system. Our approach defaults to active/active replication pairings in which both sides of a replication link host live production data and use each other as a recovery resource. Active/active replication means higher utilization of resources and greater value from storage spending.
As with other DataStream features, the two sites are not required to have the same hardware configurations. They just need adequate space available for the replicated data. Coho already supports multiple generations of hardware both within a site cluster as well as across sites. When we designed site replication, it was important to us to give customers as much flexibility as possible, removing traditional design constraints.
If you are interested in protecting your data with zero RPO, please comment on this article and we will get back to you.
To summarize, the key features of Coho’s site-to-site replication:
- Asynchronous, periodic, snapshot-based replication
- Active/active site support
- Virtual Machine granularity
- Bandwidth throttling
- Simple UI with one-time setup and very easy configuration
- Flexible replication schedule
Remote replication is hardly the last chapter in Coho’s evolving suite of configurable approaches to data protection. Expect more, equally exciting developments over the next year.