Scale-out storage architecture, SDN style
At Coho Data we talk about our association with SDN (Software Defined Networking) most conspicuously, through our use of SDN switches built into our product. We use these SDN switches (currently Arista 7050s) to provide the connecting fabric for our storage nodes. They’re COTS switches, but we use the hardware in unique ways. If you’re familiar with UCS blades, it’s somewhat analogous to the way Cisco use switching hardware for their fabric interconnects.
We make use of this OpenFlow-enabled SDN equipment to control the IP storage traffic flow and distribute load across our storage nodes (we call these nodes our MicroArrays). The way this works is super interesting, and alone could fill several deep-dive blog posts (and hopefully will), but that’s not where I’m heading with today’s tale. Instead I want to focus on how we’ve applied some of the high-level SDN concepts to the entire Coho Data architecture. It really is an all-encompassing vision that is deeply ingrained in our product. So boys and girls, are you sitting comfortably? Then I will begin.
SDN is arguably the most clinically defined of the many Software Defined catchphrases in use today. Unlike with SDS, SDDC, etcetera, most industry pundits agree that SDN revolves around decoupling the different roles that traditional networking equipment would stuff in every box. The idea is that abstracting the Control Plane and Management Plane from the Data Plane enables dynamic and remotely programmable control. This feeds new networking topologies that are both more manageable and more scalable.
There’s a good reason why industry support of SDN came to pass: the changing data center landscape. Never one to pass up the chance to wander off on a tangent, à la Billy Connolly, here is some background.
The network as we know it
As I’m sure you’re accustomed to, traditional data center networking models have used a three-tier architecture:
– Core switches at the top
– Distribution switches in the middle
– Access layer at the bottom
This model favours the North-South traffic that was all the rage with the classic client/server model. It’s optimized for getting ones and zeros into and out of a data center, so desktops can connect to their servers. Most server traffic is forced up to the core switches, which make the layer 3 routing decisions and either send it back down to the appropriate server subnet or forward it out onto the wider network. The distribution switches manage higher-function tasks like ACLs or QoS. And the access layer provides less expensive port connectivity to devices. This model makes a lot of sense for the busy corporate head office, with a great many end-user devices to connect and a relatively small number of servers. It has started to make a lot less sense for modern data centers.
Most heavily virtualized data centers have a more distributed application approach these days, with perhaps fewer physically connected devices, but an increase in locally network-addressable servers. Instead of the siloed, scaled-up, frugal IP address designs of old, there’s now a huge increase in East-West traffic. The core/distribution/access layer model increasingly suffers from congestion on its uplinks. And relying on spanning tree for protection from forwarding loops brings inherent redundancy and scalability concerns, as it blocks paths that could otherwise be utilized.
Network design is a-changin’
As a result, several new topologies have gained in popularity, the most in vogue being the leaf and spine design. It flattens the switching, helps to reduce the number of hops (and therefore latency), and maximizes the link aggregation possible. Don’t believe me? Feel free to dig through some papers on the theory of bisection bandwidth and you’ll start to understand why (and tangentially cure that nagging insomnia).
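To put rough numbers on the uplink argument, here is a minimal sketch comparing a switch whose redundant uplinks are blocked by spanning tree with a leaf switch that can forward on all of its uplinks at once. The port counts and speeds are hypothetical, chosen only for illustration, and the model is deliberately simplified (one active uplink under spanning tree, perfect load spreading across spines otherwise).

```python
def usable_uplink_gbps(uplinks: int, link_gbps: float, spanning_tree: bool) -> float:
    """Uplink bandwidth a single access/leaf switch can actually use.

    Simplifying assumption: spanning tree leaves one uplink forwarding and
    blocks the rest, while a leaf-spine fabric uses every uplink.
    """
    active_links = 1 if spanning_tree else uplinks
    return active_links * link_gbps


def oversubscription(server_ports: int, port_gbps: float, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to usable uplink bandwidth."""
    return (server_ports * port_gbps) / uplink_gbps


# Hypothetical switch: 48 x 10GbE server ports, 4 x 40GbE uplinks.
blocked = usable_uplink_gbps(4, 40, spanning_tree=True)
all_active = usable_uplink_gbps(4, 40, spanning_tree=False)

print(oversubscription(48, 10, blocked))     # 12.0 (12:1 oversubscribed)
print(oversubscription(48, 10, all_active))  # 3.0  (3:1 oversubscribed)
```

Same hardware, same cabling, but a quarter of the oversubscription once every uplink is allowed to carry traffic.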
So why the network topology history lesson? Aren’t we meant to be talking about Coho’s storage architecture here? Well, it just so happens that the pressures being felt in data center network designs are similar to those Coho identified early in their thinking about the next generation of storage. Scaled-up storage accustomed to heavy North-South traffic wasn’t scaling sufficiently, putting increasing strain on the top-level controllers. The storage landscape was also evolving rapidly in other ways. Spinning disks were not following Moore’s Law for their performance profile (in fact they weren’t getting any faster), whilst all the time they continued to creep up in capacity, perversely driving down the IOPS/GB. Flash technologies were emerging that leapfrogged disk performance by orders of magnitude. Something had to give.
Coho made a bedrock decision to embrace a scale-out model using distributed storage nodes. We needed to accommodate more East-West traffic, and wanted to reduce the layers between the storage targets and the initiators. In the same way that networking devices are incorporating SDN abstraction to become multi-purpose, to programmatically deal with not just L2/L3 tables but other network functions, and to control traffic flow more economically, Coho designed their DataStream system to follow a similar data/control/management plane abstraction.
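The IOPS/GB squeeze is easy to see with a little arithmetic. The sketch below holds per-drive random IOPS flat while capacity grows; the figure of 150 IOPS and the capacities are assumed round numbers for illustration, not measurements of any particular drive.

```python
def iops_per_gb(iops: float, capacity_gb: float) -> float:
    """Random-I/O density of a single drive."""
    return iops / capacity_gb


# Assumption: every drive generation delivers roughly the same random IOPS.
ASSUMED_DISK_IOPS = 150

for capacity_gb in (500, 2000, 8000):
    density = iops_per_gb(ASSUMED_DISK_IOPS, capacity_gb)
    print(f"{capacity_gb} GB drive: {density:.4f} IOPS/GB")
```

Under these assumptions, a sixteen-fold growth in capacity means each stored gigabyte gets one sixteenth of the I/O it used to.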
The internals of Coho DataStream systems are made up of a few key building blocks. And note that I’m using the engineering internal names for these components!
Vaccine (“VAX” in the diagram above) is Coho Data’s customized OpenFlow controller implementation. An agent runs on each switch (“OFA” or OpenFlow Agent in the diagram), and the OpenFlow Controller is physically abstracted and runs on one of the MicroArrays. The Vaccine agent runs alongside an NFS daemon presenting a unified storage IP address. All hosts are provided a single storage target and the system intelligently routes the client connections to multiple backend nodes. The node that runs the Vaccine Controller is elected through a distributed coordination service. Vaccine is the heart of Coho’s “Control Plane”.
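To make the "single storage IP, many backend nodes" idea concrete, here is a toy sketch of a controller that pins each client to the least-loaded node and emits a match/action rule for the switch. This is not Vaccine's actual logic or the OpenFlow wire format: `ToyController`, the rule dictionary layout, the node names and the least-loaded policy are all hypothetical, picked only to illustrate the concept.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyController:
    """Hypothetical stand-in for a flow controller fronting one storage IP."""
    virtual_ip: str                      # the single address clients mount
    node_ports: Dict[str, int]           # node name -> switch port (assumed)
    flows: Dict[str, str] = field(default_factory=dict)  # client IP -> node

    def assign(self, client_ip: str) -> dict:
        """Return a match/action rule steering this client to one node."""
        if client_ip not in self.flows:
            # Count existing clients per node, then pick the least loaded
            # (ties broken by node name so the result is deterministic).
            load = {node: 0 for node in self.node_ports}
            for node in self.flows.values():
                load[node] += 1
            self.flows[client_ip] = min(load, key=lambda n: (load[n], n))
        node = self.flows[client_ip]
        return {
            "match": {"src_ip": client_ip, "dst_ip": self.virtual_ip},
            "action": {"output_port": self.node_ports[node]},
        }


controller = ToyController("10.0.0.100", {"microarray-1": 1, "microarray-2": 2})
print(controller.assign("10.0.1.11"))  # steered to port 1
print(controller.assign("10.0.1.12"))  # steered to port 2
```

Both clients talk to the same address, but the switch forwards each to a different node, and a repeat call for a known client returns the same rule.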
The SSAPP is a small “Management Plane” VM. In our current release it runs on the primary switch, with its configuration distributed across several nodes for protection. This is the front-end tool that users interact with, and runs the following services:
– Web UI – the web interface is hosted from the SSAPP
– Installer – the full installation routine, upgrade process and recovery service
– Snapshot scheduler – Coho Data storage has had array-based snapshotting and cloning baked in since the first release and the SSAPP quarterbacks its scheduling
– Monitoring and Alerting functions
– vCenter integration service – regularly polls vCenter for VM workload information to match to stored VM objects
– Perf stat counters
The SSAPP also participates, along with some of the MicroArrays, in a distributed coordination service that ensures that the cluster continues to function correctly in the face of partial failure or physical connectivity issues.
Each MicroArray contains one or more NADs (I’m cringing as I type this, but the name is an internal one, and it’s kind of stuck with the team). A NAD is a balanced trio of CPU, network connectivity and flash: currently one socket, one NIC, and one PCIe SSD. The primary focus of each NAD is to move data onto the appropriate storage devices with as little interference as possible. It achieves this using a bare metal object store implementation that we call CLOS.
CLOS is our proprietary flash-first, automatically tiered object store (not to be confused with this CLOS). CLOS acts as a sort of “Data Hypervisor” that runs within each NAD, and is conceptually very similar to the VMM that runs within each ESXi host. It virtualizes the flash hardware into coarse-grained objects, which in turn form the building blocks for scale, data migration, and failure recovery in the architecture.
Above CLOS, Coho’s Data Profiles abstract our protocol support with its Global Namespace and sit atop an Object Store presented by the Data Hypervisor.
Coming back to the Software Defined model, the forwarding ports on the switch, the MicroArray hardware, and fast packet forwarding (the Data Hypervisor) would all belong in the Data Plane. Dispatch is a service running on each NAD that acts like a network forwarding table, driven by state information from our coordination service. The coordination service (from the Control Plane) pushes (programs) a subset of information to provide the best forwarding decisions. Dispatch also drives the striping and replication between MicroArrays, snapshotting/cloning, and the rebalancing of data when nodes are added or removed.
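As a rough illustration of what a forwarding-table-style lookup for striped, replicated data can look like, here is a minimal sketch. It is not how Dispatch is implemented: the `locate` function, the round-robin placement, the stripe size and the NAD names are all assumptions made for the example.

```python
from typing import List


def locate(offset: int, stripe_size: int, nads: List[str], replicas: int = 2) -> List[str]:
    """Return the NADs holding the stripe that contains a byte offset.

    Assumed policy: stripes are placed round-robin across the NAD list, and
    each replica lands on the next NAD along, so copies sit on different nodes.
    """
    if not 1 <= replicas <= len(nads):
        raise ValueError("replicas must be between 1 and the number of NADs")
    stripe = offset // stripe_size
    first = stripe % len(nads)
    return [nads[(first + i) % len(nads)] for i in range(replicas)]


MIB = 1 << 20
nads = ["nad0", "nad1", "nad2"]          # hypothetical table pushed from the control plane
print(locate(0, MIB, nads))              # ['nad0', 'nad1']
print(locate(2 * MIB + 5, MIB, nads))    # ['nad2', 'nad0']
```

In this toy model, adding or removing a node just means the control plane pushes a new `nads` list. A real system would also have to move the affected stripes, which is the rebalancing work described above.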
Software defined networking defined storage…
This tortured SDN analogy helps show the basis of our web-scale architecture. Unlike scale-up monolithic arrays with dual controllers that force storage traffic to hairpin through core controllers with a North-South bias, we use a scale-out flat hierarchy instead. As we add more nodes, there’s no funneling of traffic. All ports on our switches are non-blocking, line-rate 10GbE connectors. We can directly attach the 10Gb switch to each ESXi host, making the full cross-sectional bandwidth available into the ESX cluster. Each node adds CPU and memory that would otherwise be fixed to a five-year controller refresh in traditional storage systems. As the system scales, the addition of each node also adds more 10GbE connectivity, PCIe flash devices and capacity disks to the system.
This has been a really fast tour of some of the more important moving parts within Coho’s product architecture. While we make really creative use of SDN switching, we’ve also borrowed several of the core insights behind SDN, in particular the clean separation of a fast and scalable data path from centralized control and management functionality.