Data, Storage, and SDN: An Application Example

Much of the usual discussion surrounding SDN is about how more flexible networks are going to solve a bunch of very real problems from a network operations perspective.  The rise of server virtualization has made provisioning and isolation of network resources even harder than it already was, and SDN promises to make that better.  Similarly, large organizations like Microsoft and Google are talking about the wins that they are getting from explicit, wide-area traffic management within large-scale enterprise networks.

What’s really exciting about these discussions, though, is the idea that new applications will emerge.  Fine-grained data path programmability in the network might actually change how we approach broader issues in application design, and emerging applications might integrate directly with network infrastructure.  To date, though, there haven’t been many clear examples of application use cases for the exciting new functionality that SDN switching is offering up.

In this article, I’d like to tell you a story of a storage application use case.  For the past two years, we have been working on an enterprise storage system that embeds SDN switching hardware directly within our platform.  We’ve worked with OpenFlow and with chipset APIs to manipulate TCAMs and forwarding tables directly.  Even though SDN hasn’t yet been broadly deployed in many of the customer networks that we sell into, we’re able to distill concrete value out of today’s SDN switch hardware by using it as an embedded component of our storage system.  And it is paying off spectacularly.

Some Storage Background: Why the network hasn’t (really) mattered until now.

For the past two decades, vendors have built big boxes full of spinning disks, aggregated them together with techniques like RAID, and then exported some abstraction like a virtual block device or file system over the network.

From a performance perspective, these spinning disks are awful.  If you access them sequentially, modern disks will offer you about 100 MB/s of data; in other words, at best a single disk can just about saturate a 1Gb link.  Unfortunately, this never happens.  Random I/O involves seeks on disks, and throughput falls through the floor.  With random I/O, that same disk will offer you 2 MB/s or less.  The broad deployment of virtualization has meant that enterprise storage systems are serving more concurrent workloads (lots of VMs) and more opaque workloads (virtual hard disks, instead of individual documents), a combination broadly known as the “I/O Blender” effect.

The result of this is that in almost all situations, running a single fat pipe between the array and the network has been sufficient.  The bottleneck has always been the disks.
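To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python.  The throughput figures are the ones quoted above; the link speeds are nominal line rates with protocol overhead ignored, and the function names are made up for illustration.

```python
import math

# Figures quoted in the text above.
SEQUENTIAL_MB_PER_S = 100  # one disk, sequential access
RANDOM_MB_PER_S = 2        # the same disk, random access


def link_fraction(mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of a link's nominal bandwidth consumed by a throughput in MB/s."""
    megabits_per_s = mb_per_s * 8
    return megabits_per_s / (link_gbit_per_s * 1000)


def disks_to_fill(mb_per_s: float, link_gbit_per_s: float) -> int:
    """Number of disks at the given throughput needed to fill the link."""
    return math.ceil((link_gbit_per_s * 1000) / (mb_per_s * 8))


if __name__ == "__main__":
    print(f"sequential disk on 1Gb: {link_fraction(SEQUENTIAL_MB_PER_S, 1):.1%}")
    print(f"random-I/O disk on 1Gb: {link_fraction(RANDOM_MB_PER_S, 1):.1%}")
    print(f"random-I/O disks to fill 10Gb: {disks_to_fill(RANDOM_MB_PER_S, 10)}")
```

Under these assumptions a sequentially read disk uses about 80% of a 1Gb link, while a disk doing random I/O uses under 2%, and it would take hundreds of such disks to fill a 10Gb pipe.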

Flash is suddenly a problem.

We’ve had flash hardware in storage for about ten years.  Early flash was expensive and unreliable.  It was great at random access, but otherwise performed a lot like disks, and that’s exactly how SSDs were treated in storage systems.  Vendors replaced some of the spinning disks with SSDs, and generally used those SSDs as a cache.  It was business as usual: the SSDs and disks shared a pretty slow SAS or SATA bus, whose aggregate performance could still fit on a 10Gb connection.

Then everything changed.  In 2010, we began to see the emergence of PCIe-based flash hardware.  Flash devices moved off of the storage bus altogether and now share the same high speed interconnect that the NIC lives on.  Today, a single enterprise PCIe flash card can saturate a 10Gb interface.

This is one of the founding observations that we made a few years ago in starting our company: that storage was about to change fundamentally from a problem of aggregating low-performance disks in a single box into a challenge of exposing the performance capabilities of emerging solid state memories as a naturally distributed system within enterprise networks.  By placing individual PCIe flash devices as addressable entities directly connected to an SDN switch, our approach has been to promote a lot of the logic for presenting and addressing storage into the network itself.

How SDN Solves Storage Problems

The initial customer environment for storage that we are trying to address is that of a virtualized NFS-based environment.  VMware, for instance, is deployed across a bunch of hosts and is configured to use a single, shared NFS server.  How can we take advantage of SDN in order to allow expensive PCIe flash to be shared across all of these servers, and avoid imposing a bottleneck on performance?

Problem 1: The Single IP Endpoint

We can’t change the client software stack on a dominant piece of deployed software like ESX.  As a result, scalability and performance have to be solved in a way that supports legacy protocols.  IP-based storage protocols like NFS bake in an assumption that the server lives at a single IP address.  In the past, people have built special-purpose hardware to terminate and proxy NFS connections in order to cache or load-balance requests, but SDN allows us to go further.

The NFS server implementation in our system includes what is effectively a distributed TCP stack.  When a new NFS connection is opened to the single configured IP address, an OpenFlow exception allows us to assign that connection to a lightly loaded node in our system.  As the system runs, our stack is free to migrate that connection, interacting with the switch to redirect the flow across storage resources.  As a result, we are able to offer the full width of connectivity through the switch as a path between storage clients and storage resources.  This approach is similar to proposals to use OpenFlow as the basis of load balancing, with the difference that it is the application itself that is driving the placement and migration of connections in response to its own understanding of how those connections can best be served.
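The placement and migration logic described here can be illustrated with a toy model.  This is a minimal sketch, not the actual implementation: `Switch` is a plain dictionary standing in for a flow table, `Placement` and its method names are hypothetical, "load" is simply a count of assigned connections, and no real OpenFlow library is involved.  It also leaves out the hard part, which is handing the TCP connection state over to the new node.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """A client connection to the single NFS IP, identified by its source."""
    client_ip: str
    client_port: int


@dataclass
class Switch:
    """Stand-in for the switch's flow table: flow -> storage node."""
    rules: dict = field(default_factory=dict)

    def install(self, flow: Flow, node: str) -> None:
        self.rules[flow] = node

    def lookup(self, flow: Flow):
        return self.rules.get(flow)  # None models a table miss


class Placement:
    """Application-driven placement of connections onto storage nodes."""

    def __init__(self, switch: Switch, nodes: list):
        self.switch = switch
        self.load = {node: 0 for node in nodes}

    def on_table_miss(self, flow: Flow) -> str:
        """New connection: pick the least loaded node and install a rule."""
        node = min(self.load, key=self.load.get)
        self.switch.install(flow, node)
        self.load[node] += 1
        return node

    def migrate(self, flow: Flow, new_node: str) -> None:
        """Redirect an existing flow to another node by rewriting its rule."""
        old_node = self.switch.lookup(flow)
        if old_node is None:
            raise KeyError(f"no rule installed for {flow}")
        if old_node == new_node:
            return
        self.load[old_node] -= 1
        self.load[new_node] += 1
        self.switch.install(flow, new_node)


if __name__ == "__main__":
    switch = Switch()
    placement = Placement(switch, ["node-a", "node-b"])
    flow = Flow("10.0.0.1", 1001)
    print("assigned to", placement.on_table_miss(flow))
    placement.migrate(flow, "node-b")
    print("now served by", switch.lookup(flow))
```

The point of the sketch is that the client only ever sees one server address; which node actually serves the connection is a decision the storage application makes and can revise by updating a rule in the switch.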

This decoupling of client connections from a specific storage controller at the end of the wire solves an immediate scalability problem that until now has needed either interface changes on the client (NFSv4 delegation or pNFS) or complex administration (carefully splitting a namespace across several controllers).  It also allows us to treat stored data as a completely fluid resource: just as client connections can be moved in response to load and access patterns, so can the underlying data.  As a result, OpenFlow provides the flexibility to dynamically adapt and scale the system over time.

Problem 2: High-performance multi-tenant isolation.

One of my co-founders, Keir Fraser, and I both worked to develop the Xen virtual machine monitor when we were graduate students at the University of Cambridge.  When we were working on Xen, we spent a lot of time focused on the fact that a hypervisor really had only a single job: isolated sharing.  The hypervisor needed to take a server that was over-resourced for any single application, and allow it to be safely shared among many concurrent tenants.

SDN is extending this isolated sharing for virtual machines out to the network.  It’s allowing the isolated sharing of network resources and allowing entire distributed systems of VMs to be provisioned and managed as a unit.  In this regard, the VMs involved are actually just a resource above the virtual network that connects them.  By the same measure, storage resources, and data itself, can be another such isolated resource.

By virtualizing network-attached flash resources to isolated networks, whether those are OpenFlow-defined networks, NSX-based end-system tunnels, or even (gulp) VLANs, we benefit from the ability to take expensive and high-performance storage resources and map them directly to the systems that consume them.  In storage, sharing resources this way has always required some form of central mediator, with the side effect of creating an inherent performance bottleneck.

Coho DataStream Architecture

As an example, binding virtualized networks to virtualized flash means that alongside a reliable and scalable NFS instance, an alternate tenant can have direct access to virtual flash resources and integrate them directly into their application stack.  Isolation in this manner lets us deploy a storage system that both supports legacy protocols and allows new and more efficient presentations of data to be developed alongside, all on the same hardware.  This idea is demonstrated in the block diagram of our storage stack, shown above.
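As a rough illustration of binding virtual flash to isolated networks, here is a minimal sketch.  Everything in it is an assumption made for the example: the `FlashMap` class, the tenant and device names, and the rule that a virtual flash device belongs to exactly one virtual network.  In a real system the isolation would be enforced by the network itself rather than by a lookup table.

```python
class FlashMap:
    """Toy binding of virtual flash devices to isolated virtual networks."""

    def __init__(self):
        self._devices_by_network = {}

    def bind(self, network_id: str, device: str) -> None:
        """Attach a virtual flash device to one network, and only one."""
        for other_network, devices in self._devices_by_network.items():
            if device in devices and other_network != network_id:
                raise ValueError(f"{device} is already bound to {other_network}")
        self._devices_by_network.setdefault(network_id, set()).add(device)

    def reachable(self, network_id: str, device: str) -> bool:
        """A device is visible only from the network it is bound to."""
        return device in self._devices_by_network.get(network_id, set())


if __name__ == "__main__":
    flash = FlashMap()
    flash.bind("tenant-nfs", "flash-0")  # the legacy NFS presentation
    flash.bind("tenant-app", "flash-1")  # a tenant using flash directly
    print(flash.reachable("tenant-nfs", "flash-0"))
    print(flash.reachable("tenant-app", "flash-0"))
```

Because each tenant reaches its own devices directly over its own network, there is no central mediator sitting in the data path between the two tenants.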

How will SDN and Applications Evolve?

I think we all expect SDN to result in significant change from an application perspective in datacenter networks.  However, there seems to be the idea floating around that this is going to surface as some sort of SDN “app store” where you download and install exciting new types of functionality for your network.  As we start to see customer networks adopt and deploy SDN, our products will be able to more broadly integrate and achieve higher degrees of datacenter-wide performance management.

Through the coming years, I really hope that the SDN community will continue to evolve standards quickly, and that systems will stay implementation-focused around rough consensus and running code.  Most of all, though, I hope that everyone, from the people implementing OpenFlow controllers and clients to the chipset vendors that are building spectacularly cool data path functionality, continues to think about applications.  There is a tendency in building standards to avoid exposing features that you may regret and have to support later.  My sense is that applications are the things that are going to really make SDN succeed, and that the more functionality you can give application writers, the more they can make SDN work for them.
