Evolve or Die
Today’s storage consumer faces an environmental threat.
For decades, the hardware environment in which storage was built (the disks, networks, and CPUs used to provide storage) changed slowly and incrementally. The tectonic plates of small, incremental improvement drifted, providing year-on-year gains in the piece parts of storage systems. Storage was simply a boring but necessary part of the datacenter.
The arrival of commodity flash memory, just over a decade ago, introduced the first of a set of dramatic environmental changes to how storage systems were built. Early SSDs were clearly faster than disks and promised a spectacular performance improvement in the datacenter, but they were also challengingly imperfect pieces of hardware: they were expensive, offered highly variable performance, and raised durability concerns because they wore out after a moderate amount of update traffic. Storage system designers were forced to adapt. They reconsidered file system structure to match program/erase boundaries, and they built deduplication to reduce capacity consumption and limit wear. In short, they innovated; they evolved their storage systems to deal with the reality of the new datacenter environment presented by emerging flash memories.
Today’s surprising reality is that the environmental change that began with early SSDs was simply a warning. It was the beginning of a near-complete inversion of the properties and assumptions on which datacenter software had been built. As a result of Intel’s aggressive investment in nonvolatile memory technologies, we now face a storage environment in which change itself is the only constant. Consider the following realities:
Storage component prices are falling on a timescale much shorter than the traditional period of storage ownership. Storage systems have historically been sold as stand-alone hardware appliances, with service agreements that span three to five years before data is migrated and units are replaced. Today’s reality is that solid-state pricing is falling by 50% roughly every 18 months, and per-device performance is doubling over the same period.
Storage hardware is becoming incredibly dense, and that density is a completely new source of problems. Individual storage devices have become about one thousand times faster on a per-device basis, but they have also become about ten times more expensive at the same scale. NVMe flash devices are a performance-dense source of efficient persistent state, but they deliver this value only if they are kept busy. Storage now faces a challenge similar to the one idle CPUs posed a decade ago, which led to the introduction of virtualization in the datacenter. To truly expose the capabilities of emerging storage hardware, the entire software stack must be redesigned. Technologies such as the Storage Performance Development Kit (SPDK) must be employed to expose the raw performance that today’s enterprise NVMe devices offer. Beyond this, storage designs must integrate deeply with the network to ensure enough connectivity and rack locality for data to be accessed at the full potential of the devices that house it.
Storage isn’t storage any more. Storage used to be an almost pejorative term: placing large volumes of data “in storage” was like putting it in a garage or an attic. Indeed, the ratio of performance to capacity in spinning disk drives has decayed to the point where safely completing a RAID rebuild after a drive fails is a significant risk. With the advent of high-performance NVMe and DIMM-attached storage, this is simply no longer true. High-performance memories aren’t really about performance at all; they represent the fact that all of an organization’s data is no longer “in storage.” Suddenly, and like never before, this data is available, immediately actionable, and just waiting to be analyzed. Storage systems must come to terms with the analytic potential they now offer, and respond to the fact that their role and potential value have completely changed.
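To make the pricing reality above concrete, here is a back-of-the-envelope sketch. It assumes that the quoted rates (price halving and per-device performance doubling about every 18 months) hold steadily across a whole ownership period, which is a simplification; the function names are illustrative only. Under that assumption, a device bought today costs about a quarter as much after three years and about a tenth as much after five, while a new device delivers roughly four and ten times the performance.

```python
# Back-of-the-envelope sketch (assumption): price halves and per-device
# performance doubles at a steady rate, every 18 months.
HALVING_PERIOD_MONTHS = 18


def relative_price(months: float, period: float = HALVING_PERIOD_MONTHS) -> float:
    """Fraction of today's price after `months`, if price halves every `period` months."""
    return 0.5 ** (months / period)


def relative_performance(months: float, period: float = HALVING_PERIOD_MONTHS) -> float:
    """Multiple of today's per-device performance after `months`."""
    return 2.0 ** (months / period)


if __name__ == "__main__":
    # Compare against a traditional three-to-five-year ownership period.
    for years in (3, 5):
        m = years * 12
        print(f"{years} years: price x{relative_price(m):.2f}, "
              f"performance x{relative_performance(m):.1f}")
```

The gap between these curves and a fixed five-year appliance refresh is the point: hardware bought at the start of a service agreement is far behind the market well before the agreement ends.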
Evolve or die.
The datacenter environment in which storage systems are built is changing faster today than at any time in the history of computing. There is no place in this environment for the storage systems of a decade ago, nor even for the first-generation “all-flash arrays” that were born in response to early flash memories. Coho Data builds a scalable, software-defined data platform in response to this rapid and dramatic change in environment, and our design is motivated precisely by the points above. Our system offers dynamic and incremental scalability, allowing customers to add the state-of-the-art hardware they need, as they need it, over time. It integrates with software-defined networking to ensure that the performance of a scalable, disaggregated storage system is fully exposed, and that rack locality is used to protect the network’s core. Finally, Coho integrates directly with big-data protocols like HDFS to avoid siloing big data, instead allowing data to be analyzed in situ and delivering rapid time to value on the customer data that we store.

Coho has benefited from a close working relationship with Intel for years: I believe that we were the first enterprise storage product to ship using Intel’s initial PCIe flash drive, the 910. We worked with early development releases of the Data Plane Development Kit (DPDK) and SPDK to demonstrate their value within storage datapath implementations, and we built a converged, multi-tenant big data reference architecture for UBS AG. Coho’s innovation in storage software is the perfect complement to the rapid strides Intel is making in storage platform hardware.
The datacenter is changing faster than ever before. To survive the next decade, and to truly expose the capabilities and rate of innovation presented by Intel’s high-performance nonvolatile memory portfolio, evolvability is the single most important feature of a modern storage system. Coho Data’s DataStream platform is engineered to embrace these dramatic and exciting changes.
The image at the top of this page is a slightly cropped version of “Ice age fauna of northern Spain” by Mauricio Antón. It is available through Wikimedia Commons.