Technologies that transform.
Change fascinates me. It always has.
More specifically, I’m talking about imposed change. Not the weather, not the seasons, not the length of your hair. The act of making a calculated and directed change, of moving a complex system from one place to another, hopefully better, one.
Out of control
A formative moment for me, personally, with regard to thinking about change and software happened in the middle of Kevin Kelly’s “Out of Control” which I read as an undergraduate student when it was published about 20 years ago. There’s a fantastic anecdote in the book about attempts by a naturalist, David Wingate, to reestablish indigenous cedar trees on a small island near Bermuda.
Wingate was trying to restore the cedars as a habitat for an endangered bird, the cahow, but was faced with the challenge that the conditions on the island were far too hostile for young cedar saplings to become established enough to grow. Kelly mentions this great term to describe how Wingate solved the problem: the introduction of a “scaffolding species.” To allow the cedars to take root, Wingate planted a faster-growing evergreen (the casuarina) around the outside of the island to act as a windbreak over the early years when the young cedars were vulnerable. As the cedars matured, they eventually displaced the casuarinas as the dominant tree on the island. The casuarinas were scaffolding: a stepping stone for the establishment of a cedar forest on the island.
The scaffolding species isn’t symbiotic, because if it does its job it will create a change that renders itself less useful, and probably less prosperous, than it is in its initial role. The scaffolding species is a necessary incremental step to move from one stable state to another one.
In his book, Kelly describes this approach in the context of machines in a way that has always resonated with me, especially with regard to software:
Complex machines must be made incrementally and often indirectly. Don’t try to make a functioning mechanical system all at once, in one glorious act of assembly. You have to first make a working system that serves as a platform for the system you really want.
Virtualization as a scaffolding species
When I was a graduate student working on Xen, I often gave talks that described virtualization as a scaffolding species in the datacenter. My sense was that while there were a lot of immediate-term wins to be had in virtualizing physical servers, the more significant benefit of virtualization would be that it would enable the development of new OS and application technologies that could be deployed and evolved alongside all of the legacy Windows apps and (gasp) even 16-bit real mode applications that organizations still depended on.
This didn’t turn out to be true in quite the way that I had anticipated: As a naive young graduate student working on Xen, I thought that the broad success of virtualization would lead to a new renaissance of operating system research: that we’d suddenly start seeing all sorts of cool new application-specific OSes being built and deployed alongside Linux and Windows VMs. While there were some great projects in this direction (and there continues to be some really exciting work on things like Unikernels), virtualization certainly didn’t shake up the OS landscape in the way that I initially imagined it would.
Virtualization did, however, present two very interesting properties that I think fundamentally qualify it as a scaffolding technology:
- Infrastructure programmability. Virtualization allowed for the automation and programmability of the manual, human tasks that surrounded the provisioning and configuration lifecycle of application stacks. The notion of a “software defined datacenter” is remarkably accurate here in that suddenly devops staff became empowered to think about their servers and their networks as innately programmable abstractions. Suddenly we saw PowerShell being used to achieve backup and scale-out, and spectacularly cool python scripts to take advantage of things like spot pricing on AWS compute nodes. In short, virtualization has introduced the idea of a programmable, infrastructure-level control plane, and allowed us to think more clearly about our systems from a resourcing perspective.
- The need for internal orchestration. Virtualization has also highlighted a significant weakness in the software that we were building: the control plane programmability that was realized outside our VMs turned out to be a lot easier to manage, both over time (software upgrades, etc.) and at scale (increasing node count), than the applications that we were running within them. Put another way: the ability to write a script that would stand up a web server farm of 10,000 AWS VMs was spectacular. Making those VMs actually do useful things together, reconfiguring them, and upgrading them all proved to be a rather more challenging problem. In this regard, my sense is that virtualization’s ability to facilitate scale has resulted in systems that are large and complex enough to precipitate a need for containerization and orchestration technologies like CoreOS, Kubernetes, and Docker.
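The contrast between those two properties can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `Node`, `provision`, and `rolling_upgrade` are hypothetical in-memory stand-ins, not any cloud provider's or orchestrator's API. The point is that standing up a farm is a single loop, while upgrading it needs batching and health checks so that most of the farm keeps serving.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a VM; not a real cloud API object."""
    name: str
    version: str = "v1"
    healthy: bool = True


def provision(count: int) -> list[Node]:
    """Standing up a farm: one loop over an (imagined) control-plane call."""
    return [Node(name=f"web-{i}") for i in range(count)]


def rolling_upgrade(nodes: list[Node], version: str, batch_size: int) -> int:
    """Upgrade in batches so that most of the farm keeps serving.

    Returns the number of batches processed. Stops early if a batch
    comes back unhealthy, leaving the remaining nodes on the old version.
    """
    batches = 0
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            node.version = version
        batches += 1
        if not all(node.healthy for node in batch):
            break  # a real orchestrator would roll back or alert here
    return batches


farm = provision(10)
done = rolling_upgrade(farm, "v2", batch_size=3)
print(done, {n.version for n in farm})
```

Even in this toy form, the upgrade path has to make decisions (batch size, what to do on failure) that the provisioning path never faces, which is roughly the gap that orchestration tools aim to fill.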
I suspect that as container and orchestration technologies mature and improve, we will see them start to displace VMs as a core software abstraction in the datacenter. This absolutely doesn’t mean that virtualization will go away: anyone who has tried to get resource isolation right using Linux’s cgroups, or has thought about the multi-tenant attack surface presented by OS containers, will likely agree that there are elements of virtualization that can squarely benefit, and probably help simplify, container architectures. However, in the same manner that virtualization allowed us to reason about and program our infrastructure in a way that we weren’t previously able to, containers promise to provide the next useful layer of abstraction in terms of building, deploying, and maintaining really interesting software systems, and letting us think in terms of applications and services rather than their component (virtual) hardware.
Lessons for data
Let’s revisit Wingate and his scaffolding species for a second: it turns out that there’s another side to the story — one that wasn’t in Kelly’s book — and that the windbreak may not have been the only result of introducing those casuarinas into the ecosystem. In fact, it turns out that the casuarinas are also pretty good at eroding cliffs and dunes along beaches, so much so that there is a public campaign to get rid of them in Barbados today.
The fact that casuarinas have had an adverse side effect doesn’t at all invalidate the idea of a scaffolding species, but I think it does present a useful precautionary lesson: that the benefits of change may not apply to all aspects of a system. Some things may in fact get worse.
I think that this may be the case for data. Virtualization and containerization are fundamentally computing abstractions. They are mechanisms for wrapping software up, and managing that software through an API. A VM’s data ends up being virtualized too, which usually results in it being wrapped up inside an opaque image file and stored on enterprise storage. You might think that the higher level of virtualization used by containers would have addressed this (that was my hope at least), but discussions about containers seem to be headed toward the same block-level abstractions for persisting application state.
I’ve heard concerns about this aspect of data in a couple of different contexts lately. Here’s one: several large enterprise environments that I’ve talked to have sophisticated development teams that are running home-brewed, scalable applications composed in VMs, containers, or both. Each app (or service, or microservice, depending on your taste) is some combination of service logic, key-value storage, and presentation. In the best case, this is the fully realized dream of a service-oriented architecture. Unfortunately, a mix of different stakeholders, each running their own KV installs and application code, creates a pretty funky franken-PaaS in which all the platform services need to run correctly for the system (of services) to work as a whole, and yet every single service owner has to be available to handle breakages not just in their code, but in the platform components that they have built in to it.
Where it was historically the case that the storage administrator was responsible for maintaining a durable and highly available network attached file system for all the apps in the environment, these organizations now see several additional layers of software involved in presenting that data as part of some higher level service. Put another way: the storage admin used to be an expert on the safety and presentation of data, and the application owner was an expert on the app. This is similarly true today, except for the fact that the app is adding multiple new layers to store and present data. The app is taking on responsibility for the availability and consistency of data, but neither the storage admin nor the app owner is a Cassandra/Mongo/Couch/Redis/Riak expert, and all of these things suddenly exist in the environment. This is absolutely not a dig at KV technologies: the customers that have raised concerns about this diversity and sprawl in the presentation of application data all want to support their application teams with whatever tools are necessary, but I’ve heard frustration over the fact that the sheer number of deployed KV systems (1) reflects a failing of traditional storage systems to provide useful, scalable APIs for emerging applications, and (2) makes enterprise data more fragile, by making its availability harder to support.
Here’s another example: a large financial firm that we work with is faced with the problem of what they call “Hadoop sprawl.” Groups in their organization are building out individual Hadoop clusters to perform data-intensive compute jobs specific to their group. Here’s how it works: a group of traders wants to build out a bunch of models and run them against data that’s stored on the existing enterprise storage. They buy a half rack of servers and get them installed in a broom closet down the hall. Then they write scripts that pull copies of data out of enterprise storage and into the new Hadoop cluster, run analysis, and then copy the results back to managed storage. I am of course caricaturing cowboy traders and dusty broom closets for effect here, but this is a meme that I’ve heard repeatedly from specific customers in finance, entertainment, and medical applications over the past year: They are scared that separate analytics clusters forfeit the value that they have built into the management and availability of their existing infrastructure. They are scared that data is at risk during the time that it is pushed out to these external silos. They are concerned about the resulting inefficiency of extra network copies, and islands of analytics clusters throughout their organizations.
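As a rough illustration of that copy-out, analyze, copy-back pattern, here is a toy Python sketch. Everything in it is an assumption made for the example: the two temporary directories stand in for managed enterprise storage and the closet cluster, and the file names and the `run_siloed_job` helper are hypothetical. It simply tallies how many bytes cross between the two locations for one small job.

```python
import shutil
import tempfile
from pathlib import Path


def run_siloed_job(managed: Path, silo: Path) -> int:
    """Copy input out, analyze it in the silo, copy the result back.

    Returns the total number of bytes copied between the two locations.
    """
    src = managed / "trades.csv"
    staged = silo / "trades.csv"
    shutil.copy(src, staged)  # copy 1: data leaves managed storage
    copied = staged.stat().st_size

    # Toy "analysis": sum the second column of the staged copy.
    lines = staged.read_text().splitlines()
    total = sum(float(line.split(",")[1]) for line in lines)
    result = silo / "result.txt"
    result.write_text(f"{total}\n")

    shutil.copy(result, managed / "result.txt")  # copy 2: results return
    copied += result.stat().st_size
    return copied


with tempfile.TemporaryDirectory() as root:
    managed = Path(root) / "managed"
    silo = Path(root) / "silo"
    managed.mkdir()
    silo.mkdir()
    (managed / "trades.csv").write_text("a,1.5\nb,2.5\n")
    bytes_moved = run_siloed_job(managed, silo)
    result_text = (managed / "result.txt").read_text()
    print(bytes_moved, result_text.strip())
```

While the job runs, the staged copy in the silo sits outside whatever protection the managed store provides, and every job repeats both copies. Those are the two concerns the customers above describe.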
So despite all the talking that we see in the press about the volume of data that we are generating and the broad need to derive value from that data — to make it actionable — many of the systems we build are only serving to make our data more fragile and more isolated. The technologies that we have built are allowing us to better manage and scale applications, but they aren’t necessarily making our data easier to work with, and at the end of the day all of this stuff — even your applications — is just data. And as data, your infrastructure should be letting you and your organization get value out of it.
The year ahead
This is the challenge that I’ve given the Coho engineering team in 2015. We have built a solid, scalable, and high-performance storage platform that is proving itself out in numerous virtualized environments today. Over the next year, I’d like to show you how the thing we have built up until now is the first step. That our storage support for virtual environments is a scaffolding technology: it solves a real and immediate enterprise IT problem today, the ability to effortlessly provision high performance, cost effective enterprise storage and scale it in response to your demands for capacity, performance, or both.
That initial support is scaffolding because over the next year, we would like to show you how a flexible and carefully designed storage platform with balanced compute and connectivity is a step toward solving the broad new set of emerging problems that organizations face today. The real, core challenge that I see our team solving is that of providing the scaffolding to support both traditional and emerging forms of enterprise software and software development.
I’m excited about some of the projects that we’ll show off over the next year, largely because we’ve been working with some really interesting and engaged customers on some incredibly fun problems. If you’d like a hint at some of this direction, you might enjoy watching my presentation at Storage Field Day 6 last fall.
Here’s to an exciting 2015 for all of us. All the best to you and yours!
The image at the top of this article was apparently a 2009 photo contest winner for pictures along China’s Yangtze river. The original source is here, although I found it on the very excellent scaffoldage tumblr.