Letting it all scale out

[Image: Coho Scales]

I must own about 30 power cables for my notebook.  I forget the things everywhere.  I can’t seem to go on a family vacation without realizing that I’ve left my cable at home, and then having to spend an hour looking for a Best Buy or an Apple Store to buy a new one.  In the end, I’ve settled on just owning enough power cables to have one everywhere that I commonly work.  I’ve got one by my bed, one at my desk, and one in each of the conference rooms at the office.  Laptop power is a service, and I’ve made that service broadly available to myself.

Scale-out is a little bit like this. The term is horrifically overused, but it is also probably the most central idea to what we’ve been building at Coho. Coho’s architecture is distributed: you can buy it in pieces and grow your system over time. Performance and capacity scale as you do. This is probably what you would expect from a scale-out storage product.

When people think about scale out, the property that has historically been emphasized is the fact that you can get to very large storage capacities. While this is a really important thing architecturally, size isn’t everything. The ability to grow (and shrink!) a storage system on demand, to change its composition over time, has some other really big impacts that are worth thinking about. In this post, I’d like to summarize some of the reasons that scale-out is disruptive for enterprises today, and why customers should be paying attention to them.

Here’s a high-level summary:

  • Scale-out changes the way storage is bought.  

A system that can expand on demand is one that can be purchased on demand.  This makes it easier for enterprises to evaluate small deployments of scale-out systems, even in the middle of refresh cycles for larger storage installations.  It also means that you only buy the storage you need, as you need it, which protects your storage investment as flash prices fall by roughly 50% every 18-24 months.
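To see why buying on demand matters under falling flash prices, here is a back-of-the-envelope calculation.  The starting price, capacities, and growth pattern are made-up illustrative numbers, not Coho figures; only the halving rate comes from the 18-24 month range above:

```python
# Hypothetical comparison: buy 100 TB of flash up front, versus
# buy 25 TB per year as needed, while $/TB halves every 18 months
# (the midpoint of the 18-24 month range).

def price_per_tb(year, start_price=1000.0, halving_years=1.5):
    """Price per TB after `year` years, halving every 1.5 years."""
    return start_price * 0.5 ** (year / halving_years)

upfront = 100 * price_per_tb(0)                             # all in year 0
incremental = sum(25 * price_per_tb(y) for y in range(4))   # 25 TB/year for 4 years

print(f"up-front:    ${upfront:,.0f}")      # $100,000
print(f"incremental: ${incremental:,.0f}")  # about $56,920
```

Under these assumptions the incremental buyer ends up with the same 100 TB for a little over half the cost, simply because most of the capacity was purchased after prices had fallen.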

  • Scale-out eliminates forklift upgrades.  

A storage system that scales as a distributed system needs to allow new nodes to be added over time.  It also must survive nodes that fail, reprotecting data on other nodes.  As a result, scale-out systems live much longer than their component parts.  They do away with the planned data migrations that happen when traditional arrays reach end of support and have to be replaced.
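The reprotection behavior described above can be sketched in a few lines.  This is a toy model, not Coho's actual placement logic: every chunk is kept on two nodes, and when a node fails, its chunks are re-copied from surviving replicas onto other live nodes.

```python
import random

REPLICAS = 2  # toy replication factor; real systems vary

def place(chunks, nodes):
    """Map each chunk to REPLICAS distinct nodes."""
    return {c: random.sample(nodes, REPLICAS) for c in chunks}

def reprotect(placement, failed, nodes):
    """After `failed` dies, re-copy its chunks onto other live nodes."""
    live = [n for n in nodes if n != failed]
    for chunk, holders in placement.items():
        if failed in holders:
            survivors = [n for n in holders if n != failed]
            candidates = [n for n in live if n not in survivors]
            # copy from a survivor onto enough new nodes to restore REPLICAS
            holders[:] = survivors + random.sample(candidates,
                                                   REPLICAS - len(survivors))
    return placement

nodes = ["n1", "n2", "n3", "n4"]
placement = place(range(8), nodes)
reprotect(placement, "n2", nodes)
# every chunk is back at full replication, with no copies on the dead node
assert all("n2" not in h and len(h) == REPLICAS for h in placement.values())
```

The key property is the one the paragraph above points at: the system as a whole stays fully protected even though an individual node has gone away, which is what lets the system outlive its parts.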

  • Scale-out eliminates performance bottlenecks.

As storage systems continue to take advantage of high-performance flash, the individual “disks” become faster than a traditional controller-based design is capable of exposing to applications.  The flash devices can’t reach their full potential in that model, because either the controller CPU or the interconnect becomes saturated.  Well-designed scale-out architectures allow performance to scale alongside capacity, exposing more immediate value to applications as a system grows.

I’ll go into a bit more detail about each of these points below, but here’s one additional thing that I’ve found useful in discussing the difference between scale-out and traditional storage architectures with customers:  in large IT environments, today’s administrators are taking a bunch of hardware-based products and delivering a service-based offering.

They are on the hook for turning large capital investments into resources that are consumed as a utility, for making the replacement of hardware as invisible as possible to their customers, and so on.  The most significant difference between scale-out and traditional storage architectures is that a scale-out system is effectively a service, one that is virtualized and decoupled from the individual pieces of hardware used to build it.  Like server virtualization, this is a strong match for the way that IT environments are being asked to offer services, because it removes much of the burden involved in managing and aggregating traditional silos of storage hardware.

Now, here’s some additional detail on each of the three points above:

Scale-out changes the way storage is bought.

Enterprise storage has, for a long time, been based around a sales cycle that is three to five years long.  You do some planning, you buy a system that is probably more than you need right now and hopefully enough to meet your needs four years from now.  At the end of that period of time, you repeat this exercise, and then arrange to copy all the data from your old box to your new one, and manage all the associated configuration headaches around the movement of that data.

Of course, as your environment grows, you start to accumulate several of these storage silos.  In any given year (or month!) some of them are being refreshed, so the burden of managing hardware upgrades and data migrations is eternal.

More than this, the decision of what storage to buy is a really terrifying one.  Budgets are large, and replacing something that your admin team has experience with, even if that system is a right pain to manage, with something unknown is a large, scary undertaking.

One thing that has been remarkable about Coho’s customers is that they have frequently been buying our product off-cycle.  As an example, we recently sold a small 4-microarray system into an environment that has about 8PB of installed NFS storage that is only halfway through its lifetime.  After evaluating our product, the customer decided to purchase Coho in order to offload a relatively small set of performance-demanding workloads from their existing storage investment.  In this case, they are moving Oracle transaction logs from a number of production systems, freeing up performance on their incumbent storage system.

This is a pattern that our sales team is seeing regularly: because Coho is a high-performance scale-out product, it can reasonably be deployed incrementally, alongside an existing storage investment.  Customers like this because it allows them to gain trust and experience in our system as they grow it on a quarterly basis.

Scale-out eliminates forklift upgrades.

A scale-out storage system is a utility.  That utility is greater than the sum of its parts because it is decoupled from those parts.  By allowing storage nodes to be added and removed over time, and by virtualizing that physical hardware, scale-out systems achieve an immediate benefit that is similar to that of server virtualization: physical servers can be replaced and re-racked without interrupting service to customers.

One major outcome of this is that the lifetime of a scale-out system can be longer than that of its component parts.  This is clearly illustrated by today’s cloud storage environments.  Amazon’s storage services have been running for years, hosting multi-tenant customer data with no exposure to the continuous evolution of the hardware that the system is based on.

Our experience with this aspect of the system architecture is that the service-oriented view of a storage system is about much more than avoiding migration planning at the end of physical component lifetimes: it has a deep impact on the way that we develop and deliver features.

Since releasing our first product, we have continued to evolve and improve the Coho software stack.  Viewing the system as a storage service, our customers understand that we are capable of being responsive to feature requests.  Our production deployments have actually gained performance, and incorporated new features, over the time that they have been deployed.  In viewing the storage stack as a service, rather than as a hardware appliance with a specific software release, the scale-out view of software release and refinement is one of continuous improvement, and it’s an aspect of the system that has drawn a lot of enthusiastic customer feedback.

Scale-out eliminates performance bottlenecks.

Traditional monolithic array-style storage designs are bottlenecked: they have a single (often redundant) controller and a limited amount of network connectivity.  As you add disks, you don’t add additional processing power or additional connectivity.  Up to a point, the extra spindles may improve random-access performance, but with diminishing returns.
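A simple model makes the contrast concrete.  The numbers here are illustrative assumptions, not measurements: suppose each flash device can deliver 1 GB/s, a monolithic controller tops out at 4 GB/s, and each scale-out node brings its own CPU and network headroom along with its two devices.

```python
def monolithic_throughput(devices, dev_gbps=1.0, controller_cap_gbps=4.0):
    """Aggregate throughput behind a single controller: capped by the controller."""
    return min(devices * dev_gbps, controller_cap_gbps)

def scaleout_throughput(nodes, devices_per_node=2, dev_gbps=1.0,
                        node_cap_gbps=2.5):
    """Each node adds its own controller/network capacity along with its devices."""
    return nodes * min(devices_per_node * dev_gbps, node_cap_gbps)

# Same 8 devices either way: behind one controller, or spread over 4 nodes.
print(monolithic_throughput(8))   # capped at 4.0 GB/s
print(scaleout_throughput(4))     # 8.0 GB/s, and it keeps growing with nodes
```

In the monolithic case, everything past the fourth device is wasted on throughput; in the scale-out case, adding nodes adds bandwidth along with capacity, which is the property the next section's history will dig into.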

In my next post, I’ll summarize some of the history in scale-out architectures, and talk about how preserving performance is both important and challenging.  Right now though, I need to run and find my power cable.

Interested in learning more about Coho and our products? Check out ESG’s report on our initial product offering, or our slightly gorier technical white paper that describes the system in a bit more detail.
