Journey of a Storage Admin: Scale-Up to Scale-Out to Virtualized and Back Again.
It’s not easy to be an IT professional these days, and with all the emerging technologies, it’s hard to distinguish what’s truly innovative in the world of storage. I think back to the simpler days when you had only a few vendors to choose from. How far back, you might ask? Well, when I was the storage admin at both Warner and Disney, the usual suspects were EMC, NetApp, HP and HDS. The NetApp storage experience I gained working in NetApp’s own IT department 15 years ago helped me land the admin gig at ESC Entertainment, a subsidiary of Warner Bros. that was brought in to finish the Matrix Trilogy.
Houston, we have storage problems…
When I arrived, the decision had already been made to go with NetApp as the building block for the environment. However, we did have a couple of BlueArcs, which had been relegated to less demanding jobs. It quickly became apparent to me that post-production visual FX workloads had an insatiable appetite for storage, so we leveraged the fastest FAS series filers NetApp had to offer. We had several of these filers, but to maximize performance each head controller had to be sized accordingly, which meant spending a lot of time managing the storage.
As you can guess, this didn’t scale, and we had to break up the dataset across multiple filers, which created problems for our pipeline. Due to the nature of the project, we often dealt with scenes that exceeded the volume size, so we had to rebalance that dataset on an already maxed-out filer just to maintain the filer’s performance. To add fuel to the fire, our compute farm would grow unexpectedly whenever scenes had to be rendered in time for production deadlines. There were a host of other issues, but you get the picture: the storage was inflexible and became the workflow bottleneck.
We responded to these limitations as all admins do: with duct tape and glue, or rather, with scripts, homegrown tools and Windows DFS to abstract the namespace. But as the environment grew (~200TB usable, which was a lot at the time), managing it all was becoming an OPEX nightmare, especially since there was no virtualization (more on this later). Most of my time was spent migrating datasets and load balancing datasets and connections across the filers, because of the silo effect and the way performance plateaued on each filer. Times got so bleak that we created a song, “working on some shot moves”, for the late nights when we had to conduct data transfers.
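To make the namespace abstraction concrete, here is a minimal sketch of the idea behind a DFS-style layer: clients use a stable logical path, and a lookup table decides which filer actually holds the data. This is not the tooling we ran at ESC; the `Namespace` class, the paths and the filer names are all hypothetical and only illustrate the technique.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Maps logical path prefixes to physical filer exports (all names hypothetical)."""
    links: dict = field(default_factory=dict)

    def link(self, logical: str, physical: str) -> None:
        # Point a logical folder at a physical export, replacing any earlier target.
        self.links[logical.rstrip("/")] = physical.rstrip("/")

    def resolve(self, path: str) -> str:
        # Longest-prefix match, so a deeper link wins over a shallower one.
        for prefix in sorted(self.links, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self.links[prefix] + path[len(prefix):]
        raise KeyError(f"no link covers {path}")


ns = Namespace()
ns.link("/shows/seq01", "//filer1/vol3/seq01")
before = ns.resolve("/shows/seq01/shot010/frame.exr")

# After a "shot move", only the link changes; the logical path stays the same.
ns.link("/shows/seq01", "//filer4/vol1/seq01")
after = ns.resolve("/shows/seq01/shot010/frame.exr")
```

The pipeline keeps referring to `/shows/seq01/...` throughout, which is what hides a migration from artists and render nodes. The copy itself still has to happen behind the scenes, and that was the part that kept us up at night.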
The aim isn’t to throw NetApp under the bus; it was great technology at the time and widely adopted in the M&E industry. But what surprises me is that, a decade later, the typical enterprise IT team still faces problems with scaling storage, managing performance and constantly having to migrate datasets across volumes in some shape or form. Oftentimes the root cause of these issues can be traced back to the traditional scale-up, head-controller architecture.
Going back to my story, I was hoping to revamp the storage environment during my tenure at ESC by making sure we addressed the needs of workloads by having a solution that:
- Addressed hotspots automatically
- Didn’t require external data migration
- Didn’t require creating and managing multiple volumes/containers
With these requirements in mind, we realized the optimal solution would be a distributed storage architecture that scaled out as our environment grew and was addressable through a global namespace, things the Coho DataStream does (more on this later). The two companies that fit these requirements at the time were Spinnaker and Isilon Systems.
Isilon had just come onto the scene and Spinnaker was making headway in some of the studios, but that changed dramatically when NetApp acquired Spinnaker and failed to meet any of the deadlines it set for integrating the distributed architecture with NetApp’s filers. This led many studios to start looking into Isilon Systems as an alternative, but we never got the opportunity to deploy either of these solutions at ESC, as Warner decided to close the studio down at that point for management reasons.
However, I would get my chance to test Isilon’s mettle soon enough... read how in part 2.