Understanding the Container Storage Interface Project

Containers have become intensely important to software developers and system administrators – for good reason. Containers can isolate application code and even entire application stacks. That helps Ops teams keep test and production environments separate. In turn, that enhances security efforts and it gives IT more control of their environments. So, yay.

But containers are still an evolving technology – which sounds better than, “We’re still figuring it out as we go.” And, as with nearly all the hairy problems computer professionals ever contend with, the messy bits are in integration. There are a lot of moving pieces, and where they meet (or fail to), we encounter friction.

As a result, even if your shop is committed to container technology, getting underway isn’t as easy as it seems.

First, as with any technical strategy, a development team has to choose the container orchestration architecture that best fits its shop. Fortunately, there are several good choices, such as Kubernetes, Mesosphere, Cloud Foundry, and Docker. One development team might choose Cloud Foundry, another Mesosphere, and so on; each platform serves a set of use cases.

But after choosing a container architecture, the process gets more complex. Soon, a developer finds their team lost in yak shaving. That’s immensely frustrating. We want to solve the problem in front of us – not deal with downstream issues that distract from the job at hand. We don’t want to invest time in cleaning up an old mess or building an integration tool before we can even get started.

And that’s where the Container Storage Interface project (CSI) comes in.

But let’s take a step back, so you can understand the problem that CSI solves. I’ve devoted a lot of time and energy to this, so I’m rather passionate about it.

Container orchestrators have difficulty running applications that require data to persist between invocations of the containers. When a virtual machine (VM) is stopped and restarted (on the same node or elsewhere) the data (which may be a file system, database, whatever) is preserved. That’s because that data is encapsulated inside the virtual machine.

In contrast, a container contains only the application and its associated software dependencies; it does not include the underlying file system. This limits the application types you can run inside a container. At the least, it limits the value of running a stateful application in a container, because that single container could only run on a specific node. So, in order to take advantage of containers, developers have to investigate the storage options and, too often, create unique custom solutions.

Vendor and open source storage providers all have some kind of API – and that’s the nub of the problem. Because each storage product’s API has different semantics, nuances, and interfaces, developers have to learn each one to take advantage of the software’s features. Multiply that by the number of container orchestrators, and you see the yaks lining up for their haircuts, particularly if you need to change container orchestrators or storage providers.

It’s tough for users, but the lack of standardization presents problems for vendors, too. Storage companies have to choose which container orchestrators to support (and notably, which ones not to support); or they have to duplicate effort to support all of them. It’s very much like the problems an independent software vendor (ISV) faces when a new operating system comes along: Will it take off? Should we invest the time in it?

Remember what it was like when mobile application developers needed to write every line of code for each possible mobile device? Yeah, like that. Nobody knows what works with what, and which version has bugs when you try to integrate this particular Tab A into that particular Slot B. The only way to figure things out is by trial and error. Few development teams (or project managers) want to be told, “Embrace unpredictability,” so they glom onto one “solution” and are reluctant to change the architecture because they’re afraid of the downstream side effects.

This slows down the adoption of containers, software-defined storage, and more modern infrastructures. Instead, the uncertainties cause people to keep deploying legacy infrastructure. Fragmentation in this market has severely limited adoption.

It isn’t as though this is a new problem; this cycle has repeated time and again. Earlier technology evolutions certainly have had to deal with the process of creating reliable standards. For example, we struggled with choosing a database and then jiggling application data to integrate with another department’s software. By now, we should know the importance of building towards integration. A rising tide, after all, lifts all boats.

We are still doing storage like it’s 1999. It’s time to create a container storage interface that works across the industry. That makes now the point when your voice matters most.

The Container Storage Interface (CSI) is a universal storage interface – effectively an API specification – that aims to enable easy interoperability between container orchestrators and storage providers. The result is application portability across infrastructures. CSI will enable container orchestrators to consume storage services from any storage provider (cloud or otherwise), and it will let storage providers offer storage services to any container orchestrator.
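To make the idea concrete, here is a minimal Python sketch of what a shared interface buys you. Every name in it (`StorageDriver`, `create_volume`, `Orchestrator`, the two toy providers, `integrations_needed`) is hypothetical and invented for this illustration; none of it is taken from the CSI specification. It only shows the shape of the argument: an orchestrator written against one interface can use any provider that implements it, and the integration work grows with the sum of orchestrators and providers rather than their product.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class StorageDriver(ABC):
    """Hypothetical shared interface; NOT the actual CSI API."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Provision a volume and return an identifier for it."""


class BlockProvider(StorageDriver):
    """Toy provider that pretends to provision block storage."""

    def create_volume(self, name: str, size_gb: int) -> str:
        return f"block://{name}?size={size_gb}"


class FileProvider(StorageDriver):
    """Toy provider that pretends to provision file storage."""

    def create_volume(self, name: str, size_gb: int) -> str:
        return f"file://{name}?size={size_gb}"


class Orchestrator:
    """Toy orchestrator that knows only the shared interface."""

    def __init__(self, label: str, driver: StorageDriver) -> None:
        self.label = label
        self.driver = driver

    def provision(self, app: str) -> str:
        # No provider-specific code here: any StorageDriver will do.
        return self.driver.create_volume(f"{self.label}-{app}", size_gb=10)


def integrations_needed(orchestrators: int, providers: int) -> Tuple[int, int]:
    """Return (pairwise integrations, implementations of one shared interface)."""
    return orchestrators * providers, orchestrators + providers


if __name__ == "__main__":
    print(Orchestrator("orch-a", BlockProvider()).provision("db"))
    print(Orchestrator("orch-b", FileProvider()).provision("db"))
    print(integrations_needed(4, 5))
```

With four orchestrators and five storage providers, for instance, the pairwise approach needs twenty integrations, while a shared interface needs nine implementations (one per orchestrator, one per provider).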

That sounds marketing-buzzwordy, doesn’t it? The point isn’t simply to create a single way for developers to incorporate storage into container-based software. That’d be only a matter of jargon and vocabulary (“You say tomato, I say to-MAH-to”). A real interface takes into account what each platform can and cannot do. For example, one platform might let you mount more than one volume, and any API has to support that capability while also preventing its use on platforms that lack it. If we were talking about cars, the analogy might be an API responding, “This car model doesn’t have a back seat, so you can’t do this action.”
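The back-seat idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the CSI spec expresses capabilities: `Capability`, `MULTI_VOLUME_MOUNT`, `Platform`, and `UnsupportedOperation` are all hypothetical names made up for this example. The point is only that the interface lets each platform declare what it supports and refuses an action the platform cannot perform.

```python
from enum import Enum, auto
from typing import Iterable, List


class Capability(Enum):
    # Hypothetical capability name, invented for this sketch.
    MULTI_VOLUME_MOUNT = auto()


class UnsupportedOperation(Exception):
    """Raised when a platform is asked to do something it has not declared."""


class Platform:
    """Toy platform that advertises its capabilities up front."""

    def __init__(self, name: str, capabilities: Iterable[Capability]) -> None:
        self.name = name
        self.capabilities = frozenset(capabilities)
        self.mounted: List[str] = []

    def mount(self, volume: str) -> None:
        # A second mount is allowed only if the platform declared support.
        already_mounted = len(self.mounted) > 0
        if already_mounted and Capability.MULTI_VOLUME_MOUNT not in self.capabilities:
            raise UnsupportedOperation(
                f"{self.name} cannot mount more than one volume"
            )
        self.mounted.append(volume)


if __name__ == "__main__":
    sedan = Platform("multi", [Capability.MULTI_VOLUME_MOUNT])
    sedan.mount("vol-a")
    sedan.mount("vol-b")  # fine: capability declared

    coupe = Platform("single", [])
    coupe.mount("vol-a")
    try:
        coupe.mount("vol-b")  # no back seat
    except UnsupportedOperation as err:
        print(err)
```

Checking a declared capability before acting, rather than letting the call fail somewhere deep in a provider, is what lets a caller find out in advance that a given platform has no back seat.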

Three communities have a stake in creating a Container Storage Interface: container orchestrators, storage providers, and the end-user community. “Users” encompasses several groups, each with its own sensitivities, including operations teams, technology architects, and storage specialists. Right now, the CSI project wants input from all of them.

We have a pretty good spec, I think. We’ve collaborated with a number of people, and have contributed over two years of our experience from REX-Ray. But does it address concerns that people really have? Is there a feature or capability that needs to be included? We need as many voices in the community as possible to help us streamline this interface and make it work. The beauty of working with a community is hearing thoughts and ideas from all facets of a problem. So please, join us, lend us your voice and your thoughts.

How You Can Get Involved:

This is a public Google+ group for all CSI contributors. All the public meetings and discussions are shared here. Visit this group page for news and updates on CSI activities.

This is a smaller Google+ group of CSI maintainers and approvers, who maintain impartiality and keep the benefit of end users in mind. Visit this page to stay up to date on the project.