Composable Infra, SDI, RSA… WTH is all this stuff – and what’s the direction?

I often get asked “what does the future of the datacenter look like?” – often in the context of buzzwords like “composability”, “software defined infrastructure” (SDI), and Rack-Scale Architecture (RSA).   These are real ideas, if less than real “things”.   I also often get asked “what comes after HCI?” – which is funny, because HCI is only just getting started 🙂

It’s a legitimate question though… “what do you think the future of the datacenter looks like, and where are you trying to steer/innovate?”

My answer comes down to 6 simple points.   In my opinion, the “Data Center of the Future”:

  1. …WILL BE SMALLER. On premises datacenters are going to be “smaller”. Many workloads will continue to migrate to SaaS and also more workloads than today will run on public PaaS/IaaS clouds. This will drive an inherent cycle of consolidation of the players in this space.
  2. …WILL REMAIN IMPORTANT. In spite of #1 – the on premises datacenter may be smaller, but it will still be huge, and really important to customers. Many workloads will run on-premises for economic reasons, for data gravity reasons, and for governance (not security) reasons. It’s for this reason that people are coming around to the idea: cloud is an operating model, not a place – and the answer is a Hybrid Cloud model. This trend also leads to inevitable industry consolidation. Sorry about that – but it’s not us “forcing” this – it’s inevitable.
  3. … HAS TWO DOMINANT EMERGING APPROACHES. On premises datacenter stacks, while software-defined, are tending more every day toward two design centers – and I’ll spend more time on this later, so keep these two ideas in your head:
    1. “vertically integrated” stacks, where upper-layer parts of the stack are fixed (think: VMware-oriented VxRail, VxRack SDDC; Microsoft-oriented Azure Stack; RedHat OpenStack/CloudForms/OpenShift Ready Bundle)
    2. “horizontally integrated” stacks, which stop at the physical layer or the lowest levels of the virtual pool abstraction (think: VxRack FLEX, VxBlock, and I also think that HPE Synergy is an example).

  4. … WILL BE MOSTLY SOFTWARE DEFINED. More and more, the stacks are software-defined on industry-standard hardware – for compute, for storage, and for network. The server is the foundational bedrock, the base hardware building block of the datacenter. I’ve said it before – it is the official Dell Technologies position that “the majority of x86 workloads are ready to run on SDS/HCI models – today”. YES – there will be important workloads that run on external storage/CI models due to “specific data service, capacity density, or extreme latency variability reasons”. But – with each passing day, customers should design for the general case first, then design for the exceptions.

  5. … WILL DEPEND ON DATA CENTER FABRICS… and these will be different, and will threaten network incumbency. Both “vertically integrated” and “horizontally integrated” stacks will require a data center fabric – the evolution of spine/leaf networks + SDN – that links together Blocks, Racks, and Appliances. We started on Vscale with Cisco in 2015, and have seen a lot of joint success. There are opportunities for other types of datacenter fabrics on the verge of more volume IMO.  These are data center fabrics that disaggregate the hardware and the software – fabrics that take Open Networking hardware approaches, and that lean in like crazy on things like NSX – for overlay networks, for security, for network services. Further out, there are datacenter fabrics on the horizon (and here we’re talking 2019/2020+) like Gen-Z that could provide a critical element to “rack scale” composability – where pooling and aggregation/disaggregation of memory and CPU become possible. Right now, it’s marketing hoo-ha, but the IDEA has merit.

  6. … WILL KEEP SHIFTING TO “BUY”. Every day, more and more customers are realizing that all the stuff I’ve been talking about in points 1-5 is a total waste of time to build themselves – rather, they view their datacenter stacks (both “vertically integrated” and “horizontally integrated”) as a commodity to consume – in some cases as capital expense, in some cases as operational expense, and in some cases as a managed service. Along with point #1, this will drive an inherent cycle of consolidation of the players in this space. The next few years will be brutal – and I think people shouldn’t be surprised to see only a few players left standing. We’re determined to be one of them, but it isn’t going to be easy.

Now that those 6 principles are out there – I want to spend a little more time on point #3, for three reasons: a) it’s a real architectural fork in the road; b) I’m finding most people aren’t thinking through the differences; c) “composability” does not apply for one type (“vertical”) but does for the other (“horizontal”).

The operating word for “vertically integrated” is simple. That’s the customer promise… the “brand promise”.

“Vertically integrated” is simple because fundamental assumptions about the workloads drive a tightly integrated approach – all the way to the top of the stack. You start with one fundamental limiting assumption – and this leads to great simplification, and can pay massive dividends.   This is a core “simplicity over flexibility” design choice – simple, harsh, but honest.   This could be choosing to assume that everything on the stack will be a vSphere VM, or a specific KVM instance (think AHV), or a Linux container with K8s.

It’s notable that the hyper-scale clouds may be internally built out of parts – but the whole design is a “vertically integrated” approach – starting with the workload target, and designing the whole stack – software and commodity hardware – around that target.   In fact – this explicitly means that the hyper-scale clouds have multiple vertically integrated stacks for different purposes – whether it’s EC2 VM vs. bare metal instances, or the Azure blob store, or the Google Cloud Engine infra that supports TensorFlow use cases.

Conversely, the operating word for “horizontally integrated” is flexible. The “horizontally integrated” stack isn’t built around very specific workloads – and therefore needs much richer programmability, telemetry, and intelligence – at a very low level, right down to bare metal.

This makes “horizontally integrated” stacks by definition more flexible, but ALSO less integrated than “vertically integrated” stacks, and certainly more complex.   It’s also a much more “natural” head space for people in Enterprise IT, who have generally always approached infrastructure as, by definition, a “horizontal asset” supporting many different workloads.

ProTip: “simple” & “flexible” are both nice words, but not something you can simultaneously optimize for.   Anyone who tells you otherwise is naïve, is trying to sell you something that is vaporware, or is smoking something.

Here’s a picture I’ve used to express these ideas:

[image: the two design approaches – “vertically integrated” vs. “horizontally integrated” stacks]

People all use different words for these two design approaches, which makes it all confusing, but if you step back and squint, you can see these two distinct approaches emerging.

Sometimes words like “composability” or “software defined infrastructure” are applied to the horizontal approach only – but I think that’s wrong. Those ideas are equally applicable at different LEVELS of the stack in the two approaches.

The logic flaw I see over and over is that people debate about which one is “right”. Frankly, if I were a betting man, I’d say people bias to simplicity over time (which tends toward vertically integrated stacks) – but for many customers, the answer is either, or both.

So – with that all said, what about “Composable Infrastructure”?

Points #3, #4, and #5 are important parts of Dell Technologies’ answer to “composable infrastructure” – and we are SHIPPING.

Before you read on, dear reader – I would strongly recommend reading the great blog post here by Robert Hormuth, VP/Fellow, Server Division CTO, Dell EMC.

You back?

I agree with Rob on most points, and want to add something – you can think of our two posts as two viewpoints: two humans looking at the same space.

If you put our viewpoints together, you get that we at Dell EMC think that stacks will require 2 things:

  1. elastic pools of compute/network/storage that can be aggregated/disaggregated;
  2. rich APIs for programmability, telemetry, and “built in” AI/ML logic.

This is pretty aligned with the IDC definition:

“datacenter infrastructure that seeks to (dis)aggregate compute, storage, and networking fabric resources into shared resource pools that can be available for on-demand allocation (i.e., “composable”).” – Composable Infrastructure Is About IT Efficiency and Business Agility, Ashish Nadkarni, IDC, Jan 2017
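
To make that definition a little more concrete, here’s a minimal sketch of the idea – purely illustrative, with hypothetical class and method names (this is not any shipping API): a shared pool of disaggregated resources, from which logical systems are “composed” on demand and released back when done.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """A shared pool of disaggregated resources (hypothetical, for illustration)."""
    cpu_cores: int
    ram_gib: int
    storage_tib: int

    def compose(self, cpu_cores: int, ram_gib: int, storage_tib: int) -> dict:
        """Carve a logical system out of the shared pool, on demand."""
        if (cpu_cores > self.cpu_cores or ram_gib > self.ram_gib
                or storage_tib > self.storage_tib):
            raise ValueError("insufficient free capacity in the pool")
        self.cpu_cores -= cpu_cores
        self.ram_gib -= ram_gib
        self.storage_tib -= storage_tib
        return {"cpu_cores": cpu_cores, "ram_gib": ram_gib, "storage_tib": storage_tib}

    def decompose(self, system: dict) -> None:
        """Return a composed system's resources to the shared pool."""
        self.cpu_cores += system["cpu_cores"]
        self.ram_gib += system["ram_gib"]
        self.storage_tib += system["storage_tib"]


# Aggregate a rack's worth of capacity, compose a system, then give it back.
pool = ResourcePool(cpu_cores=512, ram_gib=8192, storage_tib=200)
node = pool.compose(cpu_cores=16, ram_gib=256, storage_tib=4)
pool.decompose(node)
```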

—-

That said, there are THREE things that need to be said to separate marketing bulls#$% from reality:

FIRST: We think the “programmability/telemetry/logic” part is as important as the “aggregation/disaggregation” part. We think doing this in a way that is bound intrinsically to any given closed/proprietary hardware platform just won’t work in the long run.

Our early work on this is here: https://github.com/dellemc-symphony and builds on top of the other open efforts to create a composable system-level API, with the sort of system-level functions one would expect.

I’m NOT saying we have this figured out. There are also multiple efforts in play. I think we can do great things with Intel SNAP. I think we can do great things with Puppet, Chef, and Ansible. It all needs to build on open at the component API levels. In server land – iDRAC has embraced Redfish. In the storage domain, likewise, we are cranking on trying to drive this forward. Today, this is wrapped up in the open CoprHD efforts (commercially, the ViPR Controller). I really hope the SNIA Swordfish efforts have some success – though SMI-S didn’t take the world by storm, and storage has really lacked any solid industry-wide effort here (I know that EMC was trying like heck with SMI-S). Innovation always precedes a standard, but mass-market impact comes with standards. We have a LONG way to go here, and the team is running fast and hard.
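
To give a flavor of what “open at the component API level” looks like in practice, here’s a minimal sketch of pulling basic inventory and telemetry from a Redfish-compliant BMC (such as an iDRAC). The address and credentials are placeholders, and error handling and session auth are omitted – a sketch, not production code.

```python
import requests

# Placeholder BMC address and credentials - substitute your own.
BMC = "https://192.0.2.10"
AUTH = ("root", "calvin")

# Redfish exposes a standardized REST tree rooted at /redfish/v1.
systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False).json()

for member in systems.get("Members", []):
    # Each member is a ComputerSystem resource with model, power, CPU and memory summaries.
    system = requests.get(f"{BMC}{member['@odata.id']}", auth=AUTH, verify=False).json()
    print(system.get("Model"),
          system.get("PowerState"),
          system.get("ProcessorSummary", {}).get("Count"),
          system.get("MemorySummary", {}).get("TotalSystemMemoryGiB"))
```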

SECOND: Today, data center fabrics for pooled/shared memory semantics don’t exist, so memory aggregation is a load of hoo-haa. Networking fabrics for Ethernet and SCSI/FC protocols are many orders of magnitude off the latency/throughput/bandwidth requirements, and totally lack the protocol support. The early efforts around Gen-Z may ultimately be what the industry rallies around (we are supporters – again, a great Rob Hormuth post here).

Our own acquisition and efforts around DSSD (lots of interesting stuff here) were one of the first attempts at a commercial PCIe fabric (picture below)…

[image: the DSSD PCIe fabric]

… you’re looking at one of the first PCIe Fabrics – and we took a hard run at it and failed (the DSSD intellectual property has been refactored into other work in the server and storage domains), so I think we can talk about this more than some.   It’s notable that the DSSD PCIe fabric still lacked memory semantics – a critical pre-requisite for a real memory-class fabric.

So if we build a little aggregation/disaggregation checklist, we get:

  • Aggregating & Disaggregating storage (transactional/unstructured/object) via SDS = check.
  • Aggregating & Disaggregating network via SDN = check – mostly.
  • Disaggregating compute in a host = check.  Umm… this is called virtualization/containerization.
  • Aggregating compute across hosts = nope, dope. You can pool how you MANAGE hosts, but you can’t make 5 CPUs across hosts act as one (though there are some interesting startups here).
  • Disaggregating memory in a host = check… again, kernel-mode virtualization and containerization.
  • Aggregating memory = nope, dope. Again, virtualization can present all the RAM in a vSphere cluster as a giant pool, and you can even oversubscribe all the VMs on a host – but you cannot have a VM that exceeds the RAM of a single host.   It’s notable that there are interesting startups here – but I’m deeply skeptical without memory-class fabrics – you know who you are… call me 🙂

CPU and memory aggregation are smoke and mirrors today, or depend on ridiculously proprietary and esoteric hardware. Is it likely to happen over time? I bet memory aggregation will. The concept of memory-class fabrics with memory semantics is eminently possible, and there are obvious and present workloads that need more RAM (and other NVRAM models) than you can fit in a server.

I’m more skeptical about CPU aggregation (pooling CPU and making it look like one across commodity/industry-standard hosts) – not because it’s not technically possible (though again, it’s relatively far out) – but rather because compute workloads increasingly leverage smaller and smaller, higher-level abstractions. Perhaps over time this pendulum will swing, but I don’t see it – at least not on foreseeable time-scales. I’ve been wrong before 🙂

THIRD: We think that proprietary hardware just doesn’t win in mid-to-long term windows. Everyone loves a funky cool bit of hardware innovation – and they CAN have an impact, but the window is super tight. I’m personally guilty of succumbing to its siren call – for me DSSD was an object lesson in this principle. I want to note – you CAN have hardware level innovation that is proprietary that takes the world by storm – but it needs to be multiple orders of magnitude of a jump (DSSD had this), and even then, you have 2 maybe 3 years to make it cook, before the scaled wave of the industry standard catches up (DSSD missed this).

There are important economies of scale at play here. This doesn’t mean “proprietary = bad”, but “proprietary without a path toward a standard = bad, and will be short-lived”.

—-

Ok – on to the next set of ideas to absorb, now that you have those three “bull@#$-o-meters”.

Now – at WHAT LEVEL do the concepts of “composability” apply? This, I think, is the biggest head-fake.

Personally, I think they apply at multiple abstraction points.   Reminder – look at this picture:

[image: the two design approaches – “vertically integrated” vs. “horizontally integrated” stacks (same picture as above)]

Vertically integrated stacks are NOT used in a “composable” way at the low level of hardware. They are composable much higher in the stack – most commonly at the “virtual pool” layer, and sometimes even above that (the IaaS or container/cluster manager level, or the PaaS level).

In vertically integrated stacks you compose and aggregate/disaggregate pools of compute/storage/networking/RAM – and you use them in a programmable fashion – but you do it WAY, WAY above the hardware layer… think of the container/cluster manager level, where a cluster manager – or a software-based hardware mediation layer – does the work (think BOSH, think Ansible, think Puppet).
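
As a hedged illustration of that point (hypothetical names – this is not a real SDK), composing at this layer means the caller only ever talks to the virtual pool; placement, storage policy, and network wiring are decided by the stack itself, not by the consumer:

```python
import uuid


class VirtualPoolClient:
    """Hypothetical client for a cluster-manager / virtual-pool layer (illustration only)."""

    def provision_vm(self, vcpus: int, ram_gib: int, disk_gib: int) -> str:
        # In a vertically integrated stack, the request stops here: the stack decides
        # placement, storage policy, and networking - the hardware is never addressed.
        return str(uuid.uuid4())  # stand-in for the identifier the stack would return


pool = VirtualPoolClient()
vm_id = pool.provision_vm(vcpus=4, ram_gib=32, disk_gib=200)
print(vm_id)
```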

Even HCI stacks that started “horizontally” but are trying to move “up stack” start to de-emphasize the lower-level/multi-workload/multi-stack approach… They start to pivot from the “horizontal rectangle” to the “vertical rectangle”… and start to more tightly bolt their stack together.   The industry example I see here is Nutanix – which very much seems to be working furiously on this around their own stack – but this intrinsically moves toward a “vertical” approach vs. a “horizontal” one.  If you used to love them for their “horizontal-ness”, they now need to convince you of their new “vertical-ness”.

Even if you could add intelligence at the low hardware layer in a vertically integrated stack, I’m not sure you should.

It’s not smart to build low-level aggregation/disaggregation & telemetry/programmability into these stacks – even if it added value, it’s likely that value would be “swamped out” in the stack – net, it would be a waste of time, or innovation that could be done higher up.

Sidebar: people who claim that “hardware infrastructure will be ‘application aware’” might, I think, be missing the point… or are hardware vendors desperately fighting the forces of commoditization. The value in these vertically integrated stacks is specifically rooted in the fact that the infrastructure hardware is NOT ‘application aware’ – it is instead ‘application decoupled’, via abstraction far above the server/network/storage hardware.

Now, what about composability in horizontally integrated stacks? 

That’s a different story.

In those horizontally integrated stacks, you cannot depend on a specific higher-order abstraction – after all, the infra stops at the infra.

In the case where a customer says “I want pools of infrastructure, and at the lowest level of infrastructure I want to layer on top a broad set of use cases” – in that case, the concepts of composability become important.

Pause and think about this…

If you can’t say that every host (and its associated network/memory/persistence resources) will run ESXi, or Windows 2016, or RedHat Linux, or a given container abstraction, or heck, run as a bare metal host – but could run ANY of them… well, then you ABSOLUTELY need hardware-level composability.   You need generally applicable disaggregation/aggregation API surfaces, programmability, and telemetry.   And that flexibility will always come at a complexity trade-off.    This approach tends to be more “natural” or “resonant” with IT traditionalists – who bias to “horizontal” approaches (because it reflects what they are used to).
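
To make “generally applicable disaggregation/aggregation API surfaces” a little more concrete, here’s a hedged sketch of what a hardware-level composition request might look like. Every endpoint and field name here is hypothetical – this is the shape of the idea, not a shipping API:

```python
import requests

# Hypothetical composition-service endpoint - illustrative only.
COMPOSER = "https://composer.example.local/api/v1"

# Describe the bare-metal node to compose from disaggregated pools. Note there is
# no assumption about what will run on it (ESXi, Windows, Linux, a container host,
# bare metal) - which is exactly why the API has to live at the hardware level.
node_spec = {
    "cpu_sockets": 2,
    "ram_gib": 384,
    "nvme_drives": 4,
    "network_ports": [{"speed_gbps": 25, "count": 2}],
}

response = requests.post(f"{COMPOSER}/composed-nodes", json=node_spec, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. a node ID plus telemetry endpoints for the composed system
```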

This is where we are running hard for horizontal stacks – with iDRAC Redfish, with OpenManage Essentials, with CoprHD/ViPR, with Swordfish, with open networking, with NSX, and with Symphony tying them all together into system-level operational constructs.

What’s the net?

  • While it will take time to see if “vertically integrated stacks” or “horizontally integrated stacks” win in the long run, or whether like so many things in IT – it’s a case of “and” versus “or” – there is clearly a market need for multi-purpose infrastructure stacks in the enterprise.
  • We are building and shipping kick-a$$ horizontal stacks like VxRack FLEX built on industry standard hardware, software defined approaches, with an open composable API/logic layer – for customers who want flexibility  (it’s in the name of the offer dammit!) in exchange for a complexity trade off – and working furiously to get the programmability/telemetry API work right…  
  • …at the same time, we build the industry’s best vertically integrated stacks around the VMware (VxRail & VxRack SDDC), Microsoft (Azure Stack), SAP (SAP HANA appliances) and RedHat (RedHat Cloud Platform) ecosystems – for customers who want simplicity in exchange for a rigidity trade-off.

Those are BOTH personal priorities for me, priorities for the team, and priorities for the company.   Q: Which is “right”?   A: It varies by customer.   For many, it’s both.

If you want my opinion, my bet is that in the long run, vertically integrated stacks will tend to win more.   Why – even though they have to overcome customer fear of “lock-in” and “islands of infrastructure” (and horizontal has ruled the day in the past)?   Ultimately, I bet vertical stacks will tend to win because simplicity is an incredible force for good – which, if I’m right, means we’re entering an era of “stack wars”.