Applying a Factory Model to Artificial Intelligence and Machine Learning

By Doug Cackett

EMEA Big Data & IoT Solution Lead, Dell EMC Consulting
October 29, 2018

We’ve understood for a long time that organizations that spend more on, and are better at, deriving value from their data using analytics significantly outperform their peers in the market. All of us also know, because we feel it, that the pace of change is ever increasing. I see this all the time with the customers I work with, many of whom seem to be suffering from the “Red Queen” effect – each having to change and innovate faster just to keep standing still, let alone make progress against a tide of change.

I’ve also had cause to re-read Salim Ismail’s book, “Exponential Organizations”, recently, which got me thinking: in order to compete, we should be designing and building solutions that allow us to exponentially grow our capacity to create value from data in order to meet business demand. Importantly though, how do we do that without also exponentially growing infrastructure costs (a bad idea) or the number of Data Scientists employed (an impossible dream)? That’s a really great exam question and one I’d like to explore in this blog.

Scaling Analytics with a Factory Model

I think that one of the reasons Artificial Intelligence and Machine Learning (AI / ML) are on the radar of all CxOs these days is because they’re seen as a way of closing the yawning gap most companies have between their capacity to collect data and their ability to apply it in the form of actionable insights. In other words, there’s a certain amount of ‘magical thinking’ going on here and AI/ML is the new magic being applied.

Our view is that the answer to our exam question lies in a more industrialized process.  We have been using a factory model concept with our customers to help them to address this central question of scaling efficiently. Take a look at the model in its entirety and then I’ll dissect it.

#1: How do you drive innovation with AI/ML technologies?

As AI/ML technologies, packaging, frameworks and tooling are emerging so rapidly, there’s a real need to evaluate these new capabilities with a view to understanding the potential impact they might have on your business. The right place to do that is an R&D Lab.

At this point, we’re just trying to assess the technology and identify the potential business value and market impact. Is it a potentially disruptive technology that we need to start thinking about, perhaps disrupting ourselves before we get disrupted by others?  Or, it may just be a slightly better mousetrap than the one we are already using. By assessing the technology at the edge, we can answer questions around the planning horizon and take the appropriate steps to introduce the technology to the right parts of the business so it can be evaluated more completely.

The most important thing to bear in mind here is that this is a critical business function. It can’t be seen as a purely academic exercise conducted by an isolated team. Disruption is a modern reality and an existential threat to every business – the R&D function is a strategic investment that links you and your business to tomorrow’s world.

Development can happen in both directions, of course. As well as being technology-led, it might be that your Lean Innovation team is scanning the technology horizon to fill engineering gaps in a product that’s being brought to market. Close cooperation between these teams, resulting in a melting pot of innovation, is exactly what’s needed to survive and thrive over the long term. Today is the least amount of change you will ever see – we had all better get used to it!

The goal, regardless of whether development started from an idea sparked on the technology side or the business side, is for it to become something of significance to the organization. That could mean adding a net new product to the current portfolio, together with the corresponding line in the annual chart of accounts, or, where a more fundamental change is needed to maximize its potential, spinning it out as a completely new company. If you’re interested in reading more around this topic, I’d recommend Geoffrey Moore’s book, “Zone to Win”.

As strategic developments progress, they will mature from Horizon 3 to Horizon 2 and finally into the more immediate Horizon 1, assuming they continue to be viewed as adding value to the business. At this point, if you haven’t already done so, you may like to stop here and quickly read my previous blog, Industrializing the Data Value Creation Process, which looked at a conceptual framework for thinking about the way we extract commercial value from data – it might help you understand the process side of what I’m about to explain!

#2: How do you prioritize Horizon 1 activities?

At its heart, this is a prioritization challenge: given the infinite demand and finite resources available in most organizations, you need to decide what you are going to spend your time on. That decision needs to be based on a combination of factors, including overall strategy, likely value and current business priorities, as well as the availability of the data required.

The data doesn’t need to be available in its final form at this stage of course, but you may need to have at least some accessible to start the discovery process. Besides, data has a nasty habit of tripping you up, as it almost always takes longer to sort out legal and technical issues than you think, so addressing these kinds of challenges before you begin the data discovery work is normally a sound investment.
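As a concrete illustration of that prioritization step, here is a minimal sketch of a weighted scoring approach. The factor names, weights and ratings are my own illustrative assumptions rather than a prescribed method; the point is simply that strategy fit, likely value, business priority and data readiness can be combined into a single, comparable score per candidate use case.

```python
# Minimal sketch: weighted scoring of candidate AI/ML use cases.
# Factor names, weights and ratings are illustrative assumptions only.

WEIGHTS = {
    "strategic_fit": 0.3,      # alignment with overall strategy
    "likely_value": 0.3,       # estimated commercial value
    "business_priority": 0.2,  # current urgency to the business
    "data_readiness": 0.2,     # is the required data legally and technically accessible?
}

candidates = [
    {"name": "churn model", "strategic_fit": 4, "likely_value": 5,
     "business_priority": 3, "data_readiness": 4},
    {"name": "demand forecast", "strategic_fit": 5, "likely_value": 4,
     "business_priority": 4, "data_readiness": 2},
]

def score(candidate: dict) -> float:
    """Combine the factor ratings (1-5) into a single weighted score."""
    return sum(WEIGHTS[factor] * candidate[factor] for factor in WEIGHTS)

# Work the backlog from the highest-scoring use case downwards.
for c in sorted(candidates, key=score, reverse=True):
    print(f"{c['name']}: {score(c):.2f}")
```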

If data is the new oil, then the first, and most crucial, step is discovering the next reserve under the ground. In our case, we’re typically using AI/ML to find a data pattern that can be applied to create commercial value. That discovery process is critical, so we need to ensure our Data Scientists have the right environment and tools available to give us the best possible chance of finding that oil if it’s down there!

#3: How do you maximize Data Scientist productivity?

We know from experience that one size really doesn’t fit all, especially when it comes to Data Science. Some problems will be data heavy and others data light. Some will require extensive time spent data wrangling while others use heavy-weight GPU acceleration to plough through deep and computationally demanding neural networks. Libraries and tooling are also very likely to differ and may be driven by the personal preferences of the Data Scientists doing the work. Now, while you could force them all to use a single environment and one set of tools, why would you do that if your goal is to maximize productivity and employee satisfaction? The very last thing you need if you’re trying to scale up the Data Science work you’re doing is for your Data Scientists to be walking out of the door because they don’t like the setup. While I’m all in favor of standardization where it makes sense, technology has really moved past the point where this is strictly necessary.

If you scale Data Science by crowding all of your Data Scientists around a single production line with just the one set of tools and shared resources, they can’t help but get in each other’s way. Besides, the production line will inevitably run at the pace of the slowest Data Scientist – or worse, it may even break because of the experiments one Data Scientist is undertaking.

It’s not that Data Scientists don’t collaborate and work in teams – it’s more that each will be much more productive if you give them a separate, isolated environment, tailored specifically to the challenge they are faced with and the tools they know. That way they get to independently determine the speed of the production line, which tools they use and how they are laid out. See my related blog Applying Parenting Skills to Big Data: Provide the Right Tools and a Safe Place to Play…and Be Quick About It!
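To make that idea a little more tangible, here is a minimal sketch of provisioning an isolated, tailored container workspace per Data Scientist: governed data mounted read-only, a private scratch area, a per-person image and optional GPU access. The image names, paths and resource limits are hypothetical placeholders; the sketch simply illustrates “one production line per scientist” rather than any specific tooling.

```python
# Sketch: build a `docker run` command for an isolated, per-scientist workspace.
# Image names, paths and resource limits are hypothetical placeholders.
import shlex

def workspace_command(user: str, image: str, gpu: bool = False,
                      cpus: int = 8, memory_gb: int = 64) -> str:
    """Return a docker command giving one Data Scientist their own
    tailored environment rather than a shared production line."""
    parts = [
        "docker", "run", "-d",
        "--name", f"ds-{user}",
        "--cpus", str(cpus),
        "--memory", f"{memory_gb}g",
        # governed data is shared read-only; scratch space is private
        "-v", "/data/governed:/data/governed:ro",
        "-v", f"/scratch/{user}:/workspace",
    ]
    if gpu:
        parts += ["--gpus", "all"]  # only for the GPU-heavy deep learning work
    parts.append(image)
    return " ".join(shlex.quote(p) for p in parts)

# Two scientists, two very different "production lines".
print(workspace_command("ana", "pytorch-lab:latest", gpu=True))
print(workspace_command("ben", "r-tidyverse-lab:latest", cpus=4, memory_gb=32))
```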

#4: How do you address data supply chain and quality issues?

Just like a production line you might see at BMW or Ford, if we want to avoid any interruptions in production we need to ensure our supply chain delivers the right parts just in time for them to be assembled into the end product. In our case this is all about the data, with the end product being a data product of some kind, such as a classification model that could be used to score new data, or perhaps just the scored results themselves.

As we never want to stop the production line or fail our final assembly, we also need to make sure the data is of an acceptable quality level. Since we don’t want to do that validation right next to the production line, we need to push the profiling and validation activity as far upstream as we can so it doesn’t interfere with the production line itself and any quality problems can be properly addressed.
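As a sketch of what “pushing validation upstream” might look like in practice, the checks below profile an incoming batch of data where it lands, before it is allowed anywhere near the production line. The schema, thresholds, column names and file path are assumptions for illustration; a real pipeline would typically use a dedicated data-quality framework in the same spirit.

```python
# Sketch: upstream data-quality gate, run where the data lands rather than
# next to the production line. Columns, thresholds and paths are illustrative.
import pandas as pd

EXPECTED_COLUMNS = {"customer_id": "int64", "spend": "float64", "region": "object"}
MAX_NULL_RATE = 0.02  # tolerate at most 2% missing values per column

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of quality problems; an empty list means the batch may proceed."""
    problems = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
        elif df[col].isna().mean() > MAX_NULL_RATE:
            problems.append(f"{col}: null rate {df[col].isna().mean():.1%} exceeds {MAX_NULL_RATE:.0%}")
    if "customer_id" in df.columns and not df["customer_id"].is_unique:
        problems.append("customer_id contains duplicate keys")
    return problems

batch = pd.read_parquet("landing/customers.parquet")  # hypothetical landing-zone file
issues = validate(batch)
if issues:
    raise ValueError("Quality gate failed upstream: " + "; ".join(issues))
```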

#5: How do you scale compute and storage?

With a suitable infrastructure in place, combined with access to the right data and tooling, the Data Scientist is all set to do their work.

In most, if not all, cases, the Data Scientist will need read access to the governed data that is in scope for the analysis, along with the ability to upload external data assets that could be of value. They will also need to be able to iteratively save this data as the source datasets are integrated, wrangled and additional data facets generated to improve model performance. In traditional environments, this might mean significant delay and additional cost as data is replicated multiple times for each Data Scientist and each use case, but it doesn’t have to happen that way! The other advantage of moving away from a legacy Direct Attach Storage (DAS) approach is that most Network Attached Storage (NAS) and cloud deployments provide copy-on-write snapshot technologies, so replicas take near-zero additional capacity and time to create, with only the changed data consuming any capacity.

While we’re on the topic of cost and scale, the other thing you want to scale independently is compute. As I’ve already mentioned, some workloads will be naturally storage heavy while others are compute heavy and storage light. Data Science discovery projects also tend to be ephemeral in nature, but that’s also true of many production workloads such as ETL jobs. By leveraging the flexibility of virtualized infrastructure and dynamically managing resources, you can scale them up and down to match performance needs. In this way, you can manage the natural variations in business activity and complementary workloads to dramatically increase server utilization rates. That heavy ETL load you process at the end of each financial period could be scaled out massively overnight, when the Data Science team isn’t using the resources, and scaled back when they are. Through a virtualized approach, we can create differently shaped environments and make better use of the resources at our disposal. A simple control plane keeps the operational overhead to a minimum.
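To illustrate the kind of scheduling policy being described, here is a minimal sketch of an allocation rule that hands most of the shared capacity to ETL overnight and at period end, and back to the Data Science teams during working hours. The slot counts and time boundaries are illustrative assumptions; in a real deployment an equivalent policy would drive the virtualization or container platform’s own scheduler.

```python
# Sketch: time-based split of shared capacity between complementary workloads.
# Slot counts and time boundaries are illustrative assumptions only.
from datetime import datetime

TOTAL_SLOTS = 100  # abstract units of virtualized compute

def allocation(now: datetime, period_end: bool = False) -> dict:
    """Decide how many slots ETL and Data Science get right now."""
    overnight = now.hour >= 20 or now.hour < 6
    if period_end and overnight:
        etl = 90   # scale ETL out massively for the period-end crunch
    elif overnight:
        etl = 60   # routine nightly batch work
    else:
        etl = 20   # daytime: Data Scientists get the lion's share
    return {"etl": etl, "data_science": TOTAL_SLOTS - etl}

print(allocation(datetime(2018, 10, 31, 23, 0), period_end=True))
# {'etl': 90, 'data_science': 10}
```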

Once the discovery task is completed, the Data Scientist will want to save their work for future reference and share any new artefacts with their peers. Assuming they found something of value that needs to be put into production, they can prepare a work package that can be dropped into the Agile Development team’s engineering backlog.

#6: How do you accelerate the time to production?

The Agile Development team will typically include a blend of Data Architects, Agile Developers and junior Data Scientists with work prioritized and picked off the backlog based on available resources, as well as effort estimates and business priorities.

The same rules apply to the Agile Development team as they did for the Data Scientists.  Keeping them busy and effective means making sure they have everything they need at their disposal. Waiting for suitable development and analytical environments to be provisioned or data to be authorized or secured is not a good use of anyone’s time!  Using the same virtualized approach, we can quickly create an environment for the agile team that includes a more limited set of Data Science tooling (for scoring models) and the tool chain needed for the development work.

All provisioned in seconds, not weeks or months.

The next stage in the route to production for our data product will be more formal User Acceptance Testing (UAT). We can use our virtualized as-a-Service provisioning yet again here, only this time, rather than including the Agile Developers’ tool chain, we’ll include the testing infrastructure in the environment build instead.
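One way to picture this is a single environment template parameterized by stage, so the development and UAT builds differ only in the tool bundles layered on top of the same base. The bundle contents below are hypothetical placeholders; the point is that one as-a-Service recipe serves both teams.

```python
# Sketch: one environment template, different tool bundles per stage.
# Bundle contents are hypothetical placeholders.
BASE = ["governed-data-mounts", "model-scoring-runtime", "monitoring-agent"]

STAGE_BUNDLES = {
    "dev": ["git", "ci-runner", "ide-server", "unit-test-harness"],
    "uat": ["test-data-generator", "load-test-harness", "defect-tracker-hooks"],
}

def environment_spec(stage: str) -> list[str]:
    """Compose the build list for a dev or UAT environment from the same base."""
    return BASE + STAGE_BUNDLES[stage]

print(environment_spec("dev"))
print(environment_spec("uat"))
```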

The other aspect of efficiency worth noting is that, for the most part, the time allocated to the Data Science, development and testing tasks is very predictable. The Data Scientists’ work will often be time boxed – producing the best model possible within a set amount of time. In a traditional approach, additional delays creep in because unpredictable provisioning means nobody can say when the Data Science work will actually start. Addressing this one issue means that each team has a much better chance of sticking to schedule, making the entire process more dependable.

Once development and testing are completed, we need to move our new data product into a production setting. As discussed previously, some workloads are ephemeral in nature while others are not, often because they are stateful or can’t simply be resumed if they fail for some reason. Operationalizing the workload means selecting the appropriate environment based on its characteristics and then implementing and instrumenting it appropriately. This is an interesting topic in its own right and worthy of a follow-up blog!

#7: How do you know if the model is performing as designed?

Having changed the business process in some fashion because of our new data product, we need to have some way of monitoring its performance – ensuring our real-world results are as expected, triggering either management attention or a simple model rebuild when performance declines below acceptable limits.

In practice, this can often mean adding a new measure or report to an existing BI solution or real-time monitoring dashboard. To facilitate this, the Agile Development team may have already created an additional SQL view describing performance that can simply be picked up and consumed by the BI team, greatly simplifying implementation.
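As a minimal sketch of that monitoring step, the check below compares recent real-world outcomes against the model’s scores and flags when performance drops below an agreed floor. The metric, threshold and dummy data are assumptions for illustration; in practice this kind of check would feed the BI view or dashboard described above.

```python
# Sketch: monitor a deployed classification model against real-world outcomes.
# The AUC floor and the dummy data are illustrative assumptions.
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.75  # agreed minimum acceptable performance

def check_model_health(actual_outcomes, predicted_scores) -> str:
    """Return 'ok', or a recommended action when performance declines below the floor."""
    auc = roc_auc_score(actual_outcomes, predicted_scores)
    if auc < AUC_FLOOR:
        # trigger either management attention or a model rebuild
        return f"rebuild: AUC {auc:.3f} below floor {AUC_FLOOR}"
    return f"ok: AUC {auc:.3f}"

# Example with dummy recent data (1 = event happened, score = model output).
print(check_model_health([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.4, 0.8, 0.35, 0.1]))
```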

Putting It All Together

To achieve success with Artificial Intelligence and Machine Learning, it’s critical that you have the right teams within your organization, including Data Scientists, R&D, Lean Innovation, and Agile Development, as well as an industrialized ‘data factory’ process that enables you to get value from your data as efficiently as possible. Technology of course plays a critical role as well, as you need to be able to provision environments quickly and securely, provide tool flexibility, and have the optimal infrastructure in place to support Data Science workloads.

At Dell EMC, we work with customers at all stages of analytics maturity to plan, implement and optimize solutions and infrastructure that enable organizations to drive value from data and support advanced techniques, including artificial intelligence and machine learning. That includes working across the people, process and technology aspects in a grounded and pragmatic fashion to accelerate the time to value and maximize Big Data investments.

If you’re looking for a trusted partner to help you on your analytics journey and take advantage of the latest technologies and techniques, Dell EMC Consulting is here to help. Learn more about our services and solutions and contact your account rep to discuss further. I also welcome you to join the conversation by posting your thoughts or questions below.

Before you go

Make sure to download the Factory Model for Artificial Intelligence and Machine Learning Interactive Infographic [best viewed in full screen mode].