Refined Thinking like a Data Scientist Series

By Bill Schmarzo
CTO, Dell EMC Services (aka “Dean of Big Data”) December 25, 2017

We don’t need “citizen data scientists”; we need “citizens of data science.”

I’ve written several blogs and conducted numerous student and executive training sessions associated with getting business stakeholders to “think like a data scientist.” We are not trying to turn the business stakeholders into data scientists. Instead we want to train the business stakeholders in “thinking like a data scientist” – to become citizens of data science – by understanding where and how data science can impact their business in order to accelerate organizational adoption.

The Thinking like a Data Scientist process has evolved over time. So I’m going to use this blog as an opportunity to document the refined (and hopefully simplified) process.

The business stakeholder objectives for the “Thinking like a Data Scientist” methodology are:

Identify the right decisions to make, predictions to create, and hypotheses to test
Evolve from descriptive questions about what happened, to predictive questions about what is likely to happen and prescriptive questions about what actions to take
Brainstorm different variables and metrics (data sources) that might yield better predictors of business performance
Blend metrics and variables to create actionable “scores”
Identify where and how analytics can optimize key business and operational processes, reduce compliance and security risks, optimize product performance, uncover new business opportunities and create a more compelling user engagement

The “Thinking like a Data Scientist” methodology has evolved as we’ve applied it across client engagements and learned what works and what doesn’t. So I will use this blog to update the methodology and supporting materials (see flow below).

I’ll also use this blog as an opportunity to pull together all the “Thinking like a Data Scientist” blogs into a single location. Besides, pulling all of these blogs into a single blog makes it easier for me when assigning reading to my University of San Francisco business students.

Some Classroom Prerequisites

Before we dive into the methodology, let’s start by defining data science:

Data science is about identifying variables and metrics that might be better predictors of performance.

It is the word “might” that is key to the “Thinking like a Data Scientist” process. The word “might” is a license to be wrong, a license to think outside the box when trying to identify variables and metrics that might be better predictors of performance.

In order to create the “right” analytic models, the data science team will embrace a highly iterative, “fail fast / learn faster” environment. The team will test different variables, different data transformations, different data enrichments and different analytic algorithms until they have failed enough times to feel “comfortable” with the variables that have been selected. See the blog “Demystifying Data Science” to better understand the role of “might” in the data science process.

Step 1: Identify Target Business Initiative

If you want your data science effort to be relevant and meaningful to the business, start with a key business initiative. A key business initiative is characterized as:

Critical to immediate-term business performance (12 to 18 months)
Documented (either internally or publicly)
Cross-functional (involves more than one business function)
Owned and/or championed by a senior business executive
Has a measurable financial or Return on Investment goal
Has a defined delivery timeframe (9 to 12 months)

Examples of key business initiatives could include:

Improving customer retention by 10% may be worth $25M over the next 12 months
Reducing obsolete and excess inventory by 10% may be worth $45M over the next 12 months
Improving on-time deliveries by 5% may be worth $85M over the next 12 months
Reducing unplanned network down-time by 5% may be worth $70M over the next 12 months

These key business initiatives can be found in annual reports, analyst briefings, executive conference presentations, and press releases. Or you can simply ask your executives what the organization’s most important business initiatives are over the next 12 to 18 months (see Figure 1).

Figure 1: Chipotle’s Annual Report and Their Key Business Initiatives

Step 2: Identify Business Stakeholders

Step 2 identifies the business stakeholders (and constituents): those functions that either impact or are impacted by the targeted business initiative. These stakeholders and constituents are the targets for your “Thinking like a Data Scientist” training as they have the domain knowledge necessary to improve analytic model effectiveness and drive organizational adoption (see Figure 2).

Figure 2: Identify Business Stakeholders or Constituents

Ideally for each stakeholder or constituent, you would create a single-slide persona that outlines that stakeholder’s or constituent’s roles, responsibilities, decisions and pain points (see Figure 3).

Figure 3: Business Stakeholder Persona

Step 3: Identify Business Entities

Step 3 identifies the business entities (sometimes called “strategic nouns”) around which we will create and capture analytic insights. Business entities include customers, patients, students, physicians, store managers, engineers, and agents. But business entities can also include “things” such as jet engines, wind turbines, trucks, cars, medical devices and even buildings (see Figure 4).

Figure 4: Identify Key Business Entities (or Strategic Nouns)

Ideally the data science team will create an analytic profile for each individual business entity to help in the capture, refinement and re-use of the organization’s analytic insights. Analytic Profiles capture the organization’s analytic assets in a way that facilitates the refinement and sharing of those analytic assets across multiple use cases (see Figure 5).

Figure 5: Analytic Profiles

An Analytic Profile consists of metrics, predictive indicators, segments, scores, and business rules that codify the behaviors, preferences, propensities, inclinations, tendencies, interests, associations and affiliations for the organization’s key business entities. See the blog “Analytic Profiles: Key to Data Monetization” for more details on the workings of an Analytic Profile.
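To make the structure of an Analytic Profile a little more concrete, here is a minimal Python sketch of how one might be represented. The class name, field names, and sample values are my own assumptions for illustration, not a prescribed schema; a real profile would likely live in a data store rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticProfile:
    """Hypothetical container for the analytic insights about one business entity."""

    entity_type: str  # e.g. "customer" or "wind turbine"
    entity_id: str    # identifier of the individual entity
    metrics: dict = field(default_factory=dict)                # measured values
    predictive_indicators: dict = field(default_factory=dict)  # model outputs
    segments: list = field(default_factory=list)               # segment memberships
    scores: dict = field(default_factory=dict)                 # named analytic scores
    business_rules: list = field(default_factory=list)         # codified rules

    def update_score(self, name: str, value: float) -> None:
        """Store or refine a score so that other use cases can reuse it."""
        self.scores[name] = value


# Illustrative values only.
profile = AnalyticProfile(entity_type="customer", entity_id="C-001")
profile.update_score("retention", 72.5)
profile.segments.append("frequent visitor")
```

Keeping one profile per individual entity is what allows a score built for one use case to be picked up and refined by the next.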

Step 4: Brainstorm Data Sources

Step 4 is focused on leveraging the domain expertise of the business stakeholders to identify those variables and metrics (data sources) that might be better predictors of performance.

To facilitate the brainstorming of data sources, we will take the business stakeholders through an exercise to convert some of their descriptive questions into predictive questions that support the targeted business initiative. That is, we will transition the stakeholders from asking descriptive questions about what happened to asking predictive questions about what is likely to happen. Figure 6 shows an example of the “descriptive to predictive” question conversion.

Figure 6: Converting Descriptive Questions to Predictive Questions

We then take a couple of the most important predictive questions and add the following phrase to that predictive question in order to facilitate the data source brainstorming process: “…and what data sources might we need to make that prediction?”

For example:

What will revenues be next month and what data sources might we need to make that prediction?
How many new customers are we likely to acquire next quarter and what data sources might we need to make that prediction?

Then ask the stakeholders to work in small groups to identify and capture the potential data sources on Post It notes (one variable or data source per Post It note). We then bring all the stakeholders back together to create an aggregated list of potential variables and metrics (data sources) that the data science team might want to test (see Figure 7).

Figure 7: Brainstorming Data Sources (Variables and Metrics)

After brainstorming the data sources, the business stakeholders rank the data sources for each use case based upon each data source’s likely predictive value to that use case (we use a range of 1 to 5). While this process is highly subjective, it’s surprising how accurate the business stakeholders will be in judging which data sources are likely to be the most relevant (see the final result in Figure 8).

Figure 8: Ranking Data Sources vis-à-vis Use Cases
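The ranking exercise can be captured in a very simple structure. The sketch below assumes a small dictionary of stakeholder rankings on the 1 to 5 range described above; the data source names, use case names, and rank values are hypothetical placeholders, not results from an actual workshop.

```python
# Hypothetical stakeholder rankings: 1 = little predictive value, 5 = high.
rankings = {
    "Point-of-sale history": {"Forecast revenue": 5, "Acquire new customers": 2},
    "Local events calendar": {"Forecast revenue": 3, "Acquire new customers": 4},
    "Weather forecasts": {"Forecast revenue": 4, "Acquire new customers": 1},
}


def rank_sources(rankings, use_case):
    """Return data sources ordered from highest to lowest rank for one use case."""
    return sorted(rankings, key=lambda source: rankings[source][use_case], reverse=True)


def total_rank(rankings):
    """Sum each source's ranks across use cases as a rough cross-use-case signal."""
    return {source: sum(by_case.values()) for source, by_case in rankings.items()}


revenue_order = rank_sources(rankings, "Forecast revenue")
totals = total_rank(rankings)
```

The per-use-case ordering tells the data science team where to start testing, while the totals hint at which data sources might be worth acquiring first because they serve several use cases.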

Step 5: Capture and Prioritize Analytic Use Cases

Step 5 brainstorms the decisions necessary to support the targeted business initiative, groups the decisions into similar clusters (use cases), and then prioritizes the use cases based upon business value and implementation feasibility over the next 12 to 18 months.

The decisions are gathered via a series of interviews and facilitated brainstorming sessions with the business stakeholders and constituents (see Figure 9).

Figure 9: Brainstorm Decisions by Stakeholder or Key Constituent

NOTE: During the facilitated brainstorming sessions, it is critical to remember facilitation rule #1: All ideas are worthy of consideration!

Next via a facilitated group exercise with the key business stakeholders and constituents, the decisions are grouped together in similar subject areas (see Figure 10).

Figure 10: Group Decisions into Common Subject Areas

NOTE: During this facilitated grouping exercise, there will be much discussion to clarify the decisions and the grouping of those decisions into similar use cases. Capture these conversations, as the insights from these conversations might be instrumental in the data science execution process.

Finally, you want to prioritize use cases (on axes of business value and implementation feasibility) to create an Analytic Use Case Roadmap (see Figure 11).

Figure 11: Prioritize Analytics Use Cases

NOTE: During the prioritization process, there will again be much discussion about why certain use cases are positioned vis-à-vis other use cases from both a business value and implementation feasibility perspective. Capture these conversations as they might yield critical insights that impact the ultimate funding of the data science project.
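Once the group has positioned each use case, the placements can be recorded and sorted. The following Python sketch is one possible way to do that; the 1 to 10 scales, the use case names, and the choice to multiply the two estimates into a single priority value are all assumptions made for illustration. The prioritization itself happens in the facilitated discussion, and the code only records its outcome.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float  # stakeholder estimate, 1 (low) to 10 (high)
    feasibility: float     # stakeholder estimate, 1 (low) to 10 (high)


def prioritize(use_cases):
    """Order use cases so that high-value, high-feasibility ones come first.

    Multiplying the two estimates is an assumed tie-breaking rule, not part
    of the methodology itself.
    """
    return sorted(
        use_cases,
        key=lambda uc: uc.business_value * uc.feasibility,
        reverse=True,
    )


# Hypothetical placements from a prioritization session.
use_cases = [
    UseCase("Use case A", business_value=8, feasibility=3),
    UseCase("Use case B", business_value=6, feasibility=7),
    UseCase("Use case C", business_value=4, feasibility=9),
]
roadmap = prioritize(use_cases)
```

A use case with high business value but low feasibility (Use case A here) falls to the end of the roadmap, which is usually the conversation worth capturing.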

Step 6: Identify Potential Analytic Scores

Step 6 focuses on grouping variables and metrics into similar clusters that the data science team can then explore as the basis for creating analytic “scores” or recommendations. Scores are analytic models composed of a variety of weighted variables that can be used to support key operational decisions. Maybe the most familiar score is the FICO score, which combines a variety of weighted metrics about a loan applicant’s financial and credit history in order to create a single value, or score, that lenders use to determine a borrower’s likelihood to repay a loan.

For our example, we can start to see two groupings of variables around two potential scores: “Local Economic Potential” and “Local Vitality” (see Figure 12).

Figure 12: Grouping Variables into Potential Scores
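The idea of a score as a blend of weighted variables can be sketched in a few lines of Python. The variable names, weights, and sample values below are hypothetical and chosen only to illustrate the mechanics; in practice the data science team would test which variables belong in a score and learn the weights from data.

```python
def weighted_score(values, weights):
    """Combine normalized variables (each between 0 and 1) into a 0-100 score."""
    total_weight = sum(weights.values())
    raw = sum(weights[name] * values[name] for name in weights)
    return 100 * raw / total_weight


# Hypothetical variables and weights for a "Local Economic Potential" score.
weights = {
    "median_income": 0.5,
    "employment_rate": 0.3,
    "new_business_permits": 0.2,
}

# Hypothetical normalized values for the area around one store.
store_area = {
    "median_income": 0.6,
    "employment_rate": 0.9,
    "new_business_permits": 0.4,
}

score = weighted_score(store_area, weights)
```

Dividing by the total weight keeps the score on the same 0 to 100 range even if the stakeholders later add or remove a variable.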

Scores are critical components in the “Thinking Like a Data Scientist” process. They are the collaboration point between the business stakeholders and the data science team in developing analytics to support the decisions and the key business initiative. Scores support the key operational decisions that the business stakeholders are making in support of our targeted business initiative.

Step 7: Identify Recommendations

Step 7 ties everything together: the scores that support the recommendations to the key operational decisions that support our business initiative. The worksheet in Step 7 is best created in collaboration with business stakeholders (who understand the decisions and can envision the potential recommendations) and the data science team (who understand how to convert the scores into analytic models). See Figure 13.

Figure 13: Linking Decisions to Recommendations to Analytic Scores
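One way to picture the worksheet is as a list of rows, each linking a decision to a recommendation and to the scores that support it. The Python sketch below assumes a simple threshold rule for turning scores into recommendations; the decisions, recommendations, thresholds, and score values are hypothetical, and a real implementation would likely use the analytic models behind the scores rather than fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    decision: str        # operational decision the stakeholder makes
    recommendation: str  # action suggested when the supporting scores are high enough
    scores: tuple        # names of the scores that support the recommendation
    threshold: float     # hypothetical cut-off on a 0-100 scale


def recommend(rows, entity_scores):
    """Return (decision, recommendation) pairs whose supporting scores all meet the threshold."""
    results = []
    for row in rows:
        if all(entity_scores.get(name, 0) >= row.threshold for name in row.scores):
            results.append((row.decision, row.recommendation))
    return results


# Hypothetical worksheet rows and scores for one location.
rows = [
    WorksheetRow(
        decision="Where to open the next store",
        recommendation="Shortlist this location",
        scores=("Local Economic Potential", "Local Vitality"),
        threshold=60,
    ),
    WorksheetRow(
        decision="How much local marketing to fund",
        recommendation="Increase local promotion budget",
        scores=("Local Vitality",),
        threshold=80,
    ),
]
entity_scores = {"Local Economic Potential": 65, "Local Vitality": 70}
recommendations = recommend(rows, entity_scores)
```

Writing the rows down this explicitly gives the business stakeholders and the data science team a shared artifact to argue over: which decision, which recommendation, and which scores need to exist to support it.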

Thinking like a Data Scientist Summary

I expect that this process will continue to evolve as we execute more data science projects and collaborate with the business stakeholders to ensure that the data and the data science work are delivering quantifiable and measurable business value.

As they famously say: Watch this space!