Refined Thinking like a Data Scientist Series
- December 26, 2017
We don’t need “citizen data scientists”; we need “citizens of data science.”
I’ve written several blogs and conducted numerous student and executive training sessions associated with getting business stakeholders to “think like a data scientist.” We are not trying to turn the business stakeholders into data scientists. Instead we want to train the business stakeholders in “thinking like a data scientist” – to become citizens of data science – by understanding where and how data science can impact their business in order to accelerate organizational adoption.
The Thinking like a Data Scientist process has evolved over time. So I’m going to use this blog as an opportunity to document the refined (and hopefully simplified) process.
The business stakeholder objectives for the “Thinking like a Data Scientist” methodology is:
The “Thinking like a Data Scientist” methodology has evolved as we’ve applied it across client engagements, and have learned what works and what doesn’t work. So I will use this blog to update the methodology and supporting materials (see flow below).
I’ll also use this blog as an opportunity to pull together all the “Thinking like a Data Scientist” blogs into a single location. Besides, pulling all of these blogs in a single blog makes it easier for me when assigning reading to my University of San Francisco business students.
Before we dive into the methodology, let’s start by defining data science:
Data science is about identifying variables and metrics that might be better predictors of performance.
It is the word “might” that is key to the “Thinking like a Data Scientist” process. The word “might” is a license to be wrong, a license to think outside the box when trying to identify variables and metrics that might be better predictors of performance.
In order to create the “right” analytic models, the Data Science team will embrace a highly iterative, “fail fast / learn faster” environment. The data science team will test different variables, different data transformations, different data enrichments and different analytic algorithms until they have failed enough times to feel “comfortable” with the variables that have been selected. See the blog “Demystifying Data Science” to better understand the role of “might” in the data science process.
If you want your data science effort to be relevant and meaningful to the business, start with a key business initiative. A key business initiative is characterized as:
Examples of key business initiatives could include:
These key business initiatives can be found in annual reports, analyst briefings, executive conference presentations, press releases, or maybe just ask your executives what are the organization’s most important business initiatives over the next 12 to 18 months (see Figure 1).
Step 2 identifies the business stakeholders (and constituents) are those functions that either impact or are impacted by the targeted business initiative. These stakeholders and constituents are the targets for your “Thinking like a Data Scientist” training as they have the domain knowledge necessary to improve analytic model effectiveness and drive organizational adoption (see Figure 2).
Ideally for each stakeholder or constituent, you would create a single-slide persona that outlines that stakeholder’s or constituent’s roles, responsibilities, decisions and pain points (see Figure 3).
Step 3 identifies the business entities (sometimes called “strategic nouns”) around which we will create and capture analytic insights. Business entities include customers, patients, students, physicians, store managers, engineers, and agents. But business entities can also include “things” such as jet engines, wind turbines, trucks, cars, medical devices and even buildings (see Figure 4).
Ideally the data science team will create an analytic profile for each individual business entity to help in the capture, refinement and re-use of the organization’s analytic insights. Analytic Profiles capture the organization’s analytic assets in a way that facilities the refinement and sharing of those analytic assets across multiple use cases (see Figure 5).
An Analytic Profile consists of metrics, predictive indicators, segments, scores, and business rules that codify the behaviors, preferences, propensities, inclinations, tendencies, interests, associations and affiliations for the organization’s key business entities. See the blog “Analytic Profiles: Key to Data Monetization” for more details on the workings of an Analytic Profile.
Step 4 is focused on leveraging the domain expertise of the business stakeholders to identify those variables and metrics (data sources) that might be better predictors of performance.
To facilitate the brainstorming of data sources, we will take the business stakeholders through an exercise to convert some of their descriptive questions into predictive questions that support the targeted business initiative. That is, we will transition the stakeholders from asking descriptive questions about what happened, to ask predictive questions about what is likely to happen. Figure 6 shows an example of the “descriptive to predictive” questions conversion.
We then take a couple of the most important predictive questions and add the following phrase to that predictive question in order to facilitate the data source brainstorming process: “…and what data sources might we need to make that prediction?”
Then ask the stakeholders to work in small groups to identify and capture the potential data sources on Post It notes (one variable or data source per Post It note). We then bring all the stakeholders back together to create an aggregated list of potential variables and metrics (data sources) that the data science team might want to test (see Figure 7).
After brainstorming the data sources, then the business stakeholders rank the data sources for each use case based upon that data source’s likely predictive value to that use case (we use a range of 1 to 5 in Figure 14). While this process is highly subjective, it’s surprising how accurate the business stakeholders will be in judging what data sources are likely to be the most relevant (see the final result in Figure 8).
Step 5 brainstorms the decisions necessary to support the targeted business initiative, groups the decisions into similar clusters (use cases), and then prioritizes the use cases based upon business value and implementation feasibility over the next 12 to 18 months.
The decisions are gathered via a series of interviews and facilitated brainstorming sessions with the business stakeholders and constituents (see Figure 9).
NOTE: During the facilitated brainstorming sessions, it is critical to remember facilitation rule #1: All ideas are worthy of consideration!
Next via a facilitated group exercise with the key business stakeholders and constituents, the decisions are grouped together in similar subject areas (see Figure 10).
NOTE: During this facilitated grouping exercise, there will be much discussion to clarify the decisions and the grouping of those decisions into similar use cases. Capture these conversations, as the insights from these conversations might be instrumental in the data science execution process.
Finally, you want to prioritize use cases (on axis of business value and implementation feasibility) to create an Analytic Use Case Roadmap (see Figure 11).
NOTE: During the prioritization process, there will again be much discussion about why certain use cases are positioned vis-à-vis other user cases from both a business value and implementation feasibility perspective. Capture these conversations as they might yield critical insights that impact the ultimate funding of the data science project.
Step 7 focuses on grouping variables and metrics into similar clusters that the data science team can then explore as the basis for creating analytic “scores” or recommendations. Scores are analytic models comprised of a variety of weighted variables that can be used to support key operational decisions. Maybe the most familiar score is the FICO score, which combines a variety of weighted metrics about a loan applicant’s financial and credit history in order to create a single value, or score, that lenders use to determine a borrower’s likelihood to repay a loan.
For our example, we can start to see two groupings of variables around two potential scores: “Local Economic Potential” and “Local Vitality” (see Figure 12).
Scores are critical components in the “Thinking Like a Data Scientist” process. They are the collaboration point between the business stakeholders and the data science team in developing analytics to support the decisions and the key business initiative. Scores support the key operational decisions that the business stakeholders making in support of our targeted business initiative.
Step 8 ties everything together: the scores that support the recommendations to the key operational decisions that support our business initiative. The worksheet in Step 8 is best created in collaboration with business stakeholders (who understand the decisions and can envision the potential recommendations) and the data science team (who understand how to convert the scores into analytic models). See Figure 13.
I expect that this process will continue to evolve as we execute more data science projects and collaborate with the business stakeholders to ensure that the data and the data science work is delivering quantifiable and measurable business value.
As they famously say: Watch this space!