Economic Value of Data (EvD) Challenges

By Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”) April 7, 2017

Well, my recent University of San Francisco research paper “Applying Economic Concepts To Big Data To Determine The Financial Value Of The Organization’s Data And Analytics Research Paper” has fueled some very interesting conversations. Most excellent! That was one of its goals.

It is important for organizations to invest the time and effort to understand the economic value of their data because data has a direct impact on an organization’s financial investments and monetization capabilities. However, calculating the economic value of data (EvD) is very difficult because:

  • Data does not have an innate fixed value, especially as compared to traditional assets, and
  • Using traditional accounting practices to calculate EvD doesn’t accurately capture the financial and economic potential of the data asset.

And in light of those points, let me share some thoughts that I probably should have made more evident in the research paper.

Crude oil is a commodity. West Texas Intermediate (WTI), also known as Texas light sweet, is a grade of crude oil used as a benchmark in oil pricing. This grade is described as light because of its relatively low density, and sweet because of its low sulfur content.  WTI is a light crude oil, with an API gravity of around 39.6, specific gravity of about 0.827 and less than 0.5% sulfur[1].

And here’s the important factoid about a commodity: every barrel of Texas light sweet is exactly like any other barrel of Texas light sweet. One barrel of Texas light sweet is indistinguishable from any other barrel of Texas light sweet. Oil is truly a commodity.

However, data is not a commodity. Data does not have a fixed chemical composition, and pieces of data are NOT indistinguishable from one another. In fact, data may be more akin to genetic code, inasmuch as genetic code defines who we are (see Figure 1).

Figure 1: Genetic Code [2]

Every piece of personal data – every sales transaction, consumer comment, social media post, phone call, text message, credit card transaction, fitness band reading, doctor visit, web browse, keyword search, etc. – comprises another “strand” of one’s “behavioral genetic code” that indicates one’s inclinations, tendencies, propensities, interests, passions, associations and affiliations.

And it’s not just the raw data that holds valuable strands of our “behavioral genetic code”; the metadata about our transactional and engagement data is also a rich source of insights into that code. For example, look at the metadata associated with a 140-character tweet. 140 characters wouldn’t seem to be much data. However, the richness of that 140-character tweet explodes when you start coupling the tweet with all the metadata necessary to understand the 140 characters in the context of the conversation (see Figure 2).
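To make the point concrete, here is a minimal sketch in Python of a tweet alongside its surrounding context. The record, its field names and its values are all hypothetical and purely illustrative; they are not an exact schema from any real API.

```python
# Hypothetical tweet record; field names and values are illustrative only,
# not an exact schema from any real API.
tweet = {
    "text": "Just landed in SF for the big data conference!",
    "metadata": {
        "created_at": "2017-04-07T09:15:00Z",
        "author_id": "user_123",
        "follower_count": 2500,
        "language": "en",
        "location": "San Francisco, CA",
        "in_reply_to": None,
        "hashtags": [],
        "device": "mobile",
    },
}

# The message itself is tiny...
text_length = len(tweet["text"])

# ...but each metadata field adds context about who, when, where and how.
context_fields = sorted(tweet["metadata"])

print(text_length <= 140)   # True
print(len(context_fields))  # 8
```

Even in this toy record, the context fields outnumber the message: one short string of text, eight pieces of information about it.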

The Bottom-line:

Data is not a commodity, which makes determining the economic value of data very difficult, and maybe even irrelevant, using traditional accounting techniques. Which brings us to the next point…

The challenge with using accounting or GAAP (generally accepted accounting principles) techniques for determining the economic value of data is that accounting uses a retrospective view of your business to determine the value of assets. Accounting determines the value of assets based upon what the organization paid to acquire those assets.

Instead of using the retrospective accounting perspective, we want to take a forward-looking, predictive perspective to determine the economic value of data. We want to apply data science concepts and techniques to determine the EvD by looking at how the data will be used to optimize key business processes, uncover new revenue opportunities, reduce compliance and security risks, and create a more compelling customer experience. Think of it as determining the value of data based upon “value in use” (see Table 1).

| Accounting Perspective | Data Science Perspective |
|---|---|
| Historical valuation based upon knowing what has happened | Predictive valuation based upon knowing what is likely to happen and what action one should take |
| Value determination based upon what the organization paid for the asset in the past | Value determination based upon how the organization will monetize the asset in the future |
| Valuations are known with 100% confidence based upon what was paid for the asset | Valuations are based on probabilities with confidence levels dependent upon how the asset will be used and monetized |
| Value determination based upon acquisition costs (“value in acquisition”) | Value determination based upon how the data will be used (“value in use”) |

Table 1:  Accounting versus Data Science Perspectives
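The contrast in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the research paper: the `UseCase` fields, the probability-weighted formula and every number below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UseCase:
    """A hypothetical business use case that a data source could support."""
    name: str
    projected_annual_value: float  # estimated benefit if the use case succeeds
    probability_of_success: float  # analyst's confidence, between 0 and 1
    data_contribution: float       # share of the benefit credited to this data, 0..1


def value_in_acquisition(acquisition_cost: float) -> float:
    """Accounting view: the asset is worth what was paid for it."""
    return acquisition_cost


def value_in_use(use_cases: list[UseCase]) -> float:
    """Forward-looking view: probability-weighted benefit across use cases."""
    return sum(
        uc.projected_annual_value * uc.probability_of_success * uc.data_contribution
        for uc in use_cases
    )


# Made-up numbers for illustration only.
use_cases = [
    UseCase("Reduce customer churn", 1_000_000.0, 0.5, 0.5),
    UseCase("Improve campaign targeting", 400_000.0, 0.75, 0.25),
]

print(value_in_acquisition(50_000.0))  # 50000.0
print(value_in_use(use_cases))         # 325000.0
```

The accounting number is known with certainty but says nothing about the future. The “value in use” number is an estimate that changes as use cases are added and as the probabilities are revised.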

This “value in use” perspective traces its roots to Adam Smith, the pioneer of modern economics. In his book “Wealth of Nations,” Adam Smith[3] defined capital as “that part of a man’s stock which provides him a revenue stream.” Adam Smith’s concept of “revenue streams” is consistent with the data science approach of leveraging data and analytics to create “value in use”.

We have ready examples of how other organizations determine the economic value of assets based upon “value in use”, starting with my favorite data science book – Moneyball. Moneyball describes a strategy of leveraging data and analytics (sabermetrics) to determine how valuable a player might be in the future. Determining a player’s future value is one of the biggest challenges for sports teams, because player salaries and salary cap management are among the hardest problems in sports management. Data science provides the forward-looking, predictive perspective needed to make those “future value” decisions.

Sports organizations cannot accurately determine a player’s economic value based entirely on the player’s past stats. To address this challenge, basketball analysts created Real Plus-Minus (RPM)[4]. Real Plus-Minus is a predictive metric (score) that is designed to predict how well a player will perform in the future.

The Bottom-line:

We need to transition the economic value of data conversation away from the accounting retrospective of what we paid to acquire the data, to a data science predictive perspective of how the data is going to be used to deliver “value in use.”

Data is an asset that can’t be treated like a commodity because:

  1. Every piece of data is different and provides unique value based upon the context (metadata) of that data, and
  2. Traditional retrospective (accounting) methods of determining EvD won’t work because the intrinsic value of the data is not what one paid to acquire it, but how that data will be used to create monetization opportunities (“value in use”).

To exploit the economic value of data, organizations need to transition the conversation from an accounting perspective (of what has happened) to a data science perspective (on what is likely to happen) on their data assets. Once you reframe the conversation, the EvD calculation becomes more manageable, more understandable and ultimately more actionable.

[1] https://en.wikipedia.org/wiki/West_Texas_Intermediate

[2] Edited by Seth Miller User:arapacana, Original file designed and produced by: Kosi Gramatikoff User:Kosigrim, courtesy of Abgent, also available in print (commercial offset one-page: original version of the image) by Abgent – Original file: en:File:GeneticCode21.svg, Public Domain, https://commons.wikimedia.org/w/index.php?curid=4574024

[3] “Wealth of Nations”, http://geolib.com/smith.adam/won1-04.html

[4] https://cornerthreehoops.wordpress.com/2014/04/17/explaining-espns-real-plus-minus/