On the Value of Data

Data has become increasingly intertwined with our daily lives as more companies collect, analyze, and utilize it—and its use is growing exponentially. Data is everywhere. IoT is opening up new possibilities in how easily and efficiently we can collect it. Breakthroughs in machine learning are creating demand for larger data sets to improve the accuracy of predictive models. It’s clear that at least some data has value.

In 2006, Clive Humby—a mathematician, entrepreneur, and business strategist—said, “Data is the new oil.” Maybe you’ve heard it before. It has become a kind of popular memetic aphorism. The meaning of this quote was expanded on later by Michael Palmer, vice president of the Association of National Advertisers. He added, “Data is just like crude. It’s valuable, but if unrefined it cannot really be used.” Whether or not you believe this to be true, it’s difficult to argue that data has no value.

Why is data valuable?

The answer is easy: Because the old oil is valuable! The new oil must be valuable!

Not a good enough answer? Okay. There’s more to it than that. So, what do we even mean when we say “valuable”? Money? That’s certainly the most common way to measure the value of data in many companies.

[Image: "Money" by 401(K) 2013, licensed under CC BY-SA 2.0]

In the business world, data is often used in attempts to increase revenue and decrease costs. Data doesn’t always have intrinsic value. A company could have mountains of data that’s worth nothing (or worse, its value could be a monetary loss depending on how much effort was involved in accumulating it), but when the data does have some insight to offer, it gains monetary value.

  • In the U.S. alone, according to one big data market revenue forecast, the market value is expected to grow to $103 billion by 2027.
  • In a global forecast, the big data market is expected to climb to $273 billion by 2026.
  • Based on a survey from 2015, an estimated 40% of companies worldwide were analyzing big data, and they reported:
    • An average increase in revenue by 8%.
    • An average reduction in costs by 10%.

These examples only scratch the surface of the monetary value of data, but this value is only an effect, not a cause.

The true value of data is determined by its ability to solve problems through insight and how people can use it. Valuable data solves problems. For example, from the same 2015 survey (the last example above), the top benefits of analyzing data were:

  • Better strategic decisions.
  • Improved control of operational processes.
  • Better understanding of customers.

These benefits almost certainly led to some of the increased revenue we saw in the survey, but, more importantly, they improved quality of life. A better understanding of customers leads to happier customers. Better strategies and control of operational processes can result in less stressful work environments for employees. Even if a company is only interested in the money at the end of the process, the data itself must possess non-monetary value to get there. This is the more profound value of data and it’s the cause of data having monetary value.

Now, let’s move away from business and money to focus solely on the underlying value. The concept of data has been around for a long time. From tally marks on stones to rows of servers full of data, a lot has changed, but the core of what makes data valuable hasn’t. Scientific research is a great place to see that value at work.

[Image: "Maternal Health in Developing Countries" by United Nations Photo, licensed under CC BY-NC-ND 2.0]

In 2006, the average US maternal mortality rate was about 13 deaths per 100,000 births. In the same year, California started the California Maternal Quality Care Collaborative, which focuses on decreasing these numbers using data. The collaborative analyzed data on contributing factors and used what it found to create toolkits of best practices for given scenarios. Since then, the numbers in the US as a whole have climbed, while California’s have fallen. In its first 7 years, the collaborative lowered California’s maternal mortality rate by 55%.

Maternal deaths per 100,000 births

Year    US Total    California
2006    13.3        16.9
2007    12.7        11.1
2008    15.5        14.0
2009    16.6        11.6
2010    16.9        9.2
2011    19.3        7.4
2012    19.9        6.2
2013    22.0        7.3
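
To make the trend concrete, here is a minimal Python sketch that computes the percent change between the first and last years of the table above. The rates are copied from the table; the `percent_change` helper and the variable names are my own, for illustration only. A simple endpoint comparison like this lands close to the 55% figure cited, though the exact number depends on which years and rounding are used.

```python
# Maternal deaths per 100,000 births, copied from the table above.
rates = {
    2006: {"us": 13.3, "california": 16.9},
    2007: {"us": 12.7, "california": 11.1},
    2008: {"us": 15.5, "california": 14.0},
    2009: {"us": 16.6, "california": 11.6},
    2010: {"us": 16.9, "california": 9.2},
    2011: {"us": 19.3, "california": 7.4},
    2012: {"us": 19.9, "california": 6.2},
    2013: {"us": 22.0, "california": 7.3},
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, expressed as a percentage."""
    return (end - start) / start * 100


first, last = min(rates), max(rates)
us_change = percent_change(rates[first]["us"], rates[last]["us"])
ca_change = percent_change(rates[first]["california"], rates[last]["california"])

print(f"US, {first}-{last}: {us_change:+.1f}%")
print(f"California, {first}-{last}: {ca_change:+.1f}%")
```

Comparing only the endpoints, the US rate rises by roughly 65% while California's falls by roughly 57%.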

[Image: "Kras-Driven Lung Cancer" by National Institutes of Health (NIH), licensed under CC BY-NC 2.0]

Another example is cancer detection. You might have heard or read about this. It has made the rounds in the media, and for good reason. Good data paired with machine learning can greatly improve the chances of detecting cancer early.

  • A 2020 article from Nature describes the impact and success seen so far with lung cancer detection:
    • A decrease in mortality of 20-30%.
    • The system correctly identifies early-stage lung cancer 94% of the time (outperforming a panel of 6 senior radiologists).
  • Another 2020 article from Nature discusses a wider application to cancer detection in general, using a model trained on 60,000 tumor samples:
    • The model recognizes around 150 types of cancer.
    • The model found 79.2% of lesions, versus 80.7% found by radiologists with at least 10 years of experience, but the model’s numbers are expected to improve in the future.

Why is quality data important?

[Image: "quality¿" by MrVJTod, licensed under CC BY-SA 2.0]

Before you can solve problems or answer questions with data, you must first validate its quality. This will ultimately determine whether or not it has value. Some of the widely accepted measures of data quality are:

  • Accuracy - one of the most critical aspects. Without accurate data, any answers or findings aren’t accurate either.
  • Completeness - should provide enough attributes to satisfy its use requirements.
  • Consistency - values for a given attribute should be consistent across all systems using it.
  • Timeliness - should be as up-to-date as possible. The further it falls behind, the more likely it will become inaccurate.
  • Uniqueness - shouldn’t contain unnecessary duplicates, whether exact or near matches (see Consistency; the two can overlap).
  • Validity - all values of a given attribute should use the same format.
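
As a rough illustration, some of these measures can be scored with a few lines of Python. The sketch below covers completeness, uniqueness, validity, and timeliness only; accuracy and consistency need an external source of truth or a second system to compare against, so they are left out. The records, field names, the `quality_report` function, and the one-year freshness threshold are all hypothetical, chosen only to show the idea.

```python
import re
from datetime import date

# Hypothetical records: every field name and value below is made up for illustration.
records = [
    {"id": 1, "email": "ana@example.com", "signup": "2024-01-15", "updated": date(2024, 6, 1)},
    {"id": 2, "email": None, "signup": "2024-02-03", "updated": date(2024, 6, 2)},
    {"id": 3, "email": "li@example.com", "signup": "03/02/2024", "updated": date(2021, 1, 9)},
    {"id": 3, "email": "li@example.com", "signup": "03/02/2024", "updated": date(2021, 1, 9)},
]

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed target format: YYYY-MM-DD


def quality_report(rows, required, date_field, updated_field, as_of, max_age_days=365):
    """Return simple 0-1 scores (and a duplicate count) for a non-empty list of dict records."""
    total = len(rows)

    # Completeness: share of rows where every required field is present and not None.
    complete = sum(all(row.get(f) is not None for f in required) for row in rows)

    # Uniqueness: count rows whose full contents already appeared earlier.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Validity: share of rows whose date field matches the one expected format.
    valid = sum(
        isinstance(row.get(date_field), str) and ISO_DATE.fullmatch(row[date_field]) is not None
        for row in rows
    )

    # Timeliness: share of rows updated within max_age_days of the as_of date.
    fresh = sum((as_of - row[updated_field]).days <= max_age_days for row in rows)

    return {
        "completeness": complete / total,
        "duplicates": duplicates,
        "validity": valid / total,
        "timeliness": fresh / total,
    }


report = quality_report(
    records,
    required=("id", "email", "signup"),
    date_field="signup",
    updated_field="updated",
    as_of=date(2024, 6, 30),
)
for measure, score in report.items():
    print(f"{measure}: {score}")
```

Passing `as_of` explicitly, rather than reading today's date inside the function, keeps the timeliness score reproducible from one run to the next.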

Without quality data, the results of an analysis aren’t likely to be accurate or helpful. This can lead to many problems. A few of the most obvious would be:

  • Huge monetary losses
  • Poor-quality research results that spread incorrect or false information (which could affect almost every aspect of our lives)
  • Or, worst-case scenario, if you’re working in a field that deals with human life or safety, death or injury

The future of data

[Image: "Futuristic Utopian city" by Futuristic Society, licensed under CC PDM 1.0]

Data can offer value in a myriad of ways. As data collection, analytics, IoT, and machine learning continue to grow and evolve, so will the ways in which data proves its value. We will undoubtedly continue to see improvements to our quality of life. However, as data is involved in more and more aspects of our lives, it will become increasingly important to ensure that our data is of the best quality.

As an optimistic futurist, I hope that more people can acknowledge the potential of data and use it responsibly, conscientiously, and to benefit humanity as a whole. Of course, as with any power, tool, breakthrough, or knowledge, there will always be people who use it for nefarious purposes. So far, the impact of data on the world looks like a net positive. We just need to do our part to make sure it stays that way.