Many industrial companies have been collecting data for several years, if not several decades, with varying degrees of success in terms of the quality of the data collected. We asked ourselves: if data is the new gold, why are companies struggling to convert it into real, tangible revenue? After all, this was the reason we started Viking Analytics: to help industrial companies get to actionable insights from their data. We discovered this is a difficult and complex question to answer. The reasons range from technical challenges, such as IT infrastructure, difficulty of accessing the data, and data quality, to organizational challenges, such as the lack of well-defined ways of working around the data. In this article, however, I would like to focus on a specific aspect: the confusion between data and information.
When we talk about data technically, we are talking about collections of zeros and ones stored in a database somewhere. A question such as "how much data has your company collected in the past year?" can be answered quickly and accurately down to the last bit. An example answer could be "N GB from M different sensors at a rate of X samples per minute".
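To see how mechanical this kind of answer is, here is a back-of-the-envelope sketch. All the numbers (sensor count, sampling rate, sample size) are illustrative assumptions, not figures from any real deployment:

```python
# Sketch: how much raw data a fleet of sensors produces in a year.
# All numbers below are illustrative assumptions.
sensors = 500                 # "M" sensors
samples_per_minute = 60       # "X" samples per minute, per sensor
bytes_per_sample = 8          # one 64-bit float per sample
minutes_per_year = 365 * 24 * 60

total_bytes = sensors * samples_per_minute * bytes_per_sample * minutes_per_year
print(f"{total_bytes / 1e9:.1f} GB per year")  # ≈ 126.1 GB
```

The point is that data volume is a simple multiplication; nothing about it tells you what the data is worth.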
But the same question about the amount of information collected in the past year cannot be easily answered! To start with, information is a fuzzy concept, and we don't have the same quantitative measures for it that we have for data. Often, by information we mean the insight or knowledge that has been obtained, or that could potentially be extracted, from the data.
Actually, there is an entire field called information theory, established by Claude Shannon, that deals with the mathematical definition of information. What Shannon taught us is that the amount of information in an event is a function of the probability of that event: the more unlikely the event, the more information it contains. For example, compare the following two sentences:
- Last December, it snowed heavily in Stockholm.
- Last December, it snowed heavily in Barcelona.
Both sentences have the same length, and we can assume they contain the same amount of data, i.e., the same amount of memory is needed to store them. But as humans, we find the second sentence more surprising, because heavy snow in Barcelona is a much rarer event than heavy snow in Stockholm. That surprise is the felt sense of having received more information by reading the second sentence.
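Shannon made this intuition precise: the self-information of an event with probability p is I = -log2(p), measured in bits. A minimal sketch, using made-up probabilities for the two snow events (the actual climatological odds are assumptions here, not real data):

```python
import math

def self_information(p: float) -> float:
    """Shannon self-information in bits: I(x) = -log2(p(x))."""
    return -math.log2(p)

# Illustrative probabilities, not real climate statistics:
p_snow_stockholm = 0.5    # heavy December snow in Stockholm: common
p_snow_barcelona = 0.01   # heavy December snow in Barcelona: rare

print(self_information(p_snow_stockholm))  # 1.0 bit
print(self_information(p_snow_barcelona))  # ≈ 6.64 bits
```

The rarer event carries several times more information, even though both sentences occupy the same number of bytes on disk.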
Is data the new gold?
Now that we understand the difference between data and information, it is easy to see that data is not gold. The value of data is proportional to the information that can be extracted from it. The correct analogy, in my view, is that data is the gold mine, and the information, i.e., the insights extracted from it, is the gold.
One reason this misconception has taken root is that in some applications, mining information from data is much easier and has reached a high level of maturity. The main example is data about humans and our behaviors, which is collected by many companies and monetized in a variety of ways, from sales and marketing to politics.
How to get to the gold?
The confusion between the gold and the gold mine is widespread across the industry. It has led companies to invest large sums in infrastructure and tools to collect and store data, creating a large gold mine without a clear plan for extracting the gold from it. One telling sign is that most experts in industry are using general-purpose tools such as spreadsheets to extract and analyze the data. Countless working hours are wasted every day on extracting data from multiple databases, merging it into a single data set, and cleaning it before the analysis can even start!
However, there are many steps companies can take to become better at extracting gold from the mine they have been building over the years. One of the first steps in solving any problem is making it visible and measurable. To this end, companies need to start measuring the number of hours their experts spend on data extraction, clean-up, and preparation activities, in contrast to actual data analysis. This is simple enough to do regularly, and it creates a KPI for tracking the company's progress in reducing the overhead of insight extraction. Fortunately, there are good tools on the market that can help you with this. Wink wink! 🙂
Insight management is another area that needs more attention from companies. There is a lack of processes and tools for capturing insights and expertise and making them easily accessible. It's common to see analyses of cases sent around as email attachments, or as spreadsheets or report documents on SharePoint. Instead, the data needs to be augmented with the analyses done by the experts and their conclusions. This makes it much easier to compare future events with previous ones and speeds up the resolution of similar cases in the future. This approach has the additional benefit of helping knowledge transfer when a newcomer takes over from a more experienced colleague.
Lastly, companies need to assign dedicated time for exploratory activities. Allowing experts to explore the data without the pressure of solving a specific case lets companies identify issues early and propose improvements and optimization opportunities. Here, tools that help experts explore more efficiently with the help of machine learning algorithms can pay huge dividends over time.