As insurers work toward becoming digital organizations, I’ve made the point
At its simplest, data literacy is getting people to understand that data has value, and that what they’re doing with data is one link in a bigger value chain. There are different types of data, with different branches of value that are important to be aware of. Let’s begin with an overview of the different types of data that every insurance company will have access to.
Hot, Warm, and Cold Data
In any organization, data is used for three purposes. First, it’s the "essential oil" of business operations. This is transactional or operational data, flowing around operational systems, and it is known as "Hot" data. If this data is flowing through online (digital) systems, it is known as Online Transaction Processing (OLTP) data. The second purpose of data is regular, unchanging reporting (for example, regulatory reporting), and this is known as "Warm" data. The third and currently sexiest use of data is to fuel analytics and data science. When used for this purpose, data is moved away from the operational systems, and for that reason it’s known as "Cold" data. If it’s drawn from online or digital systems, it is known as Online Analytical Processing (OLAP) data. This use of data delivers the insight and the perceived higher, or derivative, ‘value’ from the data.
So, data can (and does) move between these states, and data-literate organizations will understand and distinguish between these three types of data. There are several benefits of thinking about data in this way. First, it dispels the persistent belief that an organization must migrate away from its legacy systems before it can get value from its data. There is no such need. Second, looking at data in this way enables an organization to release the insight and value in the data more quickly. Third, on a more organization-wide level, data viewed this way rapidly improves the quality and speed of decision making. Finally, it provides a more thorough understanding of the data that will be needed to assist legacy migrations in the future.
Good Data Eliminates Headaches
It’s of paramount importance to get the data right from the beginning. When an organization finds itself having to clean its data, that is a symptom of not having understood the value chain. In an ideal world, only clean data will go into the ecosystem at the point of collection. Here’s an example: when an IT team designs a form to attract the customer, they might or might not put data validation around it. We’ve all filled in online forms, and some tell us, "that’s not a valid ZIP code." That’s data validation happening at the point of collection. That is how you get clean data, and how you start your data value chain from a good high point rather than a difficult low point. As annoying as it can be to input a piece of information into a form only to be told it’s invalid, pre-setting those parameters enables the collection of clean data.
There are several ways to get to data validation and data verification. One way is by using an “airlock,” which allows data to come into your organization but doesn’t allow it through to the operational systems until it’s been through this filter. What happens at that airlock is really important. It may require validation at the point of entry: are all required fields there, in the right format, and do they make sense? For example, does an email address contain @? If it doesn’t, the system knows that it’s not an email address.
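The airlock checks described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production filter: the field names (`name`, `email`, `zip_code`), the five-digit ZIP rule, and the `airlock_check` function itself are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for an incoming customer record.
REQUIRED_FIELDS = ("name", "email", "zip_code")


@dataclass
class AirlockResult:
    accepted: bool
    errors: list = field(default_factory=list)


def airlock_check(record: dict) -> AirlockResult:
    """Decide whether a record may pass from the airlock to operational systems."""
    errors = []

    # Are all required fields there?
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            errors.append(f"missing required field: {name}")

    # Does the email address contain @? (A necessary check, not a sufficient one.)
    email = record.get("email")
    if email and "@" not in str(email):
        errors.append("email does not contain @")

    # Is the ZIP code in the right format? Assumed here: exactly five ASCII digits.
    zip_code = str(record.get("zip_code") or "").strip()
    if zip_code and not (len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()):
        errors.append("zip_code is not five digits")

    return AirlockResult(accepted=not errors, errors=errors)
```

A record with a non-empty `errors` list would stay in the airlock for correction instead of flowing through to the operational systems.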
You can also run an algorithm on incoming data to determine its adherence to ‘normal.’ Is there anything wildly above or below the median? Is there a pattern in the data: is it perhaps time series data showing a pattern that’s trending up or down too quickly? There are many automated approaches to take with incoming data, to clean it or sense-check it before it gets properly ingested into operational systems.
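As a rough illustration of those two questions, the sketch below flags values that sit far from the median and checks whether a series is moving too quickly. The function names, the multiplier `k`, and the `max_change_per_step` limit are assumptions chosen for the example; real thresholds would depend on the data in question.

```python
from statistics import median


def far_from_median(values, k=5.0):
    """Return the values lying more than k median absolute deviations from the median."""
    values = list(values)
    if not values:
        return []
    med = median(values)
    # Median absolute deviation: the typical distance of a value from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so there is no scale to judge by.
        return []
    return [v for v in values if abs(v - med) > k * mad]


def trending_too_fast(series, max_change_per_step):
    """Return True if the average change per step exceeds the allowed limit."""
    if len(series) < 2:
        return False
    avg_change = (series[-1] - series[0]) / (len(series) - 1)
    return abs(avg_change) > max_change_per_step
```

For example, `far_from_median([100, 102, 98, 101, 99, 500])` returns `[500]`, and `trending_too_fast([10, 20, 30, 40], 5)` returns `True` because the series rises by 10 per step on average.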
Branches and Principles of the Data Value Chain
Another aspect of data literacy, one so close at hand that it is often overlooked, is that the data value chain isn’t strictly linear. Branches come off it, each with its own value. One of the most important of these branches is analytics. Unless your organization is aware that the data value chain has to feed an analytics value chain as well, you’re likely to take missteps. One example is failing to build, up front, the capability to plug into the value chain and deliver data to the point where the analytics need it. Another important branch is regulatory reporting. Data has to take a detour off the value chain to go there as well, and the data needs to be clean and accurate. A third branch off the data value chain is the forensic chain. Whereas the other two branches offer monetary value, this one offers evidentiary value that you can present to regulators. This data may come from multiple sources, but what’s important is understanding that each of those sources sits in that evidentiary chain.
If we don’t get those branches right, there’s not much point in having the data in the first place, because we can’t analyze it, use it to produce our regulatory reports, or do due diligence. In the insurance world, one example is a disclosure on paper from a customer. That paper data will get scanned or keyed in. Accuracy here is critical, because any underwriting has to be based on what the customer actually said in their disclosure. Here, there’s a very weak evidentiary link in the chain between the paper and the systems, but the value of having clean data is obvious.
Getting back to our Hot, Warm, and Cold data—for all three, several fundamental principles apply. The data should be ‘Governed’—implying data ownership, catalogue, dictionary and lineage—concepts we’ll get into in a future article. Governance ensures quality data that is understood and trusted, and that can be found and used quickly.
The 5 V’s of data (Volume, Velocity, Variety, Veracity, Value) should be considered for each use. OLTP (‘Hot’) development should be matched by ‘Warm’ and OLAP (‘Cold’) developments. If this doesn’t happen, value cannot be leveraged from the data: it becomes difficult to ‘stick’ the data together, and there will be a proliferation of processes based on manual interventions. Start-ups understand the interdependence of OLTP and OLAP. ‘Colors’ or levels of ‘warmth’ don’t imply a relative importance of data types. They are all equally important, and the business is dependent upon them.
A final word on data governance. I consider this to be the solution for ensuring that you are on the right path to data literacy and eventually data maturity: by wrapping data governance around the value chain, you have a methodology for understanding what you can and can’t do with data, and for making sure you can trust—and understand—your data from end to end. It takes some strategic vision, but by understanding its full data value chain, an insurance company will be well on the path to data literacy. These days, that’s fundamental.