Are we reaching a point at which everyone can be a data scientist? The prospect of people using technology and their own abilities to become citizen data scientists is real, though it is likely still a few years away. In the meantime, an emerging cadre of data-savvy, trained individuals is making its way into organizations from a variety of directions.
Data scientists still face many open questions, and their particular combination of skills is key to addressing them. Glenn Hofmann, data scientist with New York Life, describes data science as being “at the intersection of statistics, programming and the specific business application or any application for it.” In a recent interview in ARMA’s New York Metro chapter newsletter, he observes that there is no single description for the job of data scientist, which “comes under multiple names. We used to call it analytics, now data science is a more popular name. There are strong parallels to predictive modeling, artificial intelligence and machine learning. You could call those specific kinds of data science.”
Skills all data scientists should bring to the table include statistical modeling and proficiency in coding languages, Hofmann advises. “On the tool side, the most common tools in data science are SAS, R and Python and they are very flexible. Point-and-click tools also exist but they are less powerful.”
Data scientists tackle all sorts of projects – from credit scores to self-driving cars. In the insurance business, he notes, “it deals with things like underwriting models, which can predict somebody’s insurance risk. Or it could be marketing predictions, like who is likely to respond to an offer for insurance or who is likely to be retained as a customer. It could be customer segmentation; what types of customers do we have and how do we treat them.”
For those interested in pursuing the profession, “there is no single path to becoming a data scientist anymore,” he points out. “It used to be that statistics or computer science could be the entry point. Those are still good foundations. Nowadays people also may have any technical degree or get involved in data science by learning on the job and analyzing a lot of data. Or they could come from psychology or economics and do a lot of data analysis from that side.”
Hofmann also described the challenges insurers face in navigating the complexity of the General Data Protection Regulation (GDPR), the European Union’s data privacy regulation, which includes “the right to be forgotten.” GDPR is a challenge for many organizations, involving “quite a bit of technical difficulty, because not only are you talking about the primary system but the data that typically proliferates to multiple systems.”
For example, an insurer needs to look across many repositories: “it goes from the original sales and marketing systems to an underwriting system to a claims system to a consumer relationship management system to a billing system to a fraud detection system,” he relates. “So, the data proliferates to many systems, and of course these systems have back-ups that go in various directions, because if there is an outage you don’t want to lose the data. So, you would have to delete the data from all of those systems and the back-ups, which is somewhat cumbersome. I think most companies today in the U.S. don’t yet have the infrastructure to do all that so that would have to be built, which does involve cost.”
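As a rough illustration of why erasure is cumbersome, the sketch below treats each system Hofmann lists, plus one backup per system, as a hypothetical in-memory store and fans a single deletion request out to all of them, then checks whether any copy survived. The names `DataStore`, `System`, `erase_customer` and `find_remaining`, the one-backup-per-system layout, and the sample customer ID are all invented for this example; real systems would involve databases, warehouses and backup media that this sketch does not attempt to model.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical stand-in for one system or backup holding customer records."""
    name: str
    records: dict[str, dict] = field(default_factory=dict)  # customer_id -> record

    def delete_customer(self, customer_id: str) -> bool:
        """Remove the customer's record; return True if a record was deleted."""
        return self.records.pop(customer_id, None) is not None


@dataclass
class System:
    """A primary store plus the backups its data has been copied to."""
    primary: DataStore
    backups: list[DataStore] = field(default_factory=list)

    def all_stores(self) -> list[DataStore]:
        return [self.primary, *self.backups]


def erase_customer(systems: list[System], customer_id: str) -> list[tuple[str, bool]]:
    """Send the erasure request to every primary store AND every backup.

    Returns an audit log of (store name, whether a record was deleted)."""
    audit = []
    for system in systems:
        for store in system.all_stores():
            audit.append((store.name, store.delete_customer(customer_id)))
    return audit


def find_remaining(systems: list[System], customer_id: str) -> list[str]:
    """List any stores that still hold the customer's data."""
    return [store.name
            for system in systems
            for store in system.all_stores()
            if customer_id in store.records]


# Hypothetical system names loosely following the ones in the quote above.
SYSTEM_NAMES = ["sales_marketing", "underwriting", "claims",
                "crm", "billing", "fraud_detection"]

systems = []
for name in SYSTEM_NAMES:
    record = {"C123": {"name": "Jane Doe"}}  # invented sample customer
    systems.append(System(primary=DataStore(name, dict(record)),
                          backups=[DataStore(f"{name}_backup", dict(record))]))

log = erase_customer(systems, "C123")
print(len(log))                          # 12: six primaries plus six backups
print(find_remaining(systems, "C123"))   # []
```

The point of `find_remaining` is the one Hofmann makes: deleting only from the primary systems would leave six backup copies behind, so an erasure process has to enumerate every place the data has proliferated to.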