“Data scientist” is currently one of the hottest job titles in the world of analytics. However, as is common in emerging fields, there isn’t a clear definition of what the job actually is. Whatever the data scientist’s specific responsibilities, the general view is that there aren’t enough of them. But is this really true? Before we jump to this conclusion, let’s try to make sense of what the title and profession really are.
In our view, recent advances in analytics, technology infrastructure and visualization, as well as their application to business problems, require a data scientist to have four fundamental skills:
• Business analysis: The ability to articulate how information, insights and analytics can help leadership answer key questions, and even to determine which questions need answering. To do this well, the data scientist needs a thorough understanding of the business across the value chain (marketing, sales, distribution, operations, pricing, products, finance, risk, etc.).
• Analytic expertise: The ability to determine the most appropriate techniques for different classes of problems, apply them to business problems, and translate the results and insights so that the business can understand their value. This ability is predicated on a thorough understanding of statistical techniques (e.g., regression analysis, cluster analysis, and optimization) and computational techniques (e.g., machine learning, natural language processing, graph/social network analysis, neural nets, and simulation modelling).
• Data technology expertise: A thorough understanding of external and internal data sources and how their data is gathered, stored and retrieved. This will enable the data scientist and, by extension, the business as a whole to 1) extract, transform and load data into data stores; 2) retrieve data from external sources (through screen scraping and data transfer protocols); 3) use and manipulate large big data stores (with technologies like Hadoop, Hive, Mahout and an entire range of emerging big data technologies); and 4) use the disparate data sources to analyze the data and generate insights.
• Visualization expertise: A thorough understanding of visual art and design. This is important because it enables those who aren’t professional data analysts to interpret data. Accordingly, the data scientist should be able to 1) take statistical and computational analysis and turn it into understandable graphs, charts and animations; 2) create visualizations (e.g., motion charts, word maps) that clearly show insights from data and corresponding analytics; and 3) generate static and dynamic visualizations in a variety of visual media (e.g., reports; screens ranging from mobile to laptop/desktop to large HD visualization walls; interactive programs; and perhaps soon, augmented reality glasses).
One might wonder if someone who possesses all of these skills really exists. Are there really individuals out there who are at home with senior executives, analytics “geeks,” technology “nerds” and new-age “digital artists”? Are there individuals who can see the big picture, are comfortable dealing with ambiguity, and at the same time can pay attention to the details of data definition and analysis? How many logical, left-brain thinkers and problem solvers who also happen to be creative, right-brain artists can there be?
The answer is not many, so finding individuals who possess all of these skills is a real challenge. However, there are two potential solutions:
• An organization can look to form teams of individuals who collectively possess these skills. These insight teams would address high-value business issues within tight timeframes. They initially would form something like a skunk works and rapidly experiment with new techniques and new applications to create practical insights for the organization. Once the team is fully functional and proving its worth to the rest of the organization, then the latter can attempt to replicate it in different parts of the business.
• An organization can identify individuals who possess at least some of these skills and train them in the other ones. For example, a business architect who is the liaison between the business and technology groups can learn at least some of the analytical and visualization techniques that typify data scientists. Similarly, a business intelligence specialist who has sufficient understanding of the company’s business and data environment can learn the analytical techniques that characterize data scientists. Considering the extensive mathematical and computational skills necessary for analytics work, however, it arguably would be easier to train an analytics specialist in a particular business domain than to teach a business specialist the analytics.
While there is no silver bullet to solve the perceived shortage of data scientists, by taking a skills- and team-based approach, organizations can start addressing some existing business challenges. Considering the necessity of better data science both now and in the future, it is likely that both business and the educational system will start creating more data scientists, but it is going to take some time until they are available in appreciable numbers. Until then, teaming and on-the-job training are going to be the most practical ways to meet this need.
Anand Rao co-leads PwC’s Future of Insurance initiative and leads PwC's Analytics and Decision Sciences Group’s innovation and market awareness activities.
This blog was exclusively written for Insurance Networking News. It may not be reposted or reused without permission from Insurance Networking News.