Models and algorithms are being integrated into all aspects of the insurance value chain. Pricing algorithms are becoming far more sophisticated, claims analytics models materially improve the claims process, decision-making tools allow underwriters to focus on underwriting, and customer behavior models provide insight into the existing book of business as well as support the identification of new areas of growth.
Traditional modeling approaches were extremely powerful as long as the scope of the analytics problem was limited to one or two disciplines, but these techniques required skilled resources to develop and maintain. As the scope of application expanded into more and more disciplines, the demand for those resources became a limiting factor, and machine learning approaches arose from this environment.
The popularity of these techniques has grown not only because they tend to have greater predictive power than traditional modeling tools, but also because they incorporate automation more explicitly and are therefore faster to build and maintain. However, these approaches have a significant challenge: they are inherently "black boxes", so understanding and interpreting what the model will do is extremely difficult. This lack of interpretability reduces the value that these cutting-edge techniques can add.
The nature of modeling techniques
Traditional modeling tools, such as Generalized Linear Models (GLMs), are often referred to as parametric solutions: the resulting prediction of behavior can be expressed as an explicit equation, like a standard rate order calculation. For example, a parametric solution can state directly that a given class of business will be offered a 10% discount. This form is easier to interpret; however, these solutions can be more time-consuming to create and maintain.
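To make the parametric idea concrete, here is a minimal sketch of a frequency-style GLM, assuming the statsmodels library and a toy dataset with hypothetical column names. The point is that the fitted model is an explicit equation whose exponentiated coefficients read directly as rating factors (a relativity of 0.90, for example, corresponds to a 10% discount).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical toy policy data; column names are illustrative only.
policies = pd.DataFrame({
    "claim_count": [1, 1, 0, 2, 0, 1, 0, 1],
    "exposure":    [1.0, 0.5, 1.0, 1.0, 0.8, 1.0, 0.5, 1.0],
    "vehicle_age": [2, 10, 5, 12, 3, 8, 1, 6],
    "territory":   ["A", "B", "A", "B", "A", "B", "A", "B"],
})

# Poisson frequency GLM with a log link: the fitted model is an explicit
# equation, so each coefficient maps directly onto a rating factor.
model = smf.glm(
    "claim_count ~ vehicle_age + C(territory)",
    data=policies,
    family=sm.families.Poisson(),
    exposure=policies["exposure"],
).fit()

# exp(coefficient) is the multiplicative relativity for each term.
print(np.exp(model.params))
```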
Making statements such as "the risk is high because the model says so" is generally not well received. Thus, the key challenge is to articulate why the model predicts a high or low risk; the aim of these efforts is a model that can both identify high versus low risks and explain the difference.
Model transparency and interpretability
Machine learning models are powerful solutions as they result in something that is very predictive in an automated fashion. Where things become problematic, especially in the insurance sector, is the need to understand why the model is predicting a high or low result. Is it because the risk has changed? What element of the change in risk caused the change in prediction? These are critical questions that underwriters, claims managers, pricing and product teams, distribution channels, and CEOs want to know. The use of models, even if efficient, must be accompanied by transparency. One approach to solving the issue of transparency is to look at the model structure itself.
However, the model structure of a machine learning model is usually a long series of complex decision trees that are hard to understand, so inspecting the structure directly is unlikely to produce a satisfactory explanation.
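As a rough illustration of why, the sketch below fits an off-the-shelf gradient boosting model on synthetic, hypothetical data and prints just the first of its many trees. Each tree is readable on its own, but the ensemble as a whole is not.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import export_text

# Hypothetical synthetic data standing in for a real risk dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age":  rng.uniform(18, 80, 500),
    "vehicle_age": rng.uniform(0, 15, 500),
})
y = 0.05 * X["driver_age"] + 0.2 * X["vehicle_age"] + rng.normal(0, 1, 500)

gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

# The fitted "structure" is a hundred trees like the one printed below:
# readable individually, but not interpretable as a whole.
print(f"number of trees: {gbm.estimators_.shape[0]}")
print(export_text(gbm.estimators_[0, 0], feature_names=list(X.columns)))
```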
Another approach is to study the factor importance charts that are standard outputs of typical machine learning algorithms. These charts identify which factors are most influential in the model.
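A minimal sketch of that output, again using scikit-learn's gradient boosting on synthetic data with hypothetical feature names, might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical synthetic data; feature names are illustrative only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age":   rng.uniform(18, 80, 500),
    "vehicle_age":  rng.uniform(0, 15, 500),
    "annual_miles": rng.uniform(2_000, 30_000, 500),
})
y = 0.05 * X["driver_age"] + 0.2 * X["vehicle_age"] + rng.normal(0, 1, 500)

gbm = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)

# Relative importance of each factor (the values sum to 1.0); this is the
# data behind a typical factor importance chart.
for name, importance in sorted(zip(X.columns, gbm.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:15s} {importance:.3f}")
```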
However, there are still many unanswered questions. We do not know how each factor correlates with risk, what factor groupings are appropriate, or whether those groupings should vary by some other factor. That information is lost in the building of a Gradient Boosting Machine (GBM), which combines the predictions of many decision trees to generate the final prediction, and there is no way to recover it without approximating it.
The approximation used to answer the first of these questions, how a factor correlates with risk, is called a "partial dependency plot." This statistical tool allows the interpreter to express a complex model in more basic statements.
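As an illustrative sketch, scikit-learn can produce this kind of plot for a model like the ones sketched above; the data and feature names here are again synthetic stand-ins.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Hypothetical synthetic data; feature names are illustrative only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age":  rng.uniform(18, 80, 500),
    "vehicle_age": rng.uniform(0, 15, 500),
})
y = 0.05 * X["driver_age"] + 0.2 * X["vehicle_age"] + rng.normal(0, 1, 500)
gbm = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)

# Average prediction as each factor varies while the others are held at
# their observed values: an approximate view of how the factor relates to risk.
PartialDependenceDisplay.from_estimator(gbm, X, features=["driver_age", "vehicle_age"])
plt.show()
```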
This is quite useful when trying to get a sense of what the model is saying; however, it is fundamentally an approximation. There are other tools that modelers use, such as SHAP values, H-statistics, and ICE charts, but these are all still fundamentally approximations of what the model is going to do.
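For completeness, here is a similar minimal sketch of SHAP values, assuming the open-source shap package is installed and using the same kind of synthetic, hypothetical data.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical synthetic data; feature names are illustrative only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age":  rng.uniform(18, 80, 500),
    "vehicle_age": rng.uniform(0, 15, 500),
})
y = 0.05 * X["driver_age"] + 0.2 * X["vehicle_age"] + rng.normal(0, 1, 500)
gbm = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)

# SHAP attributes each individual prediction to the input factors, so a
# single policy's score can be decomposed into per-factor contributions.
explainer = shap.TreeExplainer(gbm)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```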
Machine learning algorithms
A more modern solution is to rebuild the machine learning algorithm to explicitly integrate transparency into the model structure, so that each segmentation the algorithm makes is visible. The gain from each segmentation is then quantified relative to the entire model. The result of this approach is an explainable model with a high degree of predictive power that is constructed in an automated fashion.
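As one illustrative stand-in (not the specific rebuilt algorithm described above), the split-level gains of an ordinary GBM can be tabulated and expressed as a share of the total gain across the whole model. The sketch below assumes the xgboost package and synthetic, hypothetical data.

```python
import numpy as np
import pandas as pd
import xgboost as xgb

# Hypothetical synthetic data; feature names are illustrative only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age":  rng.uniform(18, 80, 500),
    "vehicle_age": rng.uniform(0, 15, 500),
})
y = 0.05 * X["driver_age"] + 0.2 * X["vehicle_age"] + rng.normal(0, 1, 500)

model = xgb.XGBRegressor(n_estimators=50, max_depth=2).fit(X, y)

# Every non-leaf row below is one segmentation (split); expressing its gain
# as a share of total gain shows how much each segmentation contributes
# relative to the entire model.
splits = model.get_booster().trees_to_dataframe()
splits = splits[splits["Feature"] != "Leaf"].copy()
splits["gain_share"] = splits["Gain"] / splits["Gain"].sum()
print(splits[["Tree", "Feature", "Split", "Gain", "gain_share"]].head(10))
```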
Let's face it, transparency matters in the insurance business. Internal users of models want to feel confident that the result helps achieve company objectives. External users of models want to feel confident that the insurer is charging the right rate for the risk. Thus, the importance of model transparency and interpretability should not be underestimated.