Machine Learning for Risk Modeling
Over the past few years, the world has realized that data is, by far, the most valuable asset for an organization in the digital economy. Much like a financial asset that needs management and governance to appreciate in value, raw data needs some work before it yields actionable insights. These insights can then give an organization a competitive edge. Machine learning (ML) is one of the technical approaches to extracting insights from data. ML commonly dwells in the realm of predictive analytics, where we attempt to peek into the future using our understanding of the past and present. While this is quite a broad description, there are more specific use cases of ML across industry verticals. In this article, I discuss a use case of ML that has a direct impact on the costs incurred or revenue generated by an organization. I take the domain of health insurance as an example to explain the intuition and the approach using ML techniques. However, the general idea of using ML for risk modeling is pervasive across many other industry verticals and segments.
When an insurer underwrites a contract, it assigns a financial value to the risk covered by the contract, and the insured party pays a premium to offload that risk to the insurer. From the insurer’s perspective, the gross risk value is based on the likelihood of the risk materializing for a certain proportion of the insured entities. This is an oversimplified description, but it suffices in the context of this use case. The insurer therefore calculates the premium so that it covers its own risk. If the insurer’s stance is too defensive, it charges a high premium, which also makes it less competitive. On the other hand, being too aggressive with lower premiums exposes it to higher financial risk. The insurer considers many more parameters for the premium computation, including laws and regulations. The point is that risk valuation and premium computation are fundamental to the financial success of an insurer.
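The trade-off between a defensive and an aggressive stance can be made concrete with the textbook notion of a pure premium (expected claim frequency times expected claim severity) plus a proportional loading. The sketch below is a simplification for illustration, not the author's pricing method: the `RiskGroup` class, the single loading factor, and the numbers in the example are all hypothetical, and real premiums also reflect expenses, regulation, and other parameters.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    """Hypothetical summary of one group of insured entities."""
    claim_probability: float  # chance that a member files a claim in the policy year
    mean_claim_cost: float    # average settlement amount when a claim occurs


def pure_premium(group: RiskGroup) -> float:
    """Expected loss per member: frequency times severity."""
    return group.claim_probability * group.mean_claim_cost


def quoted_premium(group: RiskGroup, loading: float) -> float:
    """Add a proportional loading on top of the expected loss.

    A larger loading is the 'defensive' stance (safer, less competitive);
    a smaller one is the 'aggressive' stance (cheaper, more exposed).
    """
    if loading < 0:
        raise ValueError("loading must be non-negative")
    return pure_premium(group) * (1.0 + loading)


if __name__ == "__main__":
    group = RiskGroup(claim_probability=0.05, mean_claim_cost=20_000.0)
    print(f"pure premium: {pure_premium(group):.2f}")
    for loading in (0.10, 0.40):
        print(f"loading {loading:.0%}: {quoted_premium(group, loading):.2f}")
```

With these made-up inputs the pure premium is 1,000, so a 10% loading quotes 1,100 and a 40% loading quotes 1,400; the higher figure protects the insurer better but is less competitive.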
Traditionally, risk modeling has been performed by the actuarial function of insurers. This may involve, among other techniques, statistical analysis and Monte Carlo simulation. Another approach to risk modeling is predictive analytics using ML. The idea is to predict the likelihood of events happening in the future, based on historical data about the same or similar events. Let’s take health insurance as an example. For an individual, the insurer first looks at past data to see what health incidents have occurred for others in the same category. The term ‘category’ refers to a combination of several attributes such as age, gender, occupation, lifestyle, and existing medical conditions. Next, the insurer may consider other correlated future events for that individual. For instance, a person progressing in their career may move from a field job to a desk job. This changes how active the person’s lifestyle is and, in turn, the risk of cardiovascular disease. Environmental factors may also change the risk profile: consider, for example, an asthmatic person moving from a temperate to a polar climate.
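To give a flavor of the Monte Carlo side of actuarial work, the sketch below simulates many policy years for a portfolio and estimates the distribution of total claims. The assumptions are deliberately simple and purely illustrative: members claim independently, at most once per year, each with its own probability (for example, one derived from the member's category), and claim sizes follow an exponential distribution. The function name, the portfolio, and all parameters are hypothetical.

```python
import random
import statistics


def simulate_annual_losses(
    claim_probs: list[float],
    mean_claim_cost: float,
    n_trials: int = 5_000,
    seed: int = 42,
) -> list[float]:
    """Simulate total yearly claims for a portfolio, one value per trial.

    Illustrative assumptions: members are independent, each files at most
    one claim per year with its own probability, and claim size is
    exponentially distributed with mean `mean_claim_cost`.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    losses = []
    for _ in range(n_trials):
        total = 0.0
        for p in claim_probs:
            if rng.random() < p:  # this member claims in this simulated year
                total += rng.expovariate(1.0 / mean_claim_cost)
        losses.append(total)
    return losses


if __name__ == "__main__":
    # Hypothetical portfolio: 200 lower-risk and 50 higher-risk members.
    portfolio = [0.02] * 200 + [0.08] * 50
    losses = simulate_annual_losses(portfolio, mean_claim_cost=15_000.0)
    mean_loss = statistics.fmean(losses)
    p99 = statistics.quantiles(losses, n=100)[98]  # 99th percentile cut point
    print(f"mean annual loss: {mean_loss:,.0f}")
    print(f"99th percentile:  {p99:,.0f}")
```

Under these assumptions the analytical expected loss is (200 × 0.02 + 50 × 0.08) × 15,000 = 120,000, so the simulated mean should land close to it. The real value of simulation is the tail estimate (here the 99th percentile), which indicates how much a bad year could cost.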
Clustering is an ML technique commonly used for customer segmentation. Using this technique, a health insurance underwriter can observe distinct groups in the population of insured entities. This can be a starting point for estimating the risk value, and therefore the premium, of insurance contracts catering to these groups. This is not a new concept: you may have noticed your health insurance premium climbing steeply when you reach a certain age. One possible reason is that the clustering model assigns you to a different group once your age crosses a threshold.
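As a minimal sketch of the segmentation idea, the code below runs plain k-means (Lloyd's algorithm), one common clustering algorithm, on a tiny made-up set of insured members described by age and number of chronic conditions. The features, the data, and the choice of k = 2 are assumptions for illustration; a production pipeline would more likely use a library implementation such as scikit-learn's `KMeans` with many more attributes.

```python
import math
import random


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical members: (age, number of chronic conditions).
    members = [(25, 0), (28, 0), (31, 1), (34, 0), (29, 0),
               (58, 2), (62, 3), (65, 2), (60, 1), (67, 3)]
    labels, _ = kmeans(min_max_scale(members), k=2)
    for member, label in zip(members, labels):
        print(f"age={member[0]:>2} conditions={member[1]} -> segment {label}")
```

On this toy data the younger members and the older members with more chronic conditions end up in separate segments; which segment is numbered 0 depends on the random initialization.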
Clustering is an unsupervised ML technique, meaning that it doesn’t need a labeled dataset for training and testing. For finer control over risk valuation, a health insurer can use supervised ML algorithms, such as multiclass or multi-label classification. Suppose that a customer opts for additional cover on their policy for certain ailments and declares pre-existing medical conditions when requesting a quote for that cover. For this use case, the insurer can train a supervised ML model on historical data of health insurance policies, including claims and settlements. The data should also include attributes related to the insured entity’s demographics and pre-existing medical conditions. From this dataset, the ML model learns to recognize patterns related to the occurrence of diseases and the settlement value of claims. Once the model is trained and tested, it can take a new insurance proposal and predict the probability of a health claim event for each of the ailments covered in the proposal. The higher the probability, the higher the risk, and consequently the higher the premium quoted by the insurer.
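The supervised step can be sketched with one binary logistic regression per ailment, which is a multi-label setup: each model outputs its own claim probability. This is one simple choice among many, and everything here is an assumption for illustration: the three features, the two ailments, the tiny synthetic history, and the helper names. A real model would be trained on the insurer's historical policy, claims, and settlement data and validated on held-out records.

```python
import math

AILMENTS = ("cardiac", "respiratory")

# Synthetic history: ((age, smoker, diabetic), (cardiac_claim, respiratory_claim)).
HISTORY = [
    ((25, 0, 0), (0, 0)), ((32, 0, 0), (0, 0)), ((41, 1, 0), (0, 1)),
    ((29, 1, 0), (0, 1)), ((55, 0, 1), (1, 0)), ((63, 1, 1), (1, 1)),
    ((47, 0, 0), (0, 0)), ((68, 0, 1), (1, 0)), ((38, 1, 0), (0, 1)),
    ((59, 1, 0), (1, 1)), ((35, 0, 0), (0, 0)), ((71, 1, 1), (1, 1)),
]


def featurize(age, smoker, diabetic):
    """Scale age so that all features have comparable magnitude."""
    return (age / 100.0, float(smoker), float(diabetic))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit binary logistic regression with batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability of a claim event for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_per_ailment(history):
    """Train one independent binary model per ailment (multi-label setup)."""
    X = [featurize(*attrs) for attrs, _ in history]
    return {
        name: train_logistic(X, [labels[i] for _, labels in history])
        for i, name in enumerate(AILMENTS)
    }


if __name__ == "__main__":
    models = train_per_ailment(HISTORY)
    proposal = featurize(age=52, smoker=1, diabetic=0)  # hypothetical new applicant
    for name in AILMENTS:
        print(f"{name}: {predict_proba(models[name], proposal):.2f}")
```

Training independent per-ailment models keeps each probability separately interpretable; a single multiclass model would instead suit the case where exactly one ailment outcome is recorded per event.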
Note that the purpose of these ML models is to quantify the relative value of risk. Their output could be supplied as input to the actuarial process. In other words, the computation of the specific financial value of an insurance contract may take the probabilities of future health events into account. This matters for other reasons as well, particularly from the perspective of compliance. Insurance is a regulated industry, and insurers need appropriate controls in their core processes to ensure fairness, interpretability and explainability.
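For the explainability requirement, one simple approach is to use an interpretable model and report each attribute's contribution alongside the score. For a logistic model, the log-odds are a bias plus a sum of weight × value terms, so those terms can be ranked as reason codes. The sketch assumes such a linear model; the feature names, weights, and applicant values are hypothetical, not taken from any real insurer's model.

```python
import math


def explain_linear_score(weights, bias, applicant):
    """Split a logistic model's log-odds into per-feature contributions.

    Returns the predicted probability and the contributions ranked by
    absolute size, so a reviewer can see which attributes drove the score.
    """
    contributions = {name: weights[name] * applicant[name] for name in weights}
    log_odds = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


if __name__ == "__main__":
    # Hypothetical fitted parameters for one ailment model.
    weights = {"age_decades": 0.45, "smoker": 0.9, "diabetic": 1.2, "active_lifestyle": -0.6}
    applicant = {"age_decades": 5.2, "smoker": 0, "diabetic": 1, "active_lifestyle": 1}
    prob, reasons = explain_linear_score(weights, bias=-5.0, applicant=applicant)
    print(f"claim probability: {prob:.3f}")
    for name, contribution in reasons:
        print(f"  {name:<16} {contribution:+.2f}")
```

Here the applicant's age contributes most to the log-odds, the diabetic flag comes next, and the active lifestyle pulls the score down, which is the kind of per-decision trail a reviewer or regulator can inspect.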
Amazon SageMaker is a service from AWS that addresses the complete life cycle of ML projects. SageMaker JumpStart provides several prebuilt ML solutions. In the context of risk modeling, you can look at two JumpStart solutions for related use cases: one for credit rating and another for price optimization. Explore these JumpStart solutions to learn more about the data preparation techniques and ML algorithms they use to achieve specific business outcomes.
— Author: Anirban De