Oracle AI & Data Science Blog
Learn AI, ML, and data science best practices

Why Generalized Linear Models Are The Future of Rate Indication Processes

An insurance premium is the product of three main factors: the current base rate, the proposed rate change, and the risk score relativity. The base rate ensures that adequate premium is collected on an aggregate basis to cover insurance claims, claim adjustment expenses, underwriting expenses, and the targeted profit provision. The proposed rate change ensures that all future dynamics that will affect rate adequacy are properly accounted for in future premiums. The risk score relativity factor ensures that premiums are actuarially fair: the higher the risk, the higher the premium.
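As a minimal arithmetic sketch of this multiplicative structure, the snippet below combines the three factors for one policy. The function name and the dollar figures are hypothetical, and it assumes the rate change is applied as a factor of one plus the percentage change.

```python
def indicated_premium(base_rate: float, rate_change: float, relativity: float) -> float:
    """Premium as base rate x rate-change factor x risk relativity.

    rate_change is a decimal (0.05 means +5%); this convention is an assumption.
    """
    return base_rate * (1.0 + rate_change) * relativity


# Hypothetical inputs: 1,000 base rate, +5% rate change, 1.20 relativity
premium = indicated_premium(1000.0, 0.05, 1.20)
print(round(premium, 2))
```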

Currently, actuaries obtain proposed rate changes from an actuarial process called Indications, and risk score relativities from Generalized Linear Models (GLMs). In this article, however, I argue that the GLM already contemplates all three factors and is the best tool for estimating them.

The Merits of GLMs

The Generalized Linear Model is a statistical methodology that has long been established in the academic literature. It allows the actuary to derive risk relativities for a large number of variables, and offers the flexibility to model different distributions of insurance losses. Given the varying distributional forms of insurance risk metrics (severity, frequency, and pure premium), this flexibility should not be taken lightly. It also allows the actuary to choose the functional form (such as identity, log, or power function) of the relationship between the risk measure being modeled and the relativity variables under consideration. Additionally, the actuary can assess whether the estimated risk relativities are signal or noise using a wide range of model diagnostics such as standard errors, chi-squared statistics, the Akaike Information Criterion (AIC), F-statistics, and many others.
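To make the role of the functional form concrete, here is a small sketch of what a log link implies: coefficients that add on the log scale act as multiplicative relativities on the loss cost scale. The coefficient values and variable names are hypothetical and are not fitted to any data.

```python
import math

# Hypothetical coefficients on the log scale (illustrative, not fitted)
intercept = math.log(500.0)          # base loss cost of 500
beta_young_driver = math.log(1.40)   # relativity of 1.40
beta_urban = math.log(1.15)          # relativity of 1.15

# Linear predictor for a young, urban driver
eta = intercept + beta_young_driver + beta_urban

# Inverse of the log link maps the linear predictor back to the mean
mu_log_link = math.exp(eta)

# The same figure written as a product of base cost and relativities
mu_multiplicative = 500.0 * 1.40 * 1.15

print(round(mu_log_link, 2), round(mu_multiplicative, 2))
```

With an identity link the same coefficients would instead add dollar amounts to the base, which is why the choice of link matters for how relativities are interpreted.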

The GLM has one more advantage that is far more underutilized than those above: aside from producing risk relativities, it can also predict the best (minimum mean squared error) loss cost estimate for each insured unit for any exposure period, with a greater capacity for segmentation, greater accuracy, and less effort. This means that the actuarial tradition of deriving an overall base rate and the policy risk relativity score separately should be replaced with a fresher and more powerful culture of directly predicting each policy’s loss costs with a GLM[1].
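The idea of predicting each policy's expected loss measure directly can be sketched with a tiny Poisson GLM with a log link, fitted here by Newton-Raphson in plain Python. Everything in this example is an assumption made for illustration: the eight policies, the two rating variables (young, urban), and the helper names. In practice an actuary would use a statistical package rather than hand-written code.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson_glm(X, y, iterations=25):
    """Newton-Raphson for a Poisson GLM with log link; column 0 of X is the intercept."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(iterations):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta


# Hypothetical policies: columns are [intercept, young, urban]; y is claim count
X = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [1, 0, 1],
     [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]]
y = [0, 1, 1, 1, 1, 2, 2, 3]

beta = fit_poisson_glm(X, y)
# Per-policy predicted claim counts come straight from the fitted model
predicted = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
print([round(v, 3) for v in predicted])
```

Note that the intercept, the relativities, and the per-policy predictions all come out of one fit, so there is no separate step in which a base rate is derived and then combined with relativities.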

Below, we look at how GLMs accommodate each of the three main features in traditional rate making (base rate, rate changes, and risk relativity) and propose a new framework for actuarial indications.

Overall (Base) Rates 

The intercept term in a GLM measures the overall rate level; it can be varied by any dimension desired by the actuary: region, state, industry, or any broader category. As with all GLM estimates, it has desirable statistical properties. It is one of the most statistically efficient (lowest variance) estimators of the base rate: as a maximum likelihood estimate, it asymptotically attains the Cramér–Rao lower bound on variance. It's fair to point out that, in traditional rate making, it's not typical to assess the variability of the actuarial base rates, and everything relies on the pricing actuary's ability to instinctively determine whether the base rate estimate is noise or signal. This is a test that even experts steeped in statistics have often failed (see page 113 of Kahneman's book "Thinking, Fast and Slow"). GLMs, by contrast, force the actuary to know the variability and statistical significance of every estimated parameter, including the base rate (i.e. the intercept).
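As a hedged illustration of what "knowing the variability of the base rate" looks like, consider the simplest case: an intercept-only Poisson frequency model with a log link and a log-exposure offset. In that special case the maximum likelihood estimate of the frequency is total claims over total exposure, and the approximate standard error of the intercept is one over the square root of total claims. The claim counts and exposures below are hypothetical.

```python
import math


def poisson_intercept(claims, exposures):
    """Intercept-only Poisson GLM with log link and log(exposure) offset.

    Returns (b0, se): exp(b0) is the estimated claim frequency and se is the
    approximate standard error of b0 from the inverse Fisher information.
    """
    total_claims = sum(claims)
    total_exposure = sum(exposures)
    b0 = math.log(total_claims / total_exposure)
    se = 1.0 / math.sqrt(total_claims)
    return b0, se


claims = [0, 1, 0, 2, 0, 1, 0, 0]                      # hypothetical claim counts
exposures = [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0]   # hypothetical car-years

b0, se = poisson_intercept(claims, exposures)
frequency = math.exp(b0)
# Approximate 95% interval for the base frequency, built on the log scale
lower = math.exp(b0 - 1.96 * se)
upper = math.exp(b0 + 1.96 * se)
print(round(frequency, 4), round(lower, 4), round(upper, 4))
```

With only four claims the interval is very wide, which is exactly the kind of warning that a point estimate of the base rate on its own does not give.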

The other statistical benefit of a GLM estimate is that it is consistent (i.e. approaches the true value with enough data) at worst, and unbiased at best. Unfortunately, this cannot be said about the actuarial base rate. Because it is derived outside the GLM but combined with risk relativities derived from GLMs, the actuarial base rate is likely to pick up effects already contemplated in the GLM, and hence is biased. Suppose a TX automobile book of business has a disproportionate number of reckless drivers. In the current rate making culture, reckless drivers in TX will be penalized twice: once through the actuarially derived TX base rate, and again through the GLM risk relativity for reckless driving, possibly captured by Motor Vehicle Records.
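The double penalty can be sketched with a few lines of arithmetic. The loss costs and the share of reckless drivers are hypothetical, and the sketch assumes the TX base rate is simply the state's average loss cost, with no off-balance adjustment for the mix of relativities in the book.

```python
# Hypothetical true loss costs, with no state effect at all
SAFE_COST = 400.0
RECKLESS_COST = 800.0
reckless_relativity = RECKLESS_COST / SAFE_COST  # 2.0, as a GLM would estimate


def average_loss_cost(share_reckless: float) -> float:
    """Average loss cost of a book with the given share of reckless drivers."""
    return (1.0 - share_reckless) * SAFE_COST + share_reckless * RECKLESS_COST


# Assumed: half of the TX book is reckless, so the state average is inflated
tx_base_rate = average_loss_cost(0.50)

# Applying the GLM relativity on top of that base rate counts recklessness twice
tx_reckless_premium = tx_base_rate * reckless_relativity
overcharge_factor = tx_reckless_premium / RECKLESS_COST

print(tx_base_rate, tx_reckless_premium, overcharge_factor)
```

Under these assumptions a reckless TX driver is charged 1.5 times the true expected loss cost, because the base rate already reflects the book's heavy share of reckless drivers.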

There is, however, one argument an actuary may make in defense of the current system: for base rates of smaller states, an actuary can use credibility analysis to combine the unstable experience of the smaller state with a more stable complement (say, the countrywide base rate) to derive desirably stable base rates. While this is valid, there are GLM variants such as Generalized Linear Mixed Models (GLMMs) that allow for the sort of credibility weighting done in an actuarial analysis. See Klinker 2011 for an exposition of actuarial applications of GLMMs and their similarities with Bühlmann credibility. Therefore, there is no good reason, at least none known to me as of this writing, for actuaries to derive base rates outside GLMs.
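For reference, the classical credibility weighting that the paragraph describes can be sketched as follows. This is the textbook Bühlmann form Z = n / (n + k), not a GLMM fit; the state and countrywide figures, the exposure count n, and the credibility constant k are all hypothetical.

```python
def credibility_weighted(state_estimate: float, complement: float,
                         n: float, k: float):
    """Blend a state's own experience with a complement using Z = n / (n + k)."""
    z = n / (n + k)
    blended = z * state_estimate + (1.0 - z) * complement
    return blended, z


# Hypothetical small state: own indication 720, countrywide complement 600
blended_rate, z = credibility_weighted(720.0, 600.0, n=500.0, k=1500.0)
print(z, blended_rate)
```

A mixed model produces a similar shrinkage of the state effect toward the overall level, but estimates the amount of shrinkage from the data as part of the same fit.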

Proposed Rate Changes

Current rates cease to be adequate in the future because of three main kinds of change over time: general market factors (technology, tort laws, prices, etc.), business mix, and the relationship between losses and risk variables. Market factors and business mix are easily accommodated in a GLM by including an econometric trend term and risk attributes in a predictive model. The coefficient of the trend term measures how premium is expected to change with time, while the coefficients of the risk attributes measure how premium changes with differing risk characteristics. Changes in the relationship between losses and risk variables are handled by regularly updating the pricing models.
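A minimal sketch of how a trend coefficient translates into a projected loss cost, assuming a log-link model with a linear time term. The 3% annual trend, the current loss cost, and the two-year projection period are hypothetical values chosen for illustration.

```python
import math


def project_loss_cost(current: float, b_trend: float, years: float) -> float:
    """Project a loss cost forward under log(mu) = ... + b_trend * t."""
    return current * math.exp(b_trend * years)


b_trend = math.log(1.03)   # hypothetical coefficient, equivalent to +3% per year
projected = project_loss_cost(500.0, b_trend, years=2.0)
implied_change = projected / 500.0 - 1.0

print(round(projected, 2), round(implied_change, 4))
```

Here the implied two-year change of about 6.09% plays the role that the proposed rate change plays in a traditional indication.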

Risk Relativity Score

While actuaries get the risk relativities from a GLM (so the estimates themselves are efficient), how they use them in pricing undermines their statistical merits. I will describe one such misuse. Most pricing actuaries multiply the relativities together to get a predicted risk estimate. They then partition this product into a number of risk groups and map each risk group to a risk score factor. That factor becomes the policy factor that is multiplied by the base rate to get the proposed premium. Meanwhile, the predicted policy pure premium obtained directly from the GLM, as a maximum likelihood estimate, is statistically the best estimate of the policy's risk exposure. Every subsequent adjustment therefore needlessly erodes its statistical efficiency.
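The loss of information from the grouping step can be illustrated with a toy calculation. The six predicted pure premiums, the two-group split, and the threshold of 600 are hypothetical; the sketch simply replaces each prediction with its group average and measures how far the grouped values drift from the model's own predictions.

```python
def bin_to_risk_groups(predictions, threshold):
    """Replace each prediction with the mean of its risk group (low or high)."""
    low = [p for p in predictions if p < threshold]
    high = [p for p in predictions if p >= threshold]
    low_mean = sum(low) / len(low)
    high_mean = sum(high) / len(high)
    return [low_mean if p < threshold else high_mean for p in predictions]


# Hypothetical per-policy pure premiums predicted directly by a GLM
glm_predictions = [300.0, 350.0, 420.0, 480.0, 900.0, 1100.0]

grouped = bin_to_risk_groups(glm_predictions, threshold=600.0)

# Mean squared gap between the grouped values and the direct predictions
mean_sq_gap = sum((p - g) ** 2
                  for p, g in zip(glm_predictions, grouped)) / len(glm_predictions)

print(grouped, round(mean_sq_gap, 2))
```

The policy predicted at 300 and the policy predicted at 480 end up with the same factor, so whatever the model learned about the difference between them is discarded.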

The Future of Indications

The GLM estimate should be the end product of, and not a mere input to, proposed policy premiums. The actuary should find few, if any, reasons to do analyses outside of it. If appropriately parameterized and estimated, it is the actuary's most accurate (lowest bias and variance) measure of risk that can be derived from historical data. It can also contemplate most of the technical dynamics that are important in insurance pricing: credibility, trends, interactions, and experience rating, to name a few. The indications process should occur entirely within a GLM framework, with minor updates involving merely a refitting of current models with new data, and major ones involving the development of new predictive models.

We’re at a defining moment in which insurance companies are receiving unprecedented volumes of data from their insured risks, thanks to advances in telematics and IoT. With this privilege comes the competitive pressure to use every bit of this data to paint a coherent picture of risk. The winner will no doubt be the one who leverages the power of machines to process this continuous stream of data in order to understand and write risks profitably.


[1] Many other equally viable statistical methods, such as Classification Trees, Random Forests, and Neural Networks, are available for the actuary to use. However, we will continue to use GLMs (because of their popularity) to loosely represent all such statistical methods.
