Zero-inflated count data models for claim frequency in general insurance


Student thesis: Master's Thesis



  • Ching Han YIP



Award date: 15 Jul 2004


One of the distinctive features of claim frequency data collected from general insurance is that it is often zero-inflated. Therefore, the traditional application of Poisson and negative binomial distributions for model fitting may not be adequate due to the presence of excess zeros. The excess zeros may arise from unreported claims for minor losses, owing to the deductible agreement and/or the no claim discount (NCD) system in motor insurance. Spurious dispersion appears when the number of observed zeros exceeds the number of zeros expected under the Poisson or even the negative binomial distribution assumptions. The purpose of this study is to illustrate and discuss alternative methods of modeling the claim frequency distribution in general insurance in the presence of many zero counts. Various zero-inflated count data models are considered. In addition, the use of the quasi-likelihood approach is explored to address the over-dispersion problem. A motor insurance data set is used to demonstrate the application of various zero-inflated models. The performance of the models is evaluated by the log-likelihood and related statistics. It is found that the zero-inflated models, especially the zero-inflated double Poisson regression model, provide a substantially better fit than the traditional Poisson and negative binomial models in predicting the insurance claim count. The quasi-likelihood modeling approach for the zero-inflated Poisson model reduces to the negative binomial model, but with a different parameterization. In conclusion, the zero-inflated count data model would be a good choice for modeling claim count data in general insurance, as it extends the Poisson model and incorporates both the excess zeros and the extra dispersion in the Poisson part. In a regression setting, with an appropriate set of risk factors, the level of risk of customers can be correctly evaluated in accordance with individual characteristics.
The findings in this thesis provide insurance practitioners with a better and more precise method for modeling the claim frequency distribution, which in turn can help refine their ratemaking and hence the loss reserving process.
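As a rough illustration of the idea summarized above, the following minimal sketch compares a plain Poisson fit with a zero-inflated Poisson (ZIP) fit by log-likelihood. It is not the thesis's implementation: the data are simulated rather than the motor insurance data set, there are no regression covariates, the zero-inflated double Poisson model is not covered, and the EM fitting routine and all function names (`zip_pmf`, `fit_zip_em`, and so on) are assumptions made for this example only.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson: a structural zero with
    probability pi, otherwise a Poisson(lam) count."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def fit_zip_em(counts, iters=200):
    """Fit (pi, lam) of an intercept-only ZIP model by EM (hypothetical helper)."""
    n = len(counts)
    total = sum(counts)
    n0 = sum(1 for y in counts if y == 0)
    pi, lam = 0.5 * n0 / n, max(total / n, 1e-6)  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean
        pi = w * n0 / n
        lam = total / (n - w * n0)
    return pi, lam


def loglik(counts, pmf):
    """Log-likelihood of the counts under a pmf taking a single argument k."""
    return sum(math.log(pmf(y)) for y in counts)


# Simulated claim counts (assumed values): 40% structural zeros, Poisson mean 1.5.
rng = random.Random(0)
counts = [0 if rng.random() < 0.4 else sample_poisson(1.5, rng)
          for _ in range(20000)]

lam_pois = sum(counts) / len(counts)  # Poisson MLE is the sample mean
pi_hat, lam_hat = fit_zip_em(counts)

ll_pois = loglik(counts, lambda k: poisson_pmf(k, lam_pois))
ll_zip = loglik(counts, lambda k: zip_pmf(k, pi_hat, lam_hat))

print("Poisson log-likelihood:", ll_pois)
print("ZIP log-likelihood:    ", ll_zip)
print("ZIP estimates (pi, lam):", pi_hat, lam_hat)
```

Because the Poisson model is the special case of the ZIP model with no structural zeros, the fitted ZIP log-likelihood should not fall below the Poisson one; on data with many excess zeros the gap is typically large, which is the kind of comparison the abstract refers to.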

    Research areas

  • Mathematical models, Insurance claims