<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main"></title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nataliia</forename><forename type="middle">V</forename><surname>Kuznietsova</surname></persName>
							<email>natalia-kpi@ukr.net</email>
							<affiliation key="aff0">
								<orgName type="institution">National Technical University of Ukraine &quot;Igor Sikorsky Kyiv Polytechnic Institute&quot;</orgName>
								<address>
									<addrLine>ave. Beresteiskyi 37</addrLine>
									<postCode>03056</postCode>
									<settlement>Kyiv</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Claude Bernard Lyon 1 University</orgName>
								<address>
									<addrLine>43 boulevard du 11 Novembre 1918</addrLine>
									<postCode>69622</postCode>
									<settlement>Villeurbanne cedex</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Illia</forename><forename type="middle">O</forename><surname>Kvashuk</surname></persName>
							<email>illiakvashuk@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">National Technical University of Ukraine &quot;Igor Sikorsky Kyiv Polytechnic Institute&quot;</orgName>
								<address>
									<addrLine>ave. Beresteiskyi 37</addrLine>
									<postCode>03056</postCode>
									<settlement>Kyiv</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Anna</forename><forename type="middle">O</forename><surname>Chemanova</surname></persName>
							<email>ankachemanova@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">National Technical University of Ukraine &quot;Igor Sikorsky Kyiv Polytechnic Institute&quot;</orgName>
								<address>
									<addrLine>ave. Beresteiskyi 37</addrLine>
									<postCode>03056</postCode>
									<settlement>Kyiv</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Information Technologies and Security</orgName>
								<address>
									<addrLine>November 30</addrLine>
									<postCode>2023</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F9BDE4D9224EEE88D12F425A5AA7E3F3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Car-insurance</term>
					<term>Generalized linear models</term>
					<term>Scorecard</term>
					<term>Survival models</term>
					<term>Claims forecasting</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, several car insurance claims problems are analyzed and solved by applying existing statistical models to real-world datasets. The first problem is measuring the probability of a claim for a specific policy. It is solved using several families of generalized linear models, complemented by a survival-model analysis of the data. The best generalized linear model is then chosen according to statistical criteria. The second problem considers distinct classes of policies: the number of claims and the prices are forecasted for the different groups. As for the first problem, generalized linear models are used and the best model is chosen according to a statistical criterion. The third problem is scorecard generation. A brief interpretation of the resulting scorecard is also provided.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Insurance activity is usually aimed at protecting the property interests of individuals and legal entities when insured events occur, at the expense of monetary funds formed from the insurance premiums paid by policyholders. One of the main conditions for the effective functioning of the insurance market is the reliability of its participants, the insurance companies. The ability of insurance companies operating in the market to fulfill their obligations promptly and in full, that is, their financial stability, is the starting point for the actual manifestation and implementation of the insurance function. The current financial state of insurance companies requires a search for new forms and methods of increasing their competitiveness and financial stability. They need special decision-support systems to assess policies more effectively, forecast the probability of claims more precisely, evaluate possible losses, and develop more flexible conditions for insurance policy evaluation.</p><p>The variety of forms in which risk manifests itself, together with the frequency and complexity of the consequences, calls for an in-depth analysis of possible risks and an economic-mathematical justification of the financial policy of insurance companies. For every car insurance company, the importance of selecting the proper policy for a given client cannot be overstated. The insurance premium is formed according to how likely the client is expected to raise claims and the expected size of those claims <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref>.</p><p>Information for determining the terms and conditions of policies can be separated into two parts: data concerning the driver and data concerning the car. 
Age, driving experience, and length of the insurance policy are the values that define the driver part of the information. However, some aspects, such as the driver's habits, are hard to collect, describe, and analyze. On the other hand, information about cars can be specified and collected according to technical criteria <ref type="bibr" target="#b1">[2]</ref>. It can range from the car type to the number of cylinders or airbags. A practical task is to model and forecast claims when information about cars is available in abundance, which requires selection and filtering to find its most relevant parts.</p><p>A completely separate issue is creating models that can predict some aspects of a claim from the selected data. One of the most important tasks is to predict the probability of a claim for a specific case. Insurance firms need a proper model for predicting and forecasting claims for different clients. At the same time, those methods should be easy to interpret, so that the key factors affecting the terms and conditions of policies can be explained to clients or regulators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Problem statement</head><p>This work concentrates on solving the main problems that appear in the insurance field. The first and foremost task is to forecast the claim expectation for a given client, which can be measured by the probability of a claim. Companies need a way to approximate the chances of claims in order to properly form the policy terms for a given client. This task requires taking the client's data into account and forming a decision based on it.</p><p>The second task is forecasting the number of claims for each group. This task is important because it is part of the company's policy selection. Groups are created by grouping clients on aggregated values. For these groups, the number of claims can be estimated and forecasting models can be built. The approach can follow two possible scenarios: modeling only the number of claims, or modeling the total spending on a group.</p><p>The third task is model creation, which is usually paired with interpretation. This interpretation can provide valuable insight into which values increase the probability of a claim. It allows us to build scorecards, which provide an easy tool for making decisions directly from the data provided by a client. The main objective of this study is to define not only the probability and cost (value) of each claim but also the subset of cases with the greatest damage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>The appropriate approach usually depends on the task, but most importantly it is determined by the flow of data extraction and preparation. The same method can be applied to the same data, yet different approaches and pre-processing techniques may affect the results. For example, <ref type="bibr" target="#b2">[3]</ref> describes a data flow, data handling, and objectives very similar to those of this work. The data is collected on an open platform. Claims are analyzed and their number is predicted, which matches our task. However, due to dataset restrictions, preprocessing was added there, which yielded comparatively satisfactory results but lacked interpretability because of the use of PCA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Generalized linear models</head><p>Generalized linear models (GLM) were the main tools used during our research. They provide a unified framework for modeling and forecasting the target variables <ref type="bibr" target="#b2">[3]</ref>. Because of the nature of the variables and the different tasks that were tackled, a number of distribution families were used to approach the problems from different sides and to select the best-fitting one.</p><p>The generalized linear model is an extension of the simple linear regression model. A linear relationship between variables is the simplest case when studying links between factors. However, it does not hold for most real-world processes, where the relationship is more complicated than linear. For such cases a link function is introduced. Several different families were used in the research.</p><p>The general form of the generalized linear model is as follows:</p><formula xml:id="formula_0">𝑋𝛽 = 𝑔(𝜇),</formula><p>where 𝑋 denotes the independent variables, 𝛽 is the parameter vector, and 𝑔 is a link function that transforms the scale of the expected value 𝜇 of the dependent variable to suit a linear relationship. Generalized linear models can be used for discrete or continuous variables, which gives them a significant advantage.</p><p>Logistic regression (LR) is a statistical method used for classifying values into different categories. In the scope of this research, logistic regression was used for modeling the claim probability for a single policy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Its link function is the logit: 𝑋𝛽 = 𝑙𝑜𝑔𝑖𝑡(𝜇) = ln(𝜇 / (1 − 𝜇)).</p><p>The normal (Gaussian) generalized linear model uses the identity link function, 𝑋𝛽 = 𝜇, which makes it the same as simple linear regression. Poisson regression is a statistical model used when the dependent variable is a count of occurrences. Its link function is as follows: 𝑋𝛽 = ln(𝜇).</p></div>
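<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration, the logit, identity, and log link functions and their inverses can be sketched in Python. This is a minimal sketch and not the code used in the study; the names LINKS and predict_mean, as well as the coefficient values, are hypothetical.</p>
```python
import math


def logit(mu):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Each family is paired with (link, inverse link). Hypothetical helper table.
LINKS = {
    "binomial": (logit, inv_logit),
    "gaussian": (lambda mu: mu, lambda eta: eta),
    "poisson": (math.log, math.exp),
}


def predict_mean(family, coefficients, features):
    """Compute the linear predictor X*beta and map it back to the mean mu."""
    eta = sum(b * x for b, x in zip(coefficients, features))
    _, inverse_link = LINKS[family]
    return inverse_link(eta)


# Hypothetical coefficients: an intercept and one slope.
beta = [-2.0, 0.5]
row = [1.0, 3.0]  # the leading 1.0 multiplies the intercept
print(predict_mean("binomial", beta, row))
print(predict_mean("poisson", beta, row))
```
<p>The same linear predictor yields a probability under the binomial family and a non-negative expected count under the Poisson family, which is why the family has to match the nature of the target variable.</p></div>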
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Survival models</head><p>Survival models can be used to predict claims or similar events such as deaths or accidents. They are applicable when the outcome can be traced over some period of time <ref type="bibr" target="#b3">[4]</ref>.</p><p>The simplest form of a survival model is a table in which all events are recorded with the timestamps of their occurrence. It may give significant insight into the time periods when most events occur.</p></div>
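<div xmlns="http://www.tei-c.org/ns/1.0"><p>A table of this kind can be sketched with a Kaplan-Meier style estimate, a standard technique that is named here as an illustration and is not claimed to be the exact table built in the study. The function name survival_table and the sample durations are hypothetical; an event flag of 1 marks a claim and 0 marks a policy that left observation without a claim.</p>
```python
def survival_table(durations, events):
    """Return rows (time, at_risk, n_events, survival) for each event time."""
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    rows = []
    i = 0
    while i != len(records):
        t = records[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every record that shares the same timestamp.
        while i != len(records) and records[i][0] == t:
            n_events += records[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit update: share of the at-risk set that survives t.
            survival *= 1.0 - n_events / at_risk
            rows.append((t, at_risk, n_events, survival))
        at_risk -= n_leaving
    return rows


# Hypothetical policy tenures (years) and claim indicators.
tenures = [1.0, 2.0, 2.0, 3.0, 4.0]
claims = [1, 1, 0, 1, 0]
for row in survival_table(tenures, claims):
    print(row)
```
<p>Each row shows how many policies were still under observation at a given time and how many claims occurred, so sharp drops in the survival column point to the periods when most events occur.</p></div>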
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Scorecards</head><p>Scorecards are special tables constructed to provide a score for every feature; summing up the scores for a record gives its total points. By assigning levels to the total score, records can be moved into one of the preselected categories.</p><p>Scorecards are powerful practical tools for quickly identifying high-risk policies <ref type="bibr" target="#b3">[4]</ref>. Scorecards are built using the weight of evidence (WoE), the information value (IV), and the population stability index (PSI). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>𝑊𝑜𝐸 = ln(% 𝑜𝑓 𝑛𝑜𝑛-𝑒𝑣𝑒𝑛𝑡𝑠 / % 𝑜𝑓 𝑒𝑣𝑒𝑛𝑡𝑠).</p></div>
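<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weight of evidence, information value, and population stability index can be computed from per-bin counts as sketched below. This is an illustrative sketch under stated assumptions: WoE is taken as the logarithm of the share of non-events over the share of events, every bin is assumed to contain at least one event and one non-event, and the function names and counts are hypothetical.</p>
```python
import math


def woe_iv(bins):
    """bins maps a bin label to (n_non_events, n_events).

    Returns a dict of WoE values per bin and the total information value.
    """
    total_non = sum(non for non, _ in bins.values())
    total_ev = sum(ev for _, ev in bins.values())
    woe = {}
    iv = 0.0
    for label, (non, ev) in bins.items():
        pct_non = non / total_non
        pct_ev = ev / total_ev
        woe[label] = math.log(pct_non / pct_ev)
        iv += (pct_non - pct_ev) * woe[label]
    return woe, iv


def psi(scoring_shares, training_shares):
    """Population stability index summed over matching bins."""
    return sum((a - b) * math.log(a / b)
               for a, b in zip(scoring_shares, training_shares))


# Hypothetical counts of policies without and with a claim per car-age bin.
age_bins = {"0-2 years": (900, 40), "2-5 years": (700, 60), "5+ years": (400, 50)}
woe_by_bin, information_value = woe_iv(age_bins)
print(woe_by_bin)
print(information_value)
print(psi([0.5, 0.3, 0.2], [0.45, 0.35, 0.2]))
```
<p>Because each IV term multiplies a difference of shares by the logarithm of their ratio, every term is non-negative, so a larger IV indicates a variable that separates claims from non-claims more strongly.</p></div>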
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Other methods and models</head><p>The field of insurance prediction is large and rich in approaches and methods that are effective for forecasting the probability of claims <ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref>. Some methods not only cover the same objective as the current study but are also applied to handling more financially oriented data, handling missing data, and combining the results of several models <ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref>. A brief overview of these methods follows, and the results are summarized in <ref type="table" target="#tab_0">Table 1</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.1.">Decision Tree</head><p>The decision tree (DT) and its variations form a family of classification methods built on a tree structure that handles the decision-making process through a binary decision at each step. This allows the method to be applied to data with non-linear relations between features and target variables.</p><p>There are several extensions of the basic model: random forest and CART models, as well as multivariable trees. An example of such research is the work <ref type="bibr" target="#b21">[22]</ref>. The basic tree is used for classification tasks, so it cannot be compared directly with the regression family of methods. Some tasks, such as determining whether a claim will happen at all, can be approached by both methods, but only regression can be used to predict a continuous variable.</p><p>The simplest model is straightforward: each node checks a feature and directs the record to one of two possible branches until a final node is reached. However, this model is not suitable for complex data, since it tends to overfit and its variable selection can be biased.</p><p>One very popular extension, also covered by the work <ref type="bibr" target="#b21">[22]</ref>, is CART, or Classification and Regression Trees. It overcomes the limitation of the original model by making it possible to model and predict continuous variables without restricting the original capabilities for categorical ones.</p><p>Another method is Random Forest. It combines several decision trees, which may themselves be regression trees, and arrives at a single decision by weighting their outputs. It can be seen as a statistical machine learning algorithm.</p><p>A further development, perhaps less widespread in insurance but noteworthy, is multivariable trees, which use multivariable values for the response variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.2.">Machine Learning</head><p>Support Vector Machines (SVM) is a method that provides solutions for both classification and regression problems. It is a supervised learning algorithm based on the idea of a hyperplane. This hyperplane, which has one dimension fewer than the feature space, is used for decision-making: it serves as the boundary that separates the target points.</p><p>One of the key features of SVM is that when no separating plane can be found in the current domain, the inputs can be mapped to a higher-dimensional space in order to find a hyperplane there. This allows us to overcome the obstacles posed by the original dimension. SVM can also be modified to solve regression tasks, as in <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Dataset</head><p>The data needed for model creation were obtained from a Car Insurance dataset provided on the Kaggle website <ref type="bibr" target="#b23">[24]</ref>. This dataset is oriented toward the technical aspects of the car, with most variables describing physical parts of the vehicle for which the policy is issued.</p><p>The dataset consists of two parts, used for training and testing, which contain 58592 and 39063 records respectively. Each record is a unique policy with information about the owner of the policy and the car. The dataset indicates whether a claim was made during the following 6 months of the insurance. This was the target value during the first stage of modeling. Additionally, the dataset contains a range of different features, with the total number of variables equal to 44.</p><p>For further research, the data were grouped by several variables in order to form groups of clients and policies for which modeling was performed.</p><p>It is important to note that the dataset does not contain financial data. There is no information about the price of the cars or the insurance premiums for a policy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Modeling Results</head><p>Modeling and forecasting were done using generalized linear models of the binomial, Gaussian, and Poisson types. A scorecard was also generated to assist in decision-making and interpretation of the results.</p><p>Modeling of the claim probability was done by building two models, binomial and Gaussian. The comparison presented in Table <ref type="table" target="#tab_1">2</ref> has shown that the Gaussian model performs significantly better. Of the 44 variables, several were selected based on the correlogram and common sense:</p><p>• age_of_car: how old the car is; • policy_tenure: the length of the policy to date; • area_cluster: the area where the policyholder does most of the driving; • make: the car's manufacturer; • atr: a synthesized variable based on the car's features (extra airbags, lamps, etc.); • ncap_rating: the car's safety rating given by the agency.</p><p>The target relationship is then represented by the following formula:</p><formula xml:id="formula_1">𝑖𝑠_𝑐𝑙𝑎𝑖𝑚 = 𝑔(𝑘₀ + 𝑘₁ × 𝑎𝑔𝑒_𝑜𝑓_𝑐𝑎𝑟 + 𝑘₂ × 𝑝𝑜𝑙𝑖𝑐𝑦_𝑡𝑒𝑛𝑢𝑟𝑒 + 𝑘₃ × 𝑎𝑟𝑒𝑎_𝑐𝑙𝑢𝑠𝑡𝑒𝑟 + 𝑘₄ × 𝑚𝑎𝑘𝑒 + 𝑘₅ × 𝑎𝑡𝑟 + 𝑘₆ × 𝑛𝑐𝑎𝑝_𝑟𝑎𝑡𝑖𝑛𝑔).</formula><p>Judging by the distribution of the predicted values, there are no significant outliers in the data. The maximum claim probability for the whole dataset according to the model does not exceed 0.2. This can be interpreted as uncertainty in the provided data. There are records that match on all key features but differ in whether a claim occurred. Altogether, this undermines the value of concentrating on a single record.</p><p>The confusion matrix further highlighted the problem of such an approach. With a threshold of 0.1, it was apparent that the models underperform; the confusion matrix for the binomial model is presented in Table <ref type="table" target="#tab_2">3</ref> and the confusion matrix for the normal distribution in Table <ref type="table">4</ref>. 
In the next stage, the modelling was based on survival theory. It is possible to construct a survival model where each claim is treated as the death of a member of the population. We count the length of the policy as the measure of time. Thus, the claims population "survives" during the policy length interval. It was concluded that high-quality forecasts cannot be derived from the existing data when claims are predicted for a single policy.</p><p>A Cox proportional hazards model was built:</p><formula xml:id="formula_2">𝑐𝑜𝑥𝑝ℎ(𝑓𝑜𝑟𝑚𝑢𝑙𝑎 = 𝑆𝑢𝑟𝑣(𝑝𝑜𝑙𝑖𝑐𝑦_𝑡𝑒𝑛𝑢𝑟𝑒, 𝑖𝑠_𝑐𝑙𝑎𝑖𝑚) ~ 𝐹 (𝑎𝑔𝑒 + 𝑎𝑟𝑒𝑎 + 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛_𝑑𝑒𝑛𝑠𝑖𝑡𝑦, 𝑑𝑎𝑡𝑎 = 𝑐𝑎𝑟_𝑖𝑛𝑠𝑢𝑟𝑎𝑛𝑐𝑒_𝑡𝑖𝑏𝑏𝑙𝑒)),</formula><p>where n = 58592, number of events = 3748.</p><p>It can be seen that the length of the policy does affect the number of claims, but this observation is rather trivial and cannot be used to make a decision, since it would merely imply that only short-term policies should be preferred (Figure <ref type="figure" target="#fig_1">1</ref>). Thus, a relationship between the length of the policy and the frequency of claims was revealed. At policy durations of 1, 1.6, and 1.7 years there is a sharp increase in claims. It is possible to separate policies at these points and to focus on the threshold values found in future work. The survival model also makes it easier to determine the duration of the riskiest policies and to define possible new policy strategies. The calculation for individual cases (a claim for each policy separately) showed the absence of parameters and characteristics that would accurately indicate the onset of a claim. All probabilities for individual policies lie between 0.001 and 0.12. Therefore, a decision was made to proceed to the consideration of individual segments.</p><p>The data were grouped by segment, manufacturer, and car brand. 
Thus, we have moved from looking at an individual car to the segment as a whole, where individual characteristics are of little importance.</p><p>Two values can be calculated for segments:</p><p>1. The number of claims in the segment.</p><p>2. Amount of payment by segments.</p><p>It was decided to implement a further approach to working with groups. The Poisson generalized linear model was chosen as the model to forecast the number of cases. It showed a high level of accuracy.</p></div>
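<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate how a Poisson model with a log link can be fitted to grouped claim counts, the following sketch applies Newton's method to a single numeric covariate. It is a simplified, hypothetical example: the function name fit_poisson, the group values, and the claim counts are assumptions, and the richer set of predictors used in the study is not reproduced here.</p>
```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton's method on the Poisson likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only solution
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: derivative of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Entries of the (negated) Hessian matrix.
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system with the explicit inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical groups: average car age per group and claims per group.
group_age = [0.0, 1.0, 2.0, 3.0, 4.0]
group_claims = [2, 2, 3, 4, 6]
b0, b1 = fit_poisson(group_age, group_claims)
predicted = [math.exp(b0 + b1 * x) for x in group_age]
print(b0, b1)
print(predicted)
```
<p>The log link keeps every predicted count positive, and at the fitted coefficients the predicted counts sum to the observed total, which is a property of Poisson regression with an intercept.</p></div>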
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The equation for modelling the relationship was as follows: 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒𝑑_𝑁 = 𝑔(𝑡𝑜𝑡𝑎𝑙 + 𝑠𝑒𝑔𝑚𝑒𝑛𝑡 + 𝑎𝑟𝑒𝑎_𝑐𝑙𝑢𝑠𝑡𝑒𝑟 + 𝑎𝑖𝑟𝑏𝑎𝑔𝑠 + 𝑚𝑎𝑘𝑒 + 𝑝𝑜𝑙𝑖𝑐𝑦_𝑡𝑒𝑛𝑢𝑟𝑒 + 𝑎𝑔𝑒_𝑜𝑓_𝑐𝑎𝑟). As can be seen in Figures <ref type="figure" target="#fig_4">3 and 4</ref>, the prediction of the number of claims across groups is of higher quality. This also shows that, despite the low ability to predict each unique case, predicting for a group is a much easier task.</p><p>Another approach was also used for dealing with groups: forecasting the price of all cars for which claims were issued. The Gaussian model was used as the most appropriate. It also showed significant accuracy (Table <ref type="table" target="#tab_4">6</ref>).</p><p>The car's price was missing from the dataset. For this model, the following approach was used: 1. Find the average price for every class. 2. Adjust it according to the attribute feature. 3. Group the price per category to create a new feature, total (price).</p><p>The equation for modeling is as follows: 𝑝𝑎𝑖𝑑_𝑝𝑟𝑖𝑐𝑒 = 𝑔(𝑡𝑜𝑡𝑎𝑙 + 𝑔𝑟𝑜𝑢𝑝_𝑝𝑟𝑖𝑐𝑒 + 𝑠𝑒𝑔𝑚𝑒𝑛𝑡 + 𝑝𝑜𝑙𝑖𝑐𝑦_𝑡𝑒𝑛𝑢𝑟𝑒 + 𝑎𝑔𝑒_𝑜𝑓_𝑐𝑎𝑟).</p><p>We need to understand which variables, and which intervals of these variables, are the most significant for our insurance task. Information value (IV) is one of the most useful techniques for selecting important variables in a predictive model. It helps to rank the variables by their importance. Figure <ref type="figure" target="#fig_5">5</ref> shows how many claim cases there were and how they correlate with different values of the car's age.</p><p>Finally, the scorecard was built (Table <ref type="table" target="#tab_5">7</ref>). It provides information about the values that are associated with a high risk of a claim for this dataset. Non-significant values have been filtered out. 
The remaining variables include continuous data, namely the age of the policyholder and the policy tenure, for which binning was performed. Categorical variables are also present: the area clusters, which are named in the initial dataset and range from C1 to C22, and variables related to technical aspects, namely rear mirror availability and functionality, brake type, and transmission type. </p></div>
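<div xmlns="http://www.tei-c.org/ns/1.0"><p>Applying a scorecard of this kind amounts to looking up points for each binned or categorical feature and summing them. The sketch below shows the mechanics only: all bin boundaries, point values, cut-offs, and field names are hypothetical and do not reproduce the scorecard built in the study, and higher totals are assumed to mean higher risk.</p>
```python
# Continuous features: (lower bound, points), sorted by descending lower bound.
# All numbers are hypothetical placeholders.
BINNED = {
    "age_of_policyholder": [(0.6, 10), (0.4, 25), (0.0, 40)],
    "policy_tenure": [(1.0, 45), (0.5, 30), (0.0, 15)],
}

# Categorical features: points per category, 0 for unlisted categories.
CATEGORICAL = {
    "area_cluster": {"C1": 20, "C8": 35},
    "transmission_type": {"Manual": 10, "Automatic": 5},
}


def score_record(record):
    """Sum the points of every feature for one policy record."""
    total = 0
    for feature, bins in BINNED.items():
        for lower, points in bins:
            if record[feature] >= lower:
                total += points
                break
    for feature, table in CATEGORICAL.items():
        total += table.get(record[feature], 0)
    return total


def risk_level(total, cutoffs=((100, "high"), (60, "medium"))):
    """Map a total score to a preselected category."""
    for cutoff, label in cutoffs:
        if total >= cutoff:
            return label
    return "low"


policy = {
    "age_of_policyholder": 0.35,
    "policy_tenure": 1.2,
    "area_cluster": "C8",
    "transmission_type": "Manual",
}
points = score_record(policy)
print(points, risk_level(points))
```
<p>Keeping the points in plain lookup tables is what makes the scorecard easy to read and to explain: every contribution to the total can be traced back to a single feature value.</p></div>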
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Today car insurance companies require a lot of information to decide on policies and conditions <ref type="bibr" target="#b4">[5]</ref>.</p><p>Even though a vast amount of information can be collected, it does not guarantee the ability to create a model that predicts a claim for a specific policy with a significant level of accuracy, because of the random nature of claims. Some special cases that are more or less prone to claims can be selected, but according to the results this does not allow a robust prediction. Of the models built for probability prediction, the Gaussian generalized linear model was chosen. It shows that the nature of claims cannot be determined from specific features or their combinations, since for the same key variables there are examples of policies with and without claims. The obtained values are highly concentrated, which does not allow selecting intervals for confident claim identification and hence undermines the usefulness of such an approach. The problem of single-claim prediction is the hardest one. For claims risk management, we need to forecast the probability of each claim and of each type of claim, and to develop a scorecard that presents the key features automatically in an understandable and easily interpretable manner.</p><p>More promising are the results for groups of claims, where policies with similar features are combined into the same group. Such groups can be modeled and forecasted with a higher degree of accuracy with respect to the number of claims or the total price of the cars for which claims have been made. 
Overall, the results show a low ability to predict specific cases but relatively high confidence in forecasting for large groups.</p><p>It is worth noting that other methods, such as Random Forest, could perform better at predicting claims per observation, which can be examined in subsequent research.</p><p>Finally, the scorecard is a high-quality tool for making decisions for clients directly. It is easy not only to interpret but also to use. We used the scorecard to determine the key features in an understandable and easily interpretable manner. It yields great results on the grouped data and provides valuable insights into the tendencies. Scorecards are also a useful tool for big data tasks in telecom and various areas of finance <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref>, where scores and the influence of characteristics need to be evaluated as well.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>.</head><label></label><figDesc>𝐼𝑉 = ∑ (% 𝑜𝑓 𝑛𝑜𝑛-𝑒𝑣𝑒𝑛𝑡𝑠 − % 𝑜𝑓 𝑒𝑣𝑒𝑛𝑡𝑠) ⋅ 𝑊𝑜𝐸. 𝑃𝑆𝐼 = ∑ (𝐴 − 𝐵) ⋅ 𝑙𝑛(𝐴/𝐵), where 𝐴 is the % of records based on the scoring variable in the scoring sample and 𝐵 is the % of records based on the scoring variable in the training sample.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Survival model for the insurance policies</figDesc><graphic coords="7,130.08,350.16,348.96,188.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Real (black) and estimated (green) values plotted together</figDesc><graphic coords="8,120.96,142.44,353.16,190.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Value of claimed cars (black) and estimated (green)</figDesc><graphic coords="8,119.76,480.12,354.84,191.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Against segments to which cars belong A-Utility</figDesc><graphic coords="9,72.00,72.00,451.08,231.96" type="vector_box" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 .</head><label>5</label><figDesc>Figure 5. Informational value of the variable age_of_car</figDesc><graphic coords="10,99.12,72.00,410.64,209.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison of different methods used for the insurance field</figDesc><table><row><cell>Article &amp; year</cell><cell>Purpose</cell><cell>Algorithms</cell><cell>Performance metrics</cell><cell>The Best Model</cell></row><row><cell>(Smith et al. 2000) [9]</cell><cell>Classification to predict customer retention patterns</cell><cell>Decision tree (DT), Neural Networks</cell><cell>Accuracy ROC</cell><cell>Neural Networks (NN)</cell></row><row><cell>(Günther et al. 2014) [10]</cell><cell>Classification to predict the risk of leaving</cell><cell>Logistic regression and GAMS</cell><cell>ROC</cell><cell>Logistic regression</cell></row><row><cell>(Weerasinghe and Wijegunasekara 2016) [11]</cell><cell>Classification to predict the number of claims (low, fair, or high)</cell><cell>LR, DT, NN</cell><cell>Precision Recall Specificity</cell><cell>Neural networks</cell></row><row><cell>(Fang et al. 2016) [12]</cell><cell>Regression to forecast insurance customer profitability</cell><cell>Random Forest (RF), LR, DT, Support Vector Machines (SVM), Gradient Boosting (GB)</cell><cell>R-squares RMSE</cell><cell>Random Forest</cell></row><row><cell>(Subudhi and Panigrahi 2017) [13]</cell><cell>Classification to predict insurance fraud</cell><cell>Decision trees, SVM, Multilayer Perceptron (MLP)</cell><cell>Sensitivity Specificity Accuracy</cell><cell>SVM</cell></row><row><cell>(Mau et al. 2018) [14]</cell><cell>Classification to predict churn, retention, and cross-selling</cell><cell>Random Forest</cell><cell>Accuracy AUC ROC F-score</cell><cell>RF</cell></row><row><cell>(Jing et al. 2018) [15]</cell><cell>Classification to predict claims occurrence</cell><cell>Naive Bayes, Bayesian Network</cell><cell>Accuracy</cell><cell>Both have the same accuracy</cell></row><row><cell>(Kowshalya and Nandhini 2018) [16]</cell><cell>Classification to predict insurance fraud and percentage of premium amount</cell><cell>J48, RF, Naive Bayes</cell><cell>Accuracy Precision Recall</cell><cell>Random Forest</cell></row><row><cell>(Sabbeh 2018) [17]</cell><cell>Classification to predict churn problem</cell><cell>RF, AdaBoost, MLP, Stochastic GB, SVM, K-nearest Neighbor (KNN), DT, Naive Bayes, LR, Linear Discriminant Analysis (LDA)</cell><cell>Accuracy</cell><cell>AdaBoost</cell></row><row><cell>(Stucki 2019) [18]</cell><cell>Classification to predict churn and retention</cell><cell>LR, RF, KNN, Ada Boosting Trees, NN</cell><cell>Accuracy F-Score AUC</cell><cell>Random Forest</cell></row><row><cell>(Dewi et al. 2019) [19]</cell><cell>Regression to predict claims severity</cell><cell>Random forest</cell><cell>MSE</cell><cell>Random Forest</cell></row><row><cell>(Pesantez-Narvaez et al. 2019) [20]</cell><cell>Classification to predict claims occurrence</cell><cell>XGBoost, Logistic regression</cell><cell>Sensitivity Specificity Accuracy RMSE ROC</cell><cell>XGBoost</cell></row><row><cell>(Abdelhadi et al. 2020) [21]</cell><cell>Classification to predict claims occurrence</cell><cell>J48, NN, XGBoost, Naive Bayes</cell><cell>Accuracy ROC</cell><cell>XGBoost</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Models' comparison</figDesc><table><row><cell>Model</cell><cell>Residuals</cell><cell>AIC</cell></row><row><cell>Binomial</cell><cell>3475.5</cell><cell>828.34</cell></row><row><cell>Gaussian</cell><cell>27300</cell><cell>27364</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Confusion matrix (binomial)</figDesc><table><row><cell>Actual \ predictions</cell><cell>0</cell><cell>1</cell></row><row><cell>0</cell><cell>50039</cell><cell>4805</cell></row><row><cell>1</cell><cell>3161</cell><cell>587</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 4</head><label>4</label><figDesc>Confusion matrix (normal)</figDesc><table><row><cell>Actual \ predictions</cell><cell>0</cell><cell>1</cell></row><row><cell>0</cell><cell>51791</cell><cell>3053</cell></row><row><cell>1</cell><cell>3371</cell><cell>377</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc></figDesc><table><row><cell>Model</cell><cell>Residuals</cell><cell>AIC</cell></row><row><cell>Poisson</cell><cell>629.27</cell><cell>1425.6</cell></row><row><cell>Normal</cell><cell>1425.6</cell><cell>5952.5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6</head><label>6</label><figDesc>Result</figDesc><table><row><cell>Model</cell><cell>Residuals</cell><cell>AIC</cell></row><row><cell>Normal</cell><cell>8.4797e+11</cell><cell>5886.3</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7</head><label>7</label><figDesc>Scorecard for a claim's prediction</figDesc><table><row><cell>Number of interval</cell><cell>Variable</cell><cell>Binning</cell><cell>Score</cell></row><row><cell>0</cell><cell>age_of_policyholder</cell><cell>[-inf ~ 0.384615384615385)</cell><cell>0.74</cell></row><row><cell>1</cell><cell>age_of_policyholder</cell><cell>[0.384615384615385 ~ 0.442307692307692)</cell><cell>0.06</cell></row><row><cell>2</cell><cell>age_of_policyholder</cell><cell>[0.442307692307692 ~ 0.490384615384615)</cell><cell>-0.51</cell></row><row><cell>3</cell><cell>age_of_policyholder</cell><cell>[0.490384615384615 ~ 0.634615384615385)</cell><cell>0.25</cell></row><row><cell>4</cell><cell>age_of_policyholder</cell><cell>[0.634615384615385 ~ inf)</cell><cell>-0.84</cell></row><row><cell>0</cell><cell>area_cluster</cell><cell>C17,C20,C9,C7,C1,C10,C15</cell><cell>2.61</cell></row><row><cell>1</cell><cell>area_cluster</cell><cell>C16,C13,C5,C12,C6</cell><cell>1.13</cell></row><row><cell>2</cell><cell>area_cluster</cell><cell>C11,C3,C2,C8</cell><cell>-0.79</cell></row><row><cell>3</cell><cell>area_cluster</cell><cell>C4,C19,C14,C22,C21,C18</cell><cell>-2.13</cell></row><row><cell>0</cell><cell>policy_tenure</cell><cell>[-inf ~ 0.211309751692924)</cell><cell>5.3</cell></row><row><cell>1</cell><cell>policy_tenure</cell><cell>[0.211309751692924 ~ 0.813392835491761)</cell><cell>1.49</cell></row><row><cell>2</cell><cell>policy_tenure</cell><cell>[0.813392835491761 ~ inf)</cell><cell>-3.86</cell></row><row><cell>0</cell><cell>is_day_night_rear_view_mirror</cell><cell>No</cell><cell>0</cell></row><row><cell>1</cell><cell>is_day_night_rear_view_mirror</cell><cell>Yes</cell><cell>0.26</cell></row><row><cell>0</cell><cell>steering_type</cell><cell>Manual,Power</cell><cell>0.05</cell></row><row><cell>1</cell><cell>steering_type</cell><cell>Electric</cell><cell>0.17</cell></row><row><cell>0</cell><cell>rear_brakes_type</cell><cell>Drum</cell><cell>0.05</cell></row><row><cell>1</cell><cell>rear_brakes_type</cell><cell>Disc</cell><cell>0.26</cell></row><row><cell>0</cell><cell>is_tpms</cell><cell>No</cell><cell>0.05</cell></row><row><cell>1</cell><cell>is_tpms</cell><cell>Yes</cell><cell>0.26</cell></row><row><cell>0</cell><cell>make</cell><cell>[-inf ~ 2)</cell><cell>0.05</cell></row><row><cell>1</cell><cell>make</cell><cell>[2 ~ inf)</cell><cell>0.19</cell></row><row><cell>0</cell><cell>transmission_type</cell><cell>Manual</cell><cell>0.05</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Denuit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Marechal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pitrebois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Walhin</surname></persName>
		</author>
		<title level="m">Actuarial modelling of claim counts: risk classification, credibility and bonus-malus systems</title>
		<imprint>
			<publisher>Wiley</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Unravelling the predictive power of telematics data in car insurance pricing</title>
		<author>
			<persName><forename type="first">R</forename><surname>Verbelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Antonio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Claeskens</surname></persName>
		</author>
		<idno type="DOI">10.1111/rssc.12283</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society, Series C (Applied Statistics)</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1275" to="1304" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Generalized linear models</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Ashworth</forename><surname>Nelder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W M</forename><surname>Wedderburn</surname></persName>
		</author>
		<idno type="DOI">10.2307/2344614</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society: Series A (General)</title>
		<imprint>
			<biblScope unit="volume">135</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="370" to="384" />
			<date type="published" when="1972">1972</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Kuznietsova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">I</forename><surname>Bidyuk</surname></persName>
		</author>
		<title level="m">Theory and practice of financial risk analysis: systemic approach</title>
				<meeting><address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<publisher>Lira-K</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Predictive Modeling of Insurance Claims Using Machine Learning Approach for Different Types of Motor Vehicles</title>
		<author>
			<persName><forename type="first">V</forename><surname>Selvakumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Satpathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T V</forename><surname>Praveen Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Haragopal</surname></persName>
		</author>
		<idno type="DOI">10.13189/ujaf.2021.090101</idno>
	</analytic>
	<monogr>
		<title level="j">Universal Journal of Accounting and Finance</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="14" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Combining Predictions of Auto Insurance Claims</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.3390/econometrics10020019</idno>
	</analytic>
	<monogr>
		<title level="j">Econometrics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">19</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Claim Amount Forecasting and Pricing of Automobile Insurance Based on the BP Neural Network</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cui</surname></persName>
		</author>
		<idno type="DOI">10.1155/2021/6616121</idno>
	</analytic>
	<monogr>
		<title level="j">Complexity</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Improving Imbalanced Data Classification in Auto Insurance by the Data Level Approaches</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hanafy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ming</surname></persName>
		</author>
		<idno type="DOI">10.14569/IJACSA.2021.0120656</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">6</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">An analysis of customer retention and insurance claim patterns using data mining: a case study</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Willis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brooks</surname></persName>
		</author>
		<idno type="DOI">10.1057/palgrave.jors.2600941</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Operational Research Society</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="532" to="541" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Modelling and predicting customer churn from an insurance company</title>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Günther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Tvete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Aas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Sandnes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ø</forename><surname>Borgan</surname></persName>
		</author>
		<idno type="DOI">10.1080/03461238.2011.636502</idno>
	</analytic>
	<monogr>
		<title level="j">Scandinavian Actuarial Journal</title>
		<imprint>
			<biblScope unit="volume">2014</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="58" to="71" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A Comparative Study of Data Mining Algorithms in the Prediction of Auto Insurance Claims</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P M L P</forename><surname>Weerasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Wijegunasekara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European International Journal of Science and Technology</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2016-01">January, 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Customer profitability forecasting using Big Data Analytics: A case study of the insurance industry</title>
		<author>
			<persName><forename type="first">K</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Song</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cie.2016.09.011</idno>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Industrial Engineering</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Use of Optimized Fuzzy C-Means Clustering and Supervised Classifiers for Automobile Insurance Fraud Detection</title>
		<author>
			<persName><forename type="first">S</forename><surname>Subudhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Panigrahi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jksuci.2017.09.010</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of King Saud University - Computer and Information Sciences</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Forecasting the next likely purchase events of insurance customers: A case study on the value of data-rich multichannel environments</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Pletikosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wagner</surname></persName>
		</author>
		<idno type="DOI">10.1108/IJBM-11-2016-0180</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Bank Marketing</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">6</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Research on Probability-based Learning Application on Car Insurance Data</title>
		<author>
			<persName><forename type="first">L</forename><surname>Jing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Feng</surname></persName>
		</author>
		<idno type="DOI">10.2991/macmc-17.2018.14</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 4th International Conference on Machinery, Materials and Computer (MACMC 2017)</title>
				<meeting>the 2017 4th International Conference on Machinery, Materials and Computer (MACMC 2017)<address><addrLine>Amsterdam</addrLine></address></meeting>
		<imprint>
			<publisher>Atlantis Press</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Predicting fraudulent claims in automobile insurance</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kowshalya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nandhini</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICICCT.2018.8473034</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT)</title>
				<meeting>the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT)<address><addrLine>Coimbatore, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-04">April 20-21, 2018</date>
			<biblScope unit="page" from="1338" to="1343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Machine-learning techniques for customer retention: A comparative study</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">F</forename><surname>Sabbeh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="273" to="281" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Stucki</surname></persName>
		</author>
		<title level="m">Predicting the Customer Churn with Machine Learning Methods: Case: Private Insurance Customer Data</title>
				<meeting><address><addrLine>Lappeenranta, Finland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>LUT University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s dissertation</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Analysis Accuracy of Random Forest Model for Big Data - A Case Study of Claim Severity Prediction in Car Insurance</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Dewi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Murfi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abdullah</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICSITech46713.2019.8987520</idno>
	</analytic>
	<monogr>
		<title level="m">2019 5th International Conference on Science in Information Technology (ICSITech)</title>
				<meeting><address><addrLine>Yogyakarta, Indonesia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-10">October 23-24, 2019</date>
			<biblScope unit="page" from="60" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Predicting Motor Insurance Claims Using Telematics Data-XGBoost versus Logistic Regression</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pesantez-Narvaez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guillen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alcañiz</surname></persName>
		</author>
		<idno type="DOI">10.3390/risks7020070</idno>
	</analytic>
	<monogr>
		<title level="j">Risks</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">70</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A proposed model to predict auto insurance claims using machine learning techniques</title>
		<author>
			<persName><forename type="first">S</forename><surname>Abdelhadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Elbahnasy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abdelsalam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical and Applied Information Technology</title>
		<imprint>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page" from="3428" to="3437" />
			<date type="published" when="2020-11">November 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Predictive analytics of insurance claims using multivariate decision trees</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Valdez</surname></persName>
		</author>
		<idno type="DOI">10.1515/demo-2018-0022</idno>
	</analytic>
	<monogr>
		<title level="j">Dependence Modeling</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="377" to="407" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Motor Insurance Claim Status Prediction using Machine Learning Techniques</title>
		<author>
			<persName><forename type="first">E</forename><surname>Alamir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Urgessa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hunegnaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gopikrishna</surname></persName>
		</author>
		<idno type="DOI">10.14569/IJACSA.2021.0120354</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title/>
		<author>
			<persName><surname>Kaggle</surname></persName>
		</author>
		<ptr target="https://www.kaggle.com/datasets/ifteshanajnin/carinsuranceclaimprediction-classification?resource=download" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Data Mining Methods, Models and Solutions for Big Data Cases in Telecommunication Industry</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kuznietsova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bidyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kuznietsova</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-82014-5_8</idno>
	</analytic>
	<monogr>
		<title level="j">Lecture Notes on Data Engineering and Communications Technologies</title>
		<imprint>
			<biblScope unit="volume">77</biblScope>
			<biblScope unit="page" from="107" to="127" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Analysis and Development of Mathematical Models for Assessing Investment Risks in Financial Markets</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kuznietsova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bateiko</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3503/paper9.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<biblScope unit="volume">3503</biblScope>
			<biblScope unit="page" from="92" to="101" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
