
Formation of Life Quality Indicators System through Search Algorithm of Association Rules

Lyudmila P. Bilgaeva

Dashidondok Sh. Shirapov

Grigoriy V. Badmaev

East Siberia State University of Technology and Management, Russia

The paper is devoted to the search for association rules for forming a system of indicators that affect the quality of life. The search is carried out in a transactional database using the AprioriTid algorithm, with the metrics support, confidence and lift calculated for each rule. It results in the extraction of useful association rules showing the relationships between life quality indicators, which can be used later to solve problems of analysis and forecasting.

Keywords: extraction algorithm of frequent sets of database, the property of monotony, the associative search of life quality indicators, truncation of candidates

Introduction

At present, issues of life quality are relevant, as the current economic crisis has primarily affected the population. In general, the standard of living depends on a competent social policy pursued by the state. Solving social problems requires management decisions based on real information. This, in turn, requires research aimed at identifying the main factors affecting life quality.

In this paper we propose to use methods of searching for association rules to identify the most important indicators of life quality, which will enable the authorities to plan and implement measures to improve the living standards of the population.

The search for association rules is one of the tasks of Data Mining, the modern technology of intelligent data analysis, which includes finding regularities between related events, identifying related objects and locating them in the space of states. To find associations, one typically uses a database in which all objects are connected to each other, provided that the database is consistent and integrated.

Basic theoretical principles of association rules search

There are many techniques for solving the problem of finding association rules. They share the same mathematical approach but differ in how the method is implemented. Let us consider the basic theoretical principles of these methods.

The association rule of a context $K$ is an expression of the form

$$A \to B, \quad \text{where } A, B \subseteq M.$$

The context $K$ is a tuple $(G, M, I)$, where $G$ is a set of objects, $M$ is a set of features, and $I \subseteq G \times M$.

When association rules are searched, special metrics are used: Support, Confidence and Lift.

The Support of an association rule $A \to B$ is the quantity defined by the formula

$$\mathrm{Support}(A \to B) = \frac{|(A \cup B)'|}{|G|} \qquad (1)$$

where $X'$ denotes the set of objects of $G$ that have all the features of $X$. The Support value indicates which part of the objects of $G$ contains $A \cup B$.

The Confidence of the association rule is defined by the formula

$$\mathrm{Confidence}(A \to B) = \frac{|(A \cup B)'|}{|A'|} \qquad (2)$$

The Confidence value shows which part of the objects that contain $A$ also contains $A \cup B$.

The following quantity is called the association rule utility (Lift):

$$\mathrm{Lift}(A \to B) = \frac{|(A \cup B)'|}{|A'| \cdot |B'|} \qquad (3)$$

In other words, the utility is the ratio of $\mathrm{Confidence}(A \to B)$ to $\mathrm{Support}(B)$. The Lift value indicates the usefulness of the rule. If the found utility value is more than 1, then the rule is considered to be useful.
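
As an illustration, the three metrics can be sketched in Python on a small transactional database. The data, the function names and the helper `extent_size` are hypothetical and serve only this example. Lift follows the verbal definition (Confidence divided by the support of $B$); taking the support of $B$ relative to $|G|$ is an assumption made here so that the threshold 1 is meaningful.

```python
from fractions import Fraction

# Toy transactional database (hypothetical data): each record is a set of
# coded indicators, in the spirit of an entry such as {1, 5, 7}.
TRANSACTIONS = [
    {1, 5, 7},
    {1, 5},
    {1, 7},
    {5, 7},
    {1, 5, 7},
]


def extent_size(itemset, transactions):
    """Return |X'|: the number of records that contain every item of X."""
    return sum(1 for record in transactions if itemset <= record)


def support(a, b, transactions):
    """Support(A -> B) = |(A u B)'| / |G|."""
    return Fraction(extent_size(a | b, transactions), len(transactions))


def confidence(a, b, transactions):
    """Confidence(A -> B) = |(A u B)'| / |A'| (A must occur at least once)."""
    return Fraction(extent_size(a | b, transactions),
                    extent_size(a, transactions))


def lift(a, b, transactions):
    """Confidence(A -> B) divided by the relative support |B'| / |G| of B.

    Normalizing the support of B by |G| is an assumption of this sketch.
    """
    support_b = Fraction(extent_size(b, transactions), len(transactions))
    return confidence(a, b, transactions) / support_b


rule = (frozenset({1}), frozenset({5}))
print(support(*rule, TRANSACTIONS))     # 3/5
print(confidence(*rule, TRANSACTIONS))  # 3/4
print(lift(*rule, TRANSACTIONS))        # 15/16
```

Exact fractions are used instead of floating-point numbers so that comparisons with thresholds such as 1 are not affected by rounding.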

The task of mining association rules is to find all association rules of the context for which the support and confidence values exceed the given thresholds min_support and min_confidence, respectively.

The search for frequent sets of data is limited by the minimum support value (min_support), which is set by the user [1–3]. The search for association rules is carried out within the frequent sets of data and is limited by the minimum confidence (min_confidence) and the utility value. The minimum confidence is generally set by the user.

The AprioriTid method, like the Apriori method, is based on the antimonotony property, the key property when finding multi-element frequent sets of data [4, 11]. It is formulated as follows:

$$\forall A, B \subseteq M: \quad A \subseteq B \Rightarrow \mathrm{Support}(B) \le \mathrm{Support}(A) \qquad (4)$$

It means that:

– with an increase of the set size, its support either decreases or does not change;
– for any set of characteristics, the support does not exceed the minimum support of any of its subsets;
– a set of $n$ characteristics will be frequent only if all its $(n-1)$-element subsets are frequent.
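
The last consequence is what makes it possible to discard candidates before their support is counted. A minimal Python sketch of this step is given below; the function name `generate_candidates` and the sample sets are hypothetical, and the join is written in the simplest possible form rather than as an optimized implementation.

```python
from itertools import combinations


def generate_candidates(frequent_prev, size):
    """Build size-element candidates from frequent (size-1)-element sets.

    A candidate is kept only if every one of its (size-1)-element subsets is
    frequent; all other candidates are discarded before support counting.
    """
    frequent_prev = set(frequent_prev)
    candidates = set()
    for x in frequent_prev:
        for y in frequent_prev:
            union = x | y
            if len(union) != size:
                continue
            if all(frozenset(sub) in frequent_prev
                   for sub in combinations(union, size - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-element sets of coded indicators.
frequent_pairs = {frozenset(p) for p in [(1, 5), (1, 7), (5, 7), (5, 9)]}
print(sorted(sorted(c) for c in generate_candidates(frequent_pairs, 3)))
# [[1, 5, 7]]
```

Here the sets {1, 5, 9} and {5, 7, 9} are never counted, because their subsets {1, 9} and {7, 9} are not frequent.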

Valid method choice

To select a method for searching association rules, the authors developed specific criteria and carried out a comparative analysis of several methods. The results are given in Table 1.

It is convenient to extract data from a database using the identifiers of database records, i.e. TIDs. A TID also makes it possible to identify whether the generated rules belong to a particular database record. The possibility of truncating candidates allows useless and unreliable rules to be cut off at their generation stage in order to optimize the memory used.

Software module of associative search of population life quality indicators

To solve the problem of forming a system of indicators that affect the life quality of the population, we developed a system whose architecture is presented in Figure 1.

The system starts with setting up the parameters, such as the minimum support (minsup), the minimum confidence (minconf) and a serial number of the experiment. The transaction content, i.e. each record in a database table, is a set of possible attributes, which are coded indicators of life quality. For example, in a database entry {1, 5, 7}, 1 is the indicator "Actually available income of the population, %", 5 is "Life expectancy at birth in years", and 7 is "The Gini coefficient (income concentration factor)".

Minimum support and minimum confidence are specified by the user.

While conducting experiments one can consider various sets of transactions and attributes; therefore such a parameter as "a serial number of the experiment" is used.

The function of rules generation is based on the AprioriTid method, the block diagram of which is shown in Figure 2.

It starts with generating single-element data sets that are candidates for rules. Support, i.e. the number of repetitions in all database transactions involved in the experiment, is counted for each of them.

Then two-element sets, three-element sets, ..., i-element sets, where 2 ≤ i ≤ k, are generated iteratively.

Identical sets, which are redundant, are removed from the resulting sets.

After that, support is calculated for each of the remaining sets, and the support value of the current set, jsup, is compared with the minimum support minsup set by the user.

If the condition jsup ≥ minsup is met, then the association rule formation begins; otherwise the current set is removed.

Confidence and utility (lift) are calculated for the generated rule.

If the confidence value is greater than or equal to the minimum confidence value and the lift value is greater than or equal to 1, then the rule is considered to be credible and useful; otherwise it is deleted.

[Figure 2: block diagram of rule generation based on the AprioriTid method. Blocks: generating single-element data sets and calculating their support; loop i = 2, ..., k: generating i-element data sets, removing redundant sets; loop j = 1, ..., count: calculating j-set support; test jsup ≥ minsup; if true, forming a rule and counting its utility; if false, deleting the set.]

Visualization of the results displays the initial transactions, the frequent sets of data and their support, the generated association rules and the values of the confidence and utility parameters for each of them.
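
The flow described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code of the developed system: the names `apriori_tid`, `form_rules` and `DATABASE` are hypothetical, the data are invented, minsup is taken as an absolute number of transactions, and lift is normalized by the number of records (Confidence divided by the relative support of the consequent). After the first pass, support is counted from a per-TID encoding of the candidates each record contains, which is the characteristic feature of AprioriTid.

```python
from itertools import combinations


def apriori_tid(transactions, minsup):
    """Return {itemset: support count} for all frequent itemsets.

    After the first pass the raw records are no longer read: each TID maps
    to the set of candidates of the previous size that the record contains.
    """
    # Pass 1: single-element sets and their support counts.
    counts = {}
    for record in transactions.values():
        for item in record:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= minsup}
    frequent = dict(level)
    tid_sets = {tid: {frozenset([i]) for i in rec}
                for tid, rec in transactions.items()}

    size = 2
    while level:
        # Join frequent (size-1)-sets and drop candidates that have an
        # infrequent (size-1)-subset (antimonotony of support).
        candidates = set()
        for x in level:
            for y in level:
                union = x | y
                if len(union) == size and all(
                        frozenset(s) in level
                        for s in combinations(union, size - 1)):
                    candidates.add(union)
        # Count support through the TID encoding instead of the database.
        counts = dict.fromkeys(candidates, 0)
        next_tid_sets = {}
        for tid, contained in tid_sets.items():
            present = {c for c in candidates
                       if all(frozenset(s) in contained
                              for s in combinations(c, size - 1))}
            for c in present:
                counts[c] += 1
            if present:  # records that contain no candidate drop out
                next_tid_sets[tid] = present
        level = {s: c for s, c in counts.items() if c >= minsup}
        frequent.update(level)
        tid_sets = next_tid_sets
        size += 1
    return frequent


def form_rules(frequent, n_records, minconf):
    """Form rules A -> B; keep those with conf >= minconf and lift >= 1."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(sorted(itemset), r):
                a = frozenset(left)
                b = itemset - a
                conf = count / frequent[a]
                # Assumed normalization: confidence / (|B'| / |G|).
                lift = count * n_records / (frequent[a] * frequent[b])
                if conf >= minconf and lift >= 1:
                    found.append((sorted(a), sorted(b), conf, lift))
    return found


# Hypothetical database: TID -> set of coded indicators.
DATABASE = {1: {1, 5, 7}, 2: {1, 5}, 3: {1, 5, 7}, 4: {7}, 5: {1, 5}}
frequent_sets = apriori_tid(DATABASE, minsup=2)
for a, b, conf, lift in form_rules(frequent_sets, len(DATABASE), minconf=0.8):
    print(a, "->", b, "confidence", conf, "lift", lift)
```

With these toy data the sketch keeps four rules, for example {1, 7} → {5} with confidence 1.0 and lift 1.25; rules such as {1} → {7} are rejected because their confidence is below the threshold.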

The results of the experiments

We carried out many experiments using the AprioriTid method to search for a system of indicators that affect life quality. The subsystem of indicators proposed by the authors in [10] was taken as input data. This subsystem provides eight main indicators of the population life quality and the factors that influence each of them.

Database transactions were formed from the original data; they contained varying numbers of attributes representing the coded life quality indicators and the factors identified according to the experts' opinion. In total, 25 transactions were formed, with the number of attributes ranging from five to seventeen. Transactions with five to fourteen attributes yielded no results; the generation of association rules begins when 15 attributes are used in a transaction. Figure 3 shows a fragment of the original database transactions with five and seven attributes.

Figure 4 shows a fragment of the frequent item sets containing six or seven attributes whose support value is equal to three. Based on these frequent item sets, four valid and useful rules were generated; they are presented in Table 2.

The experiment resulted in the generation of fourteen valid and useful association rules. Since any association rule is an implication, the rules can be combined through conjunction, provided that the conjunction is true. After transforming the resulting logical expression, five association rules were obtained. They are presented in Table 3.
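
The text does not state which logical transformations were applied. As an illustration only, one standard identity that allows rules with a common antecedent to be merged is

$$(A \to B) \land (A \to C) \;\equiv\; A \to (B \land C),$$

where $A$, $B$ and $C$ stand for conjunctions of indicators. Under this identity, two valid rules with the same left-hand side collapse into one rule whose right-hand side lists both consequents, which reduces the number of rules without losing information.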

Here it is seen that 15 database transactions were used to generate the association rule $251 \to 252$. This rule means that the "Mortality" indicator (252) is affected by the "Birth rate" indicator (251).

Or, for example, the rule $230 \land 235 \to 238 \land 239 \land 241$ means that the "Life quality index" (230) and "Purchasing power" (235) indicators are influenced by such indicators as "Paid services volume per capita" (238), "Growth rate of the minimum subsistence level" (239) and "Employment rate of the population" (241).

During the experiments, graphs were plotted. Figure 5 shows the relation between the number of rules and the number of transactions, together with a trend line.

Computational experiments with the developed software were carried out. They enabled us to obtain valid and useful association rules for the population life quality indicators; the number of rules depends on the input data.

The outcome of the experiments shows that the indicators and factors in each association rule are interrelated. In addition, the results demonstrate that it is possible to generate valid and useful association rules based on a transactional database. By performing logical transformations over them, one can create a system of life quality indicators, which can then be used to solve problems of analyzing and forecasting the population life quality.

This approach will enable the state authorities to adjust and reasonably develop strategic social and economic programs to improve the population life quality.

References

1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD conference on management of data. pp. 207–216. Washington, D.C. (1993)

2. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Advances in knowledge discovery and data mining, chap. Fast Discovery of Association Rules, pp. 307–328. American Association for Artificial Intelligence, Menlo Park, CA, USA (1996)

3. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases. pp. 487–499. Santiago, Chile (1994)

4. Billig, V.A., Tsaregorodcev, N.A., Ivanova, O.V.: Building association rules in medical diagnosis. International Journal of Software & Systems 2, 146–157 (2016)

5. Liu, H., Wang, B.: An association rule mining algorithm based on a boolean matrix. Data Science Journal 6, Supplement, 559–565 (2007)

6. Olson, D.L., Delen, D.: Advanced Data Mining Techniques. Springer Publishing Company, Incorporated (2008)

7. Oreshkov, V.: Fpg – an alternative search algorithm for association rules (2014), URL: https://basegroup.ru/community/articles/fpg

8. Rao, C.S., Babu, D.R., Shankar, R.S., Kumar, V.P., Rajanikanth, J., Sekhar, C.C.: Mining association rules based on boolean algorithm – a study in large databases. International Journal of Machine Learning and Computing 3(4), 347–350 (2013)

9. Sahaaya Arul Mary, S.A., Malarvizhi, M.: A new improved weighted association rule mining with dynamic programming approach for predicting a user's next access. In: Proceedings of the ICAITA conference. vol. 2, pp. 105–122. Dubai, UAE (2012)

10. Saktoev, V.E., Sadykova, E.T.: Sustainable Development of Regional Economic Systems with Environmental Regulations. ZAO "Economy", Moscow, Russia (2011)

11. Zayko, T.A., Oleinik, A.A., Subbotin, S.A.: Association rules in data mining. Bulletin of NTU "KhPI" 39(1012), 82–95 (2013)