Formation of Life Quality Indicators System
 through Search Algorithm of Association Rules

 Lyudmila P. Bilgaeva, Dashidondok Sh. Shirapov, and Grigoriy V. Badmaev

       East Siberia State University of Technology and Management, Russia
                              http://www.esstu.ru


      Abstract. The paper is devoted to the search of association rules for the
      formation of the indicators system that affects the quality of life. The
      search of association rules is carried out in the transactional database
      based on the method of AprioriTid algorithm to calculate such metrics
      as support, confidence and lift. It results in the extraction of useful as-
      sociation rules showing the relationship of life quality indicators, which
      can be used later to solve the problems of analysis and forecasting.

      Keywords: extraction algorithm of frequent sets of database, the prop-
      erty of monotony, the associative search of life quality indicators, trun-
      cation of candidates


1   Introduction

At present, issues of life quality are relevant, as the current economic crisis has
primarily affected the population. In general, the standard of living depends on
a competent social policy pursued by the state. Solving social problems requires
the adoption of management decisions based on real information. This requires
research aimed at identifying the main factors affecting the life quality.
    In this paper we propose to use methods of searching association rules to
identify the most important indicators of life quality that will enable the au-
thorities to plan and implement certain measures to improve the population
living standards.
    To search association rules is one of the tasks of Data Mining, the modern
technology of intellectual data analysis, which includes finding regularities be-
tween some related events, the identification of related objects and their location
in the space of states. To find associations such a database is typically used in
which all objects are connected to each other, provided that the database is
consistent and integrative.


2   Basic theoretical principles of association rules search

There are many techniques, which allow solving the problem of finding associ-
ation rules. They have the same mathematical approach, but the ways of the
method implementation are different. Let us consider the basic theoretical prin-
ciples of these methods.
    The association rule of context K is an expression of the form

                                    A → B,

where A, B ⊆ M .
    The context K is a tuple (G, M, I), where G is a set of objects, M is a set
of features, but I ⊆ G × M .
    When association rules are searched, special metrics are used: Support,
Confidence, Lift.
    Association rule A → B Support is a quantity defined by the formula:

                                              |(A ∪ B)0 |
                         Support(A → B) =                                     (1)
                                                 |G|

   The Support value indicates which part of the G objects contains A ∪ B. The
Confidence of the association rules is defined by the formula:

                                                |(A ∪ B)0 |
                       Confidence(A → B) =                                    (2)
                                                   |A0 |

The Confidence value shows, which part of the objects that contain A, also
contains A ∪ B.
   The following quantity is called the association rule utility (Lift):

                                           |(A ∪ B)0 |
                           Lift(A → B) =                                      (3)
                                            |A0 | · |B 0 |

   In other words, the utility is the ratio of Confidence(A → B) to the Support(B).
The Lift value indicates the usefulness of the rule. If the found utility value is
more than 1, then the rule is considered to be useful.
   The task of mining Association rules is to find all Association rules of the
context for which the values support and confidence exceed certain set values
min_support and min_confidence, respectively.
   Searching the frequent sets of data is limited to the minimum support value
(min_support), which is set by the user [1–3]. Search of association rules is
made within the frequent sets of data and is limited to the minimum confidence
(min_confidence) and utility value. The minimum confidence is generally set
by the user.
   AprioriTid method, as well as the Apriori method, is based on the anti-
monotony property, the key property when finding multielement frequent sets of
data [4, 11]. It is formulated as follows:

               ∀A, B ⊆ M, A ⊆ B ⇒ Support(B) ≤ Support(A)                     (4)

   It means that:
 – with an increase of the set size its support either decreases or does not
   change;
 – for any set of characteristics support does not exceed the minimum support
   of any of its subsets;
 – the set of n size characteristics will be frequent only if all its n − 1-element
   subsets are frequent.


3       Valid method choice

To select a search method of the association rules the authors developed definite
criteria and comparatively analyzed the certain amount of methods. The results
are given in Table 1.


         Table 1. Comparative analysis of methods of association rules search

                                                    Criteria
No.      Methods      Implementation     Small                 Application Possibility of
                        simplicity     number of      Speed       of ID     candidates’
                                       candidates              transactions truncation
    1       AIS             +              −            −           −            −
    2     SETM              +              −            −           +            −
    3     Apriori           −              +            +           −            +
    4   AprioriTID          +              +            +           +            +
    5   AprioriSome         −              +            +           −            +
    6      FPG              −              +            +           −            +


    The most appropriate method to solve the task of the associative search of in-
dicators affecting the population life quality, is the AprioriTid method proposed
by the group of authors [2].
    Simplicity of implementation is associated with such a data structure as a
table for storing intermediate results.
    Other methods, e. g. Apriori or FPG, use trees as data structures. These
data structures are more complicated to implement [6, 7, 9]. There are methods
of searching for association rules based on the Boolean matrix [5, 8].
    It is convenient to extract data from a database applying database records
identifiers, i. e. TID. TID also enables you to identify whether the generated
rules belong to a particular database record.
    The possibility to truncate candidates allows cutting useless and unreliable
rules at their generation stage in order to optimize the memory used.
4      Software module of associative search of population life
       quality      indicators
       It is convenient to extract data from a database applying database records identifiers, i.e. TID.
TID also enables you to identify whether the generated rules belong to a particular database record.
      The possibility
To solve   the problem  to truncate
                              of thecandidates
                                      formation allows
                                                    of acutting
                                                          systemuseless and unreliable
                                                                   of indicators     thatrules  at their
                                                                                            affect   life
generation stage  in order to optimize the memory   used.
quality of the population, we developed a system the architecture of which is
4 Software module of associative search of population life quality indicators
presented    in the
      To solve   Figure    1. of the formation of a system of indicators that affect life quality of the
                     problem
population, we developed a system the architecture of which is presented in Figure 1.

                   Web application

                   Initial data input

                   Association rules                     API bitrix                      Database
                       search

                    Visualization of
                        results
                           Fig.1. Architecture for Association rules mining system
                    Fig. 1. Architecture for Association rules mining system
 The system consists of a web application and a database interacting with each other through the API
 bitrix component. The database was created in a DBMS MySQL. As a web server a freely
 distributable program OpenServer is used. It is a portable server platform, which is a medium for
 webThe       system The
       development.        consists      of a web isapplication
                                 Web application         composed of and          a database
                                                                            five pages:   the maininteracting      with
                                                                                                     page, parameters
 setup, other
each     transactions     and attributes
                    through      the API    management,    generation of rules,
                                                bitrix component.            The and     visualization
                                                                                    database     was ofcreated
                                                                                                          results. in a
        The system starts with setting up the parameters, such as the minimum support (minsup), the
DBMS         MySQL. As a web server a freely distributable program OpenServer is
 minimum confidence (minconf) and a serial number of the experiment. The transaction content, i.e.
used.     It
 each record  is in
                  a portable
                      a database server
                                    table, is platform,      whichattributes
                                                a set of possible       is a medium
                                                                                  which areforcoded
                                                                                                webindicators
                                                                                                      development.
                                                                                                                 of life
The    Web
 quality.   Forapplication
                 example, in a is     composed
                                   database           of five
                                               entry {1,  5, 7},pages:
                                                                   1 is anthe    mainofpage,
                                                                            indicator           parameters
                                                                                          "Actually              setup,
                                                                                                     available income
 of the population,
transactions          and%",attributes
                                5 - "Life expectancy
                                              management, at birthgeneration
                                                                      in years", 7 -of"The   Giniand
                                                                                         rules,   coefficient  (income
                                                                                                       visualization
 concentration
of  results. factor)."
        Minimum support and minimum confidence are specified by the user.
      The
        While system        starts
                   conducting         with setting
                                  experiments      one canupconsider
                                                                the parameters,           such as
                                                                           various transaction    andthe    minimum
                                                                                                       attributes  sets,
support       (minsup),
 therefore such      a parametertheasminimum          confidence
                                       a “serial number                 (minconf)
                                                           of the experiment”            and a serial number of
                                                                                   is used.
the experiment.
        The function The          transaction
                           of rules   generation is content,
                                                       based oni.the e. each    record
                                                                         AprioriTid      in a database
                                                                                       method,  the block table,
                                                                                                            diagramisofa
 which
set   of ispossible
             shown inattributes
                          Figure 2.       which are coded indicators of life quality. For example,
        It starts with generating single-element data sets that are candidates for rules. Support, i.e, the
in   a database          entry {1, 5, 7}, 1 is an indicator of “Actually available income
 number of repetitions in all database transactions involved in the experiment, is counted for each of
of   the
 them.      population,        %”, 5 – “Life expectancy at birth in years”, 7 – “The Gini
coefficient       (income concentration
        Then two-element                                sets, ..., i-element sets, where 2 ≤ i ≤ k, are generated in
                                 sets, three-elementfactor)”.
 the iteration.
      Minimum support and minimum confidence are specified by the user.
        The same sets that are redundant are removed from the resulted sets.
      While      conducting experiments one can consider various transaction and at-
        After that support is calculated for each of the remain database sets, then the current set
tributes      sets,
 support value jsup    therefore
                          is compared suchwithathe
                                                 parameter        as a minsup,
                                                    minimal support      “serial set
                                                                                   number      of the experiment”
                                                                                       by the user.
is used.If the condition jsup ≥ minsup is met, then the association rule formation begins, otherwise
 the current   set is removed.
      The function          of rules generation is based on the AprioriTid method, the block
diagram Confidence
               of which  and utility
                              is shown(lift) are Figure 2.for the generated rule.
                                             in calculated
        If the confidence value is greater than or equal to the minimum confidence value and the lift
 valueItisstarts
             greaterwith
                       thangenerating
                             or equal to 1,single-element           data sets
                                              then the rule is considered         that
                                                                               to be     are candidates
                                                                                     credible                for rules.
                                                                                               and useful, otherwise   it
Support,
 is deleted.    i. e,  the    number       of  repetitions      in   all  database      transactions     involved     in
the experiment, is counted for each of them.
                                        3
    Then two-element sets, three-element sets, . . . , i-element sets, where 2 ≤ i ≤
k, are generated in the iteration.
                                                       …

                                          Generating single-element
                                         data sets and calculating their
                                                     support


                                                    i = 2, k


                                         Generating i-element data sets            …

                                           Removing redundant sets


                                                 j = 1, count


                                           Calculating j-set support


                                                                           false
                                                jsup ≥ minsup

                                                          true
                                         Forming a rule and counting               Deleting of set
                                                 its utility


                          Fig. 2. BlockFig.
                                          diagram   of association
                                              2. Block   diagram rules    generation rules generation
                                                                    of association
        Visualization of the results allows us displaying the initial transactions, frequent sets of data
and their support, the generated association rules and the values of the confidence and utility
parameters for each of them.
5 The results of the experiments
        We made many    The    same sets
                           experiments      that
                                         with  the are  redundant
                                                   AprioriTID         areofremoved
                                                                 method      associationfrom
                                                                                           rulethe   resulted
                                                                                                to search  for asets.
                        After
system of indicators that        that
                            affect lifesupport   is calculated
                                        quality. The  subsystem offorthe
                                                                       each   of the proposed
                                                                          indicators  remain database       sets, then the
                                                                                                 by the authors
in [10] was takencurrent
                     as input set
                               data.support   value jsup
                                       This subsystem         is compared
                                                         provides  eight mainwith      the minimal
                                                                                 indicators             support minsup,
                                                                                             of the population
life quality and theset  by the
                      factors thatuser.
                                   influence on each of them.
        Database transactions were formed from the original data, which contained a various
                        If the condition jsup ≤ minsup is met, then the association rule formation
number of attributes representing the coded life quality indicators and factors distinguished
                   begins, otherwise the current set is removed.
according to the experts’ opinion. Overall, there were formed 25 transactions with the various
                        Confidence
number of attributes from               and utility
                               five to seventeen.      (Lift)
                                                    When       arethe
                                                           using    calculated
                                                                       transactionsforwith
                                                                                        thefive
                                                                                             generated
                                                                                                 attributesrule.
                                                                                                            and
more, fourteen ones Ifincluded,
                           the Confidence value is greater than or equal to the minimumofconfidence
                                      there  were   no  results  of  the   experiments.    The   generation
                   value and the lift value is greater
                                                4      than or equal to 1, then the rule is considered
                   to be credible and useful, otherwise it is deleted.
   Visualization of the results allows us displaying the initial transactions, fre-
quent sets of data and their support, the generated association rules and the
values of the confidence and utility parameters for each of them.


5   The results of the experiments
We made many experiments with the AprioriTID method of association rule
to search for a system of indicators that affect life quality. The subsystem of
the indicators proposed by the authors in [10] was taken as input data. This
subsystem provides eight main indicators of the population life quality and the
factors that influence on each of them.
    Database transactions were formed from the original data, which contained
a various number of attributes representing the coded life quality indicators
and factors distinguished according to the experts’ opinion. Overall, there were
formed 25 transactions with the various number of attributes from five to seven-
teen. When using the transactions with five attributes and more, fourteen ones
included, there were no results of the experiments. The generation of association
rules begins with using 15 attributes in a transaction. Figure 3 shows a fragment
of the original database transaction with five and seven attributes.


            Fig. 3. Original transactions with five and seven attributes


    In Figure 4 you can see a fragment of frequent item sets containing six or
seven attributes, the support value of which is equal to three. Four valid useful
rules presented in Table 2 were generated based on the frequent item sets above.
Fig. 4. Fragment of the frequent item sets with six or seven attributes with their
support values

                             Table 2. Valid useful rules

                             Rules        Confidence       Lift
                          248 → 249 0.857142857143          1
                          251 → 252        1                1
                          257 → 259        1                1
                        234, 243 → 244     1                1


   The experiment resulted in the generation of fourteen valid and useful as-
sociation rules. Since any association rule is an operation of implication, it is
possible to combine them through a conjunction operation provided that the
conjunction is true. After converting a logical expression five association rules
were obtained. They are represented in Table 3.


                       Table 3. Results of the experiments

       Number of database
 No.                                           Association rules
          transactions
  1           15            251 → 252
  2           16            (248 → 249) ∧ (251 → 252)
  3           18            (248 → 249) ∧ (251 → 252) ∧ (257 → 259)
  4           20            (257 → 259) ∧ (234 ∧ 243 → 244) ∧ (248 → 249)
  5           23            (248 → 249) ∧ (257 → 259) ∧ (234 ∧ 243 → 244)∧ (230 ∧
                            235 → 238 ∧ 239 ∧ 241)


    Here it is seen that to generate the association rule 251 → 252 15 database
transactions were used. This rule means that the “Mortality” indicator (252) is
affected by the “Birth rate” indicator (251).
    Or, for example, Rule 230 ∧ 235 → 238 ∧ 239 ∧ 241 means that “Life quality
index” (230) and “Purchasing power” (235) indicators are influenced on with
such indicators as “Paid services volume per capita” (238), “Growth rate of the
minimum subsistence level” (239) and “Employment rate of the population”
(241).
    During the experiments the graphs were plotted. Figure 5 shows the graph
of relation between the number of rules and the number of transactions, a trend
line was made.


Fig. 5. Graph of relation between the number of rules and the number of transactions


    Figure 5 demonstrates that the number of rules depends on the number
of database transactions. The greater the number of transactions is, the more
association rules are generated, as evidenced by the trend line.
    In another chart shown in Figure 6, you can see the dependence of the number
of rules on the number of features in the transaction and the trend line. It should
be noted that the more elements in the transaction are, the more association
rules are generated. For example, if you have 12 features in the transaction the
maximum number of rules generated is equal to 4. You can see that the value
4 corresponds to 23 transactions, each one including 12 features, as shown in
Figure 6.
    Therefore, we can conclude that the number of rules depends on the number
of database transactions and the number of features in these transactions.


6   Conclusion

Computational experiments with the developed software were carried out. They
enabled us to obtain valid and useful association rules for the population life
quality indicators, the number of which depends on the input data.
    The experiments outcome shows that the indicators and factors in each asso-
ciation rule are interrelated. In addition, the results obtained demonstrate that it
is possible to generate valid and useful association rules based on a transactional
database. Having performed logical transformations over them, one can create
a system of life quality indicators, which then can be used to solve problems of
analyzing and forecasting the population life quality.
Fig. 6. Dependence of the number of rules on the number of features in the transaction


   This approach will enable the state authorities to correct and reasonably
develop strategic social and economic programs to improve the population life
quality.


References

 1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of
    items in large databases. In: Proceedings of the ACM SICMOD conference on
    management of data. pp. 207–216. Washington, D.C. (1993)
 2. Agrawal, R., Mannila, H., Stricant, R., Toivonen, H., Verkamo, A.I.: Advances in
    knowledge discovery and data mining, chap. Fast Discovery of Association Rules,
    pp. 307–328. American Association for Artificial Intelligence Menlo Park, CA, USA
    (1996)
 3. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large
    databases. In: Proceedings of the 20th International Conference on Very Large
    Databases. pp. 487–499. Santiago, Chili (1994)
 4. Billig, V.A., Tsaregorodcev, N.A., Ivanova, O.V.: Building association rules in
    medical diagnosis. International Journal of Software & Systems 2, 146–157 (2016)
 5. Liu, H., Wang, B.: An association rule mining algorithm based on a boolean matrix.
    Data Science Journal 6, Supplement, 559–565 (2007)
 6. Olson, D.L., Delen, D.: Advanced Data Mining Techniques. Springer Publishing
    Company, Incorporated (2008)
 7. Oreshkov, V.: Fpg – an alternative search algorithm for association rules (2014),
    uRL: https://basegroup.ru/community/articles/fpg
 8. Rao, C.S., Babu, D.R., Shankar, R.S., Kumar, V.P., Rajanikanth, J., Sekhar, C.C.:
    Mining association rules based on boolean algorithm – a study in large databases.
    International Journal of Machine Learning and Computing 3(4), 347–350 (2013)
 9. Sahaaya Arul Mary, S.A., Malarvizhi, M.: A new improved weighted association
    rule mining with dynamic programming approach for predicting a user’s next ac-
    cess. In: Proceedings of the ICAITA conference. vol. 2, pp. 105–122. Dubai, UAE
    (2012)
10. Saktoev, V.E., Sadykova, E.T.: Sustainable Development of Regional Economic
    Systems with Environmental Regulations. ZAO “Economy”, Moscow, Russia
    (2011)
11. Zayko, T.A., Oleinik, A.A., Subbotin, S.A.: Association rules in data mining. Bul-
    letin of NTU “KhPI” 39(1012), 82–95 (2013)