=Paper= {{Paper |id=Vol-3304/paper31 |storemode=property |title=Research on Improved Apriori Algorithm Based on Simplified Boolean Matrix |pdfUrl=https://ceur-ws.org/Vol-3304/paper31.pdf |volume=Vol-3304 |authors=Jingpeng Ruan,Gang Fang }} ==Research on Improved Apriori Algorithm Based on Simplified Boolean Matrix== https://ceur-ws.org/Vol-3304/paper31.pdf
Research on Improved Apriori Algorithm Based on Simplified
Boolean Matrix
Jingpeng Ruan 1, and Gang Fang 2
1
    College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing, China
2
    College of Computer Science & Technology Chongqing Three Gorges University, Chongqing, China

                 Abstract
                 The execution of the Apriori algorithm requires multiple scans of the database and produces a
                 large number of unnecessary frequent itemsets. To improve the efficiency of the Apriori
                 algorithm, an improved Apriori algorithm based on the simplified Boolean matrix is proposed
                 in this paper. This work reduces the number of database scans by introducing a Boolean matrix,
                 reduces the generation of frequent itemsets by simplifying the matrix, and introduces weight
                 vectors to simplify the support of the Apriori algorithm. Experimental results show that the
                 running time of this algorithm is significantly reduced compared with the apriori algorithm,
                 which greatly improves the operational efficiency of the apriori algorithm.

                 Keywords 1
                 Apriori algorithm; Weight vector; Boolean matrix; Matrix simplification

1. Introduction

    Because of its simple principle and easy implementation, Apriori algorithm is very suitable for
database Association Rule Mining. However, the conventional Apriori algorithm will produce a large
number of frequent itemsets in the operation process, which affects the efficiency of the execution
process. In addition, the algorithm will scan the database many times in the execution process, which
will also cause the low efficiency of the algorithm[1-3].
    In order to solve these problems, researchers have proposed many schemes to improve the algorithm,
such as a constrained association rule algorithm to reduce the generation of frequent itemsets by adding
user interest items[4], and some researchers have proposed an incremental association rule algorithm to
limit transaction cardinality[5]. However, the above improved Apriori algorithm does not have
advantages in the evaluation of support degree[6].Therefore, some researchers proposed a matrix-based
association rule generation algorithm, which can not only ensure the support of the algorithm, but also
reduce the number of database scans[7, 8].
    Boolean matrix is a matrix in which all elements are not "0" or "1". The Boolean matrix is introduced
to process the dataset. By scanning the database once, the Boolean matrix corresponding to the database
is generated, which reduces the scanning time of the database and improves the operational efficiency.
In addition, on the basis of introducing Boolean matrix, the matrix is further simplified, and then the
itemsets is generated from the simplified matrix, which can greatly reduce the generation of frequent
itemsets and further improve the efficiency of operation[9].
    In summary, an improved Apriori algorithm based on simplified Boolean matrix is proposed. The
Boolean matrix is introduced, and the transaction is regarded as the row of the matrix, and the item is
regarded as the column of the matrix. If the transaction contains the item, it is denoted as "1", otherwise,
it is denoted as "0". In addition, the weight vector is introduced to simplify the calculation of support
degree.And add a column at the end of the matrix, which is used to record the number of "1" occurrences
in each row of the matrix. The next step is to simplify the matrix. When calculating the support degree
in the previous step, if the support degree of a frequent itemset is not satisfied, the corresponding column

ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-
23, 2022, Guangzhou, China
EMAIL: ry99997@163.com (Jingpeng Ruan); 350874385@qq.com (Gang Fang);
              © 2022 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)



                                                                                 257
of the item in the matrix can be directly deleted, so as to realize the simplification of the matrix,when
generating frequent k-itemsets from frequent (k-1)-itemsets, If a line n2.1. Therefore, the frequent 3-itemset is the
frequent 3-itemset. However, the number of items in this matrix is only 1, and the frequent 4-itemset
cannot be generated from the frequent 3-itemset, so the final frequent 3-itemset is {ABC}, and the
algorithm ends.

3.2.    Experimental Verification

   In order to verify that the improved algorithm really improves the time performance of Apriori
algorithm, movielens dataset is selected, different support degrees are set, and Python language is used
for implementation. The line chart of the experimental results is shown in Figure 1:




                                                      261
                                           350                                      Apriori
                                           300                                      SM-Apriori

                                           250




                                Time (s)
                                           200
                                           150
                                           100
                                            50
                                             0
                                                 0.020   0.021   0.022   0.023   0.024   0.025
                                                           Minimum Support
Figure 1 Time comparison of the two algorithms under different support degrees

   According to the simulation results, it can be seen that when the minimum support degree increases,
that is, when the item is more frequent and the correlation is stronger, the time of the two algorithms
begin to approach. However, when the value of the minimum support is small, that is, the correlation is
weak, the time required by the improved algorithm is much lower than that of the classical Apriori
algorithm. Therefore, the improvement can effectively improve the efficiency of the algorithm.

4. Conclusion

    Aiming at the problem that the classical Apriori algorithm overscans and produces a large number
of unnecessary candidate item sets, an improved Apriori algorithm based on reduced Boolean matrix is
proposed, and the weight vector is introduced to simplify the calculation of support degree. The
improved algorithm can only scan the database once to form the corresponding Boolean matrix, and the
following steps of the algorithm are carried out in this matrix. At the same time, a new method to
generate frequent k-itemsets from frequent (k-1)-itemsets is also proposed. The experimental
comparison shows that the improved algorithm can significantly reduce the scanning time and time
required by the algorithm on the premise of ensuring the correct results, so as to improve the efficiency
of data mining.By introducing mathematical knowledge such as matrix and vector to improve algorithm
efficiency, it is expected to become a conventional method to improve algorithm efficiency in the future.

5. References

[1] action dataset, 2013. URL: http: //www.iesl.cs.umass.edu/data/data-umasscitationfield.
[2] The Operation Optimization of Small Index Method Based on Improved Apriori Algorithm [J].
     Mechatronics, 2017.
[3] CHEN W, ZHANG X, MANAGEMENT D O. Excavation of key risk factors based on Apriori
     algorithm for fire and explosion in storage and transportation of hazardous chemicals [J]. Fire
     Safety Science, 2017.
[4] CHENG Z. Application of Data Mining in Power Network Safety Evaluation [J]. Electrical
     Engineering, 2010.
[5] CHENGYU C, YING X. Research and Improvement of Apriori Algorithm for Association Rules
     [J]. Physical Review A, 2016: 1-4.
[6] GONG Q, LIU X, ZENG Y, et al. An energy efficiency solution based on time series data mining
     algorithm on elementary school building [J]. International Journal of Low-Carbon Technologies,
     2022.
[7] LIN W, WU Z, ZHOU H, et al. Research on mining association between weather and power grid
     based on Apriori algorithm [J]. Electrical Engineering, 2019.
[8] PRASAD G D, HANMANDLU M, SAHA T K. Real-Time Decoupled Equivalent Models for
     Static Security Analysis [J]. IEEE Transactions on Power Systems, 1.
[9] QIAO H. Research On QAR Data Mining Method Based On Improved Association Rule [J].
     Physics Procedia, 2012.
[10] ZHU W, GUO Q. Study on Association Rules Mining Algorithm for Micro-grid Fault Detection
     [J]. Electrical Engineering, 2015.


                                                                 262