Research on Improved Apriori Algorithm Based on Simplified Boolean Matrix Jingpeng Ruan 1, and Gang Fang 2 1 College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing, China 2 College of Computer Science & Technology Chongqing Three Gorges University, Chongqing, China Abstract The execution of the Apriori algorithm requires multiple scans of the database and produces a large number of unnecessary frequent itemsets. To improve the efficiency of the Apriori algorithm, an improved Apriori algorithm based on the simplified Boolean matrix is proposed in this paper. This work reduces the number of database scans by introducing a Boolean matrix, reduces the generation of frequent itemsets by simplifying the matrix, and introduces weight vectors to simplify the support of the Apriori algorithm. Experimental results show that the running time of this algorithm is significantly reduced compared with the apriori algorithm, which greatly improves the operational efficiency of the apriori algorithm. Keywords 1 Apriori algorithm; Weight vector; Boolean matrix; Matrix simplification 1. Introduction Because of its simple principle and easy implementation, Apriori algorithm is very suitable for database Association Rule Mining. However, the conventional Apriori algorithm will produce a large number of frequent itemsets in the operation process, which affects the efficiency of the execution process. In addition, the algorithm will scan the database many times in the execution process, which will also cause the low efficiency of the algorithm[1-3]. In order to solve these problems, researchers have proposed many schemes to improve the algorithm, such as a constrained association rule algorithm to reduce the generation of frequent itemsets by adding user interest items[4], and some researchers have proposed an incremental association rule algorithm to limit transaction cardinality[5]. However, the above improved Apriori algorithm does not have advantages in the evaluation of support degree[6].Therefore, some researchers proposed a matrix-based association rule generation algorithm, which can not only ensure the support of the algorithm, but also reduce the number of database scans[7, 8]. Boolean matrix is a matrix in which all elements are not "0" or "1". The Boolean matrix is introduced to process the dataset. By scanning the database once, the Boolean matrix corresponding to the database is generated, which reduces the scanning time of the database and improves the operational efficiency. In addition, on the basis of introducing Boolean matrix, the matrix is further simplified, and then the itemsets is generated from the simplified matrix, which can greatly reduce the generation of frequent itemsets and further improve the efficiency of operation[9]. In summary, an improved Apriori algorithm based on simplified Boolean matrix is proposed. The Boolean matrix is introduced, and the transaction is regarded as the row of the matrix, and the item is regarded as the column of the matrix. If the transaction contains the item, it is denoted as "1", otherwise, it is denoted as "0". In addition, the weight vector is introduced to simplify the calculation of support degree.And add a column at the end of the matrix, which is used to record the number of "1" occurrences in each row of the matrix. The next step is to simplify the matrix. When calculating the support degree in the previous step, if the support degree of a frequent itemset is not satisfied, the corresponding column ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21- 23, 2022, Guangzhou, China EMAIL: ry99997@163.com (Jingpeng Ruan); 350874385@qq.com (Gang Fang); © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 257 of the item in the matrix can be directly deleted, so as to realize the simplification of the matrix,when generating frequent k-itemsets from frequent (k-1)-itemsets, If a line n2.1. Therefore, the frequent 3-itemset is the frequent 3-itemset. However, the number of items in this matrix is only 1, and the frequent 4-itemset cannot be generated from the frequent 3-itemset, so the final frequent 3-itemset is {ABC}, and the algorithm ends. 3.2. Experimental Verification In order to verify that the improved algorithm really improves the time performance of Apriori algorithm, movielens dataset is selected, different support degrees are set, and Python language is used for implementation. The line chart of the experimental results is shown in Figure 1: 261 350 Apriori 300 SM-Apriori 250 Time (s) 200 150 100 50 0 0.020 0.021 0.022 0.023 0.024 0.025 Minimum Support Figure 1 Time comparison of the two algorithms under different support degrees According to the simulation results, it can be seen that when the minimum support degree increases, that is, when the item is more frequent and the correlation is stronger, the time of the two algorithms begin to approach. However, when the value of the minimum support is small, that is, the correlation is weak, the time required by the improved algorithm is much lower than that of the classical Apriori algorithm. Therefore, the improvement can effectively improve the efficiency of the algorithm. 4. Conclusion Aiming at the problem that the classical Apriori algorithm overscans and produces a large number of unnecessary candidate item sets, an improved Apriori algorithm based on reduced Boolean matrix is proposed, and the weight vector is introduced to simplify the calculation of support degree. The improved algorithm can only scan the database once to form the corresponding Boolean matrix, and the following steps of the algorithm are carried out in this matrix. At the same time, a new method to generate frequent k-itemsets from frequent (k-1)-itemsets is also proposed. The experimental comparison shows that the improved algorithm can significantly reduce the scanning time and time required by the algorithm on the premise of ensuring the correct results, so as to improve the efficiency of data mining.By introducing mathematical knowledge such as matrix and vector to improve algorithm efficiency, it is expected to become a conventional method to improve algorithm efficiency in the future. 5. References [1] action dataset, 2013. URL: http: //www.iesl.cs.umass.edu/data/data-umasscitationfield. [2] The Operation Optimization of Small Index Method Based on Improved Apriori Algorithm [J]. Mechatronics, 2017. [3] CHEN W, ZHANG X, MANAGEMENT D O. Excavation of key risk factors based on Apriori algorithm for fire and explosion in storage and transportation of hazardous chemicals [J]. Fire Safety Science, 2017. [4] CHENG Z. Application of Data Mining in Power Network Safety Evaluation [J]. Electrical Engineering, 2010. [5] CHENGYU C, YING X. Research and Improvement of Apriori Algorithm for Association Rules [J]. Physical Review A, 2016: 1-4. [6] GONG Q, LIU X, ZENG Y, et al. An energy efficiency solution based on time series data mining algorithm on elementary school building [J]. International Journal of Low-Carbon Technologies, 2022. [7] LIN W, WU Z, ZHOU H, et al. Research on mining association between weather and power grid based on Apriori algorithm [J]. Electrical Engineering, 2019. [8] PRASAD G D, HANMANDLU M, SAHA T K. Real-Time Decoupled Equivalent Models for Static Security Analysis [J]. IEEE Transactions on Power Systems, 1. [9] QIAO H. Research On QAR Data Mining Method Based On Improved Association Rule [J]. Physics Procedia, 2012. [10] ZHU W, GUO Q. Study on Association Rules Mining Algorithm for Micro-grid Fault Detection [J]. Electrical Engineering, 2015. 262