An Approach to Data Mining Inside PostgreSQL Based on Parallel Implementation of UDFs

© Timofey Rechkalov    © Mikhail Zymbler
South Ural State University, Chelyabinsk, Russia
trechkalov@yandex.ru    mzym@susu.ru

Abstract. Relational DBMSs remain the most popular tool for data processing. However, most stand-alone data mining packages process flat files outside a DBMS. In-database data mining avoids the export-import bottleneck of stand-alone mining packages and keeps all the benefits provided by a DBMS. This paper describes an approach to data mining inside PostgreSQL based on a parallel implementation of user-defined functions (UDFs) for modern Intel many-core platforms. Each UDF performs a single mining task on data from a specified table and produces a resulting table. The UDF is organized as a wrapper of an appropriate mining algorithm, which is implemented in the C language and parallelized with OpenMP and thread-level parallelism. The library of such UDFs maintains a cache of precomputed mining structures to reduce the cost of computations. We compare the performance of our approach with the R data mining package, and experiments show the efficiency of the proposed approach.

Keywords: data mining, in-database analytics, PostgreSQL, thread-level parallelism, OpenMP.

Proceedings of the XIX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL'2017), Moscow, Russia, October 10–13, 2017

1 Introduction

Currently, relational DBMSs remain the most popular facility for storing, updating and querying structured data. At the same time, most data mining algorithms suppose processing of flat files outside a DBMS. However, exporting data sets and importing mining results impede the analysis of large databases outside a DBMS [18]. In addition to avoiding the export-import bottleneck, data mining inside a DBMS provides many benefits for the end user, such as query optimization, data consistency and security.

Existing approaches to integrating data mining with relational DBMSs include special data mining languages and SQL extensions, implementation of mining algorithms in plain SQL, and user-defined functions (UDFs) implemented in a high-level language like C++. The latter approach can serve as a subject for applying parallel processing on modern many-core platforms.

In this paper, we present an approach to data mining inside the PostgreSQL open-source DBMS that exploits the capabilities of the modern Intel MIC (Many Integrated Core) [2] platform. Our approach supposes a library of UDFs, each of which performs a single mining task on data from a specified table and produces a resulting table. The UDF is organized as a wrapper of an appropriate mining algorithm, which is implemented in the C language and parallelized for the Intel MIC platform with OpenMP and thread-level parallelism.

The paper is structured as follows. We describe the proposed approach in Section 2. The results of experimental evaluation of our approach are given in Section 3. Section 4 briefly discusses related work. Section 5 contains summarizing comments and directions for future research.

2 Embedding of data mining functions into PostgreSQL

2.1 Motivation example

Our approach aims to provide a database application programmer with a library of data mining functions that can be run inside the DBMS, as shown in Fig. 1.

Figure 1 An example of using a data mining function inside PostgreSQL

In this example, the mining function performs clustering by the Partitioning Around Medoids (PAM) [8] algorithm for the data points from the specified input table and saves the results in an output table (with respect to the specified number of the input table's columns, the number of clusters and the accuracy). An application programmer is not obliged to export the data to be mined from the DBMS and import the mining results back.
At the same time, here PAM encapsulates a parallel implementation [24] based on OpenMP and thread-level parallelism.

2.2 Component structure

Fig. 2 depicts the component structure of our approach. The pgMining is a library of data mining functions, each of which runs inside PostgreSQL. The mcMining is a library that exports data mining functions, which are parallelized for modern many-core platforms and are subject to wrapping by the respective functions of the pgMining library. The implementation of the pgMining library uses PostgreSQL's SPI (Server Programming Interface), which provides low-level functions for data access.

Figure 2 Component structure of the proposed approach

The pgMining library consists of the two following subsystems, namely Frontend and Backend, where the former provides the presentation layer and the latter the data access layer of concerns for an application programmer. The Frontend provides a set of functions for mining inside PostgreSQL. Each function performs a single mining task (e.g. clustering, classification, pattern search, etc.) and produces a resulting table.

The Backend consists of two modules, namely Wrapper and Cache manager. The Wrapper provides functions that serve as envelopes for the respective mining functions of the mcMining library. The Cache manager supports a cache of precomputed mining structures to reduce the cost of computations.

The mcMining library provides a set of functions that solve various data mining tasks in main memory and exploit the capabilities of Intel many-core platforms.

2.3 Frontend

An example of a Frontend function is given in Fig. 3. Such a function connects to PostgreSQL, carries out some mining task and returns an exit code (0 in case of success, otherwise a negative error code). As a side effect, the function creates a table with the mining results. The function's mandatory parameters are the ID of the PostgreSQL connection, the name of the input table, the name of the output table and the number of first left columns in the input table containing the data to be mined. The rest of the parameters are specific to the task (e.g. number of clusters, accuracy, etc.).

Figure 3 Interface and implementation schema of a function from Frontend

In fact, a Frontend function wraps the respective UDF from the Backend, which is loaded into PostgreSQL and executed as an "INSERT INTO … SELECT …" query to save the mining results in the specified table.

2.4 Backend

Fig. 4 depicts an example of a Wrapper function. Such a function is a UDF, which wraps a parallelized mining function from mcMining and performs as follows. Firstly, the function parses its input to form the parameters to call the mcMining function with. After that, the function checks whether the input table and/or auxiliary mining structures are in the cache maintained by the Cache manager, and loads them if not. Finally, the mcMining function is called with the appropriate parameters.

Figure 4 Interface and implementation schema of a function from Backend

The Cache manager provides a buffer pool to store precomputed mining structures. A distance matrix is a typical example of a mining structure to be saved in the cache. Indeed, the distance matrix A=(aij) stores the distances between each pair of elements ai and aj of the input data set. Being precomputed once, the distance matrix can be used many times to perform clustering or kNN-based classification with various parameters (e.g. number of clusters, number of neighbors, accuracy, etc.).
The Cache manager exports the two basic functions depicted in Fig. 5. The putObject function loads a mining structure, specified by its ID, buffer pointer and size, into the cache. The getObject function searches the cache for an object with the given ID. The ID of a mining structure is a string made as the concatenation of the input table's name and the object's informational string (e.g. "_distMatrix").

Figure 5 Interface of the Cache manager module

2.5 Library of parallel many-core algorithms

Fig. 6 gives an example of a function from the mcMining library. Such a function encapsulates a parallel implementation through OpenMP and thread-level parallelism for Intel many-core platforms.

Figure 6 Interface of a function from the mcMining library

In this example, we use the Partitioning Around Medoids (PAM) [8] clustering algorithm, which is used in a wide spectrum of applications where minimal sensitivity to noisy data is required. PAM provides this property since it represents cluster centers by points of the input data set (medoids).

PAM firstly calculates the distance matrix for the given data points. Then, in the BUILD phase, an initial clustering is obtained by the successive selection of medoids until the required number of clusters has been found. Next, in the SWAP phase, the algorithm attempts to improve the clustering in accordance with an objective function. However, for large and high-dimensional datasets, PAM's computations are very costly.

In our previous research [24], we parallelized PAM for the Intel Xeon CPU and the Intel Xeon Phi coprocessor. In order to perform best on Intel many-core platforms, the parallel version of PAM exploits modifications of loops to provide vectorization of calculations, and chunk-by-chunk data processing to decrease the number of cache misses.

3 Experimental evaluation

3.1 Hardware, datasets and goals of experiments

To evaluate the developed approach, we performed experiments on the Tornado SUSU supercomputer [9], whose node provides two different platforms, namely an Intel Xeon CPU and an Intel Xeon Phi coprocessor (cf. Tab. 1 for the specifications).

Table 1 Specifications of hardware

Specification               CPU                 Coprocessor
Model                       Intel Xeon X5680    Intel Xeon Phi SE10X
Cores                       2×6                 61
Frequency, GHz              3.33                1.1
Threads per core            2                   4
Peak performance, TFLOPS    0.371               1.076
Memory, GB                  24                  8
Cache, MB                   12                  30.5

In the experiments, we used datasets with the characteristics depicted in Tab. 2.

Table 2 Summary of datasets used in experiments

Dataset                   Dimension   # clusters   # data points, ×2^10
FCS Human [3]             423         10           18
MixSim [13]               5           10           35
US Census [12]            67          10           35
Power Consumption [10]    3           10           35

In the experiments, we studied the following aspects of the developed approach. Firstly, we investigated the speedup of the mcPAM function to understand its scalability on both platforms depending on the number of threads employed. Secondly, we evaluated the runtime of the mcPAM function to understand how the performance on both platforms depends on the number of data points and what benefits we can derive from precomputation of the distance matrix. Finally, we compared the performance of the pgPAM function with the implementation of the PAM algorithm from the R data mining package [13].

3.2 Results of experiments

The results of the first series of experiments, on mcPAM speedup, are depicted in Fig. 7. On both platforms, mcPAM's speedup is close to linear when the number of threads matches the number of physical cores the algorithm is running on (i.e. 12 cores for Intel Xeon and 60 cores for Intel Xeon Phi, respectively). Speedup becomes sub-linear when the algorithm uses more than one thread per physical core. The mcPAM achieves up to 15× and 120× speedup on Intel Xeon and Intel Xeon Phi, respectively. Summing up, mcPAM demonstrates good scalability on both platforms.

Figure 7 Speedup of the mcPAM function: (a) Intel Xeon CPU; (b) Intel Xeon Phi coprocessor

Fig. 8 shows the results of the second series of experiments, on mcPAM performance. As can be seen, PAM's SWAP phase performs better on Intel Xeon, while the BUILD phase performance is equal on both platforms. Overall performance is better on Intel Xeon Phi than on Intel Xeon when the algorithm deals with a high-dimensional dataset, due to the possibility of intensive vectorization in the calculation of the distance matrix. Since the calculation of the distance matrix takes from 15 to 80 percent of the overall runtime, we can derive substantial benefits from caching the distance matrix.

Figure 8 Performance of the mcPAM function: (a) FCS Human dataset; (b) MixSim dataset; (c) US Census dataset; (d) Power Consumption dataset

The results of the third series of experiments, on comparing the performance of pgPAM and PAM from the R data mining package, are illustrated in Fig. 9. We carried out this series of experiments on the Intel Xeon platform only, for the following reason. Running PostgreSQL on the Intel MIC platform demands the Intel Xeon Phi Knights Landing (KNL), which is the next-generation product from Intel and is a bootable device. However, Intel Xeon Phi KNL is not yet available at the Tornado SUSU supercomputer. We plan to perform this study as further research.

Figure 9 Performance of the pgPAM function (on Intel Xeon): (a) FCS Human dataset; (b) MixSim dataset; (c) US Census dataset; (d) Power Consumption dataset

We can see that pgPAM significantly overtakes R's PAM in both cases, when one thread or the maximum number of threads is employed. Caching of the distance matrix improves the performance by up to 80 percent of the overall runtime (in the case of a high-dimensional dataset).
4 Related work

The problem of integrating data analytics with relational DBMSs has been studied since data mining research originated.

Data mining query languages include DMQL [5], MSQL [7], the MINE RULE operator [14] and Microsoft's DMX [28].

There are many SQL implementations of data mining algorithms. SQL versions of classical clustering algorithms include K-Means [16], EM [19] and Fuzzy C-Means [15]. SQL versions of association rule mining algorithms include K-Way-Join, Three-Way-Join, Subquery and Two-Group-Bys [25], Set-oriented Apriori [29], Quiver [23] and Propad [27]. Classification includes SQL implementations of decision trees [26], kNN [31] and Bayesian classification [21]. SQL is also successfully used in mining applications for data of a "non-relational" nature, such as graphs, for instance in the search for frequent graphs [4], detection of cycles in a graph [1], graph partitioning [11, 22], etc.

User-defined function-based approaches. The integration of correlation, linear regression, PCA and clustering into the Teradata DBMS based on UDFs is proposed in [17]. There are two sets of UDFs that work in a single table scan, that is, an aggregate UDF to compute summary matrices and a set of scalar UDFs to score data sets. Experiments showed that the UDFs are faster than SQL queries and more efficient than C++, due to long export times. In [20], UDFs implementing common vector operations were presented, and it was shown that UDFs are as efficient as automatically generated SQL queries with arithmetic expressions, and that queries calling scalar UDFs are significantly more efficient than equivalent queries using SQL aggregations.

In-database mining frameworks. ATLAS [30] is a framework for in-database analytics, which provides an SQL-like database language with user-defined aggregates (UDAs) and table functions. The system's language processor translates ATLAS programs into C++ code, which is then compiled and linked with the database storage manager and user-defined external functions. The authors presented ATLAS-based implementations of several data mining algorithms.

MADlib [6] is an open-source library of in-database analytical algorithms for PostgreSQL. MADlib is implemented by a big team and provides many methods for supervised learning, unsupervised learning and descriptive statistics. MADlib exploits UDAs, UDFs and a sparse-matrix C library to provide efficient representations on disk and in memory. As many statistical methods are iterative (i.e. they make many passes over a data set), the authors wrote a driver UDF in Python to control iteration in such a way that all large data movement is done within the database engine and its buffer pool.

Comparison. In this paper, we suggest an approach to embedding data mining functions into PostgreSQL. Like some of the methods mentioned above, our approach exploits UDFs. The differences from the previous works are as follows. Our approach supposes parallelization of UDFs for the many-core platform the current DBMS is running on. All the parallelization details are encapsulated in the implementation of the UDF and are hidden from the DBMS, so our approach could be ported to some other open-source DBMS (with possible non-trivial but mechanical software development effort). In addition, our approach supposes a special module, which provides a cache of precomputed mining structures and lets the UDF reuse these structures to reduce the cost of computations.

5 Conclusion

In this paper, we touch upon the problem of organizing data mining inside a DBMS. We present an approach to the implementation of in-database analytical functions for PostgreSQL that exploits the capabilities of modern Intel many-core platforms.

Our approach supposes the implementation of two libraries, namely pgMining and mcMining. The pgMining is a library of data mining functions, each of which runs inside PostgreSQL. The mcMining is a library that exports functions solving various data mining tasks, which are parallelized for Intel MIC platforms.

The pgMining consists of the Frontend and Backend subsystems. A Frontend function loads a UDF from the Backend into PostgreSQL and executes it as an "INSERT INTO … SELECT …" query to save the mining results in a table. The Backend consists of the Wrapper and Cache manager modules. The Wrapper provides functions that serve as envelopes for the respective mcMining mining functions. The Cache manager supports a cache of precomputed mining structures to reduce the cost of computations.

Since our approach assumes hiding the details of parallel implementation from PostgreSQL, it could be ported to some other open-source DBMS (with possible non-trivial but mechanical software development effort).

We have evaluated our approach on a previously implemented parallel clustering algorithm of the mcMining library and four real datasets. Experiments showed good speedup and performance of the algorithm; our approach also derives benefits from caching precomputed mining structures and overtakes the R data mining package.

As future work, we plan to implement other mining algorithms for the mcMining library and conduct experiments on the Intel Xeon Phi Knights Landing platform.
Acknowledgement

This work was financially supported by the Russian Foundation for Basic Research (grant No. 17-07-00463), by Act 211 of the Government of the Russian Federation (contract No. 02.A03.21.0011) and by the Ministry of Education and Science of the Russian Federation (government order 2.7905.2017/8.9).

References

[1] Balachandran, R., Padmanabhan, S., Chakravarthy, S.: Enhanced DB-Subdue: Supporting Subtle Aspects of Graph Mining Using a Relational Approach. In: W.K. Ng, M. Kitsuregawa, J. Li, K. Chang (eds.) Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conf., PAKDD 2006, Singapore, April 9–12, 2006, Proc., Lecture Notes in Computer Science, 3918, pp. 673-678. Springer (2006). doi:10.1007/11731139_77
[2] Duran, A., Klemm, M.: The Intel Many Integrated Core Architecture. In: W.W. Smari, V. Zeljkovic (eds.) HPCS, pp. 365-366. IEEE (2012)
[3] Engreitz, J.M., Daigle, B.J., Jr., Marshall, J.J., Altman, R.B.: Independent Component Analysis: Mining Microarray Data for Fundamental Human Gene Expression Modules. J. of Biomedical Informatics 43 (6), pp. 932-944 (2010)
[4] Garcia, W., Ordonez, C., Zhao, K., Chen, P.: Efficient Algorithms Based on Relational Queries to Mine Frequent Graphs. In: A. Nica, A.S. Varde (eds.) Proc. of the Third Ph.D. Workshop on Information and Knowledge Management, PIKM 2010, Toronto, Ontario, Canada, October 30, 2010, pp. 17-24. ACM (2010). doi:10.1145/1871902.1871906
[5] Han, J., Fu, Y., Wang, W., Chiang, J., Gong, W., Koperski, K., Li, D., Lu, Y., Rajan, A., Stefanovic, N., Xia, B., Zaiane, O.R.: DBMiner: A System for Mining Knowledge in Large Relational Databases. In: E. Simoudis, J. Han, U.M. Fayyad (eds.) Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA, pp. 250-255. AAAI Press (1996)
[6] Hellerstein, J.M., Re, C., Schoppmann, F., Wang, D.Z., Fratkin, E., Gorajek, A., Ng, K.S., Welton, C., Feng, X., Li, K., Kumar, A.: The MADlib Analytics Library or MAD Skills, the SQL. PVLDB 5 (12), pp. 1700-1711 (2012)
[7] Imielinski, T., Virmani, A.: MSQL: A Query Language for Database Mining. Data Min. Knowl. Discov. 3 (4), pp. 373-408 (1999). doi:10.1023/A:1009816913055
[8] Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley (1990)
[9] Kostenetskiy, P., Safonov, A.: SUSU Supercomputer Resources. In: L. Sokolinsky, I. Starodubov (eds.) PCT'2016, Int. Scientific Conf. on Parallel Computational Technologies, Arkhangelsk, Russia, March 29–31, 2016, pp. 561-573. CEUR Workshop Proceedings, 1576 (2016)
[10] Lichman, M.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption]. Irvine, CA: University of California, School of Information and Computer Science (2013)
[11] McCaffrey, J.D.: A Hybrid System for Analyzing Very Large Graphs. In: S. Latifi (ed.) Ninth Int. Conf. on Information Technology: New Generations, ITNG 2012, Las Vegas, Nevada, USA, 16–18 April, 2012, pp. 253-257. IEEE Computer Society (2012). doi:10.1109/ITNG.2012.43
[12] Meek, C., Thiesson, B., Heckerman, D.: The Learning-curve Sampling Method Applied to Model-based Clustering. J. of Machine Learning Research 2, pp. 397-418 (2002)
[13] Melnykov, V., Chen, W.C., Maitra, R.: MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms. J. of Statistical Software 51 (12), pp. 1-25 (2012). doi:10.18637/jss.v051.i12
[14] Meo, R., Psaila, G., Ceri, S.: A New SQL-like Operator for Mining Association Rules. In: T.M. Vijayaraman, A.P. Buchmann, C. Mohan, N.L. Sarda (eds.) VLDB'96, Proc. of 22nd Int. Conf. on Very Large Data Bases, September 3–6, 1996, Mumbai (Bombay), India, pp. 122-133. Morgan Kaufmann (1996)
[15] Miniakhmetov, R., Zymbler, M.: Integration of Fuzzy c-means Clustering Algorithm with PostgreSQL Database Management System. Numerical Methods and Programming 13 (2(26)), pp. 46-52 (2012) (in Russian)
[16] Ordonez, C.: Integrating K-means Clustering with a Relational DBMS Using SQL. IEEE Trans. Knowl. Data Eng. 18 (2), pp. 188-201 (2006). doi:10.1109/TKDE.2006.31
[17] Ordonez, C.: Building Statistical Models and Scoring with UDFs. In: C.Y. Chan, B.C. Ooi, A. Zhou (eds.) Proc. of the ACM SIGMOD Int. Conf. on Management of Data, Beijing, China, June 12–14, 2007, pp. 1005-1016. ACM (2007). doi:10.1145/1247480.1247599
[18] Ordonez, C.: Statistical Model Computation with UDFs. IEEE Trans. Knowl. Data Eng. 22 (12), pp. 1752-1765 (2010). doi:10.1109/TKDE.2010.44
[19] Ordonez, C., Cereghini, P.: SQLEM: Fast Clustering in SQL Using the EM Algorithm. In: W. Chen, J.F. Naughton, P.A. Bernstein (eds.) Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, May 16–18, 2000, Dallas, Texas, USA, pp. 559-570. ACM (2000). doi:10.1145/342009.335468
[20] Ordonez, C., Garcia-Garcia, J.: Vector and Matrix Operations Programmed with UDFs in a Relational DBMS. In: P.S. Yu, V.J. Tsotras, E.A. Fox, B. Liu (eds.) Proc. of the 2006 ACM CIKM Int. Conf. on Information and Knowledge Management, Arlington, Virginia, USA, November 6–11, 2006, pp. 503-512. ACM (2006). doi:10.1145/1183614.1183687
[21] Ordonez, C., Pitchaimalai, S.K.: Bayesian Classifiers Programmed in SQL. IEEE Trans. Knowl. Data Eng. 22 (1), pp. 139-144 (2010). doi:10.1109/TKDE.2009.127
[22] Pan, C., Zymbler, M.: Very Large Graph Partitioning by Means of Parallel DBMS. In: B. Catania, G. Guerrini, J. Pokorny (eds.) Advances in Databases and Information Systems – 17th East European Conf., ADBIS 2013, Genoa, Italy, September 1–4, 2013. Proc., Lecture Notes in Computer Science, 8133, pp. 388-399. Springer (2013). doi:10.1007/978-3-642-40683-6_29
[23] Rantzau, R.: Frequent Itemset Discovery with SQL Using Universal Quantification. In: R. Meo, P.L. Lanzi, M. Klemettinen (eds.) Database Support for Data Mining Applications: Discovering Knowledge with Inductive Queries, Lecture Notes in Computer Science, 2682, pp. 194-213. Springer (2004). doi:10.1007/978-3-540-44497-8_10
[24] Rechkalov, T., Zymbler, M.: Accelerating Medoids-based Clustering with the Intel Many Integrated Core Architecture. In: 9th Int. Conf. on Application of Information and Communication Technologies, AICT 2015, October 14–16, 2015, Rostov-on-Don, Russia, Proc., pp. 413-417. IEEE (2015). doi:10.1109/ICAICT.2015.7338591
[25] Sarawagi, S., Thomas, S., Agrawal, R.: Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Data Min. Knowl. Discov. 4 (2/3), pp. 89-125 (2000). doi:10.1023/A:1009887712954
[26] Sattler, K., Dunemann, O.: SQL Database Primitives for Decision Tree Classifiers. In: Proc. of the 2001 ACM CIKM Int. Conf. on Information and Knowledge Management, Atlanta, Georgia, USA, November 5–10, 2001, pp. 379-386. ACM (2001). doi:10.1145/502585.502650
[27] Shang, X., Sattler, K., Geist, I.: SQL Based Frequent Pattern Mining with FP-Growth. In: D. Seipel, M. Hanus, U. Geske, O. Bartenstein (eds.) Applications of Declarative Programming and Knowledge Management, 15th Int. Conf. on Applications of Declarative Programming and Knowledge Management, INAP 2004, and 18th Workshop on Logic Programming, WLP 2004, Potsdam, Germany, March 4–6, 2004, Revised Selected Papers, Lecture Notes in Computer Science, 3392, pp. 32-46. Springer (2004). doi:10.1007/11415763_3
[28] Tang, Z., Maclennan, J., Kim, P.P.: Building Data Mining Solutions with OLE DB for DM and XML for Analysis. SIGMOD Record 34 (2), pp. 80-85 (2005). doi:10.1145/1083784.1083805
[29] Thomas, S., Chakravarthy, S.: Performance Evaluation and Optimization of Join Queries for Association Rule Mining. In: M.K. Mohania, A.M. Tjoa (eds.) Data Warehousing and Knowledge Discovery, First Int. Conf., DaWaK '99, Florence, Italy, August 30 – September 1, 1999, Proc., Lecture Notes in Computer Science, 1676, pp. 241-250. Springer (1999). doi:10.1007/3-540-48298-9_26
[30] Wang, H., Zaniolo, C., Luo, C.: ATLAS: A Small but Complete SQL Extension for Data Mining and Data Streams. In: VLDB, pp. 1113-1116 (2003)
[31] Yao, B., Li, F., Kumar, P.: K Nearest Neighbor Queries and kNN-Joins in Large Relational Databases (Almost) for Free. In: F. Li, M.M. Moro, S. Ghandeharizadeh, J.R. Haritsa, G. Weikum, M.J. Carey, F. Casati, E.Y. Chang, I. Manolescu, S. Mehrotra, U. Dayal, V.J. Tsotras (eds.) Proc. of the 26th Int. Conf. on Data Engineering, ICDE 2010, March 1–6, 2010, Long Beach, California, USA, pp. 4-15. IEEE Computer Society (2010). doi:10.1109/ICDE.2010.5447837