=Paper=
{{Paper
|id=Vol-3325/regular3
|storemode=property
|title=How Fast is AI in Pharo? Benchmarking Linear Regression
|pdfUrl=https://ceur-ws.org/Vol-3325/regular3.pdf
|volume=Vol-3325
|authors=Oleksandr Zaitsev,Sebastian Jordan-Montaño,Stéphane Ducasse
}}
==How Fast is AI in Pharo? Benchmarking Linear Regression==
Oleksandr Zaitsev 1,2,*,†, Sebastian Jordan-Montaño 2,*,† and Stéphane Ducasse 2

1 Arolla, 25 Rue du Louvre, 75001 Paris, France
2 Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Park Plaza, Parc scientifique de la Haute-Borne, 40 Av. Halley Bât A, 59650 Villeneuve-d'Ascq, France

Abstract
Like many other modern programming languages, Pharo is spreading into computationally demanding fields such as machine learning, big data, cryptocurrency, etc. This raises the need for fast numerical computation libraries. In this work, we propose to speed up low-level computations by calling routines from highly optimized external libraries, e.g., LAPACK or BLAS, through the foreign function interface (FFI). As a proof of concept, we build a prototype implementation of linear regression based on the DGELSD routine of LAPACK. Using three benchmark datasets of different sizes, we compare the execution time of our algorithm against a pure Pharo implementation and scikit-learn — a popular Python library for machine learning. We show that LAPACK & Pharo is up to 2103 times faster than pure Pharo. We also show that scikit-learn is 5 to 8 times faster than our prototype, depending on the size of the data. Finally, we demonstrate that pure Pharo is up to 15 times faster than the equivalent implementation in pure Python. These findings can lay the foundation for future work on building fast numerical libraries for Pharo and further using them in higher-level libraries such as pharo-ai.

Keywords
pharo, benchmarking, machine learning, foreign function interface, lapack, linear regression

1. Introduction

Pharo (https://pharo.org/) is an open-source, dynamically-typed, reflective, object-oriented programming language inspired by Smalltalk [1]. The Pharo community is growing and spreading its applications into various domains of modern computer science: big data [2, 3], machine learning [4],1 cryptocurrency,2,3 and the internet of things (IoT) [5]. Those domains require fast numerical algorithms capable of working with large datasets. Modern programming languages provide such libraries and use them as a backend for higher-level ones such as scikit-learn,4 a popular machine learning library. Pharo also provides a library for numerical computations — PolyMath. However, as we will see in this paper, the current implementation of certain algorithms in Pharo is significantly slower than the ones provided by similar libraries in Python and R, which delegate the low-level computations to fast routines compiled in Fortran or C.

In this paper, we propose to speed up the numerical computations in Pharo by calling low-level routines from highly optimized external libraries such as LAPACK or BLAS.

IWST'22: International Workshop on Smalltalk Technologies
* Corresponding author.
† These authors contributed equally.
oleksandr.zaitsev@inria.fr (O. Zaitsev); sebastian.jordan@inria.fr (S. Jordan-Montaño); stephane.ducasse@inria.fr (S. Ducasse)
ORCID: 0000-0003-0267-2874 (O. Zaitsev); 0000-0002-7726-8377 (S. Jordan-Montaño); 0000-0001-6070-6599 (S. Ducasse)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 https://github.com/pharo-ai/wiki
2 https://github.com/smartanvil/Fog
3 https://github.com/oliveiraallex/Pharocks-Cryptobot
4 https://scikit-learn.org/
As a proof of concept, we propose a prototype implementation of the linear regression algorithm in Pharo based on the DGELSD routine of LAPACK.5 We benchmark our algorithm against both an alternative implementation in pure Pharo and scikit-learn — a widely used Python library for machine learning. The Pharo & LAPACK implementation is 1820 times faster than pure Pharo on a small-size dataset (200K rows) and 2103 times faster on a medium-size dataset (1M rows). This serves as a proof of concept that the speed of numerical algorithms, and more specifically of machine learning in Pharo, can be significantly improved by calling highly optimized external libraries through FFI. We also show that scikit-learn is still 5 to 8 times faster than our prototype implementation in Pharo (depending on the size of the data), which might be caused by data preprocessing and other optimizations. We propose to further explore this difference in future work. Finally, we show that the pure Pharo implementation of linear regression is about 5 to 15 times faster than the equivalent implementation in pure Python. This can be explained by the just-in-time compilation performed by the Pharo virtual machine.

The rest of this paper is structured in the following way. In Section 2, we discuss the possibility of implementing a machine learning library in Pharo and the challenges associated with it. In Section 3, we explain the different implementations of the linear regression algorithm that we have selected for benchmarking. In Section 4, we briefly explain the foreign function interface (FFI) and how it can be used to call C and Fortran routines from Pharo.

5 All the code, the datasets, and the instructions on how to reproduce our experiment are available at https://anonymous.4open.science/r/lapack-experiment-55FB.

2. Can We Do Machine Learning in Pharo?

Modern machine learning is resource-demanding. It requires fast algorithms capable of processing large volumes of data. Many algorithms in machine learning are based on matrix algebra and numerical optimization. This raises the need for fast libraries that implement basic algebraic operations such as matrix-vector multiplication, as well as highly optimized algebraic algorithms such as singular value decomposition, Cholesky decomposition, eigenvector and eigenvalue calculation, etc., for different types of matrices, e.g., symmetric, orthogonal, Hermitian.

In Pharo, there is PolyMath — a library for numerical computations and scientific computing. With its 60 packages and 903 green tests,6 PolyMath provides a wide range of algorithms that cover various aspects of mathematics: differential equations, complex numbers, fuzzy algorithms, etc. It also provides matrix algebra, statistical distributions, and numerical optimization methods that can be particularly useful for machine learning. That being said, some algorithms that are implemented in PolyMath are very slow compared to similar numerical libraries in other languages. For example, as we will see in Section 6, the linear regression implementation that is based on the singular value decomposition (SVD) provided by PolyMath takes almost 5 hours to converge on a dataset with 5,000 rows and 5 columns.

6 Measured at https://github.com/PolyMathOrg/PolyMath on June 1, 2022.

Figure 1: Machine learning libraries depend on mathematical libraries that in their turn depend on LAPACK - a highly-optimized library providing fast algebraic operations (Python: LAPACK → SciPy → scikit-learn; Pharo: LAPACK → PolyMath → pharo-ai).
There is also an ongoing effort in the Pharo community to implement tools for data science, machine learning, artificial intelligence, data mining, etc.7 One such library is pharo-ai — a collection of different machine learning algorithms implemented in Pharo: linear and logistic regression, support vector machines, K-means, naive Bayes, etc. It is inspired by scikit-learn, a similar machine learning library in Python. scikit-learn is a popular library that has many industrial applications. Many of its algorithms are very fast because internally scikit-learn depends on the mathematical library SciPy, which in turn depends on LAPACK — an efficient low-level library implemented in Fortran that provides fast algorithms for linear algebra.

We propose to improve the speed of PolyMath and pharo-ai by calling LAPACK routines through the foreign function interface — a technique for calling external functions that is already supported by Pharo (see Section 4). In Figure 1, we show how LAPACK is used by scikit-learn in Python and propose a way in which it could be used by the pharo-ai library. In this scheme, PolyMath implements various high-level mathematical algorithms by delegating low-level algebraic operations to LAPACK. Then pharo-ai simply depends on PolyMath to implement fast machine learning algorithms. As a proof of concept, in this paper we present a prototype implementation of linear regression in Pharo based on LAPACK. We benchmark the training time of our model and show that it is up to 2103 times faster than the pure Pharo implementation.

7 For the non-extensive list of different machine learning libraries and tools in Pharo, see https://github.com/pharo-ai/awesome-pharo-ml.

3. Linear Regression and How It Is Implemented

Linear regression is a machine learning algorithm that models the relationship between the input matrix X and the output vector y [6]:

X = \begin{pmatrix} x_{11} & \dots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \dots & x_{mn} \end{pmatrix}; \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}; \quad \theta = \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_n \end{pmatrix}

The task of linear regression is to find a set of parameters \theta = (\theta_0, \dots, \theta_n) such that the predictions \hat{y}_i = h_\theta(x_i) are as close as possible to the real output values y_i. The function h_\theta (also known as the hypothesis function) is defined as:

h_\theta(x_i) = \theta_0 + \theta_1 x_{i1} + \dots + \theta_n x_{in}

This is an optimization problem: we need to minimize the objective function, which is often defined as the mean squared error:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - h_\theta(X_{i:}) \right)^2

We have selected linear regression for benchmarking because: (1) it is one of the most well-known and commonly used machine learning algorithms; (2) it can be solved by finding a minimum-norm solution to the linear least squares problem — an algorithm of linear algebra that is implemented in LAPACK.

Gradient Descent Solution. Gradient descent is an iterative numerical optimization technique that is used by many machine learning algorithms, including linear regression. It is based on the fact that the derivative of a function at a given point is positive if the function is increasing and negative if it is decreasing. This means that by subtracting the value of the derivative \frac{\partial}{\partial \theta_j} J(\theta), scaled by a parameter \alpha (called the learning rate), from the current value of the parameter \theta_j, we decrease the cost J(\theta):

\theta_j^{(new)} = \theta_j^{(old)} - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
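To make the update rule concrete, the following is a minimal sketch of batch gradient descent for linear regression in plain Python (no external libraries). It only illustrates the algorithm described above and is not the implementation that we benchmark; the function name, learning rate, and iteration count are illustrative.

# Minimal sketch of batch gradient descent for linear regression (plain Python).
# Not the benchmarked implementation; names and hyperparameters are illustrative.
def fit_linear_regression(X, y, alpha=0.01, iterations=1000):
    m = len(X)                # number of rows
    n = len(X[0])             # number of input columns
    theta = [0.0] * (n + 1)   # theta[0] is the bias term

    def predict(row):
        return theta[0] + sum(t * x for t, x in zip(theta[1:], row))

    for _ in range(iterations):
        errors = [predict(X[i]) - y[i] for i in range(m)]
        # Gradient of the mean squared error with respect to each theta_j
        # (the constant factor is absorbed into the learning rate alpha).
        gradient = [sum(errors) / m]
        for j in range(n):
            gradient.append(sum(errors[i] * X[i][j] for i in range(m)) / m)
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta

Each iteration computes the prediction errors once and then updates all parameters simultaneously; the learning rate alpha controls the step size and must be small enough for the cost J(\theta) to keep decreasing.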
Least Squares Solution. Alternatively, the problem of linear regression can be expressed in terms of linear least squares. In this case, X\theta = y is viewed as a system of linear equations where \theta is the unknown. Unless all points of the dataset lie on a straight line, this system has no solution. However, it is possible to find the closest approximation to the solution by solving the system X\theta = \hat{y}, where \hat{y} is the orthogonal projection of y onto the column space of X (thus the distance between the true output y and the predicted output \hat{y} is the smallest). This is the same as finding the optimal parameters \hat{\theta} by minimizing the norm:

\hat{\theta} = \operatorname{argmin}_{\theta} \lVert y - X\theta \rVert^2

In this work, we compare the following implementations of linear regression:
1. Iterative gradient descent implementation in pure Pharo and pure Python.
2. Least squares implementation in Pharo (our prototype) and Python (scikit-learn), both based on a routine from the LAPACK library.

4. Foreign Function Interface in Pharo

Like many modern programming languages, Pharo supports the Foreign Function Interface (FFI) — a mechanism that allows calling routines written in another programming language [7]. One of its common uses is to speed up computations in interpreted or virtual machine-based languages by calling routines from natively compiled libraries (e.g., *.dll files on Windows or *.so files on Linux and Mac). Here is an example of calling the C function uint clock() from Pharo using FFI:

self ffiCall: #(uint clock()) library: 'libc.so.6'.

The method ffiCall:library: is understood by every object in Pharo. It accepts two arguments that specify the signature of the external function to be called and the shared library that provides it. The value returned by this method is the same as the return value of the external function that is being called.

Calling a LAPACK routine. The Linear Algebra PACKage (LAPACK) is a Fortran library that provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, singular value problems, matrix factorizations, etc. In this work, we use one specific routine of LAPACK that computes the minimum-norm solution to a real linear least squares problem — DGELSD.8 It accepts the matrix X and the vector y as input and finds the solution vector \hat{\theta} which minimizes the norm \lVert y - X\theta \rVert^2 (see Section 3).

8 https://www.netlib.org/lapack/explore-html/d7/d3b/group__double_g_esolve_ga94bd4a63a6dacf523e25ff617719f752.html

5. Experiment Setup

Research Questions. In our study, we answer the following research questions:
• RQ.1 - Measuring LAPACK speedup. How much of a speedup can we achieve by calling LAPACK from Pharo?
• RQ.2 - Comparing to scikit-learn. How does the Pharo & LAPACK implementation compare to the one provided by scikit-learn?
• RQ.3 - Comparing pure Pharo with Python. How does the pure Pharo implementation of linear regression compare to an equivalent pure Python implementation?

Datasets for benchmarking. We generated three datasets of different sizes using the make_regression() function of scikit-learn, which samples data from a random linear regression model with a fixed seed. The sizes of those datasets can be seen in Table 1. The number of columns was fixed at 20 and the number of rows was gradually increased. We did this because the height (number of rows) and width (number of columns) of a dataset have different effects on the training time [6] and we did not want to mix them. The first 19 columns represent the input matrix X. They contain floating point numbers that are normally distributed around 0 with standard deviation 1. The last column of each dataset represents the output vector y, generated by applying a non-biased linear regression model with Gaussian noise.
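For illustration, the sketch below shows how such a dataset could be produced with scikit-learn's make_regression() and saved to disk. The noise level, seed, and file name are placeholders and not necessarily the values used in our experiment; the repository linked in footnote 5 contains the actual scripts.

# Sketch: generate a benchmark dataset with scikit-learn's make_regression().
# The noise level, seed, and output file name are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200_000,  # number of rows (small dataset)
                       n_features=19,      # input columns; the target adds a 20th column
                       noise=10.0,         # standard deviation of the Gaussian noise
                       bias=0.0,           # non-biased model
                       random_state=42)    # fixed seed

data = np.column_stack([X, y])             # 19 input columns + 1 output column
np.savetxt("small.csv", data, delimiter=",")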
Table 1: The three datasets that we generated for this study.
Name   | Rows      | Columns | Size
Small  | 200,000   | 20      | 82 MB
Medium | 1,000,000 | 20      | 411 MB
Large  | 5,000,000 | 20      | 2.06 GB

Measuring the execution time. In this study, we only measure the time that it takes to train the linear regression model, as training is generally the most time-consuming part of the machine learning workflow. In Pharo, we measure time using the timeToRun method:

time := [ "... some code..." ] timeToRun.

In Python, we use the time library:

import time
start_time = time.time()
# ... some code ...
elapsed_time = time.time() - start_time

6. Results

In this section, we answer the research questions and discuss the results of our experiments. The benchmarks were run on a 2015 MacBook Pro with a 4-core Intel i7 processor at 2.2 GHz and 16 GB of RAM. The operating system was macOS Monterey 12.4. To run the benchmarks, we closed all the applications that could be closed. In addition, we disconnected the Internet and did not use the computer while the experiment was in progress. The computer was plugged into the charger for the duration of the experiment.

RQ.1 - Measuring LAPACK speedup. To answer the first research question, we measured the execution time of two implementations of linear regression on the three datasets that were described in Section 5. The first implementation was written in pure Pharo (no external libraries) and based on gradient descent. The second one was based on least squares, implemented by calling the DGELSD LAPACK routine from Pharo. As can be seen in Table 2, the LAPACK implementation was 1820 times faster on the small dataset and 2103 times faster on the medium dataset. We could not measure the execution time of the pure Pharo implementation on the large dataset because it took too long and we had to stop it after 5 hours. The LAPACK implementation took about 15 seconds to converge on the same dataset.

Table 2: The speedup achieved by calling LAPACK routines from Pharo.
Dataset | Pharo        | Pharo & LAPACK | Diff
Small   | 00:19:39.090 | 00:00:00.648   | 1820×
Medium  | 01:43:59.000 | 00:00:02.967   | 2103×
Large   | ∞            | 00:00:15.676   | —

Summary: Calling LAPACK routines from Pharo can provide a significant speedup. The linear regression implemented with Pharo & LAPACK is 1820 times faster than the pure Pharo implementation when measured on the small dataset and 2103 times faster on the medium dataset.

RQ.2 - Comparing to scikit-learn. We compared the execution time of the LAPACK-based implementation of linear regression in Pharo to the one provided by the LinearRegression class in scikit-learn, which is also based on the DGELSD routine of LAPACK. As can be seen in Table 3, scikit-learn is 8 times faster than our implementation on the small dataset, 8 times faster on the medium dataset, and 5 times faster on the large dataset. Considering that scikit-learn is a well-designed library with many industrial users and our implementation is a prototype, we hypothesize that this difference is due to various optimizations that are performed in scikit-learn. For example, it uses the highly optimized NumPy9 array data structure. A further study is needed to explain this phenomenon.

Table 3: Comparison with scikit-learn (both implementations use LAPACK).
Dataset | Pharo & LAPACK | scikit-learn | Diff
Small   | 00:00:00.648   | 00:00:00.079 | 8×
Medium  | 00:00:02.967   | 00:00:00.790 | 8×
Large   | 00:00:15.676   | 00:00:03.499 | 5×

Summary: The scikit-learn implementation is 8 times faster than our prototype implementation in Pharo on the small and medium datasets, and 5 times faster on the large dataset.

9 https://numpy.org/
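For completeness, here is a sketch of the Python side of this comparison: timing scikit-learn's LinearRegression on randomly generated data, and obtaining the same minimum-norm solution directly from scipy.linalg.lstsq, which dispatches to LAPACK's DGELSD when lapack_driver='gelsd'. The data generation and sizes are illustrative; this is not the exact benchmarking script from our experiment.

# Sketch: timing the scikit-learn estimator and the SciPy least-squares call
# that reaches the same LAPACK driver (gelsd). Sizes and data are illustrative.
import time
import numpy as np
from scipy.linalg import lstsq
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.standard_normal((200_000, 19))
y = X @ rng.standard_normal(19) + rng.standard_normal(200_000)

start = time.time()
LinearRegression().fit(X, y)                 # scikit-learn estimator
print("scikit-learn fit:", time.time() - start, "s")

start = time.time()
theta, residues, rank, sv = lstsq(X, y, lapack_driver="gelsd")  # DGELSD driver
print("scipy lstsq (gelsd):", time.time() - start, "s")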
RQ.3 - Comparing pure Pharo with pure Python. We also compare the pure Pharo and pure Python implementations of linear regression based on gradient descent. As can be seen in Table 4, Pharo is 5 times faster than Python on the small dataset and 15 times faster on the medium dataset. We could not measure the execution time on the large dataset because, in both cases, training took too long and we had to stop the experiment.

Table 4: Comparison between pure Pharo and pure Python.
Dataset | Pharo        | Python       | Diff
Small   | 00:19:39.090 | 01:39:54.275 | 5×
Medium  | 01:43:59.000 | 08:25:00.000 | 15×
Large   | ∞            | ∞            | —

Summary: The implementation of gradient descent-based linear regression in pure Pharo (no external libraries) is about 5 to 15 times faster than the equivalent implementation in pure Python, depending on the size of the dataset.

7. Threats to Validity

We consider the four types of validity threats presented by Wohlin et al. [8]: internal, external, construct, and conclusion validity. Below, we discuss the threats to internal and external validity. We did not find any threats to conclusion and construct validity.

Internal Validity
• The implementation of linear regression with LAPACK was based on least squares while the pure Pharo implementation was based on gradient descent. This poses a threat to internal validity because the time difference may be affected by the choice of algorithm.
• The implementation that we propose for Pharo is a prototype, while scikit-learn is an industry standard that contains many optimizations. We tried to study the source code of scikit-learn as much as possible, but this still constitutes a validity threat.
• Measuring the execution time is always prone to the noise introduced by other processes that run on the computer. To reduce this noise, we closed all unnecessary windows, disconnected the Internet, plugged the computer into the charger, and killed all processes that could be stopped.

External Validity
• In this work, we demonstrated how the training time of linear regression can be reduced using LAPACK. The same technique can be used for many other machine learning algorithms: logistic regression, neural networks, etc. Nonetheless, there are also algorithms that cannot benefit from it because, unlike linear regression, they do not require heavy matrix-vector calculations. Some examples include k-means and k-nearest neighbours. Those algorithms can still be boosted with optimized C code called through FFI, but this requires a separate study.

8. Related Work

Over the years, Pharo has been successfully applied in various domains of scientific computing and numerical optimization. Many of those applications use the algorithms provided by PolyMath [9] — a library for scientific computing in Pharo. Other applications depend on the algorithms of artificial intelligence and machine learning [4]. Bergel et al. [10] use neural networks and a visual environment to verify various properties related to the source code. Cota et al. [11] use genetic algorithms for automatic test generation. Pharo is also used for agent-based modeling by CORMAS (Common Pool Resources and Multi-Agent Systems) — a platform dedicated to natural and common-pool resources management [12, 13]. All those applications could benefit from speeding up numerical computations in Pharo.

In this work, we benchmark our prototype implementation of linear regression against scikit-learn — a Python machine learning library that is widely used in both research and industry [14].
Pedregosa et al. [15] report that scikit-learn outperforms other machine learning libraries in Python (MLP, PyBrain, pymvpa, MDP, and Shogun) on 4 out of 6 classical machine learning algorithms. To the best of our knowledge, this is the first attempt to implement a fast numerical library in Pharo by calling an external library such as LAPACK. It is also the first study that benchmarks machine learning algorithms in Pharo.

9. Conclusion

In this work, we presented a proof of concept that demonstrates how numerical computations in Pharo can be boosted by calling routines from highly optimized external libraries through FFI. We built a prototype implementation of the linear regression algorithm in Pharo based on the DGELSD routine from LAPACK. We measured the execution time on three benchmark datasets and compared it to the time needed to train the analogous models in scikit-learn and in pure Pharo. We show that the LAPACK-based implementation is up to 2103 times faster than pure Pharo but still 5 to 8 times slower than the scikit-learn implementation, which also uses LAPACK underneath. We also show that the pure Pharo implementation is up to 15 times faster than the equivalent model implemented in pure Python. This interesting finding can be explained by the just-in-time compilation performed by the Pharo virtual machine.

10. Acknowledgements

We are grateful to Vincent Aranega and Guillermo Polito for their valuable advice and explanations of just-in-time compilation in Pharo. Oleksandr Zaitsev also thanks the Arolla10 software company for financing his work during this study.

10 https://arolla.fr

References

[1] A. P. Black, S. Ducasse, O. Nierstrasz, D. Pollet, D. Cassou, M. Denker, Pharo by Example, Square Bracket Associates, Kehrsatz, Switzerland, 2009. URL: http://books.pharo.org.
[2] M. Marra, G. Polito, E. Gonzalez Boix, A debugging approach for live big data applications, Science of Computer Programming 194 (2020) 102460. doi:10.1016/j.scico.2020.102460.
[3] M. Marra, A Live Debugging Approach for Big Data Processing Applications, Ph.D. thesis, Vrije Universiteit Brussel, 2022.
[4] A. Bergel, Agile Artificial Intelligence in Pharo, Apress, 2020.
[5] A. Oliveira, S. Costiou, S. Ducasse, S. Stinckwich, A PharoThings tutorial, Square Bracket Associates, 2021. URL: https://github.com/SquareBracketAssociates/Booklet-APharoThingsTutorial.
[6] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems, O'Reilly Media, Inc., 2019.
[7] G. Polito, S. Ducasse, P. Tesone, T. Brunzie, Unified FFI - calling foreign functions from Pharo, 2020. URL: http://books.pharo.org/booklet-uffi/.
[8] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in software engineering: an introduction, Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[9] D. H. Besset, Object-Oriented Implementation of Numerical Methods: An Introduction with Pharo, Square Bracket Associates, 2016.
[10] A. Bergel, P. Melatagia, S. Stinckwich, An API and visual environment to use neural network to reason about source code, in: Conference Companion of the 2nd International Conference on Art, Science, and Engineering of Programming, 2018, pp. 117–120.
[11] A. Cota Vidaure, E. Cusi Lopez, J. P. Sandoval Alcocer, A. Bergel, TestEvoViz: Visual introspection for genetically-based test coverage evolution, in: VISSOFT, 2020, pp. 1–11.
[12] P. Bommel, N. Becu, C. Le Page, F. Bousquet, Cormas: an agent-based simulation platform for coupling human decisions with computerized dynamics, in: Simulation and Gaming in the Network Society, Springer, 2016, pp. 387–410.
[13] C. Le Page, N. Becu, P. Bommel, F. Bousquet, Participatory agent-based simulation for renewable resource management: the role of the CORMAS simulation platform to nurture a community of practice, Journal of Artificial Societies and Social Simulation 15(1) (2012) 16.
[14] M. Lang, M. Binder, J. Richter, P. Schratz, F. Pfisterer, S. Coors, Q. Au, G. Casalicchio, L. Kotthoff, B. Bischl, mlr3: A modern object-oriented machine learning framework in R, Journal of Open Source Software 4 (2019) 1903.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.