=Paper=
{{Paper
|id=Vol-2917/paper7
|storemode=property
|title=The Developing of the System for Automatic Audio to Text Conversion
|pdfUrl=https://ceur-ws.org/Vol-2917/paper7.pdf
|volume=Vol-2917
|authors=Vladyslav Tsap,Nataliya Shakhovska,Ivan Sokolovskyi
|dblpUrl=https://dblp.org/rec/conf/momlet/TsapSS21
}}
==The Developing of the System for Automatic Audio to Text Conversion==
Vladyslav Tsap, Nataliya Shakhovska, Ivan Sokolovskyi
Lviv Polytechnic National University, 12 Bandera str., Lviv, 79013, Ukraine

MoMLeT+DS 2021: 3rd International Workshop on Modern Machine Learning Technologies and Data Science, June 5, 2021, Lviv-Shatsk, Ukraine
EMAIL: vladyslav.tsap.kn.2017@lpnu.ua (V. Tsap); nataliya.b.shakhovska@lpnu.ua (N. Shakhovska); sokolovskyi.vani@gmail.com (I. Sokolovskyi)
ORCID: 0000-0002-8062-0079 (V. Tsap); 0000-0002-6875-8534 (N. Shakhovska); 0000-0002-0112-8466 (I. Sokolovskyi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
The paper explores the AdaBoost algorithm, one of the most popular ensemble boosting methods. The main ensemble methods, with their advantages and disadvantages, are considered, with a focus on the AdaBoost algorithm. A dataset compiled at a Belgian university is used. The AdaBoost algorithm was experimentally applied to this dataset, and its effectiveness was tested. The algorithm can slightly improve the result compared to "strong" classifiers; however, there are cases when even a difference of 3-5% is significant.

Keywords: machine learning, ensemble, AdaBoost, performance, Keras, accuracy

1. Introduction
There are many tasks in our life that require a lot of time, and it is not always possible to find the optimal solution by human effort alone. That is why we look for answers to this kind of problem using machine learning (ML), the purpose of which is to predict the result from the input data. In this way, a solution can be found quickly that is also accurate and effective [1].
There are several types of machine learning [2]:
- Classical machine learning is used when the task is simple, the data are quite simple, and the features are known;
- Reinforcement learning is used when there is no processed, ready-made data, but there is an environment with which one can interact;
- Ensembles are used when the quality and accuracy of the result are critical;
- Neural networks and deep learning are often used when the data are complex and the features are not clear, or to improve an already known ML model.
The paper is devoted to the study of ensembles, and more specifically, Adaptive Boosting. The task is to analyse the basic principles of the AdaBoost boosting ensemble (an abbreviated form of Adaptive Boosting), to find the areas where this ensemble is already used, and to implement it in software on a specific example.

2. Literature review
At this stage of machine learning development, systems are being built for the management of and interaction with Internet of Things technology, the "smart city" concept, self-driving cars, and more. These industries are extremely promising because they define the direction in which large technology companies are moving: Apple, Amazon, Facebook, Google, and Microsoft already use ML in their products with apparent benefits. However, some tasks are so complex that no single machine learning algorithm can handle them on its own with a sufficiently accurate answer. At this point in the development of ML, ensembles that combine several algorithms, which learn and correct each other's mistakes simultaneously, improve the accuracy of the results many times over compared to individual algorithms.
Due to their high accuracy and stability, ensembles are very common among the IT giants, for whom it is crucial to process large amounts of data quickly and at the same time obtain a reasonably accurate result. It is no secret that ensembles now actively compete with neural networks, as both are quite effective on the same tasks. Interestingly, ensembles often use algorithms that are unstable with respect to the input data in order to get better results. It is then that these algorithms can go beyond the usual framework of solutions and at the same time correct each other's errors to arrive at the correct answer. In fact, classification algorithms are combined in an ensemble to increase accuracy by creating a strong classifier from a number of weak classifiers. Examples of such unstable algorithms are Regression and the Decision Tree, because one strong anomaly is enough to break the whole model; however, the combination of these two algorithms in an ensemble gives an excellent result. Algorithms that do not fit this role are, for example, the Bayesian classifier or KNN, because they are very stable.
There are three time-tested ways to build ensembles: stacking, bagging, and boosting [3]; a minimal code illustration of all three is given after this overview.
In short, the peculiarity of stacking is that we train several different algorithms and pass their results to the input of a final one, which makes the final decision. The critical requirement is that the algorithms be different, because training the same algorithm on the same data adds nothing. Regression is usually used as the final algorithm. However, stacking is rarely used among ensembles, as the other two methods are generally more accurate [4, 5].
The peculiarity of bagging is that we train one algorithm many times on random samples from the source data. In the end, we average all the results and in this way obtain the answer. The most famous example of bagging is the Random Forest algorithm, which we can observe when we open the camera on a smartphone and see how the program outlines people's faces with yellow rectangles. Neural networks would be too slow for such a task, and bagging is ideal here because the trees can be computed in parallel on all cores of the video card. It is this possibility of parallelisation that gives bagging an advantage over other ensembles [6].
A distinctive feature of boosting is that we train our algorithms sequentially, so that each subsequent one pays special attention to the cases in which the previous algorithm failed. As in bagging, we draw samples from the source data, but now not entirely at random: in each new sample we include part of the data on which the previous algorithm worked incorrectly. In fact, we teach a new algorithm on the mistakes of the previous one. This ensemble has very high accuracy, which is an advantage over all other ensembles. However, there is also a downside: it is difficult to parallelise. It still works faster than neural networks, but slower than bagging. Another disadvantage of boosting is that it can lead to the construction of cumbersome compositions consisting of hundreds of algorithms. Such compositions exclude the possibility of meaningful interpretation, require massive amounts of memory to store the basic algorithms, and take a lot of time to compute classifications. A well-known example of boosting is the problem of classifying objects in an image, because simple classifiers based on individual features are usually ineffective in this classification on their own.
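The following sketch (ours, not from the paper) illustrates the three ensemble styles with scikit-learn on a synthetic dataset; the dataset and all hyperparameters are illustrative assumptions. The base learner is passed positionally because its keyword name differs between scikit-learn versions (base_estimator in older releases, estimator in newer ones).
<pre>
# Minimal sketch of stacking, bagging, and boosting (illustrative hyperparameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier, BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stacking: several different algorithms, with a final (regression-like) meta-model.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)), ("svm", SVC())],
    final_estimator=LogisticRegression())

# Bagging: one unstable algorithm trained many times on random samples, results averaged.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Boosting: algorithms trained sequentially, each focusing on the previous one's mistakes.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=100, random_state=0)

for name, model in [("stacking", stacking), ("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
</pre>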
Using boosting methods for this task means combining weak classifiers in a special way to improve the overall classification capability. Object classification is a typical computer vision task, where it is determined whether an image contains a certain category of objects or not. This idea is closely related to recognition, identification, and analysis. Boosting is also widely used for ranking search engine results. This became possible when the ranking problem was formulated in terms of a loss function that penalizes errors in the order of results, which made it quite convenient to apply gradient boosting to ranking.
One of the most popular boosting algorithms is AdaBoost (short for Adaptive Boosting). It was proposed by Yoav Freund and Robert Schapire in 1996 and became the basis for all subsequent research in this area. Its main advantages include: high speed, as the construction time of the composition is almost entirely determined by the training time of the basic algorithms; ease of implementation; good generalization ability, which can be further improved by increasing the number of basic algorithms; and the ability to identify outliers, i.e. the "heaviest" objects x_i, whose weights w_i take the largest values as the composition grows.
Its disadvantages include the fact that AdaBoost uses a convex loss function, so it is sensitive to noise in the data and is prone to overfitting compared to other algorithms. An example can be seen in Table 1: the test error first decreases and then begins to grow, even though the training error keeps decreasing. The AdaBoost algorithm also requires sufficiently large training samples. Other methods based on linear combinations of classifiers, in particular bagging, are able to build algorithms of comparable quality on smaller data samples.
The paper aims to analyse "weak" classifiers and to choose the appropriate number of classifiers and their hyperparameters.

Table 1. AdaBoost error rate on training and test data
Number of classifiers   Training error   Testing error
1                       0.28             0.27
10                      0.23             0.24
50                      0.19             0.21
100                     0.19             0.22
500                     0.16             0.25
1000                    0.14             0.31
10000                   0.11             0.33
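The overfitting pattern summarised in Table 1 can be reproduced approximately with scikit-learn's staged predictions, which evaluate the partial ensemble after each boosting round. The sketch below is purely illustrative: the noisy synthetic dataset, the stump depth, and the number of rounds are our assumptions, not the setup used to produce Table 1.
<pre>
# Sketch (assumed setup): track AdaBoost training/test error as the ensemble grows.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# flip_y adds label noise, which encourages the test error to rise again (as in Table 1).
X, y = make_classification(n_samples=3000, n_features=20, flip_y=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=1000, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the partial ensemble's predictions after each boosting round.
train_err = [np.mean(p != y_train) for p in model.staged_predict(X_train)]
test_err  = [np.mean(p != y_test)  for p in model.staged_predict(X_test)]

for m in (1, 10, 50, 100, 500, 1000):
    if m <= len(train_err):
        print(f"{m:5d} classifiers  train error {train_err[m-1]:.3f}  test error {test_err[m-1]:.3f}")
</pre>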
3. Materials and Methods
Boosting reinforces "weak" classifiers by uniting them in a committee. AdaBoost gained its "adaptability" because each subsequent committee of classifiers is built on the objects that the previous committees have misclassified: correctly classified objects lose weight, while incorrectly classified objects gain more weight. The operation of the algorithm can be explained using Figure 1.
Figure 1. Diagram of the AdaBoost algorithm
In Box 1, we assign equal weights to all points and use a decision stump to classify them as "pluses" or "minuses". This "weak" classifier generated a vertical line on the left (D1) to classify the points. We see that this vertical line incorrectly separated three "pluses", classifying them as "minuses". In this case, we give these three "pluses" more weight and apply the classifier again.
In Box 2, we see that the three incorrectly classified "pluses" are drawn larger than the other points. In this case, the second decision stump (D2) will try to predict them correctly. And indeed, the vertical line (D2) now correctly classifies the three previously misclassified "pluses". However, this has led to other classification problems: three "minuses" are classified incorrectly.
We perform the same operation as before: assign more weight to the incorrectly classified points and apply the classifier again. In Box 3, the three "minuses" have more weight. To distribute these points correctly, we again use a decision stump (D3). This time, a horizontal line is formed to classify the "pluses" and "minuses", based on the greater weights of the misclassified observations.
In Box 4, we combine the results of classifiers D1, D2, and D3 to form a strong prediction that has a more complex rule than the individual rules of the "weak" classifiers. As a result, in Box 4, we see that the AdaBoost algorithm classifies the observations much better than any single "weak" classifier.
We have a binary classification task with labels $y \in \{-1, +1\}$. In general, the task is given as $h(x) = h(x \mid y)$, $h(x) \in \{-1, +1\}$, where the classifier is defined as $\hat{y}(x) = \operatorname{sign}\{h_0(x) + \sum_{i=1}^{N} c_i \cdot h_i(x)\}$, and the loss function is $\mathfrak{L}(h(x), y) = e^{-y \cdot h(x)}$.
The algorithm consists of the following steps.
Input: training dataset $(x_i, y_i)$, $i = \overline{1, N}$; basic algorithm $h(x) \in \{-1, +1\}$ trained on a weighted dataset; $M$ is the number of iterations.
Weight initialization: $w_i = \frac{1}{N}$, $i = \overline{1, N}$.
For $m = 1, 2, \ldots, M$:
- train $h^m(x)$ on the training dataset with weights $w_i$, $i = \overline{1, N}$;
- calculate the weighted error $E_m = \frac{\sum_{i=1}^{N} w_i \cdot \mathbb{I}[h^m(x_i) \neq y_i]}{\sum_{i=1}^{N} w_i}$;
- if $E_m > 0.5$ or $E_m = 0$: stop;
- calculate $c_m = \frac{1}{2} \ln \frac{1 - E_m}{E_m}$;
- increase all weights where the basic algorithm was wrong: $w_i := w_i \cdot e^{2 c_m}$, $i \in \{i : h^m(x_i) \neq y_i\}$.
Output: ensemble $F(x) = \operatorname{sign}\{\sum_{m=1}^{M} c_m \cdot h^m(x)\}$.
The coefficient for the weights is $c_m = \frac{1}{2} \ln \frac{1 - E_m}{E_m}$, where $E_m$ (the Error) is the total number of incorrectly classified points in the training set divided by the size of the dataset. The distribution of the coefficient $c$ (Alpha) depending on the error rate is shown in Figure 2.
Figure 2. The value of the coefficient c as a function of the error value
Note that when the basic algorithm works well and makes no classification errors, the error is 0 and the value of c is relatively high. When the basic classifier classifies half of the points correctly and half incorrectly, the value of the coefficient c equals 0, because such a classifier is no better than random guessing with a probability of 50%. And in the case where the basic classifier constantly gives incorrect results, the value of the coefficient c becomes strongly negative.
There are two cases of the alpha coefficient:
- The value of c is positive when the predicted and actual results coincide, i.e. the point has been classified correctly. In this case, we reduce the point's weight, because the work is going in the right direction.
- The value of c is negative when the predicted result does not match the actual result, i.e. the point was misclassified. In this case, it is necessary to increase the point's weight so that the same erroneous classification is not repeated in the next iteration.
In both cases, each basic classifier depends on the result of the previous one.
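To make the steps above concrete, here is a minimal from-scratch sketch of the described procedure in Python with NumPy. It follows the listed formulas (decision stumps as basic algorithms, the weighted error E_m, the coefficient c_m, and the weight update for misclassified points); the stump search and the synthetic data are our own illustrative assumptions, not code from the paper.
<pre>
# Minimal from-scratch sketch of the AdaBoost steps listed above (illustrative only).
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: threshold on one feature, minimising weighted error."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                     # (error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, M=50):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                        # weight initialization w_i = 1/N
    stumps, coeffs = [], []
    for m in range(M):
        err, j, thr, pol = fit_stump(X, y, w)
        if err == 0 or err > 0.5:                  # stopping rule: E_m = 0 or E_m > 0.5
            break
        c = 0.5 * np.log((1 - err) / err)          # c_m = 1/2 * ln((1 - E_m) / E_m)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w[pred != y] *= np.exp(2 * c)              # increase weights of misclassified points
        stumps.append((j, thr, pol))
        coeffs.append(c)
    return stumps, coeffs

def predict(X, stumps, coeffs):
    score = np.zeros(X.shape[0])
    for (j, thr, pol), c in zip(stumps, coeffs):
        score += c * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)                          # F(x) = sign(sum_m c_m * h_m(x))

# Tiny usage example on synthetic data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, coeffs = adaboost(X, y, M=20)
print("training accuracy:", np.mean(predict(X, stumps, coeffs) == y))
</pre>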
4. Results
We will use the dataset compiled at a Belgian university in 2017 [13]. This dataset contains information on electricity consumption throughout the house, temperature and humidity inside the house, and weather conditions outside (temperature, humidity, pressure, wind speed, etc.). The structure of the dataset is given below.
'data.frame': 19735 obs. of 29 variables:
 $ date        : chr "2016-01-11 17:00:00" "2016-01-11 17:10:00" "2016-01-11 17:20:00" "2016-01-11 17:30:00" ...
 $ Appliances  : int 60 60 50 50 60 50 60 60 60 70 ...
 $ lights      : int 30 30 30 40 40 40 50 50 40 40 ...
 $ T1          : num 19.9 19.9 19.9 19.9 19.9 ...
 $ RH_1        : num 47.6 46.7 46.3 46.1 46.3 ...
 $ T2          : num 19.2 19.2 19.2 19.2 19.2 ...
 $ RH_2        : num 44.8 44.7 44.6 44.6 44.5 ...
 $ T3          : num 19.8 19.8 19.8 19.8 19.8 ...
 $ RH_3        : num 44.7 44.8 44.9 45 45 ...
 $ T4          : num 19 19 18.9 18.9 18.9 ...
 $ RH_4        : num 45.6 46 45.9 45.7 45.5 ...
 $ T5          : num 17.2 17.2 17.2 17.2 17.2 ...
 $ RH_5        : num 55.2 55.2 55.1 55.1 55.1 ...
 $ T6          : num 7.03 6.83 6.56 6.43 6.37 ...
 $ RH_6        : num 84.3 84.1 83.2 83.4 84.9 ...
 $ T7          : num 17.2 17.2 17.2 17.1 17.2 ...
 $ RH_7        : num 41.6 41.6 41.4 41.3 41.2 ...
 $ T8          : num 18.2 18.2 18.2 18.1 18.1 18.1 18.1 18.1 18.1 18.1 ...
 $ RH_8        : num 48.9 48.9 48.7 48.6 48.6 ...
 $ T9          : num 17 17.1 17 17 17 ...
 $ RH_9        : num 45.5 45.6 45.5 45.4 45.4 ...
 $ T_out       : num 6.6 6.48 6.37 6.25 6.13 ...
 $ Press_mm_hg : num 734 734 734 734 734 ...
 $ RH_out      : num 92 92 92 92 92 ...
 $ Windspeed   : num 7 6.67 6.33 6 5.67 ...
 $ Visibility  : num 63 59.2 55.3 51.5 47.7 ...
 $ Tdewpoint   : num 5.3 5.2 5.1 5 4.9 ...
 $ rv1         : num 13.3 18.6 28.6 45.4 10.1 ...
 $ rv2         : num 13.3 18.6 28.6 45.4 10.1 ...
The idea is to classify high and low electricity consumption depending on the weather conditions and to learn to predict what the consumption will be. Since this is real data, it can be used to forecast electricity costs; for example, it will be possible to assess the feasibility of buying energy-efficient light bulbs, or simply to think about how to reduce electricity consumption.
Figure 3. The first 5 records from the dataset
There is data we do not need, such as the timestamp, so we remove it. We also check the dataset for empty values, check for outliers, and remove the extreme 1% of the data. First, exploratory analysis is performed. Distribution graphs of sampled columns are shown in Fig. 4.
Figure 4. Distribution graphs of sampled columns
The correlation matrix shows the dependencies between columns. Here, dependencies between T1 and T2 and between T6 and T_out are found (Fig. 5).
Figure 5. Correlation matrix
Scatter and density plots are presented in Fig. 6.
Figure 6. Scatter plots for all variables
After preparing the dataset, we review the distribution of electricity use (Fig. 7, Fig. 8).
Figure 7. Distribution of electricity consumption
Figure 8. Basic metrics for the energy consumption column
We see that the average value is about 100 units, which is why we divide our dataset into 2 parts: low consumption (up to 100) and high consumption (more than 100). Classification is performed on 25 parameters, and the type of consumption is predicted. We first run the Decision Tree, the primary classifier (Fig. 9).
Figure 9. The results of the classification of the Decision Tree
Then we create an AdaBoost classifier with the following parameters: base_estimator, the base "weak" classifier; n_estimators, the maximum number of estimators; learning_rate, the influence of each classifier on the weights. As the "weak" classifier we use the Decision Tree from the previous classification, the number of classifiers n_estimators is set to 200, and learning_rate is set to 1, which is the default value. We run the AdaBoost algorithm (Fig. 10).
Figure 10. The results of the AdaBoost classifier
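The setup just described can be reproduced approximately with scikit-learn as in the following sketch. It is ours, not the paper's code: the preprocessing details (which columns are dropped to reach 25 parameters, the depth of the base tree, the train/test split) are not fully specified in the text, so the code makes plausible assumptions and only mirrors the stated choices (binary target at 100 units, Decision Tree baseline, AdaBoost with n_estimators=200 and learning_rate=1).
<pre>
# Sketch of the classification experiment described above (assumptions noted in comments).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv"
df = pd.read_csv(URL)

# Binarise consumption at 100 units (low <= 100, high > 100), as described in the text.
y = (df["Appliances"] > 100).astype(int)
# Assumption: dropping date, the target, lights, and the duplicate rv2 gives 25 features;
# the paper does not list the exact columns it removes.
X = df.drop(columns=["date", "Appliances", "lights", "rv2"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Baseline "weak" classifier: a Decision Tree (the depth is our assumption).
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
print("Decision Tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))

# AdaBoost over the same base tree, n_estimators=200 and learning_rate=1 as in the paper.
# The base learner is passed positionally; its keyword is base_estimator in older
# scikit-learn releases and estimator in >= 1.2.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3, random_state=1),
                         n_estimators=200, learning_rate=1.0,
                         random_state=1).fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, ada.predict(X_test)))
</pre>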
We see that the prediction is more accurate, which means that AdaBoost is appropriate and beneficial to use in this case. In general, the improvement is more than 10%, which is a very good result. A comparison with other well-known models is given below.
LinearRegression()
  Average Error: 0.3065 degrees; Variance score R^2: 23.88%; Accuracy: 83.00%
SVR()
  Average Error: 0.2764 degrees; Variance score R^2: 23.67%; Accuracy: 84.02%
RandomForestRegressor(random_state=1)
  Average Error: 0.1932 degrees; Variance score R^2: 67.16%; Accuracy: 85.71%
LGBMRegressor(n_estimators=200, num_leaves=41)
  Average Error: 0.2009 degrees; Variance score R^2: 65.26%; Accuracy: 85.54%
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=4, num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None)
  Average Error: 0.2180 degrees; Variance score R^2: 61.44%; Accuracy: 85.11%
To summarise, the classification accuracy obtained with AdaBoost is appropriate for this task.

5. Conclusion
The paper explored the AdaBoost algorithm, one of the most popular ensemble boosting methods. The main ensemble methods and their advantages and disadvantages were considered, with a focus on the AdaBoost algorithm. In the Results section, the AdaBoost algorithm was experimentally applied to the dataset and its effectiveness was tested. In conclusion, we can say that ensemble boosting methods compete in effectiveness with neural networks, but they are somewhat more stable and interpretable, which is why they are often the preferred choice. Also, the AdaBoost algorithm can slightly improve the result compared to "strong" classifiers; however, there are cases when even a difference of 3-5% is significant. An example is Google, which uses AdaBoost in search ranking, as well as Apple, Amazon, Facebook, and Microsoft, which also use ensemble methods but simply do not publish their algorithms, because keeping them private is financially profitable. In such settings even a few percent of improvement is valuable. That is why this algorithm will be further developed and used.
The limitations of the study are the following:
- The quality of the ensemble depends on the dataset; for an imbalanced dataset, the classification accuracy will be lower.
- The modeling of charged cases should be provided together with clustering analysis. The authors plan to model each separate cluster and compare the classification accuracy.
The future extension of the current work is the development of ensembles from weak classifiers.

References
[1] Y. Kryvenchuk, N. Boyko, I. Helzynskyy, T. Helzhynska, R. Danel. "Synthesis control system physiological state of a soldier on the battlefield". CEUR Workshop Proceedings, Vol. 2488 (2019): 297-306.
[2] E. Eslami, A. K. Salman, Y. Choi, A. Sayeed, Y. Lops. "A data ensemble approach for real-time air quality forecasting using extremely randomized trees and deep neural networks". Neural Computing and Applications (2019): 1-17.
[3] S. Ardabili, A. Mosavi, A. R. Várkonyi-Kóczy. "Advances in machine learning modeling reviewing hybrid and ensemble methods". In International Conference on Global Research and Education. Springer, Cham (2019): 215-227.
[4] A. Alves. "Stacking machine learning classifiers to identify Higgs bosons at the LHC". Journal of Instrumentation 12(05) (2017): T05005.
[5] Y. Freund. "A more robust boosting algorithm". arXiv preprint arXiv:0905.2138 (2009).
[6] J. Dou, A. P. Yunus, D. T. Bui, A. Merghadi, M. Sahana, Z. Zhu, ..., B. T. Pham. "Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan". Landslides 17(3) (2020): 641-658.
[7] C. Ying, M. Qi-Guang, L. Jia-Chen, G. Lin. "Advance and prospects of AdaBoost algorithm". Acta Automatica Sinica 39(6) (2013): 745-758.
[8] T. Hastie, S. Rosset, J. Zhu, H. Zou. "Multi-class AdaBoost". Statistics and its Interface 2(3) (2009): 349-360.
[9] J. Cao, S. Kwong, R. Wang. "A noise-detection based AdaBoost algorithm for mislabeled data". Pattern Recognition 45(12) (2012): 4451-4465.
[10] D. P. Solomatine, D. L. Shrestha. "AdaBoost.RT: a boosting algorithm for regression problems". In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Vol. 2 (2004): 1163-1168.
[11] Y. Wei, X. Bing, C. Chareonsak. "FPGA implementation of AdaBoost algorithm for detection of face biometrics". In IEEE International Workshop on Biomedical Circuits and Systems (2004): 1-6.
[12] F. Wang, Z. Li, F. He, R. Wang, W. Yu, F. Nie. "Feature learning viewpoint of AdaBoost and a new algorithm". IEEE Access 7 (2019): 149890-149899.
[13] O. M. Mozos, C. Stachniss, W. Burgard. "Supervised learning of places from range data using AdaBoost". In Proceedings of the 2005 IEEE International Conference on Robotics and Automation (2005): 1730-1735.
[14] N. Shakhovska, S. Fedushko, N. Melnykova, I. Shvorob, Y. Syerov. "Big Data analysis in development of personalized medical system". Procedia Computer Science 160 (2019): 229-234.
[15] R. Tkachenko, I. Izonin. "Model and Principles for the Implementation of Neural-Like Structures Based on Geometric Data Transformations". In: Hu Z., Petoukhov S., Dychka I., He M. (eds) Advances in Computer Science for Engineering and Education. ICCSEEA 2018. Advances in Intelligent Systems and Computing, vol. 754. Springer, Cham (2019).
[16] Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv