Evaluation and Comparison of the Processes in the Frozen Vegetable Production Using Machine Learning Methods

Piotr Milczarski

Faculty of Physics and Applied Informatics, University of Lodz, Pomorska str. 149/153, Lodz, Poland

ISIT 2021: II International Scientific and Practical Conference «Intellectual Systems and Information Technologies», September 13-19, 2021, Odesa, Ukraine
EMAIL: piotr.milczarski@uni.lodz.pl; ORCID: 0000-0002-0095-6796
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
The paper presents a study of carbon footprint (CF) assessment in frozen vegetable production processes, aimed at obtaining low-carbon products. Three methods of clusterization have been chosen for the production assessment. The results of clusterization are evaluated by five classification methods: k-Nearest Neighbors, Multilayer Perceptron, C4.5, Random Forest and Support Vector Machines with a radial basis kernel function. In the chosen model with five clusters, the best clusterization methods are k-means followed by Canopy.

Keywords
Carbon Footprint; clusterization; Canopy; k-means; Expectation-Maximization; k-Nearest Neighbors; Multilayer Perceptron; C4.5; Random Forest; Support Vector Machines

1. Introduction

Greenhouse gas (GHG) emissions from human activities have been a major contributor to global warming since the mid-twentieth century. Agriculture and land-use change contributed to 17% of global anthropogenic greenhouse gas emissions in 2010 [1]. By 2050 the world population will reach 9 billion people [2]; to ensure the food supply, agricultural production should increase by 60%. Climate change can affect food availability; for example, an increase in temperature, a change in the structure of rainfall or extreme weather events may result in a reduction in agricultural productivity [3, 4]. Therefore, the main challenge has become to mitigate the threats that climate change poses to food security.

In response to the emerging threats of climate change, numerous programs, both global and regional, have been developed, the purpose of which is to slow down the growth rate of GHG concentration [5]. Achieving climate policy goals requires continuous monitoring of emissions and verification of the effectiveness of solutions for the development of a low-emission economy.

The action plan for the reduction of gaseous emissions adopted by EU countries in 2014 requires the reduction of GHG emissions by 30% by 2030, compared to the level in 2005 [6]. The methods of calculating the carbon footprint are most often based on well-known standards. Among them, the most used are:
- ISO 14040:2006 [7]: Environmental management - Life cycle assessment - Principles and framework,
- ISO 14064-1:2018 [8]: Greenhouse gases - Part 1: Specification with guidance at the organization level for quantification and reporting of greenhouse gas emissions and removals,
- ISO/TS 14067:2018 [9]: Greenhouse gases - Carbon footprint of products - Requirements and guidelines for quantification,
- PAS 2050 [10]: Specification for the assessment of the life cycle greenhouse gas emissions of goods and services.

Once the carbon footprint has been calculated, its detailed data help to identify weaknesses, i.e. high-emission areas, that can be eliminated or improved. Thus, the carbon footprint is an indicator of sustainable development.

2. Carbon footprint assessment using Life Cycle Assessment (LCA) method

Carbon footprint calculation is used as a tool for assessing greenhouse gas emissions, helping to manage and reduce them. The carbon footprint is typically calculated using carbon emission factors and activity data that can be assessed through a Life Cycle Assessment (LCA). The carbon footprint analysis according to the LCA methodology is carried out by identifying potential environmental threats, usually throughout the entire life cycle of a product, i.e. from the extraction and processing of raw materials, their transport, through main production, distribution and use, to waste management [11]. However, in agricultural production, the emissions directly related to energy consumption are not dominant [12]. A large part of GHG emissions on farms comes from gas losses from farmland and livestock. When the carbon footprint is calculated with agricultural emission models according to the IPCC reports, all emission sources are taken into account, both those related to energy carriers and those related to processes taking place in the agricultural environment.

LCA is a widely used approach to assess the actual environmental impact of a product from its production and use [11][12][13]. The standards for assessing the product carbon footprint in LCA are mainly PAS 2050 [10] and ISO/TS 14067 [9].

In the case of the CFOOD project, which is presented in the paper, the focus is on the optimization of the frozen food production process, so we consider a segment of the product life cycle from the moment of raw material delivery to the shipment of the finished frozen food to the recipient.

According to the adopted LCA methodology, the carbon footprint of a product consists of the carbon footprints generated at the individual stages of its production. Hence the total CF for a given product, or its unit value, can be expressed by the following formula [14][15][16]:

$$CF = \sum_{i=a}^{r} CF_i \qquad (1)$$

where i denotes a stage of the product life cycle, i = a, m, t, u, r, corresponding to the extraction of raw materials, production, transport, use, and the recycling and disposal stage, respectively.

3. Carbon footprint assessment in CFOOD project

In the case of the CFOOD project, we focus on the optimization of the frozen food production process, so we consider a segment of the product life cycle from the moment of raw material delivery to the shipment of the finished frozen food to the recipient. The production process can be divided into several smaller stages:
- S1: initial cooling of the raw materials before the processing;
- S2: the raw material preparation for the production;
- S3: raw material pre-processing on the production line;
- S4: product freezing in the cold tunnel;
- S5: product preparation for the cold store.

Each of the process stages is connected to electric meter units. Each production stage also has a preparation phase that is measured separately, e.g. S1 has a preparation phase that is denoted pS1, etc.

In the research, we have tested several clusterization methods and chosen three: Canopy, k-Means (KM) and Expectation-Maximization (EM) [17][18]. We have tested several options for the number of clusters and chosen five clusters for each method. According to our experience, the clusters should represent real situations that occur during the production and in its accounting systems:
- Optimal production: the product has a temperature from -25°C to -18°C at the end of the line;
- Close to optimal: during the high season the throughput should be higher, hence the energy consumption should be lower; the product temperature is allowed to be in the range from -18°C to -6°C;
- Wrong accounting of some parameters, e.g. operators' mistakes resulting in too high or too low values of, e.g., the throughput;
- Malfunction of the energy meters. It is a different situation from the above one and might result in random results.

The clusterization model with five clusters should have at least 60 processes. After a year of process measurements, until June 2021, we have collected 152 results for the frozen onion production and 75 for the spinach. The other vegetables have fewer than 50 cases each. Nonetheless, the other productions, e.g. broccoli and cauliflower, should also be optimized. That is why the results of clusterization of 35 broccoli processes and 42 cauliflower processes are presented in the current paper.

In the previous work [15][16], to assess the onion and spinach production processes, we prepared a set of verified data, and to assess the trustworthiness of the production data we compared the results of process classification using five classifiers: k-Nearest Neighbors, Multilayer Perceptron [17], C4.5, Random Forest and Support Vector Machines with a radial basis kernel function [17]. In the current paper, the focus is on unsupervised methods, i.e. clusterization [17], applied to the broccoli and cauliflower processes.

Tables 1-3 and 4-6 contain the clusterization results of the broccoli and cauliflower production processes, respectively. The units for the stages pS1, S1, etc. are kWh/ton, for pt ton/h, and for et kWh/h. The results are achieved using the chosen clusterization methods with five clusters:
- Canopy: max-candidates = 100; periodic-pruning = 10000; min-density = 2.0; T2 radius = 0.804 and T1 radius = 1.005;
- k-Means (KM) with Euclidean distance, max-candidates = 100, periodic-pruning = 10000, min-density = 2.0, T1 = -1.25 and T2 = -1.0;
- Expectation-Maximization (EM) with max-candidates = 100, "minimum improvement in log likelihood" = 1E-5, "minimum improvement in cross-validated log likelihood" = 1E-6, and "minimum allowable standard deviation" = 1E-6.

Figures 1 and 2 show the energy consumption recorded during the production by the energy meters of the stages S1, S2, S3 and S4 for the chosen broccoli process with ID 373 and the chosen cauliflower process with ID 365.

Table 1
K-means clusterization of broccoli production; the units for the stages pS1, S1, etc. are kWh/ton, for pt ton/h, for et kWh/h

Broccoli clusters, K-Means
Attribute         0       1       2       3       4
pS1            0.08    0.32    0.04    4.19    0.09
S1             1.34    1.35    1.51    4.25    2.08
S2             0.16    0.03    0.23    0.09    0.08
pS3            0.06    0.05    0.03    0.11    0.06
S3             0.91    1.14    0.70    0.21    1.38
pS4            7.68    2.29    0.12    6.54    0.25
S4            49.10   55.69    3.07   13.19    6.40
pS5            0.01    0.18    0.00    0.18    0.01
S5             0.18    1.51    0.03    0.24    0.17
pt             1.56    1.46    1.80    2.11    2.12
et            98.67   91.01    9.91   57.77   20.32
instances         4       4       3      22       2

Table 2
Canopy clusterization of broccoli production

Broccoli clusters, Canopy
Attribute         0       1       2       3       4
pS1            0.09    0.39    0.08    0.13    0.13
S1             2.85    1.53    0.13    6.92    0.71
S2             0.11    0.03    0.10    0.11    0.05
pS3            0.02    0.06    0.05    0.00    0.07
S3             0.44    1.25    0.63    0.14    0.63
pS4            1.59    1.75    5.22    0.14    5.36
S4            16.85   58.77    45.3   10.65   43.53
pS5            0.01    0.24    0.00    0.00    0.22
S5             0.21    1.74    0.00    0.21    0.42
pt             2.00    1.35    1.55    1.90    1.92
et            42.19   85.69    82.9   33.65   100.1
instances        16       3       3       8       5

Table 3
EM clusterization of broccoli production

Broccoli clusters, EM
Attribute         0       1       2       3       4
pS1            0.09    0.33    0.02   89.74    0.25
S1             3.17   13.28    1.16    6.92    1.46
S2             0.08    0.11    0.23    0.14    0.06
pS3            0.01    0.02    0.04    2.16    0.06
S3             0.27    0.55    0.77    0.14    1.01
pS4            0.30    1.86    4.55   129.4    3.27
S4             8.60   38.08   20.92   11.29   52.48
pS5            0.01    0.05    0.00    3.61    0.14
S5             0.18    0.68    0.02    0.27    1.02
pt             2.13    2.07    1.71    1.96    1.55
et            26.84   104.9   44.61   465.0   95.07
instances        19       2       5       1       8
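To make the k-means step more concrete, the following is a minimal pure-Python sketch of Lloyd's k-means with Euclidean distance. It is an illustration under stated assumptions, not the configuration used in the paper: the function names (`euclidean`, `kmeans`), the seed, the two-feature process vectors and the choice k = 2 are all hypothetical, whereas the study itself uses five clusters and the per-stage attributes listed in the tables.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: centroids are initialized from k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each process goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical processes: (S4 energy in kWh/ton, throughput pt in ton/h).
points = [
    (10.0, 2.0), (11.0, 2.1), (9.0, 1.9),
    (50.0, 1.4), (52.0, 1.5), (49.0, 1.3),
]
labels, centroids = kmeans(points, k=2)
for c, centroid in enumerate(centroids):
    print(f"cluster {c}: instances={labels.count(c)}, "
          f"mean S4={centroid[0]:.2f}, mean pt={centroid[1]:.2f}")
```

The per-cluster means and instance counts printed at the end play the same role as the rows of the cluster tables: each column of such a table is one centroid, and the "instances" row is the cluster size.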
Figure 1: Example of energy consumption for the broccoli production, process ID 373; the colors of the stages: S1 - brown, S2 - green, S3 - light blue, S4 - dark blue.

Figure 2: Example of energy consumption for the cauliflower production, process ID 365; the colors of the stages: S1 - brown, S2 - green, S3 - light blue, S4 - dark blue.

Table 4
K-means clusterization of cauliflower production

Cauliflower clusters, K-Means
Attribute         0       1       2       3       4
pS1            0.52    0.18    5.46    6.97   519.2
S1            24.27    2.48    7.08    1.00    2.28
S2             1.13    0.10    0.14    0.06    0.05
pS3            0.17    0.06    0.16    3.20   157.7
S3             8.41    0.97    1.71    0.55    1.21
pS4            0.43    5.22    3.67   22.58   678.1
S4            28.30   57.14   17.50    3.14    5.55
pS5            0.02    0.22    0.14    0.84   48.59
S5             0.69    1.31    0.33    0.06    0.24
pt             1.86    1.37    2.07    1.64    2.22
et            127.0   92.66   79.17   81.15    3332
instances         3       5      17      15       2

Table 5
Canopy clusterization of cauliflower production

Cauliflower clusters, Canopy
Attribute         0       1       2       3       4
pS1            5.23    0.50   519.2    0.70    0.10
S1             4.52   24.42    2.28   14.62    7.16
S2             0.11    1.60    0.05    0.35    0.08
pS3            1.35    0.09   157.7    0.01    0.01
S3             1.34    8.24    1.21    0.77    2.72
pS4           11.26    0.36   678.1    0.11    0.18
S4            17.43   26.35    5.55    4.30   11.93
pS5            0.42    0.01   48.59    0.00    0.01
S5             0.37    0.55    0.24    0.13    0.58
pt             1.80    1.87    2.22    1.67    1.81
et            83.16   123.6    3332   36.75   44.63
instances        27       2       2       3       8

4. Evaluation of the clusterization

In Tables 1-6, the optimal clusters have been highlighted. All values for the stages and their preparation phases are in kWh/ton, and the production throughput (pt) is in ton/h. K-means and EM seem to provide the best assessment of the processes, because their best clusters have the lowest energy consumption among the optimal clusters of the three clusterizations.

Table 7
Evaluation of the broccoli clusterization by the chosen classifiers

Broccoli evaluation results [%]
Classifier   Canopy      KM      EM
3NN            85.7    97.1    97.1
C4.5           94.3     100    97.1
MLP            97.1    94.3    97.1
RF              100     100     100
SVM             100     100     100
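The idea of checking a clusterization with a classifier can be sketched as follows: cluster indices are treated as class labels, and a classifier is asked to recover them. The snippet is a minimal pure-Python 3-nearest-neighbors check. The leave-one-out scheme, the function names and the toy data are assumptions made for illustration; the source does not state the validation protocol used to obtain the reported percentages.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Percentage of points whose label is recovered from all other points."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            hits += 1
    return 100.0 * hits / len(xs)


# Hypothetical processes (S4 energy in kWh/ton, pt in ton/h) with
# hypothetical cluster indices used as class labels.
xs = [(10.0, 2.0), (11.0, 2.1), (9.0, 1.9), (10.5, 2.0),
      (50.0, 1.4), (52.0, 1.5), (49.0, 1.3), (51.0, 1.4)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

accuracy = leave_one_out_accuracy(xs, ys)
print(f"3NN leave-one-out accuracy: {accuracy:.1f}%")  # prints 100.0%
```

With only 35 broccoli processes, each misclassified process changes the accuracy by about 2.9 percentage points: 97.1% is consistent with 34 of 35 processes classified correctly, 94.3% with 33, and 85.7% with 30.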
Table 6
EM clusterization of cauliflower production

Cauliflower clusters, EM
Attribute         0       1       2       3       4
pS1            3.44    0.50    0.17   34.90   519.2
S1             4.13   23.95    2.13    0.06    2.28
S2             0.10    0.94    0.10    0.00    0.05
pS3            0.11    0.13    0.08   16.03   157.7
S3             1.31    6.59    0.96    0.00    1.21
pS4            2.13    0.34    5.53   113.2   678.1
S4            11.01   22.59    54.4    0.28    5.55
pS5            0.09    0.01    0.19    4.24   48.59
S5             0.23    0.58    1.11    0.01    0.24
pt             1.89    1.94    1.47    1.55    2.22
et             48.6   112.4    94.3   363.0    3332
instances        27       4       6       3       2

To assess and to choose the clusterization method, we have used five machine learning methods, as in our previous work [15][16]. All the clusterization results were assessed by the classification methods with the same parameters. Tables 7 and 8 contain the classification results of the production processes obtained with the following classifiers:
- 3-Nearest Neighbors (3NN), i.e. kNN with k = 3;
- Multilayer Perceptron (MLP) with one hidden layer of 16 nodes for both productions, with a learning rate equal to 0.79 and momentum equal to 0.39 [13];
- binary tree C4.5 with a confidence factor equal to 0.25 and a minimum number of instances per leaf equal to 2;
- Random Forest (RF) with the bag size percent equal to 100, unlimited maximum depth, number of execution slots equal to 1, and 100 iterations;
- Support Vector Machine (SVM) with a radial basis function (RBF) kernel given by Eq. (2):

$$K(x, y) = \exp\left(-0.05\,\lVert x - y \rVert^{2}\right) \qquad (2)$$

Table 8
Evaluation of the cauliflower clusterization by the chosen classifiers

Cauliflower evaluation results [%]
Classifier   Canopy      KM      EM
3NN            90.5    90.5    85.7
C4.5           95.2    97.6    97.6
MLP            92.9    81.0    92.9
RF              100     100     100
SVM             100     100     100

5. Conclusions

In the paper, three clusterization methods have been shown that allow us to assess the processes and their impact on energy consumption and, hence, on the carbon footprint. We have shown that all the clustering methods point out the processes that are proper from the manufacturing point of view. The results for the broccoli and cauliflower production, based on 35 and 42 processes respectively, have been shown. Currently, we are collecting new processes for the other vegetable products. They will be analyzed using the clustering methods shown above.

The k-means method is fast and simple, but it has a significant disadvantage: it is sensitive to outliers, which distort the average value. Although it gives, along with EM, the best results in the assessment of the whole production, it is planned to use k-SVD and fuzzy k-means methods in future work.

6. Acknowledgements

The paper is co-financed by the Polish National Center for Research and Development, grant CFOOD number BIOSTRATEG3/343817/17/NCBR/2018.

7. References

[1] O. Edenhofer, R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Kadner, A. Seyboth, I. Adler, S. Baum, G. Myhre, et al., "Climate Change 2014: Mitigation of Climate Change", Working Group III Contribution to the IPCC Fifth Assessment Report, Cambridge University Press: Cambridge, UK, 2015.
[2] Food and Agriculture Organization of the United Nations (FAO), Regional Strategy for Sustainable Hybrid Rice Development in Asia, Food and Agriculture Organization of the United Nations Regional Office for Asia and the Pacific: Bangkok, Thailand, 2014.
[3] D.B. Lobell, W. Schlenker, J. Costa-Roberts, "Climate trends and global crop production since 1980", Science 2011, 333, 616-620.
[4] R.Y.M. Kangalawe, C.G. Mungongo, A.G. Mwakaje, E. Kalumanga, P.Z. Yanda, "Climate change and variability impacts on agricultural production and livelihood systems in Western Tanzania", Clim. Dev. 2017, 9, 202-216.
[5] ECE, Strategies and policies for air pollution abatement. United Nations, New York and Geneva, 2007.
[6] European Council Conclusions 2014. 2030 Climate and energy policy framework. Conclusions, 23/24 October 2014, EUCO 169/14, http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ec/145397.pdf
[7] ISO 14040 - Environmental management - Life cycle assessment - Principles and framework. International Organization for Standardization, Geneva, 2006.
[8] ISO 14064-1 - Greenhouse gases - Part 1: Specification with guidance at the organization level for quantification and reporting of greenhouse gas emissions and removals. International Organization for Standardization, Geneva, 2018.
[9] ISO/TS 14067 - Greenhouse gases - Carbon footprint of products - Requirements and guidelines for quantification. International Organization for Standardization, Geneva, 2018.
[10] PAS 2050 (2011), "The Guide to PAS2050-2011, Specification for the assessment of the life cycle greenhouse gas emissions of goods and services". British Standards Institution, 2011.
[11] M.A. Renouf, C. Renaud-Gentie, A. Perrin, C. Kanyarushoki, F. Jourjon, "Effectiveness criteria for customised agricultural life cycle assessment tools", J. Clean. Prod. 179, 2018, 246-254.
[12] D. Perez-Neira, A. Grollmus-Venegas, "Life-cycle energy assessment and carbon footprint of peri-urban horticulture. A comparative case study of local food systems in Spain", Landscape and Urban Planning 172, 2018, 60-68.
[13] A. Nabavi-Pelesaraei, S. Rafiee, S.S. Mohtasebi, H. Hosseinzadeh-Bandbafha, K. Chau, "Energy consumption enhancement and environmental life cycle assessment in paddy production using optimization techniques", J. Clean. Prod. 162, 2017, 571-586.
[14] P. Milczarski, A. Hłobaż, P. Maślanka, B. Zieliński, Z. Stawska, P. Kosiński, "Carbon footprint calculation and optimization approach for CFOOD project", CEUR Workshop Proceedings 2683 (2019), 30-34.
[15] P. Milczarski, B. Zieliński, Z. Stawska, A. Hłobaż, P. Maślanka, P. Kosiński, "Machine Learning Application in Energy Consumption Calculation and Assessment in Food Processing Industry", ICAISC (2) (2020), Springer LNAI 12416, 369-379.
[16] Z. Stawska, P. Milczarski, et al., "The carbon footprint methodology in CFOOD project", International Journal of Electronics and Telecommunications, 2020, 66(4), 781-786.
[17] P. Harrington, "Machine Learning in Action", Manning Publ., 2012.
[18] A.P. Dempster, N.M. Laird, D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 39 (1), 1977, 1-38.