=Paper= {{Paper |id=Vol-3126/paper53 |storemode=property |title=Evaluation and comparison of the processes in the frozen vegetable production using machine learning methods |pdfUrl=https://ceur-ws.org/Vol-3126/paper53.pdf |volume=Vol-3126 |authors=Piotr Milczarski }} ==Evaluation and comparison of the processes in the frozen vegetable production using machine learning methods== https://ceur-ws.org/Vol-3126/paper53.pdf
Evaluation and Comparison of the Processes in the Frozen
Vegetable Production Using Machine Learning Methods
Piotr Milczarski
Faculty of Physics and Applied Informatics, University of Lodz, Pomorska str. 149/153, Lodz, Poland

Abstract
This paper studies carbon footprint (CF) assessment in frozen vegetable production
processes, with the aim of obtaining low-carbon products. Three clusterization methods have
been chosen for the production assessment. The clusterization results are evaluated by five
classification methods: k-Nearest Neighbors, Multilayer Perceptron, C4.5, Random Forest and
Support Vector Machines with a radial basis kernel function. In the chosen model with five
clusters, the best clusterization methods are k-means followed by Canopy.

Keywords
Carbon Footprint; clusterization; Canopy; k-means; Expectation-Maximization; k-Nearest
Neighbors; Multilayer Perceptron; C4.5; Random Forest; Support Vector Machines


1. Introduction

    Greenhouse gas emissions from human activities have been a major contributor to global
warming since the mid-twentieth century. Agriculture and land-use change contributed to 17% of
global anthropogenic greenhouse gas emissions in 2010 [1]. By 2050 the population will reach
9 billion people [2]; to ensure the food supply, agricultural production should increase by
60%. Climate change can affect food availability; for example, an increase in temperature, a
change in the structure of rainfall or extreme weather events may result in a reduction in
agricultural productivity [3, 4]. Therefore, the main challenge has become to mitigate the
threats that climate change poses to food security.
    In response to the emerging threats of climate change, numerous programs, both global and
regional, have been developed, the purpose of which is to slow down the growth rate of GHG
concentration [5]. Achieving climate policy goals requires continuous monitoring of emissions
and verification of the effectiveness of solutions for the development of a low-emission
economy.
    The adoption of an action plan for the reduction of gaseous emissions by EU countries in
2014 requires the reduction of GHG emissions by 30% by 2030, compared to the level in 2005
[6]. The methods of calculating the carbon footprint are most often based on well-known
standards. Among them, the most used are:
- ISO 14040:2006 [7] – Environmental management - Life cycle assessment: principles and
  framework,
- ISO 14064-1:2018 [8] – Greenhouse gases - Part 1: Specification with guidance at the
  organization level for quantification and reporting of greenhouse gas emissions and
  removals,
- ISO/TS 14067:2018 [9] – Greenhouse gases - Carbon footprint of products - Requirements and
  guidelines for quantification,
- PAS 2050 [10] – Specification for the assessment of the life cycle greenhouse gas emissions
  of goods and services.
    Once the carbon footprint has been calculated, its detailed data helps to identify
weaknesses, i.e. high-emission areas, that can be eliminated or improved. Thus, the carbon
footprint is an indicator of sustainable development.

ISIT 2021: II International Scientific and Practical Conference
«Intellectual Systems and Information Technologies», September
13–19, 2021, Odesa, Ukraine
EMAIL: piotr.milczarski@uni.lodz.pl (A. 1);
ORCID: 0000-0002-0095-6796 (A. 1);
              ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative
              Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)
2. Carbon footprint assessment using Life Cycle Assessment (LCA) method

    Carbon footprint calculation is used as a tool for assessing greenhouse gas emissions,
helping to manage and reduce them. The carbon footprint is typically calculated using carbon
emission factors and activity data that can be assessed through a Life Cycle Assessment (LCA).
The carbon footprint analysis according to the LCA methodology is carried out by identifying
potential environmental threats, usually throughout the entire life cycle of a product, i.e.
from the extraction and processing of raw materials, their transport, through main production,
distribution and use, to waste management [11]. However, in agricultural production, the
emissions directly related to energy consumption are not dominant [12]. A large part of GHG
emissions on farms is gas losses from farmland and livestock. When the carbon footprint is
calculated with agricultural emission models according to the IPCC reports, all emission
sources are taken into account, both those related to energy carriers and those related to
processes taking place in the agricultural environment.
    LCA is a widely used approach to assess the actual environmental impact of a product from
its production and use [11] [12] [13]. The standards for assessing the product carbon
footprint in LCA are mainly PAS 2050 [10] and ISO/TS 14067 [9].
    In the case of the CFOOD project, which is presented in the paper, the focus is on the
optimization of the frozen food production process, so we consider a segment of the product
life cycle from the moment of raw material delivery to the shipment of the finished frozen
food to the recipient.
    According to the adopted LCA methodology, the carbon footprint of a product consists of
the carbon footprints generated at the successive stages of its life cycle. Hence the total CF
for a given product or its unit value can be expressed by the following formula [14][15][16]:

$$CF = \sum_{i \in \{a, m, t, u, r\}} CF_i \qquad (1)$$

where i denotes a stage of the product life cycle; i = a, m, t, u and r relate to the
extraction of raw materials, production, transport, use, and the recycling and disposal stage,
respectively.

3. Carbon footprint assessment in CFOOD project

    In the case of the CFOOD project, we focus on the optimization of the frozen food
production process, so we consider a segment of the product life cycle from the moment of raw
material delivery to the shipment of the finished frozen food to the recipient. The production
process can be divided into several smaller stages:
- S1 – initial cooling of the raw materials before the processing;
- S2 – the raw material preparation for the production;
- S3 – raw material pre-processing on the production line;
- S4 – product freezing in the cold tunnel;
- S5 – product preparation for the cold store.
    Each of the process stages is connected to electric meter units. Each production stage
also has a preparation phase that is measured separately, e.g. S1 has a preparation phase that
is denoted pS1, etc.
    In the research section, we have tested several clusterization methods and chosen three:
Canopy, k-Means (KM) and Expectation-Maximization (EM) [17][18]. We have tested several
options for the number of clusters and chosen five clusters for each method; according to our
experience, they should represent some real situations that occur during the production and
in its accounting systems:
- Optimal production – the product has a temperature from -25 °C to -18 °C at the end of the
  line;
- Close to optimal – during the high season the throughput should be higher, hence the energy
  consumption should be lower; the product temperature is allowed to be in the range from
  -18 °C to -6 °C.
- Wrong accounting of some parameters, e.g. operators' mistakes resulting in too high or too
  low results, e.g. of the throughput.
- Malfunction of the energy meters. It is a different situation from the above one and might
  result in random results.
    The clusterization model with five clusters should have at least 60 processes. After a
year of the process measurement, till June 2021, we have collected 152 results for the frozen
onion production and 75 for the spinach. The other vegetables have fewer than 50 cases.
Nonetheless,
the other productions, e.g. broccoli and cauliflower, should also be optimized. That is why
the results of clusterization of 35 broccoli processes and 42 cauliflower ones are presented
in the current paper.
    In the previous work [15][16], to assess the onion and spinach production processes, we
prepared a set of verified data, and to assess the trustworthiness of the production data we
compared the results of process classification using five classifiers: k-Nearest Neighbors,
Multilayer Perceptron [17], C4.5, Random Forest and Support Vector Machines with a radial
basis kernel function [17]. In the current paper, the focus is on unsupervised methods, i.e.
clusterization [17] of the broccoli and cauliflower processes.
    Tables 1-3 and 4-6 present the clusterization results of the broccoli and cauliflower
production processes. The units for the stages pS1, S1, etc. are kWh/ton, for pt ton/h and
for et kWh/h. The results are achieved using the chosen clusterization methods with five
clusters:
- Canopy: max-candidates = 100; periodic-pruning = 10000; min-density = 2.0; T2 radius = 0.804
  and T1 radius = 1.005;
- k-Means (KM) with Euclidean distance, max-candidates = 100, periodic-pruning = 10000,
  min-density = 2.0, T1 = -1.25 and T2 = -1.0;
- Expectation–Maximization (EM) with max-candidates = 100, “minimum improvement in log
  likelihood” = 1E-5, “minimum improvement in cross-validated log likelihood” = 1E-6, and
  “minimum allowable standard deviation” = 1E-6.

Table 1
K-means clusterization of broccoli production; the units for the stages pS1, S1, etc. are
kWh/ton, for pt ton/h and for et kWh/h
                Broccoli Clusters K-Means
 Attribute         0       1       2       3       4
 pS1            0.08    0.32    0.04    4.19    0.09
 S1             1.34    1.35    1.51    4.25    2.08
 S2             0.16    0.03    0.23    0.09    0.08
 pS3            0.06    0.05    0.03    0.11    0.06
 S3             0.91    1.14    0.70    0.21    1.38
 pS4            7.68    2.29    0.12    6.54    0.25
 S4            49.10   55.69    3.07   13.19    6.40
 pS5            0.01    0.18    0.00    0.18    0.01
 S5             0.18    1.51    0.03    0.24    0.17
 pt             1.56    1.46    1.80    2.11    2.12
 et            98.67   91.01    9.91   57.77   20.32
 instances         4       4       3      22       2

Table 2
Canopy clusterization of broccoli production
                Broccoli Cluster Canopy
 Attribute         0       1       2       3       4
 pS1            0.09    0.39    0.08    0.13    0.13
 S1             2.85    1.53    0.13    6.92    0.71
 S2             0.11    0.03    0.10    0.11    0.05
 pS3            0.02    0.06    0.05    0.00    0.07
 S3             0.44    1.25    0.63    0.14    0.63
 pS4            1.59    1.75    5.22    0.14    5.36
 S4            16.85   58.77    45.3   10.65   43.53
 pS5            0.01    0.24    0.00    0.00    0.22
 S5             0.21    1.74    0.00    0.21    0.42
 pt             2.00    1.35    1.55    1.90    1.92
 et            42.19   85.69    82.9   33.65   100.1
 instances        16       3       3       8       5

Table 3
EM clusterization of broccoli production
                Broccoli Cluster EM
 Attribute         0       1       2       3       4
 pS1            0.09    0.33    0.02   89.74    0.25
 S1             3.17   13.28    1.16    6.92    1.46
 S2             0.08    0.11    0.23    0.14    0.06
 pS3            0.01    0.02    0.04    2.16    0.06
 S3             0.27    0.55    0.77    0.14    1.01
 pS4            0.30    1.86    4.55   129.4    3.27
 S4             8.60   38.08   20.92   11.29   52.48
 pS5            0.01    0.05    0.00    3.61    0.14
 S5             0.18    0.68    0.02    0.27    1.02
 pt             2.13    2.07    1.71    1.96    1.55
 et            26.84   104.9   44.61   465.0   95.07
 instances        19       2       5       1       8

    Figures 1 and 2 show the energy consumption during the production on the energy meters of
the chosen stages S1, S2, S3 and S4 for the chosen broccoli process with ID 373 and the
cauliflower process with ID 365.
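
To make the clusterization step more concrete, the following is a minimal sketch of plain k-means (Lloyd's algorithm) with Euclidean distance, written in standard-library Python. It is an illustration only, not the implementation used in the study: the function name `kmeans`, the two-attribute toy processes (S4 energy in kWh/ton, et in kWh/h) and the choice of two clusters are assumptions made for the example, and the canopy-related settings listed above are not reproduced.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance.

    points: list of equal-length tuples of floats (one tuple per process).
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct processes
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every process goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: every centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as is
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


# Hypothetical toy data (NOT measurements from the paper):
# each tuple is (S4 energy in kWh/ton, et in kWh/h).
toy_processes = [
    (10.0, 30.0), (11.0, 32.0), (9.5, 29.0),   # low-consumption group
    (50.0, 95.0), (52.0, 98.0), (49.0, 93.0),  # high-consumption group
]
centroids, labels = kmeans(toy_processes, k=2)
print(labels)
```

In the study, each process is described by the eleven attributes of Tables 1-6 and five clusters are requested; the sketch only shows the assignment and update steps on a smaller example.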
Figure 1: Example of energy consumption for the broccoli production, process ID 373; the colors of
the stages: S1 – brown, S2 – green, S3 – light blue, S4 – dark blue.

Figure 2: Example of energy consumption for the cauliflower production, process ID 365; the colors of
the stages: S1 – brown, S2 – green, S3 – light blue, S4 – dark blue.

Table 4
K-means clusterization of cauliflower production
                Cauliflower Clusters K-Means
 Attribute         0       1       2       3       4
 pS1            0.52    0.18    5.46    6.97   519.2
 S1            24.27    2.48    7.08    1.00    2.28
 S2             1.13    0.10    0.14    0.06    0.05
 pS3            0.17    0.06    0.16    3.20   157.7
 S3             8.41    0.97    1.71    0.55    1.21
 pS4            0.43    5.22    3.67   22.58   678.1
 S4            28.30   57.14   17.50    3.14    5.55
 pS5            0.02    0.22    0.14    0.84   48.59
 S5             0.69    1.31    0.33    0.06    0.24
 pt             1.86    1.37    2.07    1.64    2.22
 et            127.0   92.66   79.17   81.15    3332
 instances         3       5      17      15       2

Table 5
Canopy clusterization of cauliflower production
                Cauliflower Cluster Canopy
 Attribute         0       1       2       3       4
 pS1            5.23    0.50   519.2    0.70    0.10
 S1             4.52   24.42    2.28   14.62    7.16
 S2             0.11    1.60    0.05    0.35    0.08
 pS3            1.35    0.09   157.7    0.01    0.01
 S3             1.34    8.24    1.21    0.77    2.72
 pS4           11.26    0.36   678.1    0.11    0.18
 S4            17.43   26.35    5.55    4.30   11.93
 pS5            0.42    0.01   48.59    0.00    0.01
 S5             0.37    0.55    0.24    0.13    0.58
 pt             1.80    1.87    2.22    1.67    1.81
 et            83.16   123.6    3332   36.75   44.63
 instances        27       2       2       3       8
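
One way to read the centroid tables is to add up the per-stage energies of each cluster and compare the totals. The sketch below does this for the k-means centroids of Table 4; the values are copied from the table, while the variable names and the use of the plain sum as a ranking criterion are assumptions for illustration, not the author's stated procedure.

```python
# Centroid values copied from Table 4 (k-means, cauliflower), in kWh/ton.
# One list per stage; position c in each list belongs to cluster c.
STAGE_ENERGY = {
    "pS1": [0.52, 0.18, 5.46, 6.97, 519.2],
    "S1":  [24.27, 2.48, 7.08, 1.00, 2.28],
    "S2":  [1.13, 0.10, 0.14, 0.06, 0.05],
    "pS3": [0.17, 0.06, 0.16, 3.20, 157.7],
    "S3":  [8.41, 0.97, 1.71, 0.55, 1.21],
    "pS4": [0.43, 5.22, 3.67, 22.58, 678.1],
    "S4":  [28.30, 57.14, 17.50, 3.14, 5.55],
    "pS5": [0.02, 0.22, 0.14, 0.84, 48.59],
    "S5":  [0.69, 1.31, 0.33, 0.06, 0.24],
}
# Hourly energy consumption et per cluster, in kWh/h (also from Table 4).
ET = [127.0, 92.66, 79.17, 81.15, 3332.0]
N_CLUSTERS = len(ET)

# Per-ton total of a cluster: sum of all stage and preparation-phase energies.
totals = [
    sum(values[c] for values in STAGE_ENERGY.values())
    for c in range(N_CLUSTERS)
]

# Illustrative ranking criterion (an assumption): the lowest total wins.
lowest_per_ton = min(range(N_CLUSTERS), key=lambda c: totals[c])
lowest_hourly = min(range(N_CLUSTERS), key=lambda c: ET[c])

for c in range(N_CLUSTERS):
    print(f"cluster {c}: {totals[c]:8.2f} kWh/ton, et = {ET[c]:7.2f} kWh/h")
print("lowest per-ton total:", lowest_per_ton)
print("lowest hourly consumption:", lowest_hourly)
```

For these values, the same cluster has both the lowest per-ton total and the lowest hourly consumption et, whereas cluster 4, with its very large preparation-phase readings, stands apart from the rest.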
4. Evaluation of the clusterization

    In Tables 1-6, the optimal clusters have been highlighted. All values for the stages and
their preparation phases are in kWh/ton, and the production throughput (pt) is in ton/h.
K-means and EM seem to provide the best assessment of the processes, because among the optimal
clusters of the three clusterizations it is their optimal cluster that has the lowest energy
consumption.

Table 6
EM clusterization of cauliflower production
                Cauliflower Cluster EM
 Attribute         0       1       2       3       4
 pS1            3.44    0.50    0.17   34.90   519.2
 S1             4.13   23.95    2.13    0.06    2.28
 S2             0.10    0.94    0.10    0.00    0.05
 pS3            0.11    0.13    0.08   16.03   157.7
 S3             1.31    6.59    0.96    0.00    1.21
 pS4            2.13    0.34    5.53   113.2   678.1
 S4            11.01   22.59    54.4    0.28    5.55
 pS5            0.09    0.01    0.19    4.24   48.59
 S5             0.23    0.58    1.11    0.01    0.24
 pt             1.89    1.94    1.47    1.55    2.22
 et             48.6   112.4    94.3   363.0    3332
 instances        27       4       6       3       2

    To assess and choose the clusterization method, we have used five machine learning
methods, as in our previous work [15][16]. All the clusterization results were assessed by the
classification methods with the same parameters. Tables 7 and 8 present the classification
results of the production processes using the following classifiers:
- 3NN (kNN): 3-Nearest Neighbors;
- Multilayer Perceptron (MLP) with one hidden layer with 16 nodes for both productions, with a
  learning rate equal to 0.79 and momentum equal to 0.39 [13];
- binary tree C4.5 with a confidence factor equal to 0.25 and a minimum number of instances
  per leaf equal to 2;
- Random Forest (RF) with the bag size percent equal to 100, unlimited maximum depth, number
  of execution slots equal to 1 and 100 iterations;
- Support Vector Machine (SVM) with a radial basis function (RBF) kernel given by Eq. (2):

$$K(x, y) = \exp\left(-0.05\,\|x - y\|^{2}\right) \qquad (2)$$

Table 7
Evaluation of the broccoli clusterization by the chosen classifiers
                Broccoli evaluation results [%]
 Classifier   Canopy      KM      EM
 3NN            85.7    97.1    97.1
 C4.5           94.3     100    97.1
 MLP            97.1    94.3    97.1
 RF              100     100     100
 SVM             100     100     100

Table 8
Evaluation of the cauliflower clusterization by the chosen classifiers
                Cauliflower evaluation results [%]
 Classifier   Canopy      KM      EM
 3NN            90.5    90.5    85.7
 C4.5           95.2    97.6    97.6
 MLP            92.9    81.0    92.9
 RF              100     100     100
 SVM             100     100     100

5. Conclusions

    In the paper, three clusterization methods have been shown that allow us to assess the
processes and their impact on energy consumption and, hence, the carbon footprint. We have
shown that all the clustering methods point out the processes that are proper from the
manufacturing point of view. The results for the broccoli and cauliflower production, taking
into account 35 and 42 processes respectively, have been shown. Currently, we are collecting
new processes for the other vegetable products. They will be analyzed using the clustering
methods shown above.
    The k-means method is fast and simple, but it has a significant disadvantage: it is
sensitive to outliers that distort the mean value. Although, together with EM, it gives the
best results in the assessment of the whole production, it is planned to use k-SVD and fuzzy
k-means methods in future work.

6. Acknowledgements

    The paper is co-financed by the Polish National Center for Research and Development,
grant CFOOD number BIOSTRATEG3/343817/17/NCBR/2018.

7. References

[1] O. Edenhofer, R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Kadner, A. Seyboth,
     I. Adler, S. Baum, G. Myhre, et al. “Climate Change 2014: Mitigation of Climate Change”
     Working Group III Contribution to the IPCC Fifth Assessment Report, Cambridge University
     Press: Cambridge, UK, 2015.
[2] Food and Agriculture Organization of the United Nations (FAO). Regional Strategy for
     Sustainable Hybrid Rice Development in Asia, Food and Agriculture Organization of the
     United Nations Regional Office for Asia and the Pacific: Bangkok, Thailand, 2014.
[3] D.B. Lobell, W. Schlenker, J. Costa-Roberts, “Climate trends and global crop production
     since 1980”, Science 2011, 333, 616–620.
[4] R.Y.M. Kangalawe, C.G. Mungongo, A.G. Mwakaje, E. Kalumanga, P.Z. Yanda, “Climate change
     and variability impacts on agricultural production and livelihood systems in Western
     Tanzania”. Clim. Dev. 2017, 9, 202–216.
[5] ECE Strategies and policies for air pollution abatement. United Nations, New York and
     Geneva, 2007.
[6] European Council Conclusions 2014. 2030 Climate and energy policy framework.
     Conclusions – 23/24 October 2014, EUCO 169/14,
     http://www.consilium.europa.eu/uedocs/cms_data/docs/pressdata/en/ec/145397.pdf
[7] ISO14040 - Environmental management-life cycle assessment: principles and framework.
     International Organization for Standardization, Geneva, 2006.
[8] ISO14064-1 - Greenhouse gases - Part 1: Specification with guidance at the organization
     level for quantification and reporting of greenhouse gas emissions and removals.
     International Organization for Standardization, Geneva, 2018.
[9] ISO/TS 14067 - Greenhouse gases - Carbon footprint of products - Requirements and
     guidelines for quantification. International Organization for Standardization, Geneva,
     2018.
[10] PAS 2050 (2011) “The Guide to PAS2050-2011, Specification for the assessment of the life
     cycle greenhouse gas emissions of goods and services”. British Standards Institution,
     2011.
[11] M.A. Renouf, C. Renaud-Gentie, A. Perrin, C. Kanyarushoki, F. Jourjon, “Effectiveness
     criteria for customised agricultural life cycle assessment tools”, J. Clean. Prod. 179,
     2018, 246–254.
[12] D. Perez-Neira, A. Grollmus-Venegas, “Life-cycle energy assessment and carbon footprint
     of peri-urban horticulture. A comparative case study of local food systems in Spain”,
     Landscape and Urban Planning 172, 2018, 60–68.
[13] A. Nabavi-Pelesaraei, S. Rafiee, S.S. Mohtasebi, H. Hosseinzadeh-Bandbafha, K. Chau,
     “Energy consumption enhancement and environmental life cycle assessment in paddy
     production using optimization techniques”, J. Clean. Prod. 162, 2017, 571–586.
[14] P. Milczarski, A. Hłobaż, P. Maślanka, B. Zieliński, Z. Stawska, P. Kosiński, “Carbon
     footprint calculation and optimization approach for CFOOD project”, CEUR Workshop
     Proceedings 2683 (2019) 30–34.
[15] P. Milczarski, B. Zieliński, Z. Stawska, A. Hłobaż, P. Maślanka, P. Kosiński, “Machine
     Learning Application in Energy Consumption Calculation and Assessment in Food Processing
     Industry”, ICAISC (2) (2020), Springer LNAI 12416, 369–379.
[16] Z. Stawska, P. Milczarski, et al., “The carbon footprint methodology in CFOOD project.”
     International Journal of Electronics and Telecommunications, 2020, 66(4), 781–786.
[17] P. Harrington, “Machine Learning in Action.” Manning Publ. 2012.
[18] A.P. Dempster, N.M. Laird, D.B. Rubin, “Maximum Likelihood from Incomplete Data via the
     EM Algorithm”. Journal of the Royal Statistical Society, Series B. 39 (1), 1977, 1–38.