=Paper= {{Paper |id=Vol-3102/paper12 |storemode=property |title=Rapid Analysis of Powders Based on Deep Learning, Near-Infrared and Derivative Spectroscopy |pdfUrl=https://ceur-ws.org/Vol-3102/paper12.pdf |volume=Vol-3102 |authors=Tegegn Dagmawi Delelegn,Italo Francesco Zoppis,Sara Manzoni,Alessio Mognato,Ivan Reguzzoni,Edoardo Lotti |dblpUrl=https://dblp.org/rec/conf/aiia/DelelegnZMMRL21 }} ==Rapid Analysis of Powders Based on Deep Learning, Near-Infrared and Derivative Spectroscopy== https://ceur-ws.org/Vol-3102/paper12.pdf
         Rapid Analysis of Powders Based on Deep
          Learning, Near-Infrared and Derivative
                      Spectroscopy ⋆

       Tegegn Dagmawi Delelegn⋆⋆1,2[0000−0002−5031−7589] , Italo Francesco
     Zoppis1[0000−0001−7312−7123] , Sara Manzoni1[0000−0002−6406−536X] , Alessio
       Mognato1[0000−0001−6462−9033] , Ivan Reguzzoni2 , and Edoardo Lotti2
     1
         Department of Computer Science, University of Milano-Bicocca, Milan, Italy
              {dagmawi.tegegn,italo.zoppis, sara.manzoni}@unimib.it
                             a.mognato@campus.unimib.it
                        2
                          SeleTech Engineering Srl, Milan, Italy
                     {i.reguzzoni,edoardo.lotti}@seletech.com
                                  www.seletech.com



          Abstract. Infrared spectroscopy has proved to be a powerful tool for
          solving organic chemistry problems and finds a widening field in many
          industries. Infrared absorption and its relation to the molecular struc-
          ture of organic material are discussed to give the essential background
          for detailed descriptions of techniques adopted in this work. Existing
          spectral analysis approaches rely on pre-processing and feature selec-
          tion methods to remove signal artifacts based on prior experiences. This
          work introduces a data-driven deep learning approach and successfully
          applies it to predict the composition of organic powder mixtures. In
          particular, we use a convolutional neural network to predict the
          composition percentages of mixed organic powders. We show that specific
          pre-processing steps, such as Savitzky-Golay smoothing and derivatives,
          can increase the accuracy of the results.

          Keywords: Convolutional Neural Network · Near-Infrared (NIR) ·
          Quantitative Analysis · Savitzky-Golay


1        Introduction

Since its discovery and first applications, near-infrared (NIR) spectroscopy has
evolved from an add-on unit to a standalone unit in many areas. Numerous
applications of this methodology have been eminently successful and have become
familiar to many chemometricians. Spectroscopy routinely performs analyses that
are not possible with other methods, and common analyses that previously
required hours are now completed in a few minutes.
⋆ Supported by SeleTech Engineering Srl.
⋆⋆ Copyright ©2021 for this paper by its authors. Use permitted under Creative
     Commons License Attribution 4.0 International (CC BY 4.0).
    The underlying basis of applied infrared spectroscopy is that all organic
substances absorb selectively at specific frequencies in the infrared portion
of the electromagnetic spectrum. Infrared spectra of biological organic
materials are signals composed of peaks that arise from molecular vibrations,
mostly of O-H, C-H, and N-H groups [15, 4], caused by their interaction with
infrared light within the NIR wavelength region (800-2500 nm).
    The plot of these transmission or absorption values versus frequency or
wavelength constitutes an infrared spectrum that is characteristic of the
sample and is used to describe its intrinsic properties. Applications of
infrared spectroscopy can be divided into two general categories, qualitative
and quantitative.
    A mixture of materials can be analyzed quickly and accurately so long as the
components present in the mix are known. From a study of the spectra of the
known compounds, it is usually possible to find a frequency at which only one
component absorbs strongly and thus to determine its quantity in the mixture.
This rapid method, suitably combined with deep learning, has shown an accuracy
of 1 percent or better on mixtures of pairs of organic materials [30].
    Near-infrared spectroscopy is often used to assess the quality of rice,
using partial least squares (PLS) and multiple linear regression (MLR),
respectively [34, 24]. Rapid methods that combine NIR technology with
multivariate analysis, namely principal component analysis (PCA) and partial
least squares discriminant analysis (PLS-DA), are used to detect fraud in cocoa
powder [32].
    Kernel-based methods, such as Support Vector Machines (SVM), improved
multivariate analysis [29]. Machine learning improved on previous models for
the analysis of spectral profiles [9, 8, 38], and Convolutional Neural Networks
(CNN) in particular have reported promising results for spectroscopy signal
classification [30]. The authors of [30] proposed a modified version of the
1D-CNN model of [20], obtained by tuning the hyper-parameters. They predicted
the composition percentage of organic mixtures, dividing the dataset into three
main groups for testing purposes. 1D-CNNs combined with near-infrared spectra
showed better results than traditional PLS models in classification and
regression tasks [21, 37]. In this work, we show a quantitative application of
near-infrared spectroscopy using Convolutional Neural Networks with the aid of
signal filtering and pre-processing methods.

1.1   Contributions
The filtering method introduced by Savitzky and Golay has long been used in the
absorption spectroscopy community for its ability to simultaneously smooth and
differentiate absorption spectra. In this investigation, the Savitzky-Golay
method is shown to be applicable to our wavelength ranges and to improve the
results.
    The datasets contain fifteen pairwise mixtures, each prepared in different
quantities of its components, while the number of base materials is six. We
extracted their near-infrared spectrum profiles. With our experimental setup,
we answer the following research questions:
 – RQ1 Can we predict the composition percentage with high accuracy of
   mixtures of three materials?
 – RQ2 Can we improve the prediction results by adding extra features like
   the derivative to the original spectra?

    This paper is structured as follows. Section 2 mainly presents the dataset
and the 1D-CNN architecture. Section 3 shows the experiments’ setup and the
results. In Section 4 the results are discussed and Section 5 presents the conclu-
sions.


2     Materials and Methods

In this section, we present the data collection process and the methods adopted
to predict the quantity of the materials in each mixture. The data collection
comprises two steps: the preparation of samples from the six organic powders
and the data acquisition, which describes how the spectral data are captured.
Furthermore, we describe the convolutional neural network architecture and its
parameters, the Savitzky-Golay filter, and the use of derivatives.


2.1   Sample Preparation

The sample preparation procedure is the same as described in [30]. The new
mixtures of three materials added in this work follow the same steps. However,
the homogeneity of the mixtures is still not guaranteed because of the
characteristics of the materials used, such as grain size and the tendency to
form lumps.
   We made fifteen pairwise combinations using cocoa (Cocoa), ice sugar
(IceSugar), baby milk powder (BabyMilk), potato starch (Potato), rice starch
(Rice), and baking soda (NaHCO3). We added further pairwise mixtures to the
first dataset, for a total of 69 samples. Their composition percentages are
taken from the set P = {15, 25, 33, 35, 40, 45, 50, 65, 75, 85}. The
composition percentages of a mixture of two materials add up to 100%, e.g.,
(A=15, B=85), where A and B are the mixed materials. Moreover, we added six
mixtures of three materials in different compositions, with percentages also
taken from the set P. Their compositions also add up to 100% (up to rounding),
e.g., (A=33, B=33, C=33) or (A=45, B=40, C=15), where A, B, and C are the three
mixed materials. Tables 1 and 2 show the mixtures that were prepared and from
which near-infrared spectra were acquired.


2.2   Data Acquisition

The data acquisition is the same as described in [30]. We used the same sensor
to acquire the spectra of the three-material mixtures. The sensor captures two
wavelength ranges, [1350-1650] nm and [1750-2150] nm, for a total of 702
wavelength points.
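The count of 702 points can be checked against the two ranges. The sketch below assumes a sampling step of 1 nm with both endpoints included, which gives 301 + 401 = 702 points; the step is not stated in the text, so STEP_NM and the helper wavelength_axis are illustrative assumptions rather than the sensor's documented behaviour.

```python
import numpy as np

# Assumption: 1 nm sampling step (not stated in the paper).
STEP_NM = 1.0
RANGES_NM = [(1350.0, 1650.0), (1750.0, 2150.0)]


def wavelength_axis(ranges, step):
    """Concatenate evenly spaced wavelengths for each (low, high) range, endpoints included."""
    parts = [np.arange(low, high + step / 2, step) for low, high in ranges]
    return np.concatenate(parts)


wavelengths = wavelength_axis(RANGES_NM, STEP_NM)
print(wavelengths.size)  # 702 under the 1 nm assumption
```

Under this assumption, feature index 300 is the last point of the first range and index 301 the first point of the second range, so the feature vector has a gap between 1650 nm and 1750 nm.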
Table 1. Overview of the pairwise mixtures. A value of 1 indicates that the powder
mixture is present. The diagonal values correspond to the base materials at 100%.

                       BabyMilk IceSugar NaHCO3 Cocoa Potato Rice
              BabyMilk 1
              IceSugar 1        1
              NaHCO3 1          1        1
              Cocoa    1        1        1      1
              Potato   1        1        1      1     1
              Rice     1        1        1      1     1      1

Table 2. The mixtures of three materials. Value 1 indicates that the corresponding
materials are mixed, while 0 means otherwise.

                                NaHCO3 IceSugar
                       BabyMilk 1               Rice
                       Potato          1        Cacao


Dataset. Following the material preparation and the acquisition of the NIR
spectra, we collected 506160 samples³. Each sample has 702 features
corresponding to the captured wavelengths, and for each composition percentage
of a mixture we have ∼7000 samples. The target variable of each sample is a
percentage distribution over the six base materials that describes the quantity
of each material in the spectral sample. Since each spectral sample of a
mixture represents only two or three materials, only two or three elements of
the target vector hold the percentages of those materials, and we set the
entries of the remaining base materials to 0. For the samples containing only
one material, we set the five remaining target variables to 0 and assigned the
value 100% to the material represented by the pure material spectrum.
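The construction of the target vector can be sketched as follows. The ordering of the six materials and the helper name target_vector are assumptions made for illustration, and dividing by the sum (so that the vector is a probability distribution, as a KL-divergence loss expects) is also an assumption; the text only states that the target is a percentage distribution.

```python
import numpy as np

# Assumption: column order of the target vector (taken here from Table 1).
MATERIALS = ["BabyMilk", "IceSugar", "NaHCO3", "Cocoa", "Potato", "Rice"]


def target_vector(composition):
    """Map {material: percentage} to a length-6 vector that sums to 1.

    Materials absent from the mixture keep the value 0.
    """
    target = np.zeros(len(MATERIALS))
    for name, percentage in composition.items():
        target[MATERIALS.index(name)] = percentage
    return target / target.sum()


pure = target_vector({"Cocoa": 100})
pair = target_vector({"Cocoa": 15, "Rice": 85})
triple = target_vector({"BabyMilk": 33, "NaHCO3": 33, "Rice": 33})
```

With this normalization, the (33, 33, 33) triple becomes exactly one third per material, which sidesteps the rounding in the listed percentages.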
    We took the dataset of spectral samples of the different organic materials
from [30], added the spectra of the other mixtures as explained in Section 2.1,
and formed three major groups of spectral profiles:
 – Single Material Dataset (SMD)
 – Mixed Materials Dataset (MMD)
 – Triple Mix Materials Dataset (TMMD)
   The performance of the 1D-CNN model is evaluated on each group separately.
The results are then compared with those obtained using derivatives and the
Savitzky-Golay filter, so each dataset is used in the following modes:

 – data as it is (standardized)
 – applying the Savitzky-Golay filter and concatenating the first derivative
 – applying the Savitzky-Golay filter and concatenating the first and second
   derivatives
3
    The dataset is available upon request.
    As the baseline, we used the dataset without any pre-processing other than
standardization. We compare the baseline against pre-processing steps such as
the application of the Savitzky-Golay filter and the concatenation of
derivatives to the spectral data.
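The pre-processing modes can be sketched with SciPy's savgol_filter. The window length (11) and polynomial order (2) below are placeholder values, since the filter settings are not reported in the text. The sketch also assumes that the smoothed spectrum is used as the first block and that the 702 points are filtered as one contiguous sequence, ignoring the gap between the two wavelength ranges.

```python
import numpy as np
from scipy.signal import savgol_filter


def add_derivatives(spectra, window_length=11, polyorder=2, max_deriv=1):
    """Smooth each row with a Savitzky-Golay filter and append its derivatives.

    spectra: array of shape (n_samples, n_points).
    Returns an array of shape (n_samples, n_points * (max_deriv + 1)).
    window_length and polyorder are illustrative defaults, not the paper's values.
    """
    spectra = np.asarray(spectra, dtype=float)
    blocks = [
        savgol_filter(spectra, window_length, polyorder, deriv=order, axis=1)
        for order in range(max_deriv + 1)
    ]
    return np.concatenate(blocks, axis=1)


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 702))                  # stand-in for standardized spectra
with_first = add_derivatives(batch, max_deriv=1)   # 702 + 702 features
with_second = add_derivatives(batch, max_deriv=2)  # 702 * 3 features
print(with_first.shape, with_second.shape)
```

The derivative order must not exceed the polynomial order, so a second derivative needs polyorder of at least 2.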


2.3   Method

Convolutional Neural Network. Convolutional Neural Networks (CNNs) are neural
networks (NNs) designed to process data with a known, grid-like structure [27].
For example, time-series data can be thought of as a 1D grid of samples taken
at regular time intervals, and image data as a 2D or 3D grid of pixels.
    In this work, we adopted the 1D-CNN used in [30], which consists of six
trainable layers: four convolutional layers and two fully connected layers. The
final model has a total of 713510 trainable parameters; the detailed
architecture is summarized in Table 3. The input of the 1D-CNN is a
one-dimensional spectral vector containing the values at the 702 wavelength
points, and the target is also a one-dimensional vector containing the
percentage distribution over the six materials.


              Table 3. Architecture of the 1D-CNN used in this work.

        Layer        Output Shape # Param Kernel Filter Attributes
        Conv1D       (702, 32)    128     3      32
        MaxPooling1D (351, 32)    0                     size = 2
        Conv1D       (351, 32)    3104    3      32
        MaxPooling1D (175, 32)    0                     size = 2
        Conv1D       (175, 64)    6208    3      64
        MaxPooling1D (87, 64)     0                     size = 2
        Conv1D       (87, 64)     12352   3      64
        MaxPooling1D (21, 64)     0                     size = 4
        Flatten      (1344)       0
        Dropout      (1344)       0                     rate = 0.3
        Dense        (512)        688640
        Dense        (6)          3078
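The shapes and parameter counts in Table 3 can be reproduced with a few lines of arithmetic. The sketch assumes a single input channel, stride 1 with "same" padding in the convolutions (so the length is unchanged, as the output shapes in the table suggest), bias terms in every trainable layer, and pooling that rounds the length down; these settings are inferred from the table rather than stated in the text.

```python
def conv1d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


length, channels = 702, 1  # input: 702 wavelength points, one channel (assumed)
total = 0
# (filters, pool size) for the four Conv1D + MaxPooling1D stages of Table 3
for filters, pool in [(32, 2), (32, 2), (64, 2), (64, 4)]:
    total += conv1d_params(3, channels, filters)
    channels = filters
    length //= pool  # pooling keeps the integer part of the length

flat = length * channels
total += dense_params(flat, 512) + dense_params(512, 6)
print(flat, total)  # 1344 713510
```

The dense layer after flattening accounts for 688640 of the 713510 parameters, which is why the length reduction by the final pooling stage matters for model size.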




Parameter Optimization. The main objective of our experiments is to predict
the percentage of each material contained in a composite mixture. To this aim,
we use the Kullback-Leibler (KL) divergence (Equation 1) as our loss function.
The KL divergence is a measure of the difference between two distributions,
defined as:

    $$D_{KL}(Q \,\|\, Z) = \sum_{x \in X} Q(x) \log\left(\frac{Q(x)}{Z(x)}\right) \tag{1}$$

where Q and Z are two probability distributions that, in our case, correspond
respectively to the true and the predicted distribution of percentages.
Moreover, we applied Adam as the optimization algorithm [13], with a scheduled
learning rate that starts from LR = 0.001 and decreases exponentially every
epoch. We also use early stopping and limit training to 100 epochs. Finally, we
used the Keras package as our machine learning framework for the entire work,
running on an Asus VivoBook X580GD with an Intel(R) Core(TM) i7-8750H CPU.
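As a worked illustration of Equation 1, the sketch below evaluates the KL divergence between a true and a predicted composition with NumPy. The clipping constant eps and the example vectors are illustrative assumptions; terms with Q(x) = 0 are skipped, following the usual convention that 0 log 0 = 0, which matters here because most target entries are zero.

```python
import numpy as np


def kl_divergence(q, z, eps=1e-7):
    """D_KL(Q || Z) for two discrete distributions given as 1D arrays.

    q: true distribution, z: predicted distribution.
    eps (an assumed value) keeps the logarithm finite when z contains zeros.
    """
    q = np.asarray(q, dtype=float)
    z = np.clip(np.asarray(z, dtype=float), eps, 1.0)
    mask = q > 0  # terms with q == 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / z[mask])))


true = np.array([0.15, 0.85, 0.0, 0.0, 0.0, 0.0])  # e.g. a 15/85 pair mixture
pred = np.array([0.20, 0.78, 0.01, 0.01, 0.0, 0.0])
print(kl_divergence(true, true))  # 0.0 for identical distributions
print(kl_divergence(true, pred))  # small positive value
```

Because the divergence is not symmetric, the order of the arguments matters: the true distribution Q weights the log-ratio, as in Equation 1.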


3     Numerical Experiments and Results
This section specifies the datasets used and the results obtained when
predicting the quantity of the materials in each mixture. We used three
different datasets, as mentioned in Section 2.2, to evaluate our model and the
pre-processing approaches: the single material dataset (SMD), the mixed
materials dataset (MMD), and the triple mix materials dataset (TMMD). We used
each dataset in three different modalities: the standardized data, the
Savitzky-Golay filter with the first derivative concatenated, and the
Savitzky-Golay filter with both the first and second derivatives concatenated.
     Each dataset has been divided into train, validation, and test sets. We
first split each dataset into train and test sets. We then keep the test set
aside and randomly choose 75% of the train set to be the actual train set and
the remaining 25% to be the validation set. The model is then iteratively
trained and validated on these sets.
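The two-stage split can be sketched as index bookkeeping. The 75/25 train/validation proportion comes from the text; the size of the test set and whether it is drawn at random are not stated, so test_fraction=0.25, the random test split, and the helper name split_indices are assumptions.

```python
import numpy as np


def split_indices(n_samples, test_fraction=0.25, val_fraction=0.25, seed=0):
    """Return shuffled, disjoint index arrays (train, validation, test).

    test_fraction is an assumed value; val_fraction is taken from the text
    and is applied to what remains after the test set is held out.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    test, rest = order[:n_test], order[n_test:]
    n_val = int(round(rest.size * val_fraction))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1600)
print(train_idx.size, val_idx.size, test_idx.size)  # 900 300 400
```

Holding out the test indices before the train/validation split ensures that no test sample influences training or model selection.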
     We used the mean absolute error (MAE) to evaluate the model's performance:

    $$\mathrm{MAE} = \frac{1}{n} \sum \left| Y - \hat{Y} \right| \tag{2}$$

   where n is the number of samples in the test set, Y is the vector containing
the true mixture percentages, and Ŷ is the vector of the predicted values.
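A minimal NumPy sketch of Equation 2, computed separately for each material (each column of the target), is shown below. Reporting one value per material and then averaging across materials for an overall figure is an assumption about how the results are aggregated; the numbers are made up for illustration.

```python
import numpy as np


def mae_per_material(y_true, y_pred):
    """Mean absolute error of each target column, averaged over the samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred), axis=0)


# Two test samples, two materials, values in percent (illustrative numbers)
y_true = np.array([[15.0, 85.0], [50.0, 50.0]])
y_pred = np.array([[17.0, 83.0], [49.0, 51.0]])
per_material = mae_per_material(y_true, y_pred)
overall = float(per_material.mean())
print(per_material, overall)  # [1.5 1.5] 1.5
```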

3.1   Single Material Dataset (SMD)
The SMD is composed of the six pure materials. For the baseline setup, the SMD
has ∼7000 samples for each material, and each instance contains 702 wavelength
points. We standardized the spectral data by removing the mean and scaling to
unit variance. The standard score of a spectral value x_ij (sample i, feature
j) is calculated as:

    $$z_{ij} = (x_{ij} - u_j) / s_j \tag{3}$$

    where u_j is the mean of feature j of the spectral data and s_j is its
standard deviation.
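Equation 3 can be sketched in NumPy as follows. Estimating the mean and standard deviation on the training spectra only and reusing them for the other splits is an assumption, as is the guard against constant features; the text does not say on which data the statistics were computed.

```python
import numpy as np


def standardize(train, other):
    """Apply z = (x - u_j) / s_j per feature, with u_j and s_j from `train`."""
    u = train.mean(axis=0)
    s = train.std(axis=0)
    s = np.where(s == 0, 1.0, s)  # avoid division by zero for constant features
    return (train - u) / s, (other - u) / s


rng = np.random.default_rng(0)
train_spectra = rng.normal(loc=5.0, scale=2.0, size=(200, 702))  # synthetic data
test_spectra = rng.normal(loc=5.0, scale=2.0, size=(50, 702))
z_train, z_test = standardize(train_spectra, test_spectra)
```

After this step every training feature has zero mean and unit variance, while the test features are only approximately standardized because they reuse the training statistics.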
    We applied the Savitzky-Golay filter to the standardized data and
concatenated to each sample its first derivative, obtaining a total of 1404
features. When the second derivative is also added, we reach a total of 2106
features.
   Figure 1 shows the results obtained with the derivative information compared
to the baseline model.




Fig. 1. Comparison of the results of the baseline setup with the use of Savitzky-Golay
and the derivatives using the SMD.




3.2   Mixed Materials Dataset (MMD)

The mixed materials dataset is composed of fifteen pairwise mixtures with
different composition proportions that add up to 100% of the whole mix, as
described in Section 1.1. The MMD contains ∼450000 samples, with ∼7000 samples
for each mixture quantity. The dataset has been divided into training,
validation, and test sets following the procedure described in Section 3. For
this dataset, the test set contains more than 100k samples representing each
mixture and the quantities of its components. The baseline setup has 702
standardized features. Figure 2 shows the prediction results for each material
in the compositions, compared against the pre-processing steps described
above.


3.3   Triple Mix Materials Dataset (TMMD)

The TMMD contains mixtures composed of three different materials in different
quantities. Table 2 shows the combinations of the mixtures; their composition
percentages are taken from the set P described in Section 2. The TMMD is
composed of ∼11000 samples, divided into train, validation, and test sets. The
baseline setup has 702 standardized features. Equation 2 is used to evaluate
the results obtained from the model and the pre-processing steps. Figure 3
shows the results of the baseline setup in comparison with the pre-processing
steps adopted.
Fig. 2. Overall model performance on the mixed materials (MMD). The bar plot
compares the results of each material prediction against the use of derivatives.




Fig. 3. Overall model performance on the triple mixed materials (TMMD). The bar
plot compares the results of each material prediction against the use of derivatives.


4    Discussions

The results achieved on the SMD are reassuring. With the baseline setup, we
were able to predict all the single materials with a very low error, as seen
from the blue line in Figure 1. The Savitzky-Golay filter and the addition of
derivative information to the spectra helped us gain even more accuracy.
    Increasing the complexity of the dataset by mixing pairs of single
materials in different quantities increases the error of the predictive model.
The large difference in MAE between the SMD and the MMD can be seen by
comparing Figure 1 and Figure 2. Nevertheless, adding the first derivative of
the spectra to the spectra themselves led to an overall 3% improvement, whereas
adding the second derivative worsened the results by an average of 44%.
    Increasing the number of materials in the mixtures from two to three
increases the MAE, owing to the intrinsic problems of mixing the materials and
acquiring the near-infrared spectra. Nevertheless, the derivative information
of the spectra lowered the MAE of the baseline to 47% and the second derivative
to 30%. These results are promising, since the model can extract the features
of the specific composition percentages of mixtures.
    We must also take into account that the quantities of the materials are
prepared by weight rather than by volume; this means that powders like
BabyMilk occupy a greater volume for a small amount. This characteristic can
affect the spectral acquisition, since materials with higher volume tend to
occupy most of the Petri dish, leaving little signal for the other materials
mixed with them.
    Finally, the answers to our research questions are the following:

 – A1 - Predicting the composition percentages of the three materials in a
   mixture is a very challenging task, because of inherent problems in making
   the mixture itself and in acquiring the spectra. The homogeneity of the mix
   is not guaranteed, owing to the different volume/weight ratio of each
   material, which causes difficulties when acquiring the near-infrared
   spectra.
 – A2 - Concatenating the derivative of each spectrum to the spectrum itself
   adds further information, and we saw that in each dataset there is a large
   difference compared to the baseline. There is a 37% decrease in the MAE
   when adding the first derivative and a 14% decrease when adding the first
   and second derivatives.




                       Fig. 4. Overall model performance.




5   Conclusions

In this work, we evaluated the use of derivatives in the context of spectra
pre-processing when predicting the composition percentages of organic material
mixtures. The NIR spectra of organic materials hold intrinsic properties of the
composition, including its quantity, and derivatives add another characteristic
to the spectra, making them easier to analyze.
    Combining the standardized NIR spectra of a material or mixture with their
derivatives can uncover further information for a classification or regression
task and make the model more robust.
    We also reported the results for mixtures composed of three materials, and
we saw a growing trend in the complexity of preparing such mixtures and of
their analysis. This is also shown by the higher error compared to the SMD and
MMD.
    Additional work is needed to tackle the high number of attributes when
concatenating the derivatives data to the original spectra. So, models that reduce
the dimensionality of the single sample without losing too much information are
good starting points to improve the overall performance of near-infrared spectra
analysis.


References

 1. Paolo Berzaghi and Roberto Riovanto. Near infrared spectroscopy in animal sci-
    ence production: principles and applications. Italian Journal of Animal Science,
    8(sup3):39–62, 2009.
 2. Paolo Berzaghi and Roberto Riovanto. Near infrared spectroscopy in animal science
    production: principles and applications. Italian J. of Animal Science, 8(sup3):39–
    62, 2009.
 3. Haiyan Cen and Yong He. Theory and application of near infrared reflectance spec-
    troscopy in determination of food quality. Trends in Food Science and Technology,
    18(2):72 – 83, 2007.
 4. Quansheng Chen, Dongliang Zhang, Wenxiu Pan, Qin Ouyang, Huanhuan Li, Khu-
    lal Urmila, and Jiewen Zhao. Recent developments of green analytical techniques
    in analysis of tea’s quality and nutrition. Trends in Food Science & Technology,
    43(1):63 – 82, 2015.
 5. Xiaoyi Chen, Qinqin Chai, Ni Lin, Xianghui Li, and Wu Wang. 1d convolutional
    neural network for the discrimination of aristolochic acids and their analogues
    based on near-infrared spectroscopy. Anal. Methods, 11:5118–5125, 2019.
 6. Yang chun Feng, Yan chun Huang, and Xiu min Ma. The application of student’s
    t-test in internal quality control of clinical laboratory. Frontiers in Laboratory
    Medicine, 1(3):125–128, 2017.
 7. Amanda Beatriz Sales de Lima, Acsa Santos Batista, Josane Cardim de Jesus,
    Jaqueline de Jesus Silva, Antônia Cardoso Mendes de Araújo, and Leandro Soares
    Santos. Fast quantitative detection of black pepper and cumin adulterations by
    near-infrared spectroscopy and multivariate modeling. Food Control, 107:106802,
    2020.
 8. Manuel Galli, Fabio Pagni, Gabriele De Sio, Andrew Smith, Clizia Chinello, Mar-
    tina Stella, Vincenzo L’Imperio, Marco Manzoni, Mattia Garancini, Diego Massi-
    mini, et al. Proteomic profiles of thyroid tumors by mass spectrometry-imaging
    on tissue microarrays. Biochimica et Biophysica Acta (BBA)-Proteins and Pro-
    teomics, 1865(7):817–827, 2017.
 9. Manuel Galli, Italo Zoppis, Gabriele De Sio, Clizia Chinello, Fabio Pagni, Fulvio
    Magni, and Giancarlo Mauri. A support vector machine classification of thyroid
    bioptic specimens using maldi-msi data. Advances in bioinformatics, 2016, 2016.
10. Kunal Ghosh, Annika Stuke, Milica Todorović, Peter Bjørn Jørgensen, Mikkel N.
    Schmidt, Aki Vehtari, and Patrick Rinke. Deep learning spectroscopy: Neural
    networks for molecular excitation spectra. Advanced Science, 6(9):1801367, 2019.
11. Marco Grossi, Giuseppe Di Lecce, Marco Arru, Tullia Gallina Toschi, and Bruno
    Riccò. An opto-electronic system for in-situ determination of peroxide value and
    total phenol content in olive oil. J. of Food Engineering, 146:1 – 7, 2015.
12. Haibo Huang, Haiyan Yu, Huirong Xu, and Yibin Ying. Near infrared spectroscopy
    for on/in-line monitoring of quality in foods and beverages: A review. J. of Food
    Engineering, 87(3):303 – 313, 2008.
13. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimiza-
    tion. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference
    on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015,
    Conference Track Proceedings, 2015.
14. Yann LeCun, Y. Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–44,
    05 2015.
15. Zhengxuan Li, Xiuying Tang, Zhixiong Shen, Kefei Yang, Lingjuan Zhao, and
    Yanlei Li. Comprehensive comparison of multiple quantitative near-infrared spec-
    troscopy models for aspergillus flavus contamination detection in peanut. J. of the
    Science of Food and Agriculture, 99(13):5671–5679, 2019.
16. Yachao Liu, Yongyu Li, Yankun Peng, Yanming Yang, and Qi Wang. Detection of
    fraud in high-quality rice by near-infrared spectroscopy. Journal of Food Science,
    2020.
17. Félix Lussier, Vincent Thibault, Benjamin Charron, Gregory Q. Wallace, and Jean-
    Francois Masson. Deep learning and artificial intelligence methods for raman
    and surface-enhanced raman scattering. TrAC Trends in Analytical Chemistry,
    124:115796, 2020.
18. R. Moore and J. Lopes. Paper templates. In TEMPLATE’06, 1st International
    Conference on Template Production. SCITEPRESS, 1999.
19. C. Nebauer. Evaluation of convolutional neural networks for visual recognition.
    IEEE Transactions on Neural Networks, 9(4):685–696, 1998.
20. Wartini Ng, Budiman Minasny, Maryam Montazerolghaem, Jose Padarian, Richard
    Ferguson, Scarlett Bailey, and Alex B. McBratney. Convolutional neural network
    for simultaneous prediction of several soil properties using visible/near-infrared,
    mid-infrared, and their combined spectra. Geoderma, 352:251 – 267, 2019.
21. Chao Ni, Dongyi Wang, and Yang Tao. Variable weighted convolutional neural
    network for the nitrogen content quantization of masson pine seedling leaves with
    near-infrared spectroscopy. Spectrochimica Acta Part A: Molecular and Biomolec-
    ular Spectroscopy, 209:32 – 39, 2019.
22. Brian G Osborne. Near-infrared spectroscopy in food analysis. Enc. of analytical
    chemistry: applications, theory and instrumentation, 2006.
23. Jan U. Porep, Dietmar R. Kammerer, and Reinhold Carle. On-line application
    of near infrared (nir) spectroscopy in food production. Trends in Food Science
    Technology, 46(2, Part A):211 – 230, 2015.
24. Lu Qingyun, Chen Yeming, Takashi Mikami, Motonobu Kawano, and Li Zaigui.
    Adaptability of four-samples sensory tests and prediction of visual and near-
    infrared reflectance spectroscopy for chinese indica rice. J. of Food Engineering,
    79(4):1445 – 1451, 2007.
25. T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran. Deep con-
    volutional neural networks for lvcsr. In 2013 IEEE International Conference on
    Acoustics, Speech and Signal Processing, pages 8614–8618, 2013.
26. Solange Sanahuja, Manuel Fédou, and Heiko Briesen. Classification of puffed
    snacks freshness based on crispiness-related mechanical and acoustical properties.
    Journal of Food Engineering, 226:53–64, 2018.
27. H. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and
    R. M. Summers. Deep convolutional neural networks for computer-aided detection:
    Cnn architectures, dataset characteristics and transfer learning. IEEE Trans. on
    Med. Imaging, 35(5):1285–1298, 2016.
28. J. Smith. The Book. The publishing company, London, 2nd edition, 1998.
29. Xudong Sun, Ke Zhu, and Junbin Liu. Nondestructive detection of reducing sugar
    of potato flours by near infrared spectroscopy and kernel partial least square algo-
    rithm. Journal of Food Measurement and Characterization, 13(1):231–237, 2019.
30. Dagmawi Delelegn Tegegn, Italo Zoppis, Sara Manzoni, Cezar Sas, and Edoardo
    Lotti. Convolutional neural networks for quantitative prediction of different
    organic materials using near-infrared spectrum. In Proceedings of the 14th
    International Joint Conference on Biomedical Engineering Systems and
    Technologies - BIOSIGNALS, pages 169–176. INSTICC, SciTePress, 2021.
31. Ernest Teye, Charles L.Y. Amuah, Terry McGrath, and Christopher Elliott. In-
    novative and rapid analysis for rice authenticity using hand-held nir spectrometry
    and chemometrics. Spectrochimica Acta Part A: Molecular and Biomolecular Spec-
    troscopy, 217:147 – 154, 2019.
32. Maribel Vásconez, Édgar Pérez-Esteve, Alberto Arnau-Bonachera, José Barat, and
    Pau Talens. Rapid fraud detection of cocoa powder with carob flour using near
    infrared spectroscopy. Food Control, 92:183 – 189, 2018.
33. Hui Wang, Du Lv, Nan Dong, Sijie Wang, and Jia Liu. Application of near-infrared
    spectroscopy for screening the potato flour content in chinese steamed bread. Food
    science and biotechnology, 28(4):955–963, 2019.
34. William R Windham, Brenda G Lyon, Elaine T Champagne, Franklin E Barton,
    Bill D Webb, Anna M McClung, Karen A Moldenhauer, Steve Linscombe, and
    Kent S McKenzie. Prediction of cooked rice texture quality using near-infrared
    reflectance analysis of whole-grain milled samples. Cereal Chemistry, 74(5):626–
    632, 1997.
35. D. Wu, S. Feng, and Y. He. Short-wave near-infrared spectroscopy of milk pow-
    der for brand identification and component analysis. Journal of Dairy Science,
    91(3):939 – 949, 2008.
36. Muhammad Zareef, Quansheng Chen, Md Mehedi Hassan, Muhammad Arslan,
    Malik Muhammad Hashim, Waqas Ahmad, Felix YH Kutsanedzie, and Akwasi A
    Agyekum. An overview on the applications of typical non-linear algorithms coupled
    with nir spectroscopy in food analysis. Food Engineering Reviews, pages 1–18, 2020.
37. Lei Zhang, Xiangqian Ding, and Ruichun Hou. Classification modeling method
    for near-infrared spectroscopy of tobacco based on multimodal convolution neural
    networks. Journal of Analytical Methods in Chemistry, 2020, 2020.
38. Italo Zoppis, Erica Gianazza, Massimiliano Borsani, Clizia Chinello, Veronica
    Mainini, Carmen Galbusera, Carlo Ferrarese, Gloria Galimberti, Sandro Sorbi,
    Barbara Borroni, et al. Mutual information optimization for mass spectra data
    alignment. IEEE/ACM transactions on computational biology and bioinformatics,
    9(3):934–939, 2011.
39. Éva Szabó, Szilveszter Gergely, Tamás Spaits, Tamás Simon, and András Salgó.
    Near-infrared spectroscopy-based methods for quantitative determination of active
    pharmaceutical ingredient in transdermal gel formulations. Spectroscopy Letters,
    52(10):599–611, 2019.