Digi-Nose Part 2: Enhancing Accuracy and Efficiency
                                of a Digital Nose System With Sensor Technology for
                                Early Detection of Changes in the Forest
                                Leo Bilješko1,*,†, Georg Roman Schneider1 and Claudia Probst1
                                1 FH Oberösterreich University of Applied Sciences Upper Austria, Roseggerstraße 15, 4600 Wels, Austria


                                             Abstract
                                             Monitoring forest health has become important for maintaining the forest environment,
                                             especially in a time of increasing ecological stressors. The objective of this project is to
                                             design an electronic nose (e-nose) that uses metal-oxide (MOx) gas sensors to distinguish
                                             between healthy and stressed trees by detecting characteristic volatile organic compounds
                                             (VOCs). The project involved the development and implementation of a gas sensor array
                                             that combines multiple MOx sensors to detect VOCs. An Arduino microcontroller acquired
                                             the data from the gas sensors, while Python was used for data analysis. The analysis
                                             applied machine learning methods, namely Linear Discriminant Analysis (LDA) and
                                             Principal Component Analysis (PCA), for classification and dimensionality reduction of the
                                             sensor data. Python was also used to create graphical user interfaces. The initial results
                                             demonstrated that the e-nose can differentiate between healthy and sick trees with a
                                             reasonable level of accuracy. PCA initially provided good separation; however, as the
                                             number of target gases increased, the separation deteriorated. LDA provided a clear
                                             separation between two classes, with slight overlaps. The e-nose was further assessed on
                                             different substances that may be present in stressed trees. Although some substances were
                                             well separated, others overlapped. The high sensitivity of MOx sensors comes at the cost
                                             of selectivity between gases. Future research will focus on detecting these specific
                                             substances in a tree’s odor using a neural network, enhancing the electronic nose’s ability
                                             to detect a wider range of compounds.

                                             Keywords
                                             e-nose, VOCs, MOx gas sensor, machine learning


                                4th International Workshop on Camera Traps, AI, and Ecology, September 5 - 6, 2024, Hagenberg,
                                AUSTRIA
                                * Corresponding author.
                                † These authors contributed equally.
                                Email: Leo.Biljesko@students.fh-wels.at (L. Bilješko); georg.schneider@fh-wels.at (G. R. Schneider);
                                claudia.probst@fh-wels.at (C. Probst)
                                URL: https://fh-ooe.at/en/ (L. Bilješko); https://fh-ooe.at/en/ (G. R. Schneider); https://fh-ooe.at/en/
                                (C. Probst)
                                            © 2024 Copyright for this paper by its authors.   Use permitted under Creative Commons License Attribution 4.0
                                            International (CC BY 4.0).


CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073
1. INTRODUCTION
In recent years, monitoring forest health has become critical for maintaining the
forest environment and agricultural richness. Trees, like other living organisms, produce
volatile organic compounds (VOCs) as part of their normal activity. Under stress,
for example from disease or from environmental changes such as drought and
flooding, the concentration of VOCs that trees emit can increase significantly.
Recent work on electronic nose technology has opened up alternative approaches to
monitoring forest health through the detection of odor fingerprints.

Conventionally, Gas Chromatography-Mass Spectrometry (GC-MS) has been the standard
method for detecting VOCs. Although it offers high accuracy and sensitivity,
GC-MS is complex and requires skilled operators, which makes real-time
observation of a tree’s health difficult. Metal-oxide (MOx) sensors
offer an accessible and cost-effective alternative because they are sensitive
to a wide range of gases. A change in the gas or in its concentration changes the
resistance of the metal-oxide layer, which makes these sensors suitable for environmental
monitoring. The use of MOx gas sensors has grown in recent years in many applications,
including air quality control, food quality monitoring, and medical applications,
which demonstrates their effectiveness in many fields [1][4].

In this study, an electronic nose system based on MOx sensors was developed to assess
the health of a tree. The system gathers and analyzes the odors emitted by trees in
stressed and unstressed conditions. Machine learning methods such as
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were
employed to categorize different tree odors and to create a fingerprint for their
detection. The work also involved the design of a graphical user interface (GUI) to
enable real-time monitoring of changes.

By combining sensor technology with machine learning methods, the goal is to
provide a reliable, cost-effective tool for early detection of changes in tree health.
Furthermore, this system has the potential to be applied to other agricultural practices
and thereby contribute to environmental sustainability. The approach aims to offer an
alternative to the more common GC-MS, taking advantage of cost-effective and more
accessible MOx sensors to enable widespread, real-time monitoring of tree health.
2. THEORY
2.1. Metal-oxide (MOx) gas sensors
MOx sensors are resistive sensors made from metal-oxide semiconductors and are
regarded as a promising route to low-cost, highly efficient sensors
with fast response and recovery times. Because of their simple resistance-based
measurement principle and their high sensitivity to different gases, they are often used to
detect volatile organic compounds (VOCs) at ppm and sub-ppm levels. In agriculture and
forestry, these sensors are applied to detect plant infections caused
by fungi, bacteria, and viruses, as well as damage caused by insects or mechanical means.

The working principle of a MOx sensor is based on the change in the electrical
resistance of a metal-oxide semiconductor when it encounters gases at elevated
temperatures (150-500 °C) [2]. MOx sensors are susceptible to humidity and
ambient temperature, so an integrated heater is used to keep the sensitive layer at a
controlled operating temperature [2].

In this work, MOx sensors are the main type of sensor used to develop a possible solution
for monitoring forest health. Their implementation is discussed in the following
sections.

2.1.1. Resistance measurement
The key parameter of a MOx sensor is the resistance of its sensitive layer, which
varies with the type of gas surrounding the sensor. The goal is to use this
change to define a characteristic resistance for a specific gas.

The sensitive layer resistance (R_S) is typically measured using a voltage
divider with a load resistor (R_L). This approach is favored for its simplicity
and effectiveness in converting resistance changes into measurable voltage signals. Figure
1 shows a simplified equivalent circuit of a gas sensor, consisting of the heater resistance
R_H, the sensitive layer resistance R_S, and the load resistance R_L.


Figure 1: Equivalent circuit of a gas sensor.
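
The voltage divider relation can be made concrete with a short sketch. It assumes that the output voltage V_out is measured across the load resistor R_L, so that V_out = V_C * R_L / (R_S + R_L) and therefore R_S = R_L * (V_C - V_out) / V_out. The supply voltage, load resistance, ADC reference and ADC resolution below are illustrative values only, not those of the actual circuit, and the function names are hypothetical.

```python
def adc_to_voltage(counts: int, v_ref: float = 3.3, bits: int = 12) -> float:
    """Convert a raw ADC reading to volts (v_ref and bits are assumed values)."""
    return counts * v_ref / (2 ** bits - 1)


def sensor_resistance(v_out: float, v_supply: float, r_load: float) -> float:
    """Solve V_out = V_supply * R_L / (R_S + R_L) for R_S.

    Assumes v_out is measured across the load resistor R_L.
    """
    if not 0.0 < v_out < v_supply:
        raise ValueError("v_out must lie strictly between 0 and v_supply")
    return r_load * (v_supply - v_out) / v_out


# Illustrative values: 5 V circuit supply, 10 kOhm load resistor.
r_s = sensor_resistance(v_out=2.5, v_supply=5.0, r_load=10_000.0)
print(f"R_S = {r_s:.0f} Ohm")
```

When the measured voltage is exactly half the supply voltage, the sensitive layer and the load resistor have the same resistance, which is a convenient sanity check for a real circuit.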
2.2. Machine learning algorithms
Machine learning can be described as a set of algorithms and statistical methods
that automatically recognize patterns in data and then use the uncovered patterns
to predict future data or to perform other kinds of decision-making under
uncertainty. Machine learning is usually divided into two main types: predictive or
supervised learning, and descriptive or unsupervised learning.

2.2.1. Supervised learning
In a supervised approach, the goal is to learn a mapping from inputs to outputs so
that the output of new, unseen inputs can be predicted. The algorithm is provided with
a labeled set of input-output pairs, known as the training set, and uses it
to learn the connection between the inputs and the outputs,
which gives it the ability to predict outputs for new data [3].

Supervised learning addresses two kinds of task:

  • Classification: the output variable is a class (e.g., a healthy or unhealthy tree).
  • Regression: the output variable is a continuous value to be predicted.

2.2.2. Unsupervised learning
In unsupervised learning, the goal is to find hidden patterns in the data provided.
The problem is less well defined than in supervised learning, since the algorithm is told
neither what kinds of patterns to look for nor what the desired output is for each input.
The advantage of unsupervised learning over supervised learning is that it does not
require manual labeling of the data, which speeds up data processing.

Unsupervised learning addresses two kinds of task:

  • Clustering: grouping data that has similar features.
  • Dimensionality reduction: reducing the number of input features while keeping as
    much information as possible.

2.2.3. Principal Component Analysis and Linear Discriminant Analysis
One example of unsupervised learning is principal component analysis (PCA), which
is commonly used to reduce the dimensionality of large data sets. The
reduction comes at the cost of some accuracy, but the most significant features of the
data are retained.

PCA attempts to capture as much information
as possible in the first principal component (PC1), then as much of the remaining
information as possible in the second (PC2), and so on. The number of principal
components is limited by the number of input features [3]. The first principal
component is constructed to account for the largest possible variance in the data set.
As shown in Figure 2(a), it is the line through the origin along which the
projections of the points (red dots) are most spread out. The second principal component
is calculated in the same way; however, it must be orthogonal to the first component,
and it must account for the next highest variance.
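
The construction described above can be sketched with NumPy as an eigendecomposition of the covariance matrix of the centered data. This is a minimal illustration on synthetic data, not the analysis code used in this work; the function name `pca` and the data are assumptions made for the example.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)           # PCA works on centered data
    cov = np.cov(X_centered, rowvar=False)    # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T          # rows are PC1, PC2, ...
    explained_ratio = eigvals[order] / eigvals.sum()
    scores = X_centered @ components.T        # coordinates in the PC basis
    return scores, components, explained_ratio


# Synthetic data: three features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

scores, components, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio)
```

The rows of `components` are mutually orthogonal, and the variance of the projected data decreases from PC1 to PC2, which matches the ordering described in the text.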

In contrast, Linear Discriminant Analysis (LDA) is a supervised technique
for dimensionality reduction that seeks the linear combination of features
that best separates the classes in a data set. The intent is to project the data into a
lower-dimensional space while keeping the information that is most relevant for class
discrimination.

LDA works by finding new axes that maximize the distance between the mean values
of the classes while minimizing the spread of points within each class [4]. Unlike
principal components, which follow the directions of largest overall variance, these
linear discriminants are chosen to separate the classes as far as
possible, as shown in Figure 2(b).


                      (a)                                            (b)

Figure 2: (a) Construction of the principal component. (b) Class separability in LDA.
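
For two classes, the LDA criterion described above (large distance between class means, small spread within each class) has a closed-form solution, the Fisher discriminant direction w proportional to S_W^-1 (m1 - m0), where S_W is the within-class scatter matrix. The sketch below illustrates this on synthetic data; it is not necessarily the implementation used in this study, and all names and values are assumptions.

```python
import numpy as np


def fisher_direction(X0, X1):
    """Unit vector that maximizes between-class over within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed scatter of each class around its own mean.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)


def fisher_ratio(w, X0, X1):
    """Squared distance of projected means divided by projected within-class scatter."""
    z0, z1 = X0 @ w, X1 @ w
    within = ((z0 - z0.mean()) ** 2).sum() + ((z1 - z1.mean()) ** 2).sum()
    return (z1.mean() - z0.mean()) ** 2 / within


# Synthetic, strongly correlated two-feature classes.
rng = np.random.default_rng(1)
cov = [[3.0, 2.5], [2.5, 3.0]]
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=100)

w = fisher_direction(X0, X1)
print("discriminant direction:", w)
print("Fisher ratio along w:     ", fisher_ratio(w, X0, X1))
print("Fisher ratio along x-axis:", fisher_ratio(np.array([1.0, 0.0]), X0, X1))
```

Comparing the two printed ratios shows why the discriminant axis generally differs from the axis that simply connects the class means.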
3. METHODOLOGY
3.1. Sensor selection
VOC detection focused on compounds such as 𝛼-pinene, 3-carene and d-limonene,
whose concentration in trees increases during periods of stress [5][6]. Other VOCs, such
as ethanol, 2-methyl and verbenol, were also studied, as their presence could
indicate insect infestations in spruce trees, for example by bark beetles. Sensors that are
mainly sensitive to alcohols were selected to form a sensor array for detecting these VOCs.
Table 1 lists the sensors used and the gases to which they are most sensitive.
Sensors of the same type were integrated in sets of four or in pairs for greater stability.
All sensors are MOx-based because of their simplicity and cost-effectiveness, but other
sensor types, such as electrochemical or photoionization detector (PID) sensors, could also
be applied.

Table 1: Sensor types
                    Sensor             Sensitive to                 Quantity
                    MQ-3               CO, Alcohol, Methane         4
                    MQ-135             CO, CO2, Alcohol, Acetone    4
                    UST GGS-1330       CO, H2, Methane              2
                    UST GGS-2330       CO, Ethanol, Methane         2
                    UST GGS-10330      CO, H2, Butane               2
                    Figaro TGS-2600    CO, Ethanol, H2, Methane     2
                    Figaro TGS-822     CO, Ethanol, Acetone         2


To reduce the number of sensors, a Random Forest was applied to create a feature
importance chart. The only sensor that stood out was the GGS-2330, which showed low
importance in both the substance and the tree measurements, as seen in Figure 3.

This indicates that removing the GGS-2330 would not affect the performance of the e-nose
in detecting the relevant VOCs. Another way to reduce the number of sensors would be
to use only one sensor of each type instead of sets, but this might affect the stability of
the system.
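
A feature importance ranking of this kind can be sketched with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The data below is synthetic and was constructed so that two arbitrarily chosen columns carry class information; it does not reproduce the measurements or the ranking in Figure 3, and the hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

sensor_names = ["MQ-3", "MQ-135", "GGS-1330", "GGS-2330",
                "GGS-10330", "TGS-2600", "TGS-822"]

# Synthetic stand-in for averaged sensor resistances (one column per sensor type).
rng = np.random.default_rng(0)
n_samples = 200
y = rng.integers(0, 2, size=n_samples)            # two hypothetical classes
X = rng.normal(size=(n_samples, len(sensor_names)))
X[:, 0] += 2.0 * y    # column 0 made strongly informative (arbitrary choice)
X[:, 5] += 1.0 * y    # column 5 made weakly informative (arbitrary choice)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = forest.feature_importances_
ranking = sorted(zip(sensor_names, importances), key=lambda p: p[1], reverse=True)
for name, value in ranking:
    print(f"{name:10s} {value:.3f}")
```

A sensor that stays near the bottom of this ranking for every target class is a candidate for removal, which is the reasoning applied to the GGS-2330 above.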

In addition to the sensors above, ENS160 and BME280 sensors were integrated to monitor
temperature, humidity, and total VOC levels. Note that Table 1 lists the
gases to which the sensors are most sensitive, but other gases may also affect the
sensor response.
                        (a)                                          (b)

Figure 3: (a) Feature importance (𝛼-pinene) (b) Feature importance (Healthy tree)


3.2. Experimental setup and measurement approach
The experimental setup included two distinct odor measurement systems: one
for tree analysis and the other for substance detection. Figure 4(a) shows the
tree measurement setup, which includes two electronic noses: Smell Inspector by
SmartNanotubes and Digi-nose. Smell Inspector was used as a reference to check
the accuracy of the Digi-nose results. The design included push and pull pumps for air
circulation. Initially, the chamber was left empty for 15 minutes to establish a baseline.
Following that, a tree was introduced, and its odor was measured for 40 minutes.
Twelve trees were evaluated in total, divided into three categories: healthy,
drought-stressed, and overwatered.

The substance measurement setup, depicted in Figure 4(b), had identical components
(pumps, a chamber, sensors, and a PC) but was intended to identify specific
substances. Following a 15-minute baseline air measurement, ethanol and d-limonene
were added, and measurements were conducted for 10 minutes.
                    (a)                                          (b)

Figure 4: Measurement setup for (a) tree measurement (b) substance measurement


3.3. Integrating Circuit Design, Data Acquisition, and Analysis
The sensor array comprised eighteen gas sensors, with the quantity of each
type listed in Table 1. Each sensor voltage is read by an Arduino Due and converted to
a resistance value using the voltage divider formula. Readings from sensors of the same
type are averaged and stored. For further processing, the data is transmitted as a string:
every three seconds, the Arduino sends a data set of dimension 1x12 to the PC in the form
shown in Figure 5. The data is collected and stored in a CSV file.


Figure 5: Structure of string data sent from Arduino to PC
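
The acquisition steps (averaging sensors of the same type, parsing the transmitted string, and appending it to a CSV file) can be sketched on the PC side as follows. The exact string layout of Figure 5 is not reproduced here: the comma separator and the generic field names `field_1` to `field_12` are placeholders, and the function names are hypothetical.

```python
import csv
import io
from statistics import mean

# Placeholder names for the 12 values of one record (actual layout: see Figure 5).
FIELDS = [f"field_{i}" for i in range(1, 13)]


def average_by_type(readings):
    """Average the resistances of all sensors that share a type."""
    return {name: mean(values) for name, values in readings.items()}


def parse_record(line, sep=","):
    """Split one transmitted string into 12 named float values."""
    parts = line.strip().split(sep)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(parts)}")
    return dict(zip(FIELDS, (float(p) for p in parts)))


def append_record(record, handle):
    """Write one parsed record as a CSV row to an open file-like object."""
    csv.DictWriter(handle, fieldnames=FIELDS).writerow(record)


# Example with made-up numbers; a StringIO stands in for the CSV file.
averaged = average_by_type({"MQ-3": [10.0, 12.0, 14.0, 16.0]})
record = parse_record("1,2,3,4,5,6,7,8,9,10,11,12\n")
buffer = io.StringIO()
append_record(record, buffer)
print(averaged, buffer.getvalue().strip())
```

In a live setup, the input line would come from the serial port instead of a literal string; that part is omitted because it depends on the hardware configuration.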


For analysis, the collected data included measurements of 𝛼-pinene, 3-carene,
d-limonene, ethanol, 2-methyl, verbenol, and air, as well as an ethanol-verbenol solution
and a mixture of all substances together. Each of these was measured 10-15 times,
giving around 120 measurements for data analysis. The tree measurements yielded 30
measurements, of which 15 were of healthy trees and 15 were of
unhealthy trees.

Since temperature, humidity, TVOC, and CO2 were not used in LDA, they were dropped,
leaving only the data from the MOx sensors for the LDA analysis. Prior to LDA, the data is
split into training and testing subsets, and the training set is normalized with
MinMaxScaler after the split. Scaling after splitting prevents data leakage from the
testing set. In this analysis, the training set consisted of 80% of the data, and the rest
was used as the testing set. The data is split randomly on each run.
To validate the LDA result, accuracy is computed. Because the data set is balanced,
accuracy provides a reliable measure of model performance.
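
The preprocessing and evaluation steps can be sketched with scikit-learn as shown below. The data is synthetic, and the fixed `random_state` and the stratified split are choices made here for reproducibility of the example, whereas the analysis described above splits randomly on each run. The point of the sketch is the order of operations: the scaler is fitted on the training set only and then applied unchanged to the testing set.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic, balanced two-class data with 7 features (one per sensor type).
rng = np.random.default_rng(42)
n_per_class, n_features = 60, 7
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_features)),
               rng.normal(1.5, 1.0, (n_per_class, n_features))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# 80/20 split first, so the testing set never influences the scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)   # min/max learned from training data only
X_test_s = scaler.transform(X_test)         # same min/max reused on testing data

lda = LinearDiscriminantAnalysis()
lda.fit(X_train_s, y_train)

train_acc = accuracy_score(y_train, lda.predict(X_train_s))
test_acc = accuracy_score(y_test, lda.predict(X_test_s))
print(f"training accuracy: {train_acc:.2f}, testing accuracy: {test_acc:.2f}")
```

Because the scaler only knows the training range, scaled testing values may fall slightly outside [0, 1]; this is expected and is not a sign of an error.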
4. RESULTS AND DISCUSSION
PCA provided valuable insight into the dimensionality reduction of the data.
However, the resulting classes overlapped strongly. Therefore,
LDA was applied because it not only reduces dimensionality
but also separates the data into classes. For this reason, all further results focus on
LDA.

4.1. Digi-nose
Figure 6(a) shows considerable overlap between sick and healthy
trees. The cause might be that some trees are not as sick as others or that some healthy
trees are not fully healthy. The training set contained 80% of the data and yielded an
accuracy score of 75%. The testing set contained the rest of the data and showed better
separation, although its accuracy score was slightly lower at 71%. The testing set is
shown in Figure 6(b).


                      (a)                                        (b)

Figure 6: LDA analysis of (a) the training set (b) the testing set.


It is worth noting that the data set is small. Further measurements are planned to
enlarge the data set and improve the predictions.

In the analysis of the substance data, some substances could be classified,
while the classification of others was challenging. Substances like d-limonene, ethanol,
verbenol, and air were clearly separated, while 2-methyl, 3-carene, 𝛼-pinene and the
mixture overlapped. This problem stems from the nature of MOx sensors, whose
resistance can take the same value for different odors at different concentrations.

The LDA analysis of the substances is presented in Figure 7 and Figure 8. As in the tree
analysis, 80% of the data was used for training and 20% for testing. The training set
achieved an accuracy score of 70%, while the testing set accuracy decreased to 65%.

Figure 7: LDA analysis of a training set.


Figure 8: LDA analysis of a testing set.


4.2. Smell Inspector
Data from trees and substances were gathered in the same manner as with
Digi-nose, and LDA was applied. Analysis of the tree data resulted in 88%
accuracy on the training set, while testing accuracy was very low, around 33%. This
gap indicates overfitting, which may be caused by the small data set. The LDA analysis of
the Smell Inspector data can be seen in Figure 9(a) and Figure 9(b).
                      (a)                                      (b)

Figure 9: LDA analysis of (a) the training set (b) the testing set.

Substance analysis gave better results than the analysis of tree data. A high accuracy of
96% on the training set showed good separability of the substances, with slight overlap
between the mixture, 𝛼-pinene and 2-methyl. Testing accuracy dropped slightly but
remained high at 88%. The LDA results are shown in Figure 10 and Figure 11.


Figure 10: LDA analysis of a training set.


Figure 11: LDA analysis of a testing set.
5. CONCLUSION
In this project, an electronic nose that uses metal-oxide gas sensors to differentiate
between healthy and sick trees by distinguishing characteristic volatile organic compounds
(VOCs) was successfully implemented. The e-nose system, built around an Arduino
Due microcontroller and complemented by Python data analysis, proved an effective way
of detecting and classifying VOCs related to forest health.

Our methodology included developing a gas sensor array capable
of detecting a broad range of odors and providing information on tree health.
The procedure involved applying PCA and LDA for
data analysis and designing a graphical user interface for real-time monitoring.
Initial outcomes demonstrated that healthy and sick trees can be told apart
with a reasonable level of accuracy. While PCA provided an initial separation
of the data, LDA provided clearer class separation, with overlap in some cases.

Despite this success, the limited selectivity of MOx sensors for different
gases remains an obstacle. A future study will address this problem
by implementing an artificial neural network to enhance the e-nose’s selectivity for other
gases.

In conclusion, the e-nose represents a promising and cost-effective alternative to more
prevalent approaches such as GC-MS for real-time monitoring of VOCs. This technique has
the potential to be applied not only to forest health monitoring but also to agricultural
applications, contributing to environmental sustainability while making monitoring options
more accessible. Further improvements in sensor selectivity and machine learning
algorithms will be necessary to fully exploit the potential of this newly developed
technology.


6. OUTLOOK
Future work will focus on developing a temperature compensation
algorithm that uses temperature data to adjust the output of the MOx sensors. This
would improve the e-nose’s ability to operate under different environmental conditions.

While LDA has shown promising results, other algorithms, such as
Random Forest, could mitigate the overfitting problem, since ensemble methods are
generally less prone to it. Another approach would be to enlarge the data set with more
measurements or with data augmentation (such as adding noise to the existing data).
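
Noise-based augmentation could look like the following sketch. It adds zero-mean Gaussian noise, scaled to a small fraction of each feature's standard deviation, to copies of the original samples. The function name, the number of copies and the noise level are assumptions for illustration; suitable values would have to be chosen from the actual sensor noise.

```python
import numpy as np


def augment_with_noise(X, y, copies=3, rel_sigma=0.02, seed=0):
    """Return the original samples plus `copies` noisy replicas of each."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    feature_std = X.std(axis=0)   # per-feature scale for the noise
    noisy = [X + rng.normal(0.0, 1.0, X.shape) * rel_sigma * feature_std
             for _ in range(copies)]
    X_aug = np.vstack([X, *noisy])
    y_aug = np.concatenate([y] * (copies + 1))   # labels repeat in the same order
    return X_aug, y_aug


# Made-up example: 4 samples with 3 features each.
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 12.0, 110.0],
              [3.0, 14.0, 120.0],
              [4.0, 16.0, 130.0]])
y = np.array([0, 0, 1, 1])

X_aug, y_aug = augment_with_noise(X, y)
print(X_aug.shape, y_aug.shape)
```

Augmentation should be applied to the training set only, after the train/test split, so that noisy copies of a testing sample never end up in the training data.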

Since the data set is balanced, accuracy has provided a reliable measure of performance.
Nevertheless, the evaluation would benefit from additional metrics, such as the F1-score
or a confusion matrix, which would offer a more thorough assessment.
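
As a small illustration of these metrics, the sketch below computes the entries of a two-class confusion matrix, the accuracy and the F1-score in plain Python. The labels and predictions are made up for the example and are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) for the class treated as positive."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels: "sick" is treated as the positive class.
y_true = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]
y_pred = ["sick", "sick", "healthy", "healthy", "healthy", "sick"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="sick")
accuracy = (tp + tn) / len(y_true)
f1 = f1_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.2f} F1={f1:.2f}")
```

Unlike accuracy, the confusion matrix shows which kind of error dominates, for example whether sick trees are missed more often than healthy trees are flagged.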
References
[1] J. W. Gardner, K. C. Persaud, Electronic Noses and Olfaction 2000: Proceedings of
    the 7th International Symposium on Olfaction and Electronic Noses, Brighton, UK,
    July 2000, CRC Press, 2001.
[2] R. Jaaniso, O. K. Tan, Semiconductor gas sensors, Elsevier, 2013.
[3] C. M. Bishop, N. M. Nasrabadi, Pattern recognition and machine learning, volume 4,
    Springer, 2006.
[4] Y. Wang, W. Zhang, R. Gao, Z. Jin, X. Wang, Recent advances in the application of
    deep learning methods to forestry, Wood Science and Technology 55 (2021) 1171–1202.
[5] M. Antonelli, D. Donelli, G. Barbieri, M. Valussi, V. Maggini, F. Firenzuoli, Forest
    volatile organic compounds and their effects on human health: A state-of-the-art
    review, International Journal of Environmental Research and Public Health 17 (2020)
    6506.
[6] J. K. Holopainen, J. Gershenzon, Multiple stress factors and the emission of plant
    VOCs, Trends in Plant Science 15 (2010) 176–184.