<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>of Fault Detection Models in Smart Agriculture Using LLM Agents for Rule-Based Anomaly Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Lindia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Cantini</string-name>
          <email>rcantini@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Bettucci</string-name>
          <email>francesco.bettucci@phd.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Sartori</string-name>
          <email>luigi.sartori@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Trunfio</string-name>
          <email>trunfio@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Authors contribution:</institution>
          <addr-line>P.L., R.C., P.T.: Conceptualization, Investigation, Methodology, Software, Validation; F.B., L.S.: Data curation, Validation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Relatech SpA</institution>
          ,
          <addr-line>Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università della Calabria</institution>
          ,
          <addr-line>Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Università di Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the context of Agriculture 4.0, advanced technologies such as the Internet of Things (IoT), artificial intelligence (AI), and big data analytics play a critical role in enhancing the efficiency and sustainability of farming operations. These innovations enable real-time monitoring and decision-making, improving the efficiency, sustainability, and productivity of agricultural systems. Central to Agriculture 4.0 is the deployment of sensors embedded in agricultural machinery, such as tractors, which continuously collect data on key operational metrics, including engine performance, fuel consumption, soil conditions, and equipment health. The effective analysis of such data is essential for predictive maintenance, as early detection of potential anomalies can prevent costly breakdowns and reduce downtime. However, finding real-world datasets containing examples of anomalies in agricultural machinery is highly challenging, making it difficult to develop and assess the effectiveness of anomaly detection models. Additionally, classical methods for anomaly generation, such as stochastic and adversarial approaches, may be difficult to apply given the intricate patterns and time dependency of these data. To address this gap, our work leverages Large Language Models (LLMs) and agentic workflows to generate realistic anomaly scenarios from agricultural data. Using a rule-based approach that combines prompt engineering techniques with a multi-agent system, we create synthetic anomalies that can later be used to evaluate anomaly detection models. These models would then enable the timely identification of potential machinery failures, reducing maintenance costs, minimizing downtime, and significantly lowering the environmental impact by preventing inefficiencies such as increased fuel consumption from faulty equipment, reducing the need for replacement parts, and conserving energy and resources used in repairs.</p>
      </abstract>
      <kwd-group>
        <kwd>Smart Agriculture</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Agentic Workflows</kwd>
        <kwd>Predictive maintenance</kwd>
        <kwd>Green AI</kwd>
        <kwd>Environmental Sustainability</kwd>
        <kwd>Internet of Things</kwd>
        <kwd>Anomaly Detection</kwd>
        <kwd>Anomaly Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        IoT sensor networks are increasingly leveraged in Industry 4.0 and Smart Agriculture to enhance productivity and sustainability through advanced sensing, data fusion, and machine learning. In this context, anomaly detection techniques can be effectively applied for real-time monitoring of machinery and systems, preventing failures and optimizing operational efficiency, even with noisy datasets [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. Despite the demonstrated effectiveness of such methods, identifying representative anomalous data for testing purposes remains a significant challenge, particularly in IoT settings where data is spatiotemporal and real-world anomalies are often rare or difficult to observe. Anomaly generation therefore becomes crucial in overcoming this challenge by enabling the creation of synthetic anomalies that closely resemble real-world data distributions. Classical methods for anomaly generation, such as rule-based or stochastic approaches, often fail to capture the complex dependencies between spatial and temporal features, resulting in unrealistic anomalies. In addition, while more sophisticated techniques like adversarial methods and latent models can generate realistic data, they are computationally expensive and require extensive tuning, which may hinder their application in this domain.
      </p>
      <p>To address these limitations, we propose a novel rule-based anomaly generation approach that leverages the context-aware capabilities of Large Language Models (LLMs). Our methodology extends beyond a single LLM by employing LLM agents in a collaborative workflow, where each agent contributes specialized knowledge to produce the final synthetic anomalies. By incorporating LLM agents into the rule generation process, we enable a more informed, context-driven creation of anomalies that better reflect the spatiotemporal complexities of IoT sensor data. This hybrid approach combines the interpretability and simplicity of rule-based methods with the nuanced understanding and adaptability of LLM agents, resulting in a more efficient and realistic anomaly generation process suitable for testing detection algorithms in dynamic, real-world environments.</p>
      <p>The main contributions of the paper can be summarized as follows:
• We advance the application of LLM agents in Smart Agriculture, showing how such systems can cooperate within an agentic workflow to generate realistic synthetic anomalies.
• The proposed method integrates a rule-based approach with the capabilities of LLMs, addressing the limitations of traditional methods in handling high-dimensional spatiotemporal IoT data.
• Our approach enhances the testing of anomaly detection systems, leading to more reliable real-time monitoring and improved operational efficiency.</p>
      <p>The remainder of the paper is organized as follows. In Section 2, we discuss related work in the field of anomaly generation, highlighting the main applications of LLMs to Smart Agriculture. Section 3 provides an in-depth description of the proposed approach, showing its application to a real-world case study. Finally, Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Large Language Models (LLMs) have recently gained significant traction due to their remarkable
natural language understanding and generation capabilities [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. These systems are increasingly
being integrated into Smart Agriculture, providing powerful tools for data-driven decision-making and
precision farming. Conversational assistants powered by LLM agents provide farmers and agricultural
professionals with insights drawn from vast datasets to support resource management, enhance crop
health, and optimize environmental conditions, thereby improving productivity and sustainability [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>In this work, we explore how LLM-based agents can be synergistically leveraged in the field of smart agriculture to generate synthetic real-world anomalies. This task is critical for improving and evaluating the performance of anomaly detection systems. Several methodologies have been developed to generate synthetic anomalies that closely resemble real-world scenarios, enabling a robust assessment of detection algorithms. Major approaches in the literature leverage conditional generation approaches and Generative Adversarial Networks (GANs), in which two neural networks—a generator and a discriminator—compete with each other during the training process. Specifically, the generator tries to create realistic synthetic data, i.e., anomalous instances, while the discriminator tries to differentiate between normal and anomalous data. This process leads to the generation of highly realistic anomalies that closely resemble actual outliers, making GANs particularly useful in testing the robustness of anomaly detection systems. As an example, Uzolas et al. leverage conditional GANs for the generation of realistic single-chromosome images following user-defined banding patterns [10], while Salem et al. [11] use a Cycle-GAN to generate synthetic anomalous data from normal data for improving anomaly detection in imbalanced datasets. Zhang et al. [12] introduce DefectGAN, which generates anomaly samples by superimposing learned defect foregrounds onto a normal background, while Niu et al. propose SDGAN [13], which modifies defect-free images to introduce surface defects using a generator trained with cycle consistency loss on both normal and anomalous images. Duan et al. [14] introduce a few-shot defect image generation technique, producing structural anomalies from a limited set of defect samples. It enhances a pre-trained StyleGAN2 backbone by adding defect-aware residual blocks to manipulate features within learned defect masks.</p>
      <p>Besides GANs, Diffusion Models (DMs) have also been leveraged for generating synthetic anomalies by perturbing normal patterns. DMs are a family of probabilistic generative models that progressively add noise to data and then learn to reverse this process to generate new samples. In the field of anomaly generation, Dai et al. present GRAD [15], an unsupervised anomaly detection framework using a diffusion model called PatchDiff to generate contrastive patterns by disrupting global structures while preserving local ones. GRAD also includes a self-supervised reweighting mechanism and a lightweight detector to efficiently identify anomalies. Hu et al. [16] propose a diffusion-based few-shot anomaly generation model, leveraging the strong prior knowledge of a latent diffusion model trained on large datasets to improve the realism of generated anomalies. Zhang et al. introduce RealNet [17], another diffusion-based approach that relies on Strength-controllable Diffusion Anomaly Synthesis (SDAS) to generate synthetic anomalies of varying strengths, mimicking real-world anomalies. RealNet also incorporates feature selection and residual detection methods to improve anomaly detection while managing computational cost, showing significant improvements on several benchmark datasets.</p>
      <p>While these anomaly synthesis methods are effective, they depend on real defect images and cannot generate unseen types of anomalies. Furthermore, these methods are usually computationally intensive and often require extensive tuning to produce meaningful results.</p>
    </sec>
    <sec id="sec-proposed">
      <title>3. Proposed Approach: Leveraging LLM Agents for Anomaly Generation in Agricultural Machinery</title>
      <p>In this section, we provide a detailed description of the proposed approach aimed at generating real-world anomalies in multivariate sensor data from agricultural machinery, specifically tractors. We leverage an agentic workflow in which different LLM agents interact with each other to produce high-quality anomalous test data. The proposed methodology is articulated in two main phases:
1. Best LLM selection via zero-shot operational range generation — First, the best LLM must be selected from all those available, including GPT-4o and Llama 3.1. For this purpose, CAN bus sensor data from tractors are analyzed to extract the operational ranges of the different variables considered. By comparing these real ranges with those generated by various Large Language Models (LLMs) through zero-shot prompting, we identify the LLM that exhibits the highest level of expertise in the domain of agriculture and tractor operations.
2. Anomaly generation through an agentic workflow — The methodology employs an agentic workflow to generate anomalies, which involves collaboration between two LLM-based agents: (i) the first agent generates anomaly rules based on insights from the selected LLM; (ii) the second agent transforms the generated rules into executable Python code. This code applies the anomalies to the original non-anomalous data, effectively simulating real-world deviations and faults.</p>
      <p>Finally, as the test anomalies are generated, they are used to assess the performance of deep learning-based anomaly detection models. Specifically, an LSTM-based autoencoder is trained on a dataset representing a work session of the tractor and then tested against the synthetic anomalies generated as described above. This approach mimics real-world processes of anomaly detection in agricultural machinery, allowing for an assessment of the effectiveness of the generated anomalies.</p>
      <p>3.1. Best LLM selection via zero-shot operational range generation</p>
      <p>Figure 1 depicts the flowchart used in the first phase of the methodology, dedicated to selecting the LLM that exhibits the highest expertise in the agricultural domain, specifically regarding tractors and their sensor data. The selection process involves several LLMs, specifically GPT-4o, Llama 3.1 70B, Gemini Pro, and Mistral Large 2. Their effectiveness is measured by their ability to generate operational ranges for key tractor variables, which are then compared to the actual ranges extracted from tractor sensor data.</p>
      <p>Figure 1: flowchart of the LLM selection phase. Real operational ranges are extracted from sensor data; candidate ranges are generated via zero-shot prompting; the ranges are evaluated with the Jaccard score; the best LLM is selected.</p>
      <p>In the following yellow box, we report the prompt used for querying the different LLMs to generate operational ranges of variables. Each model is provided with a prompt containing the variable name, its unit of measurement, and a description. Generation is performed through zero-shot prompting, which means that the prompt used to interact with the model does not include any example or demonstration.</p>
      <p>As a seasoned expert in New Holland T7 165 S tractors, we seek your expertise in diagnosing various operational variables retrieved from the CAN bus of the tractor. You are provided with a list of variables, each with its name, unit of measurement, and description. These variables are listed according to the following format: - &lt;var_name&gt; (&lt;unit&gt;): &lt;description&gt;. Your task is to generate the operational range of each variable, which jointly takes into account the different activities performed by the tractor, i.e. idling, moving, plowing, and turning.</p>
      <p>Format your output as follows:
- &lt;var_name&gt;: &lt;operational_range&gt; (&lt;unit&gt;)
- …
- CAN1.LFE1.EngineFuelRate (l/h): Amount of fuel consumed by the engine per unit of time.
- CAN1.EFLP1.EngineOilPressure1 (kPa): Gage pressure of oil in the engine lubrication system as
provided by the oil pump.</p>
      <p>- …</p>
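      <p>The prompt above can be assembled programmatically from the variable metadata. Below is a minimal sketch of this step (the helper name and wrapping are ours, not the authors' code; the instruction text is abbreviated):</p>
```python
def build_range_prompt(variables):
    """Assemble the zero-shot prompt from variable metadata.

    `variables` is a list of (name, unit, description) tuples; each is
    rendered on its own line, mirroring the format shown above.
    """
    header = (
        "As a seasoned expert in New Holland T7 165 S tractors, we seek "
        "your expertise in diagnosing various operational variables "
        "retrieved from the CAN bus of the tractor. Your task is to "
        "generate the operational range of each variable.\n\n"
        "Input variables:\n"
    )
    lines = ["- {} ({}): {}".format(name, unit, desc)
             for name, unit, desc in variables]
    return header + "\n".join(lines)

variables = [
    ("CAN1.LFE1.EngineFuelRate", "l/h",
     "Amount of fuel consumed by the engine per unit of time."),
    ("CAN1.EFLP1.EngineOilPressure1", "kPa",
     "Gage pressure of oil in the engine lubrication system."),
]
prompt = build_range_prompt(variables)
```
      <p>The same helper is reused with different instruction text for the later prompts, since all of them share the per-variable listing format.</p>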
      <p>Table 1 presents the operational ranges generated by each LLM for the various key variables associated with tractor sensor data. Each row of the table details the ranges produced by the evaluated models for a given variable, while the final column provides the actual ranges extracted from the sensor data. This comparative analysis highlights the discrepancies and alignments between the internal knowledge of LLMs and real-world data, which are crucial for determining the most effective LLM for the subsequent phases of the methodology.</p>
      <p>To quantitatively assess the accuracy of LLM-generated ranges, we compared them with ground truth values derived from tractor sensor data by introducing a continuous version of the Jaccard index that quantifies the similarity between two ranges. Given two intervals [l1, u1] and [l2, u2], where l1 and u1 represent the lower and upper bounds of the first interval, and l2 and u2 represent the bounds of the second, let U = max(u1, u2) − min(l1, l2) be the union of the two intervals (i.e., the total covered range length), and let I = max(0, min(u1, u2) − max(l1, l2)) be the intersection of the intervals, which is calculated based on the overlap between the intervals: I = 0 if the intervals do not overlap; otherwise, I represents the length of the overlapping interval. Then, the Jaccard similarity for intervals can be expressed as J([l1, u1], [l2, u2]) = I / U, with J ∈ [0, 1], where 0 means no overlap and 1 means the intervals are identical. The win rate of a model ℳ1 over a model ℳ2 represents the percentage of features where ℳ1 achieved a higher Jaccard score compared to ℳ2.</p>
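      <p>The interval Jaccard similarity defined above translates directly into code. The following is a minimal sketch (function names are ours; the generated ranges are hypothetical LLM output, and only the two variables from the example prompt are used):</p>
```python
def interval_jaccard(l1, u1, l2, u2):
    """Continuous Jaccard index between intervals [l1, u1] and [l2, u2].

    Union U = max(u1, u2) - min(l1, l2) is the total covered range
    length; intersection I = max(0, min(u1, u2) - max(l1, l2)) is the
    overlap length (0 if the intervals are disjoint). Returns I / U,
    which lies in [0, 1]: 0 = no overlap, 1 = identical intervals.
    """
    union = max(u1, u2) - min(l1, l2)
    inter = max(0.0, min(u1, u2) - max(l1, l2))
    return inter / union if union > 0 else 1.0

def average_jaccard(real_ranges, generated_ranges):
    """Mean per-feature Jaccard score of one LLM's generated ranges
    against the real ranges; the LLM with the highest mean is selected."""
    scores = [interval_jaccard(r[0], r[1], g[0], g[1])
              for r, g in zip(real_ranges, generated_ranges)]
    return sum(scores) / len(scores)

# interval_jaccard(0, 10, 0, 10) -> 1.0 (identical intervals)
# interval_jaccard(0, 1, 2, 3)   -> 0.0 (disjoint intervals)
real = [(0.0, 29.35), (96.0, 536.0)]       # ranges from the prompt above
generated = [(0.0, 30.0), (100.0, 500.0)]  # hypothetical LLM output
score = average_jaccard(real, generated)
```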
      <p>For each variable, the average Jaccard similarity score was calculated across all comparisons between the real and generated ranges. The LLM with the highest average Jaccard score was selected as the most appropriate model for generating anomaly rules in the subsequent steps of the proposed methodology. Figure 2 illustrates the win rates of the evaluated LLMs alongside the average Jaccard interval scores achieved by each model. The plot shows that GPT-4o consistently outperforms all other models and demonstrates good accuracy in generating intervals that closely resemble the actual operational ranges extracted from tractor sensor data, confirming its suitability as the chosen model.</p>
      <p>3.2. Anomaly generation through an agentic workflow</p>
      <p>Once the most appropriate LLM is selected, the anomaly generation process is performed through an agentic workflow, as illustrated in Figure 3.</p>
      <p>Figure 3: the agentic workflow. Agent 1 (expert farmer) uses prompt chaining for the generation of real-world anomaly instances and of a set of rules for each anomaly instance; Agent 2 (expert developer) uses zero-shot prompting for Python script generation.</p>
      <p>• Expert farmer: its role is to generate realistic cases of anomalies in the form of rules that can be applied to test data, resulting in anomalous test instances.
• Expert developer: its role is to convert the set of rules generated by the expert farmer into a runnable Python script, which can be executed, via tool use, on the test dataset to produce a structured set of anomalous test instances for benchmarking anomaly detection methods.</p>
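      <p>The interaction between the two agents can be sketched as a simple prompt chain. In the sketch below, `call_llm` is a placeholder for an actual model API call (here stubbed so the chain runs offline), and the prompts are heavily abbreviated versions of the ones shown in the following sections; this is an illustration of the control flow, not the authors' implementation:</p>
```python
def run_workflow(call_llm, variables_block):
    """Chain the two agents: expert farmer (two chained prompts),
    then expert developer (a single zero-shot prompt)."""
    # Step 1 (expert farmer): generate significant anomaly instances.
    instances = call_llm(
        "As a seasoned expert in New Holland T7 165 S tractors, generate "
        "instances of significant anomalies.\nInput variables:\n"
        + variables_block)
    # Step 2 (expert farmer, chained on step 1): derive numeric rules
    # and a duration for each generated instance.
    rules = call_llm(
        "Based on the generated anomaly instances, generate a set of "
        "rules describing how each variable varies numerically:\n"
        + instances)
    # Step 3 (expert developer, zero-shot): turn the rules into code.
    return call_llm(
        "As an expert Python developer, generate a function that applies "
        "a given anomaly instance to a time series of sensor data:\n"
        + rules)

# Offline stub standing in for a real LLM API, so the chain runs end to end.
def fake_llm(prompt):
    return "RESPONSE: " + prompt.splitlines()[0]

script = run_workflow(fake_llm, "- CAN1.LFE1.EngineFuelRate (0 - 29.35 l/h)")
```
      <p>Swapping `fake_llm` for a real client call is the only change needed to run the chain against an actual model.</p>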
      <p>In the following sections, the prompts used to query the LLM-based agents are shown, along with the generated output.</p>
      <p>3.2.1. Expert farmer agent — Anomaly generation via prompt chaining</p>
      <p>In this step, the prompt chaining technique is employed to generate meaningful anomaly instances, as indicated by the green-colored boxes. Using prompt chaining, a sequence of prompts generates complex outputs by linking multiple tasks together. Initially, the first agent (i.e., the expert farmer) generates a set of significant anomaly cases across various activities, such as plowing, moving, turning, and idle operations. These anomalies are then used to create rules that modify the operational ranges of the variables, thereby generating anomalies. For each anomaly, a corresponding rule is created that specifies its duration and how the operational ranges are altered to simulate the anomaly within the data. These rules are then passed to the second agent (i.e., the expert developer) for further processing.</p>
      <p>As a seasoned expert in New Holland T7 165 S tractors, we seek your expertise in diagnosing various
operational variables retrieved from the CAN bus of the tractor. You are provided with a list of variables,
each with its name, operational range, unit of measurement and description. These variables are listed
according to the following format: - &lt;var_name&gt; (&lt;operational_range&gt; &lt;unit&gt;): &lt;description&gt;.
Your task is to generate instances of significant anomalies based on the activity performed by the tractor,
i.e., “plowing,” “moving,” “turning,” “starting,” and “idling”. Each anomaly instance must include:
- a description of the anomaly instance
- the list of variables involved in the anomaly instance
- the activity performed by the tractor when the anomaly shows up.</p>
      <p>Format your output as follows:
- &lt;instance_name&gt;: &lt;description&gt;
- variables involved:
- &lt;var_name&gt;
- …
- &lt;activity_performed&gt;
- …
Input variables:
- CAN1.LFE1.EngineFuelRate (0 - 29.35 l/h): Amount of fuel consumed by the engine per unit of
time.
- CAN1.EFLP1.EngineOilPressure1 (96 - 536 kPa): Gage pressure of oil in the engine lubrication
system as provided by the oil pump.</p>
      <p>- …
Based on: (i) the generated anomaly instances, (ii) the descriptions, (iii) the activities performed, and (iv) the operational range of the involved variables, generate a set of rules for each anomaly instance describing how each variable involved varies numerically. Also, specify the overall duration of the anomaly for each instance. Consider that the session in which the anomalies will be applied lasts approximately 2 hours, with observations recorded at a frequency of 1 Hz.</p>
      <p>Format your output as follows:
- &lt;instance_name&gt; (&lt;activity_performed&gt;):
- …
• The anomaly name, which concisely describes the issue.
• The performed activity during which the anomaly occurs.
• An issue description that provides useful details on how the anomaly affects the normal operation of the tractor.
• The duration of the anomaly.
• The variables affected.</p>
      <p>• The associated rules specifying how each variable deviates from its expected range over time.</p>
      <p>Anomaly instances generated by GPT-4o; each instance includes a description and a set of associated features. Instance 1, "Fuel Consumption Spike" (Plowing, duration 10 min): the tractor shows unusually high fuel consumption during operation, despite consistent speed and load; instantaneous fuel economy drops sharply, and the fuel rate is well above normal. Involved features: CAN1.LFE1.EngineInstantaneousFuelEconomy, CAN1.LFE1.EngineFuelRate, and CAN1.EEC2.EnginePercentLoadAtCurrentSpeed, which increases to above 80% from a normal range of 30-50%. The remaining instances are 2, "Overheating Engine"; 3, "Torque Instability"; and 4, "Battery Voltage Drop".</p>
      <p>3.2.2. Expert developer agent — Python script generation and application of rules to test data</p>
      <p>The second agent, acting as a Python programming expert, is prompted to transform the anomaly rules, generated by the expert farmer LLM agent, into an executable Python script.</p>
      <p>As an expert Python developer, we seek your assistance in code scripting. You are provided with a set of rules for different anomaly instances that describe how each variable involved varies numerically, along with the overall duration of the anomaly. Anomaly instances are listed according to the following format:
- &lt;instance_name&gt; (&lt;activity_performed&gt;):
- &lt;var_name&gt;: &lt;rule_description&gt;
Based on this information, generate a Python function that applies a given anomaly instance to a time series of sensor data. The code must adhere to the following requirements:
- all anomaly instances are handled;
- random values are used instead of fixed anomalous values;
- the input dataframe is read from a csv given as input; the start time and the anomaly to be applied are given as input;
- output the required function without any example usage.</p>
      <p>Input anomaly instances:
- Fuel Consumption Spike (Plowing):
- 10 minutes</p>
      <p>In this case, as shown in the blue-colored box, zero-shot prompting is employed, wherein the agent
generates a Python script based on the provided anomaly rules without any prior examples or specific
training data. The script is designed to take the clean, non-anomalous test dataset as input and apply
the anomalies according to the rules generated by the first agent. The generated script is executed to
create four distinct datasets by applying the anomalies to the test dataset for each possible activity.</p>
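      <p>To give a concrete idea, a simplified version of what such a generated function might look like is sketched below, covering only the "Fuel Consumption Spike" instance from above. The multiplier ranges, helper name, and in-memory data layout are our illustrative assumptions; the actual generated script reads the test dataset from a CSV and handles all anomaly instances:</p>
```python
import random

def apply_fuel_consumption_spike(series, start, duration_s=600):
    """Apply a 'Fuel Consumption Spike' anomaly to 1 Hz sensor series.

    `series` maps variable names to lists of samples. Random multipliers
    (rather than fixed anomalous values) push the fuel rate well above
    normal for `duration_s` seconds (10 min at 1 Hz = 600 samples),
    while the instantaneous fuel economy drops sharply. The input is
    left untouched; a modified copy is returned.
    """
    out = {name: values[:] for name, values in series.items()}
    end = min(start + duration_s, len(out["CAN1.LFE1.EngineFuelRate"]))
    for t in range(start, end):
        # Fuel rate spikes well above normal: random 1.5x-2.5x factor.
        out["CAN1.LFE1.EngineFuelRate"][t] *= random.uniform(1.5, 2.5)
        # Fuel economy drops sharply over the same window.
        out["CAN1.LFE1.EngineInstantaneousFuelEconomy"][t] *= random.uniform(0.3, 0.6)
    return out

# A ~2 h session at 1 Hz is ~7200 observations; apply the spike at t=1000.
clean = {
    "CAN1.LFE1.EngineFuelRate": [10.0] * 7200,
    "CAN1.LFE1.EngineInstantaneousFuelEconomy": [2.0] * 7200,
}
anomalous = apply_fuel_consumption_spike(clean, start=1000)
```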
      <p>Through this agentic workflow, the entire process of anomaly rule generation and application can be automated, providing a robust method for simulating consistent anomalous behaviors. This, in turn, supports the evaluation of anomaly detection models, by providing realistic and domain-specific anomalies that accurately reflect potential issues that could arise in real-world operations.</p>
      <p>3.3. Auto-encoder evaluation on synthetic test anomalies</p>
      <p>Here we show how the previously generated anomalous test datasets can be effectively leveraged to assess the effectiveness of a deep learning-based anomaly detection model. In particular, for each possible activity, including plowing, moving, turning, or idle, an LSTM autoencoder is trained on a normal working session, encompassing non-anomalous data from CAN bus sensors (see Figure 4).</p>
      <p>The LSTM autoencoder works by reconstructing the input time series. A large reconstruction
error suggests that the input data may deviate from normal patterns, indicating an anomaly. The
detection performance of each autoencoder is measured using the Area Under the Receiver Operating
Characteristic Curve (AUC) score. It ranges from 0 to 1, where a score of 1 indicates perfect separation
between anomalies and normal data, while 0.5 suggests that the model is equivalent to random guessing.</p>
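      <p>Scoring reconstruction errors with the AUC can be sketched as follows. This is a pure-Python, rank-based AUC written for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used on the autoencoder's per-sample reconstruction errors:</p>
```python
def auc_score(errors, labels):
    """AUC computed from reconstruction errors.

    `errors[i]` is the autoencoder's reconstruction error on sample i and
    `labels[i]` is 1 for anomalous, 0 for normal. The AUC equals the
    probability that a randomly chosen anomaly receives a higher error
    than a randomly chosen normal sample (ties count as 0.5), so 1.0
    means perfect separation and 0.5 means random guessing.
    """
    pos = [e for e, y in zip(errors, labels) if y == 1]
    neg = [e for e, y in zip(errors, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# e.g. errors [0.1, 0.2, 0.9, 0.8] with labels [0, 0, 1, 1]: every
# anomaly reconstructs worse than every normal sample, giving AUC 1.0.
```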
      <p>Figure 5 presents the ROC curves for the four anomalous instances considered during the anomaly generation process. Each curve illustrates the model's ability to distinguish between anomalous and non-anomalous data across a diverse set of potential scenarios. Specifically, two cases (Figures 5b and 5d) achieve perfect classification (AUC = 1.00), while the other two cases (Figures 5a and 5c) show strong (AUC = 0.90) and moderate (AUC = 0.76) performance, respectively. These results suggest that the model is highly effective in detecting anomalies, with some variability depending on the specific type of anomaly and the amount of training data from sensors. Furthermore, the ability to generate activity-specific test data facilitates a more granular analysis of model performance, providing insights into how different types of anomalies might be detected in real-world deployments.</p>
      <p>Figure 5: ROC curves (true positive rate vs. false positive rate) for (a) Fuel Consumption Spike (AUC = 0.90), (b) Overheating Engine (AUC = 1.00), (c) Torque Instability (AUC = 0.76), and (d) Battery Voltage Drop (AUC = 1.00).</p>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>In this work, we advance the application of LLM agents in Smart Agriculture by proposing a
rule-based approach for the automatic generation of synthetic anomalies in agricultural machinery. By
generating realistic, domain-specific anomalies, the system creates a rich dataset that accurately reflects
potential issues that could arise in real-world operations. This enables effective evaluation of anomaly
detection models and allows researchers and developers to test their algorithms against a variety of
plausible scenarios. The generated datasets support thorough benchmarking, helping to identify the
strengths and weaknesses of different anomaly detection methods. Moreover, the ability to generate
diverse datasets tailored to specific activities—such as plowing, moving, turning, and idling—facilitates
more granular analysis of model performance. This can lead to insights into how different types of
anomalies might affect operational efficiency, safety, and tractor maintenance. Ultimately, the proposed
methodology fosters an iterative feedback loop, in which the performance of anomaly detection models can
be continuously improved based on simulated data. This enhances their robustness and reliability in
real-world applications, ensuring efficient utilization of agricultural resources and paving the way for more
sustainable agricultural practices. Future work will focus on integrating domain-specific knowledge
through agentic RAG (Retrieval-Augmented Generation), further improving the context awareness of the
system and enabling LLMs to better comprehend complex scenarios.</p>
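      <p>As an illustration of the injection-and-evaluation loop described above, the following sketch is purely hypothetical and not the paper's implementation: the function names, the 12 L/h nominal fuel-consumption set-point, and the spike factor are all assumptions. It injects a rule-based fuel-consumption-spike anomaly into synthetic tractor telemetry and scores a simple deviation-threshold detector with ROC AUC.</p>

```python
# Hypothetical sketch: rule-based anomaly injection + ROC AUC evaluation.
# The nominal 12 L/h set-point and 1.8x spike factor are illustrative assumptions.
import random

def generate_telemetry(n=500, seed=7):
    random.seed(seed)
    # Nominal fuel consumption (L/h) during an activity such as plowing, with mild noise.
    return [12.0 + random.gauss(0, 0.5) for _ in range(n)]

def inject_spike(series, start, length, factor=1.8):
    """Rule: a spike multiplies nominal consumption over a short window."""
    out = list(series)
    labels = [0] * len(series)
    for i in range(start, start + length):
        out[i] *= factor
        labels[i] = 1  # mark injected points as anomalous
    return out, labels

def roc_auc(scores, labels):
    """AUC via the Mann-Whitney rank formulation: the probability that a
    random anomalous point scores higher than a random nominal point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

series = generate_telemetry()
data, labels = inject_spike(series, start=200, length=20)
# Detector score: absolute deviation from the nominal 12 L/h set-point.
scores = [abs(x - 12.0) for x in data]
print(round(roc_auc(scores, labels), 2))  # → 1.0
```

      <p>The rank-based AUC used here is equivalent to the area under the empirical ROC curve, so a score of 1.00 corresponds to a detector that perfectly separates injected anomalies from nominal operation, as in the per-anomaly curves reported above.</p>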
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been funded by the project “AGRITECH: National Research Centre for Agricultural
Technologies” - CUP CN00000022, of the National Recovery and Resilience Plan (PNRR) financed by
the European Union “Next Generation EU”, and by the “FAIR – Future Artificial Intelligence Research”
project - CUP H23C22000860006.</p>
    </sec>
  </body>
  <back>
  </back>
</article>