<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1877-0509</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing data accuracy in agri-food forecasting: Methods and implications for informed decision-making</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>László Várallyai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Szilvia Botos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Levente P. Bálint</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor L. Takács</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Róbert Szilágyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Debrecen, Faculty of Economics, Institute of Methodology and Business Digitalization</institution>
          ,
          <addr-line>138 Böszörményi út, 4032 Debrecen</addr-line>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>235</volume>
      <issue>5</issue>
      <fpage>17</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>In this article, we demonstrate how enterprises can apply secondary economic data and use methodologies for trend analysis and forecasting. By utilizing secondary databases, organisations can effectively evaluate market stability and conduct comprehensive industry analyses. This approach not only enhances the accuracy of their assessments but also supports strategic decision-making processes. Europe's Digital Decade policy and the Digital Europe Programme both clearly aim to integrate digital solutions (such as AI or big data analytics) into business processes throughout the EU in order to enhance the operational performance of business organisations. With the help of open-source software, companies will be able to overcome the general barriers to implementing digital technology, which helps them achieve a new digital corporate strategy.</p>
      </abstract>
      <kwd-group>
        <kwd>Forecast methodology</kwd>
        <kwd>food industry</kwd>
        <kwd>trend analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>In line with the targets of the European Union, several strategies, policies and initiatives have
been prepared, and the development of the digital economy and society is a significant part of them. With
the concepts of Industry 5.0 and Agriculture 5.0, stakeholders operating in agriculture and the
food industry also have the potential to become some of the biggest users of technologies based on
open-source solutions, which offer various advantages.</p>
      <p>Missing data is a frequent issue across various fields, not just in academia, which can occur due
to several factors and can lead to misleading information and incorrect decisions or results
(Gjorshoska et al., 2022). In recent years, there has been an increasing interest in applying predictive
analytical methods (like SARIMA) in the field of supply chain management (Kumari &amp;
Muthulakshmi, 2024). The primary aim of using these techniques was demand and supply forecasting
(for prices and volumes) in purchasing functions (Falatouri et al., 2022). The presence of missing data
in a time series can greatly affect model performance by interrupting data continuity, making it an
important area of research (Lee et al., 2024). Understanding and pinpointing the reasons behind
missing data is always essential when working with any dataset.</p>
      <p>To prepare our literature review, we first performed a qualitative analysis (Figure 1) on the
relevant keywords (agriculture, open source and python), as this method is suitable for determining
the direction of the research. We used an international literature database (Scopus) for the analysis.
We defined the most relevant keywords closely related to the research.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Individual readiness for data analysis and opportunities</title>
      <p>Because many operating processes record data digitally (transaction data, Internet of
Things data, etc.), a large amount of data is ready to be loaded into data warehouses and
analyzed. The analysis of these data helps in identifying inefficient points of the business processes
and finding bottlenecks. Given sufficient quantity and quality, these data can also be used in advanced
data analytics, such as machine learning. These data support many decisions related to the optimization
of material, financial and information resources, to increasing cooperation and to forecasting.</p>
      <p>However, advanced data analytics also places many requirements on human resources.
Figures 2 and 3 show two relevant indicators that express the readiness levels of
individuals for advanced data analysis.</p>
      <p>Enterprises have access to a variety of open-source tools designed for data analytics. These
applications offer budget-friendly options and can be adapted to fit unique business requirements.
Based on the numbers in Figure 3, we can conclude that writing program code can be considered a
necessary digital skill in the near future.</p>
    </sec>
    <sec id="sec-4">
      <title>4. SARIMA results</title>
      <p>In this article we present a possible forecasting model that can be used with time series in which
seasonal variability might occur. In our results we describe a performance analysis of the applied
seasonal autoregressive integrated moving average (SARIMA) methods and how we implemented Python-based
data-preprocessing algorithms.</p>
      <p>The secondary time series data on product volumes were obtained from the Hungarian agricultural
product information system, where they are stored in a publicly available database. Because the
categories in which the collected data are stored have changed over the years, some transformations
were applied to the data source; these were carried out using the Python Data Analysis Library (Pandas).</p>
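      <p>A minimal sketch of this kind of category harmonization with Pandas is given below. The column
names (month, category, volume_t), the legacy label and all values are hypothetical and are not taken
from the actual data source; the sketch only illustrates how records stored under changing category
labels might be merged into one monthly series.</p>

```python
import pandas as pd

# Hypothetical raw records: the same product appears under an old and a new
# category label because the storage scheme changed over the years.
raw = pd.DataFrame(
    {
        "month": ["2019-01", "2019-02", "2020-01", "2020-01", "2020-02"],
        "category": [
            "Flour BL55",
            "Flour BL55",
            "Wheat flour (BL55)",
            "Wheat flour (BL55)",
            "Wheat flour (BL55)",
        ],
        "volume_t": [120.0, 135.0, 60.0, 70.0, 128.0],
    }
)

# Assumed mapping from legacy labels to one harmonized label.
CATEGORY_MAP = {"Flour BL55": "Wheat flour (BL55)"}


def harmonize(df: pd.DataFrame) -> pd.Series:
    """Return one monthly volume series with a unified category label."""
    out = df.copy()
    out["category"] = out["category"].replace(CATEGORY_MAP)
    out["month"] = pd.to_datetime(out["month"], format="%Y-%m")
    # Sum rows that now share the same month and category.
    monthly = out.groupby(["category", "month"])["volume_t"].sum()
    return monthly.loc["Wheat flour (BL55)"].sort_index()


series = harmonize(raw)
print(series)
```

      <p>Keeping the mapping in a single dictionary makes later changes in the category scheme easy to
absorb without touching the aggregation step.</p>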
      <p>The SARIMA methodology employed in this paper is a statistical model that analyzes time series
data to better understand the dataset or to forecast future trends (Abeladi et al., 2023).</p>
      <p>SARIMA can be viewed as a form of regression analysis in which the dependent variable is
explained by its own past values and past forecast errors, including those at seasonal lags.</p>
      <p>The SARIMA model includes several parameters to capture seasonality, such as the seasonal
frequency (period), the order of seasonal integration (differencing), and the seasonal AR(P) and MA(Q) orders.</p>
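      <p>In the conventional backshift notation, a SARIMA(p,d,q)(P,D,Q)[s] model can be written as
follows. The symbols are the standard textbook ones and are an assumption here, since the paper itself
does not fix a notation.</p>
      <disp-formula>
        <tex-math><![CDATA[\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t]]></tex-math>
      </disp-formula>
      <p>Here B is the backshift operator (B applied to y_t gives y_{t-1}), s is the seasonal period, d
and D are the orders of non-seasonal and seasonal differencing, the polynomials φ_p and θ_q are the
non-seasonal AR and MA parts of orders p and q, Φ_P and Θ_Q are their seasonal counterparts of orders P
and Q, and ε_t is a white-noise error term.</p>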
      <p>Before applying the forecasting method, it is crucial to verify the stationarity of the original
dataset. This can be done using the augmented Dickey-Fuller (ADF) test, a hypothesis test whose null
hypothesis is that the time series has a unit root, that is, that it is non-stationary. The test yields
a p-value that helps determine whether the time series is stationary. If the p-value is below 0.05, the
null hypothesis is rejected and the time series can be considered stationary. Conversely, if
the p-value exceeds 0.05, the null hypothesis cannot be rejected and the time series is deemed
non-stationary (Tatarintsev et al., 2021).</p>
      <p>The data source has been checked for suitability for the SARIMA model and was found to be
applicable. The data were obtained from the Hungarian Agriculture Information Portal (2024), and the
model was trained on this dataset.</p>
      <p>4.1. Performing the SARIMA</p>
      <p>In our study, the AIC (Akaike Information Criterion) was used to choose the most appropriate
model (Vadim &amp; Alchakov, 2023). After the model optimization process, the
ARIMA(2,1,1)(1,0,1)[12] model was selected. Figure 4 shows our original dataset and the data
predicted by the model.</p>
      <p>Holt-Winters forecasting is a statistical method used for time series data that exhibit trend and
seasonality. It is particularly effective when the underlying pattern in the data is not linear and has
recurring seasonal fluctuations. The Holt-Winters exponential smoothing method examines
the data to identify whether they exhibit trends or seasonal patterns by analyzing the overall pattern
(Pongdatu &amp; Putra, 2018). There are three key components: level, which represents the overall
average value of the time series; trend, which captures the upward or downward movement in the data
over time; and seasonality, which accounts for the periodic fluctuations that occur at regular
intervals. In this process we created a Holt-Winters forecasting model with a multiplicative trend
and seasonality.</p>
      <p>The model was fitted to the training data (“training Q data”), and the test set (“test
Q data”) was generated for the forecast. Finally, the predicted values (“predicted Q data”) were
created, as shown in Figure 5.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Model assessment</title>
      <p>Based on these metrics, the SARIMA model shows moderate accuracy: the R² of 0.6687 indicates
a reasonable fit, while MSE, RMSE, MAE, and MAPE values suggest that there is some level of error
in predictions, but the model's performance is generally acceptable with room for improvement.</p>
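      <p>For reference, the metrics named above can be computed as in the following sketch, which uses
only the Python standard library. The function name regression_metrics and the sample values are
hypothetical; they are not results from the paper.</p>

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R² for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    # R² compares the model's squared error with that of always predicting the mean.
    r2 = 1 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical values, not results from the paper.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 112.0, 118.0, 133.0]
print(regression_metrics(actual, predicted))
```

      <p>MSE and RMSE penalize large errors more strongly than MAE, while MAPE expresses the error
relative to the observed values, which makes it easier to compare across products with different
volumes.</p>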
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In conclusion, making well-informed, data-driven decisions is essential for modern businesses,
particularly in the food industry, where data analytics plays a crucial role in enhancing sustainability,
innovation, competitiveness, and resilience.</p>
      <p>In this paper, a quantity forecast was performed for wheat flour (BL55) using a SARIMA model
implemented in Python, as an open-source digital solution.</p>
      <p>Our article illustrates how enterprises can leverage secondary economic data and employ trend
analysis and forecasting methodologies to improve market stability and strategic decision-making.
By addressing and filling missing values in production-related data, organizations can avoid biases
and distortions that could otherwise lead to flawed conclusions and ineffective resource allocation.</p>
      <p>For instance, in agriculture, real-time data from sensors can provide insights into crop health, soil
conditions, and environmental factors, which are essential for optimizing yields and managing
resources efficiently. Accurate market price and volume information helps in forecasting demand,
setting prices, and adjusting production levels accordingly.</p>
      <p>Our application of the SARIMA method on historical market data demonstrates how accurate
estimation of missing information supports better market forecasts, optimizes production and
inventory strategies, and enhances cost planning and investment decisions.</p>
      <p>Ultimately, this approach fosters more informed decision-making and contributes to greater
efficiency and profitability in the agri-food sector.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Abeladi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mueen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Time Series Forecasting using LSTM and ARIMA</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ),
          <fpage>313</fpage>
          -
          <lpage>320</lpage>
          . https://doi.org/10.14569/IJACSA.2023.0140133.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Eurostat</surname>
          </string-name>
          ,
          <year>2024</year>
          . Overview. URL: https://ec.europa.eu/eurostat/web/digital-economy-and-society/database/comprehensive-database
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Hungarian</given-names>
            <surname>Agriculture Information Portal</surname>
          </string-name>
          ,
          <year>2024</year>
          . URL: https://adat.aki.gov.hu/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ivana</given-names>
            <surname>Gjorshoska</surname>
          </string-name>
          , Tome Eftimov and
          <string-name>
            <given-names>Dimitar</given-names>
            <surname>Trajanov</surname>
          </string-name>
          ,
          <year>2022</year>
          .
          <article-title>Missing value imputation in food composition data with denoising autoencoders</article-title>
          .
          <source>Journal of Food Composition and Analysis</source>
          . Vol.
          <volume>112</volume>
          . article 104638, ISSN 0889-1575. https://doi.org/10.1016/j.jfca.2022.104638.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Kyungjae</given-names>
            <surname>Lee</surname>
          </string-name>
          , Hyunwoo Lim, Jeongyun Hwang, and
          <string-name>
            <given-names>Doyeon</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <year>2024</year>
          .
          <article-title>Evaluating missing data handling methods for developing building energy benchmarking models</article-title>
          .
          <source>Energy</source>
          . Vol.
          <volume>308</volume>
          . article 132979, ISSN 0360-5442. https://doi.org/10.1016/j.energy.2024.132979.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>