=Paper= {{Paper |id=Vol-2029/paper13 |storemode=property |title=Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining |pdfUrl=https://ceur-ws.org/Vol-2029/paper13.pdf |volume=Vol-2029 |authors=Herwin Alayn Huillcen-Baca,Flor de Luz Palomino-Valdivia |dblpUrl=https://dblp.org/rec/conf/simbig/Huillcen-BacaP17 }} ==Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining== https://ceur-ws.org/Vol-2029/paper13.pdf
    Influence of Historical Meteorological Data Processing in a Mobile
         Application of Weather Prediction, based on Data Mining

                Herwin Alayn Huillcen Baca and Flor de Luz Palomino Valdivia

        School of Systems Engineering, National University José María Arguedas, Andahuaylas, Peru
                         hhuillcen@gmail.com, fdeluz3@gmail.com




Abstract

In the city of Andahuaylas, Peru, an agricultural region, weather conditions, and especially temperature, are decisive for the success or failure of agricultural campaigns. Information on UV radiation, in turn, is essential for protecting health and life in general. Free internet services publish this information, but incorrectly: it comes from a location remote from the city of Andahuaylas, and the services do not provide the detailed predictions needed for proper decision making. The purpose of the present investigation was therefore to evaluate the influence of historical meteorological data processing on the efficiency of a mobile application for predicting temperature and UV radiation. The application provides real-time, updated and predicted weather information for the city of Andahuaylas. For this purpose, historical and current data from the meteorological station of the National University José María Arguedas of Andahuaylas were analyzed with data mining to obtain efficient prediction models, which were then implemented in the mobile application. To measure the efficiency of the prediction, we compared the mean absolute errors of the models used by the National Service of Meteorology and Hydrology of Peru for the cities of Arequipa, Iquitos and Lima (1.24, 2.66 and 1.485 respectively) against the mean absolute error of the prediction model of the mobile application (1.18). This verifies the efficiency of the proposed model, which was the expected result.

1 Introduction

On our planet, the elements of the weather are fundamental for the development of life in general. They are periodic natural phenomena and depend on factors such as geographical location. In Peru, and specifically in the city of Andahuaylas, whose main source of economic development is agriculture, temperature together with UV radiation is vital. Temperature variations determine the success or failure of an agricultural campaign, while UV radiation levels directly affect the health of the inhabitants. Predictive and real-time information on temperature and UV radiation is therefore of great importance.

Under this approach, we implemented a mobile application for predicting temperature and UV radiation. It uses the historical meteorological information of a station located in the city of Andahuaylas. Processed with data mining, this information yields prediction models that take the current weather information as input and generate numerical predictions of temperature and UV radiation. We then evaluated how closely the predictions approximated the actual temperature and UV radiation levels.

This paper contributes a generalization of the proposed prediction model and a real source of climate information for the city of Andahuaylas.

2 Related works

Weather prediction is an important application in meteorology. Over the last century it has been one of the most scientifically and technologically challenging problems around the world, and many related works have applied data mining and machine learning techniques to it.

Olaiya and Barnabas (Folorunsho Olaiya, 2012) propose the use of data mining techniques for forecasting maximum temperature, rainfall, evaporation and wind speed. This was carried out



                                                     144
using Artificial Neural Network and Decision Tree algorithms and meteorological data collected between 2000 and 2009 in the city of Ibadan, Nigeria. A data model for the meteorological data was developed and used to train the classifier algorithms. The performance of these algorithms was compared using standard performance metrics, and the algorithm with the best results was used to generate classification rules for the mean weather variables. A predictive Neural Network model was also developed for the weather prediction program, and its results were compared with actual weather data for the predicted periods. The results show that, given enough case data, data mining techniques can be used for weather forecasting and climate change studies.

Khan, Muqeem and Javed (Sara Khan, April 2016) describe data mining as a technique that helps extract relevant and meaningful information from a set of data. It can also be described as a knowledge discovery process applicable to any set of data. When applied to relational databases, data mining techniques can be used to search for trends or patterns. Their paper surveys the data mining techniques used in weather prediction or forecasting, and it reviews and compares these techniques in tabular form.

Bartok, Habala, Bednar, Gazak and Hluchý (Juraj Bartok, 2010) present methods and technologies for integrating input data distributed on different vendors' servers. Their meteorological detection and prediction methods are based on statistical and climatological methods. These are combined with knowledge discovery and data mining of meteorological data: messages, weather radar imagery, "raw" meteorological data from stations, satellite imagery and the results of common meteorological prediction models.

3 Methodological Approach

3.1 Problem Approach

In the city of Andahuaylas there is a problem with temperature information. Free internet services provide wrong values, usually between three and five degrees lower than the actual measurement, and this causes confusion and uncertainty among the population.

The cause is that the free internet services take their data from the meteorological information of the Andahuaylas airport. The airport is located 17.5 km from the city at an altitude of 11706 feet, or 3568 meters (Corpac SA, 2015), whereas the city of Andahuaylas lies at 2901 meters (Regional Government of Apurimac - DIRCETUR, 2015). This difference of 667 meters gives the two places different climates, and they also belong to different natural regions.

The main productive activity of the city of Andahuaylas and its surroundings is agriculture, which depends almost entirely on the elements of the weather. Appropriate decision making therefore requires predictive meteorological information for the current and following days, and the problem arises because the available prediction information is not correct. It is also known that southern Peru has the highest UV radiation levels in the world. Human exposure to this radiation directly affects the health of Peruvians through skin cancer and eye disease. In Andahuaylas the problem is greater, because there is no source of information on UV indices, either current or predictive, that would allow people to adopt preventive measures during the hours they are exposed to the sun.

3.2 Research Method

The objective is to evaluate the influence of historical meteorological data on the efficiency of a mobile application, based on data mining, for predicting temperature and UV radiation. The predictive models, and the research in general, are considered acceptable if the mean absolute error (MAE) (Pablo Cortes Achedad, 2010) is lower than 2.0 for temperature and lower than 1.0 for UV radiation.

Finally, we propose a generalization of this approach that applies prediction models with an optimum amount of data and an efficient algorithm.

4 Analysis of meteorological data

To generate optimal models for predicting temperature and UV radiation, we analyzed the behavior of the classification algorithms of the WEKA tool (Waikato, 2010). As input we took the data of the meteorological station of the National University José María Arguedas of Andahuaylas, which records temperature, UV radiation, humidity, wind speed and rain at 5-minute intervals. Specifically, prediction model algorithms were chosen for four cases: temperature prediction 24 hours ahead, temperature prediction 48 hours ahead, UV radiation prediction 24 hours ahead and UV radiation prediction 48 hours ahead.

4.1 Pre-processing of data

The processed data are stored in a MySQL database on the web server of the National University José María Arguedas of Andahuaylas. They cover all the climate elements recorded every 5 minutes, and six months of records were used for the research.

4.2 Selection of variables or characteristics

The variables or characteristics are the attributes used to generate the input files: hour, minute, temperature, absolute humidity, wind speed, rainfall and UV radiation. The objective was to evaluate which of these elements are correlated with temperature and UV radiation. Figures 1 and 2 show the correlations of the variables.

4.3 Prediction algorithms

To select candidate algorithms for generating the prediction models, we took the previous investigations as a reference (José M. Molina, 2010). Their choices cannot be reused in their entirety, however, because the choice depends heavily on the nature of the attributes and the class attribute. We therefore selected the following classification algorithms for generating prediction models: REPTree, M5P, KStar, LinearRegression, AdditiveRegression, Bagging, DecisionTable, ConjunctiveRule and SimpleLinearRegression. With a numeric class attribute, the WEKA tool (Waikato, 2010) does not allow the use of its other classification algorithms. Some further algorithms were discarded because of the high mean absolute error (MAE) they produced in the data analysis.

4.4 Extraction of knowledge

There are 28 input files for the predictions, built with data from the 60, 30, 15, 7, 3 and 2 days prior to the prediction. Using these input files and the candidate classification algorithms, we trained the respective algorithms for each prediction model. The resulting models were then compared with one another by their mean absolute error (MAE).
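The comparison in Section 4.4 can be sketched as a loop over history sizes and candidate learners. Each trained model is scored by its MAE on a held-out day, and the minimum is kept, as in the selection of Section 4.5. This is a minimal sketch under stated assumptions, not the authors' WEKA workflow. The two learners are hypothetical stand-ins: a single-attribute least-squares regression, and a one-nearest-neighbour regressor that does not reimplement KStar. The only input attribute is the current value of the variable, and the series is synthetic toy data rather than station records. The 5-minute sampling, the 24-hour horizon and the window sizes are taken from the text.

```python
import bisect
import math

SAMPLES_PER_DAY = 24 * 12              # one record every 5 minutes (Section 4.1)
WINDOWS_DAYS = [60, 30, 15, 7, 3, 2]   # history sizes compared in Section 4.4
HORIZON = SAMPLES_PER_DAY              # forecast 24 hours ahead


def mae(actual, predicted):
    """Mean absolute error between two equal-length lists."""
    return math.fsum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def simple_linear_regression(xs, ys):
    """Least-squares fit y = a + b*x on one attribute; returns a predictor."""
    mx, my = math.fsum(xs) / len(xs), math.fsum(ys) / len(ys)
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def nearest_neighbour(xs, ys):
    """1-NN regressor: hypothetical stand-in for an instance-based learner (NOT KStar)."""
    pairs = sorted(zip(xs, ys))
    keys = [k for k, _ in pairs]

    def predict(x):
        i = bisect.bisect_left(keys, x)
        # The closest stored key is at index i - 1 or i, whichever exists and is nearer.
        j = min((k for k in (i - 1, i) if 0 <= k < len(keys)),
                key=lambda k: abs(keys[k] - x))
        return pairs[j][1]
    return predict


CANDIDATES = {"SimpleLinearRegression (sketch)": simple_linear_regression,
              "1-NN (stand-in)": nearest_neighbour}


def examples(series, start, end):
    """Pairs (value at t, value at t + HORIZON) for t in [start, end)."""
    return ([series[t] for t in range(start, end)],
            [series[t + HORIZON] for t in range(start, end)])


def compare(series):
    """Train each candidate on each history size; return {(days, name): MAE}."""
    test_start = len(series) - SAMPLES_PER_DAY - HORIZON   # last full test day
    test_x, test_y = examples(series, test_start, test_start + SAMPLES_PER_DAY)
    train_end = test_start - HORIZON   # training targets end before the first test input
    results = {}
    for days in WINDOWS_DAYS:
        train_start = train_end - days * SAMPLES_PER_DAY
        if train_start < 0:
            raise ValueError(f"series too short for a {days}-day window")
        xs, ys = examples(series, train_start, train_end)
        for name, fit in CANDIDATES.items():
            model = fit(xs, ys)
            results[(days, name)] = mae(test_y, [model(x) for x in test_x])
    return results


def synthetic_series(days):
    """Deterministic toy temperature curve in degrees C (NOT station data)."""
    return [12 + 8 * math.sin(2 * math.pi * (t % SAMPLES_PER_DAY) / SAMPLES_PER_DAY - math.pi / 2)
            + 0.7 * math.sin(0.37 * t)
            for t in range(days * SAMPLES_PER_DAY)]


if __name__ == "__main__":
    # 60 days of history plus 3 days for the gap, the test inputs and their targets.
    scores = compare(synthetic_series(63))
    (days, name), best = min(scores.items(), key=lambda kv: kv[1])
    print(f"lowest MAE: {best:.3f} with {name} on {days} days of history")
```

To use the real learners, each WEKA classifier would be trained on the same windows. Only the body of the inner loop would change.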




Figure 1: Correlation of variables for temperature prediction

Figure 2: Correlation of variables for UV radiation prediction

Figure 3: Results of algorithm training for temperature prediction models at 24 hours

Figure 4: Results of algorithm training for temperature prediction models at 48 hours
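Figures 1 and 2 summarize the correlation analysis of Section 4.2. The paper does not state which correlation coefficient was used. The sketch below assumes Pearson's r and computes it between a target (temperature or UV radiation) and every other attribute of the records. The attribute names and readings in the usage example are hypothetical and cover only a subset of the Section 4.2 variables.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan   # r is undefined when either attribute is constant
    return sxy / math.sqrt(sxx * syy)


def correlations(records, target):
    """r between `target` and every other attribute of a list of dict records."""
    ys = [r[target] for r in records]
    return {name: pearson([r[name] for r in records], ys)
            for name in records[0] if name != target}


if __name__ == "__main__":
    # Hypothetical readings (not station data), keyed by assumed attribute names.
    records = [
        {"hour": 6, "humidity": 88.0, "wind_speed": 1.0, "uv": 0.2, "temperature": 7.5},
        {"hour": 9, "humidity": 70.0, "wind_speed": 2.5, "uv": 4.1, "temperature": 13.0},
        {"hour": 12, "humidity": 52.0, "wind_speed": 3.2, "uv": 11.3, "temperature": 19.4},
        {"hour": 15, "humidity": 48.0, "wind_speed": 4.0, "uv": 6.0, "temperature": 20.1},
        {"hour": 18, "humidity": 66.0, "wind_speed": 2.1, "uv": 0.3, "temperature": 14.2},
    ]
    for name, r in correlations(records, "temperature").items():
        print(f"{name:>10}: r = {r:+.2f}")
```

Pearson's r only measures linear association. A cyclic attribute such as the hour of the day can therefore show a weak r even when it strongly influences temperature.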



Figure 5: Results of algorithm training for UV radiation prediction models at 24 hours

Figure 6: Results of algorithm training for UV radiation prediction models at 48 hours

4.5 Interpretation and Evaluation

Figure 3 shows that as the size of the historical meteorological data decreases, the mean absolute error (MAE) of the 24-hour temperature predictions also decreases. The training results for the 48-hour temperature, 24-hour UV radiation and 48-hour UV radiation prediction models show the same behavior. As for the choice of prediction algorithm, Figures 3, 4, 5 and 6 show that KSTAR obtains the minimum mean absolute error, with a value of 0.0931. For the models predicting temperature at 24 hours, UV radiation at 24 hours and UV radiation at 48 hours, KSTAR yields values of 0.0847, 0.1343 and 0.1349 respectively. We therefore generated the prediction models with the KSTAR algorithm, using the data of the 2 days before the date and time of prediction.

5 Construction of Mobile Application

5.1 Mobile Application

The mobile application was developed for the Android platform (Google, 2015) and works from version 4.0 (API level 14) onwards. The development tool was Android Studio Beta 0.8.6. The choice of platform and development tool follows from the non-functional requirement of providing free and easily accessible information.

The mobile application is available for download from the Google Play repository (Google, 2015) under the name INFORAD. It is free and has been available since July 2015. Figure 7 shows the main interface.

Figure 7: Screenshot of the main interface of the INFORAD mobile application

5.2 General Scheme of the application

The INFORAD mobile application has the following functional requirements:

  • Show the temperature prediction for the current day, for each hour of the day.

  • Show the temperature prediction for the next day, for each hour of the day.

  • Show the UV radiation prediction for the current day, every 30 minutes from 6:00 a.m. to 6:00 p.m.



  • Show the UV radiation prediction for the next day, every 30 minutes from 6:00 a.m. to 6:00 p.m.

  • Display real-time information on temperature and UV radiation levels, updated every 5 minutes.

  • Show health recommendations according to UV radiation levels.

The only non-functional requirement is that the INFORAD mobile application must provide free and easily accessible information. Satisfying these requirements takes several components working together. Figure 8 shows the component diagram.

Figure 8: Component diagram of the INFORAD mobile application

5.3 Meteorological Station

The meteorological station used in the present investigation is a DAVIS INSTRUMENTS CORPORATION¹ WIRELESS VANTAGE PRO2. It was acquired in September 2014 and installed in November 2014.

5.4 Database Server

The meteorological station cannot connect to a database, so we used the university's MySQL server, which runs on Debian GNU/Linux², version 7.0. A further service was needed to connect to the weather station, extract its data every 5 minutes and store them in the MySQL database. For this we installed and configured WEEWX³, which is free and meets these requirements.

5.5 Generation of current and predicted information

The data mining process for knowledge extraction consumes considerable hardware resources and time. It cannot run on a mobile device because of hardware and processing limitations. We therefore developed programs that generate the current and predicted weather information and run them on the general-purpose server of the Universidad Nacional José María Arguedas. The resulting information is uploaded to a subdomain of the university (http://radiacionv.unajma.edu.pe) for later use by the INFORAD mobile application. Figure 9 shows the prediction interface, which uses intuitive, easy-to-read graphs.

Figure 9: Screenshot of the prediction interface of the INFORAD mobile application

5.6 Usage statistics

According to Figures 10 and 11, the application registered 706 user installations from July 2015 to July 2017, with an average rating of 3.778.
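The UV requirements of Section 5.2 and the server-side generation of Section 5.5 can be sketched together. Those requirements are 30-minute slots from 6:00 a.m. to 6:00 p.m. and health recommendations by UV level. The paper does not say which UV scale or advice the application uses. This sketch assumes the WHO UV index exposure categories: low 0-2, moderate 3-5, high 6-7, very high 8-10 and extreme 11+. The advice strings, the function names and the `predict_uv` callable are hypothetical, and the toy UV curve is not a real prediction.

```python
import math
from datetime import date, datetime, timedelta


def uv_slots(day):
    """Times every 30 minutes from 06:00 to 18:00 inclusive (25 slots)."""
    start = datetime(day.year, day.month, day.day, 6, 0)
    return [start + timedelta(minutes=30 * k) for k in range(25)]


def uv_category(uv_index):
    """Exposure category for a UV index value (assumed WHO scale)."""
    level = math.floor(uv_index + 0.5)   # round half up to the integer index
    if level <= 2:
        return "low"
    if level <= 5:
        return "moderate"
    if level <= 7:
        return "high"
    if level <= 10:
        return "very high"
    return "extreme"


# Hypothetical advice text, loosely following WHO's simplified protection scheme.
ADVICE = {
    "low": "No protection needed.",
    "moderate": "Protection needed: hat, sunglasses and sunscreen; seek shade at midday.",
    "high": "Protection needed: hat, sunglasses and sunscreen; seek shade at midday.",
    "very high": "Extra protection: avoid being outside during midday hours.",
    "extreme": "Extra protection: avoid being outside during midday hours.",
}


def uv_forecast(day, predict_uv):
    """Pair each slot with a predicted UV value, its category and advice.

    `predict_uv` is a hypothetical callable (datetime -> UV index) standing in
    for the prediction model that runs on the university server.
    """
    rows = []
    for t in uv_slots(day):
        uv = predict_uv(t)
        category = uv_category(uv)
        rows.append({"time": t.strftime("%H:%M"), "uv": round(uv, 1),
                     "category": category, "advice": ADVICE[category]})
    return rows


if __name__ == "__main__":
    # Toy bell-shaped daily UV curve peaking at 12:00 (NOT a real prediction).
    toy = lambda t: max(0.0, 12 * math.sin(math.pi * (t.hour + t.minute / 60 - 6) / 12))
    for row in uv_forecast(date(2015, 7, 1), toy)[::6]:
        print(row)
```

In this design the server produces the rows and the application only displays them, which matches the split described in Section 5.5.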

1 http://www.davisnet.com/
2 http://www.debian.org/
3 http://www.weewx.com/




Figure 10: Component diagram of the INFORAD mobile application

Figure 11: Component diagram of the INFORAD mobile application

6 Results Obtained

6.1 Results of Errors Obtained

To evaluate the efficacy of the predictions, we computed the absolute error of each prediction against the real value. Averaging over all predictions gives the mean absolute error (MAE), which reflects the efficacy of the prediction. It is also important to calculate the mean absolute percentage error (MAPE) (Pablo Cortes Achedad, 2010). Predictions were evaluated over five consecutive days, with the following results:

                 Temperature prediction      UV radiation prediction
                 At 24 hours   At 48 hours   At 24 hours   At 48 hours
    MAE          1.18          1.45          0.98          0.87
    MAPE         9.32%         12.00%        38.62%        40.77%

    Table 1: MAE and MAPE values of predictions

6.2 Results of prediction effectiveness

The numerical prediction addressed in the present research corresponds to a regional model. To evaluate the effectiveness of the predictions, we therefore compared the mean absolute error (MAE) of the proposed model with that of other prediction models of the region. No predictive models exist for the city of Andahuaylas. There is, however, information on the effectiveness of the ETA model (Vergaray, G. J., 2010; SENAMHI, 2013) of the National Service of Meteorology and Hydrology of Peru for some cities of Peru, as follows:

    Model           City          MAE
    ETA / SENAMHI   Arequipa      1.24
    ETA / SENAMHI   Iquitos       2.66
    ETA / SENAMHI   Lima          1.485
    KSTAR/INFORAD   Andahuaylas   1.18

    Table 2: Comparison of mean absolute errors (MAE) of 24-hour temperature prediction for the ETA models and the proposed model

7 Conclusions and Future Work

The analysis and evaluation of historical meteorological data through data mining has an optimal influence on the efficiency of the mobile application for predicting temperature and UV radiation. The errors obtained are smaller than those of the ETA model (Vergaray, G. J., 2010) currently used by the National Service of Meteorology and Hydrology of Peru. For 24-hour temperature prediction, the present investigation has a mean absolute error of 1.17, compared with 1.80 for the ETA model. The latter value matches the average of the three ETA values in Table 2, (1.24 + 2.66 + 1.485)/3 ≈ 1.80.

The KSTAR classification algorithm is the most suitable for generating prediction models of temperature and UV radiation for the city of Andahuaylas.

The optimum data size for generating prediction models of temperature and UV radiation for the city of Andahuaylas is the 2 days before the prediction. The prediction proved more accurate when it used data close to the predicted event.

Temperature is predicted more accurately for the next day than for the day after, since the mean absolute percentage error (MAPE) is 11.46% and 12.0% respectively.
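The MAE and MAPE values behind Table 1 and these conclusions follow the usual definitions: MAE = (1/n) Σ |y_i − ŷ_i| and MAPE = (100/n) Σ |(y_i − ŷ_i) / y_i|. The sketch below implements both. It checks the Table 1 MAEs against the acceptance limits of Section 3.2 and reproduces the average of the three ETA values of Table 2. The dictionary keys and function names are illustrative, and the numbers are transcribed from the tables.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return math.fsum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in %; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * math.fsum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Values transcribed from Table 1 (MAE) and Table 2 (ETA MAE); limits from Section 3.2.
TABLE_1_MAE = {("temperature", 24): 1.18, ("temperature", 48): 1.45,
               ("uv", 24): 0.98, ("uv", 48): 0.87}
MAE_LIMIT = {"temperature": 2.0, "uv": 1.0}
ETA_MAE = {"Arequipa": 1.24, "Iquitos": 2.66, "Lima": 1.485}


def acceptable(table, limits):
    """True for each (variable, horizon) whose MAE is below its limit."""
    return {key: value < limits[key[0]] for key, value in table.items()}


if __name__ == "__main__":
    print(acceptable(TABLE_1_MAE, MAE_LIMIT))
    eta_mean = math.fsum(ETA_MAE.values()) / len(ETA_MAE)
    print(f"mean ETA MAE = {eta_mean:.3f}")   # prints 1.795, i.e. about 1.80
```

MAPE divides by the observed value, so it grows quickly when observations are close to zero, as UV readings are near sunrise and sunset. MAE and MAPE can therefore rank two horizons differently, as they do for UV radiation in Table 1.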



UV radiation is also predicted more accurately for the next day than for the day after, since the mean absolute percentage error (MAPE) is 38.62% and 40.77% respectively.

Temperature predictions have smaller errors than UV radiation predictions, because temperature values are more stable than UV radiation values.

Prediction models, and then prediction applications, can also be implemented for other climate elements that are important sources of information. Examples are rainfall, humidity, atmospheric pressure, and wind speed and direction.

Weather is generally periodic, with an annual cycle. It would therefore be interesting to analyze at least 2 years of historical meteorological data to generate predictions for the whole following year, day by day or even hour by hour.

8 References

Corpac SA. (May 10, 2015). Andahuaylas airport. Retrieved September 10, 2015, from http://www.corpac.gob.pe/Docs/Aeropuertos/AdmCorpac/ANDAHUAYLAS.pdf
Folorunsho Olaiya, A. B. (2012). Application of Data Mining Techniques in Weather Prediction and Climate Change Studies. I.J. Information Engineering and Electronic Business, 1, 51-59.
Google. (2015). Android. Retrieved September 2015, from https://www.android.com/
José M. Molina, J. G. (2010). Técnicas de Minería de Datos basadas en Aprendizaje Automático [Data mining techniques based on machine learning]. Retrieved September 10, 2015, from https://santiagozapatakdd.files.wordpress.com/2011/03/curso-kdd-full-cap-3.pdf
Juraj Bartok, O. H. (2010). Data mining and integration for predicting significant meteorological phenomena. Procedia Computer Science, 1(1), 37-46. ISSN 1877-0509. http://dx.doi.org/10.1016/j.procs.2010.04.006
Keffer, T. (2010). WeeWX: weather Linux software. Retrieved September 10, 2015, from http://www.weewx.com/
Pablo Cortes Achedad, L. O. (2010). Organization Engineering (pp. 349, 350). Madrid, Spain: Díaz de Santos SA.
Regional Government of Apurimac - DIRCETUR. (2015). Sub Regional Directorate of Foreign Trade and Tourism - Andahuaylas. Retrieved September 10, 2015, from http://dirceturandahuaylas.gob.pe/Principales-Atractivos-Turisticos.php
Sara Khan, M. M. (April 2016). A Critical Review of Data Mining Techniques in Weather Forecasting. IJARCCE - International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 4.
SENAMHI. (2013). The weather forecast, chapter 13. Retrieved September 10, 2015, from http://200.58.146.28/nimbus/weather/pdf/cap13.pdf
Vergaray, G. J. (2010). Verification of the temperature of the ETA - SENAMHI model. Retrieved September 2015, from http://ftp.cptec.inpe.br/etamdl/WorkEtaIV/Estudo_de_Caso/Estudo_de_Caso_WorkETA_4_Gerardo_Vergaray.pdf
Waikato. (2010). Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Retrieved September 10, 2015, from http://www.cs.waikato.ac.nz/ml/weka/
Wikispaces. (2015). WEKA - ARFF (stable version). Retrieved September 2015, from https://weka.wikispaces.com/ARFF+(stable+version)


