=Paper=
{{Paper
|id=Vol-2029/paper13
|storemode=property
|title=Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining
|pdfUrl=https://ceur-ws.org/Vol-2029/paper13.pdf
|volume=Vol-2029
|authors=Herwin Alayn Huillcen-Baca,Flor de Luz Palomino-Valdivia
|dblpUrl=https://dblp.org/rec/conf/simbig/Huillcen-BacaP17
}}
==Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining==
Herwin Alayn Huillcen Baca and Flor de Luz Palomino Valdivia
School of Systems Engineering, National University José María Arguedas, Andahuaylas, Peru
hhuillcen@gmail.com, fdeluz3@gmail.com
Abstract
In the city of Andahuaylas, Peru, an agricultural region, weather conditions, and temperature in particular, are decisive for the success or failure of agricultural campaigns, while information on UV radiation is essential for protecting health and life in general. Free internet services publish this information, but inaccurately, because it is measured at a location remote from the city of Andahuaylas, and they do not provide predictions detailed enough for proper decision making. The present investigation therefore evaluated the influence of historical meteorological data processing on the efficiency of a mobile application for predicting temperature and UV radiation, which provides real, updated and predicted weather information for the city of Andahuaylas. We used the historical and current data of the meteorological station of the National University José María Arguedas of Andahuaylas, which were analyzed using data mining to obtain efficient prediction models that were subsequently implemented in the mobile application. To measure the efficiency of the prediction, we compared the mean absolute errors of the models used by the National Service of Meteorology and Hydrology of Peru for the cities of Arequipa, Iquitos and Lima, with values of 1.24, 2.66 and 1.485 respectively, against the mean absolute error of the prediction model of the mobile application, with a value of 1.18, which verifies the efficiency of the proposed model and is the expected result.
1 Introduction
On our planet, the elements of the weather are fundamental for the development of life in general; they are periodic natural phenomena and depend on factors such as geographical location.
In the case of Peru, specifically in the city of Andahuaylas, whose major source of economic development is agriculture, temperature and UV radiation are vital: temperature variations condition the success or failure of an agricultural campaign, and the levels of UV radiation directly affect the health of the inhabitants. Predictive and real-time information on temperature and UV radiation is therefore of great importance.
Under this approach, a mobile application for predicting temperature and UV radiation was implemented, using the historical meteorological information of a station located in the city of Andahuaylas. When processed using data mining, this information generates prediction models that receive the current weather information as input data and generate numerical predictions of temperature and UV radiation. Subsequently, the degree of approximation of the predictions to the actual temperature and UV radiation levels was evaluated.
This paper contributes a generalization of the proposed prediction model and a real source of weather information for the city of Andahuaylas.
2 Related works
Weather prediction is an important application in meteorology and has been one of the most scientifically and technologically challenging problems around the world in the last century, so many related works have been carried out with data mining and machine learning techniques.
Olaiya and Barnabas (Folorunsho Olaiya, 2012) propose the use of data mining techniques in forecasting maximum temperature, rainfall, evaporation and wind speed. This was carried out
using Artificial Neural Network and Decision Tree algorithms and meteorological data collected between 2000 and 2009 from the city of Ibadan, Nigeria. A data model for the meteorological data was developed and this was used to train the classifier algorithms. The performances of these algorithms were compared using standard performance metrics, and the algorithm which gave the best results was used to generate classification rules for the mean weather variables. A predictive Neural Network model was also developed for the weather prediction program and the results were compared with actual weather data for the predicted periods. The results show that, given enough case data, data mining techniques can be used for weather forecasting and climate change studies.
Khan, Muqeem and Javed (Sara khan, April 2016) describe data mining as a technique that helps in extracting relevant and meaningful information from a set of data. It can be further described as a knowledge discovery process that can be applied to any set of data. Data mining techniques, when applied to relational databases, can be used to search for certain trends or patterns. Their paper provides a survey of different data mining techniques being used in weather prediction or forecasting. It also reviews and compares the various techniques in a tabular format.
Bartok, Habala, Bednar, Gazak and Hluchý (Juraj Bartok, 2010) present the methods and technologies for integration of the input data, distributed on different vendors' servers. The meteorological detection and prediction methods are based on statistical and climatological methods combined with knowledge discovery and data mining of meteorological data (messages, weather radar imagery, "raw" meteorological data from stations, satellite imagery and results of common meteorological prediction models).
3 Methodological Approach
3.1 Problem Approach
In the city of Andahuaylas there is a problem with temperature information: free internet services provide wrong information, usually between three and five degrees less than the actual measurement, which causes confusion and uncertainty among the population.
The cause is that the free internet services take as their data source the meteorological information of the airport of Andahuaylas, which is located 17.5 km from the city at an altitude of 11706 feet or 3568 meters (Corpac SA, 2015), against the altitude of the city of Andahuaylas of 2901 meters (Regional Government of Apurimac - DIRCETUR, 2015). This difference of 667 meters gives the two places different climates, in addition to their belonging to different natural regions.
The main productive activity of the city of Andahuaylas and its environs is agriculture, which depends almost entirely on the elements of the weather. It is therefore important to have predictive meteorological information for the present day and the following days for appropriate decision making; the problem arises because the available prediction information is not correct. It is known that in southern Peru UV radiation has the highest rates in the world, which directly affects the health of Peruvians through skin cancer and eye disease caused by human exposure to UV radiation. In the case of Andahuaylas the problem is greater, because there is no source of information regarding the UV indices, either current or predictive, that would allow people to adopt preventive measures during the hours in which they will be exposed to the sun.
3.2 Research Method
The objective is to evaluate the influence of historical meteorological data on the efficiency of a mobile application for predicting temperature and UV radiation, based on data mining. The predictive models, and the research in general, are considered acceptable if the mean absolute error (MAE) (Pablo Cortes Achedad, 2010) is lower than 2.0 for temperature and lower than 1.0 for UV radiation.
Finally, we propose a generalization of this approach for the application of prediction models with an optimum amount of data and an efficient algorithm.
4 Analysis of meteorological data
In order to generate optimal prediction models for temperature and UV radiation, we analyzed the behavior of the classification algorithms of the WEKA tool (Waikato, 2010), taking as input the data of the meteorological station of the National University José María Arguedas of Andahuaylas, whose records of temperature, UV radiation, humidity, wind speed and rain are taken at intervals of 5 minutes. Specifically, prediction model algorithms were chosen for temperature prediction after 24 hours, temperature prediction after 48 hours, prediction of UV radiation after 24 hours and prediction of UV radiation after 48 hours.
4.1 Pre-processing of data
The processed data are stored in a MySQL database on the web server of the National University José María Arguedas of Andahuaylas. These data correspond to all elements of the climate recorded every 5 minutes; 6 months of records were used for the research.
4.2 Selection of variables or characteristics
The variables or characteristics correspond to the attributes taken into account in the generation of input files: hour, minute, temperature, absolute humidity, wind speed, rainfall and UV radiation. The objective was to evaluate which elements are correlated with temperature and UV radiation. Figures 1 and 2 show the correlations of variables.
Figure 1: Correlation of variables for temperature prediction
Figure 2: Correlation of variables for UV radiation prediction
4.3 Prediction algorithms
For the selection of candidate algorithms to generate the prediction models, we used as reference the investigations taken as antecedents (José M. Molina, 2010); however, they cannot be used in their entirety, since the choice depends very much on the nature of the attributes and class attributes. We therefore have the following classification algorithms for generating prediction models: reptree, m5p, kstar, linear regression, additive regression, bagging, decision table, conjunctive rule and simple linear regression. In fact, the WEKA tool (Waikato, 2010) does not allow the use of other available classification algorithms. There are other algorithms that were not taken into account because of the high mean absolute error (MAE) that resulted from the data analysis.
4.4 Extraction of knowledge
There are 28 input files, corresponding to each prediction taking data of 60, 30, 15, 7, 3 and 2 days prior to data collection. With the input files and the candidate classification algorithms in place, we trained the respective algorithms for each prediction model so that they could be compared with one another by means of the mean absolute error.
Figure 3: Results of algorithm training for temperature prediction models at 24 hours.
Figure 4: Results of algorithm training for temperature prediction models at 48 hours.
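The comparison described in Section 4.4 can be illustrated with a minimal sketch. It assumes that each trained configuration is identified by an algorithm name and a number of days of history, and that its error has already been measured; the function names and the example values are hypothetical and are not taken from the paper or from WEKA.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Observations equal to zero are skipped (an assumption of this sketch),
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * mean(abs(a - p) / abs(a) for a, p in pairs)


def best_configuration(mae_by_config):
    """Return the (algorithm, days_of_history) key with the smallest MAE."""
    return min(mae_by_config, key=mae_by_config.get)


# Hypothetical MAE values for illustration only; they are not the paper's results.
example = {
    ("kstar", 60): 0.9,
    ("kstar", 2): 0.1,
    ("m5p", 2): 0.4,
    ("linear regression", 2): 0.7,
}
print(best_configuration(example))
print(mae([10.0, 12.0, 14.0], [11.0, 12.0, 12.0]))
```

Keeping the errors in a dictionary keyed by configuration makes it straightforward to repeat the selection separately for each of the four prediction models.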
Figure 5: Results of algorithm training for UV radiation prediction models at 24 hours.
Figure 6: Results of algorithm training for UV radiation prediction models at 48 hours.
4.5 Interpretation and Evaluation
From Figure 3, it is observed that as the size of the historical meteorological data decreases, the mean absolute error (MAE) of temperature predictions at 24 hours also decreases; the results of the training of algorithms for the prediction models of temperature at 48 hours, UV radiation at 24 hours and UV radiation at 48 hours are similar. Regarding the choice of the prediction algorithm, Figures 3, 4, 5 and 6 show that the algorithm that obtains the minimum mean absolute error is KSTAR, with a value of 0.0931. For the models of prediction of temperature at 24 hours, UV radiation at 24 hours and UV radiation at 48 hours it yields values of 0.0847, 0.1343 and 0.1349 respectively, all of them through the KSTAR algorithm. Finally, we conclude that for the generation of prediction models we use the KSTAR algorithm with a data size of 2 days before the date and time of prediction.
5 Construction of Mobile Application
5.1 Mobile Application
The mobile application was developed for the Android platform (Google, 2015) and works from version 4.0 (API Level 14); the development tool used was Android Studio Beta 0.8.6. The choice of the platform and the development tool obeys the non-functional requirement to provide free and easily accessible information.
Currently the mobile application is available for download in the "Google Play" repository (Google, 2015) under the name INFORAD; it is free and has been available since July 2015. Figure 7 shows the main interface.
Figure 7: Screenshot of the main interface of the INFORAD mobile application
5.2 General Scheme of the application
The INFORAD mobile application has the following functional requirements:
• Show the temperature prediction for the current day for each hour of the day.
• Show the temperature prediction for the next day for each hour of the day.
• Show the prediction of UV radiation for the current day for every 30 minutes of the day, from 6:00 a.m. to 6:00 p.m.
• Show the prediction of UV radiation for the next day for every 30 minutes of the day, from 6:00 a.m. to 6:00 p.m.
• Display real-time information on temperature and UV radiation levels, updated every 5 minutes.
• Show health recommendations according to UV radiation levels.
The only non-functional requirement is that the INFORAD mobile application must provide free and easily accessible information. To satisfy these requirements, several components are required to work synchronously; Figure 8 shows the component diagram.
Figure 8: Component diagram of the INFORAD mobile application
5.3 Meteorological Station
The meteorological station used in the present investigation is a Wireless Vantage Pro2 from Davis Instruments Corporation [1]; this equipment was acquired in September 2014 and installed in November 2014.
5.4 Database Server
The disadvantage of the meteorological station is its lack of a database connection, so we used the MySQL server of the university, whose platform is Debian GNU/Linux [2], version 7.0. However, another service was required to connect to the weather station, extract data every 5 minutes and register them in the MySQL database; for this purpose the WeeWX [3] service was installed and configured, which is free and meets the requirements.
5.5 Generation of current and predicted information
The data mining process for knowledge extraction consumes hardware resources and time and cannot be run on a mobile device because of hardware and processing limitations. We therefore developed programs that generate current and predicted weather information and run on the general-purpose server of the Universidad Nacional José María Arguedas; the resulting information is uploaded to a subdomain of the university (http://radiacionv.unajma.edu.pe) for later use by the INFORAD mobile application. Figure 9 shows the prediction interface, which uses intuitive and easy-to-read graphs.
Figure 9: Screenshot of the prediction interface of the INFORAD mobile application
5.6 Usage statistics
According to Figures 10 and 11, there were 706 user installations from July 2015 to July 2017, and the application has an average rating of 3.778.
[1] http://www.davisnet.com/
[2] http://www.debian.org/
[3] http://www.weewx.com/
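As an illustration of the display requirements listed in Section 5.2 (hourly temperature predictions, UV predictions every 30 minutes from 6:00 a.m. to 6:00 p.m., and health recommendations by UV level), the following minimal sketch generates the prediction time slots and maps a UV index to an exposure category. The function names are hypothetical, and the use of the World Health Organization exposure categories is an assumption; the paper does not state which thresholds INFORAD applies.

```python
from datetime import datetime, timedelta


def time_slots(day, first_hour, last_hour, step_minutes):
    """Return the datetimes from first_hour to last_hour (inclusive) on `day`."""
    current = datetime(day.year, day.month, day.day, first_hour)
    end = datetime(day.year, day.month, day.day, last_hour)
    slots = []
    while current <= end:
        slots.append(current)
        current += timedelta(minutes=step_minutes)
    return slots


def uv_category(uv_index):
    """Map a UV index to a WHO exposure category (assumed, not from the paper).

    Fractional values are compared directly against the category boundaries.
    """
    if uv_index < 3:
        return "low"
    if uv_index < 6:
        return "moderate"
    if uv_index < 8:
        return "high"
    if uv_index < 11:
        return "very high"
    return "extreme"


day = datetime(2015, 7, 1)
temperature_slots = time_slots(day, 0, 23, 60)  # one slot per hour of the day
uv_slots = time_slots(day, 6, 18, 30)           # every 30 minutes, 6:00 to 18:00
print(len(temperature_slots), len(uv_slots), uv_category(9))
```

Generating the slots on the server keeps the mobile client simple: it only needs to download one value per slot and display it.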
Figure 10: Component diagram of the INFORAD mobile application
Figure 11: Component diagram of the INFORAD mobile application
6 Results Obtained
6.1 Results of Errors Obtained
To evaluate this efficacy, we used the absolute error of each prediction versus the real value and then took the mean over all predictions to obtain the mean absolute error (MAE), which reflects the efficacy of the prediction. However, it is also important to calculate the mean absolute percentage error (MAPE) (Pablo Cortes Achedad, 2010). Five consecutive days of evaluation of the predictions were chosen, reaching the following results:
Measure | Temperature prediction at 24 hours | Temperature prediction at 48 hours | UV radiation prediction at 24 hours | UV radiation prediction at 48 hours
MAE | 1.18 | 1.45 | 0.98 | 0.87
MAPE | 9.32% | 12.00% | 38.62% | 40.77%
Table 1: MAE and MAPE values of predictions
6.2 Results of prediction effectiveness
The type of numerical prediction addressed in the present research corresponds to a regional model, so to evaluate the effectiveness of the predictions, the mean absolute error (MAE) of the proposed model is compared with that of other prediction models of the region. There are no predictive models for the city of Andahuaylas, but there is information about the effectiveness of the ETA model (Vergaray, GJ (2010), SENAMHI, 2013) of the National Service of Meteorology and Hydrology of Peru for some cities of Peru, as follows:
Model | City | MAE
ETA / SENAMHI | Arequipa | 1.24
ETA / SENAMHI | Iquitos | 2.66
ETA / SENAMHI | Lima | 1.485
KSTAR / INFORAD | Andahuaylas | 1.18
Table 2: Comparison of mean absolute errors (MAE) of temperature prediction at 24 hours for the ETA models and the proposed model
7 Conclusions and Future Work
The analysis and evaluation of historical meteorological data through data mining have a positive influence on the efficiency of the mobile application for predicting temperature and UV radiation, since smaller errors have been obtained than with the ETA model (Vergaray, GJ (2010)) currently used by the National Service of Meteorology and Hydrology of Peru: for temperature prediction at 24 hours, the present investigation has a mean absolute error of 1.17 compared to a value of 1.80 generated by the ETA model.
The KSTAR classification algorithm is the most suitable for the generation of prediction models of temperature and UV radiation for the city of Andahuaylas.
The optimum data size for general prediction models of temperature and UV radiation for the city of Andahuaylas is the 2 days prior to the prediction, as it is shown that the prediction is more accurate when data close to the occurrence are used.
Temperature is predicted with greater certainty for the next day than for the subsequent day, because the mean absolute percentage error (MAPE) is 11.46% and 12.0% respectively.
UV radiation is predicted with greater certainty for the next day than for the subsequent day, because the mean absolute percentage error (MAPE) is 38.62% and 40.77% respectively.
Predicting temperature generates smaller errors than predicting UV radiation, because temperature has more stable values than UV radiation.
It is possible to implement prediction models and, later, prediction applications for other elements of the climate that are also important as information, such as rainfall, humidity, atmospheric pressure, and wind speed and direction.
It is known that the weather in general is periodic, with an annual repetition cycle, so it would be interesting to analyze the historical meteorological data of at least the previous 2 years to generate predictions for the whole following year, day by day or even hour by hour.
8 References
Corpac SA. (2015, May 10). Andahuaylas airport. Retrieved September 10, 2015, from http://www.corpac.gob.pe/Docs/Aeropuertos/AdmCorpac/ANDAHUAYLAS.pdf
Folorunsho Olaiya, A. B. (2012). Application of Data Mining Techniques in Weather Prediction and Climate Change Studies. I.J. Information Engineering and Electronic Business, 1, 51-59.
Google. (2015). Android. Retrieved September 2015, from https://www.android.com/
José M. Molina, J. G. (2010). Técnicas de Minería de Datos basadas en Aprendizaje Automático. Retrieved September 10, 2015, from https://santiagozapatakdd.files.wordpress.com/2011/03/curso-kdd-full-cap-3.pdf
Juraj Bartok, O. H. (2010). Data mining and integration for predicting significant meteorological phenomena. Procedia Computer Science, Volume 1, Issue 1, Pages 37-46, ISSN 1877-0509, http://dx.doi.org/10.1016/j.procs.2010.04.006.
Keffer, T. (2010). WeeWX: Linux weather software. Retrieved September 10, 2015, from http://www.weewx.com/
Pablo Cortes Achedad, LO (2010). Organization Engineering. In Pablo Cortes Achedad, Organization Engineering (pp. 349, 350). Madrid, Spain: Díaz de Santos SA.
Regional Government of Apurimac - DIRCETUR. (2015). Sub Regional Directorate of Foreign Trade and Tourism - Andahuaylas. Retrieved September 10, 2015, from http://dirceturandahuaylas.gob.pe/Principales-Atractivos-Turisticos.php
Sara khan, M. M. (April 2016). A Critical Review of Data Mining Techniques in Weather Forecasting. IJARCCE - International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 4.
SENAMHI. (2013). The weather forecast, cap 13. Retrieved September 10, 2015, from http://200.58.146.28/nimbus/weather/pdf/cap13.pdf
Vergaray, GJ (2010). Verification of the temperature of the ETA - SENAMHI model. Retrieved September 2015, from http://ftp.cptec.inpe.br/etamdl/WorkEtaIV/Estudo_de_Caso/Estudo_de_Caso_WorkETA_4_Gerardo_Vergaray.pdf
Waikato. (2010). Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Retrieved September 10, 2015, from http://www.cs.waikato.ac.nz/ml/weka/
Wikispaces. (2015). WEKA - ARFF stable version. Retrieved September 2015, from https://weka.wikispaces.com/ARFF+(stable+version).