=Paper= {{Paper |id=Vol-1442/paper_27 |storemode=property |title=BLO: Batata Lake (Oriximiná/PA) Application Ontology |pdfUrl=https://ceur-ws.org/Vol-1442/paper_27.pdf |volume=Vol-1442 |dblpUrl=https://dblp.org/rec/conf/ontobras/SouzaM15 }} ==BLO: Batata Lake (Oriximiná/PA) Application Ontology== https://ceur-ws.org/Vol-1442/paper_27.pdf
     BLO: Batata Lake (Oriximiná/PA) Application Ontology

               Adriano Neves de Souza, Adriana Pereira de Medeiros

 Instituto de Ciência e Tecnologia – Universidade Federal Fluminense – Rio das Ostras
                              Rio de Janeiro – RJ – Brazil
           adriano_souza@id.uff.br, adrianamedeiros@puro.uff.br

     Abstract. This work presents the BLO ontology (Batata Lake Ontology), an
     application ontology that describes in a structured way the data of research
     done by limnology researchers of Federal University of Rio de Janeiro (UFRJ)
     Macaé in Batata Lake (Oriximiná/PA). The main contribution of the BLO is
     the creation of a research data repository in RDF and the BLS application
     (Batata Lake System), a semantic web application to support researchers in
     environmental impact assessments, in preservation areas settings, in species
     protection and recovery of degraded areas, among other activities.

1.   Introduction
The ecological complexity of aquatic ecosystems caused by the large volume of
sampling data creates difficulties to understand the environment and species, as well as
the relationship between them. This understanding generates scientific data and
knowledge, which provides recovery alternatives or mitigation of external impacts in
the ecosystem [Bozelli et al. 2000]. Governments and organizations are encouraging
solutions to share the knowledge of ecology. For example, the PELD (Long Term
Ecological Program) [Esteves et al. 2004] was created by the Brazilian government to
encourage the organization of research data on ecosystems. Limnology researchers of
the UFRJ Macaé-RJ have been working for decades in research about the Batata Lake,
an Amazonian aquatic ecosystem, located at Oriximiná-PA, that suffered environmental
impacts due to the tailings generated by bauxite production [Bozelli et al. 2000]. This
lake has been monitored and studied since the 80's in order to obtain knowledge of its
ecosystem and mitigate these impacts. The lack of structuring and formalization of the
large volume of generated data makes their analysis difficult, and limits the scope of the
researchers in the search for new knowledge.
     The application of Semantic Web technologies for the management and
understanding of research data has been widely discussed currently. The ontologies
usage in biodiversity has been appointed as a solution for obtaining scientific
knowledge [Campos et al. 2011]. Ontologies for biodiversity are presented in [Moura et
al. 2012], [Campos et al 2011] and [Amanqui et al 2013], but they do not describe terms
proposed in this work.
     This paper presents the application ontology BLO (Batata Lake Ontology) that
describes the data of analysis and samplings obtained by limnology researchers of UFRJ
Macaé-RJ in order to support their researches. It also presents the BLS web application
for supporting the lake recovery analysis and the search for solutions that mitigate the
environmental impacts. An exploratory study performed to validate the ontology is
presented. Then, some conclusions and future works are discussed.
2.   Batata Lake Application Ontology
Application ontologies describe concepts of a domain and specific tasks for
implementing systems, the practical part [Guarino, 1997]. BLO was created following
the Ontology Development 101 [Noy et al, 2001] guide. It was specified with the OWL
(Ontology Web Language), specifically OWL DL 2, with 35 classes and 222 axioms.
The domain was defined as Batata Lake. Thus, the ontology will be used to support the
limnology researches of UFRJ Macaé, organizing research data, providing relevant
information to the environmental impacts mitigation in this lake and preparing these
data for online publication when needed. The ontology scope was determined by
drafting the following competency questions: i) What is the sample period with the
highest concentration of chlorophyll in a given year? ii) What flood pulse had the
highest amount of turbidity in a given year? iii) What flood pulse had the highest
percentage of organic matter in the sediment in a given year? iv) What is the flood pulse
of a certain period? v) What samplings were done in impacted areas in a given period?
        Searches were performed in the ontology repositories DAML Ontology Library
(www.daml.org/ontologies/), Protégé_Ontology_Library (protegewiki.stanford.edu/wi-
ki/Protege_Ontology_Library), Schemapedia (datahub.io/pt_BR/dataset/schemapedia)
and Swoogle (swoogle.umbc.edu/), in order to find ontologies related to this work. The
ontologies HydroBodyOfWater (sweet.jpl.nasa.gov/2.0/hydroBodyOfWater.owl) and
Geography (www.daml.org/ontologies/412) contain some generic terms with descriptive
features related to the proposed ontology, but they do not address the domain of this
work. After the BLO definition, Albuquerque et al (2015) proposed sub-ontologies as
complements to biodiversity ontology OntoBio to create a fieldwork sample vocabulary.
The reuse of this vocabulary in the BLO ontology is a future work.
        Figure 1 shows the graph preview of the main classes of the BLO. The vertices
are classes or concepts defined in the ontology. The edges, which have a one direction,
are the relations between classes, also called object properties. The Sampling class
describes the collected sample by the researcher in the sampling stations, represented by
SamplingStation class. SamplingStation has two data properties: coordenates and
impacted, which respectively specify the geographical location of the sampling station
and whether it is an impacted area or not. The object property isDoneOn determines the
relation between Sampling and SamplingStation. The relation isDoneDuring between
Sampling and Period expresses that a sampling is done in a particular period. The
number of possible relations is limited by the amount of sampling stations that had
some collected sample. The FloodPulse class specifies the lake flood pulses, which are
the process stages of filling and emptying of the lake. This class has no data property,
because the identification of instances is done by the URI (Flood, HighWater, ebby,
LowWater). The Period class contains the data property date that describes the month
and year in which the sampling is done. It is related to FloodPulse class by the object
property determines. This property describes the relation between the months of the
year and the flood stages of the lake, which can suffer changes over the years, because
there is no standard in the establishment that a month will have a particular flood pulse.
The Sediment and Water classes represent all data collected of sediment and water in the
lake and they are related to the sampling by the object property isSampliedBy. All
sampling data related to water are described by data properties of the classes Water,
SuspendedMatterial, Aluminum, Chorophyll, Iron, Nitrongen, Oxigen and Phosphor.
                     Figure 1- BLO Classes and Properties (partial)
        The object property isDoneOn between Sampling and SamplinStation is defined
with the restriction FunctionalProperty. Thus, a sampling x can be done in only one
sampling station y. Using the triple Sampling-> isDoneOn-> SamplingStation is possible
seek sampling information grouped by sampling stations. The object property
determines is defined as inverse of isDeterminedBy. It allows that when answered the
competency question "What is the flood pulse of a certain period?", the reasoner
identifies the inverse relation isDeterminedBy and retrieve any instance that has the
inverse as relation. Restrictions like these add semantic details to the data model and
with reasoners the queries can obtain more accurate results, as shown in the section 3.
        The BLO instances were obtained from actual research data of the Batata Lake
stored in the last 26 years in spreadsheets. These data were automatically exported to
RDF [Graham; Jeremy, 2004] using the BLO vocabulary and stored in a repository
using the AllegroGraph 4.14 (http://franz.com/agraph/).

3.   BLS Web Application
BLS (Batata Lake System) was developed to provide accurate information of the lake
for researcher analysis. It was implemented in JAVA with JENA library
(http://www.w3.org/2001/sw/wiki/Jena), which allows connecting the application to the
RDF repository. JENA is a Java framework for building Semantic Web applications and
has support for manipulating RDF triples, OWL, SPARQL [Eric; Andy, 2008] queries
and includes an inference engine (Reasoner). The BLS interface was developed in
Portuguese. Figure 2 presents the Period query page, which allows searching a given
period by date (Período) or flood pulse (Pulso de Inundação). All periods of the selected
pulse are raised when the page is submitted. During query performing, the application
accesses the stored data in the RDF repository and run the query in SPARQL. Frame 1
presents the SPARQL query executed from page shown in Figure 2 and answers the
competency question "What is the flood pulse of a certain period?". Thus, the BLS
application displays the query result illustrated in the Figure 2, which shows that the
flood pulse was Low Waters (AguasBaixas). Note that the data can be described using
the relation isDeterminedBy in the RDF repository instead of determines. However, the
query result would be the same, because these properties were defined as inverse in the
BLO. The “eye” icon displays all requested period data, but the result will not be
presented here due to space limitations.
         Figure 2- Period Query                        Figure 3 - Sampling Query




    Frame 1 – Period SPARQL query


                                                     Frame 2 – Sampling SPARQL
                                                     Query
        The sampling query page presented in Figure 3 allows searching the samplings
done in a period or by a particular researcher in impacted area or not. It answers the
competency question "Which samplings were done in impacted areas in a given
period?". The application can consider the filter by researcher, otherwise it will be
considered by the period. The samplings can be selected by sampling stations. The
FILTER term in Frame 2 is used to determine the sampling period and the sampling
station type that the researcher wants to get as answer in the sampling query page. The
query result helps to evaluate the samplings which were done in impacted areas and
thus comparing with samples done in non-impacted areas, in order to historically
evaluate the behavior and recovery of the environment.

4. Exploratory Study
In order to evaluate the data model defined by BLO and the BLS application, a small
exploratory study was conducted. The hypothesis was that the use of Semantic Web
technologies for describing the Batata Lake data would facilitate the access and analysis
of these data. The study was performed from a test divided into two stages: the
execution of a search activity using the BLS application and the fill of an evaluation
questionnaire. The activity was evaluating the water turbidity of a sample in a given
period, considering as parameter the sampling data of non-impacted areas done in the
same period. This is important for the researchers, since that allows evaluating the
progress of the lake recovery. The study involved seven participants. The choice of them
was premised on the experience and engagement with lake researches. Two of the
participants, one PhD researcher and one master student, accompanied and provided all
the necessary for understanding the domain and definition of competency questions.
      The goal of the study was to evaluate how the research data started to be searched
and analyzed using the BLS. It was not stipulated time for performing the activity. At
the end of the activity each participant filled a joint questionnaire with the following
questions: 1) Do the searches available in the web application allow finding and
relating the data of the samplings? Why? 2) Do the results obtained by the searches
facilitate the comparison of the data and the analysis of the lake recovery? Why? 3)
Would you use this application again to query and analyze your research data? Why? 4)
Do the terms and system’s menu options correspond to the everyday reality of research
about the lake? If the answer is no, list the terms that do not match the reality. 5)
Considering a scale of one to five, with option 1 equal bad and 5 equal great, how do
you rate the form of searching available in the web application, comparing it with that
currently performed in Excel spreadsheets?
    Most participants (five of them) answered "yes" to the questions and valued the
new way to query research data. Six participants said that would use the BLS
application again, as this tool significantly reduces the time spent looking for a data,
enabling faster analysis. Six of them said that the vocabulary was defined according to
the everyday reality of research about the lake. This indicates that the BLO ontology
was well defined according to the domain. The test results also allowed identifying
problems and difficulties in finding and analyzing the data. In the issue 2, the answers
of four participants indicated that the queries results did not facilitate the data
comparison and the lake recovery analysis, because the way the results were presented.
They informed the search filter by period should be only for year interval with a flood
pulse filter to facilitate analysis based on different periods and years. In addition, they
suggested the choice of some variables, such as turbidity or chlorophyll, presented in
parallel all the values separated by sampling stations, impacted or not. It would allow
analyzing a historical series of data and effectively evaluate the lake recovery. Plus, they
observed that the application navigability would be more intuitive with the access to the
samplings from the data of a given period.

5.   Conclusion and Future Work
This paper presented the BLO ontology for semantically describing data of the research
done by limnology researchers of UFRJ-Macaé on Batata Lake. The semantic
description of these data enables richer queries about the lake through inferences done
by reasoners. In addition, it provides a vocabulary of common terms used in other
researches about the Batata Lake. The main contributions of this ontology is the creation
of a research data repository in RDF and the development of the BLS system, a
semantic web application to support researchers to query and analysis the research data
about this lake. The aim is supporting the production of scientific knowledge from the
analysis made by semantic queries and preparing the data for online publication when
needed. An initial exploratory study was done to validate the ontology and the
application. The tests showed the BLO relevance and quality and some necessary
changes in the BLS application. After implementing these changes, a new experiment
will be conducted to validate them.
        A future work is using the ontology proposed by Moura et al (2012) and the
BLO ontology for describing the species existing in the Batata Lake. Another future
work is sharing BLO so that other researchers that study this lake can use it to support
their research. Moreover, some terms related to fieldwork sampling context of the
OntoBio [Albuquerque et al, 2015] can be reused.

References
ALBUQUERQUE, A. C. F., CAMPOS DOS SANTOS, J. L., DE CASTRO JÚNIOR,
  A. N. OntoBio: A Biodiversity Domain Ontology for Amazonian Biological
  Collected Objects. 48th Hawaii International Conference on System Sciences, p. 10,
  2015.
AMANQUI, F. K. M.; SERIQUE, K. J.; LAMPING, F.; CAMPOS, J. L.;
 ALBUQUERQUE, A. C. F.; MOREIRA, D. A. Implementing an Architecture for
 Semantic Search Systems for Retrieving Information in Biodiversity Repositories.
 Simpósio Brasileiro de Banco de Dados, p. 1–6, 2013.
BOZELLI, REINALDO L.; ESTEVES, FRANCISCO A.; ROLAND, F. Lago Batata:
  Impacto e Recuperação de um Ecossistema Amazônico. UFRJ/SBL- RJ, 2000.
CAMPOS, J. L.; NETTO, J. F. D. M.; CASTRO, A. N. DE; ALBUQUERQUE, A. C. F.
  Ontologias para Interoperabilidade de Modelos e Sistemas de Informação de
  Biodiversidade, 2011.
ERIC, P.; ANDY, S. 2008 “SPARQL Query Language for RDF”. W3C.
  http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/.
ESTEVES, F. A. ; SCARANO, F. R. ; ROCHA, C. F. D. Pesquisa de Longa Duração na
  Restinga de Jurubatiba: Ecologia, História Natural e Conservação. 1. ed. Rio de
  Janeiro: RiMA Editora, 2004. v. 1. 376p.
GUARINO, N. Understanding, building and using ontologies. International Journal of
  Human-Computer Studies, v. 46, p. 293–310, 1997. Disponível em:
  .
GRAHAM, G.; JEREMY, C. 2004. “Resource Description Framework (RDF): Concepts
  and Abstract Syntax”. W3C. Disponível em: http://www.w3.org/TR/2004/REC-rdf-
  concepts-20040210/.
MOURA, A.; PORTO, F.; POLTOSI, M. Integrating Ecological Data Using Linked Data
 Principles. ONTOBRAS-MOST 2012: 156-167, 2012.
NOY, N.; MCGUINNESS, D. Ontology development 101: A guide to creating your first
  ontology. Development, v. 32, p. 1–25, 2001. Disponível em:
  .