<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RichRDF: A Tool for Enriching Food, Energy, and Water Datasets with Semantically Related Facts and Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed Gharibi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Praveen Rao</string-name>
          <email>raopr@umkc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nouf Alrasheed</string-name>
          <email>nalrasheed@mail.umkc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Missouri-Kansas City (UMKC)</institution>
          ,
          <addr-line>Kansas City MO</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Food, energy, and water (FEW) are the key resources to sustain human life and economic growth on Earth. While there is a plethora of information related to FEW systems online, there is a lack of reliable knowledge management tools that enable easy consumption of such information. In this paper, we present a web-based tool called RichRDF with the goal of enriching existing FEW systems with semantically related facts and images. The main features of RichRDF include (1) an entity extraction algorithm that extracts meaningful subjects from Resource Description Framework (RDF) statements using natural language processing (NLP) techniques, (2) a reliable approach to add semantic similarity scores and relationships between different RDF subjects based on ConceptNet, (3) an efficient way to use the offset numbers of WordNet synsets to request the associated images from ImageNet, and (4) a user-friendly interface that allows users to load and convert FEW datasets to RDF and then query the RDF datasets using an existing SPARQL engine. A video highlighting the key features of RichRDF is available at https://youtu.be/vyHgh4LgKCo.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Food, Energy, and Water</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Food, energy, and water (FEW) are interdependent resources that are
essential to life on Earth. These resources are expected to come under
tremendous stress by 2050 due to population growth, natural disasters, and human
activities, which emphasizes the need to improve available FEW resources. The United
Nations has classified FEW components as a high priority within its
sustainable development goals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Meanwhile, several federal agencies, such as the United States
Department of Agriculture (USDA) and the National Drought Mitigation Center
(NDMC), provide massive amounts of data related to FEW systems. However, the
available data exist in CSV, XML, and JSON formats that are not readily consumable
in the world of Linked Data (LD; see http://linkeddata.org).</p>
      <p>
        Today, billions of RDF triples are available on the Web for developing new
Semantic Web applications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These triples are expressed as (subject, predicate,
object) to represent entities and their relationships within a knowledge base. These
relationships can be indicated by using different ontologies such as FOAF and DBpedia
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. An Internationalized Resource Identifier (IRI) can be added to an RDF triple as its
context; such statements are called RDF quads. Here is an example of an RDF
triple capturing the relationship between Oswego and the United States [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
&lt;http://dbpedia.org/resource/Oswego&gt;
&lt;http://dbpedia.org/ontology/country&gt;
&lt;http://dbpedia.org/resource/United_States&gt; .
      </p>
      <p>The first term in the triple is the subject, ‘Oswego.’ The second
term is the predicate, ‘country’, which is provided by the DBpedia ontology, and
the last term is the object, the ‘United States.’ Such triples can be
generated from files in different formats including JSON, CSV, and TSV. Converting
such files to RDF triples allows users to express semantic relationships between
subjects and objects and to structure information using RDF graphs. Moreover, RDF
triples offer several benefits over processing raw files, including ease
of integration and use, modeling of semantic similarity between entities, and the ability
to query the knowledge base using a SPARQL engine.</p>
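      <p>As a rough illustration of how tabular data can be turned into triples, the following Python sketch converts CSV rows into N-Triples lines. It is not the conversion used by RichRDF: the namespace http://example.org/few/, the function names, and the sample row are assumptions made only for this example, and every cell value is emitted as a plain string literal.</p>
      <preformat><![CDATA[
```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/few/"  # hypothetical namespace, not from the paper


def escape_literal(text):
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, key_column):
    """Yield one N-Triples line per non-empty, non-key cell of each CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The key column identifies the row and becomes the subject IRI.
        subject = "<" + BASE + "resource/" + quote(row[key_column], safe="") + ">"
        for column, value in row.items():
            if column == key_column or not value:
                continue
            # Each remaining column name becomes a predicate IRI.
            predicate = "<" + BASE + "property/" + quote(column, safe="") + ">"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


# Made-up sample row, used only to exercise the function.
sample = "id,name,energy_kcal\n01001,\"BUTTER,WITH SALT\",717\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```
]]></preformat>
      <p>Each printed line is a (subject, predicate, object) statement in which the row key forms the subject IRI and each column name forms a predicate IRI. A real conversion would normally map columns to terms of an existing ontology instead of minting new predicates.</p>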
      <p>
        While there are several tools that can produce RDF datasets (e.g., Karma [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), they
do not allow us to easily add new assertions in the form of triples or quads to a dataset.
Moreover, the lack of reliable knowledge graphs serving FEW systems has motivated
us to build our own knowledge graph. It supports decision-making and enriches FEW
datasets with extra knowledge and images based on the semantic similarities
between the dataset entities, which improves knowledge discovery, simplifies access, and
yields better search results. Our system, RichRDF, employs RDF, Web Ontology
Language (OWL), and SPARQL to construct and query the FEW knowledge graph.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RichRDF</title>
      <p>
        The overall architecture of RichRDF is shown in Fig. 1. RichRDF runs in four stages
and checks that the conditions for each stage are met before starting it, so that the
total processing time is minimized. In the first stage, RichRDF accepts an input file from
a user. If the file contains RDF quads, then the file is ready for the next stage of
processing. Otherwise, the user is asked to provide a context IRI, which is added
as the context of each RDF triple to produce RDF quads. In the second
stage, RichRDF runs NLP techniques [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on the subjects to extract meaningful
entities for use in the later stages. The main goal of
entity extraction is to identify a real entity as the main subject, since a subject may be a
single word, a few words, long text, or a number such as an ID. The
extracted entities subsequently stand in for the entire subject.
      </p>
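      <p>The first stage can be pictured with the short Python sketch below, which leaves a statement that already has four terms unchanged and appends a context IRI to a statement that has three. It is only an approximation written for this example: the regular expression covers a simplified subset of the N-Triples and N-Quads term syntax, and the function names and the context IRI http://example.org/few/graph are hypothetical.</p>
      <preformat><![CDATA[
```python
import re

# One term: an IRI, a blank node, or a literal with an optional language
# tag or datatype. Simplified for illustration; not a full grammar.
TERM = re.compile(
    r'<[^<>\s]*>'                                            # IRI
    r'|_:[^\s]+'                                             # blank node
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^<>\s]*>)?'  # literal
)


def split_terms(statement):
    """Return the terms of one statement, without the closing dot."""
    body = statement.strip()
    if not body.endswith("."):
        raise ValueError("statement must end with '.'")
    return TERM.findall(body[:-1])


def to_quad(statement, context_iri):
    """Return the statement as a quad, adding the context only if it is missing."""
    terms = split_terms(statement)
    if len(terms) == 4:  # already a quad: keep its own context
        return " ".join(terms) + " ."
    if len(terms) != 3:
        raise ValueError(f"expected 3 or 4 terms, got {len(terms)}")
    return " ".join(terms + [f"<{context_iri}>"]) + " ."


triple = ("<http://dbpedia.org/resource/Oswego> "
          "<http://dbpedia.org/ontology/country> "
          "<http://dbpedia.org/resource/United_States> .")
quad = to_quad(triple, "http://example.org/few/graph")  # hypothetical context
print(quad)
```
]]></preformat>
      <p>Calling to_quad on a line that is already a quad returns it with its original context, which mirrors the check described above: only files made of triples need a context IRI from the user.</p>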
      <p>
        RichRDF processes two quads at a time, so several outcomes are possible
during entity extraction. Consider two subjects that end with the following strings:
“CHEESE,COTTAGE,CRMD,W/FRU” and “BUTTER,PDR,1.5OZ,PREP,W/1/1.HYD”.
After the entity extraction stage, “CHEESE,COTTAGE” denotes the entities extracted
from the first subject and “BUTTER” the entity extracted from the second. We use the “isA” relation
to link the original subjects with the extracted entities. The third stage of RichRDF looks up
the extracted entities in ConceptNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to obtain the relationship between the two
subjects. If a relationship exists between the first and the second subject, another
request is issued to fetch their semantic similarity score. We use RDF
reification and blank nodes to represent the similarity score between the original subjects.
The semantic similarity score and the relationship between subjects can be exploited
during information retrieval and NLP, and they also allow a search to be expanded to ontology
keywords. Furthermore, these scores can be used in advanced machine learning
techniques to understand the data better and to build better models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
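      <p>The entity extraction, the “isA” link, and the reified similarity score described above can be sketched in Python as follows. This is a toy stand-in, not the RichRDF implementation: the word list replaces the NLP step, the namespace http://example.org/few/ and the predicate names are hypothetical, and the score passed in the test call is a placeholder, not a value returned by ConceptNet.</p>
      <preformat><![CDATA[
```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/few/"  # hypothetical namespace, not from the paper

# Tiny stand-in vocabulary. RichRDF relies on NLP techniques for this step;
# a word list is an assumption made to keep the sketch self-contained.
KNOWN_WORDS = {"cheese", "cottage", "butter"}


def extract_entity(subject_tail):
    """Keep the leading comma-separated tokens that are known words."""
    kept = []
    for token in subject_tail.split(","):
        if token.lower() not in KNOWN_WORDS:
            break  # stop at the first abbreviation, quantity, or code
        kept.append(token)
    return ",".join(kept)


def is_a_triple(subject_iri, entity):
    """Link an original subject to its extracted entity."""
    return f'<{subject_iri}> <{EX}isA> "{entity}" .'


def reified_similarity(node, subject_a, subject_b, relation, score):
    """Describe 'subject_a relation subject_b' as an rdf:Statement blank
    node and attach the similarity score to that node."""
    return [
        f"_:{node} <{RDF}type> <{RDF}Statement> .",
        f"_:{node} <{RDF}subject> <{subject_a}> .",
        f"_:{node} <{RDF}predicate> <{EX}{relation}> .",
        f"_:{node} <{RDF}object> <{subject_b}> .",
        f'_:{node} <{EX}similarityScore> "{score}"^^<{XSD}double> .',
    ]


first = extract_entity("CHEESE,COTTAGE,CRMD,W/FRU")
second = extract_entity("BUTTER,PDR,1.5OZ,PREP,W/1/1.HYD")
print(first)
print(second)
```
]]></preformat>
      <p>Attaching the score to a reified statement keeps the original subjects untouched, so a query can still retrieve them directly and follow the blank node only when the similarity information is needed.</p>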
      <p>
        In the last stage, RichRDF queries WordNet using the extracted entities to obtain
the synsets (groups of synonymous words) they belong to [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Using these synset groups, we generate the offset ID
numbers based on their relationship with the subjects. The offset ID numbers are used
to look up images on ImageNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For every offset ID, we ask ImageNet for
the Uniform Resource Locators (URLs) of all the images that are associated
with that ID. As a result, hundreds of URLs are returned. Instead of
adding all these URLs using blank nodes, we split them into two categories based on
their content. The first category is a single blank node with the relationship
“IURLs_subjectName” that contains a link to a page containing hundreds of images
related to this subject. The second category contains the pure images that represent the
subject only. We add the second category as multiple blank nodes using the relationship
“subjectName_Images”. Finally, the user can download the output of this
stage, or use a further RichRDF service to query the output with
a SPARQL engine through a user-friendly interface. The output of the queries can be
downloaded at any time.
      </p>
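      <p>The link between WordNet offsets and ImageNet, and the two categories of image statements, can be sketched in Python as shown below. ImageNet synset IDs consist of a part-of-speech letter followed by the zero-padded eight-digit WordNet offset; everything else here is an assumption made for illustration, including the namespace http://example.org/few/, the url predicate, the blank-node labels, and the placeholder URLs. No request is sent to ImageNet.</p>
      <preformat><![CDATA[
```python
EX = "http://example.org/few/"  # hypothetical namespace, not from the paper


def wnid_from_offset(offset, pos="n"):
    """Format a WordNet synset offset as an ImageNet-style synset ID:
    the part-of-speech letter followed by the 8-digit, zero-padded offset."""
    if not 0 <= offset < 10**8:
        raise ValueError("offset must fit in 8 digits")
    return f"{pos}{offset:08d}"


def image_triples(subject_iri, name, page_url, image_urls):
    """Emit one blank node for the overview page and one per single image."""
    lines = [
        # Category 1: a single blank node pointing at the overview page.
        f"<{subject_iri}> <{EX}IURLs_{name}> _:{name}_page .",
        f"_:{name}_page <{EX}url> <{page_url}> .",
    ]
    # Category 2: one blank node for each image that shows the subject only.
    for i, url in enumerate(image_urls):
        lines.append(f"<{subject_iri}> <{EX}{name}_Images> _:{name}_img{i} .")
        lines.append(f"_:{name}_img{i} <{EX}url> <{url}> .")
    return lines


# 2084071 is the WordNet noun offset behind the ImageNet ID n02084071 (dog).
print(wnid_from_offset(2084071))

for line in image_triples(
    EX + "resource/01001",
    "butter",
    "http://example.org/images/butter",          # placeholder page URL
    ["http://example.org/images/butter/1.jpg"],  # placeholder image URL
):
    print(line)
```
]]></preformat>
      <p>Keeping the overview page in one blank node and the individual images in separate ones lets a SPARQL query select either the single page link or the list of image URLs by choosing the corresponding predicate.</p>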
      <p>Performance Evaluation. We report the performance evaluation results of
RichRDF to provide insights into its speed. Fig. 2 shows the best-case and worst-case
time taken by RichRDF to process an input file containing 1,000 triples. This file
was processed in four different rounds, with a different feature added in each round.
Table 1 shows the time taken for different numbers of triples for various datasets.</p>
      <p>Note that the execution time depends on the triples in the input
RDF dataset. The richer the dataset, i.e., the more commonly used entities it contains
(food types, fruits, brand names, object names, etc.), the more time it takes to process
all the relationships and semantic scores and to obtain the relevant images. In the future, we
plan to leverage concurrency and parallelism to speed up RichRDF.</p>
      <p>To conclude, RichRDF is a new tool for modeling FEW datasets using Semantic
Web technologies to enable easy consumption and analysis of FEW information for
intelligent decision making. In the future, we would like to automatically publish the
RDF data produced by RichRDF as Linked Data. The source code of RichRDF is
available at https://github.com/UMKC-BigDataLab/RichRDF.</p>
      <p>Acknowledgments: The first author (M. G.) would like to acknowledge the support of a UMKC
SGS Travel Grant.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Katib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Barron</surname>
          </string-name>
          .:
          <article-title>A knowledge ecosystem for the food, energy, and water system</article-title>
          .
          <source>In KDD 2016 Workshop on Data Science for Food, Energy and Water</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          , San Francisco,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          .:
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          .
          <source>In the Semantic Web Lecture Notes in Computer Science</source>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          . Springer, Berlin,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          .:
          <article-title>Exploiting semantics for big data integration</article-title>
          .
          <source>In AI Magazine</source>
          , Vol.
          <volume>36</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McClosky</surname>
          </string-name>
          .:
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          .
          <source>In Proc. of 52nd annual meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Speer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Havasi</surname>
          </string-name>
          .:
          <article-title>ConceptNet 5: A large semantic network for relational knowledge</article-title>
          .
          <source>The People's Web Meets NLP</source>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>176</lpage>
          . Springer-Verlag Berlin,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          .:
          <article-title>Semantic labeling: A domain-independent approach</article-title>
          .
          <source>In International Semantic Web Conference (ISWC)</source>
          , pp.
          <fpage>446</fpage>
          -
          <lpage>462</lpage>
          , Kobe,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .:
          <article-title>Combining WordNet and ConceptNet for automatic query expansion: A learning approach</article-title>
          .
          <source>Information Retrieval Technology</source>
          , vol.
          <volume>4993</volume>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>224</lpage>
          , Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>O.</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .:
          <article-title>ImageNet large scale visual recognition challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          , Vol.
          <volume>115</volume>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          , Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>