<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Querying Large Linked Data Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zareen Syed</string-name>
          <email>zsyed@umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lushan Han</string-name>
          <email>lushan1@umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Rahman</string-name>
          <email>mrahman1@umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <email>finin@umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Kukla</string-name>
          <email>jkukla@redshred.net</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeehye Yun</string-name>
          <email>jyun@redshred.net</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Maryland, Baltimore County</institution>
          ,
          <addr-line>1000 Hilltop Circle, Baltimore, MD 21250</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Exploring large, complex linked data resources is challenging because it requires not only mastering SPARQL syntax and semantics but also understanding the RDF data model and large ontology vocabularies comprising thousands of classes, hundreds of properties and millions of URIs for instances of interest. Natural language question answering systems address the problem, but they are still a subject of research. We describe a compromise in which non-experts specify a graphical query 'skeleton' and annotate it with freely chosen words, phrases and entity names. Our system automatically generates a SPARQL query based on the input query skeleton.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Storage and Retrieval</kwd>
        <kwd>User Interfaces</kwd>
        <kwd>Semantic Web</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We describe a new schema-free query (SFQ) interface in which the user explicitly
specifies the relational structure of the query as a graphical “skeleton” and annotates it
with freely chosen words, phrases and entity names. Our framework makes three
main contributions. First, it uses robust methods that combine statistical association and
semantic similarity to map user terms to the most appropriate classes and properties
used in the underlying ontology. Second, it uses a novel type inference approach
based on concept linking to predict classes for subjects and objects in the query.
Third, it implements a general property mapping algorithm based on concept linking
and semantic text similarity. We briefly describe an evaluation in the
Schema-agnostic Queries over Large-schema Databases challenge [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and directions for
future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        We need to compute semantic similarity between user-entered query terms and terms
in the target ontology. Our approach [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ] for computing semantic similarity
combines part-of-speech tagging, LSA word similarity and WordNet knowledge
with custom term alignment algorithms. Our system was ranked as the top-performing
system in the 2013 and 2014 SemEval Conference challenge tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Type Inference</title>
        <p>
          Our main SFQ system [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] requires users to provide types or classes for subjects and
objects in the query triples; however, this information is not available in many
challenge queries. For each input query, we therefore infer concept types using a concept linking
approach based on Wikitology [
          <xref ref-type="bibr" rid="ref6 ref8">6,8</xref>
          ] and Wikipedia Miner [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. After linking the subject
and object to concepts in Wikipedia [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we retrieve the associated DBpedia ontology
classes to represent concept types.
        </p>
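        <p>As a rough illustration of this type inference step, the sketch below links a query term to a Wikipedia concept and returns the DBpedia ontology classes attached to it. The two lookup tables and the function name infer_types are hypothetical stand-ins for the Wikitology and Wikipedia Miner services, which are not reproduced here, and the sample entries are toy data chosen only for illustration.</p>
        <preformat><![CDATA[
```python
# Toy stand-ins for the concept linking services (assumptions, not real APIs).
# Maps a normalized surface form to a Wikipedia concept title.
CONCEPT_LINKS = {
    "tom sawyer": "The_Adventures_of_Tom_Sawyer",
    "mark twain": "Mark_Twain",
}

# Maps a linked concept to DBpedia ontology classes (illustrative toy data).
DBPEDIA_TYPES = {
    "The_Adventures_of_Tom_Sawyer": ["dbo:Book", "dbo:WrittenWork", "dbo:Work"],
    "Mark_Twain": ["dbo:Writer", "dbo:Person"],
}


def infer_types(term):
    """Return (linked concept, ontology classes) for a query term.

    Returns (None, []) when the term cannot be linked to any concept.
    """
    concept = CONCEPT_LINKS.get(term.strip().lower())
    if concept is None:
        return None, []
    return concept, DBPEDIA_TYPES.get(concept, [])


concept, classes = infer_types("Tom Sawyer")
print(concept, classes)
```
]]></preformat>
        <p>In a real setting the dictionary lookups would be replaced by calls to a concept linker and by a query for the classes of the linked DBpedia resource; a term that cannot be linked simply yields no inferred type.</p>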
      </sec>
      <sec id="sec-2-2">
        <title>Concept level Association Knowledge Model (CAK Model)</title>
        <p>
          We employ a computational semantic similarity measure to locate
candidate ontology terms for user input terms. Semantic similarity measures give
our system broader linguistic coverage than that offered by synonym
expansion. Mapping terms also requires knowing which concepts plausibly relate to one
another: we know that birds can fly but trees cannot, and that a database table is not a kitchen
table. Such knowledge is essential for human language understanding. We refer to
this as Concept-level Association Knowledge (CAK). Domain and range definitions
for properties in ontologies, argument constraint definitions of predicates in logic
systems and schemata in databases all belong to this knowledge. Manually defining
this knowledge is tedious; we therefore learn Concept-level Association Knowledge
statistically from instance data (the “ABOX” of RDF triples) and compute the degree of
association between terms in the ontology based on co-occurrences. We count
co-occurrences between schema terms indirectly from co-occurrences between entities,
because entities are associated with types. We then apply a statistical measure,
Pointwise Mutual Information (PMI), to compute the degree of association between
classes and properties and between two classes. The detailed approach is available in
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We employ the learned CAK and semantic similarity measures to map a
user query to a corresponding SPARQL query, as discussed in the next sections.
        </p>
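        <p>The following minimal sketch illustrates how such associations could be estimated, under simplifying assumptions. It counts co-occurrences between subject classes and properties indirectly through the types of the subject entities and applies the standard PMI formula, log(P(c, p) / (P(c) P(p))), with probabilities estimated as fractions of triples. The function name, the data layout and the toy triples are hypothetical; the full model described in [<xref ref-type="bibr" rid="ref2">2</xref>] also covers property-to-object-class and class-to-class associations, which are omitted here.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def subject_class_property_pmi(triples, types):
    """PMI between subject classes and properties, estimated over triples.

    triples: iterable of (subject, property, object) instance-level facts.
    types:   dict mapping an entity to the set of its ontology classes.
    """
    n = 0
    class_count = Counter()   # triples whose subject has class c
    prop_count = Counter()    # triples that use property p
    joint_count = Counter()   # triples with subject class c and property p
    for s, p, _o in triples:
        n += 1
        prop_count[p] += 1
        for c in types.get(s, ()):
            class_count[c] += 1
            joint_count[(c, p)] += 1
    pmi = {}
    for (c, p), joint in joint_count.items():
        # PMI = log(P(c, p) / (P(c) * P(p))), each probability being count / n.
        pmi[(c, p)] = math.log(joint * n / (class_count[c] * prop_count[p]))
    return pmi


# Toy instance data (illustrative only).
triples = [
    ("Tom_Sawyer", "dbo:author", "Mark_Twain"),
    ("Oliver_Twist", "dbo:author", "Charles_Dickens"),
    ("Mark_Twain", "dbo:birthPlace", "Florida_Missouri"),
    ("Charles_Dickens", "dbo:birthPlace", "Portsmouth"),
]
types = {
    "Tom_Sawyer": {"dbo:Book"},
    "Oliver_Twist": {"dbo:Book"},
    "Mark_Twain": {"dbo:Writer"},
    "Charles_Dickens": {"dbo:Writer"},
}
pmi = subject_class_property_pmi(triples, types)
print(pmi[("dbo:Book", "dbo:author")])
```
]]></preformat>
        <p>Pairs that never co-occur, such as a book class with a birthplace property in the toy data, receive no entry; a caller can treat them as having no positive association.</p>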
      </sec>
      <sec id="sec-2-3">
        <title>Query Interpretation</title>
        <p>For each SFQ concept or relation, we generate a list of the k most semantically
similar candidate ontology classes or properties. In the example in Figure 1, candidate lists
are generated for the five user terms in the SFQ, which asks “Which author wrote the
book Tom Sawyer and where was he born?”. Candidate terms are ranked by their
similarity scores, which are displayed to the right of the terms. Each combination of
ontology terms, with one term coming from each candidate list, is a potential query
interpretation, but only some of them are reasonable. We use a linear combination of
three pairwise associations to rank interpretations: (i) the directed
association from subject class to property, (ii) the directed association from property to
object class and (iii) the undirected association between subject class and object class,
all weighted by the semantic similarities between ontology terms and their corresponding
user terms. After user terms are disambiguated and mapped to appropriate ontology
terms, translating an SFQ to SPARQL is straightforward. Classes are used to type the
instances, and properties are used to connect them. Our system generates a ranked list of
SPARQL queries.
        </p>
        <p>Since our original SFQ system relies on the CAK model, which is based on DBpedia
ontology classes and properties and does not take instance references into account, we
created an independent parallel system (System II) to support instance references in SPARQL
queries. This system is based on concept linking and semantic similarity. For each
concept mentioned in the query, we try to link it to DBpedia using Wikitology and
Wikipedia Miner and update the reference to the linked concept in DBpedia. To map
properties, we retrieve all DBpedia properties associated with the linked concepts,
compute their semantic similarity with the property entered by the user and select the
property with the highest similarity.
        </p>
        <p>For evaluation, we combined the output of both systems, i.e., the SFQ System and System
II, where System II addresses the queries that use instances for type constraints and
the SFQ System addresses the queries that use ontology classes for type constraints. Our
combined system received the schema-agnostic query challenge award in the Schema
Agnostic Queries SAQ-2015 challenge competition. The evaluation dataset for the
task had 103 queries in total. Table 1 presents the evaluation results for the two systems
independently and in combination. We analyzed the incorrect queries and found
several sources of error, such as errors in type inference, errors in concept linking and errors
due to generating fewer or more triples than the gold standard query.
The challenge queries were based on DBpedia 2014, whereas our CAK model was
trained on DBpedia 3.6; we believe that training the SFQ System on the newer
DBpedia version may have improved the performance of the system. We also observed a
number of cases in the challenge queries that referenced instances for type constraints
instead of ontology classes. The queries generated by our SFQ system only reference
ontology classes for type constraints. System II addresses this issue by resolving links
to instances. However, it cannot deal with cases where the number of relations or
triples differs between the user input query and the correct translated SPARQL
query. In the future, we plan to improve our approach by developing a unified system
that incorporates the strengths of both systems.
        </p>
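        <p>To make the interpretation ranking and SPARQL translation steps described earlier in this section more concrete, the following is a minimal sketch for a single subject, property and object triple. The exact scoring form (each association multiplied by the similarities of the two terms involved), the equal weights, the function names and all scores are assumptions made for illustration; they are not taken from the system itself.</p>
        <preformat><![CDATA[
```python
from itertools import product


def rank_interpretations(subj_cands, prop_cands, obj_cands,
                         assoc_sp, assoc_po, assoc_cc,
                         weights=(1.0, 1.0, 1.0)):
    """Rank (subject class, property, object class) combinations.

    Each *_cands argument is a list of (ontology term, similarity) pairs.
    assoc_sp and assoc_po hold directed association scores keyed by
    (class, property) and (property, class); assoc_cc holds undirected
    class-to-class scores keyed by a frozenset. Missing pairs count as 0.
    """
    w_sp, w_po, w_cc = weights
    ranked = []
    for (cs, sim_s), (p, sim_p), (co, sim_o) in product(
            subj_cands, prop_cands, obj_cands):
        score = (
            w_sp * assoc_sp.get((cs, p), 0.0) * sim_s * sim_p
            + w_po * assoc_po.get((p, co), 0.0) * sim_p * sim_o
            + w_cc * assoc_cc.get(frozenset((cs, co)), 0.0) * sim_s * sim_o
        )
        ranked.append((score, cs, p, co))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def to_sparql(subject_class, prop, object_class):
    """Type both variables with classes and connect them with the property."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?s ?o WHERE {\n"
        f"  ?s a {subject_class} .\n"
        f"  ?o a {object_class} .\n"
        f"  ?s {prop} ?o .\n"
        "}"
    )


# Toy candidate lists with similarity scores (illustrative only).
subj_cands = [("dbo:Book", 1.0), ("dbo:Library", 0.4)]
prop_cands = [("dbo:author", 0.9), ("dbo:publisher", 0.3)]
obj_cands = [("dbo:Writer", 0.9), ("dbo:Person", 0.6)]

# Toy association scores standing in for learned PMI values.
assoc_sp = {("dbo:Book", "dbo:author"): 2.0, ("dbo:Book", "dbo:publisher"): 1.5}
assoc_po = {("dbo:author", "dbo:Writer"): 2.0, ("dbo:author", "dbo:Person"): 1.0}
assoc_cc = {
    frozenset(("dbo:Book", "dbo:Writer")): 1.5,
    frozenset(("dbo:Book", "dbo:Person")): 0.5,
}

best = rank_interpretations(subj_cands, prop_cands, obj_cands,
                            assoc_sp, assoc_po, assoc_cc)[0]
query = to_sparql(*best[1:])
print(query)
```
]]></preformat>
        <p>With these toy scores the top-ranked combination is dbo:Book, dbo:author and dbo:Writer. Enumerating every combination is exponential in the number of query terms, so a practical implementation would need to prune the candidate lists or search more selectively.</p>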
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>The schema-free structured query approach allows people to query the DBpedia
dataset without mastering SPARQL or acquiring detailed knowledge of the classes,
properties and individuals in the underlying ontologies and the URIs that denote them.
Our system uses statistical data about lexical semantics and RDF datasets to generate
plausible SPARQL queries that are semantically close to schema-free queries. The
key contributions of our approach are the robust methods that combine statistical
association and semantic similarity to map user terms to the most appropriate classes
and properties used in the underlying ontology and type inference for user input
concepts based on concept linking.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>*SEM 2013 shared task: Semantic textual similarity, including a pilot on typed-similarity</article-title>
          ,
          <source>2nd Joint Conf. on Lexical and Computational Semantics</source>
          , Association for Computational Linguistics.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Schema-free Structured Querying of DBpedia Data</article-title>
          ,
          <source>In Proc. 21st ACM Int. Conf. on Information and Knowledge Management</source>
          , pp.
          <fpage>2090</fpage>
          -
          <lpage>2093</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yesha</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , IEEE Computer Society, v25n6, pp.
          <fpage>1307</fpage>
          -
          <lpage>1322</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Schema Free Querying of Semantic Data</article-title>
          ,
          <source>Ph.D. Dissertation</source>
          , Univ. of Maryland, Baltimore County, Aug.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
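          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :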
          <article-title>Querying RDF Data with Text Annotated Graphs</article-title>
          ,
          <source>27th Int. Conf. on Scientific and Statistical Database Management</source>
          ,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kashyap</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Satyapanich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Meerkat Mafia: Multilingual and Cross-Level Semantic Textual Similarity systems</article-title>
          ,
          <source>Proc. 8th Int. Workshop on Semantic Evaluation</source>
          ,
          <year>August 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I. H.</given-names>
          </string-name>
          :
          <article-title>Learning to Link with Wikipedia</article-title>
          .
          <source>In Proceedings of the 17th ACM conference on Information and knowledge management</source>
          , pp.
          <fpage>509</fpage>
          -
          <lpage>518</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Syed</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Creating and Exploiting a Hybrid Knowledge Base for Linked Data</article-title>
          ,
          <source>in Agents and Artificial Intelligence, Revised Selected Papers Series: Communications in Computer and Information Science</source>
          , v129, Springer, April
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. https://sites.google.com/site/eswcsaq2015/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>