<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Multilingual SPARQL-Based Retrieval Interface for Cultural Heritage Objects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariana Damova</string-name>
          <email>mariana.damova@mozajka.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dana Dannells</string-name>
          <email>dana.dannells@svenska.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ramona Enache</string-name>
          <email>ramona.enache@cse.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>An Interface for Multilingual Queries</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Engineering, University of Gothenburg</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Mozaika</institution>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Språkbanken, University of Gothenburg</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present a multilingual SPARQL-based [1] retrieval interface for querying cultural heritage data in natural language (NL). The system offers an elegant grammar-based approach built on Grammatical Framework (GF) [2], a grammar formalism supporting multilingual applications. Using GF, we are able to present a cross-language SPARQL grammar covering 15 languages and a cross-language retrieval interface that uses this grammar for interacting with the Semantic Web. To our knowledge, this is the first implementation of SPARQL generation and parsing via GF that is published as a knowledge representation infrastructure-based prototype. Querying the Semantic Web in natural language, more specifically using English to formulate SPARQL queries with the help of controlled natural language (CNL) syntax, has been developed before [3,4]. Such approaches, based on verbalization methods, are adequate for English, but in a multilingual setting, where major challenges such as lexical and structural gaps become prominent [5], grammar-based approaches are preferable. The work presented here complements the method proposed by Lopez et al. [6] in that it faces the challenges of realizing NL in real-world systems, not only in English but also in multiple languages. Our system follows the approach of the Museum Reason-able View (MRV) of Linked Open Data (LOD) [7]. It provides unified access to cultural heritage sources, including LOD from DBpedia.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The query grammar for this data covers the nine central classes (title,
painter, type, colour, size, year, material, museum, place) and the major
properties describing the relationships between them: hasCreationDate,
fromTimePeriodValue, toTimePeriodValue, hasMaterial, hasTitle, hasDimension,
hasCurrentLocation, hasColour. The set of SPARQL queries we cover includes the
five familiar WH questions: who, where, when, how, what. Table 1 shows some NL
queries and their mappings to query variables in SPARQL.</p>
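      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Queries and query variables</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>NL Query</th>
              <th>SPARQL</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Where is Mona Lisa located?</td>
              <td>:hasCurrentLocation ?location</td>
            </tr>
            <tr>
              <td>What are the colours of Mona Lisa?</td>
              <td>:hasColour ?colour</td>
            </tr>
            <tr>
              <td>Who painted Mona Lisa?</td>
              <td>:createdBy ?painter</td>
            </tr>
            <tr>
              <td>When was Mona Lisa painted?</td>
              <td>:hasCreationDate ?crdat</td>
            </tr>
            <tr>
              <td>How many paintings were painted by Leonardo da Vinci?</td>
              <td>(count(distinct ?painting) as ?count)</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>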
      <p>The NL-to-SPARQL mapping is implemented as a transformation table,
which could be extended to cover a wider range of syntactic question variations.</p>
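      <p>A transformation table of this kind can be sketched as follows. This is
only a minimal illustration under stated assumptions, not the implementation used
in the system: the regular expressions, the function name to_sparql, and the
shape of the generated triple pattern (a ?painting node linked to its title via
:hasTitle) are hypothetical, and the prefix declaration is omitted. Only the
property names and query variables are taken from Table 1.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical transformation table: each row pairs a question pattern with
# the SPARQL property and query variable listed in Table 1.
TRANSFORMATION_TABLE = [
    (re.compile(r"^Where is (?P<title>.+) located\?$"), ":hasCurrentLocation", "?location"),
    (re.compile(r"^What are the colours of (?P<title>.+)\?$"), ":hasColour", "?colour"),
    (re.compile(r"^Who painted (?P<title>.+)\?$"), ":createdBy", "?painter"),
    (re.compile(r"^When was (?P<title>.+) painted\?$"), ":hasCreationDate", "?crdat"),
]


def to_sparql(question):
    """Return a SPARQL SELECT string for a supported question, else None."""
    for pattern, prop, var in TRANSFORMATION_TABLE:
        match = pattern.match(question)
        if match:
            title = match.group("title")
            # Assumed triple shape: the painting is identified by its title.
            return (
                f"SELECT {var} WHERE {{ "
                f'?painting :hasTitle "{title}" ; {prop} {var} . }}'
            )
    return None


if __name__ == "__main__":
    print(to_sparql("Who painted Mona Lisa?"))
```
]]></preformat>
      <p>Extending the coverage in this sketch amounts to adding rows to the
table, which mirrors the extensibility noted above.</p>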
      <p>
        The grammar has a modular structure with three main components: (1)
lexicon modules covering ontology classes and properties; (2) a data module
covering ontology instances; and (3) a query module covering NL questions and
SPARQL query patterns. It supports NL queries in 15 languages: Bulgarian,
Finnish, Norwegian, Catalan, French, Romanian, Danish, Hebrew, Russian, Dutch,
Italian, Spanish, English, German and Swedish. The system relies on GF
grammars, treating SPARQL as yet another language. In the same manner as NL
generation, SPARQL patterns are encoded as grammar rules. Because of this
compact representation within the same grammar, we can achieve parallel
translations between any pair of the 15 languages and SPARQL.
      </p>
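      <p>The idea of treating SPARQL as one more concrete language over a shared
abstract representation can be illustrated with the sketch below. It is written
in Python rather than GF and only mimics the division into abstract syntax and
concrete linearizations. The class WherePainting, the language keys, the German
template and the SPARQL triple shape are all hypothetical and are not taken from
the actual grammar.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WherePainting:
    """Abstract syntax node: ask for the current location of a painting."""
    title: str


# One linearization per concrete "language"; SPARQL is treated like the others.
# The templates are rough illustrations, not rules from the real grammar.
LINEARIZERS = {
    "Eng": lambda q: f"Where is {q.title} located?",
    "Ger": lambda q: f"Wo befindet sich {q.title}?",
    "Sparql": lambda q: (
        f'SELECT ?location WHERE {{ ?painting :hasTitle "{q.title}" ; '
        f":hasCurrentLocation ?location . }}"
    ),
}


def linearize(tree, language):
    """Render an abstract tree in the requested concrete language."""
    return LINEARIZERS[language](tree)


def parse_english(sentence):
    """Parse the single English pattern supported by this sketch."""
    prefix, suffix = "Where is ", " located?"
    if sentence.startswith(prefix) and sentence.endswith(suffix):
        return WherePainting(sentence[len(prefix):-len(suffix)])
    raise ValueError("unsupported question")


def translate(sentence, target):
    """Translate by parsing to the abstract tree and linearizing it again."""
    return linearize(parse_english(sentence), target)


if __name__ == "__main__":
    for lang in LINEARIZERS:
        print(lang, "->", translate("Where is Mona Lisa located?", lang))
```
]]></preformat>
      <p>Because every translation passes through the same abstract tree, adding
a language in this sketch means adding one linearization, not one translator per
language pair.</p>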
      <p>
        The grammar-based interface provides a mechanism to formulate a query in
any of the 15 languages, translate it to SPARQL and view the answers in any of
those languages. The answers can be displayed as natural language descriptions
or as triples. The latter can then be navigated as linked data. Browsing of
the triples can continue indefinitely: clicking on one of the triples listed
in the answers launches a new SPARQL query, and the results are generated
as natural language text via the same grammar-based interface or as triples.
      </p>
      <p>
        Following previous question answering over linked data (QALD) evaluation
challenges [5], we divided the evaluation into three parts, each focusing on a
specific aspect: (1) user satisfaction, i.e. how many queries were answered;
(2) correctness; and (3) coverage, i.e. how the system scales up.
      </p>
      <p>For the first two parts of the evaluation, we considered a number of random
queries in 7 languages and counted the number of corrections that one or two
native informants would make to the original queries. The results of the
evaluation showed that the number of suggested corrections is relatively low for
the majority of the evaluated languages. The overall correctness of the
generated queries seems to be representative and acceptable, at least among the
users who participated in the evaluation.</p>
      <p>Regarding coverage, the grammar allows most of the question patterns to
be paraphrased, yielding on average 3 paraphrases per construction in the
English grammar. The number of alternatives varies across languages, with the
per-language average ranging between 2 and 3 paraphrases per construction. In
addition, the 112 basic query patterns from the query grammar can be combined
with logical operators to obtain more complex queries, for a total of 1159 query
patterns covered by the grammar, including WH and Yes/No questions. The
additions needed for the query grammar to scale up are small, given that the
other resources are in place. Building the query grammar for a given language
requires no more than 150 lines of code, and this process can be done
semi-automatically.</p>
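      <p>How basic query patterns might be combined with logical operators on the
SPARQL side can be sketched as below. This is an assumption-laden illustration:
the helper names basic, conj, disj and select are hypothetical, conjunction is
modelled by placing both patterns in one group graph pattern and disjunction by
SPARQL UNION, and the sketch does not attempt to reproduce the pattern counts
reported above.</p>
      <preformat><![CDATA[
```python
def basic(prop, var):
    """A basic pattern: one property of the painting bound to a variable."""
    return f"?painting {prop} {var} ."


def conj(left, right):
    """AND: both patterns must match within the same group graph pattern."""
    return f"{left} {right}"


def disj(left, right):
    """OR: either pattern may match, expressed with SPARQL UNION."""
    return f"{{ {left} }} UNION {{ {right} }}"


def select(variables, pattern):
    """Wrap a graph pattern in a SELECT query over the given variables."""
    return f"SELECT {' '.join(variables)} WHERE {{ {pattern} }}"


if __name__ == "__main__":
    colour = basic(":hasColour", "?colour")
    material = basic(":hasMaterial", "?material")
    print(select(["?colour", "?material"], conj(colour, material)))
    print(select(["?colour", "?material"], disj(colour, material)))
```
]]></preformat>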
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>We introduce a novel approach to multilingual interaction with Semantic
Web content via GF grammars. The method has been successfully demonstrated
for the cultural heritage domain and could subsequently be implemented for
other domains or scaled up in terms of languages or content coverage. The main
contribution with respect to current state-of-the-art approaches is SPARQL
support and question answering in 15 languages.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgment</title>
      <p>This work was supported by the MOLTO project of the European Union Seventh
Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914. The
authors would like to acknowledge the Centre for Language Technology in Gothenburg.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          (March
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ranta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Grammatical Framework: Programming with Multilingual Grammars. CSLI Studies in Computational Linguistics</article-title>
          .
          <publisher-name>CSLI</publisher-name>
          ,
          <publisher-loc>Stanford</publisher-loc>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ferré</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SQUALL: A controlled natural language for querying and updating RDF graphs</article-title>
          . In: CNL. (
          <year>2012</year>
          )
          <fpage>11</fpage>
          –
          <lpage>25</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bühmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Sorry, I don't speak SPARQL: translating SPARQL queries into natural language</article-title>
          .
          <source>In: Proceedings of WWW</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Bar, D.:
          <article-title>Evaluation of a Layered Approach to Question Answering over Linked Data</article-title>
          .
          <source>In: International Semantic Web Conference (2)</source>
          . (
          <year>2012</year>
          )
          <fpage>362</fpage>
          –
          <lpage>374</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stieler</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Poweraqua: Supporting users in querying and exploring the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ) (
          <year>2012</year>
          )
          <fpage>249</fpage>
          –
          <lpage>265</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Damova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dannells</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Reason-able View of Linked Data for cultural heritage</article-title>
          .
          <source>In: Proceedings of the third International Conference on Software, Services and Semantic Technologies (S3T)</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>