<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Bari, Italy.
diego.marcia@unica.it (D. Marcia); manuela.sanguinetti@unica.it (M. Sanguinetti); atzori@unica.it (M. Atzori).
https://msang.github.io/ (M. Sanguinetti); http://atzori.webofcode.org/ (M. Atzori)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>User-friendly Query Interfaces for the HOPE Project</article-title>
        <subtitle>Discussion Paper</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Marcia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Sanguinetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Atzori</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Matematica e Informatica (DMI), Università di Cagliari</institution>
          ,
          <addr-line>Via Ospedale 72, 09124, Cagliari (CA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>This paper presents the current state of an ongoing research project devoted to the study and development of a user-friendly query interface for accessing open data. More specifically, the interface has been conceived so as to enable simplified queries over heterogeneous data from different sources, such as public governmental sources or private structured repositories. This has been done by expanding the functionalities of an existing interface for querying knowledge bases. The aim of this paper is to show the main interface features, highlighting the research questions they respond to and the open issues we intend to explore as the next steps of the project.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>linked data</kwd>
        <kwd>by-example</kwd>
        <kwd>learning to rank</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, we have seen a steady growth in the structured data made available to
the public, with governments contributing to this trend by publishing their data as
so-called public open data.</p>
      <p>The work introduced in this paper forms part of a wider research project called High-quality
Open data Publishing and Enrichment (HOPE), whose main goal is the development of a
web-based open data management system addressed to public and private organizations and
lay users. The former can use the system to release semantically-annotated and high-quality
open data, while the latter can access such data in a user-friendly fashion.</p>
      <p>Despite the increased availability of open data, various aspects still hinder access to it by the
general public, such as the need for several pre-processing steps before data can be of any use, or
the limited availability of querying tools once such data is properly encoded. Formal query languages,
with SPARQL being the most representative example, are typically used to access and query the
linked open data repositories available nowadays. However, even users with advanced skills
will experience difficulties querying new data through standard SPARQL endpoints for at least
two reasons: (i) the schema behind the data may not be known in advance, and (ii) values may be stored
with a syntax that is not obvious in advance. These two issues, together with the complexity
that the syntax of query languages like SPARQL poses for lay users, severely limit the exploitation of the available
open data.</p>
      <p>
        Several attempts have been made over the years to automatically generate structured queries
from natural language questions [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ] (or conversely, to translate structured queries into
natural language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>In this project, we address these issues with different approaches that may fit different
applications and kinds of users. One of these in particular is based on By-Example Structured
Queries (BEStQ), a user-friendly visual interface in which some example data from the
dataset is shown to the users before they start their query.</p>
      <p>The present paper describes the current state of development of the BEStQ interface, together
with some still open issues regarding data presentation and input interpretation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. By-Example Structured Queries</title>
      <p>
        As described above, BEStQ is a visual approach to query databases and in particular knowledge
graphs, in a user-friendly fashion [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. BEStQ, in turn, builds upon and generalizes a previous
tool for Wikipedia called SWiPE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], introduced as a way to query Wikipedia in a structured
way [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], therefore relying on DBpedia (its structured counterpart at that time) without the
need to write complex SPARQL queries, and then extended to support temporal queries [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
mobile screens [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>As an example of interaction in SWiPE, a user looking for “politicians born in Palermo”
would first find an example Wikipedia page for a public figure, then enter the constraints in the
proper fields: “Palermo” in the infobox row Birth place and “politician” in the row Occupations.
Then, by clicking a button, they would obtain the results.</p>
      <p>Although this Query-By-Example-like approach from SWiPE is user friendly and simple to
learn, a number of problems had to be addressed in the transition to BEStQ, which is not limited
to Wikipedia and aims at adapting the approach to generic SPARQL endpoints. One important
limitation was the need for an infobox-like synoptic table to be used as an input form in which the user
inserts their constraints. SWiPE used infoboxes from Wikipedia, while in the development
of BEStQ we have studied a general way to generate such a form, and created a prototype that
generates it automatically for any endpoint by querying it for its attributes.</p>
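      <p>A minimal sketch of how such a form could be populated, assuming (the paper does not say so) that the attributes of the example resource are listed with a single SPARQL query. The function name attribute_discovery_query, the SAMPLE-based preview value and the limit parameter are illustrative choices, not the prototype's actual code.</p>
      <preformat><![CDATA[
```python
def attribute_discovery_query(resource_iri: str, limit: int = 200) -> str:
    """Build a SPARQL query that lists the properties set on one resource."""
    # Reject characters that may not appear inside a SPARQL IRI reference.
    if any(ch in '<>"{}|^`\\ ' for ch in resource_iri):
        raise ValueError("not a plain IRI: " + repr(resource_iri))
    return (
        "SELECT ?property (SAMPLE(?value) AS ?example)\n"
        "WHERE { <" + resource_iri + "> ?property ?value }\n"
        "GROUP BY ?property\n"
        "ORDER BY ?property\n"
        "LIMIT " + str(int(limit))
    )


# Hypothetical usage: one form row per returned ?property.
query = attribute_discovery_query("http://dbpedia.org/resource/Sergio_Mattarella")
print(query)
```
]]></preformat>
      <p>Each row of the result would then become one field of the input form, with the sampled value shown as a hint of the syntax used by the data source.</p>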
      <p>The high-level use flow is similar to the one in SWiPE. After specifying the desired SPARQL
endpoint, the user first searches by name for a starting resource they deem representative of
the desired result set. In this phase, they can specify two options: the language they
would prefer the results in, and a specific type to retrieve the entity from. English is used
as the fallback language; if the user does not specify a type, the search is based on name only.
The type filter is meant to speed up the search for the example entity, and it can be dropped
in the subsequent phase. Once the example instance is retrieved, the user can apply the desired constraints on its
properties. The allowed constraints depend on the range of values declared by the knowledge
graph for the specific properties: the current prototype allows equality checks over properties
of all types, and substring matching on string literals only. For other types, it also provides
regular expression matching as a fallback, but we plan on supporting more types.
Users can also apply a presence constraint on properties, without specifying the desired values.
Finally, if a specific type was selected while looking up the example resource, the
user can choose either to keep this constraint, limiting the final search to the same explicit type as
the example resource, or to remove it.</p>
      <p>Fig. 1 shows a screenshot from the current prototype for this step in the search process.</p>
      <p>A SPARQL query is then performed on the endpoint according to these user-generated
constraints, and the results are presented to the user with a preview comprising all
properties on which a constraint has been specified.</p>
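      <p>The translation from form constraints to a query can be sketched as follows. This is an illustration under stated assumptions, not the prototype's implementation: the Constraint class, the kind names ("equals", "contains", "regex", "exists") and the build_query function are hypothetical, and prefixed names such as dbo:birthPlace are assumed to be predefined by the endpoint, as is the case for DBpedia.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constraint:
    prop: str        # property, written as a prefixed name
    kind: str        # "equals", "contains", "regex" or "exists"
    value: str = ""  # SPARQL term for "equals"; plain text otherwise


def _quote(text: str) -> str:
    """Serialize plain text as a SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def build_query(constraints: List[Constraint], rdf_type: Optional[str] = None) -> str:
    """Turn the user's constraints into a SELECT query over ?s."""
    lines = []
    if rdf_type:  # optional explicit type kept from the example resource
        lines.append("?s a " + rdf_type + " .")
    for i, c in enumerate(constraints):
        var = "?v" + str(i)
        if c.kind == "equals":
            lines.append("?s " + c.prop + " " + c.value + " .")
            continue
        # All remaining kinds bind the value; "exists" needs nothing more.
        lines.append("?s " + c.prop + " " + var + " .")
        if c.kind == "contains":
            lines.append("FILTER(CONTAINS(STR(" + var + "), " + _quote(c.value) + "))")
        elif c.kind == "regex":
            lines.append("FILTER(REGEX(STR(" + var + "), " + _quote(c.value) + "))")
        elif c.kind != "exists":
            raise ValueError("unknown constraint kind: " + c.kind)
    if not lines:
        raise ValueError("at least one constraint or a type is required")
    return "SELECT DISTINCT ?s WHERE {\n  " + "\n  ".join(lines) + "\n}"


query = build_query([
    Constraint("dbo:birthPlace", "equals", "dbr:Palermo"),
    Constraint("dbp:party", "exists"),
])
print(query)
```
]]></preformat>
      <p>Leaving rdf_type unset corresponds to dropping the type filter, so that resources are matched on their properties alone.</p>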
      <p>Although this implementation effort is an important step towards applying BEStQ searches to any
structured source (not only DBpedia), further research problems must be addressed to make
this interface truly effective.</p>
      <p>Finding a good example to start with. Our current research agenda focuses on the
design of a hybrid approach in which the user would first write their search goal as in a question
answering system, and the system would then find a good example autonomously. This requires
understanding the output type of the user question and matching it with those available in the
data source (e.g., dbo:Politician in DBpedia). Once the type is clear, a good instance
must still be selected. One simple approach would be to pick the politician with the largest number
of attributes, so that the user has the highest chance of finding the fields they need to insert
their constraints on.</p>
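      <p>The "largest number of attributes" heuristic can be sketched in a few lines. The function name richest_example and the attribute sets below are toy assumptions made for illustration; they do not reflect actual DBpedia content.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Set


def richest_example(entities: Dict[str, Set[str]]) -> str:
    """Return the entity with the most distinct attributes.

    Ties are broken by entity name so that the choice is deterministic.
    """
    if not entities:
        raise ValueError("no candidate entities")
    return min(entities, key=lambda name: (-len(entities[name]), name))


# Toy candidates of the requested type (illustrative attribute sets).
candidates = {
    "dbr:Sergio_Mattarella": {"dbo:birthPlace", "dbp:party", "dbo:almaMater"},
    "dbr:Alfonso_Bonafede": {"dbo:birthPlace", "dbp:party"},
}
print(richest_example(candidates))
```
]]></preformat>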
      <p>Another way may be to find the best fit for the natural language query given by the user. For
instance, if they are looking for the name “Alfonso”, then the politician Alfonso Bonafede may
be a better example than Sergio Mattarella, even though Bonafede’s infobox may have fewer attributes
than Mattarella’s infobox.</p>
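      <p>One possible reading of this best-fit idea, given purely as a sketch: score each candidate by the number of question words that occur in its label, and fall back on the attribute count to break ties. The scoring rule, the function names and the attribute counts are assumptions for illustration, not the approach adopted in the project.</p>
      <preformat><![CDATA[
```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def best_fit_example(question: str, candidates: List[Tuple[str, int]]) -> str:
    """Pick the label sharing most words with the question.

    Each candidate is (label, number_of_attributes); the attribute
    count only matters when the word overlap is the same.
    """
    question_words = tokens(question)

    def score(candidate: Tuple[str, int]) -> Tuple[int, int]:
        label, n_attributes = candidate
        return (len(question_words.intersection(tokens(label))), n_attributes)

    return max(candidates, key=score)[0]


# Toy attribute counts, not measured on any real data source.
politicians = [("Sergio Mattarella", 40), ("Alfonso Bonafede", 25)]
print(best_fit_example("politicians named Alfonso", politicians))
print(best_fit_example("politicians born in Palermo", politicians))
```
]]></preformat>
      <p>With the toy data above, the first question selects Alfonso Bonafede because of the name match, while the second one has no word overlap with either label and falls back to the candidate with more attributes.</p>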
      <p>Simplifying the input of user constraints. Once an example page is given, depending on
the level of automatic interpretation of the user’s natural language question mentioned above,
the system may try to fill in the constraint fields for the user. This task would require finding
phrases in the question that semantically match some legal attribute-value pairs in the database.</p>
      <p>
        Ranking of infobox attributes. A relevant usability issue that must be faced within the
HOPE project is the ranking of attributes. Many data sources contain a large number of attributes,
but only a reasonable subset may be deemed relevant and shown to the user. In order to filter out
some attributes or to show the most relevant ones on top, we need proper ranking strategies for
attributes. We have already developed learning-to-rank approaches to this problem [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and we are
now studying the problem by defining different metrics. According to our ongoing experiments,
a good candidate statistic is tf-idf, where we replace the term-document relatedness score
typically used in document retrieval with a property-type relatedness score. This seems to
work very well in identifying the attributes most closely related to a given type (e.g. dbp:party is more
specific/relevant than foaf:givenName to the type dbo:Politician).
      </p>
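      <p>Since the exact formula is not given here, the following is only one plausible instantiation of a property-type tf-idf: the term frequency is the share of a type's property usages that belong to a property, and the inverse document frequency is the logarithm of the number of types divided by the number of types using that property. The usage counts are made up for illustration.</p>
      <preformat><![CDATA[
```python
import math
from typing import Dict, List


def tfidf_scores(usage: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """usage[type][property] = number of instances of `type` using `property`."""
    n_types = len(usage)
    # In how many types does each property occur at least once?
    type_freq: Dict[str, int] = {}
    for props in usage.values():
        for prop, count in props.items():
            if count > 0:
                type_freq[prop] = type_freq.get(prop, 0) + 1
    scores: Dict[str, Dict[str, float]] = {}
    for rdf_type, props in usage.items():
        total = sum(props.values())
        scores[rdf_type] = {
            prop: (count / total) * math.log(n_types / type_freq[prop])
            for prop, count in props.items()
            if count > 0
        }
    return scores


def rank_attributes(usage: Dict[str, Dict[str, int]], rdf_type: str) -> List[str]:
    """Properties of `rdf_type`, most type-specific first (ties by name)."""
    s = tfidf_scores(usage)[rdf_type]
    return sorted(s, key=lambda prop: (-s[prop], prop))


# Made-up usage counts for three types.
usage = {
    "dbo:Politician": {"dbp:party": 90, "foaf:givenName": 95, "dbo:birthPlace": 80},
    "dbo:HorseTrainer": {"foaf:givenName": 40, "dbo:birthPlace": 30},
    "dbo:Athlete": {"foaf:givenName": 70, "dbo:birthPlace": 60, "dbo:team": 65},
}
print(rank_attributes(usage, "dbo:Politician"))
# ['dbp:party', 'dbo:birthPlace', 'foaf:givenName']
```
]]></preformat>
      <p>In this toy data foaf:givenName occurs in every type, so its score is zero and it drops below dbp:party, which occurs only for politicians.</p>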
      <p>Furthermore, should the user find that the proposed ranking does not suit their needs, our interface
already allows them to re-arrange the attribute order. This change in ordering is recorded
together with previous ones, and could be fed to the ranking algorithm. In the current prototype,
the latest user-defined order is used for all subsequent searches.</p>
      <p>Disambiguating types. In our prototype, the query is performed on the basis of the
constraints specified by the user for the desired attributes. Keeping in mind the example from
previous sections, the query would retrieve all the resources in the knowledge base matching
the birth place=Palermo constraint and for which the property party exists, even if
their explicit type is different from Politician. This lets the query tolerate resources that have been
assigned a wrong type, as happens in knowledge bases generated by automated
extractors and as is likely to happen in the data resulting from the HOPE project’s extraction
and integration activities, which are carried out on heterogeneous and not necessarily certified sources. We
leave to the user the option to specify a type for the results to be retrieved.</p>
      <p>In future implementations, the tf-idf metrics mentioned above could
provide a valuable aid in disambiguating the types of entities in a spurious or incomplete type
system, as our ongoing experiments suggest (e.g. the presence of dbp:party is more
strongly related to type dbo:Politician than to type dbo:HorseTrainer).</p>
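      <p>As a sketch of how such scores might support type disambiguation, one could sum, for each candidate type, the relatedness of the properties observed on an entity and pick the type with the highest total. The aggregation by sum, the function name guess_type and the score values are assumptions for illustration only.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Iterable, Tuple


def guess_type(
    entity_props: Iterable[str],
    relatedness: Dict[str, Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the type whose properties best match the entity, plus all totals.

    relatedness[type][property] is a property-type relatedness score,
    for example a tf-idf value; unknown properties count as 0.
    """
    if not relatedness:
        raise ValueError("no candidate types")
    props = set(entity_props)
    totals = {
        rdf_type: sum(scores.get(prop, 0.0) for prop in props)
        for rdf_type, scores in relatedness.items()
    }
    # Sorting first makes ties resolve alphabetically.
    best = max(sorted(totals), key=lambda rdf_type: totals[rdf_type])
    return best, totals


# Made-up relatedness scores.
relatedness = {
    "dbo:Politician": {"dbp:party": 0.37, "dbo:birthPlace": 0.0},
    "dbo:HorseTrainer": {"dbp:party": 0.02, "dbo:race": 0.5},
}
best, totals = guess_type(["dbp:party", "dbo:birthPlace"], relatedness)
print(best)
```
]]></preformat>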
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions</title>
      <p>The existence of multiple proposals for the question answering task over knowledge graphs
points to a mismatch between the ever-growing relevance and size of open access linked
data on one side, and its accessibility to the general public on the other. We have built a prototype
of a query-by-example platform that fully decouples user interaction from low-level data
structures and interfaces. In the immediate future, our objective is to bring this prototype to
stability both in terms of property constraint expressiveness and final query reliability. We have
already identified a simple yet satisfactory metric for attribute relevance (namely, attribute-class
relatedness). As a next step, we also plan to work on the development of our second approach
to user-friendly queries, i.e., the question answering module.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is funded by PRIN 2017 (2019-2022) project HOPE - High quality Open data Publishing
and Enrichment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Evaluation of SPARQL query generation from natural language questions</article-title>
          ,
          <source>in: Proceedings of the Joint Workshop on NLP&amp;LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction</source>
          , INCOMA Ltd. Shoumen, BULGARIA, Hissar, Bulgaria,
          <year>2013</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>7</lpage>
          . URL: https://www.aclweb.org/anthology/W13-5202.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Höffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Asknow: A framework for natural language query formalization in sparql</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on The Semantic Web</source>
          . Vol.
          <volume>9678</volume>
          , Springer-Verlag,
          <year>2016</year>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Diefenbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maret</surname>
          </string-name>
          ,
          <article-title>Core techniques of question answering systems over knowledge bases: a survey</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>55</volume>
          (
          <year>2018</year>
          )
          <fpage>529</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Affolter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stockinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <article-title>A comparative survey of recent natural language interfaces for databases</article-title>
          ,
          <source>VLDB Journal</source>
          <volume>28</volume>
          (
          <year>2019</year>
          )
          <fpage>793</fpage>
          -
          <lpage>819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Automated conversion from natural language query to sparql query</article-title>
          ,
          <source>Journal of Intelligent Information Systems</source>
          <volume>55</volume>
          (
          <year>2020</year>
          )
          <fpage>501</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Spartiqulation: Verbalizing sparql queries</article-title>
          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Norton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mladenic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Della Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fundulaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web: ESWC 2012 Satellite Events</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2015</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mousavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Text-mining, structured queries, and knowledge management on web document corpora</article-title>
          ,
          <source>SIGMOD Rec</source>
          .
          <volume>43</volume>
          (
          <year>2014</year>
          )
          <fpage>48</fpage>
          -
          <lpage>54</lpage>
          . doi:10.1145/2694428.2694437.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <article-title>User-friendly temporal queries on historical knowledge bases</article-title>
          ,
          <source>Inf. Comput</source>
          .
          <volume>259</volume>
          (
          <year>2018</year>
          )
          <fpage>444</fpage>
          -
          <lpage>459</lpage>
          . doi:10.1016/j.ic.2017.08.012.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Swipe: searching wikipedia by example</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Mille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Gandon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Misselis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume)</source>
          , ACM,
          <year>2012</year>
          , pp.
          <fpage>309</fpage>
          -
          <lpage>312</lpage>
          . doi:10.1145/2187980.2188036.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dessi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maxia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Supporting semantic web search and structured queries on mobile devices</article-title>
          , in:
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Virgilio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cappellari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roantree</surname>
          </string-name>
          (Eds.),
          <source>3rd International Workshop on Semantic Search over the Web, SSW '13, Riva del Garda, Italy, August 30, 2013</source>
          , ACM,
          <year>2013</year>
          , pp.
          <fpage>5:1</fpage>
          -
          <lpage>5:4</lpage>
          . doi:10.1145/2509908.2509910.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dessi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <article-title>A machine-learning approach to ranking RDF properties</article-title>
          ,
          <source>Future Gener. Comput. Syst.</source>
          <volume>54</volume>
          (
          <year>2016</year>
          )
          <fpage>366</fpage>
          -
          <lpage>377</lpage>
          . doi:10.1016/j.future.2015.04.018.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>