AQUACOLD: A Novel Crowdsourced Linked Data
Question Answering System
Nicholas W. Collisa , Ingo Frommholzb
a
    Institute for Research in Applicable Computing, University of Bedfordshire, Luton, England
b
    School of Mathematics and Computer Science, University of Wolverhampton, UK


                                         Abstract
                                         Question Answering (QA) systems provide answers to Natural Language (NL) questions posed by hu-
                                         mans. The Linked Data (LD) web provides an ideal knowledge base for QA as the framework expresses
                                         structure and relationships between data which assist in question parsing. Despite this, recent attempts
                                         at NL QA over LD struggle when faced with complex questions due to the challenges in automatically
                                         parsing NL into a structured query language, forcing end users to learn languages such as SPARQL which
                                         can be challenging for those without a technical background. There is a need for a system which returns
                                         accurate answers to complex natural language questions over linked data, improving the accessibility
                                         of linked data search by abstracting the complexity of SPARQL whilst retaining its expressivity. This
                                         work presents AQUACOLD (Aggregated Query Understanding And Construction Over Linked Data) a
                                         novel LD QA system which harnesses the power of crowdsourcing to meet this need. AQUACOLD uses
                                         query templates built by other users to answer questions which enables the system to handle queries
                                         of significant complexity. This paper provides an overview of the system and presents the results of a
                                         technical and user evaluation against the QALD-9 benchmark.

                                         Keywords
                                         Linked Data, Natural Language, Question Answering, Crowdsourcing, SPARQL




1. Introduction
Question Answering is a subset of the Natural Language Processing and Information Retrieval
fields that focuses on providing direct answers to end users in response to a question formed
of natural language [1]. This method of providing direct answers to questions is preferred by
users over providing links to other information sources that may or may not contain those
answers [2]. Modern search engines and question answering systems including Google and
Bing often return direct answers to simple queries such as ‘Who is in the current Manchester
United squad?’ (see figure 1).
However, simple, common queries like this form the minority of searches: 97% of search
engine queries have been shown to occur ten times or fewer [3]. Search engines and question
answering systems often fail with queries of greater complexity, instead providing links to
websites which may be incorrect (see Figure 1). This highlights the need for a system which can
understand more complex queries and return accurate answers.

BIRDS 2021: Bridging the Gap between Information Science, Information Retrieval and Data Science, March 19 2021,
Online
nwcphd@gmail.com (N. W. Collis); ifrommholz@acm.org (I. Frommholz)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



Figure 1: (L) Direct answers given to simple question posed in a search engine (R) A more complex
question failing to return direct answers, with inaccurate web listings given instead


   To overcome the challenges present in parsing unstructured text, many QA systems reason
over a structured data source such as the Linked Data Web instead of, or in conjunction with,
unstructured text to return an answer [4].
   Linked Data — described by creator Sir Tim Berners-Lee as ’The Semantic Web Done Right’ [5] —
refers to a technology stack (RDF, OWL, SPARQL) that provides a mechanism for data to be
published, queried and inferred on the web [6]. Using URIs (Uniform Resource Identifiers) as
unique identifiers, data can be consistently referenced by programs from different domains to
foster interoperability. The additional inherent structure can provide more accurate results for
factoid questions (what, where, when, which, who, or is) than unstructured text [7].
   QA systems which harness Linked Data typically decompose natural language queries into
a structured query language such as SPARQL [8]. These approaches are successful when
answering questions of high complexity (depth) over a narrow domain (breadth) or, conversely,
of shallow depth over a broad domain, but they are less effective as question depth and breadth
increase, due to the challenges of producing query representations for every question type [9].
   Previous work has shown that harnessing crowdsourced workers to translate natural language
to SPARQL can resolve this problem, resulting in a QA space of greater depth and breadth than
is possible with programmatic methods [10]. However, much of this work has relied on paid
microtask workers to translate between natural language queries and linked data representations,
which can result in biased results [11] and misaligned incentives [12], and does not scale easily.
   AQUACOLD is a novel Question Answering system which fills this gap by allowing users
both to get answers to and to produce answers for natural language questions. It provides a
recognisable spreadsheet-style UI for creating and filtering linked data sets, which can be labelled
with natural language and transformed into templates that answer related questions; those answers
can then be retrieved in response to a natural language question. This allows complex questions to
be asked and answered by the crowd over a wide domain of knowledge, with organically aligned
incentives that arise from genuine information need.
   This paper outlines AQUACOLD and its performance against the QALD-9 benchmark and
is structured as follows. In section 2 we discuss related work. In section 3 we introduce the
AQUACOLD system. Section 4 outlines the evaluation methodology used to assess the effective-
ness of the system, section 5 reports the results of the evaluation and section 6 summarises the
findings and outlines future work.




2. Related Work
Tools that provide answers to questions from Linked Data sources are typically classified
into Natural Language Interfaces for Linked Data (NLI-LD) where the query is entered as
natural language text and Query Builder Interfaces for Linked Data (QB-LD) where the query is
expressed through manipulation of an interface.
   NLI-LD is rooted in the field of natural language search over structured databases, which dates
back to search systems for specific domains such as baseball [13] and lunar rocks [14], leading
to representation languages which interpret the question independently of the database structure,
as seen in MASQUE/SQL [15], an early NL front-end to an SQL database.
   Recent NLI-LD systems use a combination of techniques to convert natural language into a
structured query language. Some systems restrict the user to controlled natural language input
that requires specific words used in a specific order [2] whereas others allow complex sentence
structures [16]. Many NLI-LD systems decompose natural language queries into SPARQL using
generic query templates [17, 18]. This approach has proved successful for search over closed
domains, but is less effective over open domains due to the challenges of designing appropriate
templates for every possible question [9].
   Linked Data Query Builder interfaces (QB-LD) abstract the underlying SPARQL code into a
graphical representation which can be manipulated by the user to form a query. Some QB-LD
systems use graph interaction paradigms, such as NITELIGHT [19] and Semantic Crystal [20],
as these align well with the graph-based nature of RDF data. Other QB-LD systems follow
a tabular structure for representing and interacting with Linked Data results and/or queries,
such as Falcons [21] and Freebase Parallax [22]. Many QB-LD systems [23, 24, 25] allow the
user to progressively refine their query through the application of predicate and value pairs
termed facets [26]. Facets have been shown to benefit users composing exploratory queries over
Linked Data [27]. QB-LD systems offer greater robustness than NLI-LD, as it is clear when
an answer cannot be found due to a lack of data rather than incorrect query syntax, but they are
limited in flexibility, as the progressive filtering approach necessitates simpler queries.
   Hybrid approaches combine elements of NLI-LD and QB-LD, using the readability of natural
language as an entry point for a query with the query building elements highlighting what
terminology can or cannot be used. Examples of such hybrid systems are SPARKLIS [28], CODE:
Linked Data Query Wizard [29], Atomate [30] and Ginseng [31].
   Some systems incorporate machine learning techniques into NLI-LD. A high-profile example
is DeepQA, the technology behind IBM Watson [32], which was able to win a
game of the TV show Jeopardy! against expert human opponents by using a variety of machine
learning and Natural Language Processing techniques to compute the correct answer over a range
of structured and unstructured data sources, including Linked Data.
   Incorporating crowdsourcing into Linked Data search remains an active area of research [33].
Some predict the combination of the open, interconnected Linked Data web and the power
of the crowd will produce a ‘Global Brain Semantic Web’ [2]. Crowdsourcing has been used
to enhance many aspects of Linked Data systems, including accuracy [34], adding additional
context [35], query optimisation [36], query expansion and query understanding [37]. Using
the crowd to help translate natural language queries into SPARQL has been shown to resolve some
of the problems inherent in purely programmatic/algorithmic methods [10].



   CrowdQ [37] is of particular relevance to this work, as the system employs the crowd to build
query templates which can be used to answer multiple variants of the original query. CrowdQ
limits the complexity of questions that can be answered to one semantic ‘hop’: ‘all students
in class CIS01b’ would be answerable, but ‘birth dates of all professors who have students in class
CIS01b’ would not.
   A hybrid QB/NLI/CS approach has been explored in Google Squared [38] which allowed
users to label ‘squares’ of related information. This employed data scraped from unstructured
text and HTML tables and did not exploit the rich relations available from RDF Linked Data.


3. AQUACOLD System Overview
AQUACOLD (Aggregated Query Understanding and Construction Over Linked Data) com-
bines the principles of Natural Language Interfaces, Query Builders and Crowdsourcing into a
Question Answering system for Linked Data which requires no prior knowledge of SPARQL or
the underlying data schema.
    AQUACOLD allows users to find answers to questions from a Linked Data endpoint by
manipulating a faceted tabular interface similar to that found in spreadsheet applications. The
resulting filter sets can then be labelled using natural language and turned into templates which
allow similar questions to also be answered. Future users can then search for and retrieve these
answers using natural language, vote on the quality of the results and adapt them to produce
their own labelled result sets and templates. The full system diagram is shown in fig. 2.




Figure 2: AQUACOLD System Architecture




3.1. System Interface
AQUACOLD combines several features into one novel question answering tool: natural language
search, crowdsourced query labelling and voting, a query builder interface and a query
templating system.
   Figure 3 shows a screenshot of the AQUACOLD user interface. The key components are a
combined search and labelling box (item 1 in the figure) and a Query Builder (items 3-8), which
incorporates a results grid and a set of customisable filters (item 3). These can explore a given
linked data source, retrieving LD node labels and linking to related nodes, progressively building
up a result set based on the filters entered by the user. Once complete, the result grid and
associated filters can be labelled using natural language (item 1 in the figure), with autocompletion
suggestions for wording and terminology based on the labels entered by other users.
   As users compose natural language queries, they are guided by autocompletion (item 7),
which indicates the data available from the linked data source and their labels, alongside other
useful information such as images and descriptions (item 8). Finally, a voting system (item 4)
is provided to rank the results, allowing the most accurate result sets to appear first to users
(item 1).
   Query templates are produced based on pattern matching between related query and entity
labels. These templates are used to answer similar natural language queries entered by future
users, returning results from the linked data web to the results grid.
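
   As a rough illustration of how the Query Builder's filter sets relate to SPARQL, the Python
sketch below builds a query from (predicate, value) filters of the kind chosen in the results grid.
The helper name, the filter representation and the exact predicates are assumptions for illustration
(mirroring the example used in section 3.2) rather than AQUACOLD's actual internals; whether these
predicates return answers depends on the current DBpedia data.

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical sketch only: map (predicate, value) filters chosen in the results
# grid to a SPARQL SELECT over DBpedia.
PREFIXES = """
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

def build_sparql(filters):
    """filters: a list of (predicate, value) pairs taken from the results grid."""
    patterns = "\n  ".join(f"?result {p} {v} ." for p, v in filters)
    return f"""{PREFIXES}
SELECT DISTINCT ?result ?label WHERE {{
  {patterns}
  ?result rdfs:label ?label .
  FILTER (lang(?label) = "en")
}}"""

# Filters for the 'Tanks used in World War 1' example described in section 3.2.
query = build_sparql([("dbp:type", "dbr:Tank"),
                      ("dbp:usedInWar", "dbr:World_War_I")])

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["label"]["value"])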




Figure 3: The AQUACOLD interface with key




3.2. The Linked Data Feedback Loop
The following scenario demonstrates how a user may engage with the system (fig. 4).

   1. User A arrives at the site looking for information on the tanks used in World War 1.
      He/she enters ‘Tanks used in World War 1’ into the search box. No results are found.
   2. Unable to find existing results, User A builds the results grid by adding filters
      dbp:type=:Tank and dbp:usedInWar=:worldWar1.
   3. The results grid complete, User A adds the label ‘Tanks used in World War 1’.
   4. AQUACOLD populates its database with User A’s query label and the associated SPARQL
      code from the results grid, together with possible variations formed by replacing all entities
      in the query with wildcards, e.g. ‘Tanks used in [*]’ (a sketch of this mechanism follows the list).
   5. Another user, User B searches for ‘Tanks used in World War 2’. Although a result grid
      for this query has not been explicitly created by another user, results are returned by the
      template created in step 4, substituting the entity ‘World War 1’ with ‘World War 2’.
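
   The Python sketch below illustrates, under assumed data structures, how steps 4 and 5 of this
loop could work: the entity mentioned in the label is replaced with a wildcard to form a template,
and a later question matching the template has its own entity substituted into the stored SPARQL.
The field names, the placeholder convention and the entity lookup are hypothetical simplifications;
AQUACOLD's actual pattern matching and entity linking are more involved.

import re

# Assumed stored record for User A's labelled result grid (steps 3-4 above);
# SPARQL prefix declarations are omitted for brevity.
labelled_query = {
    "label": "Tanks used in World War 1",
    "entity": "World War 1",                 # entity detected in the label
    "entity_uri": "dbr:World_War_I",
    "sparql": ("SELECT DISTINCT ?result WHERE { "
               "?result dbp:type dbr:Tank . "
               "?result dbp:usedInWar dbr:World_War_I . }"),
}

def make_template(record):
    """Step 4: replace the labelled entity with a wildcard in both the natural
    language label and the stored SPARQL, producing a reusable template."""
    label_pattern = re.escape(record["label"]).replace(
        re.escape(record["entity"]), "(.+)")
    return {
        "label_regex": re.compile(f"^{label_pattern}$", re.IGNORECASE),
        "sparql_template": record["sparql"].replace(
            record["entity_uri"], "<<ENTITY_URI>>"),
    }

def answer_with_template(template, question, entity_lookup):
    """Step 5: if a new question matches the template, substitute the new
    entity's URI into the stored SPARQL and return the query to execute."""
    match = template["label_regex"].match(question)
    if not match:
        return None
    surface_form = match.group(1)              # e.g. 'World War 2'
    uri = entity_lookup.get(surface_form)      # entity linking, greatly simplified
    if uri is None:
        return None
    return template["sparql_template"].replace("<<ENTITY_URI>>", uri)

template = make_template(labelled_query)
lookup = {"World War 2": "dbr:World_War_II"}   # stand-in for an entity linker
print(answer_with_template(template, "Tanks used in World War 2", lookup))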




Figure 4: The AQUACOLD Linked Data Feedback Loop




4. Evaluation Methodology
AQUACOLD has been evaluated with a user evaluation and a technical evaluation using the 408
questions that comprise the 9th Question Answering over Linked Data (henceforth QALD)
challenge [39]. The challenge has run annually since its inception in 2011 and remains the most
prominent evaluation series of its kind.
  QALD-9 includes a wide range of questions ranging in complexity from simple (When is
Halloween?) to advanced (Give me the capitals of all countries that the Himalayas run through).
Each question is supplied with a gold standard set of answers and associated gold standard
SPARQL query. Some questions that cannot be answered from the dataset are also included, to
test how participating systems judge whether an unanswerable question represents a problem with
the system itself or a lack of data.
  DBpedia [40] is the knowledge base for these questions. Although DBpedia is large and
wide ranging enough to cover a wide scope of questions, it is also incomplete with many
ontology errors [39]. This is seen as a benefit, as it ensures competing LD-QA systems can
handle incompleteness, modelling errors, missing property information and undefined entities.
  Answers submitted by users using AQUACOLD in response to QALD-9 questions are evalu-
ated using the standard measures of recall, precision and F1-Score against the gold standard
provided in the QALD-9 dataset.

                          F1 = 2 × (Precision × Recall) / (Precision + Recall)                  (1)
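
   For concreteness, the short Python sketch below computes these set-based measures for a single
question by comparing the answers retrieved through AQUACOLD with the QALD-9 gold standard;
the example answer sets are invented for illustration.

def evaluate(returned, gold):
    """Set-based precision, recall and F1 for one question."""
    returned, gold = set(returned), set(gold)
    true_positives = len(returned & gold)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented example: three answers returned, two of which are in the gold standard,
# giving precision = recall = F1 = 2/3.
print(evaluate({"dbr:Mark_I_tank", "dbr:Renault_FT", "dbr:T-34"},
               {"dbr:Mark_I_tank", "dbr:Renault_FT", "dbr:A7V"}))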
   The user evaluation measured how successfully a random selection of AQUACOLD users
with a range of backgrounds and proficiency could use the system to answer questions over the
Linked Data web. On starting the experiment, each participant completed a short survey to
capture their proficiency in English, web search, spreadsheets, SPARQL and overall IT skill.
On submitting the survey, participants were shown a short video instructing them on how to use
AQUACOLD. Each participant was then presented with a set of 5 random questions taken from
the QALD-9 dataset and instructed to answer each using the AQUACOLD system.
   The crowdsourced nature of the AQUACOLD system presents a challenge for the user
evaluation, as participants will be able to find answers faster and more easily if the labels and
filters that provide those answers have already been added to the system by other participants.
To factor this into the evaluation, each of the 5 questions presented to a participant was hosted
on a different server (unbeknown to them) containing a different amount of preloaded data:

    • Server 1 has no pre-existing data available, participants can use the Query Builder (QB)
      only.
     • Server 2 has all gold standard (GS) labels and filter sets available through the Natural
       Language Search.
     • Server 3 has both correct and incorrect labels and filter sets with randomised vote scores.
     • Server 4 has labels and filter sets similar to the GS which can be adapted with the QB.
     • Server 5 has labels and filter sets created by other users, whether correct or incorrect.

   By evaluating how participants answer questions when a variety of crowdsourced content
is preloaded into the system, real world activity can be simulated and a reliable evaluation
obtained.



   The technical evaluation measured the maximal performance of AQUACOLD in answering
all questions from the QALD-9 dataset and compared the system’s results against those of
competing systems in the QALD-9 challenge.
   This evaluation was carried out by an AQUACOLD system expert (the system developer)
attempting to answer each of the 408 questions, removing the chance that lack of familiarity with
the system would impact the precision or recall of each answer. For the comparative analysis
between AQUACOLD and other QALD-9 systems, precision, recall and f-score are recorded for
AQUACOLD and compared against those reported for competing systems.
   Finally, an evaluation of the template coverage available to AQUACOLD was included as
a measurement of the total number of queries the system could answer over DBpedia when
seeded with templates from the QALD-9 dataset. This is used to assess how expeditiously the
system expands its query coverage over time.


5. Results
5.1. User Evaluation Results
30 participants took part in the user evaluation after signing up online and completing the entry
questionnaire. This exceeds the minimum sample size of 16 put forward by TREC [41] but
was lower than the number anticipated.
   Each participant was assigned 5 random questions from the QALD-9 dataset to answer
sequentially using AQUACOLD (see section 4), resulting in 150 questions answered in total.
Questions that were identified as unanswerable or only partially answerable in the technical
evaluation (see 5.3.1) were removed, leaving only questions that were fully answerable according
to the supplied gold standard answers.
   77 of the questions assigned to participants were answered correctly, retrieving all correct
answers (52% of the total); 46 were answered incorrectly with no correct answers (31%); 11 were
answered partially correctly, retrieving some but not all of the correct answers (7%); and 15
questions were abandoned when the participant moved on to the next question without answering
(10%). Precision, recall and F1 score were calculated for all attempts, resulting in mean scores of
0.52 precision, 0.56 recall and 0.52 F1.
   Analysis of average F1 score per question, in the order questions were presented to participants
(table 1), shows a slight increase (0.08 in F1 score) in average participant performance after Q1,
which may indicate that participants become accustomed to the system as they progress through
the experiment.

     Question Order     Avg. % of correct answers     Avg. Precision     Avg. Recall     Avg. F1 Score
            1                      45%                     0.42              0.45              0.43
            2                      51%                     0.50              0.51              0.51
            3                      66%                     0.66              0.66              0.64
            4                      59%                     0.51              0.59              0.52
            5                      56%                     0.53              0.56              0.53

Table 1
Average results for all questions in the order presented to participants

5.2. User Results Per Server
Figure 5 highlights that participants answered questions on Servers 1, 3 and 4 with a similar
rate of success, which is surprising given that Servers 3 and 4 contain some natural language
labels that could help participants answer some questions. The similarity in results between these
servers indicates that the extra work required to identify which of the three labels was correct
(for Server 3), or to adapt the partially correct labels (for Server 4), was no more useful for
answering the question than using the Query Builder exclusively.




Figure 5: Percentage of correct / partially correct / incorrect / abandoned questions per server


   Server 1 contained no label or filter set data, requiring participants to use the Query
Builder to answer questions. The majority of participants on this server (75%) initially tried to
find the answer using the natural language search tool and, when no answer was found, turned to
the Query Builder. The most common next action at this stage was to search for a relevant entity
in the subject filter position (48% of users), regardless of whether the entity should have been a
subject or a propertyValue. This pattern is consistent across all servers, indicating that
participants may not have grasped the differences between filter types from the tutorial.
   Server 2 contained gold standard labels and corresponding filter sets available via the natural
language search tool. This server shows a significant increase in the percentage of correct
participant answers (60%), which is to be expected, as all participants had to do was type the
question into the search tool and choose the suggested label (of which there was only one) in the
search results. This demonstrates the utility of AQUACOLD in an idealised scenario, where the
system has been seeded with only correct answers by the crowd.
   Server 3 contained one correct label and filter set and two incorrect labels and filter sets for
each question, with each answer assigned a random visible vote score from -5 to +5. The
majority of participants (83%) initially chose the matching label with the highest vote score
when searching for an answer via natural language search. Of the participants who selected
a label which resulted in incorrect answers, 26% used the query builder instead to try to
find an answer to the question, with 74% returning to the list and selecting a different option.
This indicates that participants were willing to explore the results returned by selecting a lower
voted label; this could be further helped by encouraging users to exercise their own votes to
redress incorrectly assigned labels.
    Server 4 contained labels and associated filter sets related, but not identical, to the gold
standard answer, which required editing through the removal of one filter and the addition of
another. 54% of participants on Server 4 searched for and then selected a related label when
looking for the answer to their assigned question rather than using the query builder interface.
Of these, all edited the returned filter set, and 71% successfully reformulated the filter set to
find the correct answer. This indicates that participants were comfortable with tweaking existing
filter sets by editing them to find the correct answer, and suggests that the system's utility
extends beyond scenarios where only the completed answer is available.
    Server 5 contained labels and filter sets populated by previous participants when using this
server. Participants on this server recorded the lowest proportion of correct answers (13%) and
the joint highest proportion of incorrect answers (40%), which is surprising given that the natural
language labels and associated filter sets created by previous participants should have helped
users answer the question. One explanation may be that the low number of overall participants
(30) did not provide enough opportunity for sufficient high-quality crowdsourced results to
influence the overall score for this server.
    The use of multiple preloaded servers to mimic the effects of distinct crowdsourced conditions
on AQUACOLD user interactions has shown that the system performs well when one correct
answer has been entered by the crowd or a similar answer is available that can be refined
using the grid controls. In situations where no answer is available, or multiple answers have
been added by the crowd with misleading vote scores, the system performs less well. More
testing is needed with a larger sample size to get an accurate measure of how the crowdsourced
component performs at scale.

5.3. Technical Evaluation
5.3.1. Query Coverage
To identify the maximum query coverage available to AQUACOLD, an expert user (the system
developer) attempted to answer all 408 QALD-9 questions. Of these, 342 were fully answerable,
54 were unanswerable and 12 were partially answerable.
   To be classified fully answerable, the answers returned by AQUACOLD must match the gold
standard answers supplied for the associated question by QALD-9. To be classified partially
answerable, at least one returned answer must match at least one answer for the associated
question from QALD-9. To be classified unanswerable, no returned answers match any supplied
by QALD-9 for that question.
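
   A minimal Python sketch of this three-way classification, assuming answers are compared as
simple sets (the example answer sets are invented):

def classify_attempt(returned, gold):
    """Fully answerable if the returned answers match the gold standard exactly,
    partially answerable if at least one answer overlaps, unanswerable otherwise."""
    returned, gold = set(returned), set(gold)
    if returned == gold:
        return "fully answerable"
    if returned & gold:
        return "partially answerable"
    return "unanswerable"

print(classify_attempt({"dbr:Kathmandu"}, {"dbr:Kathmandu", "dbr:Thimphu"}))  # partially answerable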
   The 66 unanswerable or partially answerable questions could become answerable if the
AQUACOLD UI was developed further to incorporate controls for the required SPARQL elements
(see table 2). Details of potential future developments can be found in section 6.




                    Reason                              # questions affected
                    Requires Union                              23
                    Requires additional ontology                 9
                    Requires aggregation with GT / LT            9
                    Requires relative reference                  8
                    Requires aggregation with LIMIT              6
                    Requires cell ungrouping                     5
                    Result set too large                         5
                    Requires text substring search               1
                    Requires non English translation             1

Table 2
Reasons for unanswerable questions in AQUACOLD (some questions included multiple unanswerable elements)


5.3.2. Template Coverage
AQUACOLD’s query templating feature provides answers to multiple questions from a single
template. We evaluate the number of queries covered by the templates produced from each QALD-9
question to measure the template coverage offered by AQUACOLD. Our sample is limited to 342
questions, the number of QALD-9 questions answerable by the system (see 5.3.1).
  Templates could be produced from 227 of the 342 answerable questions. A total of 839,938
queries were produced from the 227 templates, giving an average of 3,700 queries produced for
each templateable question.
   Where a template could not be produced from a query, this was because the system was
unable to find a match between the words used in the query label and the labels used
in the filter sets. For example, the query “Give me all Taikonauts” requires the filter sets
occupation:Astronaut and nationality:Russian. There are no shared words between the query
“Give me all Taikonauts” and these labels, so a template cannot be produced. If the query label
were instead “Give me all Russian Astronauts”, templates could be produced that answer queries
such as “Give me all Chinese Astronauts” and “Give me all Russian Singers”.
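
   The Python sketch below is a hypothetical reconstruction of this precondition: a template can
only be produced when the query label shares at least one content word with the labels of the
filter set. The stopword list and the crude singularisation stand in for whatever normalisation
the system actually applies.

STOPWORDS = {"give", "me", "all", "the", "of", "in"}

def normalise(word):
    # crude singularisation; a stand-in for the system's actual normalisation
    word = word.lower().strip(",.?")
    return word[:-1] if word.endswith("s") else word

def can_template(query_label, filter_labels):
    """True if the query label shares at least one content word with the filter labels."""
    query_words = {normalise(w) for w in query_label.split()} - STOPWORDS
    filter_words = {normalise(w) for label in filter_labels for w in label.split()}
    return bool(query_words & filter_words)

print(can_template("Give me all Taikonauts", ["Astronaut", "Russian"]))          # False
print(can_template("Give me all Russian Astronauts", ["Astronaut", "Russian"]))  # True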
   This evaluation demonstrates that AQUACOLD’s templating system can vastly increase the
system’s available query coverage, rapidly increasing its utility as a Linked Data Question
Answering tool.

5.3.3. AQUACOLD performance in QALD-9 compared to competing systems
To evaluate AQUACOLD using metrics consistent with the similar systems that took part in the
QALD-9 challenge, all 408 questions were attempted in AQUACOLD by an expert user (the system
developer), with the resulting answers scored using the established IR metrics of precision,
recall and F1 using the formula detailed in section 4.
   Mean system recall for all 408 QALD-9 training questions was recorded as 0.87, with a
mean system precision of 0.90, giving an overall system f-score for AQUACOLD of 0.88.




                        WDAqua       gAnswer 2   TeBaQA   Elon   QASystem     AQUACOLD
    System f-score         0.29          0.43     0.22    0.10      0.20           0.88

Table 3
Comparison of system f-scores from QALD-9 participating systems [42] against AQUACOLD


   AQUACOLD has a higher f-score than competing systems (see table 3¹), demonstrating that
more questions from the QALD-9 dataset can be successfully answered than by its competitors.
   It could be argued that this higher score is due to the use of the Query Builder component,
which allows precise selection of the URIs, filter sets and modifiers, and is therefore not
comparable with competing systems that use a ‘pure natural language’ approach. However,
subsequent users can retrieve answers to these questions (and any similar ones created by the
templating system) using natural language, with fuzzy matching for both syntax and entities. We
therefore argue that, once enough data has been seeded, the comparison against other pure natural
language approaches is applicable, as the result for the end user is the same: they are provided
with answers from the linked data web in response to a natural language query.


6. Summary and Future Work
AQUACOLD is a novel question answering tool that combines natural language search, a faceted
browsing interface and a crowdsourced templating system to answer complex questions from
the Linked Data web. Our evaluation has demonstrated that the system can answer a wide
variety of question types accurately and that the templating system can vastly expand the
answerable question space based on a small number of templates seeded by users.
   The evaluation has demonstrated that most participants can find the correct answer using
the natural language search tool if it has been seeded by the crowd previously, and can use the
grid controls to refine partially correct filter sets to produce the correct answer. Participants
are less successful when sifting through answers incorrectly labelled by other users or when
finding answers using the query builder component, but users with greater experience of
spreadsheets or structured data tools perform better at this task. This suggests a potential
implementation of the system in which technically skilled users create filter sets using the query
builder for less technically able users to find using the natural language search tool.
   Future work will include rerunning the evaluation with a larger cohort to return more
robust results that should highlight how larger volumes of organically seeded data affect the
user experience. We plan to expand the set of SPARQL operators supported by the system to
enable more complex queries to be answered, involving unions, multi-language support, relative
references and ontologies other than DBpedia. Further work will also include investigating
whether the faceted query builder component could be replaced with a natural language
question-and-answer based system, removing the need for the user to differentiate between
subject, property and property value filters, which less technically able participants reported
finding confusing.

   ¹ Elon and QASystem not yet published.



References
 [1] H.-J. Oh, K.-Y. Sung, M.-G. Jang, S.-H. Myaeng, Compositional question answering: A
     divide and conquer approach, Information Processing & Management 47 (2011) 808–824.
     doi:10.1016/j.ipm.2010.03.011.
 [2] M. S. Bernstein, J. Teevan, S. Dumais, D. Liebling, E. Horvitz, Direct answers for search
     queries in the long tail, Proceedings of the 2012 ACM annual conference on Human Factors
     in Computing Systems (2012) 237–246. URL: http://dl.acm.org/citation.cfm?doid=2207676.
     2207710. doi:10.1145/2207676.2207710.
 [3] R. W. White, M. Bilenko, S. Cucerzan, Studying the Use of Popular Destinations to
     Enhance Web Search Interaction, Proceedings of the 30th annual international ACM
     SIGIR conference on Research and development in information retrieval - SIGIR ’07 (2007)
     159–166. URL: http://dl.acm.org/ft_gateway.cfm?id=1277771&ftid=437536&dwn=1&CFID=
     739208913&CFTOKEN=18406667. doi:10.1145/1377488.1377490.
 [4] A. Singhal, Introducing the knowledge graph: things, not strings, Official google blog 5
     (2012).
 [5] T. Berners-Lee, The Great Unveiling, 2009. URL: http://www.ted.com/index.php/talks/tim_
     berners_lee_on_the_next_web.html.
 [6] W3C, Linked Data definition, 2015. URL: http://www.w3.org/standards/semanticweb/data.
 [7] A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali, Liquid Query: Multi-Domain Exploratory
     Search on the Web (2010) 161–170.
 [8] S. Shekarpour, A. C. N. Ngomo, S. Auer, Query segmentation and resource disambiguation
     leveraging background knowledge, CEUR Workshop Proceedings 906 (2012) 82–93.
 [9] K. Höffner, J. Lehmann, Survey on Challenges of Question Answering in the Semantic
     Web (2016).
[10] H.-j. D. C.-y. Wu, R. T.-h. Tsai, From Entity Recognition to Entity Linking: A Survey
     of Advanced Entity Linking Techniques, The 26th Annual Conference of the Japanese
     Society for Artificial Intelligence (2012) 1–10.
[11] D. Damljanovic, J. Petrak, M. Lupu, H. Cunningham, M. Carlsson, G. Engstrom, B. An-
     dersson, Random Indexing for Finding Similar Nodes within Large RDF graphs (2012)
     1–15.
[12] A. Kittur, J. V. Nickerson, M. Bernstein, E. Gerber, A. Shaw, J. Zimmerman, M. Lease,
     J. Horton, The future of crowd work, in: Proceedings of the 2013 conference on Computer
     supported cooperative work, ACM, 2013, pp. 1301–1318.
[13] B. F. Green, A. K. Wolf, C. Chomsky, K. Laughery, Baseball: an automatic question
     answerer (1961).
[14] W. A. Woods, R. Kaplan, Lunar rocks in natural English: Explorations in natural language
     question answering, Linguistic structures processing 5 (1977) 521–569.
[15] I. Androutsopoulos, G. Ritchie, P. Thanisch, MASQUE/SQL: An Efficient and Portable
     Natural Language Query Interface for Relational Databases (1995) 1–7.
[16] A. M. Gliozzo, O. Biran, S. Patwardhan, K. McKeown, Semantic Technologies in IBM
     Watson TM, Proceedings of the Fourth Workshop on Teaching NLP (2013) 85–92.
[17] M. Damova, D. Dannells, Natural language interaction with semantic web knowledge
     bases and LOD (2013) 1–15. URL: http://www.molto-project.eu/sites/default/files/bookchap_0.pdf.
[18] J.-D. Kim, K. B. Cohen, Natural language query processing for SPARQL generation: A
     prototype system for SNOMED-CT, Proceedings of BioLINK SIG 2013 (2013) 32–38. URL:
     https://sites.google.com/site/biolinksig2013/program/biolinksig2013_Kim_Cohen.pdf.
[19] A. Russell, P. R. Smart, D. Braines, N. R. Shadbolt, NITELIGHT: A graphical tool for
     semantic query construction, CEUR Workshop Proceedings 543 (2009) 1–10.
[20] E. Kaufmann, A. Bernstein, Evaluating the usability of natural language query languages
     and interfaces to Semantic Web knowledge bases, J. Web Sem. 8 (2010) 377–393.
[21] G. Cheng, W. Ge, Y. Qu, Falcons: Searching and Browsing Entities on the Semantic Web (2008) 1101–1102.
[22] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web,
     WWW Conference (2009) 2005–2008. URL: http://davidhuynh.net/media/papers/2009/
     www2009-parallax.pdf.
[23] T. Berners-lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer,
     D. Sheets, Tabulator : Exploring and Analyzing linked data on the Semantic Web, Pro-
     ceedings of the 3rd International Semantic Web User Interaction Workshop (2006).
[24] P. Fafalios, Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-time,
     Proceedings of the 36th International ACM SIGIR Conference on Research and Develop-
     ment in Information Retrieval (2013) 1089–1090. doi:10.1145/2484028.2484200.
[25] S. Lohmann, S. Dietzold, Interacting with Multimedia Content in the Social Semantic Web,
     Design (2008).
[26] D. Tunkelang, Faceted search, Synthesis lectures on information concepts, retrieval, and
     services 1 (2009) 1–80.
[27] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, D. Zheleznyakov, Faceted
     search over RDF-based knowledge graphs, Journal of Web Semantics 37-38 (2016) 55–74.
     doi:10.1016/j.websem.2015.12.002.
[28] S. Ferré, Expressive and Scalable Query-Based Faceted Search over SPARQL Endpoints
     (2015). HAL Id: hal-01100313.
[29] P. Hoefler, M. Granitzer, E. Veas, C. Seifert, Linked data query wizard: A novel interface
     for accessing sparql endpoints, CEUR Workshop Proceedings 1184 (2014).
[30] M. Van Kleek, B. Moore, D. Karger, Atomate It! End-user Context-Sensitive Automation
     using Heterogeneous Information Sources on the Web (2017).
[31] E. Kaufmann, A. Bernstein, Evaluating the Usability of Natural Language Query Languages
     and Interfaces to Semantic Web Knowledge Bases, SSRN Electronic Journal (2010). URL:
     https://www.ssrn.com/abstract=3199491. doi:10.2139/ssrn.3199491.
[32] D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W.
     Murdock, E. Nyberg, J. Prager, others, Building Watson: An overview of the DeepQA
     project, AI magazine 31 (2010) 59–79.
[33] C. Sarasua, E. Simperl, N. F. Noy, A. Bernstein, J. M. Leimeister, Crowdsourcing
     and the Semantic Web: A Research Manifesto, Human Computation 2 (2015) 3–17.
     URL: http://hcjournal.org/ojs/index.php?journal=jhc&page=article&op=view&path[]=45.
     doi:10.15346/hc.v2i1.2.
[34] M. Acosta, E. Simperl, F. Flöck, M.-E. Vidal, HARE: A hybrid SPARQL engine to enhance



     query answers via crowdsourcing, in: Proceedings of the 8th International Conference on
     Knowledge Capture, ACM, 2015, p. 11.
[35] G. Demartini, D. E. Difallah, P. Cudré-Mauroux, ZenCrowd: Leveraging Probabilistic
     Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking, Proceedings of
     the 21st international conference on World Wide Web - WWW ’12 (2012) 469–478. URL:
     http://dl.acm.org/citation.cfm?id=2187836.2187900. doi:10.1145/2187836.2187900.
[36] J. Fan, M. Zhang, S. Kok, M. Lu, B. C. Ooi, CrowdOp: Query optimization for declarative
     crowdsourcing systems, 2016 IEEE 32nd International Conference on Data Engineering,
     ICDE 2016 (2016) 1546–1547. doi:10.1109/ICDE.2016.7498417.
[37] G. Demartini, B. Kraska, M. Franklin, CrowdQ: Crowdsourced Query Understanding,
     Conference on Innovative Data Systems Research (CIDR) (2013) 4. URL: http://www.cidrdb.
     org/cidr2013/Papers/CIDR13_Paper137.pdf.
[38] D. Crow, Google Squared: web scale, open domain information extraction and presentation,
     in: European Conference on Information Retrieval, Industry Day, 2010.
[39] P. Cimiano, C. Unger, A. Freitas, Question Answering over Linked Data, 10th Reasoning
     Web Summer School 0 (2011) 3–13. URL: http://www.websemanticsjournal.org/index.php/
     ps/article/view/339.
[40] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann, DBpedia -
     A crystallization point for the Web of Data, Web Semantics: Science, Services and Agents
     on the World Wide Web 7 (2009) 154–165. URL: http://linkinghub.elsevier.com/retrieve/
     pii/S1570826809000225. doi:10.1016/j.websem.2009.07.002.
[41] S. T. Dumais, N. J. Belkin, The TREC interactive tracks: Putting the user into search, TREC:
     Experiment and evaluation in information retrieval (2005) 123–152.
[42] R. Usbeck, R. H. Gusmita, M. Saleem, 9th Challenge on Question Answering over Linked
     Data (QALD-9) (2018).



