=Paper= {{Paper |id=None |storemode=property |title=Ontology Based Queries - Investigating a Natural Language Interface |pdfUrl=https://ceur-ws.org/Vol-565/paper2.pdf |volume=Vol-565 }} ==Ontology Based Queries - Investigating a Natural Language Interface== https://ceur-ws.org/Vol-565/paper2.pdf
Ontology Based Queries – Investigating a Natural Language Interface

Ielka van der Sluis, Computer Science, Trinity College Dublin, vdsluis@cs.tcd.ie
Feikje Hielkema, Computing Science, University of Aberdeen, f.hielkema@abdn.ac.uk
Chris Mellish, Computing Science, University of Aberdeen, c.mellish@abdn.ac.uk
Gavin Doherty, Computer Science, Trinity College Dublin, gavin.doherty@cs.tcd.ie

ABSTRACT
In this paper we look at what may be learned from a comparative study examining non-technical users with a background in social science browsing and querying metadata. Four query tasks were carried out with a natural language interface and with an interface that uses a web paradigm with hyperlinks. While it can be difficult to attribute differences in performance to specific design features, a qualitative analysis of the user behavior provides some insight into the task and problematic aspects of existing interfaces. In general it was found that casual users have difficulty recognizing typical ontology-based concepts such as objects, attributes and values.

Author Keywords
Querying and browsing, metadata, evaluation, natural language interfaces, web-based interfaces.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
The advent of Semantic Web technologies [2] has generated a number of challenges relating to the use of technology by domain experts and researchers in areas such as social science [3]. Among the questions to be addressed are the extent to which these researchers are comfortable with the Web as a framework for research practice and collaboration; whether ontologies are appropriate (and acceptable) to this community as a way of representing concepts to facilitate their research activities; the utility (or otherwise) of existing metadata frameworks in use by the social sciences; and how best to integrate e-science tools and methods into existing working practices.

A key aspect is concerned with support for creation of metadata and access to resources annotated by semantic metadata. This semantic metadata is captured with RDF (Resource Description Framework; www.w3.org/RDF/), statements of the type Property (subject, object) whose semantics are defined by OWL ontologies (www.w3.org/TR/owl-features/). These ontologies consist of classes (e.g. City, State) and properties (hasCapital, Name). The RDF statements describe instances of these classes (e.g. ‘The State of New York, whose capital is New York’). RDF is commonly serialized as XML and is potentially difficult to understand for most non-technical users. This paper focuses on browsing RDF and the task of constructing complex queries.

Support for these activities for casual, non-technical users is an important challenge for the entire Semantic Web research community. As most members of the social science community are unfamiliar with complex formalisms such as RDF, they form a representative group of non-technical users of the Semantic Web. Non-technical users may benefit from what the Semantic Web offers, but may be deterred by its complexity and the need to learn to use graphical representations or controlled languages. While well-designed graphical tools can provide advantages, tools that use graphical representations (e.g. CREAM [6] or SHAKEN [13]) may be difficult to interpret for users unused to complex graphical presentations or ontologies. For instance, Petre [9] argues that graphical readership is an acquired skill, and describes experiments into reading comprehension of graphical and textual representations. These showed that for some tasks people process graphical representations significantly slower than text, with novices in particular suffering from mis-readings and confusion.

Kaufmann and Bernstein [7] compared four different query interfaces for the Semantic Web in an experiment, and found that naive users preferred the interface that used full natural language sentences (as opposed to keywords, partial sentences and a graphical interface). Hence, it is worth considering whether a natural language representation of metadata could serve as a good solution for novices to the Semantic Web (such as many social scientists). In order to investigate this possibility a tool named LIBER was developed, which uses natural language to provide access to metadata. This paper presents a comparative study that was set up to assess and explore the querying and browsing interface of LIBER.

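To make the statement form Property (subject, object) from this section concrete, the following minimal Python sketch stores the New York example as a handful of statements and answers one small query over them. It is an illustration only and not how LIBER or Longwell represents data: the identifiers state_ny and city_ny, the property name "type" used for class membership, and both helper functions are assumptions introduced here. Only the class names (State, City) and the properties hasCapital and Name come from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement, read as Property(subject, object)."""
    prop: str
    subject: str
    obj: str


# Hypothetical instance identifiers and a hypothetical "type" property;
# State, City, hasCapital and Name are the names used in the text.
TRIPLES = [
    Triple("type", "state_ny", "State"),
    Triple("type", "city_ny", "City"),
    Triple("Name", "state_ny", "New York"),
    Triple("Name", "city_ny", "New York"),
    Triple("hasCapital", "state_ny", "city_ny"),
]


def objects_of(prop, subject, triples=TRIPLES):
    """Return every object o for which prop(subject, o) is stated."""
    return [t.obj for t in triples if t.prop == prop and t.subject == subject]


def states_with_capital_named(name, triples=TRIPLES):
    """Return the names of all State instances whose capital has the given Name."""
    result = []
    for t in triples:
        if t.prop == "type" and t.obj == "State":
            for capital in objects_of("hasCapital", t.subject, triples):
                if name in objects_of("Name", capital, triples):
                    result.extend(objects_of("Name", t.subject, triples))
    return result


print(states_with_capital_named("New York"))
```

Even this small query has to distinguish a class (State) from a property (hasCapital) and from a value ('New York'), which is the distinction the study below examines with non-technical users.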
Workshop on Visual Interfaces to the Social and Semantic Web (VISSW2010), IUI2010, Feb 7, 2010, Hong Kong, China. Copyright is held by the author/owner(s).

INTERFACES FOR QUERY CONSTRUCTION
LIBER (Language Interface for Browsing and Editing RDF) was developed for providing access to descriptions of social science resources (e.g. papers, statistical datasets, interview transcripts) held in a data repository. The interface (driven by a number of ontologies) enables users to find resources in the repository through querying and browsing of metadata, and to deposit new resources with a metadata description. Each component of the LIBER interface uses natural language generation to present information to the user through the WYSIWYM (What You See Is What You Meant) approach [12]. WYSIWYM has been used by a number of other projects, such as MILE [10] and CLEF [5]. The positive results from these projects [4, 11] suggest that WYSIWYM could be a suitable approach to use for constructing and accessing metadata.

With WYSIWYM a system generates a feedback text for the user that is based on a semantic representation. The representation includes generic phrases, or ‘anchors’, which correspond to objects in the description. Each object has a pop-up menu which lists the properties it can have; to add information, the user selects a property and provides an appropriate value. In LIBER, properties of objects are used in queries, which may also include boolean operators (‘and’, ‘or’, ‘not’) and optional elements. Results are presented as the query is constructed.

As many other querying tools have been developed in the Semantic Web community, we could compare LIBER’s querying and browsing modules to existing systems. The question of which approach (natural language, graphics, faceted browsing) produces more usable interfaces is far from settled. We were therefore interested in comparing the natural language interface of LIBER to one that uses a different approach. Kaufmann & Bernstein [7] describe an evaluation study in which they compared four querying interfaces: a graphical interface, a controlled language interface, a natural language interface that uses confirmation dialogues for disambiguation (Querix), and a natural language interface that identifies relevant key phrases in the search term. The study showed that all natural language interfaces outperformed the graphical interface and that subjects preferred Querix and achieved the best results with it. We decided to use a similar set-up and materials for our evaluation, so we could adopt a simple ontology and have a reference point for the evaluation results.

We compare the LIBER interface with Longwell [8], a web-based RDF-powered faceted browser developed by the SIMILE project at MIT. Longwell takes an RDF dataset as input, and creates a website in which the data can be browsed and filtered using classes, properties and keywords. The user browses through the dataset by clicking hyperlinks (which correspond to classes, properties and values) and keyword searching; each click and keyword search adds (or removes) a filter. Longwell thus uses the web paradigm to present information rather than natural language, and we were interested to see which would prove more effective and/or popular.

Following Kaufmann & Bernstein’s study, it might be expected that users would be more accurate and complete tasks more quickly with the natural language tool LIBER than with the faceted browser Longwell. Realistically, we knew this inference might not apply as that study compared the natural language based interface to a graphical interface, while Longwell is a faceted browser; moreover, Longwell was developed by a company and has a user community, while Kaufmann & Bernstein produced their own graphical interface, so we cannot be sure that its deficiencies reflect those of such interfaces in general.

EXPERIMENTAL STUDY
Before describing the experiment, we note that there can be problems with interpreting comparison studies. Importantly, it can be difficult to attribute differences in performance to specific design features, such as the use of a natural language interface, as such choices necessitate many other differences in the design. For example, a badly executed natural language based design might be outperformed by another interface, whereas a well-executed natural language design might perform better.

Methodology
Twenty students and researchers with backgrounds in various social science related disciplines participated, one of whom did not finish the experiment and was excluded (N=19). None had previous experience with LIBER or Longwell, and only two had used an ontology before. Subjects were asked to supply some background information, then were handed a one-page description of one of the tools and were asked to follow the instructions to become acquainted with its operation. They then received four questions to answer, and were asked to find the answer using the tool without relying on their own general knowledge about the world. When finished, subjects were asked to fill out a SUS questionnaire [1], a standardized usability test containing ten questions (e.g. ‘I felt very confident using the system’) which are rated on a 5-point Likert scale. This procedure was repeated for the other tool. Afterwards, subjects were asked to complete a questionnaire in which the tools were compared directly. On average subjects needed about 45 min to finish the task.

Both the order of the tools and the order of the questions were varied per subject. For both tools we recorded the answers the subjects provided and the time it took to answer a question, and made video captures of the screen for qualitative analysis. To drive both tools, we used a simple ontology that models the geography of the USA, which was developed for Kaufmann & Bernstein’s study and is available online (http://www.ifi.uzh.ch/ddis/research/semweb/talking-to-the-semantic-web/owltest-data/). It is not faithful to the real world situation (Alaska appears to have the smallest state area, for example), but this made it easier to prevent subjects from relying on their own knowledge and thus biasing the results.

We used two sets of questions, which were based on those used by Kaufmann & Bernstein in their study. One of the two sets is exemplified below:
  1. What is the area of Alaska?
  2. How many lakes are there in Florida?
  3. Which states contain a city called Springfield?
  4. Which rivers run through the state that contains the largest city in the US?

Figures 1, 2 and 3 show screenshots of LIBER and Figures 4, 5 and 6 show screenshots of Longwell, where the user is searching for the answer to the question 'Which states contain a city called Springfield?'. Both interfaces support multiple strategies for finding this answer; the screenshots portray merely one of them. In LIBER this user has created a search term that provides the answer without further browsing, by searching for all states which have the property 'hasCity' whose value is a city with the name 'Springfield'; the answer appears when the user presses 'search'. In Longwell, the user has first added a filter 'city' to select all cities, then another filter on the name (Springfield), and finally opened the facet 'cityOf' on the right-hand side to view the four states.

Figure 1. LIBER: The user chooses the property 'Has city'.
Figure 2. LIBER: The user specifies the name of the city.
Figure 3. LIBER: Search results for question 3.
Figure 4. Longwell: The user clicks 'city'.
Figure 5. Longwell: The user clicks 'Springfield' in the 'Name' filter.
Figure 6. Longwell: The user opens the facet 'cityOf' to view the results.

Results: Comparative Analysis
Two-tailed paired t-tests show that the Longwell interface outperformed the LIBER interface in terms of completion time (LIBER: mean 191.6 s, SD 57.1 s; Longwell: mean 96.5 s, SD 30.0 s; p < 0.001) and SUS score (LIBER: mean 37.63, SD 18.11; Longwell: mean 61.16, SD 19.65; p < 0.001). Subjects failed to complete tasks more often in LIBER (missing answers: LIBER: mean 0.47, SD 0.62; Longwell: mean 0.11, SD 0.32; p = 0.015). They gave more incorrect answers on average in Longwell (wrong answers: LIBER: mean 0.58, SD 1.02; Longwell: mean 0.84, SD 0.90; p = 0.384), but this difference was not statistically significant. When asked to compare LIBER and Longwell directly, all but three users preferred Longwell; opinions on reliability were more divided but still in favour of Longwell (11 subjects).

Results: Screen Capture Analysis
We recorded screen captures and annotated the strategies that subjects employed in carrying out the querying task. Some videos did not record properly, so the analysis covers the remaining recordings (N=16). Analysis of
the data helped us to identify common errors, delaying factors and misunderstandings as reported below.

Strategies
A clear difference was found between the preferred strategy employed in subjects’ initial use of the LIBER interface and the way in which subjects used LIBER over time. In answering the first question, the most frequently used strategy (7 subjects) was phrasing a query that, when submitted, retrieves the correct answer immediately, without need for further browsing. Five subjects used a different strategy: they formed a small query and used the LIBER browsing interface to find the final answer. From the second question onwards the “query then browse” strategy dominated (used by 10, 8 and 7 subjects respectively).

With the Longwell interface the most popular strategy for finding answers to the questions was to use the provided descriptions rather than the filters. This preference was independent of the type of the question as well as of the experience with the interface that was built up during the task.

Errors
In general, subjects appeared to gain little understanding from the interfaces of how the data in the geographical ontology was modelled (e.g., classes, properties and values). For instance, in both interfaces subjects entered keywords such as ‘largest city’ (LIBER 4 subjects; Longwell 9 subjects). This shows the extent to which subjects are used to other types of search engines (e.g. a web search on ‘largest city’ will list the pages that include these search terms), and had difficulty adapting to search strategies suitable for RDF, which simply lists population sizes without comparing them. To search RDF you therefore need a different search strategy: a query that finds those population sizes and then compares them for you.

Compared to Longwell, in LIBER subjects made more mistakes that can be ascribed to minor issues in the interface, such as those caused by not moving values to boxes for inclusion in the query before confirming the query (18 subjects), and those caused by usage of the ‘optional’ checkbox (7 subjects). Most of these situations were catered for in that LIBER provided a warning or clarification, which brought subjects back on track. Still, in LIBER some errors seem to be specific to the natural language interface, like assigning a property or value to the wrong object (e.g. looking for lakes called ‘Florida’, rather than for ‘lakes in a state called Florida’) (4 subjects).

With Longwell fewer things could go wrong but, most likely because subjects did not receive any feedback on what went wrong, the same errors were made repeatedly. Compared to LIBER, errors were of a different kind, such as selecting the wrong value for both filters (5 subjects) and descriptions (2 subjects), browsing through only one of multiple results (3 subjects), typos (5 subjects), and misinterpretations of descriptions (5 subjects).

Delays
With both interfaces, subjects sometimes appeared unsure whether all matches were found (Longwell, 5 subjects). In LIBER this happened when the system stated the number of matches to the query without actually listing them (6 subjects), or when only one match was found (4 subjects). In contrast, it also happened that browsing was stopped after only a partial answer was found (LIBER, 5 subjects; Longwell, 4 subjects). In Longwell, subjects often clicked on links that did not lead them to anything useful, like the description of the ontology itself rather than the instances (10 subjects). In LIBER uncertainties appeared in the selection of menu items (8 subjects) and there were some interface issues that caused delays in task performance; for instance, many subjects had trouble closing pop-up windows (11 subjects) or browsing windows (9 subjects). Many of them also experienced focus issues with pop-up windows; it was not understood that pop-up windows needed to be closed before a task could be continued (11 subjects).

DISCUSSION
From the experimental data, it is clear that subjects preferred Longwell over LIBER and that they performed better with Longwell than with LIBER in almost all respects. It should be noted, however, that subjects felt that both interfaces were needlessly complicated. While the subjects’ preference for Longwell might help in choosing between the two applications at the current time, we are more interested in what the experiment tells us about the task of performing complex queries, and in how to improve interfaces to support this activity.

When contrasting the difficulties encountered in the LIBER interface with the comparatively fluid performance in Longwell, we see that with Longwell subjects generally used the same strategy in answering all four questions. In contrast, with LIBER subjects learned while working on the task that a browsing facility is available and that spending less time on a perfect query yielded better results. This indicates that novice users’ initial expectations of the querying interface are incorrect. With LIBER many errors and delays can be attributed to minor usability issues in the interface, although some issues do appear to be related to the interface style. The analysis of the screen captures helped to identify areas where the LIBER interface might be improved, such as clarification of the ‘optional’ checkbox and handling of pop-ups and browsing windows. Compared to LIBER, in Longwell fewer things can go wrong: users click on links and end up somewhere else (useful or not). Because of their familiarity with the web paradigm, users may explore the interface more confidently, as they can backtrack when they find themselves on an irrelevant page.

CONCLUSIONS
This paper described a study that was performed to help in the design and refinement of LIBER’s interfaces for querying and browsing metadata. The study compares subjects’ performance using LIBER with the existing Longwell interface, which provides a benchmark for performance. The study allows us to look at differences in
interaction strategy, and to identify issues which may be associated with the interface style, including the use of natural language. The study has focused on initial use of tools for querying and browsing metadata by researchers with backgrounds in social science, yielding insight into the difficulties experienced by casual, non-technical users when operating an interface to an unknown database that nevertheless covered a general domain. A longer training time or a more longitudinal study could well yield different results, and could help to improve the system for use by more experienced users. Also, the use of a database that is less simple, as well as more relevant for the subjects, might make a difference in that subjects would have intuitions and expectations about the ontology used for representing the data, which would be more representative of real world use.

In general, it was found that subjects who do not have any knowledge of RDF data or SQL querying seem to have difficulties recognizing and distinguishing concepts like classes, properties and values and the way in which they are defined in the ontology used in this study. Subjects seemed to rely on their methods for searching the internet, without realizing that different rules apply to metadata and the particular database that was used for the study. Neither LIBER nor Longwell provides the user with sufficient information about what type of input the system expects. In other words, neither LIBER nor Longwell has yet succeeded in providing an interface that supports users in efficiently constructing metadata-based queries.

We believe that the usability of LIBER and Longwell (and natural language interfaces and faceted browsers in general) depends on a number of factors that will vary between and even within domains, such as:
  - The experience of users with ontologies and other metadata;
  - The data described by the ontologies (for instance, a recipe is more usually described in natural language than geographical data);
  - The type of interfaces that users normally utilise (those used to working with databases through e.g. Access would prefer Longwell);
  - The size of the ontologies, and the number of individuals within them (large numbers of individuals might cause the generation of very long and therefore confusing descriptions in LIBER);
  - The mix of tasks and goals which might have an effect on strategy (e.g. users may have a whole range of interaction types with a browsing system depending on their goals and mode of working);
  - The heterogeneity of the data (Longwell's filters work better if each individual has the same set of properties, while LIBER generates separate menus for each individual, and can thus deal better with heterogeneity).

Further studies should evaluate each of these factors separately in order to provide a better understanding of interfaces to support ontology-based queries.

ACKNOWLEDGMENTS
This research is funded by SFI as part of the CNGL project and the ESRC as part of the PolicyGrid project.

REFERENCES
1. J. Brooke, SUS: a "quick and dirty" usability scale, in: P. Jordan, B. Thomas, B. Weerdmeester, A. McClelland (eds.), Usability Evaluation in Industry, Taylor and Francis, London, 1996.
2. D. De Roure, N. Jennings, N. Shadbolt, The Semantic Grid: Past, Present and Future. Proceedings of the IEEE, 93(3), 2005.
3. P. Edwards, A. Chorley, F. Hielkema, E. Pignotti, A. Preece, C. Mellish, J. Farrington, Using the Grid to Support Evidence-Based Policy Assessment in Social Science. In Proc. UK e-Science All Hands Meeting, Nottingham, 2007.
4. C. Hallett, D. Scott, and R. Power. Composing Questions through Conceptual Authoring. Computational Linguistics, 33(1) (2007) 105–133.
5. C. Hallett. Generic Querying of Relational Databases using Natural Language Generation Techniques. In Proc. INLG’06, pages 88–95, Nottingham, UK, 2006.
6. S. Handschuh, S. Staab, A. Maedche, CREAM: creating relational metadata with a component-based, ontology-driven annotation framework. In Proc. K-CAP’01, ACM Press, Victoria, British Columbia, Canada, 2001.
7. E. Kaufmann, A. Bernstein, How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users? In Proc. ISWC’07, vol. 4825 of LNCS, Springer Verlag, Busan, Korea, 2007.
8. Longwell. http://simile.mit.edu/wiki/Longwell
9. M. Petre, Why Looking isn’t always Seeing: Readership Skills and Graphical Programming, Communications of the ACM 38 (6) (1995) 33–44.
10. P. Piwek, R. Evans, L. Cahil, and N. Tipper, Natural Language Generation in the MILE System. In Proc. of IMPACTS in NLG workshop, 33–42, Schloss Dagstuhl, Germany, 2000.
11. P. Piwek, Requirements Definition, Validation, Verification and Evaluation of the CLIME Interface and Language Processing Technology. Technical Report ITRI-02-03, ITRI, University of Brighton, 2002.
12. R. Power, D. Scott, and R. Evans. 1998. What You See Is What You Meant: Direct Knowledge Editing with Natural Language Feedback. In Proceedings of the Thirteenth European Conference on Artificial Intelligence, Brighton, UK.
13. J. Thoméré, K. Barker, V. Chaudhri, P. Clark, M. Eriksen, S. Mishra, B. Porter, A. Rodriguez, A Web-based Ontology Browsing and Editing System. In Proc. AAAI-02, Edmonton, Alberta, Canada, 2002.