=Paper=
{{Paper
|id=None
|storemode=property
|title=Ontology Based Queries - Investigating a Natural Language Interface
|pdfUrl=https://ceur-ws.org/Vol-565/paper2.pdf
|volume=Vol-565
}}
==Ontology Based Queries - Investigating a Natural Language Interface==
Ielka van der Sluis, Computer Science, Trinity College Dublin, vdsluis@cs.tcd.ie
Feikje Hielkema, Computing Science, University of Aberdeen, f.hielkema@abdn.ac.uk
Chris Mellish, Computing Science, University of Aberdeen, c.mellish@abdn.ac.uk
Gavin Doherty, Computer Science, Trinity College Dublin, gavin.doherty@cs.tcd.ie
ABSTRACT
In this paper we look at what may be learned from a comparative study in which non-technical users with a background in social science browsed and queried metadata. Four query tasks were carried out with a natural language interface and with an interface that uses a web paradigm with hyperlinks. It can be difficult to attribute differences in performance to specific design features. Even so, a qualitative analysis of user behavior provides some insight into the task and into problematic aspects of existing interfaces. In general, casual subjects had difficulty recognizing typical ontology-based concepts such as objects, attributes and values.

Author Keywords
Querying and browsing, metadata, evaluation, natural-language interfaces, web-based interfaces.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Workshop on Visual Interfaces to the Social and Semantic Web (VISSW2010), IUI2010, Feb 7, 2010, Hong Kong, China. Copyright is held by the author/owner(s).

INTRODUCTION
The advent of Semantic Web technologies [2] has generated a number of challenges relating to the use of technology by domain experts and researchers in areas such as social science [3]. Among the questions to be addressed are the following:
- the extent to which these researchers are comfortable with the Web as a framework for research practice, and in particular for collaboration;
- whether ontologies are appropriate (and acceptable) to this community as a way of representing concepts to facilitate their research activities;
- the utility (or otherwise) of existing metadata frameworks in use in the social sciences;
- how best to integrate e-science tools and methods into existing working practices.

A key aspect is support for the creation of metadata and for access to resources annotated with semantic metadata. This semantic metadata is captured with RDF (Resource Description Framework; www.w3.org/RDF/) statements of the type Property(subject, object). The semantics of these statements are defined by OWL ontologies (www.w3.org/TR/owl-features/). These ontologies consist of classes (e.g. City, State) and properties (e.g. hasCapital, Name). The RDF statements describe instances of these classes (e.g. ‘The State of New York, whose capital is New York’). RDF is commonly serialized as XML (RDF/XML) and is potentially difficult for most non-technical users to understand. This paper focuses on browsing RDF and on the task of constructing complex queries.

Support for these activities for casual, non-technical users is an important challenge for the entire Semantic Web research community. Most members of the social science community are unfamiliar with complex formalisms such as RDF, so they form a representative group of non-technical users of the Semantic Web. Non-technical users may benefit from what the Semantic Web offers. However, they may be deterred by its complexity and by the need to learn to use graphical representations or controlled languages. Well-designed graphical tools can provide advantages. Still, tools that use graphical representations (e.g. CREAM [6] or SHAKEN [13]) may be difficult to interpret for users unused to complex graphical presentations or ontologies. For instance, Petre [9] argues that graphical readership is an acquired skill, and describes experiments on reading comprehension of graphical and textual representations. These showed that for some tasks people process graphical representations significantly more slowly than text, with novices suffering from misreadings and confusion.

Kaufmann and Bernstein [7] compared four different query interfaces for the Semantic Web in an experiment. They demonstrated that naive users preferred the interface that used full natural language sentences (as opposed to keywords, partial sentences or a graphical interface). It is therefore worth considering whether a natural language representation of metadata could serve as a good solution for novices to the Semantic Web (such as many social scientists). To investigate this possibility, a tool named LIBER was developed, which uses natural language to provide access to metadata. This paper presents a comparative study that was set up to assess and explore the querying and browsing interface of LIBER.

INTERFACES FOR QUERY CONSTRUCTION
LIBER (Language Interface for Browsing and Editing RDF) was developed to provide access to descriptions of social science resources held in a data repository (e.g. papers, statistical datasets, interview transcripts). The interface (driven by a number of ontologies) enables users
to find resources in the repository by querying and browsing metadata, and to deposit new resources with a metadata description. Each component of the LIBER interface uses natural language generation to present information to the user through the WYSIWYM (What You See Is What You Meant) approach [12]. WYSIWYM has been used by a number of other projects, such as MILE [10] and CLEF [5]. The positive results from these projects [4, 11] suggest that WYSIWYM could be a suitable approach for constructing and accessing metadata.

With WYSIWYM, the system generates a feedback text for the user that is based on a semantic representation. This text includes generic phrases, or ‘anchors’, which correspond to objects in the description. Each object has a pop-up menu that lists the properties it can have. To add information, the user selects a property and provides an appropriate value. In LIBER, queries are built from the properties of objects. They may also include Boolean operators (‘and’, ‘or’, ‘not’) and optional elements. Results are presented as the query is constructed.

Many other querying tools have been developed in the Semantic Web community, so we could compare LIBER’s querying and browsing modules to existing systems. The question of which approach (natural language, graphics, faceted browsing) produces more usable interfaces is far from settled. We were therefore interested in comparing the natural language interface of LIBER to one that uses a different approach.

Kaufmann & Bernstein [7] describe an evaluation study in which they compared four querying interfaces:
- a graphical interface;
- a controlled language interface;
- a natural language interface that uses confirmation dialogues for disambiguation (Querix);
- a natural language interface that identifies relevant key phrases in the search term.

The study showed that all the natural language interfaces outperformed the graphical interface. Subjects also preferred Querix and achieved the best results with it. We decided to use a similar set-up and similar materials for our evaluation. This let us adopt a simple ontology and gave us a reference point for the evaluation results.

We compare the LIBER interface with Longwell [8], a web-based, RDF-powered faceted browser developed by the SIMILE project at MIT. Longwell takes an RDF dataset as input and creates a website in which the data can be browsed and filtered using classes, properties and keywords. The user browses through the dataset by clicking hyperlinks (which correspond to classes, properties and values) and by keyword searching. Each click and each keyword search adds (or removes) a filter. Longwell thus uses the web paradigm rather than natural language to present information, and we were interested to see which would prove more effective and/or popular.

Following Kaufmann & Bernstein’s study, it might be expected that users would be more accurate, and would complete tasks more quickly, with the natural language tool LIBER than with the faceted browser Longwell. Realistically, we knew this inference might not apply. That study compared the natural language based interface to a graphical interface, while Longwell is a faceted browser. Moreover, Longwell was developed by a company and has a user community, while Kaufmann & Bernstein produced their own graphical interface. We therefore cannot be sure that the deficiencies of their graphical interface reflect those of such interfaces in general.

EXPERIMENTAL STUDY
Before describing the experiment, we note that comparison studies can be difficult to interpret. Importantly, it can be difficult to attribute differences in performance to specific design features, such as the use of a natural language interface. Such design choices necessitate many other differences in the design. For example, a badly executed natural language based design might be outperformed by another interface, whereas a well-executed natural language design might perform better.

Methodology
Twenty students and researchers with backgrounds in various social-science-related disciplines participated. One of them did not finish the experiment and was excluded (N=19). None had previous experience with LIBER or Longwell, and only two had used an ontology before.

Subjects were first asked to supply some background information. They were then handed a one-page description of one of the tools and asked to follow its instructions to become acquainted with the tool's operation. Next, they received four questions and were asked to find the answers using the tool, without relying on their own general knowledge about the world. When finished, subjects filled out a SUS questionnaire [1]. This is a standardized usability questionnaire of ten items (e.g. ‘I felt very confident using the system’), each rated on a 5-point Likert scale. The procedure was then repeated for the other tool. Afterwards, subjects completed a questionnaire in which the tools were compared directly. On average, subjects needed about 45 minutes to finish the task. Both the order of the tools and the order of the questions were varied across subjects.

For both tools we recorded the answers the subjects provided and the time it took to answer each question. We also made video captures of the screen for qualitative analysis.

To drive both tools, we used a simple ontology that models the geography of the USA. It was developed for Kaufmann & Bernstein’s study and is available online at http://www.ifi.uzh.ch/ddis/research/semweb/talking-to-the-semantic-web/owltest-data/. The ontology is not faithful to the real-world situation (for example, Alaska appears to have the smallest state area). However, this made it easier to prevent subjects from relying on their own knowledge and thus biasing the results. We used two sets of questions, based on those used by Kaufmann & Bernstein in their study. One of the two sets is shown below:
1. What is the area of Alaska?
2. How many lakes are there in Florida?
3. Which states contain a city called Springfield?
4. Which rivers run through the state that contains the largest city in the US?

Figure 1. LIBER: The user chooses the property 'Has city'.
Figure 4. Longwell: The user clicks 'city'.
Figure 5. Longwell: The user clicks 'Springfield' in the 'Name' filter.
Figure 6. Longwell: The user opens the facet 'cityOf' to view the results.

Figures 1, 2 and 3 show screenshots of LIBER, and Figures 4, 5 and 6 show screenshots of Longwell. In each, the user is searching for the answer to the question 'Which states contain a city called Springfield?'. Both interfaces support multiple strategies for finding this answer, and the screenshots portray only one of them.

In LIBER, this user has created a search term that provides the answer without further browsing. It searches for all states that have the property 'hasCity' whose value is a city named 'Springfield'. The answer appears when the user presses 'search'.

In Longwell, the user first added a filter 'city' to select all cities, then added another filter on the name (Springfield). Finally, the user opened the facet 'cityOf' on the right-hand side to view the four states.
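The two strategies for question 3 can be made concrete with a minimal Python sketch over a toy set of Property(subject, object) statements. The first strategy is a single LIBER-style query. The second is a Longwell-style chain of facet filters. The sketch also covers question 4, where the data only list population sizes, so the query itself has to compare them.

Several things here are assumptions, not part of the study:
- All data and identifiers (stateA, city1, river1, ...) are invented.
- The properties population and flowsThrough are hypothetical.
- cityOf is assumed to be the inverse of hasCity.
- None of this is taken from the Kaufmann & Bernstein ontology.

The helper functions are hypothetical illustrations, not LIBER's or Longwell's implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement of the form Property(subject, object)."""
    prop: str
    subj: str
    obj: object


# Hypothetical toy data: all identifiers and numbers are invented for illustration.
TRIPLES = [
    Triple("type", "stateA", "State"),
    Triple("type", "stateB", "State"),
    Triple("type", "stateC", "State"),
    Triple("type", "city1", "City"), Triple("name", "city1", "Springfield"),
    Triple("type", "city2", "City"), Triple("name", "city2", "Springfield"),
    Triple("type", "city3", "City"), Triple("name", "city3", "Riverton"),
    Triple("hasCity", "stateA", "city1"),
    Triple("hasCity", "stateB", "city2"),
    Triple("hasCity", "stateC", "city3"),
    Triple("population", "city1", 120),  # hypothetical property and values
    Triple("population", "city2", 150),
    Triple("population", "city3", 900),
    Triple("flowsThrough", "river1", "stateC"),  # hypothetical property
    Triple("flowsThrough", "river2", "stateC"),
    Triple("flowsThrough", "river3", "stateA"),
]
# Assumption: 'cityOf' is stored as the inverse of 'hasCity', as Longwell's facet suggests.
TRIPLES += [Triple("cityOf", t.obj, t.subj) for t in TRIPLES if t.prop == "hasCity"]


def objects(prop, subj):
    """All values of property `prop` for subject `subj`."""
    return {t.obj for t in TRIPLES if t.prop == prop and t.subj == subj}


def subjects(prop, obj):
    """All subjects that have property `prop` with value `obj`."""
    return {t.subj for t in TRIPLES if t.prop == prop and t.obj == obj}


def states_with_city_named(name):
    """Query style (as in LIBER): 'State that hasCity a City whose name is <name>'."""
    return {s for s in subjects("type", "State")
            if any(name in objects("name", c) for c in objects("hasCity", s))}


def facet_filter(items, prop, value):
    """Browse style (as in Longwell): each click keeps items whose `prop` includes `value`."""
    return {i for i in items if value in objects(prop, i)}


def facet_values(items, prop):
    """Values listed when facet `prop` is opened for the current set of items."""
    return set().union(*(objects(prop, i) for i in items))


def rivers_through_state_with_largest_city():
    """Question 4: populations are only listed, so the query must compare them itself."""
    cities = subjects("type", "City")
    largest = max(cities, key=lambda c: max(objects("population", c)))
    return {r for s in subjects("hasCity", largest) for r in subjects("flowsThrough", s)}


if __name__ == "__main__":
    print(sorted(states_with_city_named("Springfield")))
    everything = {t.subj for t in TRIPLES}
    cities = facet_filter(everything, "type", "City")
    springfields = facet_filter(cities, "name", "Springfield")
    print(sorted(facet_values(springfields, "cityOf")))
    print(sorted(rivers_through_state_with_largest_city()))
```

On this toy data both question-3 strategies print ['stateA', 'stateB'], and the question-4 query prints ['river1', 'river2']. A keyword search for 'largest city' has nothing to match here. The answer only emerges once a query compares the population values.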
Figure 2. LIBER: The user specifies the name of the city.

Results: Comparative Analysis
Two-tailed paired t-tests show that the Longwell interface outperformed the LIBER interface on two measures:
- Completion time: LIBER mean 191.6 s, SD 57.1 s; Longwell mean 96.5 s, SD 30.0 s; p < 0.001.
- SUS score: LIBER mean 37.63, SD 18.11; Longwell mean 61.16, SD 19.65; p < 0.001.

Subjects failed to complete tasks more often in LIBER (missing answers: LIBER mean 0.47, SD 0.62; Longwell mean 0.11, SD 0.32; p = 0.015). They tended to give more incorrect answers in Longwell, although this difference was not significant (wrong answers: LIBER mean 0.58, SD 1.02; Longwell mean 0.84, SD 0.90; p = 0.384).

When asked to compare LIBER and Longwell directly, all but three subjects preferred Longwell. Opinions on reliability were more divided but still in favour of Longwell (11 subjects).
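The paper reports SUS scores and paired t-tests but gives neither the scoring procedure nor the per-subject data. The sketch below therefore assumes the standard SUS scoring described by Brooke [1]. Under that scoring, odd-numbered items contribute rating − 1, even-numbered items contribute 5 − rating, and the sum is multiplied by 2.5. The sketch also computes the paired t statistic on per-subject differences. The function names sus_score and paired_t are hypothetical, and the example numbers are invented, not data from the study.

```python
import math
from statistics import mean, stdev


def sus_score(ratings):
    """Standard SUS scoring (Brooke): ten ratings on a 1-5 scale -> score from 0 to 100.

    Odd-numbered items contribute (rating - 1); even-numbered items contribute (5 - rating).
    """
    if len(ratings) != 10 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    # enumerate() starts at 0, so index 0 is item 1 (an odd-numbered item).
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(ratings))
    return total * 2.5


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject values a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Invented ratings and completion times for three subjects; NOT data from the study.
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
    t, df = paired_t([190.0, 210.0, 160.0], [95.0, 120.0, 80.0])
    print(round(t, 2), df)
```

With N = 19 pairs the test has 18 degrees of freedom. The two-tailed 5% critical value of t for 18 degrees of freedom is about 2.10. In practice the p-value would come from a statistics package (e.g. scipy.stats.ttest_rel) rather than from this sketch.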
Figure 3. LIBER: Search results for question 3.

Results: Screen Capture Analysis
We recorded screen captures and annotated the strategies that subjects employed in carrying out the querying task. Some videos did not record properly, so this analysis covers 16 subjects (N=16). Analysis of the data helped us to identify common errors, delaying factors and misunderstandings, as reported below.

Strategies
A clear difference was found between the strategy subjects preferred when first using the LIBER interface and the way they used LIBER over time.
- First question: the most frequent strategy (7 subjects) was to phrase a query that, when submitted, retrieves the correct answer immediately, without the need for further browsing. Five subjects used a different strategy: they formed a small query and used the LIBER browsing interface to find the final answer.
- Second question onwards: this “query then browse” strategy dominated (used by 10, 8 and 7 subjects respectively).

With the Longwell interface, the most popular strategy for finding answers was to use the provided descriptions rather than the filters. This preference did not depend on the type of question, nor on the experience with the interface that subjects built up during the task.

Errors
In general, subjects appeared to gain little understanding from the interfaces of how the data in the geographical ontology were modelled (e.g. classes, properties and values). For instance, in both interfaces subjects entered keywords such as ‘largest city’ (LIBER, 4 subjects; Longwell, 9 subjects). This shows how accustomed subjects are to other types of search engines. For example, a web search on ‘largest city’ will list the pages that include these search terms. Subjects had difficulty adapting to search strategies suitable for RDF data, which simply list population sizes without comparing them. Searching RDF therefore requires a different strategy: a query that finds those population sizes and then compares them for you.

Compared to Longwell, subjects made more mistakes in LIBER that can be ascribed to minor issues in the interface. These included not moving values into boxes for inclusion in the query before confirming the query (18 subjects), and errors involving the ‘optional’ checkbox (7 subjects). LIBER catered for most of these situations by providing a warning or clarification, which brought subjects back on track. Still, some errors in LIBER seem to be specific to the natural language interface. An example is assigning a property or value to the wrong object, such as looking for lakes called ‘Florida’ rather than for ‘lakes in a state called Florida’ (4 subjects).

With Longwell, fewer things could go wrong. However, subjects made the same errors repeatedly, most likely because they received no feedback on what went wrong. The errors were also of a different kind than in LIBER:
- selecting the wrong value in filters (5 subjects) and in descriptions (2 subjects);
- browsing through only one of multiple results (3 subjects);
- typos (5 subjects);
- misinterpretations of descriptions (5 subjects).

Delays
With both interfaces, subjects sometimes appeared unsure whether all matches had been found (Longwell, 5 subjects). In LIBER this happened when the system stated the number of matches to the query without actually listing them (6 subjects), or when only one match was found (4 subjects). Conversely, subjects sometimes stopped browsing after finding only a partial answer (LIBER, 5 subjects; Longwell, 4 subjects).

In Longwell, subjects often clicked on links that did not lead them to anything useful, such as the description of the ontology itself rather than the instances (10 subjects). In LIBER, uncertainties arose in the selection of menu items (8 subjects), and some interface issues caused delays in task performance. For instance, many subjects had trouble closing pop-up windows (11 subjects) or browsing windows (9 subjects). Many of them also experienced focus issues with pop-up windows: they did not understand that pop-up windows needed to be closed before the task could continue (11 subjects).

DISCUSSION
From the experimental data, it is clear that subjects preferred Longwell over LIBER and performed better with Longwell in almost all respects. It should be noted, however, that subjects felt that both interfaces were needlessly complicated. The subjects’ preference for Longwell might help in choosing between the two applications at the current time. We are more interested, though, in what the experiment tells us about the task of performing complex queries, and in how to improve interfaces to support this activity.

The difficulties encountered in LIBER contrast with the comparatively fluid performance in Longwell. With Longwell, subjects generally used the same strategy for all four questions. With LIBER, by contrast, subjects learned while working on the task that a browsing facility is available, and that spending less time on a perfect query yielded better results. This indicates that novice users’ initial expectations of the querying interface are incorrect.

With LIBER, many errors and delays can be attributed to minor usability issues in the interface, although some issues do appear to be related to the interface style. The analysis of the screen captures helped to identify areas where the LIBER interface might be improved. Examples are clarification of the ‘optional’ checkbox and the handling of pop-ups and browsing windows.

Compared to LIBER, fewer things can go wrong in Longwell: users click on links and end up somewhere else (useful or not). Because they are familiar with the web paradigm, users may explore the interface more confidently, as they can backtrack when they find themselves on an irrelevant page.

CONCLUSIONS
This paper described a study that was performed to help in the design and refinement of LIBER’s interfaces for querying and browsing metadata. The study compares subjects’ performance using LIBER with the existing Longwell interface, which provides a benchmark for performance. The study allows us to look at differences in
interaction strategy, and to identify issues which may be associated with the interface style, including the use of natural language.

The study has focused on the initial use of tools for querying and browsing metadata by researchers with backgrounds in social science. It yields insight into the difficulties that casual, non-technical users experience when operating an interface to an unknown database, even one that covers a general domain. A longer training time or a more longitudinal study could well yield different results, and could help to improve the system for use by more experienced users. A database that is less simple and more relevant to the subjects might also make a difference. Subjects would then have intuitions and expectations about the ontology used to represent the data, which would be more representative of real-world use.

In general, subjects without any knowledge of RDF data or SQL querying seem to have difficulty recognizing and distinguishing concepts such as classes, properties and values. They also struggle with the way these concepts are defined in the ontology used in this study. Subjects seemed to rely on their methods for searching the internet, without realizing that different rules apply to metadata and to the particular database used in the study. Neither LIBER nor Longwell provides the user with sufficient information about what type of input the system expects. In other words, neither has yet succeeded in providing an interface that supports users in efficiently constructing metadata-based queries.

We believe that the usability of LIBER and Longwell (and of natural language interfaces and faceted browsers in general) depends on a number of factors that will vary between and even within domains, such as:
- The experience of users with ontologies and other metadata;
- The data described by the ontologies (for instance, a recipe is more usually described in natural language than geographical data is);
- The type of interfaces that users normally utilise (e.g. those used to working with databases through Access would prefer Longwell);
- The size of the ontologies and the number of individuals within them (large numbers of individuals might cause the generation of very long, and therefore confusing, descriptions in LIBER);
- The mix of tasks and goals, which might have an effect on strategy (e.g. users may have a whole range of interaction types with a browsing system, depending on their goals and mode of working);
- The heterogeneity of the data (Longwell's filters work better if each individual has the same set of properties, while LIBER generates separate menus for each individual and can thus deal better with heterogeneity).

Further studies should evaluate each of these factors separately in order to provide a better understanding of interfaces that support ontology-based queries.

ACKNOWLEDGMENTS
This research is funded by SFI as part of the CNGL project and by the ESRC as part of the PolicyGrid project.

REFERENCES
1. J. Brooke, SUS: a "quick and dirty" usability scale. In: P. Jordan, B. Thomas, B. Weerdmeester, A. McClelland (eds.), Usability Evaluation in Industry, Taylor and Francis, London, 1996.
2. D. De Roure, N. Jennings, N. Shadbolt, The Semantic Grid: Past, Present and Future. Proceedings of the IEEE, 93(3), 2005.
3. P. Edwards, A. Chorley, F. Hielkema, E. Pignotti, A. Preece, C. Mellish, J. Farrington, Using the Grid to Support Evidence-Based Policy Assessment in Social Science. In Proc. UK e-Science All Hands Meeting, Nottingham, 2007.
4. C. Hallett, D. Scott, R. Power, Composing Questions through Conceptual Authoring. Computational Linguistics, 33(1) (2007) 105–133.
5. C. Hallett, Generic Querying of Relational Databases using Natural Language Generation Techniques. In Proc. INLG’06, pages 88–95, Nottingham, UK, 2006.
6. S. Handschuh, S. Staab, A. Maedche, CREAM: creating relational metadata with a component-based, ontology-driven annotation framework. In Proc. K-CAP’01, ACM Press, Victoria, British Columbia, Canada, 2001.
7. E. Kaufmann, A. Bernstein, How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users? In Proc. ISWC’07, vol. 4825 of LNCS, Springer Verlag, Busan, Korea, 2007.
8. Longwell. http://simile.mit.edu/wiki/Longwell
9. M. Petre, Why Looking isn’t always Seeing: Readership Skills and Graphical Programming. Communications of the ACM, 38(6) (1995) 33–44.
10. P. Piwek, R. Evans, L. Cahill, N. Tipper, Natural Language Generation in the MILE System. In Proc. of the IMPACTS in NLG Workshop, pages 33–42, Schloss Dagstuhl, Germany, 2000.
11. P. Piwek, Requirements Definition, Validation, Verification and Evaluation of the CLIME Interface and Language Processing Technology. Technical Report ITRI-02-03, ITRI, University of Brighton, 2002.
12. R. Power, D. Scott, R. Evans, What You See Is What You Meant: Direct Knowledge Editing with Natural Language Feedback. In Proc. of the Thirteenth European Conference on Artificial Intelligence, Brighton, UK, 1998.
13. J. Thoméré, K. Barker, V. Chaudhri, P. Clark, M. Eriksen, S. Mishra, B. Porter, A. Rodriguez, A Web-based Ontology Browsing and Editing System. In Proc. AAAI-02, Edmonton, Alberta, Canada, 2002.