An intelligent query interface
                                     based on ontology navigation
                   Enrico Franconi                    Paolo Guagliardo                          Marco Trevisan
                     franconi@inf.unibz.it         paolo.guagliardo@stud-inf.unibz.it       marco.trevisan@stud-inf.unibz.it


                                                   KRDB Research Centre
                                              Free University of Bozen-Bolzano

ABSTRACT                                                               neous data sources by means of a conceptual schema and
In this paper we present a formal framework and an experi-             supports the users in the task of formulating a precise query
mental software supporting the user in the task of formulat-           over it. In describing a specific domain, the ontology defines
ing a precise query – which best captures their information            a vocabulary which is often richer than the logical schema
needs – even in the case of complete ignorance of the vocab-           of the underlying data and usually closer to the user’s own
ulary of the underlying information system holding the data.           vocabulary. The ontology can thus be effectively exploited
Our intelligent interface is driven by means of appropriate            by the user in order to formulate a query that best captures
automated reasoning techniques over an ontology describ-               their information need. The user is constantly guided and
ing the domain of the data in the information system.                  assisted in this task by an intuitive visual interface, whose
                                                                       intelligence is dynamically driven by reasoning over the on-
We will define what a query is and how it is internally repre-         tology. The inferences drawn on the conceptual schema help
sented, which operations are available to the user in order to         the user in choosing what is more appropriate with respect
modify the query and how contextual feedback is provided               to their information need, restricting the possible choices
about it presenting only relevant pieces of information. We            to only those parts of the ontology which are relevant and
will then describe the elements that constitute the query in-          meaningful in a given context.
terface available to the user, providing visual access to the
underlying reasoning services and operations for query ma-             The most powerful and innovative feature of our framework
nipulation. Lastly, we will define a suitable representation           lies in the fact that not only do not users need to be aware of
in “linear form”, starting from which the query can be more            the underlying organisation of the data, but they are also not
easily expressed in natural language.                                  required to have any specific knowledge of the vocabulary
                                                                       used in the ontology. In fact, such knowledge can be grad-
Author Keywords                                                        ually acquired by using the tool itself, gaining confidence
query formulation support automated reasoning description              with both the vocabulary and the ontology. Users may also
logics ontology visual graphical intelligent user interface            decide to just explore the ontology without actually querying
                                                                       the information system, with the aim of discovering general
                                                                       information about the modelled domain.
ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: User Inter-             Another important aspect is that only queries that are logi-
faces—Graphical user interfaces (GUI)—Natural language                 cally consistent with the context and the constraints imposed
—Interaction styles                                                    by the ontology can be formulated, since contradictory or re-
                                                                       dundant pieces of information are not presented to the user
INTRODUCTION                                                           at all. This makes user’s choices clearer and simpler, by rul-
Recent research showed that adopting formal ontologies as              ing out irrelevant information that might be distracting and
a means for accessing heterogeneous data sources has many              even generate confusion. Furthermore, it also eliminates the
benefits, in that not only does it provide a uniform and flexi-        often frustrating and time-consuming process of finding the
ble approach to integrating and describing such sources, but           right combination of parts that together constitute a mean-
it can also support the final user in querying them, thus im-          ingful query. For this reason, the user is free to explore the
proving the usability of the integrated system.                        ontology without the worry of making a “wrong” choice at
                                                                       some point and can thus concentrate on expressing their in-
We introduce a framework that enables access to heteroge-              formation need at best.

                                                                       Queries can be specified through a refinement process con-
                                                                       sisting in the iteration of few basic operations: the user first
                                                                       specifies an initial request starting with generic terms, then
                                                                       refines or deletes some of the previously added terms or in-
                                                                       troduces new ones, and iterates the process until the resulting
Workshop on Visual Interfaces to the Social                            query satisfies their information need. The available opera-
and Semantic Web (VISSW2010), IUI2010,                                 tions on the current query include addition, substitution and
Feb 7, 2010, Hong Kong, China.
Copyright is held by the author/owner(s).

                                                                   1
deletion of pieces of information, and all of them are sup-            achieved by selecting something within the query and asking
ported by the reasoning services running over the ontology.            for a substitution. In our example, we tick the check-box as-
                                                                       sociated with the term Thing and then press the Substitute
The framework is implemented in the form of an experimen-              button. As shown in Figure 2, we are presented with a three-
tal software, called Query Tool, which relies on a web-based           part menu listing all the possible substitutions available for
client-server architecture consisting of three components:             the selected portion of the query: terms that appear at the top
                                                                       are more general than the selection, the ones in the middle
• the query logic (QueLo), responsible of “reasoning” over             are equivalent, while those at the bottom are more specific.
  the ontology in order to provide only relevant information           Moreover, these terms are organised in sub-menus according
  w.r.t. the current query;                                            to the taxonomic information defined in the ontology. Thus,
                                                                       we can easily navigate the different options choosing the de-
• the natural language generation (NLG) engine, that given             sired level of detail for the substitution. In our case, instead
  a query and a lexicalization map for the ontology produces           of Person we choose the further specific term Rich Person
  an English sentence;                                                 as a replacement for the selection, resulting in the query vis-
• the user interface (UI), that provides visual access to the          ible on the right side of Figure 2.
  query and editing facilities for it, allowing to interact with
  the QueLo sub-system while benefiting from the services
  of the NLG engine.

OVERVIEW
In the paper we will introduce the Query Tool’s formal frame-
work and functional API, that constitute the basis on top of
which different types of user interfaces (UIs) can be devised.
Previous implementations of the Query Tool ([1, 15]) exper-
imented mostly with tree-based and menu-based UIs. In this
section we first describe the behaviour of the Query Tool us-                         Figure 2: Example of specialisation
ing a generic representation over an abstract user interface.
Later on in the paper we will then introduce a concrete inter-         Another way of modifying a query is to add constraints in the
face based on the generation of natural language.                      form of new terms or relations. As shown in Figure 3, upon
                                                                       clicking on the add button a two-part menu is displayed, con-
Consider a scenario in which we have a conceptual schema,              taining suitable terms and relations that can be safely added
say an OWL ontology, we know nothing about. In such situ-              to the query. Terms are shown in the upper part of the menu,
ation, the Query Tool reveals to be particularly useful in that        while relations in the lower one. The chosen item is inserted
it allows to discover information about the ontology and the           in a specific place of the query, that can be selected by means
modelled domain, even when its vocabulary is completely                of the radio button present in each line. Figure 3a shows how
ignored. What we call intensional navigation of the ontol-             the new term Single Person is added to the first (and only)
ogy is the process of building a query, starting from a very           line of the query, while Figure 3b shows how adding a rela-
general request which is then refined by adding or deleting            tion results in the creation of a new line, indented w.r.t. the
constraints according to the user’s information need. In our           first one and consisting of the name of the relation lives in
abstract representation, the default initial query looks like          followed by the label House associated with its range. Ob-
the one shown in Figure 1 that generically asks for some               serve that the menu of Figure 3b, compared to that of Fig-
“thing”. Four operations are available for manipulating the            ure 3a, does not include the term Single Person and the re-
query: add for the addition of new terms and relations; sub-           lation married to Person as possible options. In fact, the
stitute for replacing a portion of the query with a more gen-          former is already present in the query, thus it would be re-
eral, equivalent or more specific term; delete for discarding          dundant to propose it again; the latter became incompatible
parts of the query; and weaken for making a portion of the             with the query due to the addition of the previous term, and
query as general as possible.                                          this means that in our ontology a person who is rich and sin-
                                                                       gle (or perhaps just single) cannot be married to anyone.

                                                                       A query can be made more general or “weaker” in a variety
                                                                       of ways, one of which is the substitution with a more general
                                                                       term. Other possibilities are given by deletion and weaken-
                                                                       ing, both of which remove selected elements from the query
                                                                       but with distinct approaches and outcomes. The difference
                                                                       between them is shown in Figure 4: while in Figure 4a delet-
                                                                       ing the selected portion causes the second row of the query
                                                                       to disappear, in Figure 4b weakening the same portion pre-
                                                                       serves the row, although the term House is replaced with the
                     Figure 1: Initial query                           generic term Thing.

The first step in the refinement of our query consists in be-          Suppose that our ontology states that a rich man who is mar-
ing more specific about what we are looking for. This can be           ried to a beautiful woman and lives in a beautiful house is in-


                                                                   2
                                                                                  Figure 5: Preventing side-effects of substitution


                                 (a)
                                                                        Formal framework
                                                                        From the point of view of the QueLo sub-system, a query is a
                                                                        labelled tree where each node corresponds to a variable and
                                                                        it is associated with a set of concept names from the ontol-
                                                                        ogy, while each edge is labelled with a role name. Therefore,
                                                                        for what concerns the expressive power, the Query Tool can
                                                                        represent tree-shaped conjunctive queries only.

                                                                        Let N be a countably infinite set of node names, C be a finite
                                                                        set of concept names and R a finite set of role names, and let
                                                                        N, C and R be pairwise disjoint. A query Q is a quintuple
                                 (b)                                    hV, E, o, V, Ei in which (V, E) is a directed tree rooted in
                                                                        o ∈ V , with set of nodes V ⊆ N and with set of edges E ⊆
     Figure 3: Addition of (a) a new term and (b) a new relation        V × V ; V is a total function, called node-labelling function,
                                                                        associating each node with either a non-empty set of concept
                                                                        names or with the singleton {>}; and E is called the edge-
                                                                        labelling function that associates each edge with a role name.
                                                                        A query consisting of exactly one node, whose set of labels
                                                                        is a singleton, is called atomic. For an edge e = hx, yi, we
                                                                        indicate its initial node x with init(e) and its terminal node
                                 (a)
                                                                        with ter(e).

                                                                        Given queries S and Q, we say that S is a subquery of Q, and
                                                                        write S ⊆ Q, iff V (S) ⊆ V (Q), E(S) ⊆ E(Q), each node
                                                                        n ∈ V (S) is s.t. VS (n) ⊆ VQ (n) and every edge e ∈ E(S)
                                                                        is such that ES (e) = EQ (e). We say that S is a complete sub-
                                                                        query of Q (in symbols S j Q) if it also holds that, for every
                                                                        n ∈ V (S), VS (n) ⊇ VQ (n) and every descendant of oS in
                                 (b)                                    Q is a node in S. A selection within a query Q is a subquery
                                                                        S of Q, which is called simple if S j Q or S consists of
        Figure 4: Example of (a) deletion and (b) weakening             exactly one node, namely its root oS , such that VS (oS ) is a
                                                                        singleton or is equal to VQ (oS ). Every selection S within a
                                                                        query Q partitions the nodes of Q into selected, which be-
deed a lucky person. As a result of the substitution shown in           long to V (S), and unselected, belonging to V (Q) \ V (S).
Figure 5, where the whole query is replaced with the (more              The selected nodes can be further partitioned into totally se-
general) term Lucky Person, the second line disappears. In              lected, having all of their labels selected, and partially se-
some situations this “side-effect” is undesired, because we             lected, which have some, but not all, of their labels selected.
would perhaps like to operate on that part of the query later           An example of query as represented is shown in Figure 6,
on. The closed padlock icon visible in the third line of the            which also shows the compact graphical notation we use for
query indicates that that line is protected against such an “ac-        representing a selection within a query: selected nodes are
cidental” deletion and would not inadvertently disappear as             drawn using a double circle and selected labels within each
the result of the substitution. However, note that locked por-          of them are underlined.
tions of the query are still fully affected by explicit deletion.
                                                                        The weakening of a query Q w.r.t. a selection S within Q
THE QUERY LOGIC                                                         is the query Q S obtained from Q by replacing its node-
The Query Tool’s framework and its functional API (QueLo)               labelling function VQ with a function that associates each
have been recently defined from a formal point of view in the           totally selected node with {>}, each partially selected node
technical report [7], with the purpose of precisely describ-            n with VQ (n) \ VS (n) and each unselected node m with
ing its components and the abstract operations provided for             VQ (m).
query manipulation. Here, we concisely summarise the main
definitions introduced in [7] and formally prove the most im-           The last notion we introduce is that of sticky edges, which
portant of Query Tool’s properties, namely that it generates            are edges that can only be deleted explicitly (that is, when
only “meaningful” queries.                                              performing a deletion), but never implicitly (e.g., as the con-


                                                                    3
                         x {Man}                                      Algorithm 1 Calculate enc-rollup(Q, n, m)
                                                                      Input: a query Q and two nodes n, m ∈ V (Q)
          marriedTo            livesIn                                Output: a concept C expressing Q in the description logics
                                                                      language L
                 y               w {Beautiful, House}
                                                                       1: C ← c, for some c ∈ V(n)
            {Woman}                                                    2: for all x ∈ V(n) such that x 6= c do
                                   owned by
                                                                       3:       C ←C ux
                                                                       4: end for
                                 z {RichPerson}
                                                                       5: for all children x of n in Q such that x 6= m do
        Figure 6: Example of query and a selection within it
                                                                       6:       R ← E(hn, xi)
                                                                       7:       C ← C u ∃R . enc-rollup(Q, x, n)
                                                                       8: end for
sequence of a substitution). Sticky edges are closed w.r.t. the        9: if n 6= o then
tree structure of the query, that is, when an edge e is sticky,       10:       Let p be the parent node of n in Q
then all the edges in the path from the root of the query to          11:       if p 6= m then
ter(e) are such. The meaning and importance of sticky edges           12:             R ← E(hp, ni)
will become more clear in the next section, where we intro-           13:             C ← C u ∃R− . enc-rollup(Q, p, n)
duce and describe the two operations delete and substitute.           14:       end if
For the moment, sticky edges can be simply understood as              15: end if
immutable (to some extent) pieces of information within a             16: return C
query, which are not modified as a “side effect” of an opera-
tion not directly intended to do so.
                                                                      Given a query Q and a node n, we say that a concept name c
Functional API
                                                                      is compatible with Q focused in n iff c u roll-up(Q, n) 6v ⊥,
To draw the inferences that are at the basis of the query for-        while a role name r is such iff ∃r− . roll-up(Q, n) 6v ⊥. The
mulation tasks, we express a query as a concept of some               operation getComp(Q, n) returns a directed acyclic graph
description logic (DL) language, for which the containment            (DAG) G, whose nodes are all the concept names that are
test of two conjunctive queries is decidable and available as a       compatible with Q focused in n and that are neither sub-
reasoning service. In what follows, we assume the existence           nor super-concepts of roll-up(Q, n), and whose edges are
of an underlying knowledge base K in such a DL language               all the pairs of concept names c1 , c2 ∈ V (G) such that c1
L over C and R. We say that C is a sub-concept of (or sub-            is a direct sub-concept of c2 . In other words, the output of
sumes) D in K, and write C vK D, iff K |= C v D, in                   getComp is a taxonomy of concept names which are com-
which case we also say that D is a super-concept of (or is            patible with the query and not in hierarchy with the context.
subsumed by) C. Two concepts C and D are equivalent in                The operation getRel(Q, n) returns a DAG G, whose nodes
K, written C ≡K D, iff one subsumes the other and vice                are all the pairs hr, ci of role names and concept names such
versa. For two concept names c1 and c2 we say that c1 is a            that r is compatible with Q focused in n and c is a sub- or
direct sub-concept of c2 (and that c2 is a direct super-concept       a super-concept of ∃r− . roll-up(Q, n), and whose edges are
of c1 ) iff c1 subsumes c2 and there is no c ∈ C equivalent to        the pairs hhr, c1 i, hr, c2 ii ∈ V (G) × V (G) such that c1 is a
neither c1 nor c2 and such that c1 vK c vK c2 .                       direct super-concept of c2 .
Before introducing the functional API, let us first give some         Let S be a selection within a query Q. Then, the operations
preliminary definitions. Given a query Q and n ∈ V (Q), the           getSupers, getEquiv and getSubs return the concept names
operation roll-up(Q, n) translates Q into an L-concept w.r.t.         that are more general than, equivalent to and more specific
n and it is defined as enc-rollup(Q, n, n), where enc-rollup          than roll-up(S), respectively. Moreover, the concept names
is the recursive procedure described in Algorithm 1. We use           in the output of getSubs(Q, S) are additionally required to
roll-up(Q) as an abbreviation for roll-up(Q, o), where o is           be compatible with Q focused in the root of S.
the root of Q. The concept roll-up(Q, n) is called the context
of Q w.r.t. n, expressing the informative content of Q from           Let Q be a query and n a focus node. For a concept name c in
the point of view of a specific node, which we call the focus.        the output of getComp(Q, n), the operation addComp adds
Queries Q1 and Q2 are equivalent, in symbols Q1 ≡ Q2 , iff            c to V(n). More precisely, the result of addComp(Q, n, c) is
roll-up(Q1 ) ≡K roll-up(Q2 ). We say that a query Q over              the query Q0 obtained from Q by replacing its node-labelling
a consistent knowledge base K is satisfiable iff its roll-up is       function V with V 0 := V [n 7→ V(n) ∪ {c}]. For a pair hr, ci
such in K (that is, K 6|= roll-up(Q) v ⊥).                            in the output of getRel(Q, n), the operation addRel creates a
                                                                      new node n0 such that V(m) = {c} and an edge e = hn, mi
The functional API of the Query Tool is structured in two             with E(e) = r.
main parts:

• the underlying reasoning services, consisting of the oper-          Let Q and R be queries and E    e be a set of sticky edges. Then,
  ations getComp, getRel, getSupers, getEquiv, getSubs;               the operation prune deletes from Q the maximal number of
                                                                      non-root nodes, having no incoming sticky edge (if any) and
• the operations for query manipulation, including addRel,            associated with the same concept names both in R and Q,
  addComp, weaken, substitute and delete.                             such that the result is still a query.


                                                                  4
Let S be a selection within a query Q and let E
                                              e be a set of            The English sentence representing the query in the UI is gen-
sticky edges. We define weaken(Q, S) as Q S and                        erated by the NLG sub-system, which will be described in
                                                                      the next section. An example of the textual rendering of the
                  e := prune weaken(Q, S), R, E
     delete(Q, S, E)                             e ,                   query in natural language as displayed by the UI is given in
                                                                       Figure 7.
where R is the query obtained from S by replacing VS with
the function on V (S) associating each node n that is both
in Q and S with VQ (n) ∩ VS (S) if such intersection is non-             I’m looking for a man who lives in a beautiful house
empty and with {>} otherwise, and each other node m of
S with VS (m). The last operation we introduce is “substitu-             owned by a rich person.
tion” which, for a concept name c in the output of getSupers
(generalisation) or getEquiv or getSubs (specialisation), is
defined as follows:                                                       Figure 7: Textual representation of a query in natural language
                         e c) := delete(Q0 , S, E)
        substitute(Q, S, E,                      e ,
                                                                       Hovering
where Q0 is the query obtained from Q by adding c to the set           In graphical user interfaces terminology, the user hovers on
of concept names associated with the root of S.                        a graphic element whenever the mouse cursor moves from
                                                                       some point outside the element to some point inside the ele-
Properties of the framework                                            ment. In normal conditions, as the user hovers on the query,
We will now formally state and prove that, starting from an            the system gives visual hints about its structure:
atomic query that is satisfiable, the query obtained by means
of the operations in the Query Tool’s functional API is satis-         • hovering on the span associated with a tag of some node n
fiable. In order to do that, we first prove that the operations          causes the span to become lightly highlighted, along with
for query manipulation preserve satisfiability, i.e., the appli-         all the spans associated with the tags occurring in the com-
cation of each of them to a satisfiable query results in a query         plete subquery rooted in n;
that is satisfiable.
                                                                       • hovering on the span associated with the tag of some edge
  L EMMA 1. Each of the operations addComp, substitute,                  e causes that span and all the spans associated with the
addRel, weaken and delete preserves query satisfiability.                elements of tags ter(e) to become lightly highlighted.
   P ROOF S KETCH . Let Q be a satisfiable query and Q0 re-            The highlighting is such that spans associated with different
sult from the application of one the above operations. In the          tags are visualized as distinct, even when adjacent. One way
case of deletion, weakening and substitution with an equiv-            of obtaining this kind of effect is, for instance, by rounding
alent or more generic term, the resulting query Q0 is equiva-          the corners of the highlighted rectangular area around each
lent or more general than the input query Q, which is satisfi-         span. The only case in which highlighting on hovering does
able by assumption. Therefore, Q0 is also satisfiable. In the          not trigger is when a menu is being displayed.
case of addition of a new term/relation and substitution with
a more specific term, the satisfiability of Q0 is ensured by the
satisfiability of Q and the explicit check for compatibility in         I’m looking for a man who lives in a beautiful house
the definitions of addComp, addRel and substitute.
                                                                        owned by a rich person.
The fundamental property of the Query Tool is then proved
by means of a simple induction.
                                                                                     Figure 8: Hovering on the span “house”
   T HEOREM 1. The query obtained from an initial satisfi-
able atomic query through a finite sequence of applications            Associated with each node of the query is a button, called the
of the operations addComp, addRel, substitute, weaken and              add-button, which is displayed as a blue onion with a white
delete is satisfiable.                                                 cross inside and located below the text baseline immediately
                                                                       after the rightmost span associated with a tag of that node.
   P ROOF. By an easy induction on the sequence of applica-            Hovering on the add-button of some node lightly highlights
tions of the operations: the base case holds by assumption,            all the spans associated with tags of that node.
while the inductive step follows directly from Lemma 1.
                                                                        I’m looking for a man who lives in a beautiful house
THE USER INTERFACE
In this section we describe a concrete UI for the Query Tool,
based on natural language generation. In the UI, the query is           owned by a rich person.
represented as a continuous string of natural language text,
composed of a sequence of coherent text constituents called
                                                                           Figure 9: Hovering on the add-button at the right of “house”
spans. Each of the tags occurring in the query is associated
with a span by means of an injective mapping. As for each
edge there is one and only one corresponding edge tag, if a            Sticky edges
span is associated with the tag of an edge we simply say that          An edge can be marked as sticky by clicking on the span as-
the span is associated with that edge.                                 sociated with it, at which point the text span is rendered in


                                                                   5
boldface. Clicking on a span associated with a sticky edge                    well as a complete selection. We consider atomic selections
unmarks the edge as sticky and reverts the text span to its de-               to have the lowest priority and complete selections the high-
fault (non-bold) representation. Spans associated with node                   est, and when a simple selection belongs to more than one
tags are not visually affected by sticky edges.                               class, it is considered to be only of the type with higher pri-
                                                                              ority. Furthermore, whenever a double click would result in
                                                                              the same kind of selection a single click would, it yields a
  I’m looking for a man who lives in a beautiful house
                                                                              complete selection instead.
  owned by a rich person.                                                     A complex selection (non-simple) is obtained from an empty
                                                                              or simple selection by control-clicking (i.e., clicking while
                                                                              pressing the CTRL key on the keyboard) on additional spans
 Figure 10: Clicking on “lives in” marks the associated edge as sticky
                                                                              associated with node tags, which are consequently included
                                                                              in the existing selection. Note that a complex selection can
Selection                                                                     be disconnected, in the sense that it is not a well-formed sub-
The UI provides facilities to easily select portions of the                   query from the formal point of view, because there might be
query. A simple selection can be directly specified by click-                 two selected nodes that are not connected by an edge. Exam-
ing on the span associated with a tag of some node n in one                   ples of connected and disconnected complex selections are
of the following ways:                                                        given in Figure 12.

• a single click results in an atomic selection, highlighting
  only the span on which the click occurred;                                    I’m looking for a man who lives in a beautiful house

• a double click results in a node selection, highlighting all                  owned by a rich person.
  the spans associated with the elements of tags(n);
                                                                                                               (a)
• a triple click results in a complete selection, highlighting
  all the spans in the complete subquery rooted in n.
                                                                                I’m looking for a man who lives in a beautiful house
An example of each of these different types of simple selec-
tion is shown in Figure 11.                                                     owned by a rich person.
                                                                                                               (b)
   I’m looking for a man who lives in a beautiful house
                                                                              Figure 12: Examples of (a) connected and (b) disconnected complex
                                                                              selection
   owned by a rich person.
                                  (a)                                         From the graphic point of view, spans associated with tags in
                                                                              a selection are highlighted in a stronger way (e.g., a darker
                                                                              color) than they are when highlighted because of hovering
   I’m looking for a man who lives in a beautiful house                       and, unlike the highlighting effect triggered by hovering, it is
                                                                              not possible to distinguish between adjacent selected spans
   owned by a rich person.                                                    associated with the same node. When the selection includes
                                                                              one or more paths between nodes (that is, all of the nodes in
                                  (b)
                                                                              a path within the query are selected), spans associated with
                                                                              edge tags are also highlighted. The visual appearance of the
   I’m looking for a man who lives in a beautiful house                       spans associated with tags of selected nodes or edges does
                                                                              not change as the result of an hovering event, as shown in
   owned by a rich person.                                                    Figure 13. Moreover, as the reader might already have no-
                                                                              ticed, when a non-empty selection is present, the add-buttons
                                  (c)                                         become invisible without changing the layout of the text.
Figure 11: Simple selections obtained by (a) single, (b) double and (c)
triple clicking on the span “house”                                              I’m looking for a man who lives in a beautiful house
A selection can be cleared by clicking on an area of the UI                      owned by a rich person.
where clicking does not have any other effect (e.g., on the
white space between the lines of text representing the query,
                                                                              Figure 13: Hovering on either “beautiful” or “house” in presence of a
or on a span that is not associated with any tag). Clearing a                 non-empty selection
selection results in an empty selection.

Observe that when a node has only one node label, an atomic                   Addition
selection on that node happens to be also a node selection,                   The query logic sub-system provides two operations, namely
and when a node has no children, a node selection is also a                   addComp and addRel, for refining a query through the addi-
complete selection. Thus, an atomic selection on a node hav-                  tion of compatible terms and relations to a focus node. The
ing only one label and no children is also a node selection as                UI makes these operations available to the user by means of


                                                                          6
a pop-up menu, activated by clicking on the add-button of a           Note that the operations weaken and delete, as defined in our
node which is set as the focus.                                       functional API, cannot directly handle a disconnected com-
                                                                      plex selection. However, such a selection can be decom-
The menu contains a list of suitable arguments for the invo-          posed by the UI in a series of connected selections that are
cation of either addComp or addRel. The menu entries are              then suitable for the actual invocation of the two operations.
concept names and pairs consisting of a role name and a con-
cept name, which are obtained from the output of the QueLo            Observe that selected nodes can be deleted even if they have
operations getComp and getRel w.r.t. the current query and            an incoming sticky edge; in other words, the result of dele-
focus. In particular, for a query Q focused in n, the menu is         tion is the same independently of the presence of sticky edges
populated with the nodes in the graph resulting from disjoint         in the selection. Note also that in some cases deletion pro-
union of the output graphs of getComp(Q, n) and getRel(Q,             duces the same result as weakening (e.g., for a node selection
n), arranged in the following way:                                    rooted in a non-leaf node).

• nodes with no incoming edge populate a menu of level 0,             Substitution
  which is the topmost menu, where entries corresponding              The QueLo sub-system provides the operation substitute in
  to concept names are listed before entries associated with          order to allow the substitution of a selection within the query
  pairs of concept/role names.                                        with a more generic, equivalent or more specific term. The
• for each node n in a menu at level k, all the nodes that are        UI makes this operation available to the user: upon long-
  reachable from n in one step populate a sub-menu at level           clicking on a selected portion (i.e., strongly highlighted) of
  k + 1 associated with entry n.                                      the query a pop-up menu is displayed, listing all the possi-
                                                                      ble terms with which the selection can be replaced. Such a
The actual items shown to the user in the above menu struc-           menu is populated with concept names that are more gen-
ture are natural language descriptions of the node-entries (ei-       eral than, equivalent to and more specific than the selection
ther concept names or atomic concept/role pairs) generated            and that are retrieved from the QueLo sub-system by means
by the NLG sub-system.                                                of the operations getSupers, getEquiv and getSubs, respec-
                                                                      tively. More general terms are shown at the top of the list,
Some of the items come with an icon on their left: an upward-         equivalent terms in the middle and more specific terms at
pointed (resp., downward-pointed) triangle is displayed for           the bottom. At the left of each item an icon is shown: an
concept names (resp., role/concept pairs) indicating that the         upward-pointing triangle for more general terms, a square
option is associated with a nested sub-menu containing more           for equivalent terms and a downward-pointing triangle for
specific (resp., more generic) options of the same type. Hov-         more specific terms.
ering on any of these options opens the pop-up menu associ-
ated with that item and displayed next to it. An example of           The substitution menu has a similar hierarchical structure as
the structure of the addition menu is given in Figure 14.             the menu for addition, reflecting the taxonomic information
                                                                      in the output graphs of the operations getSupers, getEquiv
                                                                      and getSubs. In particular, some (possibly none) of the more
                                                                      general terms might be further generalised, in which case
                                                                      hovering on one such item triggers a sub-menu containing its
                                                                      direct super-concepts; similarly, if some of the more specific
                                                                      terms can be further specialised, then hovering on one such
                                                                      item triggers a sub-menu containing its direct sub-concepts
                                                                      that are compatible with the query. The same rules for fur-
                                                                      ther generalisation/specialisation apply to the items in the
                                                                      sub-menus, while equivalent terms (if any) cannot be further
            Figure 14: Structure of the addition menu                 generalised nor specialised.

Clicking on any of the elements in the menu triggers the in-
vocation of either addComp or addRel, according to whether
the clicked item is associated with a concept name or a con-
cept/role pair, respectively. Upon clicking, the menus disap-
pear and the UI updates its representation of the query af-
ter the necessary changes are performed by the QueLo sub-
system.                                                                          Figure 15: Structure of the substitution menu

Weakening and Deletion                                                As in the case of addition, the actual items shown to the user
The user can weaken (respectively, delete) a selected portion         in the substitution menu are natural language descriptions
of the query by pressing the backspace (resp., delete) key on         generated by the NLG sub-system, rather than bare concept
the keyboard, which invokes the QueLo operation weaken                names. Clicking on any of given options triggers the invoca-
(resp., delete) with the current query and selection as input         tion of substitute with the current selection and the concept
arguments. Upon weakening (resp., deletion), the selection            name associated with the clicked item as input arguments.
is cleared and the UI updates its representation to reflect the       The selection is then cleared and the UI updates the repre-
changes in the query.                                                 sentation of the query after the selected elements have been


                                                                  7
replaced with the chosen term.                                         Informally, the conditions that a strict total order must satisfy
                                                                       to qualify as a linearisation state that “the tag of each edge
Observe that the operation substitute, as defined in our func-         is (1a) preceded by all the tags of its initial node and (1b)
tional API, cannot deal with disconnected complex selection            followed by at least one of the tags of its terminal node” and
and, unlike the case of weakening and deletion, the problem            that (1c) “between any two tags of a node there can only be
cannot be overcome by converting such selection in a series            (distinct) tags of the same node”. Moreover, from the above
of connected ones. This is due to the fact that substitution re-       conditions it can be also proved that “the tag of each edge is
lies on the roll-up of the input selection itself, which is thus       followed by all the tags of its terminal node”.
required to be tree-shaped (i.e., connected). For this reason,
in the presence of a disconnected selection, the substitution          In general, without further restrictions, a query may admit
operation is disabled.                                                 more than one linearisation, but all of them share some com-
                                                                       mon properties. In particular, each linearisation C of a query
NATURAL LANGUAGE RENDERING                                             Q is such that the minimal element is always a tag of the root
The natural language interface of the Query Tool masks the             and for each subquery S of Q the restriction of C on tags(S)
composition of a precise query as the composition of English           is itself a linearisation of S. Moreover, every linearisation is
text describing the equivalent information needs. Interfaces           compatible with the tree-order of the query, that is, the tags
following this paradigm are known as “menu-based natural               of a node n precede the tags of a node m in the linearisation
language interfaces to databases”. The users of such systems           whenever there is a path from a n to m in the query.
edit a query by composing fragments of generated natural
language provided by the system through contextual menus.              A strict1 total order on the node set of a tree can be obtained
                                                                       by fixing a total order on the children of each node and per-
In what follows, we describe how the natural language ren-             forming a complete visit of the tree according to a chosen
dering of a query is achieved. We start by defining a particu-         traversal strategy. If the tree is visited using a depth-first or
lar linear form of the query that satisfies certain constraints,       breadth-first traversal, then the resulting order is an exten-
necessary to represent the elements of the query using a lin-          sion of the partial order induced by the tree itself on its node
ear medium, that is, text. The constraints are enforced at the         set. Thus, a linearisation of a query can always be obtained
API level to ensure that different graphical user interfaces           by additionally fixing a total order on the labels associated
represent the query in a homologous way. Moreover, a con-              with each node. For our purposes, we decided to use a depth-
sistent ordering of the query elements needs to be preserved           first traversal where the labels (resp., children) of a node are
during the operations for query manipulation to avoid con-             ordered according to the sequence (resp., inverse sequence)
fusing the end user. The linearised version of the query is            of applications of the operation addComp (resp., addRel) to
then used as a guide for the language generation performed             that node as focus. The motivation for this choice lies in the
by the Query Tool’s NLG engine.                                        fact that, for cognitive reasons, we want the query to change
                                                                       as little as possible from the visual point of view and in such
                                                                       a way that changes are restricted to a limited portion of the
Linearisation of a query
                                                                       query. As a result, every execution of addComp or addRel
Since a concept name can be in general associated with more            yields one or more new tags whose place in the linearisation
than one node in the same query, we introduce the notion of            is right after all the node tags relative to the node given as
tag: given a query Q, a node tag is a pair hn, ci where n ∈            input argument to the operation.
V (Q) and c ∈ V(n), while an edge tag is a pair he, E(e)i
such that e ∈ E(Q). The set of tags relative to a node n is
given by tags(n) := {hn, ci | c ∈ V(n)}, while the set of all          Natural Language Generation module
tags occurring in a query Q is defined as follows:                     The natural language interface (NLI) of the Query Tool re-
                                             [                         lies on a natural language generation (NLG) system to pro-
   tags(Q) := {he, E(e)i | e ∈ E(Q)} ∪            tags(n)              duce the textual representation of the query, following an
                                            n∈V (Q)                    idea first presented in [12] and lately refined in [8].

                                                                       NLG systems use techniques from artificial intelligence and
A linearisation of a query Q is a strict total order C (that is,       computational linguistics to produce human-readable texts
a binary relation that is asymmetric, transitive and total) on         out of machine-readable data. The Query Tool uses NLG to
tags(Q) such that, for each edge e ∈ E(Q), the following               represent the whole query, along with all the elements that
conditions hold:                                                       the user can use to refine it, as English text. The generated
   ∀t ∈ tags(init(e)),     t C he, E(e)i           ;       (1a)        text is enriched with links that connect it to the underlying
                                                                       logical form of the query. This allows the user to operate on
   ∃t ∈ tags(ter(e)),          he, E(e)i J t       ;       (1b)        the query simply by editing an English text.
and, for each node n ∈ V (Q), it is the case that:
                                                                       Unlike most NLG systems, ours is built to let the user deter-
  ∀t1 , t2 ∈ tags(n), t1 C t C t2 =⇒ t ∈ tags(n) . (1c)                mine the structure of the generated text by inserting, replac-
                                                                       ing and removing snippets of it. Thus, while in the classic
where we use the expression t1 J t2 for indicating that t1 C
                                                                       NLG pipeline the information to be conveyed in the text and
t2 and there does not exist t ∈ tags(Q) for which t1 C t C
                                                                       its order is determined by the document planning module, in
t2 . In such case we say that t1 immediately precedes t2 or,
equivalently, that t2 immediately follows t1 . Note that C is          1
                                                                         We only consider orders that are strict (i.e., irreflexive) unless ex-
the transitive closure of the relation J.                              plicitly stated otherwise.


                                                                   8
the Query Tool it is the user who decides both the informa-            as surface realisation, produces a list of text tokens, some
tion to be displayed and its arrangement.                              of which are connected to edge or node labels. This list is
                                                                       finally fed to the GUI, that displays it to the user.
As the Query Tool is not tailored to any specific domain, its
NLG module is simple enough to be adopted in any context               Elements populating the menu for addition and substitution
and it is not bundled with all the resources that are needed           operations undergo a similar processing. To produce the tex-
to generate text out-of-the-box. Therefore, in order to use it         tual representation of such an element, the system makes a
on a specific knowledge base, the system must be provided              temporary copy of the portion of query affected by the oper-
with a lexicon and a template map. The former contains the             ation. The operation is then carried out on this portion and
words to be used in the generated text; the latter is the bridge       the resulting structure is fed to the generation pipeline used
between the natural language and the knowledge represen-               for entire queries. The outcome of the generation process is
tation language, associating each concept/role name with a             the text which will appear on the menu.
generation template. Each such template contains the syn-
tactic and lexical information necessary to generate a frag-
ment of text representing the associated concept or role.              Automated generation of lexicon and template map
                                                                       For the Query Tool’s NLI to work with a specific knowledge
We selected the syntactic features available in the templates,         base (KB) a lexicon and a template map must be provided
hence supported by the generator, in order to keep the sys-            for it. Devising these resources requires an understanding
tem simple while still being expressive enough. For this               of both the domain of interest (DOI) and basic linguistic no-
purpose, we collected and analysed a corpus of more than               tions such as verb tenses, noun genders and countability. To
12.000 unique relation identifiers and we partitioned them             ease the burden of developing these resources from scratch,
according to the recurring syntactic patterns. For each class          we experimented with a computational technique to have the
of the partition, we then proposed a common natural lan-               system generate them automatically. This technique follows
guage representation template. The result of this study is a           an approach to domain independent generation proposed in
set of simple but effective templates for representing most            [11]. The functionality we implemented allows to produce
ontology relations using natural language.                             all the resources necessary to configure our NLI for use with
                                                                       a new KB, using as a source of data the ontology itself. How-
During the first stage (microplanning) of the generation, lin-         ever, the process is not completely reliable, therefore system
guistic information stored in the template map and in the lex-         engineers must review the result and make the necessary cor-
icon blends with the logic information encoded in the query            rections. In the following, we describe the technique to han-
into a single structure, known in the NLG literature as text           dle relation identifiers, but we adapted it to handle concept
specification and consisting of a list of syntactic trees with         identifiers as well.
inflected lexemes on its leaves. The NLG system operates on
this structure to aggregate groups of adjacent syntactic struc-        The idea is based on the observation that KBs already con-
tures into single more complex structures, and to select and           tain some form of linguistic information. In real-world on-
replace existing referring expressions with more appropriate           tologies, every concept and relation has a unique identifier
ones. These two tasks are known in the literature as aggre-            (ID), which most of the times is not just an arbitrary string,
gation and referring expressions generation, respectively. At          but a mnemonic chosen by the knowledge engineer to de-
the same time, the system keeps track of which element of              scribe the intended meaning of the identified concept or re-
the text specification is associated with which element (ei-           lation. Moreover, within these IDs, certain syntactic patterns
ther a node tag or an edge tag) of the query. An association           occur more frequently than others.
holds when the syntactic element is the result of the instanti-
ation of a template associated with the element of the query.          In our approach, each relation ID is first tokenized according
These associations are used for enriching the generated text           to an algorithm that takes advantage of the naming conven-
with links to the underlying query.                                    tions used by ontology engineers. Second, the tokenized ID
                                                                       is fed to a custom part-of-speech tagger built around QTAG
The linearisation of the query simplifies the effort required          [14]. The resulting tagged tokenized ID is then lightly pre-
by the referring expressions generation, as referring expres-          processed before being finally passed to a transformation
sions that need to be reworked always appear in subject posi-          rule, chosen among thirteen different ones, that produces a
tion. Our algorithm replaces a subject with a pronoun when-            template for the template map of the NLG system.
ever the previous sentence had the same subject, otherwise
the subject is left unchanged. Although ambiguous expres-              For the design of the transformation rules, we analysed our
sions may occur, ambiguity is not a crucial issue as these ex-         corpus, containing more than 12.000 relation IDs, in order
pressions originate from user operations upon a selected el-           to devise a partition of the domain in terms of syntactic pat-
ement, which always becomes the target of the referring ex-            terns. The classes defined in this partition are s.t. to each
pression. Our aggregation module performs relatively sim-              relation of the same class can be applied a simple transfor-
ple aggregation tasks such as aggregating sentences with the           mation in order to obtain a template. Each such transforma-
same subject, eliding the subject and parts of the verb when-          tion is also a uniform interpretation of the intended meaning
ever it is feasible.                                                   of each relation ID in the class. Some care is needed when
                                                                       giving a uniform interpretation to syntactic patterns, as there
                                                                       are situations in which the same syntactic pattern is to be
Once these operations are completed, the text specification is         interpreted differently. For instance, the relation IDs “coun-
ready to be transformed into the final text. This task, known          try of nationality” and “language of country” share the same

                                                                   9
syntactic structure, but the first relation should be read as            eventuality that an edge associated with an attribute has to
“the country of nationality of X is Y”, while the second as              be inverted (going from the datatype to the subject). Though
“the language of X is Y”. Each of the thirteen rules we de-              apparently simple, allowing for attributes poses some inter-
fined corresponds to one class of the partition, and together            esting questions, that we are currently investigating, in order
they can handle 93% of the relations of the average ontology.            to deal with concrete values from the point of view of rea-
                                                                         soning.
The system has been tested with some of the ontologies [9]
developed by the national mapping agency of Great Britain                The work in this paper has been partially supported by the
and with the Pizza Ontology [6] by the University of Manch-              European project ONTORULE.
ester, contributing 64 unique relations in total. From the IDs
of these relations we automatically generated relation tem-
plates, which were then inspected in order to evaluate their
usability in text generation. The result of the evaluation re-           REFERENCES
vealed that for 42 out of 64 relations (65%) the generated                1. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and
                                                                             S. Tessaris. An ontology based visual tool for query formulation
template is suitable for direct use with the Query Tool’s NLI.               support. In Proc. 16th Eur. Conf. Artificial Intelligence, 2004.
The result suggests that although the generation of the tem-
                                                                          2. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and
plate map is not totally reliable, it is nevertheless useful in              S. Tessaris. An Ontology-Based Query Manager: Usability
that it speeds up the work of systems engineers, as they do                  Evaluation. In Proc. HCITALY 2005, 2005.
not need to create the whole map from scratch, but only have              3. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and
to review the generated map and repair eventual errors. This                 S. Tessaris. Usability evaluation in the SEWASIE (SEmantic Webs
improves the portability of the Query Tool’s NLI, making it                  and AgentS in Integrated Economies) project. In Proc. 11th Int. Conf.
faster and easier to switch to a different knowledge base.                   on Human-Computer Interaction, 2005.
                                                                          4. P. Dongilli. Natural language rendering of a conjunctive query.
                                                                             Technical Report KRDB08-3, KRDB Research Centre, Free
CONCLUSIONS AND FUTURE WORK                                                  University of Bozen-Bolzano. http:
                                                                             //www.inf.unibz.it/krdb/pub/TR/KRDB08-3.pdf, June
A Java implementation of the Query Tool has been devised,                    2008.
including the query logic and natural languages sub-systems
                                                                          5. P. Dongilli, E. Franconi, and S. Tessaris. Semantics driven support for
and a fully functional user interface as described above. The                query formulation. In Proc. of the 2004 Int. Workshop on Description
Query Tool was implemented for the first time in the context                 Logics, 2004.
of the European project SEWASIE [10] as a Java web-based                  6. N. Drummond, M. Horridge, R. Stevens, C. Wroe, and S. Sampaio.
application, later on converted into a stand-alone applica-                  Pizza ontology. The University of Manchester.
tion [15]. Important optimisations were then introduced (see                 http://www.co-ode.org/ontologies/pizza/.
[16]) with the purpose of minimising the number of reasoner               7. P. Guagliardo. Theoretical foundations of an ontology-based visual
calls and thus improving the responsiveness of the tool. A                   tool for query formulation support. Technical Report KRDB09-5,
difference between our implementation of the Query Tool                      KRDB Research Centre, Free University of Bozen-Bolzano. http:
and the previous one is that while the latter uses a DIG-based               //www.inf.unibz.it/krdb/pub/TR/KRDB09-05.pdf,
                                                                             October 2009.
reasoning engine, the former takes advantage of the OWL-
API. Most importantly, our implementation follows the for-                8. C. Hallett, D. Scott, and R. Power. Composing questions through
                                                                             conceptual authoring. Computational Linguistics, 33(1):105–133,
mal specification given in the Query Tool’s framework and                    2007.
functional API and complies with them.
                                                                          9. Ordnance Survey - Great Britain’s national mapping agency. http:
                                                                             //www.ordnancesurvey.co.uk/oswebsite/ontology/.
A usability evaluation of the previous Query Tool’s imple-
                                                                         10. The SEmantic Webs and AgentS in Integrated Economies (SEWASIE)
mentation was carried out in past work [3, 2] with the pur-                  project. http://www.sewasie.org/, 2005.
pose of measuring its complexity of use from the user’s point
                                                                         11. X. Sun and C. Mellish. Domain independent sentence generation from
of view. In particular, the study aimed at determining how                   RDF representations for the Semantic Web. In Proc. ECAI’06
difficult it is for the user to formulate queries using the Query            Combined Workshop on Language-Enhanced Educational Technology
Tool and to understand the results. The outcome of the ex-                   and Development and Evaluation of Robust Spoken Dialogue Systems,
periments showed that the tool can be easily used also by                    2006.
non-experienced users to query a domain in which they have               12. H. R. Tennant, K. M. Ross, R. M. Saenz, C. W. Thompson, and J. R.
no special expertise.                                                        Miller. Menu-based natural language understanding. In Proc. 21st
                                                                             Annual Meeting of the Association for Computational Linguistics,
                                                                             pages 151–158. Association for Computational Linguistics, 1983.
The framework and functional API presented in this paper
consider only the addition of new relations to the query, but            13. M. Trevisan. A portable menu-guided natural language interface to
                                                                             knowledge bases. Master’s thesis, University of Groningen, 2009.
they could be extended to deal with attributes (i.e., properties
relating a concept to a datatype) as well. The only difference           14. D. Tufis and O. Mason. Tagging Romanian texts: a case study for
                                                                             QTAG, a language independent probabilistic tagger. Proc. 1st Int.
with the current framework would be that a node associated                   Conf. on Language Resources and Evaluation (LREC’98), pages
with a datatype (i.e., the “range” of the attribute) cannot be               589–596, 1998.
the focus of a query for operations other than deletion. This            15. I. Zorzi. An Ontology-Based Visual Tool for Query Formulation
basically means that such a node is always a leaf of the query               Support. Bachelor’s thesis, Faculty of Computer Science, Free
tree and the only operation allowed on it is deletion. Then,                 University of Bozen-Bolzano, 2005.
since a node of this kind cannot be refined by adding a com-             16. I. Zorzi. Improving Responsiveness of Ontology-Based Query
patible term or attaching a new property, the query is never                 Formulation. Master’s thesis, Faculty of Computer Science, Free
rolled-up with respect to it, thus avoiding the nonsensical                  University of Bozen-Bolzano, March 2008.


                                                                    10