An intelligent query interface based on ontology navigation Enrico Franconi Paolo Guagliardo Marco Trevisan franconi@inf.unibz.it paolo.guagliardo@stud-inf.unibz.it marco.trevisan@stud-inf.unibz.it KRDB Research Centre Free University of Bozen-Bolzano ABSTRACT neous data sources by means of a conceptual schema and In this paper we present a formal framework and an experi- supports the users in the task of formulating a precise query mental software supporting the user in the task of formulat- over it. In describing a specific domain, the ontology defines ing a precise query – which best captures their information a vocabulary which is often richer than the logical schema needs – even in the case of complete ignorance of the vocab- of the underlying data and usually closer to the user’s own ulary of the underlying information system holding the data. vocabulary. The ontology can thus be effectively exploited Our intelligent interface is driven by means of appropriate by the user in order to formulate a query that best captures automated reasoning techniques over an ontology describ- their information need. The user is constantly guided and ing the domain of the data in the information system. assisted in this task by an intuitive visual interface, whose intelligence is dynamically driven by reasoning over the on- We will define what a query is and how it is internally repre- tology. The inferences drawn on the conceptual schema help sented, which operations are available to the user in order to the user in choosing what is more appropriate with respect modify the query and how contextual feedback is provided to their information need, restricting the possible choices about it presenting only relevant pieces of information. We to only those parts of the ontology which are relevant and will then describe the elements that constitute the query in- meaningful in a given context. terface available to the user, providing visual access to the underlying reasoning services and operations for query ma- The most powerful and innovative feature of our framework nipulation. Lastly, we will define a suitable representation lies in the fact that not only do not users need to be aware of in “linear form”, starting from which the query can be more the underlying organisation of the data, but they are also not easily expressed in natural language. required to have any specific knowledge of the vocabulary used in the ontology. In fact, such knowledge can be grad- Author Keywords ually acquired by using the tool itself, gaining confidence query formulation support automated reasoning description with both the vocabulary and the ontology. Users may also logics ontology visual graphical intelligent user interface decide to just explore the ontology without actually querying the information system, with the aim of discovering general information about the modelled domain. ACM Classification Keywords H.5.2 Information Interfaces and Presentation: User Inter- Another important aspect is that only queries that are logi- faces—Graphical user interfaces (GUI)—Natural language cally consistent with the context and the constraints imposed —Interaction styles by the ontology can be formulated, since contradictory or re- dundant pieces of information are not presented to the user INTRODUCTION at all. This makes user’s choices clearer and simpler, by rul- Recent research showed that adopting formal ontologies as ing out irrelevant information that might be distracting and a means for accessing heterogeneous data sources has many even generate confusion. Furthermore, it also eliminates the benefits, in that not only does it provide a uniform and flexi- often frustrating and time-consuming process of finding the ble approach to integrating and describing such sources, but right combination of parts that together constitute a mean- it can also support the final user in querying them, thus im- ingful query. For this reason, the user is free to explore the proving the usability of the integrated system. ontology without the worry of making a “wrong” choice at some point and can thus concentrate on expressing their in- We introduce a framework that enables access to heteroge- formation need at best. Queries can be specified through a refinement process con- sisting in the iteration of few basic operations: the user first specifies an initial request starting with generic terms, then refines or deletes some of the previously added terms or in- troduces new ones, and iterates the process until the resulting Workshop on Visual Interfaces to the Social query satisfies their information need. The available opera- and Semantic Web (VISSW2010), IUI2010, tions on the current query include addition, substitution and Feb 7, 2010, Hong Kong, China. Copyright is held by the author/owner(s). 1 deletion of pieces of information, and all of them are sup- achieved by selecting something within the query and asking ported by the reasoning services running over the ontology. for a substitution. In our example, we tick the check-box as- sociated with the term Thing and then press the Substitute The framework is implemented in the form of an experimen- button. As shown in Figure 2, we are presented with a three- tal software, called Query Tool, which relies on a web-based part menu listing all the possible substitutions available for client-server architecture consisting of three components: the selected portion of the query: terms that appear at the top are more general than the selection, the ones in the middle • the query logic (QueLo), responsible of “reasoning” over are equivalent, while those at the bottom are more specific. the ontology in order to provide only relevant information Moreover, these terms are organised in sub-menus according w.r.t. the current query; to the taxonomic information defined in the ontology. Thus, we can easily navigate the different options choosing the de- • the natural language generation (NLG) engine, that given sired level of detail for the substitution. In our case, instead a query and a lexicalization map for the ontology produces of Person we choose the further specific term Rich Person an English sentence; as a replacement for the selection, resulting in the query vis- • the user interface (UI), that provides visual access to the ible on the right side of Figure 2. query and editing facilities for it, allowing to interact with the QueLo sub-system while benefiting from the services of the NLG engine. OVERVIEW In the paper we will introduce the Query Tool’s formal frame- work and functional API, that constitute the basis on top of which different types of user interfaces (UIs) can be devised. Previous implementations of the Query Tool ([1, 15]) exper- imented mostly with tree-based and menu-based UIs. In this section we first describe the behaviour of the Query Tool us- Figure 2: Example of specialisation ing a generic representation over an abstract user interface. Later on in the paper we will then introduce a concrete inter- Another way of modifying a query is to add constraints in the face based on the generation of natural language. form of new terms or relations. As shown in Figure 3, upon clicking on the add button a two-part menu is displayed, con- Consider a scenario in which we have a conceptual schema, taining suitable terms and relations that can be safely added say an OWL ontology, we know nothing about. In such situ- to the query. Terms are shown in the upper part of the menu, ation, the Query Tool reveals to be particularly useful in that while relations in the lower one. The chosen item is inserted it allows to discover information about the ontology and the in a specific place of the query, that can be selected by means modelled domain, even when its vocabulary is completely of the radio button present in each line. Figure 3a shows how ignored. What we call intensional navigation of the ontol- the new term Single Person is added to the first (and only) ogy is the process of building a query, starting from a very line of the query, while Figure 3b shows how adding a rela- general request which is then refined by adding or deleting tion results in the creation of a new line, indented w.r.t. the constraints according to the user’s information need. In our first one and consisting of the name of the relation lives in abstract representation, the default initial query looks like followed by the label House associated with its range. Ob- the one shown in Figure 1 that generically asks for some serve that the menu of Figure 3b, compared to that of Fig- “thing”. Four operations are available for manipulating the ure 3a, does not include the term Single Person and the re- query: add for the addition of new terms and relations; sub- lation married to Person as possible options. In fact, the stitute for replacing a portion of the query with a more gen- former is already present in the query, thus it would be re- eral, equivalent or more specific term; delete for discarding dundant to propose it again; the latter became incompatible parts of the query; and weaken for making a portion of the with the query due to the addition of the previous term, and query as general as possible. this means that in our ontology a person who is rich and sin- gle (or perhaps just single) cannot be married to anyone. A query can be made more general or “weaker” in a variety of ways, one of which is the substitution with a more general term. Other possibilities are given by deletion and weaken- ing, both of which remove selected elements from the query but with distinct approaches and outcomes. The difference between them is shown in Figure 4: while in Figure 4a delet- ing the selected portion causes the second row of the query to disappear, in Figure 4b weakening the same portion pre- serves the row, although the term House is replaced with the Figure 1: Initial query generic term Thing. The first step in the refinement of our query consists in be- Suppose that our ontology states that a rich man who is mar- ing more specific about what we are looking for. This can be ried to a beautiful woman and lives in a beautiful house is in- 2 Figure 5: Preventing side-effects of substitution (a) Formal framework From the point of view of the QueLo sub-system, a query is a labelled tree where each node corresponds to a variable and it is associated with a set of concept names from the ontol- ogy, while each edge is labelled with a role name. Therefore, for what concerns the expressive power, the Query Tool can represent tree-shaped conjunctive queries only. Let N be a countably infinite set of node names, C be a finite set of concept names and R a finite set of role names, and let N, C and R be pairwise disjoint. A query Q is a quintuple (b) hV, E, o, V, Ei in which (V, E) is a directed tree rooted in o ∈ V , with set of nodes V ⊆ N and with set of edges E ⊆ Figure 3: Addition of (a) a new term and (b) a new relation V × V ; V is a total function, called node-labelling function, associating each node with either a non-empty set of concept names or with the singleton {>}; and E is called the edge- labelling function that associates each edge with a role name. A query consisting of exactly one node, whose set of labels is a singleton, is called atomic. For an edge e = hx, yi, we indicate its initial node x with init(e) and its terminal node (a) with ter(e). Given queries S and Q, we say that S is a subquery of Q, and write S ⊆ Q, iff V (S) ⊆ V (Q), E(S) ⊆ E(Q), each node n ∈ V (S) is s.t. VS (n) ⊆ VQ (n) and every edge e ∈ E(S) is such that ES (e) = EQ (e). We say that S is a complete sub- query of Q (in symbols S j Q) if it also holds that, for every n ∈ V (S), VS (n) ⊇ VQ (n) and every descendant of oS in (b) Q is a node in S. A selection within a query Q is a subquery S of Q, which is called simple if S j Q or S consists of Figure 4: Example of (a) deletion and (b) weakening exactly one node, namely its root oS , such that VS (oS ) is a singleton or is equal to VQ (oS ). Every selection S within a query Q partitions the nodes of Q into selected, which be- deed a lucky person. As a result of the substitution shown in long to V (S), and unselected, belonging to V (Q) \ V (S). Figure 5, where the whole query is replaced with the (more The selected nodes can be further partitioned into totally se- general) term Lucky Person, the second line disappears. In lected, having all of their labels selected, and partially se- some situations this “side-effect” is undesired, because we lected, which have some, but not all, of their labels selected. would perhaps like to operate on that part of the query later An example of query as represented is shown in Figure 6, on. The closed padlock icon visible in the third line of the which also shows the compact graphical notation we use for query indicates that that line is protected against such an “ac- representing a selection within a query: selected nodes are cidental” deletion and would not inadvertently disappear as drawn using a double circle and selected labels within each the result of the substitution. However, note that locked por- of them are underlined. tions of the query are still fully affected by explicit deletion. The weakening of a query Q w.r.t. a selection S within Q THE QUERY LOGIC is the query Q S obtained from Q by replacing its node- The Query Tool’s framework and its functional API (QueLo) labelling function VQ with a function that associates each have been recently defined from a formal point of view in the totally selected node with {>}, each partially selected node technical report [7], with the purpose of precisely describ- n with VQ (n) \ VS (n) and each unselected node m with ing its components and the abstract operations provided for VQ (m). query manipulation. Here, we concisely summarise the main definitions introduced in [7] and formally prove the most im- The last notion we introduce is that of sticky edges, which portant of Query Tool’s properties, namely that it generates are edges that can only be deleted explicitly (that is, when only “meaningful” queries. performing a deletion), but never implicitly (e.g., as the con- 3 x {Man} Algorithm 1 Calculate enc-rollup(Q, n, m) Input: a query Q and two nodes n, m ∈ V (Q) marriedTo livesIn Output: a concept C expressing Q in the description logics language L y w {Beautiful, House} 1: C ← c, for some c ∈ V(n) {Woman} 2: for all x ∈ V(n) such that x 6= c do owned by 3: C ←C ux 4: end for z {RichPerson} 5: for all children x of n in Q such that x 6= m do Figure 6: Example of query and a selection within it 6: R ← E(hn, xi) 7: C ← C u ∃R . enc-rollup(Q, x, n) 8: end for sequence of a substitution). Sticky edges are closed w.r.t. the 9: if n 6= o then tree structure of the query, that is, when an edge e is sticky, 10: Let p be the parent node of n in Q then all the edges in the path from the root of the query to 11: if p 6= m then ter(e) are such. The meaning and importance of sticky edges 12: R ← E(hp, ni) will become more clear in the next section, where we intro- 13: C ← C u ∃R− . enc-rollup(Q, p, n) duce and describe the two operations delete and substitute. 14: end if For the moment, sticky edges can be simply understood as 15: end if immutable (to some extent) pieces of information within a 16: return C query, which are not modified as a “side effect” of an opera- tion not directly intended to do so. Given a query Q and a node n, we say that a concept name c Functional API is compatible with Q focused in n iff c u roll-up(Q, n) 6v ⊥, To draw the inferences that are at the basis of the query for- while a role name r is such iff ∃r− . roll-up(Q, n) 6v ⊥. The mulation tasks, we express a query as a concept of some operation getComp(Q, n) returns a directed acyclic graph description logic (DL) language, for which the containment (DAG) G, whose nodes are all the concept names that are test of two conjunctive queries is decidable and available as a compatible with Q focused in n and that are neither sub- reasoning service. In what follows, we assume the existence nor super-concepts of roll-up(Q, n), and whose edges are of an underlying knowledge base K in such a DL language all the pairs of concept names c1 , c2 ∈ V (G) such that c1 L over C and R. We say that C is a sub-concept of (or sub- is a direct sub-concept of c2 . In other words, the output of sumes) D in K, and write C vK D, iff K |= C v D, in getComp is a taxonomy of concept names which are com- which case we also say that D is a super-concept of (or is patible with the query and not in hierarchy with the context. subsumed by) C. Two concepts C and D are equivalent in The operation getRel(Q, n) returns a DAG G, whose nodes K, written C ≡K D, iff one subsumes the other and vice are all the pairs hr, ci of role names and concept names such versa. For two concept names c1 and c2 we say that c1 is a that r is compatible with Q focused in n and c is a sub- or direct sub-concept of c2 (and that c2 is a direct super-concept a super-concept of ∃r− . roll-up(Q, n), and whose edges are of c1 ) iff c1 subsumes c2 and there is no c ∈ C equivalent to the pairs hhr, c1 i, hr, c2 ii ∈ V (G) × V (G) such that c1 is a neither c1 nor c2 and such that c1 vK c vK c2 . direct super-concept of c2 . Before introducing the functional API, let us first give some Let S be a selection within a query Q. Then, the operations preliminary definitions. Given a query Q and n ∈ V (Q), the getSupers, getEquiv and getSubs return the concept names operation roll-up(Q, n) translates Q into an L-concept w.r.t. that are more general than, equivalent to and more specific n and it is defined as enc-rollup(Q, n, n), where enc-rollup than roll-up(S), respectively. Moreover, the concept names is the recursive procedure described in Algorithm 1. We use in the output of getSubs(Q, S) are additionally required to roll-up(Q) as an abbreviation for roll-up(Q, o), where o is be compatible with Q focused in the root of S. the root of Q. The concept roll-up(Q, n) is called the context of Q w.r.t. n, expressing the informative content of Q from Let Q be a query and n a focus node. For a concept name c in the point of view of a specific node, which we call the focus. the output of getComp(Q, n), the operation addComp adds Queries Q1 and Q2 are equivalent, in symbols Q1 ≡ Q2 , iff c to V(n). More precisely, the result of addComp(Q, n, c) is roll-up(Q1 ) ≡K roll-up(Q2 ). We say that a query Q over the query Q0 obtained from Q by replacing its node-labelling a consistent knowledge base K is satisfiable iff its roll-up is function V with V 0 := V [n 7→ V(n) ∪ {c}]. For a pair hr, ci such in K (that is, K 6|= roll-up(Q) v ⊥). in the output of getRel(Q, n), the operation addRel creates a new node n0 such that V(m) = {c} and an edge e = hn, mi The functional API of the Query Tool is structured in two with E(e) = r. main parts: • the underlying reasoning services, consisting of the oper- Let Q and R be queries and E e be a set of sticky edges. Then, ations getComp, getRel, getSupers, getEquiv, getSubs; the operation prune deletes from Q the maximal number of non-root nodes, having no incoming sticky edge (if any) and • the operations for query manipulation, including addRel, associated with the same concept names both in R and Q, addComp, weaken, substitute and delete. such that the result is still a query. 4 Let S be a selection within a query Q and let E e be a set of The English sentence representing the query in the UI is gen- sticky edges. We define weaken(Q, S) as Q S and erated by the NLG sub-system, which will be described in  the next section. An example of the textual rendering of the e := prune weaken(Q, S), R, E delete(Q, S, E) e , query in natural language as displayed by the UI is given in Figure 7. where R is the query obtained from S by replacing VS with the function on V (S) associating each node n that is both in Q and S with VQ (n) ∩ VS (S) if such intersection is non- I’m looking for a man who lives in a beautiful house empty and with {>} otherwise, and each other node m of S with VS (m). The last operation we introduce is “substitu- owned by a rich person. tion” which, for a concept name c in the output of getSupers (generalisation) or getEquiv or getSubs (specialisation), is defined as follows: Figure 7: Textual representation of a query in natural language e c) := delete(Q0 , S, E) substitute(Q, S, E, e , Hovering where Q0 is the query obtained from Q by adding c to the set In graphical user interfaces terminology, the user hovers on of concept names associated with the root of S. a graphic element whenever the mouse cursor moves from some point outside the element to some point inside the ele- Properties of the framework ment. In normal conditions, as the user hovers on the query, We will now formally state and prove that, starting from an the system gives visual hints about its structure: atomic query that is satisfiable, the query obtained by means of the operations in the Query Tool’s functional API is satis- • hovering on the span associated with a tag of some node n fiable. In order to do that, we first prove that the operations causes the span to become lightly highlighted, along with for query manipulation preserve satisfiability, i.e., the appli- all the spans associated with the tags occurring in the com- cation of each of them to a satisfiable query results in a query plete subquery rooted in n; that is satisfiable. • hovering on the span associated with the tag of some edge L EMMA 1. Each of the operations addComp, substitute, e causes that span and all the spans associated with the addRel, weaken and delete preserves query satisfiability. elements of tags ter(e) to become lightly highlighted. P ROOF S KETCH . Let Q be a satisfiable query and Q0 re- The highlighting is such that spans associated with different sult from the application of one the above operations. In the tags are visualized as distinct, even when adjacent. One way case of deletion, weakening and substitution with an equiv- of obtaining this kind of effect is, for instance, by rounding alent or more generic term, the resulting query Q0 is equiva- the corners of the highlighted rectangular area around each lent or more general than the input query Q, which is satisfi- span. The only case in which highlighting on hovering does able by assumption. Therefore, Q0 is also satisfiable. In the not trigger is when a menu is being displayed. case of addition of a new term/relation and substitution with a more specific term, the satisfiability of Q0 is ensured by the satisfiability of Q and the explicit check for compatibility in I’m looking for a man who lives in a beautiful house the definitions of addComp, addRel and substitute. owned by a rich person. The fundamental property of the Query Tool is then proved by means of a simple induction. Figure 8: Hovering on the span “house” T HEOREM 1. The query obtained from an initial satisfi- able atomic query through a finite sequence of applications Associated with each node of the query is a button, called the of the operations addComp, addRel, substitute, weaken and add-button, which is displayed as a blue onion with a white delete is satisfiable. cross inside and located below the text baseline immediately after the rightmost span associated with a tag of that node. P ROOF. By an easy induction on the sequence of applica- Hovering on the add-button of some node lightly highlights tions of the operations: the base case holds by assumption, all the spans associated with tags of that node. while the inductive step follows directly from Lemma 1. I’m looking for a man who lives in a beautiful house THE USER INTERFACE In this section we describe a concrete UI for the Query Tool, based on natural language generation. In the UI, the query is owned by a rich person. represented as a continuous string of natural language text, composed of a sequence of coherent text constituents called Figure 9: Hovering on the add-button at the right of “house” spans. Each of the tags occurring in the query is associated with a span by means of an injective mapping. As for each edge there is one and only one corresponding edge tag, if a Sticky edges span is associated with the tag of an edge we simply say that An edge can be marked as sticky by clicking on the span as- the span is associated with that edge. sociated with it, at which point the text span is rendered in 5 boldface. Clicking on a span associated with a sticky edge well as a complete selection. We consider atomic selections unmarks the edge as sticky and reverts the text span to its de- to have the lowest priority and complete selections the high- fault (non-bold) representation. Spans associated with node est, and when a simple selection belongs to more than one tags are not visually affected by sticky edges. class, it is considered to be only of the type with higher pri- ority. Furthermore, whenever a double click would result in the same kind of selection a single click would, it yields a I’m looking for a man who lives in a beautiful house complete selection instead. owned by a rich person. A complex selection (non-simple) is obtained from an empty or simple selection by control-clicking (i.e., clicking while pressing the CTRL key on the keyboard) on additional spans Figure 10: Clicking on “lives in” marks the associated edge as sticky associated with node tags, which are consequently included in the existing selection. Note that a complex selection can Selection be disconnected, in the sense that it is not a well-formed sub- The UI provides facilities to easily select portions of the query from the formal point of view, because there might be query. A simple selection can be directly specified by click- two selected nodes that are not connected by an edge. Exam- ing on the span associated with a tag of some node n in one ples of connected and disconnected complex selections are of the following ways: given in Figure 12. • a single click results in an atomic selection, highlighting only the span on which the click occurred; I’m looking for a man who lives in a beautiful house • a double click results in a node selection, highlighting all owned by a rich person. the spans associated with the elements of tags(n); (a) • a triple click results in a complete selection, highlighting all the spans in the complete subquery rooted in n. I’m looking for a man who lives in a beautiful house An example of each of these different types of simple selec- tion is shown in Figure 11. owned by a rich person. (b) I’m looking for a man who lives in a beautiful house Figure 12: Examples of (a) connected and (b) disconnected complex selection owned by a rich person. (a) From the graphic point of view, spans associated with tags in a selection are highlighted in a stronger way (e.g., a darker color) than they are when highlighted because of hovering I’m looking for a man who lives in a beautiful house and, unlike the highlighting effect triggered by hovering, it is not possible to distinguish between adjacent selected spans owned by a rich person. associated with the same node. When the selection includes one or more paths between nodes (that is, all of the nodes in (b) a path within the query are selected), spans associated with edge tags are also highlighted. The visual appearance of the I’m looking for a man who lives in a beautiful house spans associated with tags of selected nodes or edges does not change as the result of an hovering event, as shown in owned by a rich person. Figure 13. Moreover, as the reader might already have no- ticed, when a non-empty selection is present, the add-buttons (c) become invisible without changing the layout of the text. Figure 11: Simple selections obtained by (a) single, (b) double and (c) triple clicking on the span “house” I’m looking for a man who lives in a beautiful house A selection can be cleared by clicking on an area of the UI owned by a rich person. where clicking does not have any other effect (e.g., on the white space between the lines of text representing the query, Figure 13: Hovering on either “beautiful” or “house” in presence of a or on a span that is not associated with any tag). Clearing a non-empty selection selection results in an empty selection. Observe that when a node has only one node label, an atomic Addition selection on that node happens to be also a node selection, The query logic sub-system provides two operations, namely and when a node has no children, a node selection is also a addComp and addRel, for refining a query through the addi- complete selection. Thus, an atomic selection on a node hav- tion of compatible terms and relations to a focus node. The ing only one label and no children is also a node selection as UI makes these operations available to the user by means of 6 a pop-up menu, activated by clicking on the add-button of a Note that the operations weaken and delete, as defined in our node which is set as the focus. functional API, cannot directly handle a disconnected com- plex selection. However, such a selection can be decom- The menu contains a list of suitable arguments for the invo- posed by the UI in a series of connected selections that are cation of either addComp or addRel. The menu entries are then suitable for the actual invocation of the two operations. concept names and pairs consisting of a role name and a con- cept name, which are obtained from the output of the QueLo Observe that selected nodes can be deleted even if they have operations getComp and getRel w.r.t. the current query and an incoming sticky edge; in other words, the result of dele- focus. In particular, for a query Q focused in n, the menu is tion is the same independently of the presence of sticky edges populated with the nodes in the graph resulting from disjoint in the selection. Note also that in some cases deletion pro- union of the output graphs of getComp(Q, n) and getRel(Q, duces the same result as weakening (e.g., for a node selection n), arranged in the following way: rooted in a non-leaf node). • nodes with no incoming edge populate a menu of level 0, Substitution which is the topmost menu, where entries corresponding The QueLo sub-system provides the operation substitute in to concept names are listed before entries associated with order to allow the substitution of a selection within the query pairs of concept/role names. with a more generic, equivalent or more specific term. The • for each node n in a menu at level k, all the nodes that are UI makes this operation available to the user: upon long- reachable from n in one step populate a sub-menu at level clicking on a selected portion (i.e., strongly highlighted) of k + 1 associated with entry n. the query a pop-up menu is displayed, listing all the possi- ble terms with which the selection can be replaced. Such a The actual items shown to the user in the above menu struc- menu is populated with concept names that are more gen- ture are natural language descriptions of the node-entries (ei- eral than, equivalent to and more specific than the selection ther concept names or atomic concept/role pairs) generated and that are retrieved from the QueLo sub-system by means by the NLG sub-system. of the operations getSupers, getEquiv and getSubs, respec- tively. More general terms are shown at the top of the list, Some of the items come with an icon on their left: an upward- equivalent terms in the middle and more specific terms at pointed (resp., downward-pointed) triangle is displayed for the bottom. At the left of each item an icon is shown: an concept names (resp., role/concept pairs) indicating that the upward-pointing triangle for more general terms, a square option is associated with a nested sub-menu containing more for equivalent terms and a downward-pointing triangle for specific (resp., more generic) options of the same type. Hov- more specific terms. ering on any of these options opens the pop-up menu associ- ated with that item and displayed next to it. An example of The substitution menu has a similar hierarchical structure as the structure of the addition menu is given in Figure 14. the menu for addition, reflecting the taxonomic information in the output graphs of the operations getSupers, getEquiv and getSubs. In particular, some (possibly none) of the more general terms might be further generalised, in which case hovering on one such item triggers a sub-menu containing its direct super-concepts; similarly, if some of the more specific terms can be further specialised, then hovering on one such item triggers a sub-menu containing its direct sub-concepts that are compatible with the query. The same rules for fur- ther generalisation/specialisation apply to the items in the sub-menus, while equivalent terms (if any) cannot be further Figure 14: Structure of the addition menu generalised nor specialised. Clicking on any of the elements in the menu triggers the in- vocation of either addComp or addRel, according to whether the clicked item is associated with a concept name or a con- cept/role pair, respectively. Upon clicking, the menus disap- pear and the UI updates its representation of the query af- ter the necessary changes are performed by the QueLo sub- system. Figure 15: Structure of the substitution menu Weakening and Deletion As in the case of addition, the actual items shown to the user The user can weaken (respectively, delete) a selected portion in the substitution menu are natural language descriptions of the query by pressing the backspace (resp., delete) key on generated by the NLG sub-system, rather than bare concept the keyboard, which invokes the QueLo operation weaken names. Clicking on any of given options triggers the invoca- (resp., delete) with the current query and selection as input tion of substitute with the current selection and the concept arguments. Upon weakening (resp., deletion), the selection name associated with the clicked item as input arguments. is cleared and the UI updates its representation to reflect the The selection is then cleared and the UI updates the repre- changes in the query. sentation of the query after the selected elements have been 7 replaced with the chosen term. Informally, the conditions that a strict total order must satisfy to qualify as a linearisation state that “the tag of each edge Observe that the operation substitute, as defined in our func- is (1a) preceded by all the tags of its initial node and (1b) tional API, cannot deal with disconnected complex selection followed by at least one of the tags of its terminal node” and and, unlike the case of weakening and deletion, the problem that (1c) “between any two tags of a node there can only be cannot be overcome by converting such selection in a series (distinct) tags of the same node”. Moreover, from the above of connected ones. This is due to the fact that substitution re- conditions it can be also proved that “the tag of each edge is lies on the roll-up of the input selection itself, which is thus followed by all the tags of its terminal node”. required to be tree-shaped (i.e., connected). For this reason, in the presence of a disconnected selection, the substitution In general, without further restrictions, a query may admit operation is disabled. more than one linearisation, but all of them share some com- mon properties. In particular, each linearisation C of a query NATURAL LANGUAGE RENDERING Q is such that the minimal element is always a tag of the root The natural language interface of the Query Tool masks the and for each subquery S of Q the restriction of C on tags(S) composition of a precise query as the composition of English is itself a linearisation of S. Moreover, every linearisation is text describing the equivalent information needs. Interfaces compatible with the tree-order of the query, that is, the tags following this paradigm are known as “menu-based natural of a node n precede the tags of a node m in the linearisation language interfaces to databases”. The users of such systems whenever there is a path from a n to m in the query. edit a query by composing fragments of generated natural language provided by the system through contextual menus. A strict1 total order on the node set of a tree can be obtained by fixing a total order on the children of each node and per- In what follows, we describe how the natural language ren- forming a complete visit of the tree according to a chosen dering of a query is achieved. We start by defining a particu- traversal strategy. If the tree is visited using a depth-first or lar linear form of the query that satisfies certain constraints, breadth-first traversal, then the resulting order is an exten- necessary to represent the elements of the query using a lin- sion of the partial order induced by the tree itself on its node ear medium, that is, text. The constraints are enforced at the set. Thus, a linearisation of a query can always be obtained API level to ensure that different graphical user interfaces by additionally fixing a total order on the labels associated represent the query in a homologous way. Moreover, a con- with each node. For our purposes, we decided to use a depth- sistent ordering of the query elements needs to be preserved first traversal where the labels (resp., children) of a node are during the operations for query manipulation to avoid con- ordered according to the sequence (resp., inverse sequence) fusing the end user. The linearised version of the query is of applications of the operation addComp (resp., addRel) to then used as a guide for the language generation performed that node as focus. The motivation for this choice lies in the by the Query Tool’s NLG engine. fact that, for cognitive reasons, we want the query to change as little as possible from the visual point of view and in such a way that changes are restricted to a limited portion of the Linearisation of a query query. As a result, every execution of addComp or addRel Since a concept name can be in general associated with more yields one or more new tags whose place in the linearisation than one node in the same query, we introduce the notion of is right after all the node tags relative to the node given as tag: given a query Q, a node tag is a pair hn, ci where n ∈ input argument to the operation. V (Q) and c ∈ V(n), while an edge tag is a pair he, E(e)i such that e ∈ E(Q). The set of tags relative to a node n is given by tags(n) := {hn, ci | c ∈ V(n)}, while the set of all Natural Language Generation module tags occurring in a query Q is defined as follows: The natural language interface (NLI) of the Query Tool re- [ lies on a natural language generation (NLG) system to pro- tags(Q) := {he, E(e)i | e ∈ E(Q)} ∪ tags(n) duce the textual representation of the query, following an n∈V (Q) idea first presented in [12] and lately refined in [8]. NLG systems use techniques from artificial intelligence and A linearisation of a query Q is a strict total order C (that is, computational linguistics to produce human-readable texts a binary relation that is asymmetric, transitive and total) on out of machine-readable data. The Query Tool uses NLG to tags(Q) such that, for each edge e ∈ E(Q), the following represent the whole query, along with all the elements that conditions hold: the user can use to refine it, as English text. The generated ∀t ∈ tags(init(e)), t C he, E(e)i ; (1a) text is enriched with links that connect it to the underlying logical form of the query. This allows the user to operate on ∃t ∈ tags(ter(e)), he, E(e)i J t ; (1b) the query simply by editing an English text. and, for each node n ∈ V (Q), it is the case that: Unlike most NLG systems, ours is built to let the user deter- ∀t1 , t2 ∈ tags(n), t1 C t C t2 =⇒ t ∈ tags(n) . (1c) mine the structure of the generated text by inserting, replac- ing and removing snippets of it. Thus, while in the classic where we use the expression t1 J t2 for indicating that t1 C NLG pipeline the information to be conveyed in the text and t2 and there does not exist t ∈ tags(Q) for which t1 C t C its order is determined by the document planning module, in t2 . In such case we say that t1 immediately precedes t2 or, equivalently, that t2 immediately follows t1 . Note that C is 1 We only consider orders that are strict (i.e., irreflexive) unless ex- the transitive closure of the relation J. plicitly stated otherwise. 8 the Query Tool it is the user who decides both the informa- as surface realisation, produces a list of text tokens, some tion to be displayed and its arrangement. of which are connected to edge or node labels. This list is finally fed to the GUI, that displays it to the user. As the Query Tool is not tailored to any specific domain, its NLG module is simple enough to be adopted in any context Elements populating the menu for addition and substitution and it is not bundled with all the resources that are needed operations undergo a similar processing. To produce the tex- to generate text out-of-the-box. Therefore, in order to use it tual representation of such an element, the system makes a on a specific knowledge base, the system must be provided temporary copy of the portion of query affected by the oper- with a lexicon and a template map. The former contains the ation. The operation is then carried out on this portion and words to be used in the generated text; the latter is the bridge the resulting structure is fed to the generation pipeline used between the natural language and the knowledge represen- for entire queries. The outcome of the generation process is tation language, associating each concept/role name with a the text which will appear on the menu. generation template. Each such template contains the syn- tactic and lexical information necessary to generate a frag- ment of text representing the associated concept or role. Automated generation of lexicon and template map For the Query Tool’s NLI to work with a specific knowledge We selected the syntactic features available in the templates, base (KB) a lexicon and a template map must be provided hence supported by the generator, in order to keep the sys- for it. Devising these resources requires an understanding tem simple while still being expressive enough. For this of both the domain of interest (DOI) and basic linguistic no- purpose, we collected and analysed a corpus of more than tions such as verb tenses, noun genders and countability. To 12.000 unique relation identifiers and we partitioned them ease the burden of developing these resources from scratch, according to the recurring syntactic patterns. For each class we experimented with a computational technique to have the of the partition, we then proposed a common natural lan- system generate them automatically. This technique follows guage representation template. The result of this study is a an approach to domain independent generation proposed in set of simple but effective templates for representing most [11]. The functionality we implemented allows to produce ontology relations using natural language. all the resources necessary to configure our NLI for use with a new KB, using as a source of data the ontology itself. How- During the first stage (microplanning) of the generation, lin- ever, the process is not completely reliable, therefore system guistic information stored in the template map and in the lex- engineers must review the result and make the necessary cor- icon blends with the logic information encoded in the query rections. In the following, we describe the technique to han- into a single structure, known in the NLG literature as text dle relation identifiers, but we adapted it to handle concept specification and consisting of a list of syntactic trees with identifiers as well. inflected lexemes on its leaves. The NLG system operates on this structure to aggregate groups of adjacent syntactic struc- The idea is based on the observation that KBs already con- tures into single more complex structures, and to select and tain some form of linguistic information. In real-world on- replace existing referring expressions with more appropriate tologies, every concept and relation has a unique identifier ones. These two tasks are known in the literature as aggre- (ID), which most of the times is not just an arbitrary string, gation and referring expressions generation, respectively. At but a mnemonic chosen by the knowledge engineer to de- the same time, the system keeps track of which element of scribe the intended meaning of the identified concept or re- the text specification is associated with which element (ei- lation. Moreover, within these IDs, certain syntactic patterns ther a node tag or an edge tag) of the query. An association occur more frequently than others. holds when the syntactic element is the result of the instanti- ation of a template associated with the element of the query. In our approach, each relation ID is first tokenized according These associations are used for enriching the generated text to an algorithm that takes advantage of the naming conven- with links to the underlying query. tions used by ontology engineers. Second, the tokenized ID is fed to a custom part-of-speech tagger built around QTAG The linearisation of the query simplifies the effort required [14]. The resulting tagged tokenized ID is then lightly pre- by the referring expressions generation, as referring expres- processed before being finally passed to a transformation sions that need to be reworked always appear in subject posi- rule, chosen among thirteen different ones, that produces a tion. Our algorithm replaces a subject with a pronoun when- template for the template map of the NLG system. ever the previous sentence had the same subject, otherwise the subject is left unchanged. Although ambiguous expres- For the design of the transformation rules, we analysed our sions may occur, ambiguity is not a crucial issue as these ex- corpus, containing more than 12.000 relation IDs, in order pressions originate from user operations upon a selected el- to devise a partition of the domain in terms of syntactic pat- ement, which always becomes the target of the referring ex- terns. The classes defined in this partition are s.t. to each pression. Our aggregation module performs relatively sim- relation of the same class can be applied a simple transfor- ple aggregation tasks such as aggregating sentences with the mation in order to obtain a template. Each such transforma- same subject, eliding the subject and parts of the verb when- tion is also a uniform interpretation of the intended meaning ever it is feasible. of each relation ID in the class. Some care is needed when giving a uniform interpretation to syntactic patterns, as there are situations in which the same syntactic pattern is to be Once these operations are completed, the text specification is interpreted differently. For instance, the relation IDs “coun- ready to be transformed into the final text. This task, known try of nationality” and “language of country” share the same 9 syntactic structure, but the first relation should be read as eventuality that an edge associated with an attribute has to “the country of nationality of X is Y”, while the second as be inverted (going from the datatype to the subject). Though “the language of X is Y”. Each of the thirteen rules we de- apparently simple, allowing for attributes poses some inter- fined corresponds to one class of the partition, and together esting questions, that we are currently investigating, in order they can handle 93% of the relations of the average ontology. to deal with concrete values from the point of view of rea- soning. The system has been tested with some of the ontologies [9] developed by the national mapping agency of Great Britain The work in this paper has been partially supported by the and with the Pizza Ontology [6] by the University of Manch- European project ONTORULE. ester, contributing 64 unique relations in total. From the IDs of these relations we automatically generated relation tem- plates, which were then inspected in order to evaluate their usability in text generation. The result of the evaluation re- REFERENCES vealed that for 42 out of 64 relations (65%) the generated 1. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and S. Tessaris. An ontology based visual tool for query formulation template is suitable for direct use with the Query Tool’s NLI. support. In Proc. 16th Eur. Conf. Artificial Intelligence, 2004. The result suggests that although the generation of the tem- 2. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and plate map is not totally reliable, it is nevertheless useful in S. Tessaris. An Ontology-Based Query Manager: Usability that it speeds up the work of systems engineers, as they do Evaluation. In Proc. HCITALY 2005, 2005. not need to create the whole map from scratch, but only have 3. T. Catarci, T. Di Mascio, P. Dongilli, E. Franconi, G. Santucci, and to review the generated map and repair eventual errors. This S. Tessaris. Usability evaluation in the SEWASIE (SEmantic Webs improves the portability of the Query Tool’s NLI, making it and AgentS in Integrated Economies) project. In Proc. 11th Int. Conf. faster and easier to switch to a different knowledge base. on Human-Computer Interaction, 2005. 4. P. Dongilli. Natural language rendering of a conjunctive query. Technical Report KRDB08-3, KRDB Research Centre, Free CONCLUSIONS AND FUTURE WORK University of Bozen-Bolzano. http: //www.inf.unibz.it/krdb/pub/TR/KRDB08-3.pdf, June A Java implementation of the Query Tool has been devised, 2008. including the query logic and natural languages sub-systems 5. P. Dongilli, E. Franconi, and S. Tessaris. Semantics driven support for and a fully functional user interface as described above. The query formulation. In Proc. of the 2004 Int. Workshop on Description Query Tool was implemented for the first time in the context Logics, 2004. of the European project SEWASIE [10] as a Java web-based 6. N. Drummond, M. Horridge, R. Stevens, C. Wroe, and S. Sampaio. application, later on converted into a stand-alone applica- Pizza ontology. The University of Manchester. tion [15]. Important optimisations were then introduced (see http://www.co-ode.org/ontologies/pizza/. [16]) with the purpose of minimising the number of reasoner 7. P. Guagliardo. Theoretical foundations of an ontology-based visual calls and thus improving the responsiveness of the tool. A tool for query formulation support. Technical Report KRDB09-5, difference between our implementation of the Query Tool KRDB Research Centre, Free University of Bozen-Bolzano. http: and the previous one is that while the latter uses a DIG-based //www.inf.unibz.it/krdb/pub/TR/KRDB09-05.pdf, October 2009. reasoning engine, the former takes advantage of the OWL- API. Most importantly, our implementation follows the for- 8. C. Hallett, D. Scott, and R. Power. Composing questions through conceptual authoring. Computational Linguistics, 33(1):105–133, mal specification given in the Query Tool’s framework and 2007. functional API and complies with them. 9. Ordnance Survey - Great Britain’s national mapping agency. http: //www.ordnancesurvey.co.uk/oswebsite/ontology/. A usability evaluation of the previous Query Tool’s imple- 10. The SEmantic Webs and AgentS in Integrated Economies (SEWASIE) mentation was carried out in past work [3, 2] with the pur- project. http://www.sewasie.org/, 2005. pose of measuring its complexity of use from the user’s point 11. X. Sun and C. Mellish. Domain independent sentence generation from of view. In particular, the study aimed at determining how RDF representations for the Semantic Web. In Proc. ECAI’06 difficult it is for the user to formulate queries using the Query Combined Workshop on Language-Enhanced Educational Technology Tool and to understand the results. The outcome of the ex- and Development and Evaluation of Robust Spoken Dialogue Systems, periments showed that the tool can be easily used also by 2006. non-experienced users to query a domain in which they have 12. H. R. Tennant, K. M. Ross, R. M. Saenz, C. W. Thompson, and J. R. no special expertise. Miller. Menu-based natural language understanding. In Proc. 21st Annual Meeting of the Association for Computational Linguistics, pages 151–158. Association for Computational Linguistics, 1983. The framework and functional API presented in this paper consider only the addition of new relations to the query, but 13. M. Trevisan. A portable menu-guided natural language interface to knowledge bases. Master’s thesis, University of Groningen, 2009. they could be extended to deal with attributes (i.e., properties relating a concept to a datatype) as well. The only difference 14. D. Tufis and O. Mason. Tagging Romanian texts: a case study for QTAG, a language independent probabilistic tagger. Proc. 1st Int. with the current framework would be that a node associated Conf. on Language Resources and Evaluation (LREC’98), pages with a datatype (i.e., the “range” of the attribute) cannot be 589–596, 1998. the focus of a query for operations other than deletion. This 15. I. Zorzi. An Ontology-Based Visual Tool for Query Formulation basically means that such a node is always a leaf of the query Support. Bachelor’s thesis, Faculty of Computer Science, Free tree and the only operation allowed on it is deletion. Then, University of Bozen-Bolzano, 2005. since a node of this kind cannot be refined by adding a com- 16. I. Zorzi. Improving Responsiveness of Ontology-Based Query patible term or attaching a new property, the query is never Formulation. Master’s thesis, Faculty of Computer Science, Free rolled-up with respect to it, thus avoiding the nonsensical University of Bozen-Bolzano, March 2008. 10