<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLOVA: An Architecture for Cross-Language Semantic Data Querying</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>John McCrae</string-name>
          <email>jmccrae@cit-ec.uni-</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesu´ s R. Campan˜ a</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Cimiano</string-name>
          <email>cimiano@cit-ec.uni-</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer</institution>
          ,
          <addr-line>Science and Artificial, Intelligence</addr-line>
          ,
          <institution>University of Granada</institution>
          ,
          <addr-line>Granada</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Semantic Computing Group, CITEC, University of Bielefeld</institution>
          ,
          <addr-line>Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Semantic web data formalisms such as RDF and OWL allow us to represent data in a language independent manner. However, so far there is no principled approach allowing us to query such data in multiple languages. We present CLOVA, an architecture for cross-lingual querying that aims to address this gap. In CLOVA, we make a distinction between a language independent data layer and a language independent lexical layer. We show how this distinction allows us to create modular and extensible cross lingual applications that need to access semantic data. We specify the search interface at a conceptual level using what we call a semantic form speci cation abstracting from speci c languages. We show how, on the basis of this conceptual speci cation, both the query interface and the query results can be localized to any supported language with almost no e ort. More generally, we describe how the separation of the lexical layer can be used with a principled ontology lexicon model (LexInfo) in order to produce application-speci c lexicalisations of properties, classes and individuals contained in the data.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual Semantic Web</kwd>
        <kwd>Ontology Localisation</kwd>
        <kwd>Software Architecture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        Data models and knowledge representation formalisms in
the Semantic Web allow us to represent data without
reference to natural language1. In order to facilitate the
interaction of human users with semantic data, supporting
languagebased interfaces in multiple languages is crucial. However,
currently there is no principled approach supporting the
access of semantic data across multiple languages. To ll this
gap, we present in this paper an architecture we call CLOVA
(Cross-Lingual Ontology Visualisation Architecture) designed
for querying semantic data in multiple languages. A
developer of a CLOVA application can de ne the search interface
independently of any natural language by referring to
ontological relations and classes within a semantic form speci
cation (SFS), which represents a declarative and conceptual
representation of the search interface with respect to a given
ontology. We have designed an XML-based language which
is inspired by the Fresnel language [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for this purpose. The
search interface can then be automatically localised by the
use of a lexicon ontology model such as LexInfo [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], enabling
the system to automatically generate the form in the
appropriate language. The queries to the semantic repository are
generated on the basis of the information provided in the SFS
and the results of the query can be localised using the same
method as used for the localisation of the search interface.
The CLOVA framework is generic in the sense that it can
be quickly customised to new scenarios, new ontologies and
search forms and additional languages can be added without
changing the actual application, even at run time if we desire.
      </p>
      <p>The paper is organised as follows. Section 2 describes state
of the art on information access across languages and points
out basic requirements for cross lingual systems. Section 3
describes the CLOVA framework for rapid development of
cross-lingual search applications accessing semantic data. We
conclude in Section 4.
2.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>Providing access to information across languages is an
important topic in a number of research elds. While our work
is positioned in the area of the Semantic Web, we discuss
work related to a number of other research areas, including
databases, cross-language information retrieval as well as
ontology presentation and visualisation.
1This holds mainly for RDF triples with resources as subjects
and objects. String data-type elements are often
languagespeci c.
2.1</p>
    </sec>
    <sec id="sec-4">
      <title>Database Systems</title>
      <p>
        Supporting cross-language data access is an important topic
in the area of database systems, albeit one which has not
received very prominent attention (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). An important issue
is certainly the one of character encoding as we need to
represent characters for di erent languages. However, most of
the current database systems support Unicode so that this
issue is not a problem anymore. A more complex issue is the
representation of content in the database in such a way that
information can be accessed across languages. There seems
to be no consensus so far on what the optimal
representation of information would be such that cross-language access
can be realised e ectively and e ciently. One of the basic
requirements for multilingual organisation of data mentioned
by Kumaran et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is the following:
\The basic multilingual requirement is that the database
system must be capable of storing data in multiple
languages."
      </p>
      <p>This requirement seems de nitely too strict to us as it
assumes that the representation of data is language-dependent
and that the database is supposed to store the data in
multiple languages. This rules out language-independent
approaches which do not represent language-speci c
information in the database at all.</p>
      <p>The following requirement by Kumaran et al. is one we can
directly adhere to:</p>
      <sec id="sec-4-1">
        <title>Requirement 1 (Querying in multiple languages)</title>
        <p>Data must be queriable using query strings in any
(supported) language.</p>
        <p>In fact, we will refer to the above as Requirement 1a and
add the following closely related Requirement 1b: `The
results of a query should also be presented in any (supported)
language.' Figure 1 summarises all the requirements
discussed in this section. However, it does not strictly follow
from this that the data should be stored in multiple
languages in the database. In fact, it su ces that the front end
that users interact with supports di erent languages and is
able to translate the user's input into a formal
(languageindependent) query and localises the results returned by the
database management system (DBMS) into any of the
supported languages.</p>
        <p>A further important requirement by Kumaran et al. we
subscribe to is related to interoperability:</p>
      </sec>
      <sec id="sec-4-2">
        <title>Requirement 2 (Interoperability)</title>
        <p>The multilingual data must be represented in such a way that
it can be exchanged across systems.</p>
        <p>This feature is certainly desirable. We will come back to
this requirement in the context of our discussion of the
Semantic Web (see below). The next two requirements
mentioned by Kumaran et al. are in our view questionable as
they assume that the DBMS itself has built-in support for
multiple languages:</p>
      </sec>
      <sec id="sec-4-3">
        <title>String equality across scripts: A Multilingual database</title>
        <p>system should support lexical joins allowing to join
information in di erent tables even if the relevant
attributes of the join are in di erent scripts.</p>
        <p>Linguistic equivalences: Multilingual database
systems should support linguistic joins which exploit
prede ned mappings between attributes and values across
languages. For example, we might state explicitly that
the attributes \marital status" (in English) and
\Familienstand" are equivalent and that the values \married"
and \verheiratet" are equivalent.</p>
        <p>In fact, those two requirements follow from Kumaran et
al.'s assumption that the database should store the data in
multiple languages. If this is the case then we certainly have
to push all the cross-language querying functionality into the
DBMS itself. This is rather undesirable from our point of
view as every time a new language is added to the system,
the DBMS needs to be modi ed to extend the linguistic and
lexical equivalences. Further, the data is stored redundantly
(once for every language supported). Therefore, we
actually advocate a system design where the data is stored in a
language-independent fashion and the cross-lingual querying
functionality as well as result localisation is external to the
DBMS itself, implemented as pre- and post-processing steps,
respectively.</p>
        <p>In fact, we would add the following requirement to any
system allowing to access data across languages:</p>
      </sec>
      <sec id="sec-4-4">
        <title>Requirement 3 (Language Modularity)</title>
        <p>The addition of further languages should be modular in the
sense that it should not require the modi cation of the DBMS
or in uence the other languages supported by the system.</p>
        <p>As a consequence, the capability of querying data across
languages should not be speci c to a certain implementation
of a DBMS but work for any DBMS supporting the data
model in question.</p>
        <p>
          One of the important issues in representing information in
multiple languages is avoiding redundancy (see [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]). Hoque
et al. indeed propose a schema to give IDs to every piece
of information and then include the language information in
a dictionary table. This is perfectly in line with Semantic
Web data models (RDF in particular) where URIs are used
to uniquely identify resources. Dictionaries can then be
constructed expressing how the elements represented by the URIs
are referred to across languages. This thus allows to
conceptually separate the data from the dictionary. This is a crucial
distinction that CLOVA also adheres to (see below).
2.2
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Cross-language Information Retrieval</title>
      <p>
        In the eld of information retrieval, information access across
languages has also been an important topic, mainly in the
context of the so called Cross-Language Evaluation Forum2
(see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for the proceedings of CLEF 2008). Cross-language
information retrieval (CLIR) represents an extreme case of
the so called vocabulary mismatch problem well-known from
information retrieval. The problem, in short, is the fact that
a document can be highly relevant to a query in spite of not
having any words in common with the query. CLIR
represents an extreme case in the sense that if a query and a
document are in di erent languages, then the word overlap
and consequently every vector-based similarity measure will
be zero.
2http://www.clef-campaign.org/
      </p>
      <p>In CLIR, the retrieval unit is the document, while in database
systems the retrieval unit corresponds to the information
units stored in the data base. Therefore, the requirements
with respect to multilinguality are rather di erent for CLIR
and multilingual database systems.
2.3</p>
    </sec>
    <sec id="sec-6">
      <title>Semantic Web</title>
      <p>Multilinguality has been so far an underrepresented topic
in the Semantic Web eld. While on the Semantic Web we
encounter similar problems as in the case of databases, there
are some special considerations and requirements. We will
consider further important requirements for multilinguality
in the context of the Semantic Web. Before, we introduce the
crucial distinction between the data layer (proper) and the
lexical layer. We will see below that the conceptual
separation between the data and the dictionary is even more
important in the context of the Semantic Web. According to our
distinction, the data layer contains the application-relevant
data while the lexical layer merely contains information about
how the data is realised/expressed in di erent languages and
acts like a dictionary. We note that this distinction is a
conceptual one as the data in both layers can be stored in the
same DBMS. However, this might not always be possible in
a decentralised system such as the Semantic Web:</p>
      <sec id="sec-6-1">
        <title>Requirement 4 (Data and Lexicon Separation)</title>
        <p>We require a clear separation between the data and lexicon
layer in the Semantic Web. The addition of further languages
should be possible without modifying the data layer. This
means that the proper data layer and the lexical layer are
cleanly separated and data is not stored redundantly.</p>
        <p>In the Semantic Web, the parties interested in accessing
a certain data source are not necessarily its owners (in
contrast to standard centralised database systems as considered
by Kumaran et al.). As a corollary it follows that if a user
requires access to a data source in language x he might not have
the permission to enrich the data source by data represented
in the language x.</p>
        <p>A further relevant requirement in the context of the
Semantic Web is the following:</p>
      </sec>
      <sec id="sec-6-2">
        <title>Requirement 5 (Sharing of Lexica)</title>
        <p>Lexica should be represented declaratively and in a form
which is independent of speci c applications such that it can
be shared.</p>
        <p>It is very much in the spirit of the Semantic Web that
information should be interoperable and thus reusable beyond
speci c applications. Following this spirit, it seems desireable
that (given that data representation is language-independent)
the language-speci c information how certain resources are
expressed in various languages can be shared across systems.
This can be accomplished by declaratively described lexica
which can be shared.</p>
        <p>
          Multilinguality has been approached in RDF through the
use of its label property, which can assign labels with
language annotations to URIs. The SKOS framework [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] further
expands on this by use of prefLabel, altLabel, hiddenLabel.
These formalisms are su cient for providing simple
representation of language information. However, as more complex
lexico-syntactic information is required, in turn more
complex representations are necessary. A more formal
distinction of the \data layer" and \lexical layer" is provided by
lexicon ontology models of which the most prominent models
are the Linguistic Information Repository (LIR) and LexInfo
(see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]).
2.4
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Ontology Presentation and Visualisation</title>
      <p>
        Fresnel [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a display vocabulary that describes methods of
data presentation in terms of lenses and formats. In essence
the lens in Fresnel selects which values are to be displayed
and the format selects the formatting applied to each part of
the lens. This provides many of the basic tools for presenting
semantic web data. However it does not represent
multilinguality within the vocabulary and it is not designed to present
a queriable interface to the data. There exist many forms of
ontology visualisation methods through the use of trees, and
other structures to display the data contained within the
ontology, a survey of which is provided in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These are of
course focussed mainly on displaying the structure of the
ontology and do not attempt to convert the ontology to
natural language. Furthermore, for very large data sources, it is
impractical to visualise the whole ontology at one time and
hence we wish only to select a certain section of it and hence
require a query interface to perform this task.
3.
      </p>
    </sec>
    <sec id="sec-8">
      <title>MULTILINGUAL ACCESS AND QUERY</title>
    </sec>
    <sec id="sec-9">
      <title>ING USING CLOVA</title>
      <p>CLOVA addresses the problem of realising localised search
interfaces on top of language-independent data sources,
abstracting the work ow and design of a search engine and
providing the developer with a set of tools to de ne and
develop a new system with relatively little e ort. CLOVA
abstracts lexicalisation and data storage as services, providing
a certain degree of independence from data sources and
multilingual representation models.</p>
      <p>The di erent modules of the system have been designed
with the goal of providing very speci c, non-overlapping and
independent tasks to developers working on the system
deployment concurrently. User interface de nition tasks are
completely separated from data access and lexicalisation,
allowing developers of each module to use di erent resources
as required.</p>
      <p>CLOVA as an architecture does not ful l any of the
aforementioned requirements (as they should be ful lled by
lexicalisation services), but provides a framework to fully exploit
cross-lingual services meeting these requirements. The
application design allows to separate conceptual representations
from language dependant lexical representations, making user
interfaces completely language independent in order to later
localise them to any supported language.
3.1</p>
    </sec>
    <sec id="sec-10">
      <title>System Architecture</title>
      <p>The CLOVA architecture is designed to enable the
querying of semantic data in a language of choice, while still
presenting queries to the data source in a language-independent
form. CLOVA is modular, reusable and extensible and as
such is easily con gured to adapt to di erent data sources,
user interfaces and localisation tools3.
3A Java implementation of CLOVA is available at http://
www.sc.cit-ec.uni-bielefeld.de/clova/</p>
      <sec id="sec-10-1">
        <title>Req. No Implication</title>
        <p>Req. 1a Querying in multiple languages
Req. 1b Result localisation in multiple languages
Req. 2 Data interoperability
Req. 3 Language modularity
Req. 4a Separation between data and lexical layer
Req. 4b Language-independent data representation
Req. 5 Declarative representation of lexica</p>
      </sec>
      <sec id="sec-10-2">
        <title>Status</title>
        <p>REQUIRED
REQUIRED
REQUIRED
REQUIRED
DESIRED TO SUPPORT Req. 3
DESIRED TO AVOID REDUNDANCY</p>
        <p>DESIRED FOR SHARING LEXICAL INFORMATION
Figure 2 depicts the general architecture of CLOVA and
its main modules. The form displayer is a module which
translates the semantic form speci cation into a displayable
format, for example HTML. Queries are performed by the
query manager and then the results are displayed to the user
using the output displayer module. All of the modules use
the lexicaliser module to convert the conceptual descriptions
(i.e., URIs) to and from natural language. Each of these
modules are implemented independently and can be exchanged or
modi ed without a ecting the other parts of the system.</p>
        <p>We assume that we have a data source consisting of a set
of properties referenced by URIs and whose values are also
URIs or language-independent data values. We shall also
assume that there are known labels for each such URI and
each language supported by the application. If this separation
between the lexical layer and the data layer does not already
exist, we introduce elements to create this separation. It is
often necessary to apply such manual enrichment to a data
source, as it is not trivial to identify which strings in the
data source are language-dependent, however we nd that is
often a simple task to perform by identifying which properties
have language-dependent ranges, or by using XML's language
attribute.</p>
        <p>We introduce an abstract description of a search interface
by way of XML called a semantic form speci cation. It
speci es the relevant properties that can be queried by using the
URIs in the data source, thus abstracting from any natural
language. We show how this can be used to display a form
to the user and to generate appropriate queries once he/she
has lled in the form. The query manager provides a
backend that allows us to convert our queries using information
in the form into standard query languages such as SPARQL
and SQL. Finally, we introduce a lexicalisation component,
which is used to translate between the language-independent
forms speci ed by the developer and the localised forms
presented to the user. We describe a lexicaliser which builds on
a complex lexicon model and demonstrate that it can provide
more exibility with respect to the context and complexity
of the results we wish to lexicalise.
3.2</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Modules</title>
      <p>3.2.1</p>
      <p>Semantic Form Specification</p>
      <p>One of the most important aspects of the architecture is
the Semantic Form Speci cation (SFS), which contains all
the necessary information to build a user interface to query
the ontology. In the SFS the developer speci es the ontology
properties to be queried by the application via their URIs.
This consists of a form for which we specify a domain, i.e.,
the class of objects we are querying as de ned in the database
by an RDF type declaration or similar. If this is omitted
we simply choose all individuals in the data source. The
SFS essentially consists of a list of elds which are to be
used to query the ontology. Each eld contains the following
information:</p>
      <p>Name: An internal identi er is used to name the input
elds for HTML and HTTP requests.</p>
      <p>Query output: This de nes whether this eld will
be included in these results. Valid values are always,
never, ask (the user could decide wether to include the
eld in the results or not), if empty (if the eld has not
been queried it is included in the output), if queried
(if the eld is queried, it is included in the output) and
ask default selected (the user decides, but as default the
eld will be shown).</p>
      <p>Property: represents the URI for the ontology
property to be queried through the eld. An indication of
reference=self in place of a URI means that we are
querying the domain of the search. Such queries are
useful for querying the lexicalisation of the object being
queried or limiting the query to a xed set of objects.
Property Range: We de ne a number of types (called
property ranges) that describe the data that a eld can
handle. It di ers from the data types of RDF or similar
in that we also describe how the data should be queried
as well. For example, while it is possible to describe
both the revenue of a company and the age of an
employee as integers in the database, it is not sensible to
query revenue as a single value, whereas it is often
useful to query age as a single value. These property ranges
provide an abstraction of these properties in the data
and thus support the generation of appropriate forms
and queries. The following property ranges are built-in
into CLOVA:
{ String, Numeric, Integer, Date: Simple data-type
values. Note that String is intended for
representing language-independent strings, e.g. IDs, not
natural language strings. The numeric and date
ranges are used to query precise values like \age"
and \birth date".
{ Range, Segment, Set : These are de ned relative
to another property range and specify how a user
can query the property in question. Range
species that the user should query the data by
providing an upper and/or lower bound, e.g. \revenue",
\number of employees". Segment is similar but
requires that the developer divides the data up into
pre-de ned intervals. Set allows the developer to
specify a xed set of queriable values, e.g. \marital
status".
{ Lexicalised Element : Although we assume all data
in the source is de ned by URIs, it is obviously
desirable that the user can query the data using
natural language. This property range in fact allows
to query for URIs through language-speci c strings
that need to be resolved by the system to the URI
in question. The strings introduced into this eld
are processed by the lexicaliser to nd the URI to
which they belong which is then used in the
corresponding queries. For example, locations can have
di erent names in di erent languages, e.g. \New
York " and \Nueva York ", but the URI in the data
source should be the same.
{ Complex : A complex property is considered to
be a property composed of other sub-properties.
For example, searching for a \key person" within
a company can be done by searching for
properties of the person, e.g., \name", \birth place".
This nested form allows us to express queries over
the structure of an RDF repository or other data
source.
{ Unqueriable: For some data, methods for e cient
querying cannot be provided, especially binary data
such as images. Thus we de ned this eld to allow
the result to still be extracted from the data source
and included in the results.</p>
      <p>The described property ranges are supported natively
by CLOVA, but it is also possible to de ne new property
ranges and include them in the SFS XML document.
The appropriate implementation for a form display
element that can handle the newly de ned property range
has to be provided of course (see Section 3.2.2).</p>
      <p>Rendering Properties: There is often information for
a particular rendering that cannot be provided in the
description of the property ranges alone. Thus, we allow
for a set of context speci c properties to be passed to
the rendering engine. Examples of these include the use
of auto-completion features or an indication of the type
of form element to display, i.e. a Set can be displayed
as a drop-down list, or as a radio button selection.</p>
      <p>
        The SFS document is in principle similar to the concept of a
\lens" in the Fresnel display vocabulary [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in that it describes
the set of elds in the data that should be used for display
and querying. However, by including more information about
methods for querying the data, we provide a description that
can be used for both presentation and querying of the data.
      </p>
      <p>Example: Suppose that we want to build a small web
application that queries an ontology with information about
companies stored in an RDF repository. The application should
ask for company names, companies' revenue, and company
locations. The syntax of a SFS XML document for that
application is shown below:
&lt;!--xmlns:dbpedia="http://dbpedia.org/ontology/"--&gt;
&lt;form domain="dbpedia:Company"&gt;
&lt;fields&gt;
&lt;field name="Name" output="ALWAYS"&gt;
&lt;property reference="self"/&gt;
&lt;property-range&gt;</p>
      <p>&lt;lexicalised-property-range/&gt;
&lt;/property-range&gt;
&lt;rendering context="html"&gt;</p>
      <p>&lt;property name="autocompletion" value="yes"/&gt;
&lt;/rendering&gt;
&lt;/field&gt;
&lt;field name="Location" output="ASK"&gt;
&lt;property uri="&amp;dbpedia;Organisation/location"/&gt;
&lt;property-range&gt;</p>
      <p>&lt;lexicalised-property-range/&gt;
&lt;/property-range&gt;
&lt;/field&gt;
&lt;field name="Revenue" output="ASK_DEFAULT_SELECTED"&gt;
&lt;property uri="&amp;dbpedia;Organisation/revenue"/&gt;
&lt;property-range&gt;
&lt;ranged-property-range&gt;
&lt;continuous-property-range&gt;</p>
      <p>&lt;min&gt;0&lt;/min&gt;
&lt;/continuous-property-range&gt;
&lt;/ranged-proprety-range&gt;
&lt;/property-range&gt;
&lt;/field&gt;
&lt;/fields&gt;
&lt;/form&gt;
3.2.2</p>
      <p>Form Displayer</p>
      <p>The form displayer consists of a set of form display elements
de ned for each property range. It processes the SFS by using
these elements to render the elds in a given order. The
implementation of these elements is dependent on the output
method. The form display elements are rendered using Java
code to convert the document to XHTML4.</p>
      <p>Figure 3 shows an example of rendering of an SFS which
includes the elds in the example above. In this rendering
the eld \name" is displayed as a text eld as it refers to the
lexicalisation of this company. The location of a company for
instance is represented as a text eld. However, in spite of
the fact that the data is represented in the data source as a
language independent URI, the user can query by specifying
4The CLOVA project also provides XSLT les to perform the
same task
the name of the resource in their own language (e.g., a
German user querying \Munchen" receives the same results as
an English user querying \Munich"). Finally, the revenue is
asserted as a continuous value which is queried by specifying
a range and is thus rendered with two inputs allowing the
user to specify the upper and/or lower bounds of their query.
A minimum value on this range allows for client-side data
consistency checks. In addition, check boxes are appended
to elds in order to allow users to decide if the elds will be
shown in the results, according to the output parameter in
the SFS.
3.2.3</p>
      <p>Query Manager</p>
      <p>
        Once the form is presented to the user, he or she can ll
the elds and select which properties he or she wishes to
visualise in the results. When the query form is sent to the Query
Manager, it is translated into a speci c query for a particular
knowledge base. We have provided modules to support the
use of SQL queries using JDBC and SPARQL queries using
Sesame [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We created an abstract query interface which can
be used to specify the information required in a manner that
is easy to convert to the appropriate query language allowing
us to change the knowledge base, ontology and back end
without major problems. The query also needs to be preprocessed
using the lexicaliser due to the presence of language-speci c
terms introduced by the user which need to be converted to
language independent URIs.
3.2.4
      </p>
      <p>Output Displayer</p>
      <p>Once the query is evaluated, the results are processed by
the output displayer and an appropriate rendering shown to
the user. The displayer consists of a number of display
elements, each of which represents a di erent visualisation of
the data, including not only simple tabular forms, but also
graphs and other visual display methods. As with the form
displayer, all of these elements are lexicalised in the same
manner as the form displayer.</p>
      <p>In general we might restrict the types of data that
components will display as not every visualisation paradigm is
suitable for any kind of data. For example, a bar chart showing
foundation year and annual income would be both
uninformative and di cult to display due to the scale of values. For
this reason we provide an Output Speci cation to de ne the
set of available display elements and sets of values they can
display. These output speci cations consist of a list of output
elements described as follows:</p>
      <p>ID: Internal identi er of the output element displayed.
URI: A reference to the output resource speci ed as a
URI.5
Fields: The set of elds used by this element. These
should correspond by name to elements in the SFS.
Display properties: Additional parameters passed to
the display element to modify its behaviour. Some of
these parameters include the possibility to ignore
incomplete data, or to de ne the subtypes of a chart to
display. These parameters are class dependant so that
each output element has its own set of valid parameters.
5These can reference Java classes by linking to the
appropriate class le or location in a JAR le
The following output speci cation de nes two output
elements to show results.
&lt;!-- xmlns:clova="jar:file:clova-html.jar!/clova/html/output/"
xmlns:dbpedia="http://dbpedia.org/ontology/"--&gt;
&lt;output&gt;
&lt;elements&gt;
&lt;element id="HTable" URI="&amp;clova;HTableDisplayElement"&gt;
&lt;fields&gt;</p>
      <p>&lt;all/&gt;
&lt;/fields&gt;
&lt;/element&gt;
&lt;element id="BarChart" URI="&amp;clova;GraphDisplayElement"&gt;
&lt;fields&gt;</p>
      <p>&lt;field name="revenue"/&gt;
&lt;/fields&gt;
&lt;display&gt;</p>
      <p>&lt;property name="Type" value="barChart"/&gt;
&lt;/display&gt;
&lt;/element&gt;
&lt;/elements&gt;
&lt;/output&gt;</p>
      <p>The rst element displays a table containing all the
results returned by the query, while the second output element
shows a bar chart for the property \Revenue". The HTML
output generated for a given output speci cation containing
the above mentioned descriptions is shown in Figure 4.
3.2.5</p>
      <p>Lexicaliser</p>
      <p>Simple lexicon models can be provided by language
annotations, for example RDF's label and SKOS's prefLabel,
and developing a lexicaliser is then as simple as looking up
these labels for the given resource URI. This approach may
be suitable for some tasks. However, we sometimes require
lexicalisation using extra information about the context and
would like to provide lexicalisation of more than just URIs,
e.g. when lexicalising triples. While RDF labels can be
attached to properties and individuals for instance, there is no
mechanism that allows to compute a lexicalization for a triple
by composing together the labels of the property and the
individuals. This is a complex problem and we will leave a full
investigation and evaluation of this for future work.
 Subject : SyntacticArgument </p>
      <p> SynSem Arg Map 1   Domain : SemanticArgument 
 PObject : SyntacticArgument 
 SynSem Arg Map 2 
 Range : SemanticArgument 
 NounPP : SubcategorizationFrame 
 SemanticPredicate 
 SyntacticBehaviour 
 Lemma 
 hasWrrittenForm="product" 
 Noun : LexicalEntry </p>
      <p> http://dbpedia.org/ontology/productOf : Sense  
 WordForm 
 hasWrittenForm="products" [ number=plural ] </p>
      <p>
        Furthermore, it is often desirable to have ne control over
the form of the lexicalisation, for example, the ontology
label may be \company location in city". However, we may
wish to have this property expressed by the simpler label
\location". By using a lexicon ontology model we can specify
the lexicalisation in a programmatic way, and hence adapt
it to the needs of the particular query interface. For these
reasons we primarily support lexicalisation through the use
of the LexInfo [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] lexicon ontology model and its associated
API6, which is compatible with the LMF Vocabulary [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <sec id="sec-11-1">
        <title>The LexInfo model:</title>
        <p>
          A LexInfo model is essentially an OWL model describing
the lexical layer of an ontology specifying how properties,
classes and individuals are expressed in di erent languages.
We refer to the task of producing language-speci c
representation of elements in the data source including triples as
lexicalisation of the data. The corresponding LexInfo API
organises the lexical layer mainly by de ning so called
aggregates which describe the lexicalisation of a particular URI,
specifying in particular the lexico-syntactic behaviour of
certain lexical entries as well as their interpretation in terms of
properties, classes and individuals de ned in the data. An
aggregate essentially bundles all the relevant individuals of the
LexInfo model needed to describe the lexicalization of a
certain URI. This includes a description of syntactic, lexical and
morphological characteristics of each lexicon entry in the
lexicon. Indeed, each aggregate describes a lexical entry together
with its lemma and several word forms (e.g. in ectional forms
such as the plural etc.). The syntactic behaviour of a lexical
entry is described through subcategorization frames making
the required syntactic arguments explicit. The semantic
interpretation of the lexical entry with respect to the ontology
is captured through a mapping (\syn-sem argument map")
from the syntactic arguments to the semantic arguments of
a semantic predicate which stands proxy for an ontology
element in the ontology. Finally the aggregate is linked through
a hasSense link to the URI in the data layer it lexicalises.
An example of an aggregate is given in gure 5. For details
6Available at http://lexinfo.googlecode.com/
the interested reader is referred to [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-11-2">
        <title>LILAC:</title>
        <p>
          In order to produce lexicalisations of ontology elements
from a LexInfo model we use a simple rule language included
with the LexInfo API called LILAC (LexInfo Label Analysis
&amp; Construction). A LILAC rule set describes the structure of
labels and can be used for both generating the lexicon from
labels and generating lexicalisations from the lexicon. In
general we assume that lexicons are generated from some set of
existing labels, which may be extracted from annotations in
the data source, e.g., RDFS's label, from the URIs in the
ontology or from automatic translations of these labels from
another language. The process of generating aggregates from
raw labels requires that rst the part of speech tags are
identied by a tagger such as TreeTagger. Then, the part-of-speech
tagged labels are parsed using a LR(1)-based parser (see [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]).
The API then handles these parse trees and converts them
into LexInfo aggregates.
        </p>
        <p>LILAC rules are implemented in a symmetric manner so
that they can be used to both generate the aggregates in the
lexicon ontology model (e.g. by analysing the labels of a given
ontology) as well as lexicalise those aggregates.</p>
        <p>A simple example rule for a label such as \revenue of" is:
Noun_NounPP -&gt; &lt;noun&gt; &lt;preposition&gt;</p>
        <p>This rule states that the lexicalisation of a Noun NounPP
Aggregate is given by rst using the written form of lemma of
the \noun" of the aggregate followed by the lemma of
\preposition" of the aggregate. LILAC also supports the insertion
of literal terms and choosing the appropriate word form in
the following manner:
Verb_Transitive -&gt; "is" &lt;verb&gt; [ participle,
tense=past ] "by"</p>
        <p>This rule can be used to convert a verb with transitive
behaviour into a passive form (e.g., it transforms \eats" into
\is eaten by").</p>
        <p>LILAC can create lexicalisations recursively for phrase and
similar, for example to lexicalise an aggregate for \yellow
moon", the following rules are used. Note that in this cases
the names provided by the aggregate class are not available
so the name of the type is used instead:
NounPhrase -&gt; &lt;adjective&gt; &lt;NounPhrase&gt;
NounPhrase -&gt; &lt;noun&gt;</p>
        <p>The process for lexicalisation proceeds as follows: for each
ontology element (identi ed by a URI) that needs to be
lexicalised, the LexInfo API is used to nd the lexical entry that
refers to the URI in question. Then the appropriate LILAC
rules are invoked to provide a lexicalization of the URI in a
given language.</p>
        <p>As this process requires only the URI of the ontology
element, by changing the LexInfo model and providing a reusable
set of LILAC rules the language of the interface can be changed
to any suitable form. It is important to emphasize that the
LILAC rules are language-speci c and thus need to be
provided for each language supported.</p>
        <p>Another issue is that we desire that our users are capable of
searching for elements by their lexicalised form. LexInfo can
support this as well. This involves querying the lexicon for
all lexical entries that have a word form matching the query
and returning the URI that the lexical entry is associated
to. Once we have mapped all language-speci c strings to
URIs, the query can be handled using the query manager as
usual. For example if the user queries for \food" then the
LexInfo model could be queried for all lexical entries that
have either a lemma or word form matching this literal. The
URIs referred to by this word can then be used to query the
knowledge base. This means that a user can query in their
own language and expect the same results, for example the
same concept for \food processing" will be returned by an
English user querying \food" and a Spanish user querying
for \alimento" (part of the compound noun \Procesado de
los alimentos").</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>CLOVA for company search</title>
      <p>We developed a search interface for querying data about
companies using CLOVA, which is available at http://www.
sc.cit-ec.uni-bielefeld.de/clova/demo. For this
application we used data drawn from the DBPedia ontology, which
we entered into a Sesame store. We used the labels of the
URIs to generate the lexicon model for English, and used
the translations provided by DBPedia's wikipage links
(themselves derived from WikiPedia's \other languages" links), to
provide labels in German and Spanish. As properties were
not translated in this way, the translations for these elements
were manually provided. These translations were converted
into a LexInfo model through the use of about 100 LILAC
rules. About 20 of these rules were selected to provide
lexicalisation for the company search application. In addition,
we selected the form properties and output visualisations by
producing a semantic form speci cation as well as an output
speci cation. These were rendered by the default elements
of the CLOVA HTML modules, and the appearance was
further modi ed by specifying a CSS style-sheet. In general,
the process of adapting CLOVA involves creating a lexicon,
which could be a LexInfo model or a simpler representation
such as with RDF's label property, and then producing the
semantic form speci cation and output speci cation.
Adapting CLOVA to a di erent output format or data back end, it
requires implementing only a set of modest interfaces in Java.</p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSION</title>
      <p>
        We have presented an architecture for querying semantic
data in multiple languages. We started by providing methods
to specify the creation of forms, the querying of the results
and presentation of the results in a language-independent
manner through the use of URIs and XML speci cations.
By creating this modular framework we provide an
interoperable language-independent description of the data, which
could be used in combination with a lexicalisation module
to enable multilingual search and querying. We then
separated the data source into a language-independent data layer
and a language-dependent lexical layer, which allows us to
modularise each language and made the lexical information
available separately on the semantic web. In this way we
achieved all the requirements we set out in Figure 1. We
described an implementation of this framework, which was
designed to transform abstract speci cations of the data into
HTML pages available on the web and performed its
lexicalisations by the use of LexInfo lexicon ontology models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
providing ne control on the lexicalisations used in a
particular context.
      </p>
    </sec>
    <sec id="sec-14">
      <title>Acknowledgements</title>
      <p>This work has been carried out in the context of the
Monnet STREP Project funded by the European Commission
under FP7, and partially funded by the \Consejer a de
Innovacion Ciencia y Empresa de Andaluc a" (Spain) under
research project P06-TIC-01433.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sethi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>Compilers: principles, techniques, and tools</article-title>
          . Reading, MA,,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and E.</given-names>
            <surname>Pietriga</surname>
          </string-name>
          .
          <article-title>Fresnel: A browser-independent presentation vocabulary for rdf</article-title>
          .
          <source>In Proceedings of the Second International Workshop on Interaction Design and the Semantic Web</source>
          , Galway, Ireland. Citeseer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Broekstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kampman</surname>
          </string-name>
          , and
          <string-name>
            <surname>F. Van Harmelen. Sesame:</surname>
          </string-name>
          <article-title>A generic architecture for storing and querying rdf and rdf schema</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          , pages
          <volume>54</volume>
          {
          <fpage>68</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sintek</surname>
          </string-name>
          .
          <article-title>Towards linguistically grounded ontologies</article-title>
          .
          <source>In Proceedings of the European Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>111</fpage>
          {
          <fpage>125</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Francopoulo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>George</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Monachini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Soria</surname>
          </string-name>
          .
          <article-title>Lexical markup framework (LMF) for NLP multilingual resources</article-title>
          .
          <source>In Proceedings of the workshop on multilingual language resources and interoperability</source>
          , pages
          <fpage>1</fpage>
          <lpage>{</lpage>
          8. Association for Computational Linguistics,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. S. M. L.</given-names>
            <surname>Hoque</surname>
          </string-name>
          and
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Are n. Multilingual data management in database environment</article-title>
          .
          <source>Malaysian Journal of Computer Science</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <volume>44</volume>
          {
          <fpage>63</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Katifori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Halatsis</surname>
          </string-name>
          , G. Lepouras,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vassilakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Giannopoulou</surname>
          </string-name>
          .
          <article-title>Ontology visualization methods: a survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>39</volume>
          (
          <issue>4</issue>
          ):
          <fpage>10</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumaran</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Haritsa</surname>
          </string-name>
          .
          <article-title>On database support for multilingual environments</article-title>
          .
          <source>In Proceedings of the IEEE RIDE Workshop on Multilingual Information Management,</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Matthews</surname>
          </string-name>
          , M. Wilson, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          . SKOS Core:
          <article-title>Simple knowledge organisation for the web</article-title>
          .
          <source>In Proceedings of the International Conference on Dublin Core and Metadata Applications</source>
          , pages
          <volume>12</volume>
          {
          <fpage>15</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Pen~as, and</article-title>
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          .
          <source>Evaluating Systems for Multilingual and Multimodal Information Access</source>
          , volume
          <volume>5706</volume>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>