<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>User Centered and Ontology Based Information Retrieval System for Life Sciences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sylvie Ranwez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Ranwez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohameth-François Sy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacky Montmain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michel Crampes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LGI2P Research Centre, EMA/Site EERIE</institution>
          ,
          <addr-line>Parc scientifique G. Besse, 30 035 Nîmes cedex 1</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratoire de Paléontologie, Phylogénie et Paléobiologie, Institut des Sciences de l'Evolution (UMR 5554 CNRS), Université Montpellier II</institution>
          ,
          <addr-line>CC 064, 34 095 MONTPELLIER Cedex 05</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Because of the growing amount of electronic data, designing efficient tools to retrieve and exploit documents is a major challenge. Current search engines suffer from two main drawbacks: there is limited interaction with the list of retrieved documents and no explanation for their adequacy to the query. Users may thus be confused by the selection and have no idea how to adapt their query so that the results match their expectations. This paper describes a request method and an environment based on aggregating models to assess the relevance of documents annotated by concepts of an ontology. The selection of documents is then displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive exploration of the data corpus.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Semantic Query</kwd>
        <kwd>Visualization</kwd>
        <kwd>Directed Acyclic Graph</kwd>
        <kwd>Aggregation Operator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>As the volume of electronic data grows, it is crucial to have powerful tools to
index and retrieve documents efficiently. This is particularly true in life sciences,
where new technologies, such as DNA chips a decade ago and Next Generation
Sequencing today, sustain the exponential growth of available data. Moreover,
exploiting published documents and comparing them with related biological data is
essential for scientific discovery. Information retrieval (IR), the key functionality of
the emerging “semantic Web”, is one of the main challenges for the coming years.
Ontologies now appear to be a de facto standard of semantic IR systems. By defining
key concepts of a domain, they introduce a common vocabulary that facilitates
interaction between the user and the software. Meanwhile, by specifying relationships
between concepts, they allow semantic inference and enrich the semantic
expressiveness of both indexing and querying a document corpus.</p>
      <p>
        Although most IR systems rely on ontologies, they often adopt one of two
extreme approaches: either they use most of the semantic expressiveness of
the ontology and hence require complex query languages that are not really
appropriate for non-specialists; or they provide a very simple query language that
almost reduces the ontology to a dictionary of synonyms used in Boolean retrieval
models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Another drawback of most IR systems is the lack of expressiveness of
their results. In most cases, results are simply proposed as a set of documents with no
further explanations concerning the match between the documents and the query.
Even when an IR system proposes a list of ranked documents, no explanation is
provided with regard to (w.r.t.) this ranking, which means the result is not made
explicit. In the absence of any justification concerning the results of IR, users may be
confused and may not know how to modify their query satisfactorily in an iterative
search process.
      </p>
      <p>This paper describes an original alternative. Our system relies on a domain
ontology and on entities that are indexed using its concepts (e.g. genes annotated by
concepts of the Gene Ontology or PubMed articles annotated using the MeSH). It
estimates the overall relevance of each entity w.r.t. a given query. The overall
relevance of a document is obtained by aggregating the partial similarity
measurements between each concept of the query and those indexing the document.
Aggregation operators are preference models that capture end user expectations. The
retrieved documents are ordered according to their overall scores, so that the most
relevant documents (indexed with the exact query concepts) are ranked higher than
the least relevant ones (indexed with hyperonyms or hyponyms of query concepts).
More interestingly, defining an overall adequacy based on partial similarities enables
a precise score to be assigned to each document w.r.t. every concept of the query. We
summarize this detailed information in a small explanatory pictogram and use an
interactive semantic map to display the top-ranked documents. Thanks to this approach,
the end user can easily tune the aggregation process, identify the most relevant
documents at a glance, recognize entity adequacy w.r.t. each query term, and
identify the most discriminating query terms.</p>
      <p>The main objective of this work is to favor interactivity between end users and the
information retrieval system (IRS). This interactivity is based on the explanation of
how a document is ranked by the IR system itself: explaining how the relevance of a
document is computed provides additional knowledge that is useful to end users to
iterate their query more appropriately. This is achieved by evaluating how well each
document matches the query based on both query/indexation semantic similarities and
end user preferences and by providing a visual representation of retrieved entities and
their relatedness to each query term.</p>
      <p>In section 2 we review problems involved in information retrieval and describe the
different approaches of similarity measurement used in this context. In section 3 we
describe a new document-request matching model based on multi-level aggregations
of relevance scores. In section 4 we illustrate the use of this approach for the
identification of cancer genes and the interactive query rendering interface of our IRS.
Finally, in section 5 we draw conclusions and look at future perspectives.
</p>
      <p>2 Information Retrieval: Overview of the State of the Art</p>
      <p>
The contribution of this paper is related to the use of semantics for information
representation and visualization in information retrieval systems.</p>
      <p>
        Information retrieval is generally considered as a sub-field of computer science
that deals with the representation, storage, and access of information. The field has
matured considerably in recent decades because of the increase in computer storage
and computing capacity and the growth of the World Wide Web. Some domains, such as
life sciences, have particularly benefited from this technological advance. Nowadays,
people no longer labor to gather general information, but rather to locate the exact
pieces of information that meet their needs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The main goal of an information
retrieval system can thus be defined as "finding material (usually documents) that
satisfies an information need from within large collections (usually stored on
computers)" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The main use of an IRS can thus be summarized as follows: needing
information within an application context, a user submits a query in the hope of
retrieving a set of relevant documents as the answer. To achieve this goal, IRSs
usually implement three processes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
- The indexation process aims to represent documents and queries with sets of
(weighted) terms (or concepts) that best summarize their information content.
- The search is the core process of an IRS. It contains the system strategy for
retrieving documents that match the query. An IRS selects and ranks relevant
documents according to a score strategy that is highly dependent on their
indexation.
- The query expansion is an intermediate process that reformulates the user
query, based on internal system information, to improve the quality of the
result.
      </p>
      <p>
        In most IRSs, the indexation process boils down to representing both documents
and queries as a bag of weighted terms (often called keywords) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. IRSs that use such
document representation are keyword-based. A serious weakness of such systems is
that they can be misled by the ambiguity of terms (e.g. homographs) and ignore
relationships among terms (e.g. synonymy or hyperonymy) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To overcome this
difficulty, recent IRSs map keywords to the concepts they represent [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These
concept-based IR systems thus need general or domain conceptual structures on
which to map the terms. Conceptual structures include dictionaries, thesauri
(Wordnet, UMLS) or ontologies (e.g. Gene Ontology). It is now widely
acknowledged that their use significantly improves the performance of IRSs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and
there is still room for improvement since most ontologies are not optimized to achieve
this goal [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A survey of concept-based IR tools can be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Many concept-based IRSs were developed based on theoretical frameworks for the
indexing process as well as for relevance measurement [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The latter assigns a score
to each document (called RSV – Retrieval Status Value), depending on how well it
matches the query.
      </p>
      <p>The work presented here is in line with the concept-based approach and takes as a
starting point the existence of a domain ontology. Both documents and queries are
represented by a set of concepts from this ontology. Fig. 1 gives an example based on
the Gene Ontology to illustrate how ontologies can help reduce the number of
relevant documents missed by IRSs (i.e. silences). Given the query "Organelle
organization (GO_0006996)" and "Cardiac muscle fiber development (GO_0048739)",
the system may retrieve a document that has been indexed by the concepts
"Mitochondrion organization" and "Muscle fiber development", as well as one (with a
smaller RSV) indexed by "Cellular component organization".</p>
      <p>Fig. 1 – How a domain ontology (A) can avoid silences in an IRS by expanding hyponyms and
hyperonyms while querying an indexed corpus with a set of concepts (B).</p>
      <p>2.1 Boolean Requests and their Generalizations</p>
      <p>
        Boolean requests are certainly the simplest and most widespread requests. However,
studies indicate that even simple Boolean operators (AND, OR, NOT) are rarely used in
web queries [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and are even sometimes misused [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Indeed, even when users
know that all the terms must be included in the indexation (conjunctive request) or, on
the contrary, that only one is needed (disjunctive requests), they do not mention it to
the system. In the following, we thus focus on common requests where the user query
is only a set of a few words.
      </p>
      <p>
        Minkowski-Hölder's Lp norms are aggregation operators that provide a theoretical
framework to express whether a query is conjunctive or disjunctive using only one
parameter [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They are particularly well suited to cases where the terms of the
request are weighted. These weights may be related to term frequencies within the
corpus, e.g. TF-IDF [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], or result from a fuzzy set indexation model. In the latter, a
weight is associated with each concept indexing a document to represent to what
extent a concept is a reliable indexation of a document [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Unfortunately, by summarizing the relevance of a document in a single score,
aggregation operators tend to lose information and to blur query results
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Indeed, unlike end users, they do not differentiate between documents whose
scores result from cumulative minor contributions of all concepts within the query
and those whose scores are due to the major contribution of a single concept. In
addition, as they do not take advantage of semantic resources (ontologies, thesauri),
they are unable to find relevant documents that are indexed by concepts that are
different but semantically related to those of the query. Indeed, these operators only
aggregate weights of a subset of terms: the ones that appear in the query. This
observation is the basis of query expansion.
      </p>
      <p>2.2 Query Expansion</p>
      <p>
Query expansion is an intermediary step between the indexing and the matching
process. As stated in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], end users can rarely perfectly formulate their needs using
query languages because they only have partial knowledge of IRS strategy, of the
underlying semantic resources, and of the content of the database. Based on this
statement, query refinement and expansion strategies have been developed to provide
(semi-)automatic reformulation of the query. These reformulations may modify a
query by adding concepts to it, by removing “poor” concepts from it or by refining
the weights assigned to its query terms. Many query expansion (QE) techniques have
been proposed, including the widely used relevance feedback [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This query
expansion technique uses the documents that are judged to be relevant by the user
after an initial query to produce a new one using reformulation, re-weighting and
expansion [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. When done automatically, this process is called relevance
backpropagation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Query expansion may also be based on external vocabulary taken from ontologies
or thesauri [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. A common expansion strategy aims to supplement the query by
adding its hyponyms. This method is an interesting complement to the Boolean search
system detailed above. Indeed, it is then possible to select documents that are not
indexed using exactly the same terms as the query and thus to avoid silences. This
strategy is used for instance by the IRSs of PUBMED (http://www.ncbi.nlm.nih.gov
/pubmed) and GOFISH [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. However, since no distinction is made between the initial
terms and those added, users may be puzzled by the set of documents retrieved.
Indeed, since they are not aware their query has been altered, they may not be able to
understand the selection of a document indexed with none of their query terms.
      </p>
      <p>2.3 Semantic Similarity Measurements</p>
      <p>
Query expansion can be improved by using similarity measures. These
measures not only enable the selection of documents indexed with terms related to those
of the query, but also allow the retrieved documents to be ranked according to their
semantic similarity to the query.</p>
      <p>Since our approach extensively relies on semantic similarity measurements that
significantly impact RSV computation, we detail some of them below. As some of these
measures satisfy distance axioms, we use the terms semantic proximity, closeness and
similarity interchangeably in the following.</p>
      <p>
        The similarity measurements that have been proposed can be grouped into two main
categories, depending on whether they are defined by intension or by extension. The former
use the semantic network of concepts as a metric space, and the latter use a statistical
analysis of term occurrences in a corpus of documents [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        While the semantic network may include various kinds of concept relationships,
most intensional similarity measures rely only on the subsumption relationship,
denoted is-a [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Indeed this relationship is the only one shared by all ontologies
and it constitutes their backbone. The key role of the is-a relationship is clearly made
explicit in the formal definition of an ontology proposed by [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] (p. 244-). The set of
is-a relationships among concepts can be conveniently represented by an oriented
graph whose vertices are concepts and whose edges indicate their subsumption
relationship (is-a). Many concept similarities are based on this is-a graph. One of the
most straightforward uses of this graph structure is to consider the length of the
shortest path between two concepts C1 and C2 as their semantic distance [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. If all
the edges of the path have the same orientation, one concept is subsuming the other,
but the more changes in direction the path contains, the harder it is to interpret.
Therefore, [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposes to adapt this classical graph distance to produce a more
sensitive proximity measurement, π(C1, C2), which takes into account the length lg(P) of
the path P between C1 and C2 and the number of changes in direction within the path,
nbC(P):
      </p>
      <p>
        <disp-formula><label>(1)</label><tex-math>\pi(C_1, C_2) = \min_{P \in \mathrm{paths}(C_1, C_2)} \bigl( lg(P) + K \times nbC(P) \bigr)</tex-math></disp-formula>
The K factor modulates the influence of changes in direction on the overall
measurement. When K=0, π is equivalent to the distance proposed in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. On the
other hand, a high value of K favors paths with a minimum number of changes in
direction, and thus a path that passes through either the least common ancestor of C1
and C2, denoted lca(C1,C2), or one of their greatest common descendants, denoted
gcd(C1,C2). Since 1994, when [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
first proposed to use lca in this context, it has played a key role in several similarity
measurements. However, while focusing on the lca, this measurement neglects the
symmetric notion of gcd and completely ignores whether concepts share common
descendants, or not. [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] proposes a variant that takes this information into account.
      </p>
      <p>
        One main limitation of all these graph-based measurements is that they assume
edge homogeneity, whereas each edge of the is-a graph represents a specific degree of
generalization or specialization. The semantic measurement proposed in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] tries to
capture this information based on the number of descendants of each concept. As this
measurement is based on the is-a graph, it is denoted dISA, and its authors demonstrated
that it satisfies distance axioms. More formally, denoting by hypo(C1) the set of
hyponyms of C1 (i.e. its descendants) and by ancEx(C1, C2) the set of concepts that
are ancestors of either C1 or C2 (but not of both), dISA is defined as:
        <disp-formula><label>(2)</label><tex-math>d_{ISA}(C_1, C_2) = \bigl| \bigl( hypo(ancEx(C_1, C_2)) \cup hypo(C_1) \cup hypo(C_2) \bigr) \setminus \bigl( hypo(C_1) \cap hypo(C_2) \bigr) \bigr|</tex-math></disp-formula>
      </p>
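      <p>The set expression of equation (2) can be sketched in a few lines of Python. This is only an illustration under assumptions that the text does not state: a concept is counted among its own hyponyms and among its own ancestors, the set difference is applied to the whole union, and the names reachable, d_isa and the toy graph are hypothetical.</p>
      <preformat><![CDATA[
```python
def reachable(graph, start):
    """Return start plus every concept reachable from it in graph."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def d_isa(parents, c1, c2):
    """Size of the hyponym sets that separate c1 from c2 (sketch of equation 2)."""
    # Invert the child -> parents map to obtain parent -> children.
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)

    def hypo(c):  # assumption: a concept belongs to its own hyponym set
        return reachable(children, c)

    def ancestors(c):  # assumption: a concept belongs to its own ancestor set
        return reachable(parents, c)

    # ancEx: ancestors of exactly one of the two concepts.
    anc_ex = ancestors(c1) ^ ancestors(c2)
    hypo_anc_ex = set()
    for a in anc_ex:
        hypo_anc_ex |= hypo(a)
    union = hypo_anc_ex | hypo(c1) | hypo(c2)
    shared = hypo(c1) & hypo(c2)
    return len(union - shared)


# Toy is-a graph: B and C are children of A, D of B, E of both B and C.
toy_parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}
print(d_isa(toy_parents, "B", "D"), d_isa(toy_parents, "B", "C"))
```
]]></preformat>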
      <p>
        In this approach, the information content of a concept is evaluated by intension,
using only the ontology and not the corpus. Alternatively, extensional measurements
are mostly based on the corpus and often rely on the concept information content (or
IC) defined in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The IC of a concept C1 is derived from the probability P(C1) that
a document of the corpus is indexed by C1 or one of its descendants:
      </p>
      <p><disp-formula><label>(3)</label><tex-math>IC(C_1) = -\log\bigl(P(C_1)\bigr)</tex-math></disp-formula></p>
      <p>
        Combining the ideas of lca and IC, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] introduces the notion of the most
informative common ancestor (MICA) of a pair of concepts and defines a semantic
proximity based on it as: π<sub>Resnik</sub>(C1, C2) = IC(MICA(C1, C2)). It should however be noted
that MICA(C1,C2) is not necessarily a lca of C1 and C2. This proximity measurement
is tightly correlated with the individual IC of the two concepts. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] proposes a variant
to correct this bias:
        <disp-formula><label>(4)</label><tex-math>\pi_{Lin}(C_1, C_2) = \frac{2 \times IC(MICA(C_1, C_2))}{IC(C_1) + IC(C_2)}</tex-math></disp-formula>
Proximities can be used in different contexts and their choice strongly depends on
the final objectives. Agreement with concept relatedness as judged by experts must
also be taken into account when choosing a measurement [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ].
      </p>
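      <p>To make the corpus-based measures (3) and (4) more concrete, the sketch below estimates the information content from a toy annotated corpus and derives the two MICA-based proximities. It is an illustration only: the helper names, the four-document corpus and the toy is-a graph are hypothetical, a concept is treated as its own ancestor, every concept is assumed to annotate at least one document (directly or through a descendant), and the handling of a zero denominator is a convention chosen for this sketch.</p>
      <preformat><![CDATA[
```python
import math


def ancestors(parents, concept):
    """Return concept plus all its ancestors (assumption: reflexive)."""
    seen = {concept}
    stack = [concept]
    while stack:
        for nxt in parents.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def information_content(parents, corpus):
    """IC(c) = -log P(c), with P(c) the share of documents indexed by c
    or by one of its descendants."""
    concepts = set(parents)
    for direct_parents in parents.values():
        concepts.update(direct_parents)
    ic = {}
    for c in concepts:
        hits = sum(
            1 for doc in corpus
            if any(c in ancestors(parents, a) for a in doc)
        )
        ic[c] = -math.log(hits / len(corpus))  # fails if hits == 0
    return ic


def resnik(parents, ic, c1, c2):
    """IC of the most informative common ancestor (MICA)."""
    common = ancestors(parents, c1) & ancestors(parents, c2)
    return max((ic[a] for a in common), default=0.0)


def lin(parents, ic, c1, c2):
    """Normalized variant of equation (4)."""
    denominator = ic[c1] + ic[c2]
    if denominator == 0:
        return 1.0 if c1 == c2 else 0.0  # convention chosen for this sketch
    return 2.0 * resnik(parents, ic, c1, c2) / denominator


# Toy is-a graph and toy corpus (each document is a set of annotations).
toy_parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}
toy_corpus = [{"D"}, {"E"}, {"C"}, {"B"}]
toy_ic = information_content(toy_parents, toy_corpus)
print(lin(toy_parents, toy_ic, "D", "E"))
```
]]></preformat>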
    </sec>
    <sec id="sec-4">
      <title>3 An Original Multi-level Score Aggregation to Assess Documents' Relevance Based on Semantic Proximity</title>
      <p>Our work refers to concept-based IRSs. Our Retrieval Status Values (RSVs) are
calculated from a similarity measurement between the concepts of an ontology. We
propose to break down the RSV computation into a three stage aggregation process.
First, we start with a simple and intuitive similarity measure between two concepts of
the ontology (stage 1); then, a proximity measure is computed between each concept
of the query and a document indexing (stage 2); finally, these measures are combined
in the global RSV of the document through an aggregation model (stage 3). The last
stage (aggregation) captures and synthesizes the user’s preferences and ranks the
collection of retrieved documents according to their RSV. The aggregation model
enables restitution of the contribution of each query term to the overall relevance of a
document. Hence it provides our system with explanatory functionalities that facilitate
man-machine interaction and support end users in iterating their query. Furthermore,
in order to favor user interactions, concept proximities must be intuitive (so that the
end user can easily interpret them) and fast to compute (so that the IRS is responsive
even in the case of large ontologies).</p>
      <p>We estimate the similarity of two concepts based on the Jaccard index between
their descendant sets. Two main objectives are pursued here: i) to avoid silences when
no document is indexed with the exact query concepts but only with related concepts
(hyponyms, hyperonyms), thereby increasing the recall of the system; ii) to make the
query results more explicit about the way a match is computed; in particular,
documents indexed by query concepts must be distinguished from documents indexed
by their hyponyms or hyperonyms.
</p>
      <p>3.1 Semantic Similarity Between Concepts and Sets of Concepts</p>
      <p>
The choice of the semantic similarity measurement used by our IRS has a major
impact on: i) the relevance of the retrieved documents, ii) the system's recall and iii)
user comprehension of the document selection strategy. Hence, we chose a variant of
the similarity measurement proposed by [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] (equation 4), in which the
information content of a concept is estimated from the number of its hyponyms [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>Because it has been emphasized that query concepts should only be replaced by
hyponyms or hyperonyms, we estimate the semantic proximity of two concepts based
on how much their hyponyms overlap (using the Jaccard index) as long as one is a
hyponym of the other; otherwise we set it to 0:
<disp-formula><label>(5)</label><tex-math>\pi(C_1, C_2) = \begin{cases} \dfrac{|hypo(C_1) \cap hypo(C_2)|}{|hypo(C_1) \cup hypo(C_2)|} \quad \text{if one concept is a hyponym of the other} \\ 0 \quad \text{otherwise} \end{cases}</tex-math></disp-formula>
It should be noted that:
• π(C1, C1) = 1
• π(C1, C2) &lt; 1, for each concept C1 that is different from C2
• π(C1, C2) &gt; 0, for each C1 and C2 having a hyponym relationship.</p>
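      <p>The hyponym-overlap proximity described above can be sketched as follows. This is a minimal illustration, assuming that a concept is counted among its own hyponyms (which is what makes the proximity of a concept with itself equal to 1); the function names and the toy is-a graph are hypothetical.</p>
      <preformat><![CDATA[
```python
def hyponyms(parents, concept):
    """Return concept plus all its descendants (assumption: reflexive)."""
    # Invert the child -> parents map to obtain parent -> children.
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    seen = {concept}
    stack = [concept]
    while stack:
        for nxt in children.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def concept_proximity(parents, c1, c2):
    """Jaccard index of the hyponym sets when one concept subsumes the other."""
    h1 = hyponyms(parents, c1)
    h2 = hyponyms(parents, c2)
    if c1 not in h2 and c2 not in h1:
        return 0.0  # neither concept is a hyponym of the other
    return len(h1 & h2) / len(h1 | h2)


# Toy is-a graph: B and C are children of A, D of B, E of both B and C.
toy_parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}
print(concept_proximity(toy_parents, "B", "D"))  # B subsumes D
print(concept_proximity(toy_parents, "B", "C"))  # siblings
```
]]></preformat>
      <p>In this toy graph, B and C share the hyponym E but neither subsumes the other, so their proximity is 0, in line with the restriction to hyponyms and hyperonyms.</p>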
      <p>
        Several solutions have been proposed to extend similarity measurement between
two concepts to measurement of similarity between two sets of concepts. This
problem is of particular interest in life sciences because similarity between two gene
indexations through the Gene Ontology (GO) may provide hints on how to predict
gene functions or protein interactions [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Whereas comparing gene indexations (and
document indexing in general) requires similarity measurements to be symmetric, this
is not the case in IR. Indeed, when matching documents to queries, it seems normal to
penalize a document because one concept of the query is absent from its indexing; on
the other hand, penalizing a document because it is indexed by one concept absent
from the query would be rather odd.
      </p>
      <p>Given a similarity measurement between two concepts, the proximity between a
query concept and a document can be defined as the maximum value of the
similarities calculated between the query concept and each concept of the document
indexing. This strategy leads to a simple and intuitive proximity measurement
between each query concept and a document. More formally, if π denotes the
similarity between two concepts from an ontology O, and D<sub>i</sub> denotes the i-th concept of
the indexing of document D, i = 1..|D|, then we define the similarity between a concept
Q<sub>t</sub> of the query and D as:
<disp-formula><tex-math>\pi(Q_t, D) = \max_{1 \le i \le |D|} \pi(Q_t, D_i)</tex-math></disp-formula></p>
      <p>After determining the similarities between each concept of the query and (the indexing of) a
document, the next step consists in combining them into a single score that reflects the
global relevance of the document w.r.t. the query. The user's preferences have to be taken
into account during this process in order to determine the overall relevance of a
document w.r.t. a query, i.e. its RSV.</p>
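      <p>The max-based proximity between a query concept and a document can be illustrated with a short Python sketch. The concept-to-concept values in the lookup table are hypothetical, as are the function names; the point is only to show that adding concepts to the indexing of a document never lowers its score, which reflects the asymmetry discussed above.</p>
      <preformat><![CDATA[
```python
def concept_document_proximity(similarity, query_concept, document_concepts):
    """Best match between one query concept and the concepts indexing a document."""
    return max(
        (similarity(query_concept, d) for d in document_concepts),
        default=0.0,  # assumption: a document with no indexing matches nothing
    )


# Hypothetical concept-to-concept proximities, for illustration only.
TOY_PROXIMITY = {("X", "Y"): 0.5, ("X", "Z"): 0.2}


def toy_similarity(a, b):
    """Symmetric lookup in the toy table; unknown pairs have proximity 0."""
    if a == b:
        return 1.0
    return TOY_PROXIMITY.get((a, b), TOY_PROXIMITY.get((b, a), 0.0))


# The extra concept W does not penalize the second document.
print(concept_document_proximity(toy_similarity, "X", ["Y", "Z"]))
print(concept_document_proximity(toy_similarity, "X", ["Y", "Z", "W"]))
```
]]></preformat>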
      <p>
        As mentioned above, computing documents’ RSV enables them to be ranked
according to their relevance. Furthermore having the detail of the score of the
document for each query concept allows us to justify and compare the source of the
match of each document with the query. This is clearly related to the preference
representation problem that has been extensively studied in decision theory [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. A
classical solution is to define a utility function U such that, for each pair of
alternatives D, D′ in a list of alternatives, D ≽ D′ (i.e. D is preferred to D′) iff
U(D) ≥ U(D′). The decomposable model of Krantz [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] has been widely used when
alternatives are n-dimensional. Following this model, the utility function U is defined
as U(q<sub>1</sub>, .., q<sub>n</sub>) = h(u<sub>1</sub>(q<sub>1</sub>), .., u<sub>n</sub>(q<sub>n</sub>)), where the u<sub>i</sub>(·), i = 1..n, are real-valued functions
in [0,1] and h: [0,1]<sup>n</sup> → [0,1] is an aggregation operator that satisfies the following
conditions:
• h is continuous;
• h(0, 0,…, 0) = 0 and h(1, 1,…, 1) = 1;
• ∀(a<sub>i</sub>, b<sub>i</sub>) ∈ [0,1]<sup>2</sup>, if a<sub>i</sub> ≥ b<sub>i</sub> for all i, then h(a<sub>1</sub>, …, a<sub>n</sub>) ≥ h(b<sub>1</sub>, …, b<sub>n</sub>).
      </p>
      <p>In our context, the n-dimensional space corresponds to the n query concepts. The n
coordinates of a document correspond to its proximities with each concept of the
query, i.e. the π(Q<sub>t</sub>, D), t = 1..|Q|, defined in the previous section correspond to the u<sub>i</sub>(·)
functions. The aggregation model combines the degrees of relevance (or matches) of a
document indexing w.r.t. each query concept according to the user's preferences. The
aggregation function h captures the preferences of the user: the way the elementary
degrees of relevance are aggregated depends on the role of each query term w.r.t. the
user’s requirements. Three kinds of aggregation can be distinguished:
• conjunctions (AND): h(π(Q<sub>1</sub>, D), .., π(Q<sub>|Q|</sub>, D)) ≤ min<sub>t=1..|Q|</sub> π(Q<sub>t</sub>, D);
• disjunctions (OR): h(π(Q<sub>1</sub>, D), .., π(Q<sub>|Q|</sub>, D)) ≥ max<sub>t=1..|Q|</sub> π(Q<sub>t</sub>, D);
• compromises: min<sub>t=1..|Q|</sub> π(Q<sub>t</sub>, D) ≤ h(π(Q<sub>1</sub>, D), .., π(Q<sub>|Q|</sub>, D)) ≤ max<sub>t=1..|Q|</sub> π(Q<sub>t</sub>, D).</p>
      <p>
        To improve man/machine interaction, we aim to give users a
friendly and intuitive way of expressing their preferences concerning the overall
relevance scoring strategy between a document and a query. We thus focus on
compromise operators because they fit the widespread decision strategy that
constrains the overall score to be between the minimum and the maximum value of
elementary scores (convexity). Our approach is consequently based on Yager's
operators [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. These define a parameterized family of functions that represents
compromise operators:
      </p>
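      <p><disp-formula><label>(6)</label><tex-math>Y_m\bigl(\pi(Q_1, D), \ldots, \pi(Q_{|Q|}, D)\bigr) = \left( \frac{\sum_{t=1}^{|Q|} \pi(Q_t, D)^q}{|Q|} \right)^{1/q}, \quad q \in \mathbb{R}</tex-math></disp-formula></p>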
      <p>To get a better idea of the wide range of aggregation functions available
in this family of operators, consider some remarkable values of q:
• q = 1: arithmetic mean;
• q → 0: geometric mean;
• q = -1: harmonic mean;
• q → +∞: max (OR generalization);
• q → -∞: min (AND generalization).</p>
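      <p>The behaviour of this family for the remarkable values of q listed above can be checked numerically. The following sketch is an illustration only: the function name yager_mean is hypothetical, and the limit cases (q = 0, infinite q, and a zero score combined with q ≤ 0) are handled explicitly by conventions that follow from the limits of the power mean.</p>
      <preformat><![CDATA[
```python
import math


def yager_mean(scores, q):
    """Power mean of scores in [0, 1] with exponent q (sketch of equation 6)."""
    n = len(scores)
    if q == math.inf:
        return max(scores)  # OR generalization
    if q == -math.inf:
        return min(scores)  # AND generalization
    if q <= 0 and min(scores) == 0:
        return 0.0  # limit value: a zero score cancels the aggregate
    if q == 0:
        # Limit q -> 0: geometric mean.
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** q for s in scores) / n) ** (1.0 / q)


partial_scores = [0.2, 0.8]
for q in (-math.inf, -1, 0, 1, math.inf):
    print(q, yager_mean(partial_scores, q))
```
]]></preformat>
      <p>For any finite q the result stays between the smallest and the largest partial score, which is the compromise property required of the aggregation operator.</p>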
      <p>Users can thus select a compromise operator simply by providing the value of
parameter q. Reducing the choice of an aggregation operator to the choice of a single
parameter fits our requirement for an intuitive man/machine interface. Indeed, our IRS
interface includes a cursor to control the value of
parameter q and to indicate whether the aggregation should tend toward a generalized
"OR", a generalized "AND", or should tolerate more or less compensatory effects.
When criteria do not play a symmetric role in the aggregation process, the relative
importance of criteria can also be introduced in aggregation operators. In our case,
it can be checked that the Yager family extends to the family of weighted
operators:
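</p>
      <p>$$Y_{wm}(\pi(Q_1, D), \ldots, \pi(Q_{|Q|}, D)) = \left( \sum_{t=1}^{|Q|} p_t \, \pi(Q_t, D)^q \right)^{1/q} \qquad (7)$$
where p_t denotes the weight of query term Q_t; Eq. (6) corresponds to uniform weights p_t = 1/|Q|.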
However, in our application context, introducing weights can be more confusing
than useful. Indeed, it is difficult for users to a priori assign weights to each of their
query terms. Identifying precise values of weights requires specific procedures that
are clearly thorny and cumbersome when simply writing a query. In this study, we
thus focus on aggregation operators in which all the query terms play a symmetric
role. Users only have to choose whether their compromise decisional behavior is
closer to AND-like or OR-like.</p>
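      <p>To make the relation between the weighted and symmetric operators concrete, here is a small Python sketch of Eq. (7). It is only an illustration: the name weighted_yager_mean, the sample values and the requirement that the weights sum to 1 are assumptions of this sketch, chosen so that uniform weights reproduce Eq. (6).</p>

```python
def weighted_yager_mean(scores, weights, q):
    """Weighted power mean following Eq. (7), for finite non-zero q.

    Assumptions of this sketch: scores are strictly positive and the
    weights p_t are non-negative and sum to 1.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    if q == 0:
        raise ValueError("q = 0 is a limit case not handled in this sketch")
    return sum(p * s ** q for p, s in zip(weights, scores)) ** (1.0 / q)


# Hypothetical elementary relevances for a two-concept query.
scores = [0.2, 0.8]
uniform = weighted_yager_mean(scores, [0.5, 0.5], q=1)    # same as Eq. (6)
skewed = weighted_yager_mean(scores, [0.75, 0.25], q=1)   # favors first term
print(uniform, skewed)
```

      <p>With uniform weights the result equals the arithmetic mean of Eq. (6) for q = 1, whereas skewed weights pull the aggregate toward the score of the favored term; this is the extra degree of freedom that the symmetric setting deliberately leaves out.</p>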
      <p>This three-step RSV computation (concept/concept, concept/document, then
query/document) has been integrated into an efficient and interactive querying system,
as detailed in the following section.</p>
      <p>4 Results: OBIRS Prototype and Applications</p>
      <p>
Querying systems endowed with query expansion that adds hyponym concepts to the
query can be seen as a first step towards a semantic querying system. Our approach
refines such basic solutions and avoids silence (relevant documents being missed) by
selecting documents that are indexed by
the semantically closest hyponyms or hypernyms of the query concepts.
Furthermore, we are convinced that users should be able to understand the RSV
at a glance, which favors interaction with the IRS and query reformulation. Our three-stage
relevance model (which allows RSVs to be computed) integrates both the semantic
expressiveness of the ontology-based data structure and the end user's preferences.
The more user-friendly the man-machine interface, the more efficient the interaction
between the IRS and the end user.</p>
      <p>To validate our approach, we chose a corpus that contains the whole set of human genes
referenced in the Ensembl database (http://www.ensembl.org/), i.e. about 50,000 genes. Each gene is
indexed with a subset of the concepts of the Gene Ontology (about 30,000 concepts).
The screenshot presented in Fig. 2 shows our IRS (named OBIRS for Ontological
Based Information Retrieval System) used here to find human genes indexed by Gene
Ontology concepts. The query used for this screenshot is the same as the one used in
section 2 (Fig. 1). The screen of the OBIRS interface is split vertically. On the left
side, the querying interface provides assistance in expressing queries. Users are
helped in selecting query concepts (auto-completion, search for concepts whose
labels contain given terms) and can easily tune the aggregation function according
to their preferences by moving a cursor from rough (strict conjunctive, "AND") to
tolerant (disjunctive, "OR"). It is also possible to limit the number of documents
retrieved (here 50) and to set a threshold for the RSV (here 0.1).</p>
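      <p>The two selection parameters mentioned above (RSV threshold and maximum number of documents) can be sketched as follows in Python. This is an illustrative assumption about how such a filter could work, not a description of the OBIRS code: the function name select_documents, the gene identifiers and the inclusive comparison with the threshold are all hypothetical.</p>

```python
def select_documents(rsv_by_doc, threshold=0.1, max_docs=50):
    """Keep documents whose RSV reaches the threshold, best first.

    `rsv_by_doc` maps a document identifier to its RSV. Treating the
    threshold as inclusive is an assumption of this sketch.
    """
    kept = [(doc, rsv) for doc, rsv in rsv_by_doc.items() if rsv >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_docs]


# Hypothetical RSVs for three genes.
rsvs = {"geneA": 0.9, "geneB": 0.05, "geneC": 0.4}
print(select_documents(rsvs, threshold=0.1, max_docs=50))
```

      <p>Here geneB falls below the threshold and is dropped, while the remaining genes are returned in decreasing order of RSV.</p>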
      <p>According to these parameters, the IRS selects relevant documents (human genes
in this example), and for each document a histogram is generated to justify its match
with the query. Each bar of a histogram is associated with a query concept and is
colored green, red or blue depending on whether the closest concept of the document
indexing is exactly this query concept (green), a hyponym (red) or a hypernym
(blue). The size of the bar associated with a query concept Qt is proportional to the
elementary relevance (i.e. π(Qt, D)) of the document w.r.t. this concept (see Fig. 3).
When a document is selected, this information is detailed at the bottom left of the
interface (here CXADR has been selected). These pictograms are then displayed on a
semantic map in such a way that the higher the RSV of a document, the closer its
pictogram is to the query symbol (blue square in the middle of the screen). Users
can thus identify the most relevant documents at a glance, together with the reasons why they
match the query, i.e., the contribution of each query concept to the RSV assessment.
To improve the readability of the semantic map in Fig. 2, a lens was designed that
enlarges each pictogram in a popup near the mouse pointer (Fig. 3).</p>
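      <p>The pictogram encoding described above can be summarized by a short Python sketch. It assumes that, for each query concept, the closest indexing concept and its relation to the query concept are already known; the names Bar, COLORS and build_histogram as well as the sample relevance values are hypothetical and only illustrate the color and size rules.</p>

```python
from dataclasses import dataclass

# Color rule stated in the text: exact match, hyponym or hypernym.
COLORS = {"exact": "green", "hyponym": "red", "hypernym": "blue"}


@dataclass(frozen=True)
class Bar:
    query_concept: str
    height: float  # proportional to the elementary relevance pi(Q_t, D)
    color: str


def build_histogram(matches):
    """Build one bar per query concept.

    `matches` is a list of (query_concept, relation, relevance) tuples,
    where `relation` is "exact", "hyponym" or "hypernym".
    """
    bars = []
    for concept, relation, relevance in matches:
        if relation not in COLORS:
            raise ValueError(f"unknown relation: {relation}")
        bars.append(Bar(concept, relevance, COLORS[relation]))
    return bars


# Hypothetical matches for the two-concept query used in the text.
example = build_histogram([
    ("organelle organization", "exact", 1.0),
    ("cardiac muscle fiber development", "hypernym", 0.6),
])
for bar in example:
    print(bar)
```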
      <p>Note that with the same query (Organelle organization; Cardiac muscle fiber
development), the Ensembl database retrieves 0 genes with a classical Boolean
strategy based on the "AND" operator and 15 genes with the "OR" operator. In OBIRS these
15 genes can easily be distinguished from additional ones (indexed with hyponym and
hypernym concepts) since they are closer to the query symbol and their pictograms
contain a green bar representing a perfect match (see Fig. 2). Hence, in OBIRS
recall is enhanced thanks to query expansion, and the best genes can easily be identified
through the visual display, so that in practice there is no real loss of precision. We are
working on further experiments to compare the precision and recall of OBIRS with those of other
IRSs and to estimate the influence of the semantic distance measurements on them.
</p>
      <p>4.2 Application to Gene Identification</p>
      <p>
A query is built using the GO terms of "molecular
function" and "biological process" that are significantly over-represented in cancer
genes vs. non-cancer genes (the 10 concepts
of Table 1 in [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]). With an RSV threshold of 0.3 and a rather tolerant
aggregation function (q=5.0), OBIRS proposes the genes shown in Fig. 4 (the
higher the RSV, the closer the gene is to the query symbol).
Several of these genes belong to the cancer genes listed in the Cancer Gene Census
(http://www.sanger.ac.uk/genetics/CGP/Census/), and
most others are also clearly related to cancer (e.g. LATS1, framed in blue, stands for
Large Tumor Suppressor). This query is processed in about one second on a standard
desktop computer.
When end users consider that too many documents have been returned by the IRS,
they can raise the relevance threshold: the higher the threshold, the stricter the
selection. However, changing this threshold simply removes documents with a low RSV
from the screen. Users can also modify the way the aggregation is
performed (rough/tolerant cursor); the semantic map is then completely reshaped
because all the RSVs are recomputed. As a result, the closest documents in the second
semantic map may be completely different from those in Fig. 4 (in Fig. 5,
q=0.85).</p>
      <p>
The approach described in this paper is an important step towards an IRS that benefits
from the semantic expressiveness of ontologies while remaining easy to use. An
original three-stage aggregation model has been described to compute RSVs.
A distinctive feature of this model is that it embeds end-user preferences. The resulting OBIRS
prototype is one of the first IRSs able to explain its document selection to the user,
thanks to the decomposition of the RSV, which can be rendered through intuitive
pictograms. By locating these pictograms on a semantic map, OBIRS provides an
informative overview of the result of the query and opens up new possible interactions.
      </p>
      <p>We are currently working on an OBIRS extension that will let users reformulate
their query by graphically selecting the documents they value and those in which
they have no interest. This reformulation can be done by adding concepts to or removing
concepts from the query, by specializing or generalizing initial concepts of the query, or by adjusting the
aggregation function. Reformulation leads to several optimization and mathematical
questions but also raises important issues concerning feedback to users, so that they
can continue to understand the IRS process and interact fruitfully with it.</p>
      <p>We believe that there are many advantages to coupling the IR engine with the rendering
of the query result, and that both should be considered simultaneously to provide a
new, efficient and interactive query environment. The RSV decomposition described in
this paper is a good example of the benefit of simultaneously considering two related
problems: i) how to rate documents w.r.t. a query, and ii) how to give users feedback
on the rating of the documents. The latter is crucial to favor intuitive user/IRS
interaction during the iterative improvement of the query.</p>
      <p>Acknowledgments. This work is the result of collaboration between ISEM (UMR
5554 – CNRS/UMII) and LGI2P-EMA. It was supported by the French Agence
Nationale de la Recherche (ANR-08-EMER-011 "PhylAriane"). This publication is
contribution No 2010-138 of ISEM.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Vallet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <article-title>An ontology-based information retrieval model</article-title>
          .
          <source>Semantic Web: Research and Applications, Proceedings</source>
          ,
          <year>2005</year>
          .
          <volume>3532</volume>
          : p.
          <fpage>455</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Belkin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.M.</given-names>
            <surname>Pejtersen</surname>
          </string-name>
          .
          <source>Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          .
          <year>1992</year>
          . Copenhagen, Denmark: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <source>Introduction to Information Retrieval</source>
          .
          <year>2008</year>
          : Cambridge University Press. 496.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.J.</given-names>
            <surname>McGill</surname>
          </string-name>
          ,
          <source>Introduction to Modern Information Retrieval</source>
          .
          <year>1986</year>
          : McGraw-Hill, Inc. 400.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Baziz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.
          <article-title>A Fuzzy Set Approach to Concept-based Information Retrieval</article-title>
          .
          <source>in 4th Conference of the European Society for Fuzzy Logic and Technology and the 11ème Rencontres Francophones sur la Logique Floue et ses Applications (Eusflat-LFA 2005 joint Conferences)</source>
          .
          <year>2005</year>
          . Barcelona, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Haav</surname>
            ,
            <given-names>H.-M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>T.-L.</given-names>
            <surname>Lubi</surname>
          </string-name>
          .
          <article-title>A Survey of Concept-based Information Retrieval Tools on the Web</article-title>
          .
          <source>in 5th East-European Conference, ADBIS 2001</source>
          .
          <year>2001</year>
          . Vilnius, Lithuania.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Andreasen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bulskov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Knappe</surname>
          </string-name>
          .
          <article-title>On ontology-based querying</article-title>
          .
          <source>in 18th International Joint Conference on Artificial Intelligence: Ontologies and distributed systems, IJCAI 2003</source>
          .
          <year>2003</year>
          . Acapulco, Mexico.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jimeno-Yepes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berlanga-Llavori</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Rebholz-Schuhmann</surname>
          </string-name>
          ,
          <article-title>Ontology refinement for improved information retrieval</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <year>2010</year>
          .
          <volume>46</volume>
          (
          <issue>4</issue>
          ): p.
          <fpage>426</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Van Rijsbergen</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <source>Information Retrieval</source>
          .
          <year>1979</year>
          : Butterworth-Heinemann. 208.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Saracevic</surname>
          </string-name>
          ,
          <article-title>Real life, real users, and real needs: a study and analysis of user queries on the web</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <year>2000</year>
          .
          <volume>36</volume>
          (
          <issue>2</issue>
          ): p.
          <fpage>207</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.,</given-names>
          </string-name>
          <article-title>The effect of query complexity on Web searching results</article-title>
          .
          <source>Inf. Res.</source>
          ,
          <year>2000</year>
          .
          <volume>6</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Topi</surname>
          </string-name>
          ,
          <article-title>Training for Web search: Will it get you in shape?</article-title>
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <year>2004</year>
          .
          <volume>55</volume>
          (
          <issue>13</issue>
          ): p.
          <fpage>1183</fpage>
          -
          <lpage>1198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Detyniecki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Browsing a video with simple constrained queries over fuzzy annotations</article-title>
          .
          <source>Flexible Query Answering Systems</source>
          ,
          <year>2001</year>
          : p.
          <fpage>282</fpage>
          -
          <lpage>288</lpage>
          (
          <issue>612</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Schamber</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Relevance and information behavior</article-title>
          .
          <source>Annual Review of Information Science and Technology</source>
          ,
          <year>1994</year>
          .
          <volume>29</volume>
          : p.
          <fpage>3</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.,
          <article-title>Integration of association rules and ontologies for semantic query expansion</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <year>2007</year>
          .
          <volume>63</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>63</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Crouch</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Experiments in automatic statistical thesaurus construction</article-title>
          .
          <source>in 15th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          .
          <year>1992</year>
          . Copenhagen, Denmark: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Abdelali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cowie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.S.</given-names>
            <surname>Soliman</surname>
          </string-name>
          ,
          <article-title>Improving query precision using semantic expansion</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <year>2007</year>
          .
          <volume>43</volume>
          (
          <issue>3</issue>
          ): p.
          <fpage>705</fpage>
          -
          <lpage>716</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Boughanem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chrisment</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Soule-Dupuy</surname>
          </string-name>
          ,
          <article-title>Query modification based on relevance back-propagation in an ad hoc environment</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <year>1999</year>
          .
          <volume>35</volume>
          (
          <issue>2</issue>
          ): p.
          <fpage>121</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Andreasen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <article-title>An approach to knowledge-based query evaluation</article-title>
          .
          <source>Fuzzy Sets and Systems</source>
          ,
          <year>2003</year>
          .
          <volume>140</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>75</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Berriz</surname>
            ,
            <given-names>G.F.</given-names>
          </string-name>
          , et al.,
          <article-title>GoFish finds genes with combinations of Gene Ontology attributes</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <year>2003</year>
          .
          <volume>19</volume>
          (
          <issue>6</issue>
          ): p.
          <fpage>788</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Resnik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <year>1999</year>
          .
          <volume>11</volume>
          : p.
          <fpage>95</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Rada</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.,
          <article-title>Development and application of a metric on semantic nets</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          ,
          <year>1989</year>
          .
          <volume>19</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>17</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <article-title>Ontology Learning for the Semantic Web</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          . Vol.
          <volume>16</volume>
          .
          <year>2002</year>
          , Boston ; Dordrecht ; London: Kluwer Academic.
          <fpage>244</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>St Onge</surname>
          </string-name>
          ,
          <article-title>Lexical Chains as representation of context for the detection and correction of malapropisms</article-title>
          , in
          <source>WordNet: An Electronic Lexical Database and some of its applications (Language, Speech, and Communication)</source>
          , C. Fellbaum, Editor.
          <year>1998</year>
          , The MIT Press: Cambridge, MA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          .
          <article-title>Verbs semantics and lexical selection</article-title>
          .
          <source>in 32nd annual meeting on Association for Computational Linguistics</source>
          .
          <year>1994</year>
          . Las Cruces, New Mexico: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zargayouna</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Salotti</surname>
          </string-name>
          .
          <article-title>Mesure de similarité dans une ontologie pour l'indexation sémantique de documents XML</article-title>
          .
          <source>in 15èmes journées francophones d'ingénierie des connaissances IC2004</source>
          .
          <year>2004</year>
          . Lyon, France.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Ranwez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.,
          <article-title>Ontological distance measures for information visualisation on conceptual maps</article-title>
          .
          <source>On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, Pt 2, Proceedings</source>
          ,
          <year>2006</year>
          .
          <volume>4278</volume>
          : p.
          <fpage>1050</fpage>
          -
          <lpage>1061</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>An Information-Theoretic Definition of Similarity</article-title>
          . in
          <source>Fifteenth International Conference on Machine Learning</source>
          .
          <year>1998</year>
          : Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>W.N.</given-names>
          </string-name>
          , et al.,
          <article-title>Comparison of ontology-based semantic-similarity measures</article-title>
          .
          <source>AMIA Annu Symp Proc</source>
          ,
          <year>2008</year>
          : p.
          <fpage>384</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Pakhomov</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          , et al.,
          <article-title>Towards a framework for developing semantic relatedness reference standards</article-title>
          .
          <source>J Biomed Inform</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Seco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Veale</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <article-title>An intrinsic information content metric for semantic similarity in WordNet</article-title>
          .
          <source>Ecai 2004: 16Th European Conference on Artificial Intelligence, Proceedings</source>
          ,
          <year>2004</year>
          .
          <volume>110</volume>
          : p.
          <fpage>1089</fpage>
          -
          <lpage>1090</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.,
          <article-title>Semantic similarity in biomedical ontologies</article-title>
          .
          <source>PLoS Comput Biol</source>
          ,
          <year>2009</year>
          .
          <volume>5</volume>
          (
          <issue>7</issue>
          ): p.
          <fpage>e1000443</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Modave</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grabisch</surname>
          </string-name>
          .
          <article-title>Preference representation by a Choquet integral: Commensurability hypothesis</article-title>
          .
          <source>in 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'98)</source>
          .
          <year>1998</year>
          . Paris, France: Editions EDK, Paris.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Krantz</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          , et al.,
          <source>Foundations of measurement</source>
          . Vol.
          <volume>1</volume>
          : Additive and polynomial representations.
          <year>1971</year>
          : Academic Press, New York.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Yager</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <article-title>Possibilistic decision making</article-title>
          .
          <source>IEEE Trans. on Systems, Man and Cybernetics</source>
          ,
          <year>1979</year>
          .
          <volume>9</volume>
          : p.
          <fpage>388</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.,
          <article-title>Discovering cancer genes by integrating network and functional properties</article-title>
          .
          <source>BMC Med Genomics</source>
          ,
          <year>2009</year>
          .
          <volume>2</volume>
          : p.
          <fpage>61</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>