<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tunis, Department of Computer Science</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Qatar</institution>
          ,
          <addr-line>Faculty of Sciences, Department of Computer Science, Doha, Qatar</addr-line>
        </aff>
      </contrib-group>
      <fpage>107</fpage>
      <lpage>122</lpage>
      <abstract>
        <p>With the advent of the Web along with the unprecedented amount of information coming from sources of heterogeneous data, Formal Concept Analysis (FCA) is more useful and practical than ever, because this technology addresses important limitations of the systems that currently support users in their quest for information. In this paper, we will focus on the unique features of FCA for searching in distributed heterogeneous information. The development of FCA-based applications for distributed heterogeneous information returns a major gain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Today's information systems manage, import, broadcast, exchange and
integrate large volumes of recorded data, often in different formats (documents,
cards, tables). With the development of the Internet, institutions are often confronted with
the manipulation and analysis of large volumes of information. This
information often comes from heterogeneous data sources and is itself
heterogeneous in nature. Given this heterogeneity, the integration or even the simple
exchange of data is not an easy task if the different parties involved (information producers or
consumers) do not agree on the semantics of the data. It is therefore very difficult
to search for the answer to an information need across all databases.</p>
      <p>In this direction, we are interested in defining an approach that focuses
particularly on the detection of similar objects. Furthermore, the large volume
of heterogeneous data creates gaps and technical difficulties, such as
a lack of pertinent information and the time lost when searching for precise information. In
this context, we propose an approach for analyzing and interpreting similar
objects that makes it possible both to search more effectively and to extract
information automatically from dispersed sets of heterogeneous data in the framework
of cooperative work. Our approach is based on formal concept analysis.</p>
      <sec id="sec-1-1">
        <p>This paper is organized as follows. In section 2, we introduce some basic definitions of formal concept analysis. Then, in section 3, we present the related work. Section 4 is devoted to the presentation of the proposed system for searching in heterogeneous information. In sections 5 and 6, we present the evaluation of our system.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Mathematical Foundations</title>
      <p>
        Among the mathematical theories recently found with important applications in
computer science, lattice theory has a specific place for data organization, information
engineering, data mining and for reasoning. It may be considered as the mathematical
tool that unifies data and knowledge or information retrieval [
        <xref ref-type="bibr" rid="ref1 ref17 ref19 ref22 ref6 ref9">1,4,7,10,18,20,23</xref>
        ]. In
this section, we define formal context, formal concept, Galois connection and the
lattice of concepts associated to the formal context.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Formal Context</title>
        <p>
          Definition 1. A formal context is a triple k = &lt;O,P,R&gt;, where O is a finite set of
elements called objects, P is a finite set of elements called properties and R is a binary
relation defined between O and P. The notation (g,m) ∈ R, or R(g,m) = 1, means that
"formal object g verifies property m in relation R" [
          <xref ref-type="bibr" rid="ref11 ref3">3,12</xref>
          ].
        </p>
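        <p>Example 1. Let O = {a1, a2, a3, a4, a5, a6} be a set of persons of different grades and P =
{b1, b2, b3, b4, b5, b6, b7} be a set of properties. This context describes the
professional qualifications verified by the set of persons according to the binary relation R.</p>
        <p>For A ⊆ O and B ⊆ P, the operators f and h are defined by:
          <disp-formula><tex-math><![CDATA[f(A) = \{\, m \mid \forall g,\ g \in A \Rightarrow (g,m) \in R \,\}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[h(B) = \{\, g \mid \forall m,\ m \in B \Rightarrow (g,m) \in R \,\}]]></tex-math></disp-formula>
        </p>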
        <sec id="sec-2-1-1">
          <p>
            Operator f defines the properties shared by all elements of A. Operator h defines the
objects sharing all the properties included in set B. Operators f and h define a
Galois connection between sets O and P [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ].
          </p>
          <p>
            Proposition 1. Operators f and h define a Galois connection between O and P, such
that if A1, A2 are subsets of O, and B1, B2 are subsets of P, then f and h verify
the following properties [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ]:
          </p>
          <list list-type="bullet">
            <list-item><p>A1 ⊆ A2 ⇒ f (A1) ⊇ f (A2)</p></list-item>
            <list-item><p>B1 ⊆ B2 ⇒ h (B1) ⊇ h (B2)</p></list-item>
            <list-item><p>A1 ⊆ h ∘ f (A1) and B1 ⊆ f ∘ h (B1)</p></list-item>
            <list-item><p>A ⊆ h (B) ⇔ B ⊆ f (A)</p></list-item>
            <list-item><p>f = f ∘ h ∘ f and h = h ∘ f ∘ h</p></list-item>
          </list>
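          <p>As a minimal sketch of the operators f and h, the following Python fragment encodes a small context as a dictionary and checks the properties of Proposition 1 by brute force. The context contents and the names RELATION, subsets and check_galois_properties are illustrative assumptions and do not come from the paper.</p>
          <preformat><![CDATA[
```python
from itertools import combinations

# Hypothetical context (illustrative only): object -> properties it verifies.
RELATION = {
    "a1": {"b1", "b2"},
    "a2": {"b2", "b3"},
    "a3": {"b1", "b2", "b3"},
}
OBJECTS = set(RELATION)
PROPERTIES = set().union(*RELATION.values())


def f(objects):
    """Properties shared by every object of the given set."""
    shared = set(PROPERTIES)
    for g in objects:
        shared &= RELATION[g]
    return shared


def h(properties):
    """Objects that verify every property of the given set."""
    return {g for g in OBJECTS if properties <= RELATION[g]}


def subsets(items):
    """All subsets of a finite set, as Python sets."""
    ordered = sorted(items)
    return [set(c) for r in range(len(ordered) + 1)
            for c in combinations(ordered, r)]


def check_galois_properties():
    """Check the properties of Proposition 1 on every subset of this context."""
    for a in subsets(OBJECTS):
        assert a <= h(f(a))                      # A is included in h(f(A))
        assert f(a) == f(h(f(a)))                # f = f o h o f
        for a2 in subsets(OBJECTS):
            if a <= a2:
                assert f(a) >= f(a2)             # f reverses inclusion
        for b in subsets(PROPERTIES):
            assert (a <= h(b)) == (b <= f(a))    # equivalence of Proposition 1
    for b in subsets(PROPERTIES):
        assert b <= f(h(b))                      # B is included in f(h(B))
        assert h(b) == h(f(h(b)))                # h = h o f o h
    return True
```
]]></preformat>
          <p>The exhaustive check is only practical for very small contexts, since the number of subsets grows exponentially with the number of objects and properties.</p>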
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.3 Formal Concept</title>
        <sec id="sec-2-2-1">
          <p>
            Definition 3. A formal concept of the context &lt;O,P,R&gt; is a pair (A,B), where A ⊆ O,
B ⊆ P, such that f (A) = B and h (B) = A. Sets A and B are called respectively the
domain (extent) and range (intent) of the formal concept [
            <xref ref-type="bibr" rid="ref11 ref3">3,12</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.4 Concept Lattice</title>
        <sec id="sec-2-3-1">
          <p>
            Definition 4. From a formal context &lt;O,P,R&gt;, we can extract all possible concepts. In
            [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ], it is proved that the set of all concepts may be organized as a lattice when we
define the following partial order relation &lt;&lt; between two concepts: (A1,B1) &lt;&lt;
(A2,B2) ⇔ (A1 ⊆ A2) and (B2 ⊆ B1). The concepts (A1,B1) and (A2,B2) are called
nodes in the lattice.
          </p>
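          <p>The definitions of formal concept and of the order relation &lt;&lt; can be illustrated with the following sketch, which enumerates the concepts of a small context by closing every subset of properties. The context and the function names all_concepts and precedes are hypothetical; the brute-force enumeration is chosen for readability and is not an efficient lattice construction algorithm.</p>
          <preformat><![CDATA[
```python
from itertools import combinations

# Hypothetical context (illustrative only): object -> properties it verifies.
RELATION = {
    "a1": {"b1", "b2"},
    "a2": {"b2", "b3"},
    "a3": {"b1", "b2", "b3"},
}
PROPERTIES = set().union(*RELATION.values())


def f(objects):
    """Properties shared by every object of the given set."""
    shared = set(PROPERTIES)
    for g in objects:
        shared &= RELATION[g]
    return shared


def h(properties):
    """Objects that verify every property of the given set."""
    return {g for g in RELATION if properties <= RELATION[g]}


def all_concepts():
    """Every pair (A, B) with f(A) = B and h(B) = A."""
    concepts = set()
    ordered = sorted(PROPERTIES)
    for r in range(len(ordered) + 1):
        for combo in combinations(ordered, r):
            extent = h(set(combo))   # objects verifying the chosen properties
            intent = f(extent)       # closure: all properties they share
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def precedes(c1, c2):
    """(A1,B1) << (A2,B2) iff A1 is included in A2 and B2 is included in B1."""
    return c1[0] <= c2[0] and c2[1] <= c1[1]
```
]]></preformat>
          <p>With this context, all_concepts() yields four concepts; the one with the largest extent is the top node of the lattice and the one with the largest intent is the bottom node.</p>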
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.5 Objects Similarity</title>
        <sec id="sec-2-4-1">
          <p>Object similarity can be considered from two viewpoints:</p>
          <list list-type="bullet">
            <list-item><p>The semantic viewpoint: objects are similar if they have common properties.</p></list-item>
            <list-item><p>The system viewpoint: similarity takes into account the model used to represent the objects, such as the vector model.</p></list-item>
          </list>
        </sec>
        <sec id="sec-2-4-2">
          <p>
            Semantic similarity. Definition 5. Let k = &lt;O,P,R&gt; be a formal context, where O is the set of objects,
P is the set of properties and R is the binary relation between O and P. The similarity
between two objects a and b is measured through their common properties. Let a and b be two
elements of O, Pa the set of properties verified by object a and Pb the set of
properties verified by object b. The common properties of a and b
form the set Pa ∩ Pb. The similarity between the two objects is calculated with the
following formula [
            <xref ref-type="bibr" rid="ref22">23</xref>
            ]:
          </p>
          <p>
            <disp-formula><label>(1)</label><tex-math><![CDATA[\mathrm{Similarity}(a, b) = \frac{|P_a \cap P_b|}{|P_a \cup P_b|}]]></tex-math></disp-formula>
          </p>
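          <p>Formula (1) can be sketched in a few lines of Python, with each object represented by the set of properties it verifies. The function name and the treatment of two objects without any property (similarity 0) are assumptions made for this illustration.</p>
          <preformat><![CDATA[
```python
def similarity(pa, pb):
    """Number of common properties divided by the number of distinct properties."""
    union = pa | pb
    if not union:
        # Assumption: two objects without properties are treated as dissimilar.
        return 0.0
    return len(pa & pb) / len(union)
```
]]></preformat>
          <p>For instance, similarity({"A", "C"}, {"A", "D"}) compares one common property with three distinct properties and therefore evaluates to 1/3.</p>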
        </sec>
        <sec id="sec-2-4-3">
          <p>The similarity is a value in the interval [0,1]. In our system, we use this formula in order to detect similar documents.</p>
        </sec>
        <sec id="sec-2-4-4">
          <p>Example 2. Consider two formal contexts, presented in table 2, defined respectively between five objects {O1, O2, O3, O4, O5} and three properties {A, C, D}, and between four
objects {O6, O7, O8, O9} and four properties {A, B, C, E}.</p>
        </sec>
        <sec id="sec-2-4-5">
          <p>System similarity. In order to measure the similarity between two objects a and b, it is necessary to take into consideration the different object models. Given the complexity of the other models, we present only the similarity calculation between two objects in the vector model [11,21,22,24,25,26,27].</p>
        </sec>
        <sec id="sec-2-4-6">
          <p>Definition 6. The similarity between two objects a and b in the vector model [24, 25, 26] is measured as the cosine of the angle between the two vectors representing those objects:
            <disp-formula><label>(2)</label><tex-math><![CDATA[\mathrm{Similarity}(a, b) = \cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}]]></tex-math></disp-formula>
          </p>
        </sec>
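        <p>For comparison with the semantic measure, the cosine of Definition 6 can be sketched as follows. The representation of objects as equal-length lists of numeric weights and the value returned for a zero vector are assumptions of this example.</p>
        <preformat><![CDATA[
```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```
]]></preformat>
        <p>With non-negative weights, such as binary keyword indicators, the result lies in the interval [0,1].</p>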
        <sec id="sec-2-4-7">
          <p>Object Similarity Choice. Note that the object similarity value, whether from the system or the semantic viewpoint, lies in the interval [0,1].</p>
        </sec>
        <sec id="sec-2-4-8">
          <p>This object similarity criterion may take two values: two objects may be seen as alike or
different from each other. So, we determine two sets, Sim_objet and Dis_objet, according to the similarities and dissimilarities of the other objects with an object a:</p>
        </sec>
        <sec id="sec-2-4-9">
          <p>Sim_objet (a) = { b ; Similarity (a,b) ≥ αsim }</p>
          <p>Dis_objet (a) = { b ; Similarity (a,b) &lt; αsim }</p>
          <p>where αsim is the threshold that determines whether an object is considered near or distant. In our
work, this threshold is provided by the user. The value given for the search session
means that the user accepts similar answers with this degree. Since we use
formal concept analysis as the foundation of our search approach, we do not
consider the similarity from the system viewpoint; we are interested in the
similarity from the semantic viewpoint.</p>
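          <p>The two sets can be sketched as a simple partition driven by the threshold. In this illustration the context, the function names and the choice of leaving the object a itself out of both sets are assumptions; the similarity used is the semantic one of formula (1).</p>
          <preformat><![CDATA[
```python
def similarity(pa, pb):
    """Semantic similarity: common properties over distinct properties."""
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def partition(a, context, threshold):
    """Split the other objects into (Sim_objet(a), Dis_objet(a))."""
    sim_objet, dis_objet = set(), set()
    for b, properties in context.items():
        if b == a:
            continue  # assumption: a is not compared with itself
        if similarity(context[a], properties) >= threshold:
            sim_objet.add(b)
        else:
            dis_objet.add(b)
    return sim_objet, dis_objet


# Hypothetical context: object -> properties it verifies.
CONTEXT = {
    "o1": {"A", "C"},
    "o2": {"A", "D"},
    "o3": {"A", "C", "D"},
    "o4": {"D"},
}
```
]]></preformat>
          <p>With a threshold of 0.5, only o3 shares enough properties with o1 (two out of three) to fall in Sim_objet(o1); lowering the threshold moves objects from Dis_objet to Sim_objet.</p>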
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Related Work</title>
      <p>
        Using FCA can complement the existing search systems to address some of their
main limitations. Basically, FCA exploits the similarity between documents in order
to offer an automatic support structure (i.e., the document lattice) in which we place
the information retrieval process. The document lattice can be used to improve basic
individual search strategies [
        <xref ref-type="bibr" rid="ref1 ref12 ref2">1,2,4,13</xref>
        ]. Moreover, query refinement is one of the most
natural applications of concept lattices. Its main objective is to recover from the
null-output or the information overload problem. The concept lattice may be used to make
a transformation between the representation of a query and the representation of each
document [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">5,6,7,8,9</xref>
        ]. The query is merged into the document lattice and each
document is ranked according to the length of the shortest path linking the query to the
document concept. On the other hand, in the set of terms describing the document,
there exist hierarchies in the form of thesaurus [
        <xref ref-type="bibr" rid="ref12 ref13 ref9">4,10,13,14</xref>
        ]. The information search
using FCA takes as input a query that will be forwarded to a selected search engine
[
        <xref ref-type="bibr" rid="ref5 ref6 ref7">6,7,8</xref>
        ]. The first pages retrieved by the search engine in answer to the query are
collected and parsed. At this point, a set of index units that describe each returned
document is generated; such indices are next used to build the concept lattice
corresponding to the retrieved results. The last step consists in showing the lattice to the user and
managing the subsequent interaction between the user and the system. Such systems have
limitations: for larger information collections, they generally return a huge number
of references, which may affect both the efficiency and the effectiveness of the overall
system [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">18,19,20</xref>
        ]. We are nevertheless interested in building an FCA-based system for distributed
information. Existing systems suppose that the same document is identified in the same
manner everywhere, which is a strong hypothesis. In order to relax this constraint, we
proposed a similar object detection method. Based on this method, we have
defined a cooperative system for heterogeneous information retrieval, HIC2RS, which is
described in the next section.
      </p>
    </sec>
    <sec>
      <title>4 Cooperative Conceptual Retrieval System for Heterogeneous Information</title>
      <sec id="sec-3-1">
        <p>We present in this section the cooperative search for heterogeneous information. Taking formal concept analysis as the mathematical foundation, we propose a heterogeneous information conceptual cooperative retrieval system, HIC2RS, illustrated in figure 1, that is composed of two parts:</p>
      </sec>
      <sec id="sec-3-2">
        <p>1) The first part is the cooperative information retrieval system handling local databases. The search for the answer to a query consists in applying a
conceptual search approach to every local database. As a result, we obtain
a set of concepts forming the content of a Response vector.</p>
      </sec>
      <sec id="sec-3-3">
        <p>2) The second part is the final answer formulation, which operates in two steps:
i) the similar objects detection, based on the Response vector and on the
set of local databases, and
ii) the concept merging, based on the set of similar objects and performed
according to the similarity threshold given by the user in order to produce the
final answer.</p>
        <sec id="sec-3-3-1">
          <title>4.1 Cooperative Information Retrieval System</title>
          <p>The first part of the system HIC2RS is formed of a set of information retrieval systems
that cooperate to give the complete answer to a query. Every information retrieval
system has access to a local database on which it applies the Galois connection to
find the documents satisfying the query, which is a set of keywords. To resolve
a query (Qr), every conceptual information retrieval system executes the Research
algorithm, presented below, on its local database (LD). This application
gives a set of concepts forming the Response vector (RV).</p>
          <preformat>Algorithm Research
Inputs:  Query: Qr
         Local database: LD
Output:  Response Vector: RV
Begin
  M := the keywords of LD
  M1 := M ∩ Qr
  RV contains the concept obtained by Galois connection application on M1
End</preformat>
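          <p>A minimal Python sketch of the Research algorithm is given below, assuming that a local database is a mapping from document identifiers to keyword sets. The function name, the database contents and the representation of a concept as a (keywords, documents) pair are assumptions; the database is invented so that the result has the same shape as the first concept of the illustrative example in section 4.3.</p>
          <preformat><![CDATA[
```python
def research(query, local_database):
    """Return the concept (M1, documents indexed by all keywords of M1)."""
    # M := the keywords of LD
    keywords = set().union(*local_database.values())
    # M1 := M intersected with Qr
    common = keywords & query
    # Galois connection on M1: documents that carry every keyword of M1
    documents = {doc for doc, kws in local_database.items() if common <= kws}
    return frozenset(common), frozenset(documents)


# Hypothetical local database: document -> keywords indexing it.
LOCAL_DATABASE = {
    "D1": {"M1", "M2", "M3"},
    "D2": {"M1", "M3"},
    "D3": {"M2", "M5"},
}
```
]]></preformat>
          <p>Calling research({"M2", "M3", "M4"}, LOCAL_DATABASE) keeps the query keywords known to this database, M2 and M3, and returns them together with D1, the only document indexed by both.</p>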
        </sec>
        <sec id="sec-3-3-2">
          <title>4.2 Final Answer Formulation</title>
          <p>In this section, we present the second part of the system HIC2RS, the final
answer formulation. It is carried out in two steps: i) the
detection of the similar objects of the Response vector, and ii) the merging of the
different answers based on the Response vector and on the similar objects. The final
answer formulation consists in applying the algorithm Merge_IH, which we
propose, to the Response vector, based on the query and on the similarity threshold, to
obtain the final answer.</p>
          <p>Similar Objects Detection. The similar objects detection consists in examining the
documents that appear in the Response vector and calculating the similarity between
them. From the concepts, we create a set of similar objects. This set contains the
similarity degrees between the different documents. The similarity degree
between two documents is calculated with formula (1) defined in section 2.5. Indeed, since
our system is based on formal concept analysis, it is
unnecessary to use the similarity from the system viewpoint, which depends on the model used
to represent and search the information, such as the vector model. The similarity degree
between two documents is calculated from the number of keywords of each document and the
number of keywords they have in common.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <p>Answer Merge. Based on the calculated similarity degrees as well as on the concepts of the Response vector, we formulate the final answer to the query. The merge relies on the algorithm Merge_IH, which we present below.</p>
      </sec>
      <sec id="sec-3-5">
        <p>This merge algorithm combines the concepts of the Response vector while respecting certain
conditions. We construct the final answer iteratively. Initially, the final
answer is an empty set. We treat the set of concepts element by element.
For every element, if the keywords (the extent) of the concept are equal to
those of the query, we add its documents (its intent) to the final answer. If this
condition is not satisfied, we search, among the documents (the intents) of the other concepts
of the Response vector, for the documents that are similar with respect to the similarity threshold, and we
calculate the union of the extents (to obtain the combined set of keywords). We
continue to construct these sets of similar documents until we find all the query
keywords.</p>
      </sec>
      <sec id="sec-3-6">
        <p>This algorithm takes as input the query, the similarity threshold and the Response vector, and returns the final answer.</p>
        <preformat>Algorithm Merge_IH
Inputs:  Query: Qr
         Response Vector RV: a concepts set C1 .. CN
         Threshold similarity: S
Output:  Final answer: FA
Begin
  FA := ∅   // initialize the final answer
  For each concept Ci of RV do
    If extent of concept Ci = Qr then
      Add the intent of Ci to FA
    Else
      While exist a concept Cj (j &gt; i) do
        - Initialize P by the extent of Ci
        - Initialize D by the intent of Ci
        While (P ≠ Qr and exist a concept Cj) do
          - Add the extent of Cj to P
          - Search the similar documents, with the threshold S, between
            the intent of the concept Cj and the elements of D:
            D := Similar (D, intent_Cj, S)
          - Pass to the next concept
        End do
        If (P = Qr) then
          Add D to FA: FA := FA ∪ D
        End if
      End do
    End if
  End for
End.</preformat>
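        <p>The following Python sketch is one possible reading of Merge_IH, not a definitive implementation. It assumes that each concept is a (keywords, documents) pair, that the outer loop tries every later concept Cj as a starting point, and that Similar returns the documents belonging to at least one pair whose similarity reaches the threshold. The Response vector and the similarity degrees are those of the illustrative example in section 4.3; the names merge_ih, similar, SIMILARITY and RESPONSE_VECTOR are introduced here for illustration.</p>
        <preformat><![CDATA[
```python
from fractions import Fraction

# Similarity degrees listed in the illustrative example (section 4.3).
SIMILARITY = {
    frozenset({"D1", "D6"}): Fraction(1, 3),
    frozenset({"D1", "D9"}): Fraction(1, 4),
    frozenset({"D1", "D10"}): Fraction(1, 3),
    frozenset({"D1", "D13"}): Fraction(1, 3),
    frozenset({"D6", "D10"}): Fraction(1, 3),
    frozenset({"D6", "D13"}): Fraction(1, 3),
    frozenset({"D9", "D10"}): Fraction(1, 4),
    frozenset({"D9", "D13"}): Fraction(1, 4),
}

# Response vector of the same example: (keywords, documents) per concept.
RESPONSE_VECTOR = [
    ({"M2", "M3"}, {"D1"}),
    ({"M2", "M4"}, {"D6", "D9"}),
    ({"M3", "M4"}, {"D10", "D13"}),
]


def similar(docs_a, docs_b, threshold):
    """Documents of both sets that appear in a pair reaching the threshold."""
    kept = set()
    for di in docs_a:
        for dj in docs_b:
            # Assumption: pairs without a recorded degree count as similarity 0.
            if SIMILARITY.get(frozenset({di, dj}), 0) >= threshold:
                kept |= {di, dj}
    return kept


def merge_ih(query, response_vector, threshold):
    """Build the final answer from the concepts of the Response vector."""
    final_answer = set()
    for i, (keywords_i, documents_i) in enumerate(response_vector):
        if keywords_i == query:
            final_answer |= documents_i
            continue
        for start in range(i + 1, len(response_vector)):
            keywords = set(keywords_i)     # P in the pseudocode
            documents = set(documents_i)   # D in the pseudocode
            j = start
            while keywords != query and j < len(response_vector):
                keywords_j, documents_j = response_vector[j]
                keywords |= keywords_j
                documents = similar(documents, documents_j, threshold)
                j += 1
            if keywords == query:
                final_answer |= documents
    return final_answer
```
]]></preformat>
        <p>Under these assumptions, merge_ih({"M2", "M3", "M4"}, RESPONSE_VECTOR, 0.33) returns the documents D1, D6, D10 and D13, which is the final answer obtained in section 4.3. Exact fractions are used so that the degree 1/3 compares correctly with the threshold 0.33.</p>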
      </sec>
      <sec id="sec-3-7">
        <p>The Similar function looks for the objects of two object sets that are similar with respect to a similarity threshold.
This search is based on the set of similar objects found during
the similar objects detection phase. We keep only the objects having a
similarity degree greater than or equal to the similarity threshold. The function is described
below; it takes as inputs two object sets A1 and A2 and a similarity
threshold S, and returns the set A3.</p>
        <preformat>Function Similar
Inputs:  Objects sets: A1, A2
         Similarity Threshold: S
Output:  Objects set: A3
Begin
  A3 := Ø
  For each object di of A1 do
    For each object dj of A2 do
      - Calculate the similarity between two objects di and dj:
        α := |Pdi ∩ Pdj| / |Pdi ∪ Pdj|
      - If α ≥ S then
          add objects di and dj and the similarity α to A3
        End if
    End for
  End for
  Return (A3)
End.</preformat>
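        <p>The function can be sketched in Python as follows, keeping the triples (di, dj, α) described in the pseudocode. The keyword index passed as the properties argument, its contents and the handling of two objects without keywords are assumptions of this example.</p>
        <preformat><![CDATA[
```python
def similar(objects_a, objects_b, threshold, properties):
    """Return the triples (di, dj, alpha) whose similarity reaches the threshold."""
    result = set()
    for di in objects_a:
        for dj in objects_b:
            union = properties[di] | properties[dj]
            common = properties[di] & properties[dj]
            # Assumption: objects without any keyword get similarity 0.
            alpha = len(common) / len(union) if union else 0.0
            if alpha >= threshold:
                result.add((di, dj, alpha))
    return result


# Hypothetical keyword index: document -> keywords.
KEYWORDS = {
    "d1": {"k1", "k2"},
    "d2": {"k2", "k3"},
    "d3": {"k4"},
}
```
]]></preformat>
        <p>Here similar({"d1"}, {"d2", "d3"}, 0.33, KEYWORDS) keeps only the pair (d1, d2), whose documents share one keyword out of three; the cost is one similarity computation per pair of objects.</p>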
        <sec id="sec-3-7-1">
          <title>4.3 Illustrative Example</title>
          <p>We take an illustrative example to show the functionalities of the HIC2RS system. Consider the
databases presented in tables 3, 4 and 5. These databases describe sets of documents
indexed by sets of keywords. Consider the query "Which documents are indexed by the
keywords M2, M3 and M4, with a similarity threshold of 0.33?". The query is formed by
three keywords, M2, M3 and M4. The treatment of this query is carried out in two
steps.</p>
          <p>- Step 1: Cooperative Research</p>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <p>The search principle is explained in figure 2.</p>
        <p>Fig. 1. Cooperative Information Retrieval System.</p>
      </sec>
      <sec id="sec-3-9">
        <p>Every conceptual information retrieval system applies the Research algorithm on its local
database. For the first database, presented in table 3, the keywords common to the database and
the query are M1 = {M2,M3}; applying the Galois connection to them gives the
document set {D1}. The concept found is then ({M2,M3}, {D1}).</p>
      </sec>
      <sec id="sec-3-10">
        <p>For the second local database, presented in table 4, the keywords common to the local
database and the query are M2 and M4; applying the Galois connection to them
gives the documents {D6, D9}. So, the result
for this local database is the concept ({M2,M4}, {D6,D9}).</p>
      </sec>
      <sec id="sec-3-11">
        <p>The third local database, presented in table 5, contains the keywords M3 and M4. Applying the Galois connection, we find the document set {D10,D13}. Thus, the concept ({M3,M4}, {D10,D13}) is the result of this search.</p>
      </sec>
      <sec id="sec-3-12">
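        <p>We obtain three concepts from the different local databases; they form the Response vector presented in table 6.</p>
        <table-wrap>
          <label>Table 6</label>
          <table>
            <thead>
              <tr><th>Concept</th><th>Keywords</th><th>Documents</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>M2, M3</td><td>D1</td></tr>
              <tr><td>2</td><td>M2, M4</td><td>D6, D9</td></tr>
              <tr><td>3</td><td>M3, M4</td><td>D10, D13</td></tr>
            </tbody>
          </table>
        </table-wrap>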
      </sec>
      <sec id="sec-3-13">
        <p>Based on this vector, we construct the final answer.</p>
        <p>- Step 2: Final Answer Formulation</p>
      </sec>
      <sec id="sec-3-14">
        <p>The final answer formulation is carried out in two phases: similar objects detection and
answer merging.</p>
        <p>Similar objects detection: The Response vector contains three concepts that we
examine one by one. The first concept contains the document D1. We then calculate the
degree of similarity between this document and every document of the two
other concepts, namely D6, D9, D10 and D13. The same treatment is carried out on the
document D6: we calculate the similarity degree between D6 and D10, then between D6
and D13. The same treatment is applied to the document D9. The calculated similarity
degrees are the following:</p>
        <p>Similarity (D1,D6) = 1/3 = 0.33; Similarity (D1,D9) = 1/4 = 0.25;
Similarity (D1,D10) = 1/3 = 0.33; Similarity (D1,D13) = 1/3 = 0.33;
Similarity (D6,D10) = 1/3 = 0.33; Similarity (D6,D13) = 1/3 = 0.33;
Similarity (D9,D10) = 1/4 = 0.25; Similarity (D9,D13) = 1/4 = 0.25.</p>
      </sec>
      <sec id="sec-3-15">
        <p>Answer Merge: Recall that our query is {M2,M3,M4} and the similarity threshold
is 0.33. Initially, the final answer is an empty set. We treat the first concept of the
Response vector.</p>
      </sec>
      <sec id="sec-3-16">
        <p>Its keywords are different from the query. So, we merge those keywords
with those of the second concept and we search for the similar documents. The
result of this search is the document set {D1,D6}, considering that the documents
D1 and D6 are similar with degree 0.33, and that the keyword union is the set {M2,M3,M4}, which is equal to the query.</p>
      </sec>
      <sec id="sec-3-17">
        <p>The similarity between D1 and D9 is equal to
0.25, which is less than the similarity threshold. So, we ignore D9 and we add the found
documents to the final answer. At this step, the final answer is the set {D1,D6}.</p>
      </sec>
      <sec id="sec-3-18">
        <p>Then, we calculate the union of the keywords and search for the similar documents between the
first and the third concepts of the Response vector. The keyword union is the set
{M2,M3,M4}, which is equal to the query. We remark that the documents D10 and D13 are
similar to D1 and to D6 with a degree of 0.33, which reaches the threshold. So, we add those documents
to the final answer, which becomes {D1,D6,D10,D13}.</p>
      </sec>
      <sec id="sec-3-19">
        <p>We then continue with the next concept. We merge the keywords of the second and
the last concepts. The result is the set {M2,M3,M4}. The similar documents are
{D6,D10,D13}, which we add to the final answer. The final answer is now the set
{D1,D6,D10,D13}, which is delivered to the user.</p>
        <p>Remark 1: If we take, for example, a similarity threshold equal to 0.8, our system
returns an empty answer. This is explained by the fact that no
similar objects exist for this degree. Conversely, with a threshold equal to 0.2, the final
answer is composed of all the documents of the Response vector. This is
explained by the fact that the similarity degrees between the different documents are all
greater than the given value. In that case, our approach considers that these documents
represent the same knowledge, and empty answers are avoided.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Complexity Analysis</title>
      <p>In order to evaluate the system HIC2RS, we calculate the temporal and the spatial complexities.</p>
      <sec id="sec-4-1">
        <title>5.1 Temporal Complexity</title>
        <sec id="sec-4-1-2">
          <p>We suppose that a database has n objects and m properties and that we have k local databases.</p>
        </sec>
        <sec id="sec-4-1-3">
          <p>We recall the steps of our system HIC2RS:</p>
          <list list-type="bullet">
            <list-item><p>Phase 1: the search for concepts in the different local databases.</p></list-item>
            <list-item><p>Phase 2: the similar objects detection and the merging of the k concepts found.</p></list-item>
          </list>
        </sec>
        <sec id="sec-4-1-4">
          <p>The temporal complexity CT of the system is then: CT = CPhase 1(n,m,k) + CPhase 2(n,m,k)</p>
        </sec>
        <sec id="sec-4-1-5">
          <p>Phase 1 needs k×n×m operations and phase 2 needs k×(k−1)/2 + (n×k) operations. So, the temporal complexity is: CT = k×n×m + k×(k−1)/2 + (n×k) =
(k×n×m) + (k²−k)/2 + n×k. The temporal complexity of the system HIC2RS
is therefore of the order of O(k×n×m + k²) operations.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>5.2 Spatial Complexity</title>
        <sec id="sec-4-2-1">
          <p>The system HIC2RS uses k matrices of n rows and m columns, a vector of k elements
as well as a square matrix of dimension n. The system thus reserves (k×n×m) + k + (n×n)
memory cells. So, the spatial complexity of the system HIC2RS is equal to: CS = n×m×k + k + n².</p>
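          <p>To make the two cost expressions concrete, the sketch below evaluates them for given values of n, m and k. The function names are hypothetical, and the temporal count uses the phase costs stated in section 5.1 (k×n×m for phase 1, k×(k−1)/2 + n×k for phase 2).</p>
          <preformat><![CDATA[
```python
def temporal_cost(n, m, k):
    """Operation count: phase 1 plus phase 2, as stated in section 5.1."""
    phase_1 = k * n * m
    phase_2 = k * (k - 1) // 2 + n * k
    return phase_1 + phase_2


def spatial_cost(n, m, k):
    """Memory cells: k matrices of n x m, a vector of k, a square matrix of n x n."""
    return k * n * m + k + n * n
```
]]></preformat>
          <p>For example, with n = 10 objects, m = 5 properties and k = 3 local databases, the temporal count is 150 + 3 + 30 = 183 operations and the spatial count is 150 + 3 + 100 = 253 cells; in both cases the k×n×m term dominates.</p>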
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6 Evaluation</title>
      <p>The system HIC2RS treats heterogeneous information. Indeed, to remedy the problem
of different identifications existing for similar or identical documents, we
proposed a similar objects detection method used during the cooperative information
retrieval process. The implementation of this system consists first in fragmenting a
test collection and then in launching the retrieval process while supposing that the same
document can have different identifications. This hypothesis is handled by the similar
objects detection unit. The experiment was conducted on the CRAN and MED collections.
The CRAN collection (Cranfield collection) includes a textual corpus whose size
exceeds 1.6 MB. This collection contains 1400 documents and 4612 different terms,
and it is tested on 225 queries. The MED collection includes a textual corpus whose
size exceeds 1.1 MB. It contains 1033 scientific articles extracted from a
medical database and 5831 different terms, and it is tested on 30 queries. With the
experiments done on the MED and CRAN test collections, we noticed that the final
quality of retrieval improved both in terms of precision and recall and in terms of answer time.
Figure 3 illustrates the precision and recall graph of the MED test collection for
the system treating homogeneous information (CIRS) and for HIC2RS.</p>
      <p>Fig. 3. Precision and recall graph of the MED test collection for CIRS and HIC2RS.</p>
      <sec id="sec-5-1">
        <p>We note, according to figure 3, that for the MED test collection, the average
precision measured at 11 recall points is on the order of 43.9% for the system treating
homogeneous information (CIRS), whereas it is on the order of 46.7% for the system HIC2RS treating
heterogeneous information. Thus, integrating the similar object detection
gives an improvement of average precision on the order of 6.4%.</p>
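        <p>As a quick check of this figure, assuming that the improvement is expressed relative to the average precision of CIRS (an interpretation, since the text does not state the formula), the arithmetic is:</p>
        <preformat><![CDATA[
```latex
\frac{46.7 - 43.9}{43.9} = \frac{2.8}{43.9} \approx 0.064 = 6.4\%
```
]]></preformat>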
      </sec>
      <sec id="sec-5-2">
        <p>Similarly, experiments on the fragmented CRAN test collection showed an improvement of average precision on the order of 7.5% (figure 4).</p>
        <p>Fig. 4. The answer time taken by the systems CIRS and HIC2RS for the MED test
collection (horizontal axis: Query, 1 to 30).</p>
        <p>Likewise, for the CRAN test collection, the answer time taken by HIC2RS is lower than
the one taken by the system treating homogeneous information (see figure 6).</p>
      </sec>
      <sec id="sec-5-3">
        <p>Figure 5 shows that HIC2RS treats the different MED test collection queries faster than the conceptual information retrieval system.</p>
        <p>Fig. 5. The answer time taken by the systems CIRS and HIC2RS for the CRAN test
collection (horizontal axis: Query).</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7 Conclusion</title>
      <sec id="sec-6-1">
        <p>We presented in this paper a conceptual cooperative retrieval system for heterogeneous
information (HIC2RS). Given a heterogeneous environment constituted by
a set of information retrieval systems, each handling a local database, our approach
queries these databases in order to build a complete answer to a user query.
Given a query and a similarity threshold provided by the user, our
system launches conceptual search processes on the different local databases and
obtains a set of concepts as a result. Based on this set of concepts and on the similarity
threshold, the system formulates the final answer that it delivers to the user. The
similar objects detection method that we defined enriches the answers returned by the
different databases. This method improved average precision by 6.4% for the MED test
collection and by 7.5% for the CRAN test collection.</p>
      </sec>
      <sec id="sec-6-2">
        <p>4. Carpineto C. and Romano G., Using Concept Lattices for Text Retrieval and Mining. In the 1st International Conference on Formal Concept Analysis, Darmstadt,
Germany, (2003).</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>