<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Selecting the Best Cluster of a Collection of Technical Reports</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ma. Auxilio Medina</string-name>
          <email>mauxmedina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Alfredo Sánchez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia Titla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rebeca Rodr</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Vargas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Polit</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de las Am</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Technical reports are an invaluable resource of information. They represent the work that is carried out by students or researchers in short periods of time. In order to maintain the organization of collections, this paper discusses how the structure and the terms of the collection can be used to select the best cluster for a new technical report. The collection is represented as a lightweight ontology, which is a hierarchical structure that clusters similar documents in such a way that the documents of a k-level cluster share the k terms of the cluster label. The ontology is used to obtain syntactic and semantic measures.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Technical reports are an invaluable resource of information. Some common
criteria to organize these documents are by number, date, department or author.
An organization based on the content is less frequent.</p>
      <p>For small collections, the organization of documents by content can be
done manually by domain experts, but for large collections, document clustering
techniques are commonly used. These techniques depend on diverse factors such as the
domain, the format and the representation of documents. Some collections have
privacy policies that prevent free access to documents but allow users to
query the titles, the abstracts and possibly the list of references or citations.</p>
      <p>At the Universidad Politécnica de Puebla, teachers and students maintain a
collection of technical reports called CORTUPP (Colección de Reportes Técnicos
de la Universidad Politécnica de Puebla), which is available at
http://server3.uppuebla.edu.mx/cortupp/index.html. Technical reports show the work of
students and teachers in short periods of time (approximately eight months).</p>
      <p>
        The CORTUPP collection is represented as a lightweight ontology. The term
lightweight indicates that the construction of the ontology requires minimal human
participation, which is often limited to assigning values to the input parameters
of an ontology learning method [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (see note 4).
      </p>
      <p>
        In the field of digital libraries, ontologies have been used in tasks such
as the integration of information, interoperability at the metadata and communication levels,
search and browsing of digital collections, or description of resources [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        An ontology creates a machine-accessible data model. The method used to
construct the ontology (called the OntOAIr method) has been previously described
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This method allows human and software agents to organize and
retrieve information from the collection.
      </p>
      <p>The ontology is a hierarchical structure of disjoint clusters of similar
documents. Although the construction time of the ontology is linear with respect to
the number of documents, when this number is large, the construction can
become a time-consuming task. This paper discusses how the structure, the terms
of a collection and the ontology can be used to select the best cluster for a new
technical report. We propose the best cluster algorithm (BC algorithm) for this
purpose and use the CORTUPP collection as a test bed. The algorithm allows users
to insert new technical reports without reconstructing the ontology. In
general, the algorithm can also be perceived as a simple strategy to maintain the
organization of any collection of documents that is represented as a lightweight
ontology.</p>
      <p>The remainder of the document is organized as follows. Section 2 presents
the related work. Section 3 summarizes the ontology learning method called
OntOAIr, which is used to construct the ontology that represents the collection of
technical reports. Section 2 and Section 3 correspond to previous work of the
first two authors. The following sections describe new contributions. Section 4
presents the algorithm to find the best cluster for a new technical report.
Section 5 discusses preliminary results. Finally, Section 6 includes conclusions and
suggests future directions of our work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        There are relevant works that have formally analyzed the similarity between
abstracts and documents, such as [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In this work, we take a different
point of view. We assume that an ontology represents and organizes a collection
of documents and that we can therefore take advantage of its structure. For this
reason, the related work describes some ontology learning methods.
      </p>
      <p>
        OntoLearn is an ontology learning method that applies a hierarchical
algorithm to a set of documents from dedicated web sites and document warehouses
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The output of the algorithm and WordNet are used to construct domain
ontologies. Unlike the OntOAIr method, the ontologies constructed by OntoLearn
describe concepts with terms composed of two or more words; however, it
requires the existence of previously classified documents.
      </p>
      <p>
        Note 4: In the study of formal ontology languages, a lightweight ontology refers to an
ontology written in a formal language that has good computational features but limited
expressive power.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] propose methods to semi-automatically construct
ontologies from a set of documents which contain competencies of companies;
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses the bisecting algorithm and [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] the k-means algorithm. The main
disadvantage of these methods is the stopping criterion of both algorithms (setting
this parameter in real-life scenarios is a hard task). The FIHC algorithm used
in the OntOAIr method does not require a stopping criterion.
      </p>
      <p>
        Another group [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposes a method based on the incremental use of the
k-means algorithm to construct ontologies from HTML documents. This method
takes into account the implicit semantic structure of HTML tags to extract
terms. Unlike the documents used by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which refer to particular domains such
as tourism or medicine, the documents processed by the OntOAIr method can
belong to different domains.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose the semantic grow-bag approach to create
lightweight ontologies called topic categorization systems. The approach uses the
terms provided by authors of digital objects to compute a new co-occurrence
metric, finds relations between terms based on the co-occurrence metric and
constructs graphs that represent the neighborhood of the terms. The OntOAIr
method has not been applied to annotated documents.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The OntOAIr method</title>
      <p>
        We have developed an ontology learning method called OntOAIr. It uses
simplified representations of documents, an adaptation of the Frequent Itemset-based
Hierarchical Clustering algorithm (FIHC) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and ontological engineering
techniques.
      </p>
      <p>The OntOAIr method is universal in the sense that it can be used for different
domains, languages and applications. The four main tasks of this method are the
following: (1) harvesting, (2) representation, (3) clustering and (4) formalization.
These tasks are briefly described as follows:</p>
      <sec id="sec-3-1">
        <title>Harvesting</title>
        <p>The harvesting task obtains the documents from collections. Collections
typically have hundreds or thousands of abstracts that would need to be
transmitted through the network. For this reason, we assume that harvesting is an
asynchronous task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Representation</title>
        <p>
          The representation task constructs a vector representation, called a feature vector,
for each harvested document. The name is adopted from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. A feature vector is
a simplified representation of a document formed by terms (all terms other than
stop words) and weights (numeric values that represent the relevance of terms).
        </p>
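        <p>As an illustration of the representation task, the following minimal sketch builds a feature vector for a short text. It is not the OntOAIr implementation: the stop-word list is a small hypothetical one, the function name is invented for the example, and raw term frequency is assumed as the weight because the weighting scheme is not detailed here.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "on"}


def feature_vector(text):
    """Map each non-stop-word term to a weight (assumed here: term frequency)."""
    terms = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(t for t in terms if t not in STOP_WORDS))


vector = feature_vector("Processing of images and the processing of language")
print(vector)
```
]]></preformat>
        <p>Any other relevance measure could replace the term frequency without changing the shape of the vector.</p>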
      </sec>
      <sec id="sec-3-3">
        <title>Clustering</title>
        <p>
          The clustering task applies the Frequent Itemset-based Hierarchical Clustering
algorithm (FIHC) to produce a tree of clusters. FIHC is an agglomerative
clustering algorithm proposed by [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This algorithm is based on the hypothesis that
"if a group of documents refers to the same topic, the documents would share a
set of terms". The sets of shared terms are called frequent itemsets. The FIHC
algorithm produces a hierarchical structure of non-overlapping clusters. It
requires two mandatory input parameters: global support (the percentage
of documents in a collection that contain a frequent itemset) and cluster
support (the percentage of documents in a cluster that contain a frequent itemset).
The values of the input parameters must be determined experimentally by the user.
        </p>
        <p>
          Table 1. Document type definition (DTD) of the ontology of records.
          <preformat>&lt;!ELEMENT ontologyofrecords (algorithm, cluster+)&gt;
&lt;!ATTLIST ontologyofrecords
    date CDATA #REQUIRED&gt;
&lt;!ELEMENT algorithm EMPTY&gt;
&lt;!ATTLIST algorithm
    name CDATA #FIXED "FIHC"
    globalsupport CDATA #REQUIRED
    clustersupport CDATA #REQUIRED&gt;
&lt;!ELEMENT cluster
    (label, level, record*, cluster*)&gt;
&lt;!ELEMENT label (#PCDATA)&gt;
&lt;!ELEMENT level (#PCDATA)&gt;
&lt;!ELEMENT record
    (title, subject?, description?,
     identifier, url, dataprovider,
     metadataformat, datestamp)&gt;
&lt;!ELEMENT title (#PCDATA)&gt;
&lt;!ELEMENT subject (#PCDATA)&gt;
&lt;!ELEMENT description (#PCDATA)&gt;
&lt;!ELEMENT identifier (#PCDATA)&gt;
&lt;!ELEMENT url (#PCDATA)&gt;
&lt;!ELEMENT dataprovider (#PCDATA)&gt;
&lt;!ELEMENT metadataformat (#PCDATA)&gt;
&lt;!ELEMENT datestamp (#PCDATA)&gt;
&lt;!ENTITY generatedBy "OntoSIR 2.1"&gt;</preformat>
        </p>
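        <p>The two support parameters of the clustering task can be illustrated with a small sketch. It only shows how the support of an itemset is measured over a set of documents and does not implement FIHC itself; the function name and the toy documents are assumptions made for the example, and each document is reduced to its set of terms.</p>
        <preformat><![CDATA[
```python
def support(itemset, documents):
    """Fraction of documents (sets of terms) that contain every term of itemset."""
    if not documents:
        return 0.0
    hits = sum(1 for doc in documents if itemset.issubset(doc))
    return hits / len(documents)


# Toy collection: each document is reduced to its set of terms.
collection = [
    {"language", "processing", "natural"},
    {"language", "processing"},
    {"images", "processing"},
    {"robotics", "simulation"},
]
cluster = collection[:3]  # hypothetical "processing" cluster

global_support = support({"processing"}, collection)             # over the collection
cluster_support = support({"language", "processing"}, cluster)   # over one cluster
print(global_support, cluster_support)
```
]]></preformat>
        <p>An itemset is considered frequent at a given level when the measured value reaches the threshold chosen by the user.</p>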
      </sec>
      <sec id="sec-3-4">
        <title>Formalization</title>
        <p>
          The formalization task transforms the tree of clusters into a lightweight
ontology. We have explored the use of the XML, RDF and OWL languages to represent
the ontologies constructed by the OntOAIr method. We have analyzed the use
of these languages in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In this work, we have chosen the XML
representation, which is described in Table 1. This document type definition (DTD)
was first proposed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Table 1 is described as follows:
          <list list-type="order">
            <list-item><p>The date attribute and the algorithm element describe the data about the
data of the ontology of records (the metadata).</p></list-item>
            <list-item><p>At least one cluster element is needed.</p></list-item>
            <list-item><p>The record element represents a document.</p></list-item>
            <list-item><p>Each cluster has a label, a level and zero or more records.</p></list-item>
            <list-item><p>The subject and description elements are optional. The rest of the
elements are required.</p></list-item>
          </list>
        </p>
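        <p>To make the structure in Table 1 concrete, the following sketch parses a small ontology instance and lists each cluster with its full label. The instance, its attribute values, identifiers and URL are invented for illustration, and the helper name walk is an assumption; only the element names come from the DTD.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical instance that follows the element names of Table 1.
SAMPLE = """
<ontologyofrecords date="2009-01-01">
  <algorithm name="FIHC" globalsupport="5" clustersupport="25"/>
  <cluster>
    <label>processing</label>
    <level>1</level>
    <cluster>
      <label>language</label>
      <level>2</level>
      <record>
        <title>A sample report</title>
        <identifier>tr-001</identifier>
        <url>http://example.org/tr-001</url>
        <dataprovider>example</dataprovider>
        <metadataformat>oai_dc</metadataformat>
        <datestamp>2009-01-01</datestamp>
      </record>
    </cluster>
  </cluster>
</ontologyofrecords>
"""


def walk(cluster, ancestors=()):
    """Yield (label terms including ancestors, record titles) for each cluster."""
    terms = ancestors + (cluster.findtext("label").strip(),)
    titles = [record.findtext("title") for record in cluster.findall("record")]
    yield terms, titles
    for child in cluster.findall("cluster"):
        yield from walk(child, terms)


root = ET.fromstring(SAMPLE)
clusters = [item for top in root.findall("cluster") for item in walk(top)]
for terms, titles in clusters:
    print(" ".join(terms), titles)
```
]]></preformat>
        <p>The traversal concatenates the labels of ancestor clusters, so a cluster at level k is reported with k terms.</p>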
        <p>The ontologies constructed by the OntOAIr method are hierarchical
structures that cluster similar documents in such a way that the documents of a
k-level cluster share the k terms of the cluster label. As an illustration,
Figure 1 shows a small ontology produced by the OntOAIr method. Each box
contains one term of a cluster label. At the first level, the ontology has three main
clusters with the labels processing, robotics and simulation, respectively. At the
following levels, the clusters represent more specialized topics. For example, the
processing cluster is divided into two clusters with the following labels:
images processing and language processing. The language processing cluster
includes the natural language processing cluster, which belongs to the third level,
and so on. Note that each box contains only one term of the label; the
terms of its ancestors are also part of the cluster label.</p>
        <p>The ontology is a hierarchical structure; therefore, common operations such
as insertion, deletion or search of an element can be implemented. The next
section describes how this structure can be used to select the best cluster for a
new document.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Selecting the best cluster</title>
      <p>The BC algorithm uses the ontology constructed by the OntOAIr method to
select the best cluster for a new technical report. We believe that the terms of
cluster labels define a vocabulary that describes the topics.</p>
      <p>The algorithm assumes that the ontology has been constructed previously.
Before processing, stop words are removed from the new document and case
distinctions are ignored. The algorithm uses the following notation.</p>
      <sec id="sec-4-1">
        <title>Notation</title>
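        <list list-type="bullet">
          <list-item><p>A technical report tr of m terms is represented as a tuple tr(t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>m</sub>),
where m is the number of terms.</p></list-item>
          <list-item><p>The length of tr, denoted by length(tr), is m, the number of terms of tr.</p></list-item>
          <list-item><p>An ontology of records is represented as O.</p></list-item>
          <list-item><p>l(c) indicates that l is the label of the cluster c.</p></list-item>
          <list-item><p>S(tr, l) denotes the similarity function between a technical report tr and a
cluster label l.</p></list-item>
          <list-item><p>CL(l) denotes the level of the cluster with label l in the ontology.</p></list-item>
          <list-item><p>Lc is a list of clusters.</p></list-item>
        </list>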
        <p><bold>The BC algorithm</bold></p>
        <p>Input: tr, a new technical report; O, an ontology of records.</p>
        <p>Output: bc, the best cluster for tr.</p>
        <list list-type="order">
          <list-item><p>Determine the set C of clusters c of level 1 such that t<sub>i</sub> = l(c) for some i, 1 ≤ i ≤ m;
that is, select the clusters at the first level whose label coincides with a
term of the new technical report.</p></list-item>
          <list-item><p>For each c ∈ C, add the label l of each descendant of c (concatenating the labels of its
ancestor clusters) to the list of labels Lc.</p></list-item>
          <list-item><p>For each label l ∈ Lc, apply S(tr, l).</p></list-item>
          <list-item><p>Find the cluster bc in Lc with the highest S(tr, l).</p></list-item>
          <list-item><p>Return bc.</p></list-item>
        </list>
        <p>The function to measure the similarity between a label l of the cluster c and
a technical report tr(t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>m</sub>) is the following:
          <disp-formula><tex-math><![CDATA[S(tr, l(c)) = \sum_{t_i} w(t_i) \quad \text{if } t_i = l(c),\ 1 \le i \le m]]></tex-math></disp-formula>
where the sum runs over the terms of tr that coincide with a term of the label, and
w(t<sub>i</sub>) represents the weight of the i-th term of tr, defined as follows:
          <disp-formula><tex-math><![CDATA[w(t_i) = \begin{cases}
1.5 & \text{if } t_i \text{ appears in the title and in the abstract of } tr \\
1.0 & \text{if } t_i \text{ appears only in the title of } tr \\
0.5 & \text{if } t_i \text{ appears only in the abstract of } tr \\
0.0 & \text{if } t_i \text{ appears neither in the title nor in the abstract of } tr
\end{cases}]]></tex-math></disp-formula>
        </p>
        <p>Two notes about the BC algorithm are the following:</p>
        <list list-type="bullet">
          <list-item><p>There is no formal criterion that justifies the values assigned to the weights
of the terms of a technical report. We chose these values experimentally,
assuming that a term in the title is more representative than a term in the
abstract.</p></list-item>
          <list-item><p>The BC algorithm applies the similarity function S(tr, l) only to cluster
labels because, in the construction process of the ontology, the terms of the
labels are the most representative terms of the collection.</p></list-item>
        </list>
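        <p>The steps of the BC algorithm and the similarity function can be sketched as follows. This is a minimal illustration rather than the implementation used in the prototype. It assumes that a report is given as two sets of terms (title and abstract), that the ontology is a list of nested dictionaries with hypothetical keys label and children, and that a first-level cluster is itself a candidate together with its descendants. Ties are resolved in favor of the first candidate found.</p>
        <preformat><![CDATA[
```python
def weight(term, title_terms, abstract_terms):
    """Weight of a term according to where it appears in the report."""
    in_title = term in title_terms
    in_abstract = term in abstract_terms
    if in_title and in_abstract:
        return 1.5
    if in_title:
        return 1.0
    if in_abstract:
        return 0.5
    return 0.0


def similarity(title_terms, abstract_terms, label_terms):
    """S(tr, l): sum of the weights of the report terms that occur in the label."""
    report_terms = title_terms | abstract_terms
    return sum(weight(t, title_terms, abstract_terms)
               for t in report_terms if t in label_terms)


def labelled_clusters(cluster, ancestors=()):
    """Yield the full label (ancestor terms included) of a cluster and its descendants."""
    terms = ancestors + (cluster["label"],)
    yield terms
    for child in cluster.get("children", []):
        yield from labelled_clusters(child, terms)


def best_cluster(title_terms, abstract_terms, ontology):
    report_terms = title_terms | abstract_terms
    candidates = []
    for top in ontology:
        if top["label"] in report_terms:                # step 1
            candidates.extend(labelled_clusters(top))   # step 2
    if not candidates:
        return None  # assumption: no first-level label matches the report
    # Steps 3 and 4: score every candidate label and keep the highest.
    return max(candidates,
               key=lambda label: similarity(title_terms, abstract_terms, set(label)))


# Hypothetical ontology shaped like the example of Figure 1.
ontology = [
    {"label": "processing", "children": [
        {"label": "images"},
        {"label": "language", "children": [{"label": "natural"}]},
    ]},
    {"label": "robotics"},
    {"label": "simulation"},
]
title = {"natural", "language", "processing"}
abstract = {"language", "processing", "parsing"}
best = best_cluster(title, abstract, ontology)
print(best)
```
]]></preformat>
        <p>In this example the label with the three terms processing, language and natural obtains the highest score because every one of its terms occurs in the report.</p>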
      </sec>
      <sec id="sec-4-2">
        <title>Finding similar technical reports</title>
        <p>The BC cluster can be used to estimate the similarity between a new technical
report (ntr) and the elements of the BC cluster (the tr's). The purpose of this task
is to identify duplicate documents. If all the terms of a document are also
included in the new technical report and their weights are the same, then the new
technical report should not be inserted into the collection.</p>
        <p>We assume that each element tr of BC and the new technical report (ntr) are
represented as tuples of terms: tr(t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>m</sub>) and ntr(n<sub>1</sub>, n<sub>2</sub>, ..., n<sub>k</sub>), respectively.
The function hS(tr, ntr) that measures the similarity between tr and ntr is the
following:
          <disp-formula><tex-math><![CDATA[hS(tr, ntr) = \frac{\text{length}(ctr)}{\text{length}(tr)} \times 100,]]></tex-math></disp-formula>
where length(ctr) is the number of terms in ntr that are also in tr, so that
hS(tr, ntr) reaches 100 when every term of tr also appears in ntr. The hS(tr,
ntr) function assumes that the terms of a tr in the BC cluster are more
representative than the terms in the new technical report; thus, new terms of ntr are
discarded.</p>
        <p>This section presents an algorithm to find similar technical reports, called
the FSTR algorithm.</p>
        <p>Input: ntr, a new technical report; the BC cluster.</p>
        <p>Output: Astr, an array with the titles, the identifiers and a value that represents the
similarity between ntr and the technical reports of the BC cluster.</p>
        <list list-type="order">
          <list-item><p>For each tr ∈ BC: apply hS(tr, ntr); construct an object str formed by the title of tr,
its identifier and the value of hS(tr, ntr); insert str into the array of str objects Astr.</p></list-item>
          <list-item><p>Find in Astr the tr with the highest value of hS(tr, ntr).</p></list-item>
        </list>
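        <p>The FSTR algorithm can be sketched as follows. The sketch assumes that hS is read as the percentage of the terms of tr that also occur in ntr, and that each report of the BC cluster is a dictionary with hypothetical keys title, identifier and terms. It returns the array sorted by decreasing similarity, so the most similar report is the first element.</p>
        <preformat><![CDATA[
```python
def hs(tr_terms, ntr_terms):
    """Percentage of the terms of tr that also occur in ntr (assumed reading of hS)."""
    if not tr_terms:
        return 0.0
    common = tr_terms & ntr_terms  # ctr: terms shared by tr and ntr
    return 100.0 * len(common) / len(tr_terms)


def find_similar(ntr_terms, bc_cluster):
    """Return (title, identifier, score) triples sorted by decreasing score."""
    scored = [(tr["title"], tr["identifier"], hs(tr["terms"], ntr_terms))
              for tr in bc_cluster]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical members of a BC cluster and a new report.
bc_cluster = [
    {"title": "Report A", "identifier": "tr-001",
     "terms": {"language", "processing", "parsing", "grammar"}},
    {"title": "Report B", "identifier": "tr-002",
     "terms": {"language", "processing"}},
]
new_report = {"language", "processing", "parsing", "corpus"}

ranking = find_similar(new_report, bc_cluster)
print(ranking)
```
]]></preformat>
        <p>A score of 100 indicates that every term of an existing report is contained in the new one, which is the situation that signals a possible duplicate.</p>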
        <p>The next section describes the preliminary results.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Preliminary results</title>
      <p>
        The OntOAIr method was used to construct an ontology for the CORTUPP
collection with the following input parameters: minimum support 20%, cluster support
25% and global support 5%. The values of these parameters were experimentally
determined in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
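      <p>The constructed ontology has the following characteristics:</p>
      <list list-type="bullet">
        <list-item><p>Number of nodes at the first level: 9</p></list-item>
        <list-item><p>Number of levels: 4</p></list-item>
        <list-item><p>Number of clusters: 17</p></list-item>
        <list-item><p>Total number of documents: 32</p></list-item>
      </list>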
      <p>We propose two experiments to validate the BC algorithm:</p>
      <sec id="sec-5-1">
        <title>Experiment 1:</title>
        <p>We randomly selected 10 of the 32 documents from different clusters. Each selected
document was manually removed from the ontology and then treated as a
new technical report. The best cluster was found correctly in seven cases (70%);
in the remaining three cases (30%), the best cluster identified was an
ancestor of the original cluster.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Experiment 2:</title>
        <p>
          We propose a human validation of the identification of the best cluster using
the cluster labels of the ontology. Four persons participated;
each one read the title and the abstract of 4 different technical reports. We
asked these persons to select the most appropriate label for the
title and abstract that they had read. We use the categories proposed by [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to
classify the answers as follows:
          <list list-type="bullet">
            <list-item><p><bold>Correct.</bold> An answer is correct (C) if the best cluster identified by the BC
algorithm is the same cluster selected by the person.</p></list-item>
            <list-item><p><bold>Somehow related.</bold> An answer is somehow related (S) if the best cluster
identified by the BC algorithm and the cluster selected by the person have a
common ancestor.</p></list-item>
            <list-item><p><bold>Wrong.</bold> An answer is wrong (W) if the only common cluster is the root cluster
of the ontology.</p></list-item>
            <list-item><p><bold>Cannot tell.</bold> An answer is cannot tell (N) if the person could not associate the
document with any cluster.</p></list-item>
          </list>
        </p>
        <p>
          The results of experiment 2 are the following: 40% of the answers were correct, 30%
were identified as somehow related, 10% as wrong and 20% as cannot tell. The
performance of the BC algorithm is estimated using the following formula [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]:
          <disp-formula><tex-math><![CDATA[\frac{|C| + 0.5\,|S|}{|C| + |S| + |W|}]]></tex-math></disp-formula>
where |C| is interpreted as the number of correct answers, |S| is the
number of somehow related answers, and so on. Thus, the performance of the BC algorithm
is 0.68. The previous formula gives a score of 1 if the answers that are not rated "N"
are all of type (C), and a score of 0 if they are all
of type (W).
        </p>
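        <p>As a check of the arithmetic, the performance formula can be evaluated directly on the reported percentages. The function name is an assumption made for the example. The exact value obtained is 0.6875, which agrees with the reported 0.68 when the result is truncated to two decimals.</p>
        <preformat><![CDATA[
```python
def performance(correct, somehow, wrong):
    """(|C| + 0.5|S|) / (|C| + |S| + |W|); answers rated N are excluded."""
    return (correct + 0.5 * somehow) / (correct + somehow + wrong)


# Percentages reported for experiment 2: 40% C, 30% S, 10% W (20% N is ignored).
score = performance(40, 30, 10)
print(score)
```
]]></preformat>
        <p>Because the formula is a ratio, it yields the same value whether counts or percentages are supplied.</p>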
        <p>A test bed system called the Best Cluster system (BCS) has been implemented
to provide users with an environment to carry out the following tasks:</p>
        <list list-type="order">
          <list-item><p>Open the XML file of the ontology</p></list-item>
          <list-item><p>Select the best cluster</p></list-item>
          <list-item><p>Find similar technical reports</p></list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper, we proposed an algorithm to select the best cluster in a collection
represented as an ontology. We have described the ontology learning method
that constructs the ontology and we have presented the ontology structure using
the XML language.</p>
      <p>The BC algorithm applies a similarity function only to cluster labels. It is
appropriate for collections where the number of clusters is fixed or when the
organization of clusters has been validated. However, for a large number of new
documents, the reconstruction of the ontology would be recommended.</p>
      <p>The output of the BC algorithm can be used to estimate the similarity
between a new technical report and the documents of the best cluster. This task
can be interpreted as a simple plagiarism detection mechanism.</p>
      <p>We have constructed a prototype system that implements the BC
algorithm and we have presented preliminary results using the CORTUPP collection
as the data set.</p>
      <p>The BC algorithm can be applied to any collection that can be represented as
an ontology constructed by the OntOAIr method. We expect that the algorithm
will efficiently support the maintenance of collections.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P. R.</given-names>
            <surname>David Pinto</surname>
          </string-name>
          , Héctor Jiménez-Salazar.
          <article-title>Clustering abstracts of scientific texts using the transition point technique</article-title>
          .
          <source>In Computational Linguistics and Intelligent Text Processing</source>
          , volume
          <volume>3878</volume>
          , pages
          <fpage>536</fpage>
          –
          <lpage>546</lpage>
          . Lecture Notes in Computer Science,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Diederich</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Balke</surname>
          </string-name>
          .
          <article-title>The semantic growbag algorithm: Automatically deriving categorization systems</article-title>
          .
          <source>In Proceedings of the 11th European Conference on Research and Advanced Technology for Digital Libraries (ECDL'07)</source>
          , Budapest, Hungary, volume
          <volume>4675</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>33</fpage>
          –
          <lpage>40</lpage>
          , Berlin, September
          <year>2007</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ester</surname>
          </string-name>
          .
          <article-title>Hierarchical document clustering using frequent itemsets</article-title>
          .
          <source>In Proceedings of the Third SIAM International Conference on Data Mining (SDM'03)</source>
          , pages
          <fpage>59</fpage>
          –
          <lpage>70</lpage>
          , San Francisco, CA, USA, May
          <year>2003</year>
          . SIAM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>L.</given-names>
            <surname>Karouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aufaure</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Bennacer</surname>
          </string-name>
          .
          <article-title>Context-based hierarchical clustering for the ontology learning</article-title>
          .
          <source>In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)</source>
          , pages
          <fpage>420</fpage>
          –
          <lpage>427</lpage>
          , Hong Kong, China, December
          <year>2006</year>
          . IEEE Conference Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          .
          <article-title>The role of frame-based representation on the semantic web</article-title>
          .
          <source>Linköping Electronic Articles in Computer and Information Science</source>
          ,
          <volume>6</volume>
          (
          <issue>5</issue>
          ),
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Ljubič</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrač</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mladenić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bollhalter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jermol</surname>
          </string-name>
          .
          <article-title>Automated structuring of company competencies in virtual organizations</article-title>
          .
          <source>In Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005)</source>
          , Ljubljana, Slovenia, pages
          <fpage>190</fpage>
          –
          <lpage>193</lpage>
          . 7th
          <source>International Multi-conference on Information Society IS'05</source>
          , October
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          .
          <article-title>OntOAIr: Construction of Lightweight Ontologies to Support Information Retrieval from Multiple Collections of Documents</article-title>
          .
          <source>PhD thesis</source>
          , Universidad de las Américas - Puebla,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chávez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Chávez</surname>
          </string-name>
          .
          <article-title>Construction, implementation and maintenance of ontologies of records</article-title>
          .
          <source>In Proceedings of the Fourth Latin American Web Congress (LA-WEB'06)</source>
          , Puebla, México, pages
          <fpage>67</fpage>
          –
          <lpage>73</lpage>
          . IEEE Computer Society, May
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          .
          <article-title>OntOAIr: a method to construct lightweight ontologies from document collections</article-title>
          .
          <source>In Proceedings of the Ninth Mexican International Conference on Computer Science (ENC'08)</source>
          , Baja California, México, page 12. IEEE Computer Society, October
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chávez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Benítez</surname>
          </string-name>
          .
          <article-title>Designing ontological agents: an alternative to improve information retrieval in federated digital libraries</article-title>
          .
          In
          <source>Proceedings of the Atlantic Web Intelligence Conference 2004</source>
          (AWIC'04, Cancún, México, May), pages
          <fpage>155</fpage>
          –
          <lpage>163</lpage>
          , May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Paz</surname>
          </string-name>
          .
          <article-title>Document retrieval from multiple collections by using lightweight ontologies</article-title>
          .
          In
          <source>Proceedings of the Fifteenth International Conference on Computing (CIC-2006)</source>
          , pages
          <fpage>141</fpage>
          –
          <lpage>146</lpage>
          . IEEE Computer Society, November
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramírez</surname>
          </string-name>
          .
          <article-title>Describing document hierarchies by using markup languages</article-title>
          . In Taller de tecnologías del lenguaje humano.
          <source>Proceedings of the Seventh Mexican International Conference on Computer Science 2006</source>
          (ENC06, San Luis Potosí, México, September), pages
          <fpage>31</fpage>
          –
          <lpage>37</lpage>
          . IEEE Computer Society, September
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>R.</given-names>
            <surname>Mizoguchi</surname>
          </string-name>
          .
          <article-title>Tutorial on ontological engineering - part 1: Introduction to ontological engineering</article-title>
          .
          <source>New Generation Computing</source>
          , OhmSha &amp; Springer,
          <volume>21</volume>
          (
          <issue>4</issue>
          ):
          <fpage>365</fpage>
          –
          <lpage>384</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Velardi</surname>
          </string-name>
          .
          <article-title>Learning domain ontologies from document warehouses and dedicated web sites</article-title>
          .
          In
          <source>Computational Linguistics</source>
          , volume
          <volume>30</volume>
          , pages
          <fpage>151</fpage>
          –
          <lpage>179</lpage>
          . MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>D.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Benedí</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <article-title>Clustering narrow-domain short texts by using the Kullback-Leibler distance</article-title>
          .
          In
          <source>CICLing</source>
          , pages
          <fpage>611</fpage>
          –
          <lpage>622</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J.</given-names>
            <surname>Plisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mladenić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ljubič</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrač</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grobelnik</surname>
          </string-name>
          .
          <article-title>Using machine learning to structure the expertise of companies: Analysis of the Yahoo! business data</article-title>
          .
          In
          <source>Conference on Data Mining and Data Warehouses (SiKDD 2005) Proceedings</source>
          , pages
          <fpage>186</fpage>
          –
          <lpage>189</lpage>
          . 7th International Multi-conference on Information Society IS'05,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>R.</given-names>
            <surname>Soricut</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Brill</surname>
          </string-name>
          .
          <article-title>Automatic question answering: Beyond the factoid</article-title>
          . In S. Dumais, D. Marcu, and S. Roukos, editors,
          <source>HLT-NAACL 2004: Main Proceedings</source>
          , pages
          <fpage>57</fpage>
          –
          <lpage>64</lpage>
          , Boston, Massachusetts, USA, May 2 - May 7
          <year>2004</year>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>