<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. Chernyak</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O. Chugunova</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Askarova</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Nascimento</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. Mirkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Birkbeck University of London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Informatics, New University of Lisbon</institution>
          ,
          <addr-line>Caparica</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Division of Applied Mathematics and Informatics, National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>A method for computationally visualizing and interpreting a text or corpus of texts in a taxonomy of the field is described. The method involves such stages as matching taxonomy topics and text(s) by using annotated suffix trees (ASTs), combining multiple information such as text abstracts, key-words and taxonomy cross-references, building clusters of taxonomy topics and their profiles, and lifting the profiles to higher ranks of the taxonomy hierarchy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The concept of ontology as a computational device for handling domain knowledge
is a point of growing interest in machine intelligence. Initially, researchers' main efforts
concentrated on building ontologies; research interest is currently
shifting towards the usage of ontologies (see, for example, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). This
paper takes the latter perspective. Our ultimate goal is to devise a system that
would allow the user to apply a domain ontology to the computational interpretation of a
text or a set of texts from the field. The paper presents some initial stages of our work
on the long path towards this goal. These stages are: (a)
selecting the conceptual hierarchy (taxonomy) as a formalization of the concept of
ontology, (b) representing both texts and taxonomy topics in a unified framework
that facilitates and channels the sifting of the taxonomy topics through the texts to
score matches between them in a comprehensive way, (c) developing quantitative
profiles of the texts, (d) clustering them in a way that does not require much input
from the user, and, finally, (e) lifting the profiles of clusters or individual
texts to higher ranks of the hierarchy to visualize and interpret them.
      </p>
      <p>When applying this approach to texts in Russian, additional issues emerge due to
the lack of adequate tools for both linguistic analysis and taxonomy development for
the Cyrillic alphabet.</p>
      <p>
        The remainder of the paper describes the techniques under development, which are, in
part, an adaptation of the method in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our preliminary attempts at applying the
techniques to real data are described too. The conclusion states what has already been
done and which issues are yet to be tackled.
      </p>
      <p>This paper comprises research findings obtained in the framework of the research
project "Development and adaptation of clustering methods to automate analysis of
unstructured texts using domain ontologies" supported by The NRU Higher School
of Economics 2011-2012 Academic Fund Program. The project was partly supported
by the Program of Fundamental Studies of the NRU Higher School of Economics in
2011.</p>
    </sec>
    <sec id="sec-2">
      <title>Method's description</title>
      <sec id="sec-2-1">
        <title>Input information</title>
        <p>There are two inputs to the method: (1) a domain ontology and (2) a
domain-related text collection.</p>
        <p>We consider an ontology to be a rooted tree-like structure of topics in the domain,
with parent nodes corresponding to more general topics than their children.
Besides the hierarchical relation between the topics, other relations might exist: there
can be links between topics from different parts of the tree.</p>
        <p>
          We work with two such sets: (1) the ACM-CCS ontology [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and a collection of
ACM journal abstracts; (2) the VINITI ontology of mathematics and informatics [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
and a collection of teaching syllabuses of mathematics and informatics in The
National Research University Higher School of Economics Moscow (NRU HSE, in
Russian).
        </p>
        <p>The ACM-CCS ontology is a four-layer rooted tree in which the three upper layers are
coded as usual, whereas the fourth layer is not coded and can be considered as
consisting of descriptions of the third-layer topics. The tree has eleven major topics on the
first layer, such as B. Hardware, C. Computer Systems Organization, etc. They are
subdivided into 81 second-layer topics, which are further divided into third-layer topics,
the so-called leaf topics. Almost all leaf topics are accompanied by topic descriptors,
which are sets of common phrases or terms corresponding to the topic. There are some
cross-references between topics in different partitions. Here is a part of the ACM–CCS
ontology related to one of the eleven main subjects, D. Software:</p>
        <preformat>D. Software
  D.0 GENERAL
  D.1 PROGRAMMING TECHNIQUES (E)
    D.1.0 General
    D.1.1 Applicative (Functional) Programming
    D.1.2 Automatic Programming (I.2.2)
    D.1.3 Concurrent Programming
      Distributed programming
      Parallel programming</preformat>
        <p>The three coded layers are represented above by topics such as D., D.0 and D.1.0.
Topic D.1.3 is supplied with a topic description consisting of two terms. Topics D.1 and
D.1.2 have references to topics E and I.2.2, respectively.</p>
        <p>
          The VINITI ontology of mathematics and informatics is the most extensive
ontology of the mathematics domain available in Russian [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. It is an unbalanced rooted
tree of mathematics and informatics topics supplied with many cross-references.
        </p>
        <p>Usually, these ontologies are used to annotate documents or publications in large
collections such as the ACM portal library or VINITI journals library. Here we
concentrate on a different aspect of using ontologies – a procedure for abstracting
concepts from text documents.</p>
        <p>Accordingly, we consider two collections of texts.</p>
        <p>
          First, we have taken an issue of the ACM Journal on Emerging Technologies in
Computing Systems (JETC), which is a free-access journal [
          <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
          ]. Each publication is
represented by three items: 1) an abstract; 2) a set of keywords provided by authors; 3) a
set of index terms that are ACM–CCS ontology topics, used on the journal’s web site
to manually index the article. We use both the abstract and keywords to represent the
contents of an article.
        </p>
        <p>Second, we have NRU HSE teaching syllabuses for the courses involving
Mathematics and/or Informatics as they are taught in the School of Applied Mathematics
and Informatics at NRU HSE. They can be easily downloaded from the NRU HSE
web site (http://www.hse.ru).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Method’s composition</title>
        <p>
          The method takes in a text, generates its profile, and then proceeds to the further stages
described below. The profile is a list of ontology topics generated for the input text.
It is based on estimates of the degree of similarity between the ontology topics and
the text, derived by using the so-called annotated suffix tree (AST) techniques [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>This is the sequence of the method’s steps:</p>
        <list list-type="order">
          <list-item><p>preprocessing the texts and the ontology;</p></list-item>
          <list-item><p>presenting the texts as annotated suffix trees (ASTs);</p></list-item>
          <list-item><p>evaluating similarity between the ontology topics and the texts according to the texts’ AST features;</p></list-item>
          <list-item><p>constructing the text profiles:</p>
            <list list-type="alpha-lower">
              <list-item><p>computing the similarity matrix of the ontology topics according to the text corpus;</p></list-item>
              <list-item><p>computing the similarity matrix between the texts;</p></list-item>
            </list>
          </list-item>
          <list-item><p>finding and analyzing text clusters;</p></list-item>
          <list-item><p>finding clusters of ontology topics;</p></list-item>
          <list-item><p>mapping the clusters into higher layers of the ontology structure.</p></list-item>
        </list>
        <p>Steps 5 and 6 can be skipped, so that what follows applies to both individual texts
and text corpora.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Texts preprocessing</title>
        <p>Each text is split into sentences, and each ontology topic usually consists of one
sentence. We represent both the texts and the ontology as sets of sentences that are taken as
strings. To construct a simple machine representation, two stages need to be
completed:</p>
        <list list-type="order">
          <list-item><p>extracting the meaningful parts from the texts;</p></list-item>
          <list-item><p>removing unnecessary symbols from them, such as HTML tags, punctuation
marks, etc., and transforming them to lower case.</p></list-item>
        </list>
        <p>While the latter stage can easily be done automatically, the first one is
usually done manually. For example, NRU HSE teaching syllabuses include not only subject matter
but some administrative information as well. Another obstacle to automating the process
is that the teaching syllabuses have no unified template but rather are
formatted and stored in different styles and formats.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Annotated suffix tree representation of a text</title>
        <p>An annotated suffix tree (AST) is a data structure used for computing and
representing the frequencies of all fragments of a text. The AST for a string is a rooted tree in which
each node is labeled with one character and one number. Each path from the root to a
leaf encodes one of the suffixes of the string. The frequency of a node is the number of
occurrences in the string of the fragment encoded by the path
from the root to that node. The AST for a collection of strings encodes every suffix
of each string, together with the occurrence frequencies over all the strings.</p>
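        <p>The structure described above can be illustrated with a minimal sketch. It assumes a naive construction in which every suffix is inserted character by character and each node counts how many suffixes pass through it; the names ASTNode, build_ast and fragment_frequency are hypothetical, and the relevance scoring of ontology topics is not reproduced here.</p>
        <preformat>
```python
class ASTNode:
    """One node of an annotated suffix tree: a character plus a frequency."""

    def __init__(self, char=""):
        self.char = char      # character labelling this node
        self.freq = 0         # occurrences of the fragment that ends here
        self.children = {}    # next character -> child node


def build_ast(strings):
    """Insert every suffix of every string, incrementing node frequencies."""
    root = ASTNode()
    for s in strings:
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = ASTNode(ch)
                    node.children[ch] = child
                child.freq += 1
                node = child
    return root


def fragment_frequency(root, fragment):
    """Return how often `fragment` occurs in the indexed strings (0 if absent)."""
    node = root
    for ch in fragment:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.freq


tree = build_ast(["mining", "mine"])
print(fragment_frequency(tree, "in"))   # prints 3: twice in "mining", once in "mine"
print(fragment_frequency(tree, "min"))  # prints 2
```
        </preformat>
        <p>This naive construction takes time quadratic in the length of each string, which is acceptable for sentence-sized strings.</p>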
        <p>
          Examination of a set consisting of an ontology and a text corpus is done according
to procedures described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It involves constructing an AST for each text and
evaluating the relevance of each ontology topic to the text. The details can be found in
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The ontology topics that receive the highest scores for a text are then
selected to form the text’s profile, either as a fuzzy set of the scores or as a crisp set of
the selected topics.
        </p>
        <p>In some cases, a text should be seen as a more complicated entity than just a set of
strings. If keywords are provided for a text, one AST may not be
enough to represent the keywords-text combination. The ontology, being a hierarchic
structure with cross-references, should not be treated as a primitive set of strings either.
This brings us to an advanced model of the query set.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Generating profiles: abstracts, keywords, cross-references</title>
        <p>Consider a journal publication that is represented by its abstract together with
keywords. On the one hand, the keywords may be considered as part of the abstract. In that case,
after building an AST for the strings of the abstract, the keywords can be added one by
one to the tree. On the other hand, the keywords can be treated as separate from the
abstract, as a different constituent of the publication. In this case, one should build two
ASTs: the first for the strings of the abstract, the second for the
keywords. The evaluation of the ontology topics then has to be carried out twice, once with
each of the ASTs. The resulting scores are summed, possibly with different
weights, to form the total score of each ontology topic.</p>
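        <p>The weighted summation of the two evaluations can be sketched as follows. The function name combine_scores, the weight parameters w_abstract and w_keywords, and the score values are all hypothetical and serve only to show the arithmetic.</p>
        <preformat>
```python
def combine_scores(abstract_scores, keyword_scores, w_abstract=1.0, w_keywords=1.0):
    """Weighted sum of two per-topic score dictionaries (missing topics count as 0)."""
    topics = sorted(set(abstract_scores) | set(keyword_scores))
    return {
        topic: w_abstract * abstract_scores.get(topic, 0.0)
        + w_keywords * keyword_scores.get(topic, 0.0)
        for topic in topics
    }


# Hypothetical scores obtained from the abstract AST and from the keyword AST.
from_abstract = {"D.1.3": 2.0, "C.1.2": 1.0}
from_keywords = {"D.1.3": 0.5}

total = combine_scores(from_abstract, from_keywords, w_abstract=1.0, w_keywords=2.0)
print(total)  # {'C.1.2': 1.0, 'D.1.3': 3.0}
```
        </preformat>
        <p>Giving the keywords a larger weight than the abstract, as in this call, is only one possible choice.</p>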
        <p>To take into account the third constituent, the cross-references, let us first imagine
an ontology as a graph structure. It is composed of two parts: 1) a tree structure that is
the hierarchic relation between ontology topics; 2) random references between
ontology topics at various layers that can be interpreted as edges of the ontology graph.
Let us define the distance between two topics as the length of the path between
them. If there is no such path, the distance is set to zero. Now suppose that scores
for all the ontology topics have been computed. The score of a topic N is amended with
the score of a topic N1 related to N by using the distance between N and N1.
Denoting this distance by distance(N, N1), the score of the topic N can be set as</p>
        <disp-formula>TotalTopicEval(N) := TopicEval(N) + α distance(N, N1) TopicEval(N1),</disp-formula>
        <p>where α is a constant such that 0 ≤ α ≤ 1 and TopicEval is the ontology topics scoring
function.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Similarity between ontology topics according to the profiles</title>
        <p>
          The scoring of ontology topics over a text corpus results in a topic-to-text matrix
whose entries are the total scores of the topics in the texts. The columns of the matrix are
referred to as profiles of the texts. This matrix can be transformed into a similarity
matrix of the ontology topics by computing dot products of its rows. This
allows us to use similarity clustering methods, including spectral clustering as
discussed in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
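        <p>The transformation from the topic-to-text matrix to the topic-to-topic similarity matrix can be sketched as follows. The function name topic_similarity and the score values are hypothetical; rows stand for topics and columns for texts.</p>
        <preformat>
```python
def topic_similarity(scores):
    """scores[i][j] is the score of topic i in text j.

    Returns the topic-to-topic matrix of dot products between rows.
    """
    n = len(scores)
    return [
        [sum(a * b for a, b in zip(scores[i], scores[k])) for k in range(n)]
        for i in range(n)
    ]


# Two topics scored over three texts (hypothetical values).
scores = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
]

print(topic_similarity(scores))  # [[5.0, 2.0], [2.0, 10.0]]
```
        </preformat>
        <p>The result is symmetric, and two topics are similar to the extent that they score highly in the same texts.</p>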
        <p>For the sake of simplicity, the clustering and cluster lifting procedures described
below are based on a similarity matrix (and therefore on clusters) involving
only leaf topics. Therefore, all text profiles are to be cleaned of upper-layer topics
before forming the topic-to-topic similarity matrix.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Spectral clustering</title>
        <p>
          The Additive Fuzzy Spectral Clustering method (FADDIS) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] combines the Additive
Fuzzy Clustering Model and the Spectral Clustering approach. The Spectral
Clustering approach relies on the eigenstructure of the similarity matrix. The Additive Fuzzy
Clustering method finds one cluster at a time by subtracting the similarities accounted
for by the preceding clusters from the initial similarity matrix [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The
FADDIS method therefore sequentially finds a cluster membership vector and its intensity
using the maximum eigenvalue and the corresponding eigenvector of the residual
similarity matrix. Special attention should be given to the data pre-processing stage:
FADDIS involves the pseudo-inverse Laplacian transformation of the initial similarity
matrix. Experiments have shown that such a transformation may make
the structure of the clusters to be extracted clearer.
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>Query set lifting over ontology</title>
        <p>After a fuzzy, or crisp, topic set is extracted as a cluster or a single text profile, this
set is considered as an abstraction query to the ontology: a few topics of higher
rank are to be found so that the query set is covered, to an extent, by these high-rank
nodes, or “head subjects”, representing the query set in as general a way as possible. We
refer to such a result, and to the process, as “lifting” the query set over the hierarchically
organized ontology.</p>
        <p>
          The lifting algorithm [
          <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
          ] proceeds according to the assumption that if all or
almost all elements of a set are covered by a high-layer topic, then the set has been
lifted to that very topic.
        </p>
        <p>To conform to this hypothesis we introduce a penalty function to be minimized while
lifting the query topic set towards the ontology root. It is defined as the weighted sum of
the numbers of nodes of different types that do not fit. These “odd” nodes are determined during the
lifting procedure. At the level of leaves, each leaf either belongs to the query
set or does not. A topic that generalizes most of the topics in a cluster is algorithmically
interpreted as a head subject for the query set. Nodes that are covered by a
head subject but do not belong to the query set are referred to as gaps. Nodes that are
not covered by a head subject but do belong to the query set are referred to as
offshoots. The problem is to minimize the total number of head subjects, gaps and
offshoots.</p>
        <p>We denote the number of head subjects by H, the number of offshoots by O,
and the number of gaps by G. Then we recurrently minimize the penalty
function P = h * H + off * O + g * G at each step of the lifting process; here h, off
and g are the corresponding penalty weights.</p>
        <p>[Fig. 1, panels (a) and (b): two liftings of a query set in a three-layer ontology.]</p>
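        <table-wrap>
          <label>Table 1</label>
          <caption><p>Profile A (fragment): AST–profile (left) and index terms (right).</p></caption>
          <table>
            <thead>
              <tr><th>TE</th><th>ID</th><th>Ontology topic</th><th>#</th><th>ID</th><th>Index term</th></tr>
            </thead>
            <tbody>
              <tr><td>107.253</td><td>D.2.11</td><td>Software Architectures</td><td>3</td><td>C.1.2</td><td>Multiple Data Stream Architectures (Multiprocessors)</td></tr>
              <tr><td>105.72</td><td>C.1.3</td><td>Other Architecture Styles</td><td></td><td></td><td></td></tr>
              <tr><td>...</td><td>...</td><td>...</td><td></td><td></td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Consider the following example of a three-layer ontology and a query set consisting
of leaves 1, 2, 5, 7 and 9 (Fig. 1). In Fig. 1(a) there is only one head subject, which
covers leaves 1, 2, 5, 7 and 9 as well as leaves 3, 4, 6 and 8, which are gaps. In Fig. 1(b)
there are two head subjects, one offshoot (leaf 5) and two gaps: leaves 3 and 8. The
optimal lifting is the one of the two with the smaller penalty value, which
depends on the relation between the gap and offshoot penalties.</p>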
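        <p>The comparison of the two liftings of Fig. 1 can be sketched as follows. The counts of head subjects, offshoots and gaps are those stated in the text; the weight values passed to best_lifting are assumptions chosen only to show how the outcome changes with the gap penalty, and the function names are hypothetical.</p>
        <preformat>
```python
def lifting_penalty(heads, offshoots, gaps, h=1.0, off=1.0, g=1.0):
    """Penalty P = h * H + off * O + g * G for one candidate lifting."""
    return h * heads + off * offshoots + g * gaps


# (head subjects, offshoots, gaps) for the two liftings of Fig. 1.
candidates = {
    "a": (1, 0, 4),   # one head subject, gaps at leaves 3, 4, 6, 8
    "b": (2, 1, 2),   # two head subjects, offshoot leaf 5, gaps at leaves 3, 8
}


def best_lifting(h, off, g):
    """Return the name of the candidate lifting with the smallest penalty."""
    return min(
        candidates,
        key=lambda name: lifting_penalty(*candidates[name], h=h, off=off, g=g),
    )


# Cheap gaps favour the single head subject: P(a) = 3.0, P(b) = 4.0.
print(best_lifting(h=1.0, off=1.0, g=0.5))  # a
# Expensive gaps favour two head subjects: P(a) = 9.0, P(b) = 7.0.
print(best_lifting(h=1.0, off=1.0, g=2.0))  # b
```
        </preformat>
        <p>With all three weights equal to 1 the two liftings tie at a penalty of 5, so the preferred lifting is decided entirely by the relative weights.</p>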
        <p>Examples of experimental studies</p>
      </sec>
      <sec id="sec-2-9">
        <title>ACM Journal abstracts</title>
        <p>As mentioned above, we downloaded and examined a number of journal
publications. Each of them is represented by an abstract, several keywords and
manually indexed ACM-CCS topics.</p>
        <p>
          This last item gives us a tool to evaluate the machine-constructed profiles, based on
AST-evaluation of ACM–CCS ontology topics, so that we are able to find how well
the estimated topics match those manually selected. Here is an example of profiles for
two journal publications [
          <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
          ], one good and the other poor, in Tables 1 and 2.
        </p>
        <p>Each of the tables consists of two parts. The left part presents our machine-generated
annotated suffix tree profile (AST–profile); the right part lists the index
terms that were used by the publication’s authors to annotate the publication. In the
tables, TE is the total score of the ontology topic and ID is the index of the ontology
topic; ‘#’ denotes the position of the ontology topic in the profile sorted in descending order of
score. We expect the index terms to be scored highly against their abstracts
by the AST–procedure and thus to appear at the top of the AST–profile.</p>
        <p>Profile A was constructed for the publication that can be previewed on the
web page http://portal.acm.org/citation.cfm?id=1265951.
Profile B was generated for the publication at
http://portal.acm.org/citation.cfm?id=1265956. Only the five
best-scoring ontology topics are shown here, although each AST–profile covers all the
leaves of the ACM–CCS ontology.</p>
        <sec id="sec-2-9-1">
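          <table-wrap>
            <label>Table 2</label>
            <caption><p>Profile B: AST–profile (left) and index terms (right).</p></caption>
            <table>
              <thead>
                <tr><th>TE</th><th>ID</th><th>Ontology topic</th><th>#</th><th>ID</th><th>Index term</th></tr>
              </thead>
              <tbody>
                <tr><td>127.503</td><td>C.4</td><td>PERFORMANCE OF SYSTEMS</td><td>40</td><td>C.1.2</td><td>Multiple Data Stream Architectures (Multiprocessors)</td></tr>
                <tr><td>102.03</td><td>B.8.1</td><td>Reliability, Testing, and Fault-Tolerance</td><td></td><td></td><td></td></tr>
                <tr><td>79.475</td><td>B.4.5</td><td>Reliability, Testing, and Fault-Tolerance</td><td>108</td><td>B.4.3</td><td>Interconnections (Subsystems)</td></tr>
                <tr><td>76.611</td><td>B.8.2</td><td>Performance Analysis and Design Aids</td><td></td><td></td><td></td></tr>
                <tr><td>72.382</td><td>B.3.4</td><td>Reliability, Testing, and Fault-Tolerance</td><td></td><td></td><td></td></tr>
                <tr><td>...</td><td>...</td><td>...</td><td>135</td><td>B.6.1</td><td>Design Styles</td></tr>
              </tbody>
            </table>
          </table-wrap>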
          <p>Publication A is annotated with two ontology topics. Both appear among the
top five ontology topics, in first and third place, respectively.
Publication B is annotated with three ontology topics, which occupy the 40th, 108th and
135th places of the AST–profile. While profile A should be regarded as more or
less satisfactory, profile B is totally inadequate. This difference comes from the
difference in the abstracts. The AST–procedure takes into account only matches
between the ontology topic’s substrings and the abstract’s substrings. If no long
substrings match, the whole ontology topic is scored rather low (a sufficiently long
substring has 5-7 symbols). In the case of publication B, there are hardly any
matches between the ontology topics and the abstract that are longer than 3-4
symbols. In contrast, the abstract of publication A includes whole words from some
ontology topics, such as ‘parallel’ and ‘architecture’. Moreover, it is the presence
of the common word ‘architecture’ that causes so many unrelated ontology topics to
be scored highly too.</p>
          <p>The AST–method is able to detect all the fuzzy matches between an ontology topic
and a text. From this point of view the topics "Single Data Stream Architectures" and
"Multiple Data Stream Architectures (Multiprocessors)" are indistinguishable if only the substring
"Data Stream Architecture" occurs in the text under examination. The small
difference in their total scores may be caused simply by the presence of shorter substrings
like ‘gle’ or even ‘e’. This is the main shortcoming of the AST–method: it cannot
catch an ontology topic in a text if the topic is formulated in words other than those used
in the ontology.</p>
        </sec>
      </sec>
      <sec id="sec-2-11">
        <title>Syllabuses for HSE courses in Applied Mathematics and Informatics: Preliminary Results</title>
        <p>The study of the VINITI Mathematics ontology and the collection of teaching
syllabuses showed several shortcomings, both of the ontology and of the syllabuses. After
applying the AST–procedure, we derived the topic-to-topic similarity matrix and
extracted crisp clusters by means of the FADDIS method mentioned above. Here are a
few observations. First, almost every crisp cluster contained some topics
from the Topology partition. This means that one or two topological concepts are studied
in almost every mathematical course, which suggests that a course in Topology
should be included in the curriculum; this is not the case so far. Second, as the
VINITI Mathematics ontology has not been updated since the 1980s, it was expected
to have issues in covering more modern topics in mathematics and
informatics. Our analysis suggests several nests that should possibly be added to the ontology.
For example, the topic ‘Lattices’ is a leaf in the current ontology. According to our
results, it should be a parent node with three children: ‘Modular lattices’,
‘Distributive lattices’ and ‘Semimodular lattices’. Third, the VINITI Mathematics ontology
was found to be rather unbalanced in its coverage. The profiles of the
‘Differential Equations’ and ‘Calculus’ courses according to the ontology cover all
details. This is no wonder, because these two subjects constitute almost half of the ontology.
Yet the branches for less classical subjects such as ‘Game Theory’ or ‘Optimization’ are
small and not informative.</p>
        <p>One more observation is that some main teaching subjects have no matches among
the higher ranks of the VINITI Mathematics ontology. One example is ‘Discrete
Mathematics’.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>
        An idea and some initial stages of a different method for abstracting concepts from
text documents are presented. It is based on using ontologies as a representation of
knowledge. We try to simulate the process of abstraction of texts in three coherent
steps. First, we match ontology topics to the texts and construct text profiles by
employing text mining techniques. The next step is performed for a corpus of documents:
considering leaf topics as the base for abstraction, we find clusters of topics. Finally,
we lift cluster query sets to higher layers of the hierarchy to find and visualize head
subjects, along with their gaps and offshoots. The head subjects represent the
abstraction sought by the method. The method is being developed as an adaptation of the
method from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, a number of novel procedures have been developed in this
work as well. These include sentence-by-sentence AST modeling, combining
different aspects of the texts (such as keywords and cross-references) into the scoring
system, and the like.
      </p>
      <p>The computational experiments lead us to a number of issues that are to be subjects
for further development. The lack of an adequate taxonomy of Mathematics and
Informatics in Russian is among them. The AST technology suffers from the effects
of repetitive terms such as "architecture", "method", "system" and the like, which act as
noise and falsely raise the similarity scores. On the other hand, the scores drop
when the texts use slightly different terms for the taxonomy topics. This latter
aspect could be treated by using neighbors of the taxonomy topics found in texts
retrieved by search engines when queried with the topics. We expect that the neighbors
would not only allow better scoring in cases of differing terminology, but would also
be useful in filling in the gaps generated by lifting the head subjects. Other
directions for development are the extension of the concept of ontology from a
hierarchy to (semi)lattice structures and finding adequate formalisms for dealing with
situations in which several ontologies are related to the texts.</p>
      <p>List of references</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>ACM</given-names>
            <surname>Computing Classification System</surname>
          </string-name>
          (
          <year>1998</year>
          ), http://www.acm.org/about/class/1998
          <source>(Cited 9 September</source>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Eshaghian-Wilner M. M.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Khitun</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navab</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>K. L.</given-names>
          </string-name>
          :
          <article-title>"The spin-wave nanoscale reconfigurable mesh and the labeling problem"</article-title>
          .
          <source>ACM Journal on Emerging Technologies in Computing Systems (JETC) 3</source>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Nikol'skaya</surname>
            <given-names>I. Yu.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yefremenkova</surname>
            <given-names>V. M.</given-names>
          </string-name>
          :
          <article-title>"Mathematics in VINITI RAS: From Abstract Journal to Databases"</article-title>
          .
          <source>Scientific and Technical Information Processing</source>
          <volume>35</volume>
          (
          <issue>3</issue>
          )
          <fpage>128</fpage>
          -
          <lpage>138</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Mercadé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Espinosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J-E.</given-names>
            <surname>Adsuara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Adrados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Segura</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Maes</surname>
          </string-name>
          <article-title>: "Orymold: ontology based gene expression data integration and analysis tool applied to rice"</article-title>
          ,
          <source>BMC Bioinformatics</source>
          ,
          <volume>10</volume>
          :
          <fpage>158</fpage>
          (
          <year>2009</year>
          ) doi:10.1186/1471-2105-10-158.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mirkin</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nascimento</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenner</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            <given-names>L. M.</given-names>
          </string-name>
          :
          <article-title>Fuzzy Thematic Clusters Mapped to Higher Ranks in a Taxonomy</article-title>
          .
          <source>International Journal of Software and Informatics</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ),
          <fpage>257</fpage>
          -
          <lpage>275</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mirkin</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nascimento</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            <given-names>L.M.</given-names>
          </string-name>
          :
          <article-title>Cluster-lift method for mapping research activities over a concept tree</article-title>
          .
          <source>Recent Advances in Machine Learning II</source>
          ,
          <fpage>245</fpage>
          -
          <lpage>247</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pampapathi</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirkin</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levene</surname>
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A suffix tree approach to anti-spam email filtering</article-title>
          .
          <source>Machine Learning</source>
          <volume>65</volume>
          (
          <issue>1</issue>
          ),
          <fpage>309</fpage>
          -
          <lpage>338</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Patwardhan</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lebeck</surname>
            <given-names>A. R.</given-names>
          </string-name>
          :
          <article-title>A self-organizing defect tolerant SIMD architecture</article-title>
          .
          <source>ACM Journal on Emerging Technologies in Computing Systems (JETC)</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sato</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sato</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            <given-names>L.C.</given-names>
          </string-name>
          :
          <article-title>Fuzzy Clustering Models and Applications</article-title>
          . Physica-Verlag (
          <year>1997</year>
          ).
          ISBN:
          <isbn>3790810266</isbn>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>V.</given-names>
            <surname>Karkaletsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fragkou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Petasis</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Iosif</surname>
          </string-name>
          :
          <article-title>Ontology Based Information Extraction from Text</article-title>
          .
          <source>Knowledge-Driven Multimedia Information Extraction and Ontology Evolution</source>
          , Lecture Notes in Computer Science, V.
          <volume>6050</volume>
          ,
          <fpage>89</fpage>
          -
          <lpage>109</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Von Luxburg</surname>
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A tutorial on Spectral Clustering</article-title>
          .
          <source>Statistics and Computing</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ),
          <fpage>395</fpage>
          -
          <lpage>416</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>