<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metric Generalization and Modification of Classification Algorithms Based on Formal Concept Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgeny Kolmakov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>FCA-based classifiers can deal with nonbinary data representations in different ways: use them directly or binarize them. Algorithms that binarize data use metric information from the initial feature space only through scaling (the feature binarization procedure). A metric approach in this area allows one to significantly reduce the number of classification refusals and provides additional information which can be used for classifier training. In this paper we propose an approach which generalizes some existing FCA classification methods and allows one to modify them. Unlike other algorithms, the proposed classifier model uses the initial metric information together with order-theoretic object-attribute dependencies.</p>
      </abstract>
      <kwd-group>
        <kwd>classification</kwd>
        <kwd>pattern recognition</kwd>
        <kwd>formal concept analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Formal concept analysis (FCA) is a branch of applied lattice theory allowing one
to formalize some machine learning models. It provides tools to solve various
tasks in many domains of computer science, such as knowledge representation
and management, data mining, including classification and clustering. There are
many FCA-based classification algorithms known [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. One of the particular
features of FCA methods is that an object x ∈ X is described using binary
attributes. However, in many cases attributes can be, e.g., real numbers, graphs,
etc. There are classification methods that use a nonbinary representation directly,
e.g., see these works on pattern structures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but many classifiers use it
only after a scaling procedure. Scaling is the transformation of the
initial feature space F into the Boolean cube B^n. It leads to a significant loss
of the metric information provided by the space F. In this paper we propose
generalizations and modifications of several FCA-based classifiers that use the scaling
procedure, by introducing a new classifier model on the basis of class estimates. It
generalizes the straight hypotheses-based algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and both GALOIS
classification procedures [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We also define a pseudometric on an arbitrary finite lattice,
based on ideas from the Rulearner rule induction algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and thus
has an intelligible interpretation in terms of formal concepts and the concept lattice.
      </p>
      <p>In what follows we keep to standard lattice theory and FCA definitions.
Therefore here we briefly describe some basic definitions, classifiers and introduce
the notation used further. Let G and M be arbitrary sets, called the
set of objects and the set of attributes respectively, and let I ⊆ G × M be a binary
relation. The triple K = (G, M, I) is called a formal context. The following mappings (·)′
define a Galois connection between the sets 2^G and 2^M partially ordered
by set-theoretic inclusion:</p>
      <p>A′ = {m ∈ M | gIm for all g ∈ A} , B′ = {g ∈ G | gIm for all m ∈ B} .
A pair (A, B), such that A ⊆ G, B ⊆ M and A′ = B, B′ = A is called a formal
concept of K with formal extent A and formal intent B. For object g ∈ G we
write g′ instead of {g}′. Define the “projection” mappings ext : (A, B) ↦ A and
int : (A, B) ↦ B. The formal concepts of a given context K form a complete lattice
denoted by B(K), called the concept lattice of the context K. Let ⟨L, ∧, ∨⟩
be a lattice and x ∈ L. By x▽ (x△) we denote the order ideal (filter) generated
by x. By At(L), J(L) and M(L) we denote the sets of all atoms, join-irreducible
and meet-irreducible elements of L respectively. A function f : L → R is called
supermodular if f(x) + f(y) ≤ f(x ∨ y) + f(x ∧ y) for all x, y ∈ L.</p>
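      <p>As a minimal illustration of these definitions, the derivation operators (·)′ and the concept condition A′ = B, B′ = A can be sketched as follows (the toy context and all function names are hypothetical, not from the paper):</p>

```python
# Sketch of the derivation operators of a formal context K = (G, M, I).
# The toy context below is a made-up example.
G = {"g1", "g2", "g3"}
M = {"m1", "m2", "m3"}
I = {("g1", "m1"), ("g1", "m2"),
     ("g2", "m2"), ("g2", "m3"),
     ("g3", "m1"), ("g3", "m2")}

def prime_objects(A):
    """A′ = {m ∈ M | gIm for all g ∈ A}."""
    return {m for m in M if all((g, m) in I for g in A)}

def prime_attrs(B):
    """B′ = {g ∈ G | gIm for all m ∈ B}."""
    return {g for g in G if all((g, m) in I for m in B)}

def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A′ = B and B′ = A."""
    return prime_objects(A) == B and prime_attrs(B) == A
```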
      <p>
        A concept C is called consistent if all objects in ext(C) belong to the same
class. Both GALOIS classification procedures are described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. GALOIS(1)
calculates the similarity ΓC (x) between an object x and each consistent concept
C, then x is assigned to the class corresponding to C with the highest value of
ΓC(x). GALOIS(2) finds all consistent concepts C satisfying int(C) ⊆ x′; then
x is assigned to the most frequent class among these concepts.
      </p>
      <p>
        Let K = (G, M, I) be a context and let w ∉ M be a target attribute. The input
data for classification may be described by three contexts w.r.t. w: the positive
context K+ = (G+, M, I+), the negative context K− = (G−, M, I−) and the
undefined context Kτ = (Gτ, M, Iτ). G+, G− and Gτ are the sets of positive, negative
and undefined objects respectively. Iε ⊆ Gε × M, where ε ∈ {+, −, τ}, are binary
relations that define structural attributes. The Galois operators in these contexts are
denoted by (·)+, (·)− and (·)τ respectively. A formal concept of the positive
context is called a positive concept. Negative and undefined concepts are defined
similarly. If the intent B+ of a positive concept (A+, B+) is not contained in the
intent g− of any negative example g ∈ G−, then it is called a positive
hypothesis with respect to the property w. A positive intent B+ is called falsified if
B+ ⊆ g− for some negative example g ∈ G−. Negative hypotheses are defined
similarly. By “hypotheses-based classifier” we mean the classification procedure
from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which can be described as follows. If the intent of an unclassified object g ∈ Gτ contains
a positive hypothesis but no negative hypothesis, the object is classified positively; similarly for
the negative case. If it contains no positive or negative hypothesis (insufficient data)
or contains both positive and negative hypotheses (inconsistent data), no
classification happens.
      </p>
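      <p>The decision rule just described can be sketched as follows (a simplified reading in which hypotheses are plain attribute sets and an object is given by its intent; the function name is illustrative):</p>

```python
# Sketch of the hypotheses-based classification rule from [1]:
# +1 / -1 for a confident prediction, None for a refusal.
def classify(intent, pos_hypotheses, neg_hypotheses):
    has_pos = any(h <= intent for h in pos_hypotheses)  # some positive hypothesis ⊆ g′
    has_neg = any(h <= intent for h in neg_hypotheses)  # some negative hypothesis ⊆ g′
    if has_pos and not has_neg:
        return +1
    if has_neg and not has_pos:
        return -1
    return None  # insufficient data (neither) or inconsistent data (both)
```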
      <p><bold>2 Generalization and Modification of Algorithms</bold></p>
      <p>The common drawback of FCA-based classifiers that use binary features
after scaling is that they forget the initial feature space metric structure. The
main idea of this paper is to use this metric information together with
order-theoretic relations between objects and attributes provided by a concept lattice.
It is important that the spaces F and B^n, with their additional structures (metric and
formal context), are used at the same time, providing more possibilities for
classifier training methods.</p>
      <p><bold>2.1 Metric estimates</bold></p>
      <p>Denote by H+ and H− the sets of concepts, constructed from the training set,
whose intents are positive and negative hypotheses respectively. We assume that
(F , ρ) is a metric space, and let S(x, A) be the similarity measure (based on
the metric from F , see examples in Section 3) between an object x and a set of
objects A. Let us define the estimates for the positive and negative classes:</p>
      <p>Γ+(x) = ∑_{C∈H+} I(x, C) S(x, ext(C)),  Γ−(x) = ∑_{C∈H−} I(x, C) S(x, ext(C)),</p>
      <p>where I(x, C) = [int(C) ⊆ x′] and [·] is the indicator function. Then the classifier
has the following form: a(x) = sign Γ(x) = sign(Γ+(x) − Γ−(x)).</p>
      <p><bold>Proposition 1.</bold> If the hypotheses-based classifier correctly predicts the class label for an
object, then a(x) = sign Γ(x) does the same.</p>
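      <p>A sketch of this classifier under simple assumptions (hypotheses given as (extent, intent) pairs with extents listed as feature vectors, and a hypothetical inverse-distance similarity measure; none of the names come from the paper):</p>

```python
# Sketch of the metric class estimates Γ±(x) = Σ_{C∈H±} [int(C) ⊆ x′] S(x, ext(C))
# and the sign classifier a(x) = sign(Γ+(x) − Γ−(x)); 0 encodes a refusal.
import math

def estimate(x_intent, x_point, hypotheses, S):
    """Γ(x) over one hypotheses set; each hypothesis is (extent_points, intent_set)."""
    return sum(S(x_point, extent)
               for extent, intent in hypotheses
               if intent <= x_intent)          # indicator [int(C) ⊆ x′]

def sign_classifier(x_intent, x_point, H_pos, H_neg, S):
    gamma = (estimate(x_intent, x_point, H_pos, S)
             - estimate(x_intent, x_point, H_neg, S))
    return (gamma > 0) - (gamma < 0)

def inverse_dist_similarity(x, extent):
    # A hypothetical S(x, A) based on the nearest object of A in feature space.
    d = min(math.dist(x, g) for g in extent)
    return 1.0 / (1.0 + d)
```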
      <p>In comparison with the hypotheses-based classifier, the number of classification
refusals is reduced, but the total error rate can increase.</p>
      <p><bold>2.2 Analogy with algorithms based on estimate calculations</bold></p>
      <p>To calculate the estimates in the method above we use the sets of positive and negative
hypotheses, i.e. special subsets of the concept lattice. This calculation of estimates
can be generalized to arbitrary subsets of the concept lattice that somehow
characterize individual classes y ∈ Y.</p>
      <p>Let C be a set of concepts which we call the support concepts system.
Suppose that each concept from C characterizes only one class y ∈ Y, that
is C = ⊔_{y∈Y} Cy, where Y is the set of classes. Then define the estimate of an
object x for class y as follows:</p>
      <p>Γy(x) = ∑_{C∈Cy} S(x, C).</p>
      <p>
        The classifier has the following form: a(x) = arg max_{y∈Y} Γy(x). The
estimates of this type are similar to the estimates used in estimate calculations
methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the sets Cy are the support sets analogues.
      </p>
      <p>Consider specific examples of the support concepts system C and the similarity measure
S(x, C), and analyze the corresponding classifiers:</p>
      <p>1. C = H+ ⊔ H− is the set of positive and negative hypotheses, with
S(x, C) = [int(C) ⊆ x′] Ŝ(x, ext(C)), where Ŝ(x, ext(C)) is a given
similarity measure. The corresponding classifier was described above.</p>
      <p>2. C = ⊔_{y∈Y} Cy is the set of consistent concepts.</p>
      <p>If S(x, C) = | (M \ int(C)) ∪ x′| we get modified GALOIS(1) algorithm.</p>
      <p>If S(x, C) = [int(C) ⊆ x′] we get the GALOIS(2) algorithm.</p>
      <p><bold>2.3 Analogy with metric classifiers</bold></p>
      <p>Let C = {C1, . . . , Cn} be the support concepts system. Suppose that there is a
distance measure ρ in F space. Sort C in increasing order w.r.t. the values of the
distance ρ(x, Ci) between object x and concepts Ci:</p>
      <p>ρ(x, C_x(1)) ≤ ρ(x, C_x(2)) ≤ · · · ≤ ρ(x, C_x(n)),
where C_x(i) is the i-th neighbour of x among C and y_x(i) is the class characterized by
the concept C_x(i). Define the estimate of object x for class y:</p>
      <p>Γy(x) = ∑_{i=1}^{n} w_i(x) [y_x(i) = y],</p>
      <p>where w_i(x) is the weight of the i-th neighbour of x (a positive function, non-increasing in i).</p>
      <p>The defined estimates are completely analogous to the metric classifiers
estimates, except that the neighbours here are not objects but support concepts.</p>
      <p>
        Thus, choosing suitable weights w_i(x) we get analogues of all known metric
classifiers (kNN, Parzen window, potential functions and others), but in terms
of concepts. For example:
– w_i(x) = [i ≤ k] is the k nearest neighbours method;
– w_i(x) = [i ≤ k]w_i is the weighted k nearest neighbours method (w_i depends only
on the neighbour number);
– w_i(x) = K(ρ(x, C_x(i)) / h(x)) is the Parzen window method (K(z) is a non-increasing
positive-valued function defined on [0, 1], h(x) is the window width).
      </p>
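      <p>The kNN variant of this scheme can be sketched as follows (concepts are represented here by the feature vectors of their extents plus a class label, and ρ(x, C) is taken as the distance to the nearest extent object; these representation choices are assumptions, not prescribed by the paper):</p>

```python
# Sketch of the kNN analogue over support concepts: sort concepts by distance
# to x, use weights w_i(x) = [i <= k], and return the class with the largest estimate.
import math

def knn_over_concepts(x, concepts, k=3):
    """concepts: list of (extent_points, label). Returns argmax_y Γy(x)."""
    # ρ(x, C): distance from x to the nearest object of the concept's extent
    ranked = sorted(concepts,
                    key=lambda c: min(math.dist(x, g) for g in c[0]))
    scores = {}
    for _, label in ranked[:k]:               # w_i(x) = [i <= k]
        scores[label] = scores.get(label, 0) + 1
    return max(scores, key=scores.get)
```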
      <p>All the proposed methods are generalizations of existing methods
and can be used to modify them. They use both the metric information from
F and the object-attribute dependencies provided by the concept lattice. This allows one to
reduce both the number of classification refusals and the error rate.</p>
      <p><bold>2.4 Pseudometric on the set of concepts</bold></p>
      <p>
        Another approach that uses the notion of similarity in FCA algorithms is to
define a distance function on the set of concepts. In the Rulearner algorithm ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) the
most important characteristics of a concept lattice element u are the value of the
function cover(u) = |J(L) ∩ u▽| and the set M(L) ∩ u△. The comparison of lattice
elements is performed on the basis of these characteristics. In the case of a reduced
context, this relates to the fact that every concept is characterized by its extent
(distinct objects correspond to join-irreducible elements) and its intent (distinct
attributes correspond to meet-irreducible elements). Thus, cover(u) corresponds to
the number of objects from the training set covered by the concept u, and M(L) ∩ u△
corresponds to the attributes characterizing u. We use these observations to
define a distance function on an arbitrary finite lattice. Due to the propositions
dual to theorems 3.1 and 3.3 from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the following theorem holds.
<bold>Theorem 1.</bold> Let ⟨L, ∧, ∨⟩ be a lattice and let f : L → R be an isotone and
supermodular function. Then d_f(x, y) = f(x) + f(y) − 2f(x ∧ y) defines a pseudometric on
this lattice.
      </p>
      <p>Consider an arbitrary finite lattice ⟨L, ∧, ∨⟩, a non-empty subset D ⊆ L and a
function f : L → Z+ defined as follows: f(x) = |D(x)|, where D(x) = D ∩ x▽.</p>
      <p><bold>Proposition 2.</bold> The function f(x) is isotone and supermodular.</p>
      <p><bold>Proof.</bold> The isotone property of f follows from the following chain of implications:
x ≤ y ⇒ x▽ ⊆ y▽ ⇒ D(x) ⊆ D(y) ⇒ f(x) = |D(x)| ≤ |D(y)| = f(y).</p>
      <sec id="sec-1-1">
        <title>Proof of supermodularity</title>
        <p>To prove supermodularity, consider the following:
f(x) + f(y) = |D(x)| + |D(y)| = |D(x) ∪ D(y)| + |D(x) ∩ D(y)| ≤ f(x ∨ y) + f(x ∧ y).
To prove the last inequality observe that D(x) ∪ D(y) ⊆ D(x ∨ y) follows from
the following inclusions:</p>
        <p>x ≤ x ∨ y ⇒ D(x) ⊆ D(x ∨ y), y ≤ x ∨ y ⇒ D(y) ⊆ D(x ∨ y).</p>
        <p>The equality D(x) ∩ D(y) = D(x ∧ y) follows from x▽ ∩ y▽ = (x ∧ y)▽. ⊓⊔</p>
        <p>Thus, according to the theorem above, the function f (x) induces the
pseudometric df (x, y) on the lattice, defined by the following equality:
df (x, y) = f (x) + f (y) − 2f (x ∧ y).</p>
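        <p>A small sketch of this construction on the Boolean lattice of subsets (meet = intersection, D = the set of atoms), which also lets one check the pseudometric axioms directly; the toy universe is hypothetical:</p>

```python
# Sketch of the pseudometric d_f(x, y) = f(x) + f(y) − 2 f(x ∧ y) with
# f(x) = |D ∩ x▽| on the Boolean lattice 2^UNIVERSE, where the down-set of x
# consists of all subsets of x and D is the set of atoms (singletons).
from itertools import combinations

UNIVERSE = frozenset({1, 2, 3})
D = [frozenset({e}) for e in UNIVERSE]      # atoms of the Boolean lattice

def f(x):
    """f(x) = |D(x)|, D(x) = atoms below x."""
    return sum(1 for a in D if a <= x)

def d_f(x, y):
    """Pseudometric induced by the isotone supermodular f; ∧ is intersection here."""
    return f(x) + f(y) - 2 * f(x & y)

# all elements of the lattice, for checking the pseudometric axioms
ALL_SUBSETS = [frozenset(s) for r in range(4) for s in combinations(UNIVERSE, r)]
```

In this Boolean case d_f coincides with the Hamming distance |x ⊕ y|, as Corollary 1 below states.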
      </sec>
      <sec id="sec-1-2">
        <title>Interpretation of the pseudometric</title>
        <p>The value of d_f(x, y) has a simple interpretation.</p>
        <p><bold>Proposition 3.</bold> d_f(x, y) = |D(x) ⊕ D(y)|, where A ⊕ B = (A \ B) ∪ (B \ A).</p>
        <p><bold>Proof.</bold> From the proof of the proposition above we conclude that the equality
D(x) ∩ D(y) = D(x ∧ y) holds. Consider the chain of equalities:
f(x) + f(y) − 2f(x ∧ y) = |D(x)| + |D(y)| − 2|D(x ∧ y)|
= |D(x)| + |D(y)| − 2|D(x) ∩ D(y)|
= |D(x) ∪ D(y)| + |D(x) ∩ D(y)| − 2|D(x) ∩ D(y)|
= |D(x) ∪ D(y)| − |D(x) ∩ D(y)| = |D(x) ⊕ D(y)|. ⊓⊔</p>
        <p><bold>Corollary 1.</bold> If ⟨L, ∧, ∨⟩ is a finite Boolean algebra and D is the set of all atoms
of L, then d_f(x, y) is exactly the Hamming distance.</p>
        <p>In order to compare formal concepts it is reasonable to choose D = J(L)
or D = At(L). In terms of this pseudometric, two concepts are the closer, the
fewer object concepts are covered by only one of them. Moreover, the cover(u)
and d_f(x, y) functions are tied: cover(u) = d_f(u, ⋀L). One drawback of
the defined distance measure is that the number of elements from D covered by
x ∧ y is not taken into account. In some cases this may lead to inadequate distance
estimates.</p>
        <p>Possible modifications:</p>
        <p>1. Various normalizations to take the number of elements into account, e.g.:
d(x, y) = |D(x) ⊕ D(y)| / |D(x) ∪ D(y)|.</p>
        <p>2. Weighting the elements of D: e.g., let D = J(L) and let w_e be the proportion of the
hypotheses covering e = (g′′, g′). Then d(x, y) has the following form:
d(x, y) = ∑_{e∈D(x)⊕D(y)} w_e.</p>
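        <p>Both modifications can be sketched directly on the sets D(x) and D(y) (given here as plain Python sets; the weight map w is hypothetical):</p>

```python
# Sketch of the two modified distances: the normalized symmetric difference
# d(x, y) = |D(x) ⊕ D(y)| / |D(x) ∪ D(y)| and the weighted form
# d(x, y) = Σ_{e ∈ D(x) ⊕ D(y)} w_e.
def d_normalized(Dx, Dy):
    union = Dx | Dy
    return len(Dx ^ Dy) / len(union) if union else 0.0

def d_weighted(Dx, Dy, w):
    # w maps each element of D to its weight, e.g. a hypothesis-coverage proportion
    return sum(w[e] for e in Dx ^ Dy)
```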
        <p>The distance between concepts can be applied to modify the classification
algorithms mentioned above. For example, let an object x be classified with the
hypotheses-based algorithm. Suppose there are two positive hypotheses H1+, H2+
and two negative hypotheses H1−, H2− for the classification of x. In this case
the algorithm refuses to classify x. Suppose we know the concept distances
d(H1+, H2+) and d(H1−, H2−), and also that d(H1+, H2+) ≫ d(H1−, H2−). Then it is natural
to classify x as positive, because distant concepts (in terms of the proposed
measure) are less “correlated” (since they cover many distinct object concepts),
and hence their answers are more significant. The distance between concepts can also
be used to reduce the size of the concepts system used by a classifier (e.g. the
consistent concepts) in order to improve the generalization ability of the classifier, reduce
overfitting and remove concepts based on noisy data.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Experiments</title>
      <p>
        In this section the experimental results are presented. The algorithms have been
tested on two data sets taken from UCI Machine Learning Repository [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: SPECT
and SPECTF Heart Data Set (training set consists of 80 objects, testing set
consists of 187 objects, 22 binary attributes in SPECT, 44 real-valued attributes
in SPECTF) and Liver Disorders Data Set (training set consists of 150
objects, testing set consists of 195 objects, 6 real-valued attributes, 30 binary
attributes (after scaling)). Tested algorithms: GALOIS(1, 2), Rulearner, straight
hypotheses-based algorithm, modified GALOIS(1) (described by the second
example in Section 2.2), modified hypotheses-based algorithm with metric
estimates (described in Section 2.1 with different similarity functions). The Euclidean
metric ρ(x, y) was used in the F space in both experiments. Similarity function:
S(x, C) = K(ρ(x, C), a), where K(r, a) and ρ(x, C) are one of the following
functions:
      </p>
      <p>K1(r, a) = 1 / (1 + exp(ar)),  K2(r, a) = 1 / (r + a).</p>
      <p>ρ1(x, C) = inf_{c∈C} ρ(x, c),
ρ2(x, C) =
ρ3(x, C) = sup_{c∈C} ρ(x, c).</p>
      <p>We introduce the following notation: νc is the proportion of classified objects,
νr = 1 − νc is the proportion of refused classifications, et is the total error rate
(counting refusals as errors) and er is the error rate among classified objects.</p>
      <table-wrap id="tbl-1">
        <table>
          <thead>
            <tr><th>Algorithm</th><th>νc</th><th>νr</th><th>et</th><th>er</th></tr>
          </thead>
          <tbody>
            <tr><td>GALOIS(1)</td><td>1</td><td>0</td><td>0.1604</td><td>0.1604</td></tr>
            <tr><td>Modified GALOIS(1)</td><td>1</td><td>0</td><td>0.0856</td><td>0.0856</td></tr>
            <tr><td>GALOIS(2)</td><td>1</td><td>0</td><td>0.0802</td><td>0.0802</td></tr>
            <tr><td>Rulearner</td><td>0.7487</td><td>0.2513</td><td>0.2727</td><td>0.0286</td></tr>
            <tr><td>Hypotheses-based</td><td>0.5936</td><td>0.4064</td><td>0.6150</td><td>0.1842</td></tr>
            <tr><td>K = K1, a = 0.0125, ρ = ρ1</td><td>0.8021</td><td>0.1979</td><td>0.3155</td><td>0.1467</td></tr>
            <tr><td>K = K1, a = 0.0125, ρ = ρ2</td><td>0.8021</td><td>0.1979</td><td>0.2888</td><td>0.1133</td></tr>
            <tr><td>K = K1, a = 0.0125, ρ = ρ3</td><td>0.8021</td><td>0.1979</td><td>0.2834</td><td>0.1067</td></tr>
            <tr><td>K = K1, a = 1, ρ = ρ2</td><td>0.7273</td><td>0.2727</td><td>0.3422</td><td>0.0956</td></tr>
            <tr><td>K = K2, a = 1, ρ = ρ1</td><td>0.8021</td><td>0.1979</td><td>0.2941</td><td>0.1200</td></tr>
            <tr><td>K = K2, a = 1, ρ = ρ2</td><td>0.8021</td><td>0.1979</td><td>0.3209</td><td>0.1533</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The aim of the experiments was to compare FCA classification methods
and not to achieve a low error rate in particular tasks. Hence we used a
simple scaling procedure: normalizing all attributes to the [0, 1] interval and then
applying interval-based nominal scaling (the number of intervals was chosen to
be 5). This explains the high error rate of all classifiers in the second task. Individual
scaling (e.g. scaling with floating-size intervals) for each task may significantly
reduce the error rate, but this work is not focused on that problem. From the results
above we may conclude that for the hypotheses-based algorithm modifications the
number of refusals is substantially reduced together with the total error rate et.
The modified GALOIS(1) classifier improved on GALOIS(1) on the first data set and
worsened it on the second. This may be due to the different nature of the binary
data descriptions: in the first case 22 binary attributes were obtained from 44
real-valued ones using a complex binarization procedure, while in the second case this
procedure was very simple. The choice of K(r, a) and ρ(x, C) affects only et but
not νr, hence their careful selection may improve classification quality.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this paper we have formally described and experimentally studied a new
approach to classification which combines the metric information
provided by the initial feature space with order-theoretic object-attribute
dependencies. We have also defined a pseudometric on an arbitrary finite lattice, which
has an intelligible interpretation in terms of concepts and hence can be used for
comparing concepts in order to improve FCA classification methods. Further
developments can focus on studying the classifiers obtained from the
proposed model by fixing the support concepts system C and the similarity measure
S(x, C), and on the possibility of choosing a support concepts system that
allows one to construct only a part of the concept lattice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          <article-title>Complexity of Learning in Concept Lattices from Positive and Negative Examples</article-title>
          .
          <source>Discrete Applied Mathematics</source>
          ,
          <year>2004</year>
          , No.
          <volume>142</volume>
          (
          <issue>1-3</issue>
          ), pp.
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahami</surname>
          </string-name>
          <article-title>Learning Classification Rules Using Lattices</article-title>
          .
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrac</surname>
          </string-name>
          and S. Wrobel eds., pp.
          <fpage>343</fpage>
          -
          <lpage>346</lpage>
          ,
          <source>Proc. ECML</source>
          , Heraklion, Crete, Greece (April
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          <article-title>GALOIS: An order-theoretic approach to conceptual clustering</article-title>
          .
          <source>Proceedings of ICML'93</source>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          , Amherst, USA (
          <year>July 1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaytoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duplessis</surname>
          </string-name>
          ,
          <article-title>Mining gene expression data with pattern structures in formal concept analysis</article-title>
          .
          <source>Information Sciences</source>
          , Volume
          <volume>181</volume>
          , Issue
          <issue>10</issue>
          , 15 May
          <year>2011</year>
          , pp.
          <fpage>1989</fpage>
          -
          <lpage>2001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Scalable Knowledge Discovery in Complex Data with Pattern Structures</article-title>
          . In: P.
          <string-name>
            <surname>Maji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.N.</given-names>
            <surname>Murty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          , S.K. Pal, Eds.,
          <source>Proc. 5th International Conference Pattern Recognition and Machine Intelligence</source>
          (PReMI'
          <year>2013</year>
          ), Lecture Notes in Computer Science (Springer), Vol.
          <volume>8251</volume>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>O.</given-names>
            <surname>Prokasheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Onishchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gurov</surname>
          </string-name>
          .
          <article-title>Classification methods based on Formal Concept Analysis</article-title>
          .
          <source>FCAIR 2013 Formal Concept Analysis Meets Information Retrieval. Workshop co-located with the 35th European Conference on Information Retrieval (ECIR</source>
          <year>2013</year>
          ).
          <source>March 24</source>
          ,
          <year>2013</year>
          , Moscow, Russia. National Research University Higher School of Economics, pp.
          <fpage>95</fpage>
          -
          <lpage>104</lpage>
          . ISSN 1613-0073.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Yu.I.</given-names>
            <surname>Zhuravlev</surname>
          </string-name>
          <article-title>An Algebraic Approach to Recognition and Classification Problems</article-title>
          , in Problems of Cybernetics, Issue
          <volume>33</volume>
          (Nauka, Moscow,
          <year>1978</year>
          ), pp.
          <fpage>5</fpage>
          -
          <lpage>68</lpage>
          [In Russian].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Dan A.</given-names>
            <surname>Simovici</surname>
          </string-name>
          <article-title>Betweenness, Metrics and Entropies in Lattices</article-title>
          .
          <source>Proceedings of the 38th International Symposium on Multiple Valued Logic</source>
          .
          22-24 May
          <year>2008</year>
          , Dallas, TX, USA. IEEE Computer Society Washington, pp.
          <fpage>26</fpage>
          -
          <lpage>31</lpage>
          . ISSN 0195-623X
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bache</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lichman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2013</year>
          ). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>