<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Patterns via Clustering as a Data Mining Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lars Lumpe</string-name>
          <email>larslumpe@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan E. Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institut für Algebra, Technische Universität Dresden</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research shows that pattern structures are a useful tool for analyzing complex data. In this paper, we present a new general framework for using pattern structures as a data mining tool, and as an application of the new framework we show how to handle a classification problem for red wines.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Pattern structures within the framework of formal concept analysis have been
introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Since then they have been the subject of further
investigations like in [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref2 ref9">2, 9, 10, 11, 12</xref>
        ] and have turned out to be a useful tool for analyzing
various real-world applications (cf. [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4–8</xref>
        ]). In this paper, we want to present a
new application. But first we will introduce a general way to construct a pattern
structure. Then we are going to use several clustering algorithms to find
important patterns in it. Further, we will use these patterns to build a model to solve a
classification problem. In particular, we will take the dataset from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and train
an algorithm to predict the quality of red wines.
      </p>
      <sec id="sec-1-1">
        <title>We start with some general definitions:</title>
        <p>Definition 1 (restriction). Let P := (P, ≤P) be a poset, then for every set U
the poset</p>
        <p>P | U := (U, ≤P ∩(U × U ))
is called the restriction of P onto U .</p>
        <p>If we consider a poset of patterns, it often arises as a dual of a given poset.
Definition 2 (opposite or dual poset). Let P := (P, ≤P) be a poset. Then we
call</p>
        <p>Pop := (P, ≥P) with ≥P:= {(q, p) ∈ P × P | p ≤ q}
the opposite or dual of P.</p>
        <p>Definition 3 (Interval Poset). Let P := (P, ≤P) be a poset. Then we call</p>
        <p>IntP := {[p, q]P | p, q ∈ P } with [p, q]P := {t ∈ P | p ≤P t ≤P q}
the set of all intervals of P, and we refer to IntP := (IntP, ⊆) as the interval
poset of P.</p>
        <p>Remark: (i) Let P be a poset. Then IntP is a lower bounded poset, that is, ∅ is
the least element of IntP, provided P has at least 2 elements. Furthermore, if P
is a (complete) lattice then so is IntP.
(ii) If A is a set of attributes then (RA, ≤) := (R, ≤)A is a lattice. With (i) it
follows that Int(RA, ≤) is a lower bounded lattice.</p>
        <p>Definition 4 (kernel operator). A kernel operator on a poset P := (P, ≤) is
a map γ : P → P such that for all x, y ∈ P :
γx ≤ y ⇔ γx ≤ γy
(1)</p>
        <p>A subset ζ of P is called a kernel system in P if for every x ∈ P the
restriction of P onto {t ∈ ζ | t ≤ x} has a greatest element.</p>
        <p>Remark: A closure operator on P := (P, ≤) is defined as a kernel operator
on Pd, and a closure system in P is defined as a kernel system in Pd.</p>
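Condition (1) can be checked mechanically on a small poset. The following toy illustration (ours, not from the paper) takes the power set of {a, b, c} ordered by inclusion and the map γX = X ∩ K for a fixed set K, and verifies condition (1) together with its classical consequences:

```python
from itertools import combinations

# Toy check: on the poset (2^{a,b,c}, ⊆), the map gamma(X) = X ∩ K
# satisfies condition (1):  gamma(x) ≤ y  ⇔  gamma(x) ≤ gamma(y),
# and is therefore a kernel operator.

K = frozenset({"a", "b"})

def gamma(x):
    return x & K

universe = ["a", "b", "c"]
subsets = [frozenset(c) for r in range(4) for c in combinations(universe, r)]

for x in subsets:
    assert gamma(x) <= x                    # contractive
    assert gamma(gamma(x)) == gamma(x)      # idempotent
    for y in subsets:
        assert (gamma(x) <= y) == (gamma(x) <= gamma(y))  # condition (1)
        if x <= y:
            assert gamma(x) <= gamma(y)     # monotone
print("gamma is a kernel operator on (2^{a,b,c}, ⊆)")
```

Dualizing the order turns this into a closure operator, in line with the remark above.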
        <p>The main definitions of FCA are based on a binary relation I between a
set of so called objects G and a set of so called attributes M . However, in many
real-world knowledge discovery problems, researchers have to deal with data sets
that are much more complex than binary data tables. In our case, there was a set
of numerical attributes, such as the amount of acetic acid, density, the amount
of salt, etc., describing the quality of a red wine. To deal with this kind of data,
pattern structures are a useful tool.</p>
        <p>Definition 5 (pattern setup, pattern structure). A triple P = (G, D, δ) is a
pattern setup if G is a set, D = (D, ⊑) is a poset, and δ : G → D is a map. In
case every subset of δG := {δg | g ∈ G} has an infimum in D, we will refer to P
as a pattern structure.</p>
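As a minimal illustration (ours, not from the paper), consider a pattern setup where D is a power set ordered dually by ⊇; infima with respect to ⊇ are unions, so every subset of δG has an infimum and the setup is in fact a pattern structure:

```python
# Hypothetical pattern setup (G, D, delta) with D = (2^L, ⊇).
# With respect to ⊇, the infimum of a family of patterns is its union,
# so this setup is a pattern structure.

G = ["g1", "g2", "g3"]                # objects
delta = {                             # delta : G -> 2^L
    "g1": frozenset({"a"}),
    "g2": frozenset({"b"}),
    "g3": frozenset({"a", "c"}),
}

def infimum(patterns):
    """Infimum in (2^L, ⊇): the union of the given patterns."""
    out = frozenset()
    for p in patterns:
        out |= p
    return out

# The common (most general) pattern of all objects:
print(sorted(infimum(delta.values())))  # ['a', 'b', 'c']
```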
        <p>
          For pattern structures, an important complexity reduction is often provided
by so-called o-projections:
Proposition 1 (o-projection). For a pattern structure P := (G, E, ε) and a
kernel operator κ on E, the triple
opr(P, κ) := (G, D, δ)
with D := E|D where D := κE and
δ : G → D, g ↦ κ(εg)
is a pattern structure, called the o-projection of P via κ (see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]).
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Construction of a pattern structure</title>
      <p>The following definition establishes the connection between pattern setups and
pattern structures.</p>
      <p>Definition 6 (embedded pattern structure). Let E be a complete lattice and
P := (G, D, δ) a pattern setup with D := E|D. Then we call</p>
      <p>Pe := (G, E, D, δ) with δ : G → E, g ↦ δg</p>
      <p>the embedded pattern structure. We say the pattern setup P is embedded
in the pattern structure (G, E, δ).</p>
      <p>This definition shows that it is possible to build a pattern structure from a
given pattern setup. The construction below is another demonstration of how to
build a pattern structure from a pattern setup.</p>
      <p>Construction 1. Let G be a set and let L := (L, ≤) be a poset, then for every
map ρ : G → L the elementary pattern structure is given by:</p>
      <p>Pρ := (G, E, ε) with E := (2L, ⊇) and ε : G → 2L, g ↦ {ρg}.</p>
      <p>Hence, the pattern setup (G, L, ρ) is embedded in the pattern structure
(G, E, ε). In many cases this construction leads to a large set of patterns.
Therefore we need the following: Let ζ be a closure system in 2L := (2L, ⊆), that is,
ζ is a kernel system in E, and let γ : 2L → 2L be the associated closure operator
of ζ w.r.t. 2L. Thus, γ is a kernel operator on E. Then (G, D, δ) is a pattern
structure for D := E|ζ and</p>
      <p>δ : G → D, g ↦ γ{ρg}.</p>
      <sec id="sec-2-1">
        <title>Indeed, the map</title>
        <p>ψ : 2L → ζ, X ↦ γX</p>
        <p>is a residual map from E to D with δ = ψ ◦ ε. By the above proposition, (G, D, δ) is a
pattern structure, since</p>
        <p>opr(Pρ, γ) = (G, D, δ)</p>
        <p>is the o-projection of Pρ via γ.</p>
        <p>
          Connection to Data Mining and to our Dataset.
In this section we describe how we use Construction 1 to handle a typical
data mining classification problem: we train a model to predict classes of red
wines. But first we give an insight into our data.
To apply our previous results to a public data set, we chose the red wine data
set from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. There are 1599 examples of wines, described by 11 numerical
attributes. The input includes objective tests (e.g. fixed acidity, sulphates, pH
values, residual sugar, chlorides, density, alcohol...) and the output is based on
sensory data (the median of at least 3 evaluations made by wine experts). Each
expert graded the wine quality between 0 (very bad) and 10 (very excellent). For
our purpose, we established a binary distinction where every wine with a quality
score above 5 is classified as "good" and all others as "bad". This led to a set of 855
positive and 744 negative examples. We split the data into a training set (75%
of the examples) and a test set (25% of the examples).
        </p>
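The preprocessing described above can be sketched as follows. The file name winequality-red.csv, the semicolon separator, and the column name quality follow the UCI distribution of this data set and are assumptions on our part:

```python
import csv
import random

# Sketch of the preprocessing: load the wine data, binarize the
# quality score (good iff quality > 5), and split 75% / 25%.
# File layout follows the UCI "winequality-red.csv" distribution.
def load_and_split(path="winequality-red.csv", train_frac=0.75, seed=0):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter=";"))
    examples = []
    for row in rows:
        quality = int(row.pop("quality"))
        features = {k: float(v) for k, v in row.items()}
        label = "good" if quality > 5 else "bad"   # binary distinction
        examples.append((features, label))
    random.Random(seed).shuffle(examples)
    cut = int(train_frac * len(examples))
    return examples[:cut], examples[cut:]          # train, test
```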
        <p>Describing the Proceeding.
Many data mining relevant data sets (like the red wine dataset) can be described
via an evaluation matrix:</p>
        <p>Definition 7 (evaluation map, evaluation setup). Let G be a finite set, M a
set of attributes and Wm := (Wm, ≤m) a complete lattice for every attribute
m ∈ M. Further, let</p>
      </sec>
      <sec id="sec-2-2">
        <title>W := ∏m∈M Wm (the product lattice) and W := ∏m∈M Wm (its carrier set).</title>
        <p>Then, a map</p>
        <p>α : G → W, g ↦ (αm g)m∈M with αm : G → Wm, g ↦ wm,</p>
        <p>
          is called an evaluation map. We call E := (G, M, W, α) an evaluation setup.
Example 1. In the wine data set in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] we can interpret the wines as the set
G, the describing attributes as the set M and Wm as the numerical range of
attribute m with the natural order.
        </p>
        <p>In the above example the evaluation map</p>
        <p>α : G → W</p>
        <p>assigns to every wine bottle the values of all attributes m ∈ M. This map is a
good starting point for an elementary pattern structure with</p>
        <p>L := ∏m∈M Wm (the product lattice) and L := ∏m∈M Wm (its carrier set).</p>
        <p>Thus, E := (2L, ⊇) is the dually ordered power set of vectors with values of
the attributes, which describe the wine. On E we installed the following kernel
operator</p>
        <p>γ : 2L → 2L, X ↦ [infL X, supL X]L.</p>
        <p>As a matter of fact,</p>
        <p>D = (D, ⊇) with D = IntL</p>
        <p>is the dual interval lattice of L, and the map δ is given by</p>
        <p>δ : G → D, g ↦ γ{ρg}.</p>
        <p>This leads to the o-projection of the elementary pattern structure Pρ via γ, that
is,</p>
        <p>(G, D, δ) := opr(Pρ, γ).</p>
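Concretely, the kernel operator γ sends a set of attribute vectors to the smallest interval containing it, which coordinatewise is the box between the componentwise minima and maxima. A minimal sketch (ours; vectors are represented as tuples of floats):

```python
# Sketch of the interval closure gamma: a non-empty set X of
# attribute vectors is sent to the box [inf X, sup X], computed
# coordinatewise.
def gamma(X):
    """X: non-empty collection of equal-length tuples of floats."""
    lower = tuple(min(col) for col in zip(*X))
    upper = tuple(max(col) for col in zip(*X))
    return lower, upper               # the interval [inf X, sup X]

def contains(interval, v):
    """Membership of a vector v in the interval [lower, upper]."""
    lower, upper = interval
    return all(l <= x <= u for l, x, u in zip(lower, v, upper))

box = gamma([(0.1, 0.5), (0.3, 0.2), (0.2, 0.9)])
print(box)                            # ((0.1, 0.2), (0.3, 0.9))
print(contains(box, (0.15, 0.4)))     # True
```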
        <p>Often the dual power set lattice E is too large for applications. Therefore, we
concentrate on relevant patterns in D, that is, in the dual interval lattice of L.</p>
        <p>To identify important patterns in E for the red wine classification, we looked
at the positive examples of the training set and combined the results of different
clustering algorithms implemented in Python. In particular, we used a k-means
algorithm, a k-medoids algorithm (with the Mahalanobis, Euclidean and
correlation metrics), a Gaussian Mixture Model and a Bayesian Gaussian Mixture Model
to cluster the good red wines. Furthermore, we interpreted the leaves of decision
trees (with Gini impurity and entropy as splitting measures) as clusters of wines
to find important patterns in E for our case. The same clustering algorithm can
lead to different output clusters; this is a result of the different metrics which
were used to measure the distance and of the randomly chosen starting points of
the algorithms. Hence we ran every algorithm 5 times. For every attempt we
used different specifications. The number of clusters for the k-medoids, k-means,
Gaussian Mixture Model and Bayesian Gaussian Mixture Model algorithms is set randomly
between 2 and 50. For the decision trees we set the number of examples in a leaf
to at least 100. This leads to more than 700 clusters in E.</p>
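A sketch of this clustering ensemble using scikit-learn (our illustration, not the authors' code; the exact settings are assumptions, and k-medoids is omitted since it is not part of scikit-learn itself):

```python
import random
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture
from sklearn.tree import DecisionTreeClassifier

def collect_clusters(X_good, X, y, attempts=5, seed=0):
    """Cluster the good wines several times with varying settings.

    X_good: array of (scaled) feature vectors of the good training
    wines; X, y: the full training data (used for the decision trees).
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(attempts):
        k = rng.randint(2, 50)  # cluster count drawn at random in [2, 50]
        for labels in (
            KMeans(n_clusters=k, n_init=10).fit_predict(X_good),
            GaussianMixture(n_components=k).fit_predict(X_good),
            BayesianGaussianMixture(n_components=k).fit_predict(X_good),
        ):
            for c in np.unique(labels):
                clusters.append(X_good[labels == c])
        # leaves of decision trees (at least 100 examples per leaf)
        # interpreted as clusters of wines
        for criterion in ("gini", "entropy"):
            tree = DecisionTreeClassifier(
                criterion=criterion, min_samples_leaf=100).fit(X, y)
            leaves = tree.apply(X_good)
            for leaf in np.unique(leaves):
                clusters.append(X_good[leaves == leaf])
    return clusters
```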
        <p>Via the kernel operator on E:</p>
        <p>
          γ : 2L → 2L, X ↦ [infL X, supL X]L
we get patterns in D of the clusters in E. Since γ is a closure operator on the
power set 2L := (2L, ⊆), we can think of the patterns as closures of clusters.
In the next step we eliminated all clusters with fewer than 100 wines. Then we
looked at the ratio of good examples (wines with a quality score above 5) to
all examples (good and bad) in the patterns and took the five patterns with the
best ratio. These patterns are listed below. The range of the attributes is printed
in red. For better interpretability we scaled every attribute to the range [0, 1].
        </p>
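The selection step just described can be sketched as follows (our illustration; the function and parameter names are hypothetical):

```python
# Sketch of the pattern selection: close each cluster to an interval,
# drop intervals covering fewer than 100 training wines, rank the
# rest by their ratio of good wines, and keep the best five.
def top_patterns(clusters, train, n_keep=5, min_support=100):
    def gamma(X):  # interval closure: coordinatewise min/max box
        return (tuple(min(c) for c in zip(*X)),
                tuple(max(c) for c in zip(*X)))

    def contains(box, v):
        lo, hi = box
        return all(l <= x <= h for l, x, h in zip(lo, v, hi))

    scored = []
    for cluster in clusters:
        box = gamma(cluster)
        covered = [label for vec, label in train if contains(box, vec)]
        if len(covered) < min_support:
            continue                       # pattern covers too few wines
        ratio = covered.count("good") / len(covered)
        scored.append((ratio, box))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [box for ratio, box in scored[:n_keep]]
```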
        <p>Fig. 1. Interval 1: decision tree (entropy), 108 wines, 108 good and 0 bad</p>
        <p>Combining these 5 intervals to predict the classes of the test set leads to the
following confusion matrix.
The following table presents a comparison of our method to other algorithms.</p>
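The prediction rule behind the confusion matrix can be sketched like this (ours; names hypothetical): a test wine is predicted "good" if and only if it lies in at least one of the selected interval patterns.

```python
# Sketch: predict "good" iff a wine lies in at least one selected
# interval pattern; tally the confusion matrix on the test set.
def contains(box, v):
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, v, hi))

def evaluate(patterns, test):
    counts = {("good", "good"): 0, ("good", "bad"): 0,
              ("bad", "good"): 0, ("bad", "bad"): 0}
    for vec, label in test:
        pred = "good" if any(contains(b, vec) for b in patterns) else "bad"
        counts[(label, pred)] += 1
    return counts      # keys: (actual, predicted)
```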
        <p>Our method is easy to interpret and leads to the second best precision of
all listed algorithms, but the recall value is the worst among all methods. More
patterns would probably lead to a better recall, but likely worsen the precision.
Further investigations are needed to find the best collections of patterns for
different use cases (e.g. maximizing accuracy). The procedure presented here is just
one example of building a model from our framework. Hopefully, further
investigations will show that it is possible to create stronger models with our framework.
We introduced a new general framework for the application of pattern structures.
Then we gave an example of how this general framework can be used to predict the
quality of red wines. In the presented way, pattern structures can be a useful
tool in analysing data. As shown here, they are capable of giving good predictions,
and their good interpretability makes them even more powerful.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.S.</given-names>
            <surname>Blyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.F.</given-names>
            <surname>Janowitz</surname>
          </string-name>
          (
          <year>1972</year>
          ), Residuation Theory, Pergamon Press, pp.
          <fpage>1</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Buzmakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          (
          <year>2015</year>
          ) ,
          <article-title>Revisiting Pattern Structure Projections</article-title>
          .
          <source>Formal Concept Analysis. Lecture Notes in Artificial Intelligence (Springer)</source>
          , Vol.
          <volume>9113</volume>
          , pp
          <fpage>200</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cortez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cerdeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reis</surname>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>Modeling wine preferences by data mining from physicochemical properties</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>47</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>547</fpage>
          -
          <lpage>553</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          (
          <year>2001</year>
          ),
          <article-title>Pattern Structures and Their Projections</article-title>
          .
          <source>Proc. 9th Int. Conf. on Conceptual Structures</source>
          , ICCS'01,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stumme and H. Delugach</surname>
          </string-name>
          (Eds.).
          <source>Lecture Notes in Artificial Intelligence (Springer)</source>
          , Vol.
          <volume>2120</volume>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          (
          <year>2011</year>
          ),
          <article-title>Some remarks on the relation between annotated ordered sets and pattern structures</article-title>
          .
          <source>Pattern Recognition and Machine Intelligence. Lecture Notes in Computer Science</source>
          (Springer), Vol.
          <volume>6744</volume>
          , pp
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaytoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duplessis</surname>
          </string-name>
          (
          <year>2011</year>
          ),
          <article-title>Mining gene expression data with pattern structures in formal concept analysis</article-title>
          .
          <source>Information Sciences (Elsevier)</source>
          , Vol.
          <volume>181</volume>
          , pp.
          <fpage>1989</fpage>
          -
          <lpage>2001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>Pattern structures for analyzing complex data</article-title>
          . In H. Sakai et al. (Eds.).
          <source>Proceedings of the 12th international conference on rough sets, fuzzy sets, data mining and granular computing (RSFDGrC'09). Lecture Notes in Artificial Intelligence (Springer)</source>
          , Vol.
          <volume>5908</volume>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          (
          <year>2013</year>
          ),
          <article-title>Scalable Knowledge Discovery in Complex Data with Pattern Structures</article-title>
          . In:
          <string-name>
            <given-names>P.</given-names>
            <surname>Maji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.N.</given-names>
            <surname>Murty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.K.</given-names>
            <surname>Pal</surname>
          </string-name>
          , (Eds.).
          <source>Proc. 5th International Conference Pattern Recognition and Machine Intelligence (PReMI'2013). Lecture Notes in Computer Science</source>
          (Springer), Vol.
          <volume>8251</volume>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Lumpe</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          (
          <year>2015</year>
          ),
          <source>A Note on Pattern Structures and Their Projections. Formal Concept Analysis. Lecture Notes in Artificial Intelligence (Springer)</source>
          , Vol.
          <volume>9113</volume>
          , pp
          <fpage>145</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>L.</given-names>
            <surname>Lumpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          (
          <year>2016</year>
          ),
          <source>Morphisms Between Pattern Structures and Their Impact on Concept Lattices, FCA4AI@ ECAI</source>
          <year>2016</year>
          , pp
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Lumpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          (
          <year>2015</year>
          ),
          <article-title>Pattern Structures and Their Morphisms</article-title>
          .
          <source>CLA</source>
          <year>2015</year>
          , pp
          <fpage>171</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>L.</given-names>
            <surname>Lumpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          (
          <year>2016</year>
          ),
          <article-title>Viewing Morphisms Between Pattern Structures via Their Concept Lattices</article-title>
          and via Their Representations,
          <source>International Symposium on Methodologies for Intelligent Systems</source>
          <year>2017</year>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Jun</surname>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>A simple and fast algorithm for K-medoids clustering</article-title>
          .
          <source>Expert systems with applications 36</source>
          (
          <issue>2</issue>
          ), pp.
          <fpage>3336</fpage>
          -
          <lpage>3341</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>