<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reduct Calculation and Discretization of Numeric Attributes in Sparse Decision Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wojciech Swieboda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hung Son Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics, The University of Warsaw</institution>
          ,
          <addr-line>Banacha 2, 02-097, Warsaw</addr-line>
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>207</fpage>
      <lpage>210</lpage>
      <abstract>
        <p>In this paper we discuss three problems in Data Mining Sparse Decision Systems: the problem of short reduct calculation, discretization of numerical attributes and rule induction. We present algorithms that provide approximate solutions to these problems and analyze the complexity of these algorithms. In the paper we discuss algorithms for Data Mining [3] Sparse Decision Tables. We first review basic notions of Information Systems, Decision Systems and Rough Set Theory [9]. We introduce a convenient representation for sparse decision tables and finally discuss algorithms for short reduct calculation, discretization and rule induction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Introduction
An information system is a pair I = (U, A) where U denotes the universe of objects and
A is the set of attributes. An attribute a ∈ A is a mapping a : U → Va. The co-domain
Va of attribute a is often also called the value set of attribute a.</p>
      <p>A decision system is a pair D = (U, A ∪ {dec}) which is an information system with
a distinguished attribute dec : U → {1, . . . , d} called a decision attribute. Attributes in
A are called conditions or conditional attributes and may be either nominal or numeric
(i.e. with Va ⊆ R).</p>
      <p>Throughout this paper n will denote the number of objects in a decision system and
k will denote the number of conditional attributes.</p>
      <p>F
F
F
F
F
T
T
symbolic attributes in two separate tables, and store decisions (which we assume are
never missing) of objects in a separate vector.</p>
      <p>Another related representation, more general then EAV model, is
Subject-PredicateObject (SPO), and is used e.g. in Resource Description Framework (RDF) Model and
implemented in several Triplestore databases.
4</p>
      <p>
        Problems for Sparse Decision Systems
In our paper we address the following problems for Sparse Decision Systems:
1. Finding a short reduct or a superreduct [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        A reduct is a subset of attributes R ⊆ A which guarantees discernibility of objects
belonging to different decision classes.
2. Discretization of numerical attributes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Discretization of a decision system is determining a set of cuts on numerical
attributes so that the induced partitions (i.e. intervals between cutpoints) guarantee
discernibility of objects belonging to different decision classes.
3. Generating set of rules or dynamic rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
a2
F
F
F
F
F
T
T
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bazan</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Synak</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wróblewski</surname>
          </string-name>
          , J.: Rough set algorithms in classification problem pp.
          <fpage>49</fpage>
          -
          <lpage>88</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Duda</surname>
            ,
            <given-names>R.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stork</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Pattern Classification</article-title>
          . Wiley, New York, 2. edn. (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hand</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Principles of Data Mining</article-title>
          . MIT Press (
          <year>2001</year>
          ), http: //mitpress.mit.edu/026208290X
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations</article-title>
          . New York: Springer-Verlag (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Komorowski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pawlak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polkowski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skowron</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rough sets: A tutorial (</article-title>
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.:</given-names>
          </string-name>
          <article-title>Discretization problem for rough sets methods</article-title>
          .
          <source>In: Polkowski and Skowron [10]</source>
          , pp.
          <fpage>545</fpage>
          -
          <lpage>552</lpage>
          , http://dx.doi.org/10.1007/3-540-69115-4\_
          <fpage>75</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.:</given-names>
          </string-name>
          <article-title>Approximate boolean reasoning: Foundations and applications in data mining (</article-title>
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pawlak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Rough sets</article-title>
          .
          <source>International Journal of Information and Computer Sciences</source>
          <volume>11</volume>
          (
          <issue>5</issue>
          ),
          <fpage>341</fpage>
          -
          <lpage>356</lpage>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pawlak</surname>
            ,
            <given-names>Z.: Rough</given-names>
          </string-name>
          <string-name>
            <surname>Sets</surname>
          </string-name>
          .
          <source>Theoretical Aspects of Reasoning about Data</source>
          . Springer, Formerly Kluwer Academic Publishers, Boston, Dordrecht, London (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Polkowski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skowron</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (eds.):
          <article-title>Rough Sets and Current Trends in Computing</article-title>
          , First International Conference, RSCTC'
          <fpage>98</fpage>
          , Warsaw, Poland, June 22-26,
          <year>1998</year>
          ,
          <source>Proceedings, Lecture Notes in Computer Science</source>
          , vol.
          <volume>1424</volume>
          . Springer (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stead</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>W.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straube</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
          <article-title>A chartless record - is it adequate?</article-title>
          <source>Journal of Medical Systems</source>
          <volume>7</volume>
          ,
          <fpage>103</fpage>
          -
          <lpage>109</lpage>
          (
          <year>1983</year>
          ), http://dx.doi.org/10.1007/BF00995117,
          <fpage>10</fpage>
          .1007/BF00995117
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>