<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rough Set Methods and Submodular Functions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hung Son Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wojciech Swieboda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics, The University of Warsaw</institution>
          ,
          <addr-line>Banacha 2, 02-097, Warsaw</addr-line>
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article we highlight connections between Rough Set Theory and Submodular Function Theory. Rough Set problems (such as finding reducts, inference of decision rules, or discretization of numeric attributes) are all based on approximating an indiscernibility relation (in the product space U × U, where U is a universe of objects). In this paper we focus on the problem of finding a single, possibly short (decision) reduct, which is one of the fundamental problems in Rough Set theory. A natural measure of "goodness" of the approximation induced by a subset of attributes in a decision system is the discern measure, which we introduce in the first subsection. This function is submodular, hence the approximation of the indiscernibility relation may be considered within the framework of Submodular Function optimization. We discuss maximization methods that only utilize three properties of the discern measure: submodularity, monotonicity, and ease of computation using lazy evaluations. We also highlight the potential of applying certain Rough Set methods to other Submodular Function optimization problems by describing an example computational problem, whose aim is to optimize (using lazy evaluations) a submodular function which takes as its argument a subset of attributes from an information (or decision) system containing numerous default values. We point out that optimizations of this kind can be performed for very large datasets through an SQL interface.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>An information system is a pair A = (U, A), where U denotes the
universe of objects and A is the set of attributes, i.e. mappings of the form
a : U → Va, where Va is called the value set of attribute a.</p>
      <p>A decision system is an information system D = (U, A ∪ {d}), where d is a
distinguished decision attribute. The remaining attributes are called conditions
or conditional attributes. An example decision system is shown in Table (a).</p>
      <p>For a subset of attributes B ⊆ A we define (on U × U) the B-indiscernibility
relation IND(B) as follows:</p>
      <p>(x, y) ∈ IND(B) ⇔ ∀a ∈ B: a(x) = a(y)</p>
      <p>IND(B) is an equivalence relation and hence defines a partitioning of U
into equivalence classes, which we denote by [x]_B (x ∈ U). The complement of
IND(B) in U × U is called the discernibility relation, denoted DISC(B). The lower
and upper approximations of a concept X ⊆ U (using attributes from B) are defined
by</p>
      <p>L_B(X) = {x ∈ U : [x]_B ⊆ X}   and   U_B(X) = {x ∈ U : [x]_B ∩ X ≠ ∅}.</p>
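      <p>As an illustrative sketch (not from the paper), these definitions translate directly into Python; the toy information system, attribute names and values below are assumptions made for the example:</p>
      <p>
```python
from collections import defaultdict

# A toy information system (names and values illustrative).
vals = {0: "r", 1: "r", 2: "g", 3: "b"}
U = {0, 1, 2, 3}
value = lambda x, a: vals[x]  # value of attribute a on object x

def partition(U, B):
    """Equivalence classes of IND(B): objects grouped by their B-signatures."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(value(x, a) for a in B)].add(x)
    return list(classes.values())

def lower_approx(B, X):
    # Union of equivalence classes entirely contained in the concept X.
    return set().union(*(c for c in partition(U, B) if c.issubset(X)))

def upper_approx(B, X):
    # Union of equivalence classes intersecting X.
    return set().union(*(c for c in partition(U, B) if c.intersection(X)))

# With X = {0, 2}: the class {0, 1} is not contained in X, so the lower
# approximation is {2}, while the upper approximation is {0, 1, 2}.
```
      </p>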
      <p>In general, reducts are minimal subsets of attributes containing the necessary
information about all attributes. Below we recall just two definitions.
– A reduct is a minimal set of attributes R ⊆ A such that IND(R) = IND(A).
– A decision-relative reduct is a minimal set of attributes R ⊆ A such that
IND(R) ⊆ IND({d}) ∪ IND(A). In other words, it is a minimal subset of
attributes which suffices to discern all pairs of objects belonging to different
decision classes.</p>
      <p>We proceed with two definitions that will come in handy in reduct calculation.</p>
      <p>A conflict is a pair of objects belonging to different decision classes. We define
conflicts : 2^U → R+ so that for X ⊆ U:</p>
      <p>conflicts(X) = (1/2) |{(x, y) ∈ X × X : dec(x) ≠ dec(y)}|</p>
      <p>We define c : 2^A → R+ as follows. For B ⊆ A:</p>
      <p>c(B) = Σ_{X ∈ U/IND(B)} conflicts(X),</p>
      <p>where the summation is taken over all equivalence classes of the partitioning
induced by IND(B). Function c is a natural extension of the
conflicts function to subsets of attributes. A minimal subset of attributes B ⊆ A with
c(B) = c(A) is a decision reduct.</p>
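      <p>A minimal Python sketch of conflicts and c, computing conflicts from the decision-value counts within each equivalence class; the toy decision system at the bottom is an assumption made for illustration:</p>
      <p>
```python
from collections import Counter, defaultdict

def conflicts(X, dec):
    # Unordered pairs in X with differing decisions:
    # (all ordered pairs minus same-decision ordered pairs) / 2.
    counts = Counter(dec[x] for x in X)
    n = sum(counts.values())
    return (n * n - sum(k * k for k in counts.values())) // 2

def c(U, B, value, dec):
    # Sum of conflicts over the equivalence classes of IND(B).
    classes = defaultdict(set)
    for x in U:
        classes[tuple(value(x, a) for a in B)].add(x)
    return sum(conflicts(X, dec) for X in classes.values())

# Toy decision system: two decision classes of two objects each.
dec = {0: "A", 1: "A", 2: "R", 3: "R"}
vals = {0: 0, 1: 0, 2: 1, 3: 1}
# With B empty all objects stay together, so c equals 2 * 2 = 4 conflicts;
# the single attribute separates the decision classes, so c drops to 0.
```
      </p>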
      <p>For a subset of attributes B ⊆ A we define</p>
      <p>discern(B) = c(∅) − c(B).</p>
      <p>Let I denote the indicator function. In the formula above:</p>
      <p>c(∅) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y)).</p>
      <p>
        Function discern is closely related to decision reducts: R ⊆ A is a decision
reduct if it is a minimal subset of attributes with discern(R) = discern(A). A well
known method for calculating short decision reducts is the Maximal Discernibility
heuristic (or Johnson's heuristic) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This method iteratively extends a set of
attributes in a greedy fashion, picking in each step the attribute with the largest
marginal discern.
      </p>
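      <p>As an illustration, the greedy (Maximal Discernibility / Johnson) heuristic can be sketched in a few lines of Python; the decision table, attribute names and helper function below are assumptions made for the example, not taken from the paper:</p>
      <p>
```python
from collections import Counter, defaultdict

# Toy decision system (illustrative): two conditional attributes, binary decision.
U = [0, 1, 2, 3]
vals = {"a1": {0: 0, 1: 0, 2: 1, 3: 1}, "a2": {0: 0, 1: 1, 2: 0, 3: 1}}
dec = {0: 0, 1: 0, 2: 1, 3: 1}
value = lambda x, a: vals[a][x]

def c(B):
    # Conflicting (differing-decision) unordered pairs not discerned by B.
    classes = defaultdict(list)
    for x in U:
        classes[tuple(value(x, a) for a in B)].append(x)
    total = 0
    for X in classes.values():
        counts = Counter(dec[x] for x in X)
        n = len(X)
        total += (n * n - sum(k * k for k in counts.values())) // 2
    return total

def greedy_reduct(A):
    # Johnson's heuristic: in each step pick the attribute with the largest
    # marginal discern, i.e. the largest decrease of c, until c(B) = c(A).
    B, target = [], c(A)
    while c(B) != target:
        B.append(min((a for a in A if a not in B), key=lambda a: c(B + [a])))
    return B

# Here a1 alone discerns every conflicting pair, so the heuristic returns ["a1"].
```
      </p>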
    </sec>
    <sec id="sec-2">
      <title>Submodular Functions</title>
      <p>
        Let Ω be a finite set. A set function f : 2^Ω → R is submodular [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] if it satisfies
any of the three following equivalent properties:
– For T ⊆ S ⊆ Ω and x ∈ Ω \ S: f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
– For T, S ⊆ Ω: f(T) + f(S) ≥ f(T ∪ S) + f(T ∩ S).
– For T ⊆ Ω and x, y ∈ Ω \ T, x ≠ y: f(T ∪ {x}) + f(T ∪ {y}) ≥ f(T ∪ {x, y}) + f(T).
      </p>
      <p>The first of these formulations can be naturally interpreted as the "diminishing
returns" property.</p>
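      <p>The second formulation can be spot-checked numerically; the sketch below tests it exhaustively for a small coverage function, a standard example of a monotone submodular function (all names and sets are assumptions made for illustration):</p>
      <p>
```python
from itertools import chain, combinations

# Coverage function: f(B) = size of the union of the sets indexed by B.
# Coverage functions are a standard example of monotone submodular functions.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def f(B):
    return len(set().union(*(sets[i] for i in B)))

def subsets(ground):
    return [set(s) for s in
            chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))]

# Second formulation: f(T) + f(S) >= f(T union S) + f(T intersect S).
ok = all(f(T) + f(S) >= f(T | S) + f(T.intersection(S))
         for T in subsets("abc") for S in subsets("abc"))
```
      </p>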
      <p>
        Submodular functions naturally arise within the context of various
combinatorial optimization problems and Machine Learning problems. Maximization
problems involving submodular functions are usually NP-hard (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
references therein), although several heuristics (with provable bounds) have been
proposed in the literature.
      </p>
      <p>A notable characteristic of several of these algorithms is that they may use
lazy evaluation, i.e. they may incrementally update the value of the function f upon inclusion
of an additional element. Another property often stressed for these algorithms is
whether they are suited for optimization of arbitrary submodular functions or
only of monotone submodular functions.</p>
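      <p>For a monotone submodular f, lazy evaluation is often organized around a priority queue of stale marginal gains (as in Minoux's accelerated greedy method); a minimal Python sketch, using an illustrative coverage function as f (all names are assumptions made for the example):</p>
      <p>
```python
import heapq

def lazy_greedy(f, ground, k):
    """Pick k elements greedily maximizing a monotone submodular f,
    re-evaluating marginal gains lazily."""
    selected, fB = [], f([])
    # Heap of (-stale_gain, element); by submodularity gains only shrink,
    # so a stale value is always an upper bound on the fresh gain.
    heap = [(-(f([e]) - fB), e) for e in ground]
    heapq.heapify(heap)
    while len(selected) != k and heap:
        neg_gain, e = heapq.heappop(heap)
        fresh = f(selected + [e]) - fB
        # If the fresh gain still dominates every other stale bound,
        # it is guaranteed maximal: accept without further evaluations.
        if not heap or fresh >= -heap[0][0]:
            selected.append(e)
            fB += fresh
        else:
            heapq.heappush(heap, (-fresh, e))
    return selected

# Example: coverage function (monotone submodular).
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
cover = lambda B: len(set().union(*(sets[e] for e in B)))
# lazy_greedy(cover, ["a", "b", "c"], 2) picks "c" first, then "a" or "b".
```
      </p>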
    </sec>
    <sec id="sec-3">
      <title>Discernibility and Submodularity</title>
      <p>Lemma 1. Let D = (U, A ∪ {d}) be a decision system. The set function
discern : 2^A → R is a monotone increasing submodular function.</p>
      <p>Proof. Recall that:</p>
      <p>discern(B) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ B: a(x) ≠ a(y)).</p>
      <p>Please notice that an unordered pair {x, y} either does not contribute to the sum,
or is counted in it exactly twice.</p>
      <p>– Notice that for T, S ⊆ A:</p>
      <p>discern(T) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ T: a(x) ≠ a(y))
≤ (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ T ∪ S: a(x) ≠ a(y))
= discern(T ∪ S)</p>
      <p>Hence discern is monotone.</p>
      <p>– We will show that discern(T) + discern(S) ≥ discern(T ∪ S) + discern(T ∩ S).
Let us first consider an unordered pair of objects {x, y} which is counted at
least once in discern(T ∪ S) + discern(T ∩ S). It follows that this pair is
discerned by an attribute a ∈ T ∪ S, hence it is counted at least once in
discern(T) + discern(S). If a pair {x, y} is counted twice in discern(T ∩ S) +
discern(T ∪ S), then it is counted by discern(T ∩ S), and hence it is counted
twice in discern(T) + discern(S). Therefore discern is submodular.</p>
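      <p>The lemma can also be sanity-checked numerically on a toy decision table by exhaustively testing monotonicity and the lattice inequality; the table below is an assumption made for illustration:</p>
      <p>
```python
from itertools import chain, combinations

# Toy decision table: columns a1, a2 and decision d (last entry of each row).
table = {0: (0, 0, 0), 1: (0, 1, 0), 2: (1, 0, 1), 3: (1, 1, 1)}
A = [0, 1]  # indices of the conditional attributes

def discern(B):
    # Ordered pairs with differing decisions discerned by B, halved.
    total = sum(1 for x in table for y in table
                if table[x][2] != table[y][2]
                and any(table[x][a] != table[y][a] for a in B))
    return total // 2

def subsets(s):
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

monotone = all(discern(S) >= discern(T)
               for T in subsets(A) for S in subsets(A) if T.issubset(S))
submodular = all(discern(T) + discern(S) >= discern(T | S) + discern(T.intersection(S))
                 for T in subsets(A) for S in subsets(A))
```
      </p>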
      <p>Please also notice that discern(B) is a function of the partitioning of objects
induced by IND(B). In implementations of algorithms we explicitly keep the
partitioning induced by IND(B) and further subdivide (shatter) it into the partitioning
induced by IND(B ∪ {a})
when needed (i.e. when an algorithm requests the value of discern(B ∪ {a})).
This property of certain submodular functions is in fact exploited by several
optimization algorithms that use lazy evaluation.</p>
      <p>In fact, in order to calculate discern(B), it suffices to know, for each
equivalence class of IND(B), the counts of the decision values within that class (see Figure 2 and the example
in the next section). In other words, it suffices to determine the appropriate
contingency table (pivot table or cross tabulation).</p>
    </sec>
    <sec id="sec-4">
      <title>Application of Rough Set Methods to Submodular Function Optimization</title>
      <p>In this section we provide an example method previously applied in Rough Set
Theory that can be applied to other submodular functions.</p>
      <p>We will focus on submodular functions f with the following two properties:
– f depends on an underlying (fixed) decision or information system, and the
argument of f is a subset of attributes of this decision (information) system.
– f(B) is determined by the contingency table of attributes from B ∪ {d}.</p>
      <p>
        Examples of such functions are the previously mentioned discern, the entropy of a
partition, and the Gini index [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>In various data mining applications one faces an optimization problem in
which the dataset at hand contains numerous default values. For example, in
Data Mining Cup 2009, more than 90% of attribute values had the same default
value (zero). Another potential area is text mining, where the Boolean model of
Information Retrieval (and the Vector Space Model) usually leads to a sparse
representation of a collection of documents.</p>
      <p>When mining huge data repositories, the data may be stored in a relational
database and only accessed through SQL queries. A convenient data
representation that can handle optimization of the functions mentioned earlier is in terms
of EAV (entity-attribute-value) triples, rather than tables, which often leads to
data compression. Table (b) shows the EAV representation of the system from Table (a)
in which attribute values MSc (a1), High (a2), Yes (a3) and Neutral (a4) are
regarded as default values (and hence omitted). We assume that values of the
decision attribute are stored in a separate table.</p>
      <p>Suppose that an optimization algorithm performs the calculation of f(B ∪ {a}).
When the data set is represented (compressed) in EAV format, determining the
partitioning of objects induced by IND(B ∪ {a}) can be greatly simplified if
the partitioning of objects induced by IND(B) is known beforehand. It suffices
to update the partition identifiers of objects with explicitly stored (non-default) values on attribute a.
Figure 1 illustrates this step.</p>
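      <p>A minimal sketch of this refinement step, assuming the EAV triples of attribute a are available as a dictionary keyed by object (all names are assumptions made for illustration):</p>
      <p>
```python
def refine(part_id, eav_a):
    """Refine the partitioning of IND(B) into that of IND(B and {a}).

    part_id: dict object -> current partition id (for IND(B)).
    eav_a:   dict object -> explicitly stored value of attribute a;
             objects absent here hold the default value, so they stay
             together and only stored objects may move to new blocks.
    """
    new_id, fresh = {}, {}
    for x, pid in part_id.items():
        # Key: old block plus the value of a (None marks the default).
        key = (pid, eav_a.get(x))
        fresh.setdefault(key, len(fresh))
        new_id[x] = fresh[key]
    return new_id

# Objects 0..3 currently form one block; attribute a stores the
# non-default value 1 for object 2 only, so object 2 is split off.
part = {0: 0, 1: 0, 2: 0, 3: 0}
```
      </p>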
      <p>The partitioning of the decision attribute induced by IND(B ∪ {a}) suffices to determine the value
of f(B ∪ {a}), since the value of f(B ∪ {a}) only depends on the contingency
table of attributes from B ∪ {a}.</p>
      <p>Let us refer to Figure 2 for an illustration (for the function discern). Upon
determining the contingency table (which counts decision values in each partition),
c({a1, a3}) is the total number of conflicting pairs within the partitions, i.e.:</p>
      <p>c({a1, a3}) = 2·1 + 0·1 + 1·1 + 0·1 + 1·0 = 3</p>
      <p>Similarly, there are 4 objects with decision A and 4 objects with decision R,
hence c(∅) = 4·4 = 16. Finally, discern({a1, a3}) = c(∅) − c({a1, a3}) = 16 − 3 = 13.</p>
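      <p>The arithmetic of this worked example can be replayed directly from the per-partition decision counts given in the text:</p>
      <p>
```python
# Decision counts (A, R) in each equivalence class of IND({a1, a3}),
# as read off the contingency table in Figure 2.
counts = [(2, 1), (0, 1), (1, 1), (0, 1), (1, 0)]

# c({a1, a3}): conflicting pairs inside each class.
c_B = sum(a * r for a, r in counts)   # 2*1 + 0*1 + 1*1 + 0*1 + 1*0 = 3

# c(empty set): 4 objects with decision A and 4 with decision R in total.
n_A = sum(a for a, r in counts)       # 4
n_R = sum(r for a, r in counts)       # 4
c_empty = n_A * n_R                   # 16

discern_B = c_empty - c_B             # 16 - 3 = 13
```
      </p>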
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] we have demonstrated a SAS implementation of the greedy heuristic
(for short decision reduct calculation) working on large and sparse data sets.
      </p>
      <p>(Table (a) columns: Diploma, Experience, French, Reference, Decision.)</p>
    </sec>
    <sec id="sec-5">
      <title>Application of Submodular Functions in Rough Set Theory</title>
      <p>
        In Table 2 we provide interpretations of several submodular function optimization
problems in terms of decision systems. The table follows the outline given in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
although we narrow the exposure to maximization problems and to algorithms
that solve constrained problems (discern, entropy and the Gini index are all
monotone). All problems mentioned in this table have approximate solvers available
in the open source package SFO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for Matlab. Furthermore, most algorithms
mentioned in the table use lazy evaluations and have provable approximation
bounds. We further provide interpretations or potential applications of these
problems in terms of decision systems.
      </p>
      <p>(Figure 1: partitionings of the universe before and after refinement by an additional attribute.)</p>
      <p>In this article we presented the connection between Rough Set theory methods
and methods from Submodular Function theory. The key (although very
simple) observation is that discernibility is a monotone submodular function. We
have briefly discussed an example method previously developed in the Rough Set
theory framework (i.e., lazy evaluation of discernibility when the data set
contains numerous default values) and discussed its application to other submodular
functions. We have also provided interpretations or potential applications of
several submodular function maximization problems in terms of Rough Set
Theory and decision systems.</p>
      <p>
        An example specific problem which to our knowledge has not been previously
addressed is the following: find a set of k disjoint decision reducts of a decision
system D = (U, A ∪ {d}). The eSPASS algorithm mentioned in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] gives an approximate
solution to this problem.
      </p>
      <p>This work is partially supported by the National Centre for Research and
Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific
research and experimental development program: "Interdisciplinary System for
Interactive Scientific and Scientific-Technical Information".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Fujishige, S.: Submodular Functions and Optimization. Elsevier, 2005.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Johnson, D. S.: Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278, 1974.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Krause, A., Guestrin, C., Gupta, A., Kleinberg, J.: Near-optimal sensor placements: Maximizing information while minimizing communication cost. In IPSN, 2006.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Krause, A., Singh, A., Guestrin, C.: Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. In JMLR, volume 9, 2008.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Krause, A., Rajagopal, R., Gupta, A., Guestrin, C.: Simultaneous placement and scheduling of sensors. In Information Processing in Sensor Networks, 2009.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Krause, A.: SFO: A Toolbox for Submodular Function Optimization. Journal of Machine Learning Research, 11:1141–1144, 2010.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In KDD, 2007.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Nemhauser, G., Wolsey, L., Fisher, M.: An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 14:265–294, 1978.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Nguyen, H. S.: Approximate Boolean Reasoning: Foundations and Applications in Data Mining. Transactions on Rough Sets, volume 5, pages 334–506, 2006.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Springer, formerly Kluwer Academic Publishers, Boston, Dordrecht, London, 1991.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Sayrafi, B., Van Gucht, D., Gyssens, M.: Measures in databases and datamining. Tech. Report TR602, Indiana University Computer Science, 2004.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough Sets: A Tutorial. In Rough Fuzzy Hybridization: A New Trend in Decision-Making, pages 3–98. Springer, Heidelberg, 1998.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Swieboda, W., Nguyen, H. S.: Mining large and sparse data sets with Rough Sets. In Proceedings of CS&amp;P, 2010.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>