<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rough Set Methods and Submodular Functions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hung Son Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wojciech Swieboda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics, The University of Warsaw</institution>
          ,
          <addr-line>Banacha 2, 02-097, Warsaw</addr-line>
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article we highlight connections between Rough Set Theory and Submodular Function Theory. Rough Set problems (such as finding reducts, inference of decision rules, or discretization of numeric attributes) are all based on approximating an indiscernibility relation (in the product space U × U, where U is a universe of objects). In this paper we focus on the problem of finding a single, possibly short (decision) reduct, which is one of the fundamental problems in Rough Set theory. A natural measure of "goodness" of the approximation induced by a subset of attributes in a decision system is the discern measure, which we introduce in the first subsection. This function is submodular, hence the approximation of the indiscernibility relation may be considered within the framework of Submodular Function optimization. We discuss maximization methods that only utilize three properties of the discern measure: submodularity, monotonicity, and ease of computation using lazy evaluations. We also highlight the potential of applying certain Rough Set methods to other Submodular Function optimization problems by describing an example computational problem, whose aim is to optimize (using lazy evaluations) a submodular function which takes as its argument a subset of attributes from an information (or decision) system containing numerous default values. We point out that optimizations of this kind can be performed for very large datasets through an SQL interface.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>An information system is a pair A = (U, A), where U denotes the
universe of objects and A is the set of attributes, i.e. mappings of the form
a : U → Va, where Va is called the value set of attribute a.</p>
      <p>A decision system is an information system D = (U, A ∪ {d}), where d is a
distinguished decision attribute. The remaining attributes are called conditions
or conditional attributes. An example decision system is shown in Table (a).</p>
      <p>For a subset of attributes B ⊆ A we define (on U × U) the B-indiscernibility
relation IND(B) as follows:</p>
      <p>(x, y) ∈ IND(B) ⇔ ∀a ∈ B: a(x) = a(y)</p>
      <p>IND(B) is an equivalence relation and hence defines a partitioning of U
into equivalence classes, which we denote by [x]_B (x ∈ U). The complement of
IND(B) in U × U is called the discernibility relation, denoted DISC(B). The lower
and upper approximations of a concept X ⊆ U (using attributes from B) are defined
by</p>
      <p>L_B(X) = {x ∈ U : [x]_B ⊆ X}   and   U_B(X) = {x ∈ U : [x]_B ∩ X ≠ ∅}.</p>
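      <p>As an illustrative sketch (not from the paper), these definitions translate directly into Python; the toy information system, attribute names and values below are assumptions made for the example:</p>
      <p>
```python
from collections import defaultdict

# A toy information system (names and values illustrative).
vals = {0: "r", 1: "r", 2: "g", 3: "b"}
U = {0, 1, 2, 3}
value = lambda x, a: vals[x]  # value of attribute a on object x

def partition(U, B):
    """Equivalence classes of IND(B): objects grouped by their B-signatures."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(value(x, a) for a in B)].add(x)
    return list(classes.values())

def lower_approx(B, X):
    # Union of equivalence classes entirely contained in the concept X.
    return set().union(*(c for c in partition(U, B) if c.issubset(X)))

def upper_approx(B, X):
    # Union of equivalence classes intersecting X.
    return set().union(*(c for c in partition(U, B) if c.intersection(X)))

# With X = {0, 2}: the class {0, 1} is not contained in X, so the lower
# approximation is {2}, while the upper approximation is {0, 1, 2}.
```
      </p>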
      <p>In general, reducts are minimal subsets of attributes containing the necessary
information about all attributes. Below we recall just two definitions.
– A reduct is a minimal set of attributes R ⊆ A such that IND(R) = IND(A).
– A decision-relative reduct is a minimal set of attributes R ⊆ A such that
IND(R) ⊆ IND({d}) ∪ IND(A). In other words, it is a minimal subset of
attributes which suffices to discern all pairs of objects belonging to different
decision classes.</p>
      <p>We proceed with two definitions that will come in handy in reduct calculation.</p>
      <p>A conflict is a pair of objects belonging to different decision classes. We define
conflicts : 2^U → R+ so that for X ⊆ U:</p>
      <p>conflicts(X) = (1/2) |{(x, y) ∈ X × X : dec(x) ≠ dec(y)}|</p>
      <p>We define c : 2^A → R+ as follows. For B ⊆ A:</p>
      <p>c(B) = Σ_{X ∈ U/IND(B)} conflicts(X),</p>
      <p>where the summation is taken over all equivalence classes of the partitioning
induced by IND(B). Function c is a natural extension of the
conflicts function to subsets of attributes. A minimal subset of attributes B ⊆ A with
c(B) = c(A) is a decision reduct.</p>
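      <p>A minimal Python sketch of conflicts and c, computing conflicts from the decision-value counts within each equivalence class; the toy decision system at the bottom is an assumption made for illustration:</p>
      <p>
```python
from collections import Counter, defaultdict

def conflicts(X, dec):
    # Unordered pairs in X with differing decisions:
    # (all ordered pairs minus same-decision ordered pairs) / 2.
    counts = Counter(dec[x] for x in X)
    n = sum(counts.values())
    return (n * n - sum(k * k for k in counts.values())) // 2

def c(U, B, value, dec):
    # Sum of conflicts over the equivalence classes of IND(B).
    classes = defaultdict(set)
    for x in U:
        classes[tuple(value(x, a) for a in B)].add(x)
    return sum(conflicts(X, dec) for X in classes.values())

# Toy decision system: two decision classes of two objects each.
dec = {0: "A", 1: "A", 2: "R", 3: "R"}
vals = {0: 0, 1: 0, 2: 1, 3: 1}
# With B empty all objects stay together, so c equals 2 * 2 = 4 conflicts;
# the single attribute separates the decision classes, so c drops to 0.
```
      </p>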
      <p>For a subset of attributes B ⊆ A we define</p>
      <p>discern(B) = c(∅) − c(B).</p>
      <p>Let I denote the indicator function. In the formula above:</p>
      <p>c(∅) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y)).</p>
      <p>
        Function discern is closely related to decision reducts: R ⊆ A is a decision
reduct if it is a minimal subset of attributes with discern(R) = discern(A). A well
known method for calculating short decision reducts is the Maximal Discernibility
heuristic (or Johnson's heuristic) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This method iteratively extends a set of
attributes in a greedy fashion, picking in each step the attribute with the largest
marginal discern.
      </p>
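      <p>As an illustration, the greedy (Maximal Discernibility / Johnson) heuristic can be sketched in a few lines of Python; the decision table, attribute names and helper function below are assumptions made for the example, not taken from the paper:</p>
      <p>
```python
from collections import Counter, defaultdict

# Toy decision system (illustrative): two conditional attributes, binary decision.
U = [0, 1, 2, 3]
vals = {"a1": {0: 0, 1: 0, 2: 1, 3: 1}, "a2": {0: 0, 1: 1, 2: 0, 3: 1}}
dec = {0: 0, 1: 0, 2: 1, 3: 1}
value = lambda x, a: vals[a][x]

def c(B):
    # Conflicting (differing-decision) unordered pairs not discerned by B.
    classes = defaultdict(list)
    for x in U:
        classes[tuple(value(x, a) for a in B)].append(x)
    total = 0
    for X in classes.values():
        counts = Counter(dec[x] for x in X)
        n = len(X)
        total += (n * n - sum(k * k for k in counts.values())) // 2
    return total

def greedy_reduct(A):
    # Johnson's heuristic: in each step pick the attribute with the largest
    # marginal discern, i.e. the largest decrease of c, until c(B) = c(A).
    B, target = [], c(A)
    while c(B) != target:
        B.append(min((a for a in A if a not in B), key=lambda a: c(B + [a])))
    return B

# Here a1 alone discerns every conflicting pair, so the heuristic returns ["a1"].
```
      </p>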
    </sec>
    <sec id="sec-2">
      <title>Submodular Functions</title>
      <p>
        Let Ω be a finite set. A set function f : 2^Ω → R is submodular [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] if it satisfies
any of the three following equivalent properties:
– For T ⊆ S ⊆ Ω and x ∈ Ω \ S: f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
– For T, S ⊆ Ω: f(T) + f(S) ≥ f(T ∪ S) + f(T ∩ S).
– For T ⊆ Ω and x, y ∈ Ω \ T, x ≠ y: f(T ∪ {x}) + f(T ∪ {y}) ≥ f(T ∪ {x, y}) + f(T).
      </p>
      <p>The first of these formulations can be naturally interpreted as the "diminishing
returns" property.</p>
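      <p>The second formulation can be spot-checked numerically; the sketch below tests it exhaustively for a small coverage function, a standard example of a monotone submodular function (all names and sets are assumptions made for illustration):</p>
      <p>
```python
from itertools import chain, combinations

# Coverage function: f(B) = size of the union of the sets indexed by B.
# Coverage functions are a standard example of monotone submodular functions.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def f(B):
    return len(set().union(*(sets[i] for i in B)))

def subsets(ground):
    return [set(s) for s in
            chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))]

# Second formulation: f(T) + f(S) >= f(T union S) + f(T intersect S).
ok = all(f(T) + f(S) >= f(T | S) + f(T.intersection(S))
         for T in subsets("abc") for S in subsets("abc"))
```
      </p>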
      <p>
        Submodular functions naturally arise within the context of various
combinatorial optimization problems and Machine Learning problems. Maximization
problems involving submodular functions are usually NP-hard (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
references therein), although several heuristics (with provable bounds) have been
proposed in the literature.
      </p>
      <p>A notable characteristic of several of these algorithms is that they may use
lazy evaluation, i.e. they may incrementally update the value of the function f upon inclusion
of an additional element. Another property often stressed for these algorithms is
whether they are suited for optimization of arbitrary submodular functions or
only of monotone submodular functions.</p>
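      <p>For a monotone submodular f, lazy evaluation is often organized around a priority queue of stale marginal gains (as in Minoux's accelerated greedy method); a minimal Python sketch, using an illustrative coverage function as f (all names are assumptions made for the example):</p>
      <p>
```python
import heapq

def lazy_greedy(f, ground, k):
    """Pick k elements greedily maximizing a monotone submodular f,
    re-evaluating marginal gains lazily."""
    selected, fB = [], f([])
    # Heap of (-stale_gain, element); by submodularity gains only shrink,
    # so a stale value is always an upper bound on the fresh gain.
    heap = [(-(f([e]) - fB), e) for e in ground]
    heapq.heapify(heap)
    while len(selected) != k and heap:
        neg_gain, e = heapq.heappop(heap)
        fresh = f(selected + [e]) - fB
        # If the fresh gain still dominates every other stale bound,
        # it is guaranteed maximal: accept without further evaluations.
        if not heap or fresh >= -heap[0][0]:
            selected.append(e)
            fB += fresh
        else:
            heapq.heappush(heap, (-fresh, e))
    return selected

# Example: coverage function (monotone submodular).
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
cover = lambda B: len(set().union(*(sets[e] for e in B)))
# lazy_greedy(cover, ["a", "b", "c"], 2) picks "c" first, then "a" or "b".
```
      </p>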
    </sec>
    <sec id="sec-3">
      <title>Discernibility and Submodularity</title>
      <p>Lemma 1. Let D = (U, A ∪ {d}) be a decision system. The set function
discern : 2^A → R is a monotone increasing submodular function.</p>
      <p>Proof. Recall that:</p>
      <p>discern(B) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ B: a(x) ≠ a(y)).</p>
      <p>Please notice that an unordered pair {x, y} either does not contribute to the sum,
or is counted in it exactly twice.</p>
      <p>– Notice that for T, S ⊆ A:</p>
      <p>discern(T) = (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ T: a(x) ≠ a(y))
≤ (1/2) Σ_{x,y ∈ U} I(d(x) ≠ d(y) ∧ ∃a ∈ T ∪ S: a(x) ≠ a(y))
= discern(T ∪ S)</p>
      <p>Hence discern is monotone.</p>
      <p>– We will show that discern(T) + discern(S) ≥ discern(T ∪ S) + discern(T ∩ S).
Let us first consider an unordered pair of objects {x, y} which is counted at
least once in discern(T ∪ S) + discern(T ∩ S). It follows that this pair is
discerned by an attribute a ∈ T ∪ S, hence it is counted at least once in
discern(T) + discern(S). If a pair {x, y} is counted twice in discern(T ∩ S) +
discern(T ∪ S), then it is counted by discern(T ∩ S), and hence it is counted
twice in discern(T) + discern(S). Therefore discern is submodular.</p>
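      <p>The lemma can also be sanity-checked numerically on a toy decision table by exhaustively testing monotonicity and the lattice inequality; the table below is an assumption made for illustration:</p>
      <p>
```python
from itertools import chain, combinations

# Toy decision table: columns a1, a2 and decision d (last entry of each row).
table = {0: (0, 0, 0), 1: (0, 1, 0), 2: (1, 0, 1), 3: (1, 1, 1)}
A = [0, 1]  # indices of the conditional attributes

def discern(B):
    # Ordered pairs with differing decisions discerned by B, halved.
    total = sum(1 for x in table for y in table
                if table[x][2] != table[y][2]
                and any(table[x][a] != table[y][a] for a in B))
    return total // 2

def subsets(s):
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

monotone = all(discern(S) >= discern(T)
               for T in subsets(A) for S in subsets(A) if T.issubset(S))
submodular = all(discern(T) + discern(S) >= discern(T | S) + discern(T.intersection(S))
                 for T in subsets(A) for S in subsets(A))
```
      </p>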
      <p>Please also notice that discern(B) is a function of the partitioning of objects
induced by IND(B). In implementations of algorithms we explicitly keep the
partitioning induced by IND(B) and further subdivide (shatter) it into the partitioning
induced by IND(B ∪ {a})
when needed (i.e. when an algorithm requests the value of discern(B ∪ {a})).
This property of certain submodular functions is in fact exploited by several
optimization algorithms that use lazy evaluation.</p>
      <p>In fact, in order to calculate discern(B), it suffices to know, for each
equivalence class of IND(B), the counts of the decision values within that class (see Figure 2 and the example
in the next section). In other words, it suffices to determine the appropriate
contingency table (pivot table or cross tabulation).</p>
    </sec>
    <sec id="sec-4">
      <title>Application of Rough Set Methods to Submodular Function Optimization</title>
      <p>In this section we provide an example method previously applied in Rough Set
Theory that can be applied to other submodular functions.</p>
      <p>We will focus on submodular functions f with the following two properties:
– f depends on an underlying (fixed) decision or information system, and the
argument of f is a subset of attributes of this decision (information) system.
– f(B) is determined by the contingency table of attributes from B ∪ {d}.</p>
      <p>
        Examples of such functions are the previously mentioned discern, the entropy of a
partition, and the Gini index [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>In various data mining applications one faces an optimization problem in
which the dataset at hand contains numerous default values. For example, in
Data Mining Cup 2009, more than 90% of attribute values had the same default
value (zero). Another potential area is text mining, where the Boolean model of
Information Retrieval (and the Vector Space Model) usually leads to a sparse
representation of a collection of documents.</p>
      <p>When mining huge data repositories, the data may be stored in a relational
database and only accessed through SQL queries. A convenient data
representation that can handle optimization of the functions mentioned earlier is in terms
of EAV (entity-attribute-value) triples, rather than tables, which often leads to
data compression. Table (b) shows the EAV representation of the system from Table (a)
in which attribute values MSc (a1), High (a2), Yes (a3) and Neutral (a4) are
regarded as default values (and hence omitted). We assume that values of the
decision attribute are stored in a separate table.</p>
      <p>Suppose that an optimization algorithm performs the calculation of f(B ∪ {a}).
When the data set is represented (compressed) in EAV format, determining the
partitioning of objects induced by IND(B ∪ {a}) can be greatly simplified if
the partitioning of objects induced by IND(B) is known beforehand. It suffices
to update the partition identifiers of objects with explicitly stored (non-default) values on attribute a.
Figure 1 illustrates this step.</p>
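      <p>A minimal sketch of this refinement step, assuming the EAV triples of attribute a are available as a dictionary keyed by object (all names are assumptions made for illustration):</p>
      <p>
```python
def refine(part_id, eav_a):
    """Refine the partitioning of IND(B) into that of IND(B and {a}).

    part_id: dict object -> current partition id (for IND(B)).
    eav_a:   dict object -> explicitly stored value of attribute a;
             objects absent here hold the default value, so they stay
             together and only stored objects may move to new blocks.
    """
    new_id, fresh = {}, {}
    for x, pid in part_id.items():
        # Key: old block plus the value of a (None marks the default).
        key = (pid, eav_a.get(x))
        fresh.setdefault(key, len(fresh))
        new_id[x] = fresh[key]
    return new_id

# Objects 0..3 currently form one block; attribute a stores the
# non-default value 1 for object 2 only, so object 2 is split off.
part = {0: 0, 1: 0, 2: 0, 3: 0}
```
      </p>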
      <p>The partitioning of the decision attribute induced by IND(B ∪ {a}) suffices to determine the value
of f(B ∪ {a}), since the value of f(B ∪ {a}) only depends on the contingency
table of attributes from B ∪ {a}.</p>
      <p>Let us refer to Figure 2 for an illustration (for the function discern). Upon
determining the contingency table (which counts decision values in each partition),
c({a1, a3}) is the total number of conflicting pairs within the partitions, i.e.:</p>
      <p>c({a1, a3}) = 2·1 + 0·1 + 1·1 + 0·1 + 1·0 = 3</p>
      <p>Similarly, there are 4 objects with decision A and 4 objects with decision R,
hence c(∅) = 4·4 = 16. Finally, discern({a1, a3}) = c(∅) − c({a1, a3}) = 16 − 3 = 13.</p>
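      <p>The arithmetic of this worked example can be replayed directly from the per-partition decision counts given in the text:</p>
      <p>
```python
# Decision counts (A, R) in each equivalence class of IND({a1, a3}),
# as read off the contingency table in Figure 2.
counts = [(2, 1), (0, 1), (1, 1), (0, 1), (1, 0)]

# c({a1, a3}): conflicting pairs inside each class.
c_B = sum(a * r for a, r in counts)   # 2*1 + 0*1 + 1*1 + 0*1 + 1*0 = 3

# c(empty set): 4 objects with decision A and 4 with decision R in total.
n_A = sum(a for a, r in counts)       # 4
n_R = sum(r for a, r in counts)       # 4
c_empty = n_A * n_R                   # 16

discern_B = c_empty - c_B             # 16 - 3 = 13
```
      </p>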
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] we have demonstrated a SAS implementation of the greedy heuristic
(for short decision reduct calculation) working on large and sparse data sets.
      </p>
      <p>(Table (a) columns: Diploma, Experience, French, Reference, Decision.)</p>
    </sec>
    <sec id="sec-5">
      <title>Application of Submodular Functions in Rough Set Theory</title>
      <p>
        In Table 2 we provide interpretations of several submodular function optimization
problems in terms of decision systems. The table follows the outline given in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
although we narrow the exposure to maximization problems and to algorithms
that solve constrained problems (discern, entropy and the Gini index are all
monotone). All problems mentioned in this table have approximate solvers available
in the open source package SFO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for Matlab. Furthermore, most algorithms
mentioned in the table use lazy evaluations and have provable approximation
bounds. We further provide interpretations or potential applications of these
problems in terms of decision systems.
      </p>
      <p>(Figure 1: partitionings of the universe before and after refinement by an additional attribute.)</p>
      <p>In this article we presented the connection between Rough Set theory methods
and methods from Submodular Function theory. The key (although very
simple) observation is that discernibility is a monotone submodular function. We
have briefly discussed an example method previously developed in the Rough Set
theory framework (i.e., lazy evaluation of discernibility when the data set
contains numerous default values) and discussed its application to other submodular
functions. We have also provided interpretations or potential applications of
several submodular function maximization problems in terms of Rough Set
Theory and decision systems.</p>
      <p>
        An example specific problem which to our knowledge has not been previously
addressed is the following: find a set of k disjoint decision reducts of a decision
system D = (U, A ∪ {d}). The eSPASS algorithm mentioned in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] gives an approximate
solution to this problem.
      </p>
      <p>This work is partially supported by the National Centre for Research and
Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific
research and experimental development program: "Interdisciplinary System for
Interactive Scientific and Scientific-Technical Information".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Fujishige, S.: Submodular Functions and Optimization. Elsevier, 2005.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Johnson, D. S.: Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278, 1974.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Krause, A., Guestrin, C., Gupta, A., Kleinberg, J.: Near-optimal sensor placements: Maximizing information while minimizing communication cost. In IPSN, 2006.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Krause, A., Singh, A., Guestrin, C.: Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. In JMLR, volume 9, 2008.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Krause, A., Rajagopal, R., Gupta, A., Guestrin, C.: Simultaneous placement and scheduling of sensors. In Information Processing in Sensor Networks, 2009.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Krause, A.: SFO: A Toolbox for Submodular Function Optimization. Journal of Machine Learning Research, 11:1141–1144, 2010.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In KDD, 2007.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Nemhauser, G., Wolsey, L., Fisher, M.: An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 14:265–294, 1978.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Nguyen, H. S.: Approximate Boolean Reasoning: Foundations and Applications in Data Mining. Transactions on Rough Sets, volume 5, pages 334–506, 2006.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Springer, formerly Kluwer Academic Publishers, Boston, Dordrecht, London, 1991.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Sayrafi, B., Van Gucht, D., Gyssens, M.: Measures in databases and datamining. Tech. Report TR602, Indiana University Computer Science, 2004.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough Sets: A Tutorial. In Rough Fuzzy Hybridization: A New Trend in Decision-Making, pages 3–98. Springer, Heidelberg, 1998.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Swieboda, W., Nguyen, H. S.: Mining large and sparse data sets with Rough Sets. In Proceedings of CS&amp;P, 2010.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>