
Reduct Calculation and Discretization of Numeric Attributes in Sparse Decision Systems

Wojciech Swieboda

Hung Son Nguyen

Institute of Mathematics, The University of Warsaw, Banacha 2, 02-097 Warsaw, Poland


In this paper we discuss three data mining problems for sparse decision systems: short reduct calculation, discretization of numerical attributes, and rule induction. We present algorithms that provide approximate solutions to these problems and analyze their complexity.

The paper discusses algorithms for data mining [3] of sparse decision tables. We first review basic notions of information systems, decision systems and rough set theory [9]. We then introduce a convenient representation for sparse decision tables and finally discuss algorithms for short reduct calculation, discretization and rule induction.

Introduction

An information system is a pair I = (U, A), where U denotes the universe of objects and A is the set of attributes. An attribute a ∈ A is a mapping a : U → V_a. The co-domain V_a of attribute a is also called the value set of a.

A decision system is a pair D = (U, A ∪ {dec}), i.e. an information system with a distinguished attribute dec : U → {1, . . . , d} called the decision attribute. Attributes in A are called conditions or conditional attributes and may be either nominal or numeric (i.e. with V_a ⊆ R).

Throughout this paper n will denote the number of objects in a decision system and k will denote the number of conditional attributes.
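To make the notation concrete, the following minimal sketch represents a small decision system in Python. The class name `DecisionSystem`, its fields and the toy data are illustrative assumptions and do not come from the paper; the sketch only mirrors the definitions of U, A, V_a, dec, n and k given above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[str, float]


@dataclass
class DecisionSystem:
    """Hypothetical container for D = (U, A ∪ {dec})."""
    objects: List[str]                       # the universe U
    conditions: Dict[str, Dict[str, Value]]  # attribute name a -> mapping a : U -> V_a
    dec: Dict[str, int]                      # decision attribute dec : U -> {1, ..., d}

    @property
    def n(self) -> int:
        """Number of objects."""
        return len(self.objects)

    @property
    def k(self) -> int:
        """Number of conditional attributes."""
        return len(self.conditions)

    def value_set(self, a: str) -> set:
        """Values of attribute a that actually occur in the table (a subset of V_a)."""
        return set(self.conditions[a].values())


# Toy data (made up for illustration): one nominal and one numeric attribute.
toy = DecisionSystem(
    objects=["u1", "u2", "u3"],
    conditions={
        "colour": {"u1": "red", "u2": "blue", "u3": "red"},  # nominal
        "weight": {"u1": 1.5, "u2": 2.0, "u3": 1.5},         # numeric
    },
    dec={"u1": 1, "u2": 2, "u3": 1},
)
```

Here `toy.n` is 3 and `toy.k` is 2, matching the roles of n and k in the text.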

symbolic attributes in two separate tables, and store decisions (which we assume are never missing) of objects in a separate vector.

Another related representation, more general than the EAV model, is Subject-Predicate-Object (SPO), which is used e.g. in the Resource Description Framework (RDF) model and implemented in several triplestore databases.
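The following is a minimal sketch of an EAV-style (entity, attribute, value) layout of a sparse table. It assumes that the two separate tables hold numeric and symbolic attribute values respectively, and that an entry absent from a table is simply not stored. All names (`numeric_eav`, `symbolic_eav`, `build_index`, `lookup`) and the data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Each EAV row is (entity, attribute, value); only present values are stored.
numeric_eav: List[Tuple[int, str, float]] = [
    (0, "a1", 0.5),
    (2, "a1", 1.7),
    (2, "a3", 4.0),
]
symbolic_eav: List[Tuple[int, str, str]] = [
    (1, "a2", "T"),
    (3, "a2", "F"),
]
# Decisions are assumed never missing, so a dense vector indexed by object suffices.
decisions: List[int] = [1, 2, 1, 2]


def build_index(rows: Iterable[Tuple[int, str, Hashable]]) -> Dict[Tuple[int, str], Hashable]:
    """Map (entity, attribute) to value for constant-time lookups."""
    return {(entity, attribute): value for entity, attribute, value in rows}


def lookup(index: Dict[Tuple[int, str], Hashable], entity: int, attribute: str) -> Optional[Hashable]:
    """Return the stored value, or None when the entry is absent from the sparse table."""
    return index.get((entity, attribute))


numeric_index = build_index(numeric_eav)
symbolic_index = build_index(symbolic_eav)
```

Storage is proportional to the number of stored entries rather than to n · k, which is the point of using such a layout for sparse data.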

Problems for Sparse Decision Systems

In this paper we address the following problems for sparse decision systems:

1. Finding a short reduct or a superreduct [1]. A reduct is a subset of attributes R ⊆ A which guarantees discernibility of objects belonging to different decision classes.

2. Discretization of numerical attributes [6]. Discretization of a decision system means determining a set of cuts on numerical attributes so that the induced partitions (i.e. intervals between cutpoints) guarantee discernibility of objects belonging to different decision classes.

3. Generating a set of rules or dynamic rules [1].
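The discernibility conditions in problems 1 and 2 can be illustrated with a brute-force check on a small dense table. This is only a sketch of the definitions, not one of the algorithms of the paper: it assumes a consistent table (no two objects with equal attribute values and different decisions), treats a reduct as a minimal superreduct, and uses hypothetical function names and toy data.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def is_superreduct(table: List[dict], dec: Sequence[int], attrs: Sequence[str]) -> bool:
    """True if any two objects with different decisions differ on some attribute in attrs."""
    seen: Dict[tuple, int] = {}
    for row, d in zip(table, dec):
        key = tuple(row[a] for a in attrs)
        # setdefault stores the first decision seen for this key and returns it.
        if seen.setdefault(key, d) != d:
            return False
    return True


def is_reduct(table: List[dict], dec: Sequence[int], attrs: Sequence[str]) -> bool:
    """True if attrs is a superreduct and dropping any single attribute breaks it."""
    attrs = list(attrs)
    if not is_superreduct(table, dec, attrs):
        return False
    # Discernibility is monotone in attrs, so single removals suffice for minimality.
    return all(
        not is_superreduct(table, dec, attrs[:i] + attrs[i + 1:])
        for i in range(len(attrs))
    )


def interval_index(value: float, cuts: Sequence[float]) -> int:
    """Index of the interval between sorted cut points that contains value."""
    return bisect_right(cuts, value)


def cuts_preserve_discernibility(
    table: List[dict], dec: Sequence[int], cuts: Dict[str, List[float]]
) -> bool:
    """True if the interval codes induced by cuts still discern the decision classes."""
    coded = [{a: interval_index(row[a], c) for a, c in cuts.items()} for row in table]
    return is_superreduct(coded, dec, list(cuts))


# Toy data (made up): a1 alone discerns the decision classes, a2 does not.
table = [
    {"a1": 1.0, "a2": 5.0},
    {"a1": 2.0, "a2": 5.0},
    {"a1": 3.0, "a2": 7.0},
]
dec = [1, 2, 1]
```

On this toy table, `["a1"]` is a reduct while `["a1", "a2"]` is only a superreduct, and the cuts `{"a1": [1.5, 2.5]}` preserve discernibility whereas the single cut `{"a1": [2.5]}` does not.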

1. Bazan, J.G., Nguyen, H.S., Nguyen, S.H., Synak, P., Wróblewski, J.: Rough set algorithms in classification problem, pp. 49-88 (2000)

2. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York, 2nd edn. (2001)

3. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press (2001), http://mitpress.mit.edu/026208290X

4. Hastie, T., Tibshirani, R., Friedman, J.H.: The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations. Springer-Verlag, New York (2001)

5. Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough sets: A tutorial (1998)

6. Nguyen, H.S.: Discretization problem for rough sets methods. In: Polkowski and Skowron [10], pp. 545-552, http://dx.doi.org/10.1007/3-540-69115-4_75

7. Nguyen, H.S.: Approximate boolean reasoning: Foundations and applications in data mining (2006)

8. Pawlak, Z.: Rough sets. International Journal of Information and Computer Sciences 11(5), 341-356 (1982)

9. Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Springer, Formerly Kluwer Academic Publishers, Boston, Dordrecht, London (1991)

10. Polkowski, L., Skowron, A. (eds.): Rough Sets and Current Trends in Computing, First International Conference, RSCTC'98, Warsaw, Poland, June 22-26, 1998, Proceedings, Lecture Notes in Computer Science, vol. 1424. Springer (1998)

11. Stead, W.W., Hammond, W.E., Straube, M.J.: A chartless record - is it adequate? Journal of Medical Systems 7, 103-109 (1983), http://dx.doi.org/10.1007/BF00995117