<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Reduct Calculation and Discretization of Numeric Attributes in Sparse Decision Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wojciech</forename><surname>Swieboda</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Mathematics</orgName>
								<orgName type="institution">The University of Warsaw</orgName>
								<address>
									<addrLine>Banacha 2</addrLine>
									<postCode>02-097</postCode>
									<settlement>Warsaw</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hung</forename><forename type="middle">Son</forename><surname>Nguyen</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Mathematics</orgName>
								<orgName type="institution">The University of Warsaw</orgName>
								<address>
									<addrLine>Banacha 2</addrLine>
									<postCode>02-097</postCode>
									<settlement>Warsaw</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Reduct Calculation and Discretization of Numeric Attributes in Sparse Decision Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2E8A6DCC2C241F1F5B3D1D8FD840B530</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:18+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we discuss three problems in Data Mining Sparse Decision Systems: the problem of short reduct calculation, discretization of numerical attributes and rule induction. We present algorithms that provide approximate solutions to these problems and analyze the complexity of these algorithms.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the paper we discuss algorithms for Data Mining <ref type="bibr" target="#b2">[3]</ref> Sparse Decision Tables. We first review basic notions of Information Systems, Decision Systems and Rough Set Theory <ref type="bibr" target="#b8">[9]</ref>. We introduce a convenient representation for sparse decision tables and finally discuss algorithms for short reduct calculation, discretization and rule induction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Rough Set Preliminaries</head><p>An information system is a pair I = (U, A) where U denotes the universe of objects and A is the set of attributes. An attribute a ∈ A is a mapping a : U → V a . The co-domain V a of attribute a is often also called the value set of attribute a.</p><p>A decision system is a pair D = (U, A∪{dec}) which is an information system with a distinguished attribute dec : U → {1, . . . , d} called a decision attribute. Attributes in A are called conditions or conditional attributes and may be either nominal or numeric (i.e. with V a ⊆ R).</p><p>Throughout this paper n will denote the number of objects in a decision system and k will denote the number of conditional attributes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Sparse Data Sets and Decision Systems</head><p>In many situations a convenient way to represent the data set is in terms of Entity-Attribute-Value (EAV) Model <ref type="bibr" target="#b10">[11]</ref>, which encodes observations in terms of triples. For an information system I = (U, A), the set of triples is {(u, a, v) : a(u) = v}. This representation is especially handy for information systems with numerous attributes, missing or default values. Instances with missing and default values are not included in EAV representation, which results in compression of the data set. In this paper we are only dealing with default values. Their interpretation/semantics is the same as of any other attribute. In practice we store triples corresponding to numeric attributes and to </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Problems for Sparse Decision Systems</head><p>In our paper we address the following problems for Sparse Decision Systems:</p><p>1. Finding a short reduct or a superreduct <ref type="bibr" target="#b0">[1]</ref>.</p><p>A reduct is a subset of attributes R ⊆ A which guarantees discernibility of objects belonging to different decision classes. 2. Discretization of numerical attributes <ref type="bibr" target="#b5">[6]</ref>.</p><p>Discretization of a decision system is determining a set of cuts on numerical attributes so that the induced partitions (i.e. intervals between cutpoints) guarantee discernibility of objects belonging to different decision classes. 3. Generating set of rules or dynamic rules <ref type="bibr" target="#b0">[1]</ref>.  </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>a 1 a 2 a 3</head><label>3</label><figDesc>Decision x 1 (−∞, +∞) (1.25, +∞) (−∞, 1.2] F x 2 (−∞, +∞) (−∞, 1.1] (−∞, 1.2] F x 3 (−∞, +∞) (1.25, +∞) (−∞, 1.2] F x 4 (−∞, +∞) (1.1, 1.25] (1.2, +∞) F x 5 (−∞, +∞) (1.25, +∞) (1.2, +∞) F x 6 (−∞, +∞) (1.25, +∞) (1.2, +∞) T x 7 (−∞, +∞) (−∞, 1.1] (1.2, +∞)T</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>A typical decision system with symbolic attributes represented as a table. Attributes Diploma, Experience, French and Reference are conditions, whereas Decision is the decision attribute. All conditional attributes in this decision system are nominal</figDesc><table><row><cell cols="5">Diploma Experience French Reference Decision</cell></row><row><cell cols="5">x 1 MBA Medium Yes Excellent Accept</cell></row><row><cell>x 2 MBA</cell><cell>Low</cell><cell cols="3">Yes Neutral Reject</cell></row><row><cell>x 3 MCE</cell><cell>Low</cell><cell>Yes</cell><cell>Good</cell><cell>Reject</cell></row><row><cell>x 4 MSc</cell><cell>High</cell><cell cols="3">Yes Neutral Accept</cell></row><row><cell>x 5 MSc</cell><cell cols="4">Medium Yes Neutral Reject</cell></row><row><cell>x 6 MSc</cell><cell>High</cell><cell cols="3">Yes Excellent Accept</cell></row><row><cell>x 7 MBA</cell><cell>High</cell><cell>No</cell><cell>Good</cell><cell>Accept</cell></row><row><cell>x 8 MCE</cell><cell>Low</cell><cell cols="3">No Excellent Reject</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>A decision system in which all conditional attributes are numeric</figDesc><table><row><cell cols="2">a 1 a 2 a 3 Decision</cell></row><row><cell>x 1 0 1.3 0</cell><cell>F</cell></row><row><cell>x 2 3.3 0.9 0</cell><cell>F</cell></row><row><cell>x 3 0 1.5 0</cell><cell>F</cell></row><row><cell>x 4 0 1.2 2.5</cell><cell>F</cell></row><row><cell>x 5 0 1.3 3.6</cell><cell>F</cell></row><row><cell>x 6 3.7 2.7 2.4</cell><cell>T</cell></row><row><cell>x 7 4.1 1.0 2.8</cell><cell>T</cell></row><row><cell cols="2">symbolic attributes in two separate tables, and store decisions (which we assume are</cell></row><row><cell>never missing) of objects in a separate vector.</cell><cell></cell></row><row><cell cols="2">Another related representation, more general then EAV model, is Subject-Predicate-</cell></row><row><cell cols="2">Object (SPO), and is used e.g. in Resource Description Framework (RDF) Model and</cell></row><row><cell>implemented in several Triplestore databases.</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>EAV representation of decision system in table1. The default values (omitted in this representation) for consecutive attributes are 'MBA', 'Low', 'Yes' and 'Excellent'</figDesc><table><row><cell cols="3">Entity Attribute Value</cell><cell></cell></row><row><cell>x 1</cell><cell>a 2</cell><cell>Medium</cell><cell></cell></row><row><cell>x 2</cell><cell>a 4</cell><cell>Neutral</cell><cell></cell></row><row><cell>x 3</cell><cell>a 1</cell><cell>MCE</cell><cell></cell></row><row><cell>x 3 x 4 x 4 x 4 x 5 x 5 x 5 x 6 x 6 x 7</cell><cell>a 4 a 1 a 2 a 4 a 1 a 2 a 4 a 1 a 2 a 2</cell><cell>Good MSc High Neutral MSc Medium Neutral MSc High High</cell><cell>Entity Decision x 1 Accept x 2 Reject x 3 Reject x 4 Accept x 5 Reject x 6 Accept x 7 Accept x 8 Reject</cell></row><row><cell>x 7</cell><cell>a 3</cell><cell>No</cell><cell></cell></row><row><cell>x 7</cell><cell>a 4</cell><cell>Good</cell><cell></cell></row><row><cell>x 8</cell><cell>a 1</cell><cell>MCE</cell><cell></cell></row><row><cell>x 8</cell><cell>a 3</cell><cell>No</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>EAV representation of decision system in table 2. The default value (omitted in this representation) for each attribute is 0</figDesc><table><row><cell cols="3">Entity Attribute Value</cell><cell></cell></row><row><cell>x 1</cell><cell>a 2</cell><cell>1.3</cell><cell></cell></row><row><cell>x 2</cell><cell>a 1</cell><cell>3.3</cell><cell></cell></row><row><cell>x 2 x 3 x 4 x 4 x 5 x 5 x 6 x 6 x 6</cell><cell>a 2 a 2 a 2 a 3 a 2 a 3 a 1 a 2 a 3</cell><cell>0.9 1.5 1.2 2.5 1.3 3.6 3.7 2.7 2.4</cell><cell>Entity Decision x 1 T x 2 T x 3 T x 4 T x 5 T x 6 T x 7 T</cell></row><row><cell>x 7</cell><cell>a 1</cell><cell>4.1</cell><cell></cell></row><row><cell>x 7</cell><cell>a 2</cell><cell>1.0</cell><cell></cell></row><row><cell>x 7</cell><cell>a 3</cell><cell>2.8</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>A discretized version of the decision system presented in table 2.</figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Rough set algorithms in classification problem</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Bazan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Synak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wróblewski</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="49" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Pattern Classification</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">O</forename><surname>Duda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>Hart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Stork</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Wiley</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
	<note>2. edn</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Principles of Data Mining</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Smyth</surname></persName>
		</author>
		<ptr target="http://mitpress.mit.edu/026208290X" />
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Friedman</surname></persName>
		</author>
		<title level="m">The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Rough sets: A tutorial</title>
		<author>
			<persName><forename type="first">J</forename><surname>Komorowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pawlak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Polkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Skowron</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Discretization problem for rough sets methods</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="DOI">10.1007/3-540-69115-4\_75</idno>
		<ptr target="http://dx.doi.org/10.1007/3-540-69115-4\_75" />
		<editor>Polkowski and Skowron</editor>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="545" to="552" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Approximate boolean reasoning: Foundations and applications in data mining</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Nguyen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Rough sets</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Pawlak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Information and Computer Sciences</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="341" to="356" />
			<date type="published" when="1982">1982</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Rough Sets. Theoretical Aspects of Reasoning about Data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Pawlak</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1991">1991</date>
			<publisher>Formerly Kluwer Academic Publishers</publisher>
			<pubPlace>Boston, Dordrecht, London</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Rough Sets and Current Trends in Computing</title>
	</analytic>
	<monogr>
		<title level="m">First International Conference, RSCTC&apos;98</title>
		<title level="s">Proceedings, Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Polkowski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Skowron</surname></persName>
		</editor>
		<meeting><address><addrLine>Warsaw, Poland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">June 22-26, 1998. 1998</date>
			<biblScope unit="volume">1424</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A chartless record -is it adequate?</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Stead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Hammond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Straube</surname></persName>
		</author>
		<idno type="DOI">10.1007/BF00995117,10.1007/BF00995117</idno>
		<ptr target="http://dx.doi.org/10.1007/BF00995117,10.1007/BF00995117" />
	</analytic>
	<monogr>
		<title level="j">Journal of Medical Systems</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="103" to="109" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
