<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Learning Model Rules from High-Speed Data Streams</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ezilda</forename><surname>Almeida</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Ferreira</surname></persName>
						</author>
						<author>
							<persName><forename type="first">João</forename><surname>Gama</surname></persName>
						</author>
						<title level="a" type="main">Learning Model Rules from High-Speed Data Streams</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CDFFEC499E87AFBE1B806A884E50A344</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data Streams</term>
					<term>Regression</term>
					<term>Rule Learning</term>
					<term>Change Detection</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Decision rules are one of the most expressive languages for machine learning. In this paper we present Adaptive Model Rules (AMRules), the first streaming rule learning algorithm for regression problems. In AMRules the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of attribute values. Each rule in AMRules uses a Page-Hinkley test to detect changes in the process generating data and react to changes by pruning the rule set. In the experimental section we report the results of AMRules on benchmark regression problems, and compare the performance of our algorithm with other streaming regression algorithms.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Regression analysis is a technique for estimating a functional relationship between a dependent variable and a set of independent variables. It has been widely studied in statistics, machine learning and data mining. Predicting numeric values usually involves complicated regression formulae. Model trees <ref type="bibr" target="#b13">[14]</ref> and regression rules <ref type="bibr" target="#b14">[15]</ref> are among the most powerful data mining models. Trees and rules perform automatic feature selection, are robust to outliers and irrelevant features, exhibit a high degree of interpretability, and are structurally invariant to monotonic transformations of the independent variables. One important aspect of rules is modularity: each rule can be interpreted per se <ref type="bibr" target="#b5">[6]</ref>.</p><p>In the data stream computational model <ref type="bibr" target="#b6">[7]</ref> examples are generated sequentially from time-evolving distributions. Learning from data streams requires incremental learning, using limited computational resources, and the ability to adapt to changes in the process generating data. In this paper we present Adaptive Model Rules, the first one-pass algorithm for learning regression rule sets from time-evolving streams. AMRules can learn ordered and unordered rules. The antecedent of a rule is a set of literals (conditions based on the attribute values). The consequent of a rule is a function that minimizes the mean square error of the target attribute, computed from the set of examples covered by the rule. This function may be either a constant (the mean of the target attribute) or a linear combination of the attributes. Each rule is equipped with an online change detector. It monitors the mean square error using the Page-Hinkley test, providing information about the dynamics of the process generating data.</p><p>The paper is organized as follows. The next section presents the related work on learning regression trees and rules from data, focusing on streaming algorithms. Section 3 describes in detail the AMRules algorithm. Section 4 presents the experimental evaluation using stationary and time-evolving streams, comparing AMRules against other regression systems. The last section presents the lessons learned.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>In this section we analyze the related work along two dimensions: regression algorithms, and incremental learning of regression algorithms.</p><p>In regression domains, <ref type="bibr" target="#b13">[14]</ref> presented the system M5. It builds multivariate trees using linear models at the leaves. In the pruning phase, a linear model is built for each leaf. Later, <ref type="bibr" target="#b4">[5]</ref> presented M5′, a rational reconstruction of Quinlan's M5 algorithm. M5′ first constructs a regression tree by recursively splitting the instance space using tests on single attributes that maximally reduce variance in the target variable. After the tree has been grown, a linear multiple regression model is built for every inner node, using the data associated with that node and all the attributes that participate in tests in the subtree rooted at that node. The linear regression models are then simplified by dropping attributes if this results in a lower expected error on future data (more specifically, if the decrease in the number of parameters outweighs the increase in the observed training error). After this has been done, every subtree is considered for pruning. Pruning occurs if the estimated error for the linear model at the root of a subtree is smaller than or equal to the expected error for the subtree. After pruning terminates, M5′ applies a smoothing process that combines the model at a leaf with the models on the path to the root to form the final model that is placed at the leaf.</p><p>Cubist <ref type="bibr" target="#b14">[15]</ref> is a rule-based model that is an extension of Quinlan's M5 model tree. A tree is grown whose terminal leaves contain linear regression models based on the predictors used in previous splits; there are also intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but it is smoothed by taking into account the prediction of the linear model at the parent node (and so on, recursively up the tree). The tree is reduced to a set of rules, which initially are the paths from the root of the tree to the leaves. Rules are eliminated via pruning and/or combined for simplification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Streaming Regression Algorithms</head><p>Many methods can be found in the literature for solving classification tasks on streams, but only a few exist for regression tasks. To the best of our knowledge, only two papers address online learning of regression and model trees. In the algorithm of <ref type="bibr" target="#b12">[13]</ref> for incremental learning of linear model trees, the splitting decision is formulated as hypothesis testing: the split least likely to occur under the null hypothesis of non-splitting is considered the best one. The linear models are computed using the RLS (Recursive Least Squares) algorithm, whose complexity is quadratic in the dimensionality of the problem. This complexity is then multiplied by a user-defined number of possible splits per numerical attribute, for each of which a separate pair of linear models is updated with every training example and evaluated. The Fast Incremental Model Tree (FIMT) proposed in <ref type="bibr" target="#b9">[10]</ref> is an incremental algorithm for any-time learning of model trees from evolving data streams with drift detection. It is based on the Hoeffding tree algorithm, but implements a different splitting criterion, using a standard deviation reduction (SDR) measure more appropriate to regression problems. The FIMT algorithm is able to incrementally induce model trees by processing each example only once, in the order of arrival. Splitting decisions are made using only a small sample of the data stream observed at each node, following the idea of Hoeffding trees. Another data streaming issue addressed in <ref type="bibr" target="#b9">[10]</ref> is the problem of concept drift. Data streaming models capable of dealing with concept drift face two main challenges: how to detect when concept drift has occurred and how to adapt to the change. Change detection in FIMT is carried out using the Page-Hinkley change detection test <ref type="bibr" target="#b10">[11]</ref>. Adaptation in FIMT involves growing an alternate subtree from the node in which change was detected.</p><p>IBLStreams (Instance Based Learner on Streams) <ref type="bibr" target="#b0">[1]</ref> is an extension of MOA that consists of an instance-based learning algorithm for classification and regression problems on data streams; IBLStreams optimizes the composition and size of the case base autonomously. On arrival of a new example (x 0 , y 0 ), the example is first added to the case base. Then it is checked whether other examples might be removed, either because they have become redundant or because they are outliers. To this end, a set C of examples within a neighborhood of x 0 is considered as candidates. This neighborhood is given by the k c nearest neighbors of x 0 , determined according to a distance measure ∆, and the candidate set C consists of the examples within that neighborhood. The most recent examples are excluded from removal, due to the difficulty of distinguishing potentially noisy data from the beginning of a concept change: unexpected observations should be removed only in the former case, not in the latter. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The AMRules Algorithm</head><p>The problem of learning model rules from data streams raises several issues. First, the dataset is no longer finite and available prior to learning; it is impossible to store all data in memory and learn from them as a whole. Second, multiple sequential scans over the training data are not allowed; an algorithm must collect the relevant information at the speed it arrives and decide incrementally about splitting. Third, the training dataset may consist of data from different distributions. In this section we present an incremental algorithm for learning model rules that addresses these issues, named Adaptive Model Rules from High-Speed Data Streams (AMRules). The pseudo code of the algorithm is given in Algorithm 1.</p><p>The algorithm begins with an empty rule set (RS) and a default rule {} → L, where L is initialized to NULL. L is a data structure used to store the sufficient statistics required to expand a rule and for prediction. Every time a new training example is available, the algorithm checks whether the example is covered by any rule in the rule set, that is, whether all the literals of a rule are true for the example. The set of rules is learned in parallel, as described in Algorithm 1. We consider two cases: learning an ordered or an unordered set of rules. In the former case, every example updates the statistics of the first rule that covers it. In the latter, every example updates the statistics of all the rules that cover it. If an example is not covered by any rule, the default rule is updated.</p></div>
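The main loop just described can be sketched as follows. This is a minimal Python illustration (the actual system is implemented in Java on top of MOA); the Rule class, the literal encoding and the way sufficient statistics are stored are simplified assumptions, not the authors' code, and rule expansion is left as a stub.

```python
# Minimal sketch of the AMRules training loop (Algorithm 1 in the text).
# The Rule class, the literal encoding and the statistics kept are
# simplified assumptions; the actual system is implemented in Java/MOA.

class Rule:
    def __init__(self, literals=None):
        self.literals = literals or []  # list of (attr_index, op, threshold)
        self.targets = []               # sufficient statistics (here: raw targets)

    def covers(self, x):
        # a rule covers x when every literal in its antecedent holds
        return all((x[a] <= v) if op == "<=" else (x[a] > v)
                   for a, op, v in self.literals)

    def predict(self):
        # target-mean prediction strategy
        return sum(self.targets) / len(self.targets) if self.targets else 0.0


def train(stream, ordered=True, n_min=200):
    rule_set, default_rule = [], Rule()
    for x, y in stream:
        covered = False
        for rule in rule_set:
            if rule.covers(x):
                covered = True
                rule.targets.append(y)      # update sufficient statistics
                if ordered:                 # ordered sets: only the first match
                    break
        if not covered:
            default_rule.targets.append(y)  # default rule catches the rest
            if len(default_rule.targets) >= n_min:
                pass  # expansion (Algorithm 2) would be attempted here
    return rule_set, default_rule
```

In the unordered variant the `break` is skipped, so every covering rule accumulates statistics from the example, matching the two update policies described above.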
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Expansion of a Rule</head><p>Before discussing how rules are expanded, we first discuss the evaluation measure used in the attribute selection process. <ref type="bibr" target="#b9">[10]</ref> describe a standard deviation reduction (SDR) measure for determining the merit of a given split, which can be efficiently computed in an incremental way. Given a leaf where a sample S of the dataset of size N has been observed, a hypothetical binary split h A over attribute A would divide the examples in S into two disjoint subsets S L and S R , with sizes N L and N R respectively. The SDR measure of the split h A is given below:</p><formula xml:id="formula_0">SDR(h A ) = sd(S) − (N L /N ) sd(S L ) − (N R /N ) sd(S R )
sd(S) = √( (1/N ) Σ_{i=1..N} (y i − ȳ)² ) = √( (1/N ) [ Σ_{i=1..N} y i ² − (1/N )( Σ_{i=1..N} y i )² ] )</formula><p>To make the actual decision regarding a split, the SDR values of the best two potential splits are compared, dividing the second-best value by the best one to generate a ratio r in the range 0 to 1. Having a predefined range R for the values of the random variables, the Hoeffding probability bound (ε) <ref type="bibr" target="#b16">[17]</ref> can be used to obtain high-confidence intervals for the true average of the sequence of random variables. The value of ε is calculated using the formula:</p><formula xml:id="formula_1">ε = √( R² ln(1/δ) / (2n) )</formula><p>The process of expanding a rule by adding a new condition works as follows. For each attribute X i , the value of the SDR is computed for each attribute value v j . If the upper bound (r⁺ = r + ε) of the sample average is below 1, then the true mean is also below 1. Therefore, with confidence 1 − δ, the best attribute over a portion of the data is really the best attribute. In this case, the rule is expanded with the condition X a ≤ v j or X a &gt; v j . However, often two splits are extremely similar or even identical in terms of their SDR values, and despite the intervals shrinking considerably as more examples are seen, it is still impossible to choose one split over the other. In these cases, a threshold (τ) on the error is used: if ε falls below this threshold and the splitting criterion is still not met, the split with the higher SDR value is chosen and the rule is expanded.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Prediction Strategies</head><p>The set of rules learned by AMRules can be ordered or unordered, and the two cases employ different prediction strategies. In the former, only the first rule that covers an example is used to predict its target value. In the latter, all rules covering the example are used for prediction, and the final prediction is decided by a weighted vote.</p><p>Each rule in AMRules implements three prediction strategies: i) the mean of the target attribute computed from the examples covered by the rule; ii) a linear combination of the independent attributes; iii) an adaptive strategy that chooses between the first two, selecting the one with lower MSE on the previous examples.</p><p>Each rule in AMRules contains a linear model, trained using an incremental gradient descent method on the examples covered by the rule. Initially, the weights are set to small random numbers in the range −1 to 1. When a new example arrives, the output is computed using the current weights. Each weight is then updated using the Delta rule: w i ← w i + η(y − ŷ)x i , where ŷ is the prediction, y the real value and η the learning rate.</p></div>
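The per-rule linear model and its delta-rule update can be sketched as follows. `RulePerceptron` is a hypothetical name, and the explicit bias term is an assumption not spelled out in the text; this is a sketch of the technique, not the MOA implementation.

```python
import random

class RulePerceptron:
    # Linear model kept by each rule, trained by the incremental delta rule.
    # The class name and the explicit bias term are illustrative assumptions.
    def __init__(self, n_attrs, eta=0.01, seed=0):
        rng = random.Random(seed)
        # weights start as small random numbers in [-1, 1]; w[0] is the bias
        self.w = [rng.uniform(-1.0, 1.0) for _ in range(n_attrs + 1)]
        self.eta = eta

    def predict(self, x):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))

    def update(self, x, y):
        # delta rule: w_i <- w_i + eta * (y - y_hat) * x_i
        y_hat = self.predict(x)
        err = y - y_hat
        self.w[0] += self.eta * err            # bias sees a constant input of 1
        for i, xi in enumerate(x):
            self.w[i + 1] += self.eta * err * xi
        return y_hat
```

Updating with the signed error y − ŷ moves each weight down the gradient of the squared error, which is the standard form of the delta rule.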
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Change Detection</head><p>AMRules uses the Page-Hinkley (PH) test <ref type="bibr" target="#b11">[12]</ref> to monitor the error, and signals a drift when a significant increase of this variable is observed. The PH test is a sequential analysis technique typically used for change detection in signal processing, designed to detect a change in the average of a Gaussian signal <ref type="bibr" target="#b10">[11]</ref>. The test considers a cumulative variable m T , defined as the accumulated difference between the observed error and the mean of the error up to the current moment:</p><formula xml:id="formula_2">m T = Σ_{t=1..T} (e t − ē T − α)</formula><p>where ē T = (1/T ) Σ_{t=1..T} e t and α corresponds to the magnitude of changes that are allowed.</p><p>The minimum value of this variable is also computed: M T = min(m t , t = 1 . . . T ). As a final step, the test monitors the difference between m T and M T : P H T = m T − M T . When this difference is greater than a given threshold (λ), we signal a change in the distribution. The threshold λ depends on the admissible false alarm rate: increasing λ entails fewer false alarms, but might miss or delay change detection.</p></div>
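A minimal sketch of the PH test as defined above; maintaining ē_T as an incremental running mean is an implementation choice, and the class name is illustrative.

```python
class PageHinkley:
    # Page-Hinkley test for an increase in the mean of the error signal;
    # alpha is the admissible change magnitude, lam the alarm threshold.
    def __init__(self, alpha=0.005, lam=50.0):
        self.alpha, self.lam = alpha, lam
        self.t = 0        # observations so far
        self.mean = 0.0   # running mean of the error, e_bar_T
        self.m = 0.0      # cumulative variable m_T
        self.m_min = 0.0  # running minimum M_T

    def add(self, error):
        self.t += 1
        self.mean += (error - self.mean) / self.t
        self.m += error - self.mean - self.alpha
        self.m_min = min(self.m_min, self.m)
        return self.m - self.m_min > self.lam  # PH_T > lambda signals change
```

While the error stays around its historical mean, m_T drifts slowly downward (by α per step) and tracks its own minimum, so PH_T stays near zero; a sustained increase in the error makes m_T climb away from M_T until the threshold λ is crossed.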
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental Evaluation</head><p>The main goal of this experimental evaluation is to study the behavior of the proposed algorithm in terms of mean absolute error (MAE) and root mean squared error (RMSE). We are interested in studying the following scenarios:</p><p>-How to grow the rule set? </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>All our algorithms were implemented in Java using the Massive Online Analysis (MOA) data stream software suite <ref type="bibr" target="#b1">[2]</ref>. For all the experiments, we set the input parameters of AMRules to: N min = 200, τ = 0.05 and δ = 0.01. The parameters for the Page-Hinkley test are λ = 50 and α = 0.005. Table <ref type="table" target="#tab_0">1</ref> summarizes information about the datasets used and reports the learning rate used in the perceptron learning. All of the results in Tables 2, 3 and 4 are averages over ten-fold cross-validation <ref type="bibr" target="#b15">[16]</ref>. The accuracy is measured using the following metrics: mean absolute error (MAE) and root mean squared error (RMSE) <ref type="bibr" target="#b18">[19]</ref>. We used two evaluation methods. When no concept drift is assumed, we employ the traditional train-and-test scenario: all algorithms learn from the same training set, and the error is estimated on the same test sets. In scenarios with concept drift, we use the prequential (predictive sequential) error estimate <ref type="bibr" target="#b7">[8]</ref>. This evaluation method evaluates a model sequentially: when an example is available, the current regression model makes a prediction and the loss is computed; only after the prediction is the regression model updated with that example.</p></div>
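The prequential protocol (predict first, then update) can be sketched as follows; the function name and the callback signature are illustrative assumptions.

```python
def prequential_mae(predict, update, stream):
    # Prequential evaluation: every example is first used to test the
    # current model, and only then to train it.
    total, n = 0.0, 0
    for x, y in stream:
        total += abs(y - predict(x))  # loss on the not-yet-seen example
        n += 1
        update(x, y)                  # then the model learns from it
    return total / n if n else 0.0
```

Because every example is scored before the model sees it, the estimate never rewards memorization, which is why it is the standard choice for evolving streams.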
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Datasets</head><p>The experimental datasets include both artificial and real data, all with continuous attributes. We use ten regression datasets from the UCI Machine Learning Repository <ref type="bibr" target="#b2">[3]</ref> and other sources. The datasets used in our experimental work are:</p><p>2dplanes an artificial dataset described in <ref type="bibr" target="#b3">[4]</ref>. Airlerons this dataset addresses a control problem, namely flying an F16 aircraft. Puma8NH and Puma32H a family of datasets synthetically generated from a realistic simulation of the dynamics of a Unimation Puma 560 robot arm. Pol a commercial application described in <ref type="bibr" target="#b17">[18]</ref>; the data describe a telecommunication problem. Elevators this dataset is also obtained from the task of controlling an F16 aircraft.</p><p>Fried an artificial dataset used in Friedman (1991) and also described in <ref type="bibr">Breiman (1996, p.139)</ref>. Bank8FM a family of datasets synthetically generated from a simulation of how bank customers choose their banks. Kin8nm this dataset is concerned with the forward kinematics of an 8-link robot arm. Airline this dataset uses the data from the Data Expo competition (2009); it contains flight arrival and departure details for all commercial flights within the USA from October 1987 to April 2008. This is a large dataset with nearly 120 million records (11.5 GB memory size) <ref type="bibr" target="#b9">[10]</ref>. Table <ref type="table" target="#tab_0">1</ref> summarizes the number of instances and the number of attributes of each dataset. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experimental Results</head><p>In this section we empirically evaluate AMRules. The results are described in four parts. In the first part we compare the AMRules variants; in the second part we compare AMRules against other streaming algorithms; in the third part we compare AMRules against other state-of-the-art regression algorithms. The last part analyzes the behavior of AMRules in the context of time-evolving data streams.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Comparison between AMRules Variants</head><p>In this section we focus on two strategies that we found potentially interesting. The first is a combination of expanding only one rule, the rule that triggered first, with a prediction strategy that uses only the first rule covering a test example. Obviously, this approach requires ordered rules (AM Rules o ). The second setting employs an unordered rule set, where all covering rules expand, and the corresponding prediction strategy uses a weighted sum of all rules that cover a test example (AM Rules u ).</p><p>Ordered rule sets specialize one rule at a time and, as a result, often produce fewer rules than the unordered strategy. Ordered rules need to be considered together with the previous rules and the remaining combinations, which might not be easy to interpret in more complex sets. Unordered rule sets are more modular, because each rule can be interpreted alone.</p><p>Table <ref type="table">2</ref> summarizes the mean absolute error and the root mean squared error of these variants. Overall, the experimental results point out that unordered rule sets are more competitive than ordered rule sets in terms of MAE and RMSE. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Comparison with other Streaming Algorithms</head><p>We compare the performance of our algorithm with two other streaming algorithms: FIMT and IBLStreams. FIMT is an incremental algorithm for learning model trees, addressed in <ref type="bibr" target="#b9">[10]</ref>. IBLStreams is an extension of MOA that consists of an instance-based learning algorithm for classification and regression problems on data streams <ref type="bibr" target="#b0">[1]</ref>.</p><p>The performance measures for these algorithms are given in Table <ref type="table" target="#tab_4">3</ref>. The comparison of these streaming algorithms shows that AMRules gets better results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Comparison with State-of-the-art Regression Algorithms</head><p>Another experiment involving adaptive model rules is shown in Table <ref type="table" target="#tab_5">4</ref>. We compare AMRules with other non-incremental regression algorithms available in WEKA <ref type="bibr" target="#b8">[9]</ref>. All these experiments are performed using WEKA, with the standard method of ten-fold cross-validation using the same folds for all the algorithms included.</p><p>The comparison shows that AMRules is very competitive in terms of MAE and RMSE with all the other methods, except M5Rules. AMRules is faster than all the other algorithms considered in this study. These results were somewhat expected, since these datasets are relatively small for an incremental algorithm. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation in Time-Evolving Data Streams</head><p>In this subsection we first study the evolution of the error measurements (MAE and RMSE) and evaluate the change detection method. Afterwards, we evaluate the streaming algorithms on a non-stationary real-world streaming problem, using the Airline dataset from the DataExpo09 competition.</p><p>To simulate drift we use the Fried dataset. The simulations allow us to control the relevant parameters and to evaluate the drift detection. Figure <ref type="figure" target="#fig_2">1</ref> and Figure <ref type="figure" target="#fig_0">2</ref> depict the MAE and RMSE curves of the streaming algorithms on the Fried dataset. These figures also illustrate the point of drift and the points where the change was detected. Only two of the algorithms, FIMT and AMRules, were able to detect a change. Table <ref type="table" target="#tab_3">5</ref> reports the average results over ten experiments varying the seed of the Fried dataset. We measure the number of nodes for FIMT, the number of rules for AMRules, and the delay (in number of examples) in detecting the drift. The delay gives an indication of how fast the algorithm is able to start its adaptation strategy. These two algorithms obtained similar results. The general conclusion is that the FIMT and AMRules algorithms are robust and achieve better results than IBLStreams. Figures <ref type="figure" target="#fig_5">3 and 4</ref> show the evolution of the MAE and the RMSE of the streaming algorithms on the non-stationary real-world problem. FIMT and AMRules exhibit approximately similar behavior in terms of MAE and RMSE. Both perform somewhat better than IBLStreams, but the difference is not significant.  
Table 3 (recovered from flattened extraction; values match the AMRules u column of Table 2). Mean absolute error (variance) followed by root mean squared error (variance), in the order AMRules, FIMT, IBLStreams:
2dplanes 1.16E+00 (0.01) 8.00E-01 (0.00) 1.03E+00 (0.00) | 1.52E+00 (0.01) 1.00E+00 (0.00) 1.30E+00 (0.00)
Airlerons 1.00E-04 (0.00) 1.90E-04 (0.00) 3.20E-04 (0.00) | 1.70E-04 (0.00) 1.00E-09 (0.00) 3.00E-04 (0.00)
Puma8NH 2.66E+00 (0.01) 3.26E+00 (0.03) 3.27E+00 (0.01) | 4.28E+00 (0.03) 12.0E+00 (0.63) 3.84E+00 (0.02)
Puma32H 1.20E-02 (0.00) 7.90E-03 (0.00) 2.20E-02 (0.00) | 1.00E-04 (0.01) 1.20E-02 (0.00) 2.70E-02 (0.00)
Pol 15.6E+00 (3.70) 38.2E+00 (0.17) 29.7E+00 (0.55) | 23.3E+00 (4.08) 1.75E+03 (1383) 50.7E+00 (0.71)
Elevators 1.90E-03 (0.00) 3.50E-03 (0.00) 5.00E-03 (0.00) | 2.20E-03 (0.00) 3.00E-05 (0.00) 6.20E-03 (0.00)
Fried 1.13E+00 (0.01) 1.72E+00 (0.00) 2.10E+00 (0.00) | 1.67E+00 (0.25) 4.79E+00 (0.01) 2.21E+00 (0.00)
Bank8FM 4.30E-02 (0.00) 3.30E-02 (0.00) 7.70E-02 (0.00) | 4.30E-02 (0.00) 2.20E-03 (0.00) 9.60E-02 (0.00)
Kin8nm 1.60E-01 (0.00) 1.60E-01 (0.00) 9.50E-01 (0.00) | 2.00E-01 (0.00) 2.10E-01 (0.00) 1.20E-01 (0.00) </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>Table 4 (recovered from flattened extraction). Mean absolute error (variance) followed by root mean squared error (variance), in the order AMRules, M5Rules, MLPerceptron, LinRegression:
2dplanes 1.16E+00 (0.01) 8.00E-01 (0.01) 8.70E-01 (0.01) 1.91E+00 (0.00) | 1.52E+00 (0.01) 9.8E-01 (0.01) 1.09E+00 (0.01) 2.37E+00 (0.00)
Airlerons 1.00E-04 (0.00) 1.00E-04 (0.00) 1.40E-04 (0.00) 1.10E-04 (0.00) | 1.70E-04 (0.00) 2.00E-04 (0.00) 1.71E-04 (0.00) 2.00E-04 (0.00)
Puma8NH 3.26E+00 (0.03) 2.46E+00 (0.00) 3.34E+00 (0.17 … (row truncated in extraction)
Fried 1.13E+00 (0.01) 1.25E+00 (0.00) 1.35E+00 (0.03) 2.03E+00 (0.00) | 1.67E+00 (0.25) 1.60E+00 (0.00) 1.69E+00 (0.04) 2.62E+00 (0.00)
Bank8FM 4.30E-02 (0.00) 2.20E-02 (0.00) 2.60E-02 (0.00) 2.90E-02 (0.00) | 4.30E-02 (0.00) 3.10E-02 (0.00) 3.40E-02 (0.00) 3.80E-02 (0.00)
Kin8nm 1.60E-01 (0.00) 1.30E-01 (0.00) 1.30E-01 (0.00) 1.60E-01 (0.00) | 2.00E-01 (0.00) 1.70E-01 (0.00) 1.60E-01 (0.00) 2.00E-01 (0.00)
Fig. <ref type="figure" target="#fig_0">2</ref>. Root mean squared error of streaming algorithms using the dataset Fried. To the best of our knowledge, in the literature there is no other method that addresses this issue.</p><p>AMRules learns ordered and unordered rule sets. The experimental results point out that unordered rule sets, in comparison to ordered rule sets, are more competitive in terms of error metrics (MAE and RMSE). AMRules achieves better results than the other algorithms even for medium-sized datasets. The AMRules algorithm is equipped with explicit change detection mechanisms that signal change points during the learning process. 
This information is relevant to understand the dynamics of evolving streams.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Algorithm 2 :</head><label>2</label><figDesc>ExpandRule: Expanding one Rule Input: r: One Rule τ : Constant to solve ties δ : Confidence Result: r : Expanded Rule begin Let Xa be the attribute with the greatest SDR Let X b be the attribute with the second greatest SDR Compute ε = √(R² ln(1/δ)/(2n)) (Hoeffding bound) Compute r = SDR(X b )/SDR(Xa) (ratio of the SDR values for the best two splits) Compute UpperBound = r + ε if UpperBound &lt; 1 ∨ ε &lt; τ then Extend r with a new condition based on the best attribute, Xa ≤ vj or Xa &gt; vj Release sufficient statistics of Lr r ← r ∪ {Xa ≤ vj or Xa &gt; vj} return r. For each rule in the rule set (RS), the algorithm checks whether the example is covered, that is, whether all the literals are true for the example. The target values of the examples covered by a rule are used to update the sufficient statistics of the rule (L). To detect changes we propose to use the Page-Hinkley (PH) change detection test. If a change is detected, the rule is removed from the rule set. Otherwise, the rule might be expanded. The expansion of a rule is considered only after a certain minimum number of examples (N min ). The expansion of a rule is explained in Algorithm 2.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Table 2 .</head><label>2</label><figDesc>Results of ten-fold cross-validation for AMRules algorithms Mean absolute error (variance) Root mean squared error (variance) Datasets AMRules o AMRules u AMRules o AMRules u 2dplanes 1.23E+00 (0.01) 1.16E+00 (0.01) 1.67E+00 (0.02) 1.52E+00 (0.01) Airlerons 1.10E-04 (0.00) 1.00E-04 (0.00) 1.90E-04 (0.00) 1.70E-04 (0.00) Puma8NH 3.21E+00 (0.04) 3.26E+00 (0.02) 4.14E+00 (0.05) 4.28E+00 (0.03) Puma32H 1.10E-02 (0.00) 1.20E-02 (0.00) 1.60E-02 (0.00) 1.20E-02 (0.00) Pol 14.0E+00 (25.1) 15.6E+00 (3.70) 23.0E+00 (44.50) 23.3E+00 (4.08) Elevators 3.50E-03 (0.00) 1.90E-03 (0.00) 4.80E-03 (0.00) 2.20E-03 (0.00) Fried 2.08E+00 (0.01) 1.13E+00 (0.01) 2.78E+00 (0.08) 1.67E+00 (0.25) Bank8FM 4.31E-02 (0.00) 4.30E-02 (0.00) 4.80E-02 (0.00) 4.30E-02 (0.00) Kin8nm 1.60E-01 (0.00) 1.50E-01 (0.00) 2.10E-01 (0.00) 2.00E-01 (0.00)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Mean absolute error of streaming algorithms using the dataset Fried.</figDesc><graphic coords="5,315.00,443.94,252.00,126.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>Continuation of Table 4 rows displaced by extraction: Puma8NH (cont.) 3.64E+00 (0.01) | 4.28E+00 (0.03) 3.19E+00 (0.01) 4.14E+00 (0.20) 4.45E+00 (0.01) Puma32H 1.20E-02 (0.00) 6.80E-03 (0.00) 2.30E-02 (0.00) 2.00E-02 (0.00) | 1.20E-02 (0.00) 8.60E-03 (0.00) 3.10E-02 (0.00) 2.60E-02 (0.00) Pol 15.6E+00 (3.70) 2.79E+00 (0.05) 14.7E+00 (5.53) 26.5E+00 (0.21) | 23.3E+00 (4.08) 6.56E+00 (0.45) 20.1E+00 (15.1) 30.5E+00 (0.16) Elevators 1.90E-03 (0.00) 1.70E-03 (0.00) 2.10E-03 (0.00) 2.00E-03 (0.00) | 2.20E-03 (0.00) 2.23E-03 (0.00) 2.23E-03 (0.00) 2.29E-03 (0.00)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Mean absolute error of streaming algorithms using the dataset Airlines.</figDesc><graphic coords="6,54.00,536.37,252.00,126.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Root mean squared error of streaming algorithms using the dataset Airlines.</figDesc><graphic coords="6,315.00,343.51,252.00,126.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Algorithm 1 :</head><label>1</label><figDesc>AMRules Algorithm</figDesc><table /><note>Input: S: Stream of examples ordered-set: boolean flag Nmin: Minimum number of examples λ: Page-Hinkley test threshold α: the magnitude of changes that are allowed j: rule index Result: RS Set of Decision Rules begin Let RS ← {} Let defaultRule {} → (L ← NULL) foreach example (xi, yi) do foreach Rule r ∈ RSj do if r covers the example then Let ŷi be the prediction of the rule r, computed using Lr Compute error = (ŷi − yi)² Call PHTest(error, α, λ) if Change is detected then Remove the rule else Update sufficient statistics of r Update Perceptron of r if Number of examples in Lr ≥ Nmin then r ← ExpandRule(r) if ordered-set then BREAK if none of the rules in RS triggers then Update sufficient statistics of the default rule Update Perceptron of the default rule if Number of examples in L ≥ Nmin then RS ← RS ∪ ExpandRule(defaultRule)</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>•</head><label></label><figDesc>Update only the first rule that covers training examples. In this case the rule set is ordered, and the corresponding prediction strategy uses only the first rule that covers test examples. • Update all the rules that cover training examples.</figDesc><table><row><cell>In this case the rule set is unordered, and the cor-</cell></row><row><cell>responding prediction strategy uses a weighted sum</cell></row><row><cell>of all rules that cover test examples.</cell></row><row><cell>-How does AMRules compare against other streaming</cell></row><row><cell>algorithms?</cell></row><row><cell>-How does AMRules compare against other state-of-the-</cell></row><row><cell>art regression algorithms?</cell></row><row><cell>-How do the models learned by AMRules evolve in time-</cell></row><row><cell>changing streams?</cell></row></table></figure>
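The two prediction strategies described above can be sketched as follows, assuming each rule exposes a coverage test, a prediction function, and a weight (the triple representation and function names are illustrative, not the authors' API):

```python
def predict(rules, default_rule, x, ordered=True):
    """Predict with a rule set: first-match for an ordered set,
    weighted average of all covering rules for an unordered set.
    `rules` is a list of (covers, predict, weight) triples."""
    covering = [(p, w) for c, p, w in rules if c(x)]
    if not covering:
        # no rule fires: fall back to the default rule
        return default_rule(x)
    if ordered:
        # ordered set: only the first covering rule predicts
        return covering[0][0](x)
    # unordered set: weighted sum over all covering rules
    total = sum(w for _, w in covering)
    return sum(w * p(x) for p, w in covering) / total
```

The ordered strategy mirrors the BREAK in Algorithm 1 (only the first covering rule is updated and used), while the unordered strategy aggregates every rule that covers the test example.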
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1 .</head><label>1</label><figDesc>Summary of datasets</figDesc><table><row><cell cols="4">Datasets # Instances # Attributes Learning rate</cell></row><row><cell>2dplanes</cell><cell>40768</cell><cell>11</cell><cell>0.01</cell></row><row><cell>Ailerons</cell><cell>13750</cell><cell>41</cell><cell>0.01</cell></row><row><cell>Puma8NH</cell><cell>8192</cell><cell>9</cell><cell>0.01</cell></row><row><cell>Puma32H</cell><cell>8192</cell><cell>32</cell><cell>0.01</cell></row><row><cell>Pol</cell><cell>15000</cell><cell>49</cell><cell>0.001</cell></row><row><cell>Elevators</cell><cell>8752</cell><cell>19</cell><cell>0.001</cell></row><row><cell>Fried</cell><cell>40769</cell><cell>11</cell><cell>0.01</cell></row><row><cell>Bank8FM</cell><cell>8192</cell><cell>9</cell><cell>0.01</cell></row><row><cell>Kin8nm</cell><cell>8192</cell><cell>9</cell><cell>0.01</cell></row><row><cell cols="2">Airline 115Million</cell><cell>11</cell><cell>0.01</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5 .</head><label>5</label><figDesc>Average results from the evaluation of change detection over ten experiments.</figDesc><table><row><cell cols="2">Algorithms Delay</cell><cell>Size</cell></row><row><cell cols="3">AMRules 1484 56 (nr. Rules)</cell></row><row><cell>FIMT</cell><cell cols="2">2096 290 (nr. Leaves)</cell></row><row><cell>IBLStreams</cell><cell>-</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 .</head><label>3</label><figDesc>Results of ten-fold cross-validation for Streaming Algorithms</figDesc><table><row><cell></cell><cell cols="3">Mean absolute error (variance)</cell><cell>Root mean squared error (variance)</cell></row><row><cell>Datasets</cell><cell>AMRules u</cell><cell>FIMT</cell><cell>IBLStreams</cell><cell>AMRules u</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 .</head><label>4</label><figDesc>Results of ten-fold cross-validation for AMRules u and other Regression Algorithms</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Mean absolute error (variance)</cell><cell>Root mean squared error (variance)</cell></row><row><cell>Datasets</cell><cell>AMRules u</cell><cell>M5Rules</cell><cell>MLPerceptron LinRegression</cell><cell>AMRules u</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments:</head><p>The authors acknowledge the financial support given by the projects FCT-KDUS (PTDC/EIA/098355/2008); FCOMP-01-0124-FEDER-010053, the ERDF through the COMPETE Programme and by Portuguese National Funds through FCT within the project FCOMP-01-0124-FEDER-022701.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">IBLStreams: a system for instance-based classification and regression on data streams</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Evolving Systems</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="235" to="249" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">MOA: Massive Online Analysis</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bifet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kranen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kremer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Seidl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1601" to="1604" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">UCI Repository of machine learning databases</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Blake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Merz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Olshen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Stone</surname></persName>
		</author>
		<title level="m">Classification and Regression Trees</title>
				<meeting><address><addrLine>Monterey, CA</addrLine></address></meeting>
		<imprint>
			<publisher>Wadsworth and Brooks</publisher>
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Using model trees for classification</title>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Inglis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="63" to="76" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Foundations of Rule Learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fürnkranz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gamberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lavrač</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Knowledge Discovery from Data Streams</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gama</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>CRC Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Issues in evaluation of stream learning algorithms</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sebastião</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">P</forename><surname>Rodrigues</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD &apos;09</title>
				<meeting>the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="329" to="338" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The weka data mining software: an update</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Reutemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explor</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="10" to="18" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Learning model trees from evolving data streams</title>
		<author>
			<persName><forename type="first">E</forename><surname>Ikonomovska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dzeroski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Min. Knowl. Discov</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="128" to="168" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Test of Page-Hinckley, an approach for fault detection in an agro-alimentary production system</title>
		<author>
			<persName><forename type="first">H</forename><surname>Mouss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mouss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mouss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sefouhi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Asian Control Conference</title>
				<meeting>the Asian Control Conference</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="815" to="818" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Continuous inspection schemes</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Page</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biometrika</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="100" to="115" />
			<date type="published" when="1954">1954</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Incremental learning of linear model trees</title>
		<author>
			<persName><forename type="first">D</forename><surname>Potts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sammut</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">1-3</biblScope>
			<biblScope unit="page" from="5" to="48" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning with continuous classes</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Australian Joint Conference for Artificial Intelligence</title>
				<imprint>
			<publisher>World Scientific</publisher>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="343" to="348" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Combining instance-based and model-based learning</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1993">1993</date>
			<publisher>Morgan Kaufmann</publisher>
			<biblScope unit="page" from="236" to="243" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">A study of cross-validation and bootstrap for accuracy estimation and model selection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="1137" to="1143" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Probability inequalities for sums of bounded random variables</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hoeffding</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">301</biblScope>
			<biblScope unit="page" from="13" to="30" />
			<date type="published" when="1963">1963</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Rule-based machine learning methods for functional prediction</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Indurkhya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="383" to="403" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Willmott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Matsuura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Climate Research</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="79" to="82" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
