<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Classification of Imbalanced Spatial Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alina Lazar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bradley A. Shellito</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Information Systems, Youngstown State University</institution>
          ,
          <addr-line>Youngstown, OH 44555</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Geography, Youngstown State University</institution>
          ,
          <addr-line>Youngstown, OH 44555</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2005</year>
      </pub-date>
      <abstract>
        <p>This paper describes a method for improving the prediction of urbanization. The four datasets used in this study were extracted using Geographical Information Systems (GIS). Each dataset contains seven independent variables related to urban development and a class label that distinguishes urban areas from rural areas. Two classification methods, Support Vector Machines (SVM) and Neural Networks (NN), were used in previous studies to perform this two-class classification task. The previous results achieved high accuracy but low sensitivity, because the datasets are imbalanced. Of the several ways to deal with imbalanced data, this study compares two sampling methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The aim of this paper is to show that class imbalance has a
strong impact on the performance of binary
classification algorithms. Most machine learning
algorithms produce models that perform better when
trained on balanced training datasets. However, many
real-world datasets from domains such as medical
diagnosis, document classification, and fraud and intrusion
detection are highly imbalanced, with the positive class
being the minority class.</p>
      <p>In general, classification algorithms are designed to
optimize overall accuracy. However, for
imbalanced data, high accuracy does not imply that most
examples from the minority class were correctly classified.
Therefore, additional performance measures such as recall,
F-measure, g-means and AUC should be included when
studying imbalanced problems.</p>
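      <p>As a worked illustration (a hypothetical trivial classifier, not a result from this study), take a dataset with 12.5 majority-class instances for every minority-class instance and a classifier that assigns every instance to the majority class:
        <disp-formula><tex-math>\text{accuracy} = \frac{12.5}{12.5 + 1} \approx 0.926, \qquad \text{recall} = \frac{0}{0 + 1} = 0, \qquad \text{g-means} = \sqrt{0 \times 1} = 0</tex-math></disp-formula>
        The accuracy is above 92% even though not a single minority-class instance is recognized.</p>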
      <p>
        One common approach to solve the imbalance problem is
to sample the data to build an equally distributed training
dataset. Several sampling techniques were proposed and
analyzed in the literature (Van Hulse, Khoshgoftaar, and
Napolitano 2007) including random under-sampling,
random over-sampling and more intelligent sampling
techniques. A second class of methods is cost-sensitive: it
assigns different penalties to misclassified instances,
depending on their true class. The difficulty with this type
of method is choosing good misclassification costs. The
last type of methods is the algorithm-based approach,
which modifies the classifier itself to accommodate
imbalanced datasets. Algorithm-based methods use
meta-learning
        <xref ref-type="bibr" rid="ref5">(Liu, An, and Huang 2006, Zhu 2007)</xref>
        or
online active learning
        <xref ref-type="bibr" rid="ref5">(Ertekin et al. 2007)</xref>
        to build better
classifiers. Combinations of these approaches have also
been reported.
      </p>
      <p>
        Real-world imbalanced datasets come from diverse
application areas like medical diagnosis, fraud detection,
intrusion detection, gene profiling, and object detection
from satellite images
        <xref ref-type="bibr" rid="ref8">(Kubat, Holte, and Matwin 1998)</xref>
        .
Our study investigates the effect of two sampling
techniques applied to four large GIS datasets with
imbalance ratios between 2.4 and 12.5. Each of the four
datasets contains over a million instances, so there is no
need for over-sampling; moreover, over-sampling can
introduce excessive noise and ambiguity.
Instead, the sampling methods considered were random
stratified sampling, under-sampling and Wilson’s editing.
      </p>
      <p>
        SVM and NN have been used in previous studies to predict
urbanization and land cover, with similar results but
different prediction patterns
        <xref ref-type="bibr" rid="ref9">(Lazar and Shellito 2005,
Shellito and Lazar 2005)</xref>
        . Although the standard SVM does not
provide a mechanism for dealing with imbalanced data, it can
be easily modified. SVM builds the decision boundary from a
limited number of instances that are close to the boundary
and is unaffected by instances far away from it.
This observation can be used as an active learning selection
strategy that provides a balanced training set for the early
training stages of the SVM algorithm
        <xref ref-type="bibr" rid="ref5">(Ertekin et al. 2007)</xref>
        .
In the Background section we summarize related studies
that deal with the problem of imbalanced datasets. The
sections Support Vector Machines and Multi-layer
Perceptron Neural Networks present the methods used, while
the section describing our experiments presents a comparison
between random sampling, under-sampling and Wilson’s editing.
The last section presents the conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Previous research
        <xref ref-type="bibr" rid="ref3 ref9">(Lazar and Shellito 2005, Pijanowski et
al. 2005, Pijanowski et al. 2002, Pijanowski et al. 2001,
Shellito and Lazar 2005, Shellito and Pijanowski 2003)</xref>
        has
shown that classification methods such as Support Vector
Machines (SVM) and Neural Networks (NN) can be
used successfully to predict patterns of urbanization in
large datasets. SVM and NN serve as predictive tools that
determine whether grid cells can be accurately classified
as urban or non-urban. Their predictive performance can be
measured with standard accuracy as well as other metrics.
The dataset generated for Mahoning County had over
1,000,000 instances and an imbalance ratio of
approximately 5:1. Although the accuracy of both SVM and
NN was over 90%, the recall was low, only 55%.
      </p>
      <p>
        Recently, several studies have examined imbalanced datasets
and their effect on classification performance; however, none
of them included datasets with over a million instances.
Extensive experimental results using several sampling
techniques combined with several classification methods,
applied to several datasets, were reported by Van Hulse,
Khoshgoftaar, and Napolitano (2007). The sampling
techniques considered were random minority
over-sampling, random majority under-sampling, one-sided
selection, Wilson’s editing, SMOTE
        <xref ref-type="bibr" rid="ref1 ref7">(Akbani, Kwek, and
Japkowicz 2004)</xref>
        , borderline SMOTE and cluster-based
over-sampling. They concluded that some of the more
complex sampling techniques, especially one-sided
selection and cluster-based over-sampling, perform worse
than some of the simpler sampling techniques.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Support Vector Machines</title>
      <p>Support vector machines, proposed by Vapnik (1999),
consist of two important steps. First, the kernel, i.e. the dot
product of the data points in a feature space, is computed.
Second, a hyperplane learning algorithm is applied to the
kernel values.</p>
      <p>Let (x<sub>i</sub>, y<sub>i</sub>), i = 1,…,l, be the training examples,
where each input instance x<sub>i</sub> ∈ R<sup>N</sup> is associated with a class
label y<sub>i</sub> ∈ {−1, 1} for a binary classification task. To find a
linear separating hyperplane with good generalization
ability, the set of hyperplanes 〈w, x〉 + b = 0 is considered.
The optimal hyperplane is the one that maximizes the distance
between the hyperplane and the closest input data points; it
is the solution of the following problem:
        <disp-formula><label>(1)</label><tex-math>\min_{w \in \mathbb{R}^N,\ b \in \mathbb{R}} \ \tau(w) = \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\left(\langle w, x_i \rangle + b\right) \geq 1, \quad i = 1, \ldots, l</tex-math></disp-formula>
      </p>
      <p>One challenge is that in practice an ideal separating
hyperplane may not exist because of a large overlap between
the input data points of the two classes. To make the
algorithm more flexible, noise (slack) variables ε<sub>i</sub> ≥ 0,
i = 1,…,l, are introduced into the objective function as follows:
        <disp-formula><label>(2)</label><tex-math>\min_{w \in \mathbb{R}^N,\ b \in \mathbb{R},\ \varepsilon \in \mathbb{R}^l} \ \tau(w, \varepsilon) = \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{l} \varepsilon_i \quad \text{subject to} \quad y_i\left(\langle w, x_i \rangle + b\right) \geq 1 - \varepsilon_i, \quad \varepsilon_i \geq 0, \quad i = 1, \ldots, l</tex-math></disp-formula>
      </p>
      <p>Using Lagrange multipliers α<sub>i</sub>, the previous problem can be
reformulated as the following dual problem, a convex quadratic
program (Liu, An, and Huang 2006), where in the linear case
K(x<sub>i</sub>, x<sub>j</sub>) = 〈x<sub>i</sub>, x<sub>j</sub>〉:
        <disp-formula><label>(3)</label><tex-math>\max_{\alpha \in \mathbb{R}^l} \ W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to} \quad 0 \leq \alpha_i \leq C, \ i = 1, \ldots, l, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0</tex-math></disp-formula>
        Here the positive constant C controls the trade-off between
maximizing the margin (minimizing ‖w‖<sup>2</sup> in (2)) and minimizing
the training error Σε<sub>i</sub>.</p>
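      <p>To make the dual problem concrete, the sketch below evaluates W(α) from (3) for a given α and checks its two constraints. It does not solve the optimization problem (a dedicated solver such as libSVM does that); the function names and the two-point toy example with a linear kernel are illustrative assumptions.</p>
      <preformat preformat-type="code">
```python
def dual_objective(alphas, labels, gram):
    """W(alpha) from Eq. (3): sum_i alpha_i - 0.5 * sum_ij y_i y_j alpha_i alpha_j K_ij."""
    n = len(alphas)
    quadratic = sum(
        labels[i] * labels[j] * alphas[i] * alphas[j] * gram[i][j]
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic


def is_dual_feasible(alphas, labels, C, tol=1e-9):
    """Check the box constraints alpha_i in [0, C] and sum_i alpha_i * y_i = 0."""
    in_box = all(a >= -tol and C + tol >= a for a in alphas)
    balanced = tol >= abs(sum(a * y for a, y in zip(alphas, labels)))
    return in_box and balanced


# Hypothetical toy problem: two points, linear kernel K(u, v) = <u, v>.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
gram = [[sum(p * q for p, q in zip(u, v)) for v in points] for u in points]
alphas = [0.25, 0.25]

print(dual_objective(alphas, labels, gram))     # 0.25
print(is_dual_feasible(alphas, labels, C=1.0))  # True
```
      </preformat>
      <p>For this toy problem the optimal hyperplane has w = (0.5, 0.5) and b = 0, so the primal value ½‖w‖<sup>2</sup> in (1) is also 0.25: at the optimum the primal and dual objective values coincide.</p>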
      <p>From the optimal hyperplane, the decision function for
classification can be derived. An unknown instance x is
classified according to
        <disp-formula><label>(4)</label><tex-math>f(x) = \operatorname{sign}\left(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\right)</tex-math></disp-formula>
        The argument of the sign function is proportional to the
signed distance of the unknown instance from the hyperplane.</p>
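      <p>The decision function (4) can be sketched directly as a weighted kernel sum over the support vectors. The function names, the linear kernel and the toy support vectors with α = 0.25 and b = 0 below are hypothetical; mapping a score of exactly 0 to +1 is an arbitrary tie-breaking choice of this sketch.</p>
      <preformat preformat-type="code">
```python
def decision_function(x, support_vectors, labels, alphas, b, kernel):
    """Eq. (4): f(x) = sign(sum_i y_i * alpha_i * K(x_i, x) + b).

    Returns the predicted label (+1 / -1) and the raw score inside sign(.).
    A score of exactly 0 is mapped to +1 here (arbitrary tie-breaking).
    """
    score = sum(
        y_i * a_i * kernel(x_i, x)
        for x_i, y_i, a_i in zip(support_vectors, labels, alphas)
    ) + b
    return (1 if score >= 0 else -1), score


def linear_kernel(u, v):
    """Plain dot product, i.e. the linear kernel."""
    return sum(p * q for p, q in zip(u, v))


# Hypothetical toy model: two support vectors with alpha = 0.25 each, b = 0.
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]

print(decision_function((2.0, 0.0), svs, ys, alphas, 0.0, linear_kernel))   # (1, 1.0)
print(decision_function((-3.0, 1.0), svs, ys, alphas, 0.0, linear_kernel))  # (-1, -1.0)
```
      </preformat>
      <p>With a linear kernel this model corresponds to w = (0.5, 0.5); dividing the score by ‖w‖ ≈ 0.707 gives the signed Euclidean distance of x from the hyperplane, which is the geometric reading of (4).</p>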
      <p>The method described so far works well on linear
problems. The kernel function K in (4) allows the method to
achieve good results on nonlinear decision problems as well: it
computes the dot product of the input points after they have
been mapped into a new, higher-dimensional feature space.</p>
      <p>Formally, with φ denoting this mapping, the kernel is defined as
        <disp-formula><label>(5)</label><tex-math>K : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}, \qquad K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle</tex-math></disp-formula>
        Polynomial, radial basis function (RBF) and sigmoid kernels
are common choices; they often yield similar accuracy and can
be tuned by changing the values of their parameters. There is no
general method for choosing the best kernel function. The results
reported in this paper were obtained using the following
radial basis function kernel (Schölkopf and Smola 2002):
        <disp-formula><tex-math>K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\gamma^2}\right)</tex-math></disp-formula>
      </p>
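      <p>libSVM, which the experiments use, parameterizes the RBF kernel as exp(−g‖u − v‖<sup>2</sup>) rather than with the 2γ<sup>2</sup> denominator of the formula above, so a “gamma” value passed to libSVM is not the same quantity as γ here. The sketch below shows both forms and the conversion g = 1/(2γ<sup>2</sup>); the function names are hypothetical, and which parameterization the reported gamma values refer to is an assumption left to the reader.</p>
      <preformat preformat-type="code">
```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def rbf_kernel(u, v, gamma):
    """RBF kernel in the form written above: exp(-||u - v||^2 / (2 * gamma^2))."""
    return math.exp(-squared_distance(u, v) / (2.0 * gamma ** 2))


def rbf_kernel_libsvm(u, v, g):
    """libSVM's parameterization of the RBF kernel: exp(-g * ||u - v||^2)."""
    return math.exp(-g * squared_distance(u, v))


def to_libsvm_gamma(gamma):
    """Map the width gamma of the formula above to libSVM's g = 1 / (2 * gamma^2)."""
    return 1.0 / (2.0 * gamma ** 2)


u, v = (0.0, 0.0), (1.0, 1.0)
print(rbf_kernel(u, v, gamma=1.0))                      # exp(-1), about 0.3679
print(rbf_kernel_libsvm(u, v, g=to_libsvm_gamma(1.0)))  # same value
```
      </preformat>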
    </sec>
    <sec id="sec-4">
      <title>Multi-layer Perceptron Neural Networks</title>
      <p>
        The multi-layer perceptron (MLP)
        <xref ref-type="bibr" rid="ref4">(Witten and Frank
2000)</xref>
        is a popular technique because of its well-known
ability to perform arbitrary mappings, not only
classifications. Usually built from three or four layers of
neurons (an input layer, one or two hidden layers and an output
layer), the network can be trained to approximate
almost any input-output function. The back-propagation
algorithm used for training adjusts the synaptic
weights of the neurons according to the error at the
output. In the forward pass, the predicted
outputs are calculated from the input values and the
network weights. Afterwards, in the backward pass, the
partial derivatives of the cost function are propagated back
through the network and the weights are adjusted
accordingly.
      </p>
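      <p>The forward and backward passes described above can be sketched for a tiny network with one sigmoid hidden layer, one sigmoid output and a squared-error cost. This is a minimal pure-Python illustration, not the Weka implementation used in the experiments; the class name, layer sizes, learning rate and weight initialization are hypothetical choices.</p>
      <preformat preformat-type="code">
```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Inputs -> sigmoid hidden layer -> single sigmoid output, cost E = 0.5 * (out - target)^2."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Forward pass: hidden activations, then the predicted output.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def train_step(self, x, target, lr=0.5):
        # Backward pass: propagate dE/dz from the output back to the hidden layer.
        h, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        # Gradient-descent updates (deltas were computed with the old weights).
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_h[j] * xi
            self.b1[j] -= lr * delta_h[j]
        return 0.5 * (out - target) ** 2  # error before this update


net = TinyMLP(n_in=2, n_hidden=2)
x, target = [1.0, 0.0], 1.0
losses = [net.train_step(x, target) for _ in range(200)]
print(losses[0], losses[-1])  # the squared error shrinks as the weights are adjusted
```
      </preformat>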
      <p>A drawback of MLP methods is that their training can
converge to local minima. MLPs are also considered
“black box” models, because their internal decision process
is difficult to inspect or interpret.</p>
    </sec>
    <sec id="sec-5">
      <title>Sampling Methods</title>
      <p>Since the datasets considered have over a million instances,
we decided to investigate under-sampling (US). This
sampling technique randomly discards instances from the
majority class until the two classes are equally represented.
The other sampling method used in this study is
Wilson’s editing (WE) (Barandela et al. 2004). A k-nearest
neighbor classification procedure with k = 3 is used
to classify each instance in the training set using all the
remaining instances. Afterwards, all the instances from the
majority class that were misclassified are removed.</p>
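      <p>Both procedures can be sketched in plain Python for a small in-memory sample. This is an illustration under stated assumptions, not the RapidMiner operators used in the experiments: rows are tuples of numeric features, distances are Euclidean, and the k-nearest-neighbor search is brute force, which is only practical for small subsamples. The function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import random
from collections import Counter


def random_undersample(X, y, majority, seed=0):
    """US: randomly discard majority-class rows until both classes are equally represented."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    # Keep every minority row plus an equally sized random subset of majority rows.
    keep = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


def wilson_editing(X, y, majority, k=3):
    """WE: drop majority-class rows that a k-NN vote over all other rows misclassifies."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority:
            keep.append(i)  # minority rows are never removed, so they need no vote
            continue
        # Leave-one-out: rank every other row by Euclidean distance to row i.
        neighbours = sorted(
            (math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)  # correctly classified majority row survives
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): class 0 is the majority class.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]
y = [0, 0, 0, 0, 1, 1, 1, 0]
X_we, y_we = wilson_editing(X, y, majority=0)
print(y_we)  # [0, 0, 0, 0, 1, 1, 1]: the majority row at (5.5, 5.5) is removed
X_us, y_us = random_undersample(X, y, majority=0)
print(Counter(y_us))
```
      </preformat>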
    </sec>
    <sec id="sec-6">
      <title>Performance Metrics</title>
      <p>Especially in the case of imbalanced datasets,
classification accuracy alone is not the best metric to
evaluate a classifier. Several other performance metrics
can be used in order to get a more comprehensive picture
of the classifier’s capabilities.</p>
      <p>
        Recall, or sensitivity, measures the
accuracy on the positive instances and is defined as
TruePositive / (TruePositive + FalseNegative). Specificity
measures the accuracy on the negative instances and is
defined as TrueNegative / (TrueNegative +
FalsePositive). Both sensitivity and specificity are
incorporated in the g-means measure
        <xref ref-type="bibr" rid="ref5">(Ertekin et al. 2007)</xref>
        ,
which is defined as the square root of the product of
sensitivity and specificity.
      </p>
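      <p>The three definitions above translate directly into code. The sketch below computes them from confusion-matrix counts; the function name and the example counts are hypothetical, and returning 0 when a class has no instances is an assumption of this sketch.</p>
      <preformat preformat-type="code">
```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Accuracy, recall (sensitivity), specificity and g-means from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn) if tp + fn else 0.0       # TruePositive / (TruePositive + FalseNegative)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # TrueNegative / (TrueNegative + FalsePositive)
    g_means = math.sqrt(recall * specificity)         # square root of sensitivity * specificity
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "g_means": g_means}


# Hypothetical counts: 100 positive and 1,000 negative test instances.
print(imbalance_metrics(tp=55, fn=45, tn=980, fp=20))
# A classifier that never predicts the positive class: high accuracy, zero g-means.
print(imbalance_metrics(tp=0, fn=100, tn=1000, fp=0))
```
      </preformat>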
    </sec>
    <sec id="sec-7">
      <title>Datasets</title>
      <p>
        Seven broad predictor variables, which help describe
the distribution of urbanization within the counties, were
constructed using ESRI’s ArcGIS 9.2 software package.
ArcGIS supports a wide range of geospatial modeling
techniques, including cell-by-cell raster models. These
variables were chosen because they reflect large-scale factors
that influence the patterns of urbanization and general
urban trends for the region, and because they are similar to GIS
variables used for urban modeling within the Midwest
        <xref ref-type="bibr" rid="ref3 ref9">(Pijanowski et al. 2005, Pijanowski et al. 2002, Pijanowski 2001,
Shellito and Pijanowski 2003)</xref>
        . The variables constructed
were:
        <list list-type="alpha-lower">
          <list-item><p>Distance to City Centers</p></list-item>
          <list-item><p>Distance to Highways</p></list-item>
          <list-item><p>Distance to Interstates</p></list-item>
          <list-item><p>Distance to Railroads</p></list-item>
          <list-item><p>Distance to Lakes</p></list-item>
          <list-item><p>Distance to Rivers</p></list-item>
          <list-item><p>Density of Agriculture</p></list-item>
        </list>
        For each county, a series of base layers was compiled to
build the variables. The NLCD (National Land Cover
Database) 2001 data was used to locate urban areas
and as a source of agricultural data. Base layers for
highways, interstates, and railways were drawn from US
Census 2000 TIGER files. Lakes and rivers data were
derived from Ohio Department of Transportation (ODOT)
data. All base layers were projected into the UTM
(Universal Transverse Mercator) projection and used to
develop the predictor variables in raster format at 30 m
resolution. Distance variables were created by calculating
the Euclidean distance of each cell from the closest feature
in the base layers. The density variable was constructed by
using a 3x3 moving window neighborhood operation and
summing the number of base layer grid cells in the
neighborhood. Urban land was identified by selecting all
grid cells with the “developed” classification in the NLCD
dataset.
      </p>
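      <p>The two raster operations described above can be sketched on a small grid. This is a brute-force illustration, not the ArcGIS implementation; it assumes a binary base-layer raster (1 = feature or agricultural cell), square 30 m cells, distances measured between cell centers, and treats cells outside the raster as 0 in the 3x3 window. The function names and the edge handling are assumptions.</p>
      <preformat preformat-type="code">
```python
import math


def euclidean_distance_raster(features, cell_size=30.0):
    """Distance (map units) from every cell center to the nearest feature cell (value 1).

    Brute force over all feature cells; assumes at least one feature cell exists.
    """
    rows, cols = len(features), len(features[0])
    sources = [(r, c) for r in range(rows) for c in range(cols) if features[r][c]]
    return [[cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
             for c in range(cols)] for r in range(rows)]


def focal_sum_3x3(grid):
    """Sum of each cell's 3x3 neighborhood; cells outside the raster count as 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = sum(grid[rr][cc]
                            for rr in range(max(r - 1, 0), min(r + 2, rows))
                            for cc in range(max(c - 1, 0), min(c + 2, cols)))
    return out


features = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]     # toy base layer, e.g. one highway cell
print(euclidean_distance_raster(features)[0])   # first row, in meters
agriculture = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]  # toy agricultural cells
print(focal_sum_3x3(agriculture))
```
      </preformat>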
      <p>Predictor variables for each county were constructed by
incorporating data from its bordering Ohio counties, to
simulate the influence of nearby spatial factors outside the
county borders (for instance, the proximity of a city
center in a bordering county could potentially affect the
urban development within the target county). The resulting
predictor variables created at this multi-county level were
then clipped to the boundaries of the chosen county
and used in the analysis.</p>
      <p>
        This type of data was extracted for four counties in the
state of Ohio: Delaware, Holmes, Mahoning and Medina.
Each of the four resulting datasets contains more than a million
instances. Table 1 shows, for each county dataset, the number
of instances in the positive class, the number of instances in
the negative class, and the ratio of negative to positive
instances. All datasets are imbalanced, with ratios ranging
from 2.4:1 for Mahoning County to 12.5:1 for Holmes County.
      </p>
      <p>
        For the first set of experiments we used two classifiers, the
SVM and the Multi-Layer Perceptron (MLP). We used the
libSVM
        <xref ref-type="bibr" rid="ref3">(Chang and Lin 2001)</xref>
        software to run the
parameter search, training and testing for SVM, and
Weka for the MLP. The experiments were similar to the
experiments reported in
        <xref ref-type="bibr" rid="ref9">(Lazar and Shellito 2005, Shellito
and Lazar 2005)</xref>
        for Mahoning County.
      </p>
      <p>Random stratified sampling, which maintains the ratio of
positive to negative instances in the datasets, was used
to generate datasets of 10,000 instances for the parameter
search and datasets of 50,000 instances for training.</p>
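      <p>Random stratified sampling can be sketched by drawing from each class separately, with a quota proportional to the class frequency. The function name and seed handling are hypothetical; because the per-class quotas are rounded, the returned sample can differ from the requested size by one or two instances.</p>
      <preformat preformat-type="code">
```python
import random
from collections import defaultdict


def stratified_sample(y, n, seed=0):
    """Return sorted row indices of a random sample of about n rows that keeps y's class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    chosen = []
    for idx in by_class.values():
        quota = round(n * len(idx) / len(y))   # class share of the sample, rounded
        chosen.extend(rng.sample(idx, quota))  # sample without replacement within the class
    return sorted(chosen)


# Toy labels with a 9:1 ratio (hypothetical); the sample keeps the same ratio.
labels = [0] * 900 + [1] * 100
sample = stratified_sample(labels, 100)
print(len(sample), sum(labels[i] for i in sample))  # 100 10
```
      </preformat>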
      <p>A grid parameter search was performed for the SVM
classifier; the selected values of the two parameters C and
gamma are listed in Table 2.</p>
      <p>Next, both classifiers, SVM and MLP, were trained on the
50,000-instance datasets, and the resulting models were
tested on the entire datasets. The results are
reported in Table 3. For each dataset and each classifier
(SVM and MLP), three performance metrics are listed:
accuracy, recall and g-means.
The results show that, although SVM has higher accuracy on
three of the datasets, MLP has higher recall, and therefore
classifies the positive instances better, on three of the
datasets. Recall is highest for the Mahoning
County dataset, which also has the lowest imbalance ratio.
The low recall values for the other three datasets show
that we need ways to better classify the
instances from the positive class. Experiments using
different sampling techniques are reported in the next
section.</p>
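      <p>A grid parameter search can be sketched as an exhaustive loop over exponentially spaced values of C and gamma. The grid bounds below (C = 2^-5, 2^-3, …, 2^15 and gamma = 2^-15, 2^-13, …, 2^3) are the coarse grid suggested in the libSVM authors’ practical guide, not the values used in this study, and score_fn is a hypothetical placeholder for whatever validation score is maximized, for example the cross-validated accuracy or g-means on the 10,000-instance sample.</p>
      <preformat preformat-type="code">
```python
import itertools
import math


def grid_search(score_fn, log2_C=range(-5, 16, 2), log2_gamma=range(-15, 4, 2)):
    """Exhaustive search over an exponential (C, gamma) grid; returns (best_score, C, gamma).

    score_fn(C, gamma) is a caller-supplied (hypothetical) function, e.g. the
    cross-validated score of an SVM trained with those parameters.
    """
    best = None
    for lc, lg in itertools.product(log2_C, log2_gamma):
        C, gamma = 2.0 ** lc, 2.0 ** lg
        score = score_fn(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best


def toy_score(C, gamma):
    # Synthetic stand-in for a validation score, peaking at C = 2**3, gamma = 2**-5.
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


best_score, best_C, best_gamma = grid_search(toy_score)
print(best_C, best_gamma)  # 8.0 0.03125
```
      </preformat>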
    </sec>
    <sec id="sec-8">
      <title>Experiments</title>
      <p>We ran experiments using RapidMiner (Mierswa et al.
2006) on the four datasets, Delaware, Holmes, Mahoning
and Medina, as follows. For each dataset we performed 5
runs of five-fold cross-validation with the libSVM
software, using the RBF kernel. The two parameters C
and gamma were set to the values previously found by
the grid parameter search.
Three sampling techniques were used: random stratified
sampling (RS), equal under-sampling (US) and Wilson’s
editing (WE). Each experiment was iterated
over subsample datasets with sizes between 100 and
5000 instances, in steps of 100.</p>
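      <p>The evaluation protocol (5 runs of five-fold cross-validation, repeated for each subsample size from 100 to 5000 in steps of 100) can be sketched as index generation. The text does not state how the folds were formed, for example whether they were stratified, so this sketch assumes plain random folds; it is not RapidMiner’s cross-validation operator, and the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
import random


def repeated_kfold(n, k=5, runs=5, seed=0):
    """Yield (run, fold, train_idx, test_idx) for `runs` repetitions of k-fold cross-validation."""
    rng = random.Random(seed)
    for run in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)                       # a fresh random partition for every run
        folds = [idx[f::k] for f in range(k)]  # k disjoint folds of (nearly) equal size
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield run, f, train_idx, test_idx


# Subsample sizes iterated in the experiments: 100, 200, ..., 5000.
subsample_sizes = list(range(100, 5001, 100))

for size in subsample_sizes[:2]:                 # first two sizes only, for brevity
    splits = list(repeated_kfold(size))
    print(size, len(splits), len(splits[0][3]))  # 25 train/test splits per subsample size
```
      </preformat>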
      <p>Due to space limitations, results are shown for only two
counties, Holmes and Medina. Holmes County has the
highest imbalance ratio, approximately 12.5, and
Medina County has an imbalance ratio of 4.3.</p>
      <p>All four figures show that both under-sampling and
Wilson’s editing have a strong influence on the
classification performance of the SVM learner. As
accuracy is not informative for imbalanced datasets,
we looked at recall and g-means. Wilson’s editing
worked only slightly better than equal under-sampling,
but required extensive preprocessing. The largest
difference in performance can be seen in Figure 1, for the
recall on the Holmes County dataset.</p>
    </sec>
    <sec id="sec-9">
      <title>Conclusions</title>
      <p>We have presented an experimental analysis performed on
large imbalanced datasets extracted with GIS. The goal was to
find which sampling techniques improve classification
performance, especially for the minority class. For
imbalanced datasets, it is important to compare additional
performance measures such as recall and g-means in
addition to the usual accuracy. We conclude
that both equal under-sampling and Wilson’s editing work
better than simple random stratified sampling, but that there
is no substantial difference between the two.</p>
      <p>Further research may investigate how other learners, such as
Neural Networks or Decision Trees, perform with
under-sampling and Wilson’s editing. Over-sampling,
cost-sensitive learning and meta-learners are other
alternatives that could be used to improve the performance
on our datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Akbani, R.; Kwek, S.; and Japkowicz, N. 2004. Applying support vector machines to imbalanced datasets. In Proceedings of the European Conference on Machine Learning, 39-50. Pisa, Italy: Springer-Verlag.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Barandela et al. 2004. The Imbalanced Training Sample Problem: Under or Over Sampling? In Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition (SSPR/SPR'04), Lecture Notes in Computer Science 3138: 806-814.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Chang, C.-C.; and Lin, C.-J. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. Last accessed 01/15/2011.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Cristianini, N.; and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, England: Cambridge University Press.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Ertekin, S.; Huang, J.; Bottou, L.; and Giles, C. L. 2007. Learning on the border: active learning in imbalanced data classification. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07), 127-136. Lisbon, Portugal, November 6-10, 2007.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>Koggalage, R.; and Halgamuge, S. 2004. Reducing the Number of Training Samples for Fast Support Vector Machine Classification. Neural Information Processing - Letters and Reviews 2(3): 57-65.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>Kubat, M.; Holte, R. C.; and Matwin, S. 1998. Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning 30(2-3): 195-215.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>Lazar, A.; and Shellito, B. A. 2005. Comparing Machine Learning Classification Schemes - a GIS Approach. In Proceedings of</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>Liu, Y.; An, A.; and Huang, X. 2006. Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensembles. Lecture Notes in Artificial Intelligence 3918: 107-118.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>Mierswa, I.; Wurst, M.; Klinkenberg, R.; Scholz, M.; and Euler, T. 2006. YALE: Rapid Prototyping for Complex Data Mining Tasks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD06).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>Pijanowski, B.; Pithadia, S.; Shellito, B. A.; and Alexandridis, K. 2005. Calibrating a neural network based urban change model for two metropolitan areas of the upper Midwest of the United States. International Journal of Geographical Information Science 19(2): 197-216.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>Pijanowski, B.; Brown, D.; Shellito, B. A.; and Manik, G. 2002. Use of Neural Networks and GIS to Predict Land Use Change. Computers, Environment, and Urban Systems 26(6): 553-575.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>Pijanowski, B.; Shellito, B. A.; Bauer, M.; and Sawaya, K. 2001. Using GIS, Artificial Neural Networks and Remote Sensing to Model Urban Change in the Minneapolis-St. Paul and Detroit Metropolitan Areas. In Proceedings of the ASPRS Annual Conference, St. Louis, MO.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>Schölkopf, B.; and Smola, A. 2002. Learning with Kernels. Cambridge, Massachusetts: MIT Press.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>Shellito, B. A.; and Lazar, A. 2005. Applying Support Vector Machines and GIS to Urban Pattern Recognition. In Papers of the Applied Geography Conferences, volume 28.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>Shellito, B. A.; and Pijanowski, B. 2003. Using Neural Nets to Model the Spatial Distribution of Seasonal Homes. Cartography and Geographic Information Science 30(3): 281-290.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>