<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Instance space analysis for unsupervised outlier detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sevvandi Kandanaarachchi</string-name>
          <email>sevvandi.kandanaarachchi@monash.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario A. Muñoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kate Smith-Miles</string-name>
          <email>smith-miles@unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Business and Economics, Monash University</institution>
          ,
          <addr-line>Clayton VIC 3800</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The success of various methods for unsupervised outlier detection depends on how well their definition of an outlier matches the dataset properties. Given that each definition, and hence each method, has strengths and weaknesses, measuring those properties could help us to predict the most suitable method for the dataset at hand. In this paper, we construct and validate a set of meta-features that measure such properties. We then conduct the first instance space analysis for unsupervised outlier detection methods based on the meta-features. The analysis provides insights into the methods' strengths and weaknesses, and facilitates the recommendation of an appropriate method with good accuracy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2 Experimental setting</title>
      <p>To demonstrate that an outlier detection method has regions
of “good” performance within the instance space, we define
an experimental setting including: (a) a portfolio of methods,
(b) a collection of instances, and (c) a definition of good
performance based on an evaluation metric. In the next
sections, we describe the details of each component of our
setting. We note that the methodology would not change
even if the experimental setting does.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>The applications of outlier detection include taks as diverse
as identifying fraudulent credit card transactions, and
recognising terrorist plots in social media. The significance of
these applications has spurred recent research efforts,
leading to a steady growth in the number of detection methods
[1], with a variety of definitions of an outlier. As a result, it
is not uncommon for the methods to disagree on the number
and location of the outliers. It therefore becomes challenging
to identify the most suitable outlier detection method for the
problem at hand. Indeed, it is unlikely that a single method
outperforms all others in all applications [2, 3]. Therefore,
on a new problem, our focus should be on understanding the
suitability of various methods, by exploiting the experience
gained through solving similar problems.</p>
      <p>This paper extends the computational study of Campos
et al. [4] by making the first exploration of algorithm
selection for outlier detection methods. Using a
comprehensive set of meta-features, i.e., measures of dataset
properties, we are able to predict the suitability of a method for a
given problem. Moreover, we perform the first instance
space analysis [5, 6, 7], which helps us gain further
understanding of the comparative strengths and weaknesses of an
outlier detection method, more nuanced than a summary
table of statistical results can offer. As a resource for the
research community, we make available our repository of more
than 12000 datasets [8]. Moreover, we provide the R
package outselect [9], containing the code to calculate all the
meta-features, as well as the knowledge base used to train the
instance space model. Hence, outselect facilitates the
end-to-end evaluation for a new dataset.</p>
      <p>k = min(⌊N/20⌋, 100)   (2.1)
where N is the number of observations in the dataset. The
maximum of k = 100 limits the number of computations that
can result from a large dataset. The motivation for choosing
⌊N/20⌋ is as follows: choosing k = N would result in
a very large neighbourhood making some methods such as
KNN and LOF not discriminate enough between outliers and
non-outliers. On the other hand, a very small value of k such
as k = 1 may miss small clusters of outliers, as within that
small cluster the density is high and distances are low. As
such, we choose a small value of k that is not too small.</p>
      <p>2.2 Datasets Our benchmark set contains approximately
12000 datasets, most of which were generated by following
the approach of [4, 23]. First, we take one of 170 datasets
[24] which originally represented a classification problem
sourced from the UCI Machine Learning Repository. Then,
we generate a new dataset by down-sampling one of the
classes. We repeat this procedure for all of the classes,
labelling the down-sampled class as the outliers. In addition,
we also use datasets from [4, 23] in which the outlier class is
down-sampled at rates of 5% and 2%. Moreover, to guarantee
that the new datasets are fit for outlier detection
benchmarking, we carry out the following tasks:</p>
      <sec id="sec-2-1">
        <title>Down-sampling</title>
        <p>For a dataset with K classes, each class in turn is
designated the outlier class and observations belonging to
that class are down-sampled such that the outlier percentage
p ∈ {2, 5}. The down-sampling is carried out randomly
10 times to produce 10 variants for each value of p.</p>
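        <p>The down-sampling step can be sketched as follows. This is an illustrative Python helper of our own (the paper's pipeline lives in the R package outselect): it keeps all observations outside the designated outlier class, then samples just enough members of that class to reach a target fraction p of the final dataset.</p>

```python
import random

def downsample_class(X, y, outlier_class, p=0.05, seed=0):
    # Keep every observation outside `outlier_class`, then sample just
    # enough members of `outlier_class` that they form a fraction `p`
    # of the resulting dataset. Returned labels: 1 = outlier, 0 = normal.
    rng = random.Random(seed)
    normal = [x for x, lab in zip(X, y) if lab != outlier_class]
    candidates = [x for x, lab in zip(X, y) if lab == outlier_class]
    # Solve n_out / (len(normal) + n_out) = p for n_out.
    n_out = int(round(p * len(normal) / (1 - p)))
    outliers = rng.sample(candidates, min(n_out, len(candidates)))
    return normal + outliers, [0] * len(normal) + [1] * len(outliers)
```

        <p>Repeating this with 10 random seeds per class and per value of p yields the 10 variants described above.</p>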
      </sec>
      <sec id="sec-2-2">
        <title>Categorical variables</title>
        <p>From each down-sampled dataset, we create one dataset
with categorical variables removed, and one with categorical
variables converted to numerical values using the method of
inverse data frequency [4].</p>
      </sec>
      <sec id="sec-2-3">
        <title>Duplicate observations</title>
        <p>Since the distance between duplicates is zero, which in
turn leads to infinite densities, duplicate observations are
removed to avoid the numerical instability caused by
division-by-zero errors.</p>
        <p>Missing values For each attribute in each dataset, the
number of missing values is computed. Similar to [4], if an
attribute has fewer than 10% missing values, the observations
containing the missing values are removed; otherwise, the
attribute is removed.</p>
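        <p>The duplicate-removal and missing-value rules above amount to a short procedure. A sketch in Python, with our own helper name, where a missing value is represented by None:</p>

```python
def clean_dataset(rows):
    # Drop duplicate observations first: duplicates give zero distances
    # and hence infinite density estimates.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Missing-value rule: drop an attribute if >= 10% of its values are
    # missing; otherwise drop the observations with missing values.
    n = len(unique)
    n_attrs = len(unique[0])
    keep = [sum(r[j] is None for r in unique) < 0.10 * n for j in range(n_attrs)]
    cleaned = []
    for r in unique:
        vals = [v for v, k in zip(r, keep) if k]
        if all(v is not None for v in vals):
            cleaned.append(vals)
    return cleaned
```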
      </sec>
      <sec id="sec-2-4">
        <title>2.3 Evaluation metric</title>
        <p>While outlier detection is considered an unsupervised learning task, assessing algorithm
performance requires a ground truth given by the outlier labels
[4]. This makes it possible to define an evaluation metric.
Although no single universally accepted metric exists, the
most popular ones in outlier detection are: (a) the Area
Under the Receiver Operating Characteristic curve (AUC), and
(b) the Precision-Recall curve. Other metrics include
precision at n [25], average precision [26], and a combination of
positive and negative predictive values. In our study we use
AUC as the evaluation metric, and define good performance
of a method on a dataset as AUC &gt; 0.8. Alternative
definitions can be considered within the methodology, however
a detailed study on the impact of this choice is beyond the
scope of this proof-of-concept methodological paper.</p>
        <p>2.4 Meta-features Given that our outlier detection
datasets have been generated by down-sampling
classification datasets, we are able to borrow many of the features
that have been used to summarize interesting properties of
classification datasets, and then add some additional features
that are unique to the outlier detection challenge. We have
categorized the features in two main classes, with the first
being generic features which measure various aspects of
a dataset but are not particularly tailored towards outlier
detection. While they are broadly relevant, they offer little
insight into the outlier structure of a dataset.</p>
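        <p>The performance criterion of Section 2.3 is straightforward to compute from outlier scores and ground-truth labels. A minimal sketch with our own helper names, using the rank-sum identity that AUC equals the probability a randomly chosen outlier is scored above a randomly chosen inlier:</p>

```python
def auc(scores, labels):
    # labels: 1 = outlier, 0 = inlier; higher score = more outlying.
    # Ties between an outlier and an inlier count as one half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def good_performance(scores, labels):
    # "Good performance" as defined in Section 2.3: AUC > 0.8.
    return auc(scores, labels) > 0.8
```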
        <p>The second class of features measure properties of the
outlier structure of a dataset using density, residual and
graph-based characteristics. These use the outlier labels in
their feature computation. The reason for using class labels
in feature computation is two-fold. Firstly, different outlier
methods may disagree on outliers and their rankings. As
such, we need to know how each method declares its outliers.
Are they density related, distance related or graph related
outliers? Secondly, the use of ground truth in labelling
outliers gives the opportunity to learn when each method’s
definition is suitable. Learning these relationships from
training data enables identification of the most suitable outlier
method for a given problem. This is common in industry
applications where training data contains labelled outliers,
and the task is to find effective methods which extract these
outliers for future unlabelled data [27, 28].</p>
        <p>We compute a broad set of candidate features, and later
select a subset for the instance space analysis. Due to space
constraints, we provide a brief overview of these features
here. More details of these features are given in our github
repository [9]. Table 1 summarizes the features by category.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Simple features</title>
        <p>These are related to the basic structure of a dataset,
e.g. the number of observations.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Statistical features</title>
        <p>These relate to statistical properties of a dataset,
such as skewness and kurtosis.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Information theoretic features</title>
      <p>These measure the amount of information present in a
dataset, e.g. entropy of a dataset.</p>
      <p>Outlier features In order to make our feature set richer we
include density-based, residual-based and graph-based
features. We compute these features for different
subsets of the dataset, namely outliers, non-outliers and
proxy-outliers, with the last subset being data points
that are either far away from “normal” data, reside in
low density regions, or have different graph properties
compared to the rest. We define proxy-outliers using
distance, density, residual and graph based perspectives.</p>
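      <p>One way to quantify how well a proxy-outlier definition matches the labelled outliers is a simple set-overlap measure. This Jaccard-style index is our own illustration, not one of the paper's features:</p>

```python
def proxy_outlier_overlap(proxy_idx, outlier_idx):
    # Fraction of agreement between proxy-outliers (defined without
    # labels) and the labelled outliers. High overlap suggests methods
    # built on that notion of outlyingness should do well on the dataset.
    p, o = set(proxy_idx), set(outlier_idx)
    return len(p & o) / len(p | o) if (p | o) else 0.0
```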
      <p>If there is a significant overlap between proxy-outliers
and actual outliers, then we expect certain outlier detection
methods to perform well on such datasets. Proxy-outlier
based features fall into the category of landmarking, which
has been popular in meta-learning studies [29, 30, 31].</p>
      <p>1. Density based features are computed either using the
density based clustering algorithm DBSCAN [32] or kernel
density estimates.</p>
      <p>2. Residual based features are computed by fitting a
series of linear models and obtaining residuals.</p>
      <p>3. Graph based features are based on graph theoretic
measures such as vertex degree, shortest path and connected
components. The dataset is converted to a directed graph
using the software [33] to facilitate this feature computation.</p>
      <p>3 Predicting outlier detection method performance from
meta-features Using a set of 178 candidate features, we
predict suitable outlier detection methods for given datasets.
For this purpose, we train 12 Random Forest classifiers, one
for each outlier method, with the 178 features as inputs and a
binary indicator of good performance as the output. Table 2
provides the average cross-validation accuracy over 10 folds
for each outlier detection method. As most outlier methods
are only good for a small number of datasets, a naive
prediction that performance is not good for all datasets achieves
the default accuracy. Since our Random Forest accuracy is
greater than the default accuracy for all methods, the
meta-features are clearly able to help distinguish which methods
are well suited for some datasets, even when these are a
minority.</p>
      <p>4 Instance space construction and analysis Having
validated that the features contain sufficient information to be
predictive of outlier detection method performance, we now
use the features to construct an instance space to visualise
the relationships between dataset (instance) features and
strengths and weaknesses of methods. Our approach to
constructing the instance space is based on the most recent
implementation of the methodology [24].</p>
      <p>Critical to both algorithm performance prediction and
instance space construction is the selection of features that
are both distinctive (i.e. uncorrelated to other features) and
predictive (i.e. correlated with algorithm performance).
Therefore, we follow a procedure to select a small subset of
features that best meet these requirements, using a subset of
2053 instances for which there are three or fewer well
performing algorithms. This choice is somewhat arbitrary,
motivated by the fact that we are less interested in
datasets where many algorithms perform well or poorly, given
our aim is to understand unique strengths and weaknesses
of outlier detection methods.</p>
      <p>We pre-process the data such that it becomes amenable
to machine learning and dimensionality projection methods.
Given that some features are ratios, one feature can often
produce excessively large or infinite values. Hence, we
bound all the features between their median plus or minus
five times their interquartile range. Any not-a-number value
is converted to zero, while all infinite values are set equal to
the bounds. Next, we apply a Box-Cox transformation to
each feature to stabilise the variance and normalise the data.
Finally, we apply a z-transform to standardise each feature.</p>
      <p>Starting with a set of 178 features, we determine whether
a feature has unique values for most instances. A feature
provides little information if it produces the same value for
most datasets. Hence, we discard those that have less than
30% unique values, leaving 135 features. Then, we check
the correlation between the features and AUC. Sorting the
absolute value of the correlation from highest to lowest, we
select the top three features per algorithm. This results in
26 features, as some of them are correlated to more than
one algorithm and we do not need replicates. Next, we
identify groups of similar features, using a k-means clustering
algorithm and a dissimilarity measure of 1 − |ρ|, where ρ
is the correlation between two features. As a result, we obtain
seven clusters. Retaining one feature from each cluster, there
are 2352 possible feature combinations. To determine the
best one for the dimensionality reduction, we project each
candidate feature subset into a 2-d space using PCA with
two components. Then, we fit a random forest model per
algorithm to classify the instances in the trial instance space
into groups of good and bad performance, as defined in
Section 2.3. That is, we fit 12 models for each feature
combination, and define the best combination as the one
producing the lowest average out-of-bag error across all models.</p>
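      <p>The bounding and standardisation steps described above can be sketched per feature as follows. This is an illustration with our own helper name: the Box-Cox step is omitted for brevity, and quartiles are approximated by sorted-order positions.</p>

```python
import math
import statistics

def preprocess_feature(values):
    # Bound each feature to median +/- 5 * IQR; NaNs become zero,
    # infinities collapse to the bounds, then z-standardise.
    finite = sorted(v for v in values if not (math.isnan(v) or math.isinf(v)))
    med = statistics.median(finite)
    q1, q3 = finite[len(finite) // 4], finite[(3 * len(finite)) // 4]
    lo, hi = med - 5 * (q3 - q1), med + 5 * (q3 - q1)
    clipped = [min(max(0.0 if math.isnan(v) else v, lo), hi) for v in values]
    mu, sd = statistics.mean(clipped), statistics.pstdev(clipped) or 1.0
    return [(v - mu) / sd for v in clipped]
```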
      <p>This process results in 7 features finally being selected.</p>
      <p>We then use the Prediction Based Linear Dimensionality
Reduction (PBLDR) method [24] to find a projection from
7-d to 2-d that creates the most linear trends of algorithm
performance and feature values across the instance space,
to assist visualisation of directions of hardness and feature
correlations. Given PBLDR’s stochasticity, we calculate
30 different projections and select the one with the highest
topological preservation, defined as the correlation between
high- and low-dimensional distances. The final projection
matrix is defined by Equation 4.2 to represent each dataset
as a 2-d vector Z depending on its 7-d feature vector.
(4.2)   Z = Aᵀ f, where

A = [ 0.0862  0.2078
      0.1737  0.1845
      0.0460  0.2847
      0.0938  0.2025
      0.1202  0.0378
      0.1854  0.0822
      0.3543  0.1325 ]

and f = [ Mean Entropy Attr, IQR TO SD 95, OPO Res ResOut Median,
OPO Res Out Mean, OPO Den Out 95P, OPO GDeg PO Mean,
OPO GDeg Out Mean ]ᵀ.
Mean Entropy Attr is the average entropy of the attributes
of the dataset, and corresponds to a measurement of
the information contained by the attribute. For
example, if an attribute is constant for all observations, then
the entropy is zero. Thus, high values of entropy
indicate highly unpredictable fluctuations. If the outliers
are causing high values of entropy, then it is likely that
these points are different from the rest, resulting in
multiple outlier methods performing well on these datasets,
as seen in Figure 2.</p>
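      <p>Applying the projection of Equation (4.2) is a single matrix-vector product. The sketch below is our own code, using the coefficients as printed above and assuming the feature vector is already in the transformed, standardised scale of Section 4:</p>

```python
FEATURES = ["Mean Entropy Attr", "IQR TO SD 95", "OPO Res ResOut Median",
            "OPO Res Out Mean", "OPO Den Out 95P", "OPO GDeg PO Mean",
            "OPO GDeg Out Mean"]

# The two columns of the 7 x 2 matrix in Equation (4.2), transposed to 2 x 7.
W = [
    [0.0862, 0.1737, 0.0460, 0.0938, 0.1202, 0.1854, 0.3543],
    [0.2078, 0.1845, 0.2847, 0.2025, 0.0378, 0.0822, 0.1325],
]

def project(feature_vector):
    # Map a 7-d feature vector to 2-d instance-space coordinates Z.
    assert len(feature_vector) == len(FEATURES)
    return [sum(w * f for w, f in zip(row, feature_vector)) for row in W]
```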
      <p>IQR TO SD 95 is the 95th percentile of the IQR to standard
deviation ratio of each attribute of a dataset. It captures
the presence of outliers by means of comparing two</p>
      <p>measures of spread, of which one is affected by outliers
and another is robust. As this feature is first computed
per attribute before taking the 95th percentile, data
points that stand out in a single dimension give a small
IQR to SD ratio for that dimension. Thus, if the 95th
percentile of the IQR to SD ratio is small, then that
implies there are data points that stand out in different
dimensions. If these data points are indeed outliers, it
should be easy for distance based methods to find them.</p>
      <p>We expect both KNN and KNNW to perform well on
datasets with small values of this feature.</p>
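      <p>A sketch of this feature with our own helper name; quartiles and the 95th percentile are approximated by sorted-order positions:</p>

```python
import statistics

def iqr_to_sd_95(columns):
    # Per attribute: interquartile range divided by standard deviation.
    # The feature is the 95th percentile of these per-attribute ratios;
    # attributes containing extreme points yield small ratios.
    ratios = []
    for col in columns:
        s = sorted(col)
        q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
        sd = statistics.pstdev(col)
        if sd > 0:
            ratios.append((q3 - q1) / sd)
    ratios.sort()
    return ratios[min(len(ratios) - 1, int(0.95 * len(ratios)))]
```

      <p>For an attribute with one extreme point, the standard deviation inflates while the IQR barely moves, so the ratio, and hence the feature, is small; this is the regime where KNN and KNNW are expected to do well.</p>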
      <p>OPO Res ResOut Median is the ratio between the median
of residuals of proxy-outliers to the median of residuals
of non-proxy-outliers. It is calculated by randomly
choosing a subset of non-binary attributes from the
dataset, s. Then, from each attribute a in s, we
fit a linear model with a as the dependent variable
and others as the independent variables. Next, for
each model, we compute the residuals r. Define the
residual based proxy-outliers ω as those with the top
3% absolute residual values. Finally, we compute the ratio
median(r[ω]) / median(r[ω̄]), where ω̄ denotes the
non-proxy-outliers. The average of this ratio across all models
is the feature value.</p>
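      <p>For a single fitted model, the ratio above can be sketched as follows (our own helper; the paper averages this quantity over several models fitted on random attribute subsets):</p>

```python
import statistics

def residual_proxy_ratio(residuals, top_frac=0.03):
    # Proxy-outliers are the observations with the top 3% absolute
    # residuals; the ratio compares their median residual magnitude
    # with that of the remaining observations.
    abs_r = sorted((abs(r) for r in residuals), reverse=True)
    k = max(1, int(round(top_frac * len(abs_r))))
    proxy, rest = abs_r[:k], abs_r[k:]
    return statistics.median(proxy) / statistics.median(rest)
```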
      <sec id="sec-3-1">
        <title>OPO Res Out Mean</title>
        <p>OPO Res Out Mean is the ratio between the mean of
residuals of outliers to the mean of residuals of non-outliers.
It is calculated following a similar approach to that of
OPO Res ResOut Median, with the exception that ω denotes
actual outliers instead of proxy-outliers, and the mean
instead of the median is used.</p>
        <p>We expect KNN and KNNW to perform well on
datasets with high values of OPO Res ResOut Median
and OPO Res Out Mean.</p>
        <p>OPO Den Out 95P is the ratio of the 95th percentile of
density estimates of non-outliers to the 95th percentile of
density estimates of outliers. It is calculated by performing
PCA on the dataset, omitting class information. Then,
considering two consecutive components at a time and up to the
10th component, we build nine 2-d spaces. For each one of
these spaces, we compute kernel density estimates x.</p>
        <p>Then, we compute y1 = x[ω̄] and y2 = x[ω], where ω
is the set of outliers and ω̄ denotes non-outliers. Let q1 be
the 95th percentile of y1 and q2 be the 95th percentile of y2.
Let q = q1/q2, whose average across all nine 2-d spaces is
the feature value. If outliers reside in less dense regions,
this gives rise to high values of this feature, enabling
certain density based methods to perform well on these
datasets. As such, we expect LDF to perform well on
datasets with high values of this feature.</p>
        <p>OPO GDeg PO Mean is a ratio of the mean degree of
inner points to the mean degree of non-inner points. It is
calculated by generating a graph from the dataset using
distances between points [33]. This graph would have v1
and its nearest neighbour v2 as vertices connected with
an edge. Then, we compute the degree d of each vertex.
Let ω be the set of inner points, which are defined as the
vertices with the highest 3% of degree values. Then the
feature value is equal to mean(d[ω])/mean(d[ω̄]), where
ω̄ denotes the non-inner points.</p>
        <p>OPO GDeg Out Mean is similar to OPO GDeg PO
Mean with the exception of using outliers instead of inner
points. These two features compute a ratio of vertex degree
between different groups: outliers and non-outliers, and
inner points and non-inner points. As outliers are generally
isolated, they have low vertex-degree values. Thus, these
outliers make smaller angles with other data points, making
it easier for angle based methods such as FAST ABOD to
find them. Therefore we expect FAST ABOD to perform
well on datasets with low values of OPO GDeg Out Mean,
even though we do not perform any angle based computation.
Isolated outliers also have low graph degree, as their
connections to other points are few in number, making
OPO GDeg Out Mean correlate with KNN and KNNW
performance.</p>
        <p>4.1 Footprint analysis of algorithm strengths and
weaknesses We define a footprint as an area of the instance space
where an algorithm is expected to perform well. To construct
a footprint, we follow the approach first introduced in [34]:
(a) we measure the length of the edges between all instances,
and remove those edges with a length lower than a threshold;
(b) we calculate a Delaunay triangulation within the convex
hull formed by the remaining edges; (c) we create a concave
hull, by removing any triangle with edges larger than another
threshold; (d) we calculate the density and purity of each
triangle in the concave hull; and (e) we discard any triangle
that does not fulfil the density and purity thresholds. The
lower and upper distance thresholds are set to 1% and 25%
of the maximum distance respectively. The density threshold
is set to 10, and the purity threshold is set to 75%. We then
remove any contradictions that could appear when two
conclusions could be drawn from the same section of the
instance space due to overlapping footprints, e.g., when
comparing two algorithms. This is achieved by comparing the
area lost by the overlapping footprints when the contradicting
sections are removed. The algorithm that would lose the
largest area in a contradiction gets to keep it, as long as it
maintains the density and purity thresholds.</p>
        <p>Table 3 presents the results from the analysis. The best
algorithm is the one such that AUC is the highest for the
given instance. The results are a percentage of the area
(6.6703) and density (305.6825) of the convex hull that
encloses all instances. Notably, the results signify a unique
strength of KDEOS, as it performs well on instances where
other methods do not.</p>
        <p>4.2 Partitioning the instance space We obtain a
partitioning of the instance space to expose regions of strength
for each method. We use support vector machines (SVM)
for this partitioning. Of the 12 outlier methods we consider
only FAST ABOD, KDEOS, KNN and KNNW, as these
methods have larger footprints that span a contiguous region
of the instance space. For these outlier methods, we use our
definition of good performance as the output and the instance
space coordinates as the input for the SVM. We train 4
SVMs, one for each of these selected outlier methods. The
prediction accuracies using 5-fold cross validation, along
with the percentage of instances in the majority class, are
given in Table 4, showing better than default accuracy for
all methods except KDEOS.</p>
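        <p>The per-triangle test used when constructing footprints in Section 4.1 reduces to a simple rule; a sketch with our own helper name and the parameter values quoted there:</p>

```python
def keep_triangle(n_instances, n_good, area,
                  density_threshold=10, purity_threshold=0.75):
    # A triangle of the concave hull survives only if its instance
    # density (instances per unit area) and purity (fraction of "good"
    # instances) both meet the thresholds (10 and 75% in the paper).
    if area <= 0 or n_instances == 0:
        return False
    density = n_instances / area
    purity = n_good / n_instances
    return density >= density_threshold and purity >= purity_threshold
```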
        <p>Figure 3 shows the dominant regions of algorithms
FAST ABOD, KNN, KNNW and KDEOS. We can see that
FAST ABOD is stronger on the left side of the instance
space, with a few sparse instances lighting up on the extreme
right, whereas KDEOS is stronger on the top region of
the instance space. KNN and KNNW have overlapping
regions of strength, with KNNW sharing some instances
with KDEOS at the top. However, KDEOS is the stronger
algorithm in the top region of the instance space.</p>
        <p>We combine these SVM predictions of regions of
strength to obtain a final partitioning of the instance space.
Regions of overlap mean that some instances have multiple
preferred outlier methods. For such instances, we break</p>
        <p>5 Conclusion We have investigated the algorithm
selection problem for unsupervised outlier detection. Given
measurable features of a dataset, we find the most suitable
outlier method with reasonable accuracy. This is important
because each method has its strengths and weaknesses and
no single method outperforms all others for all instances.
We have explored the strengths and weaknesses of outlier
methods by analysing their footprints in the constructed
instance space. Moreover, we have identified different regions
of the instance space that reveal the relative strengths of
different outlier detection methods. Our work clearly
demonstrates, for example, that KDEOS, which gives poor
performance on average, has a region of strength in the
instance space where no other algorithm excels.</p>
        <p>In addition to these contributions, we hope to have laid
some important foundations for future research into new and
improved outlier detection methods, in the following ways:
(a) rigorous evaluation of new methods using the
comprehensive corpus of over 12000 datasets with diverse
characteristics we have made available; (b) using the instance space,
the strengths and weaknesses of new outlier methods can be
identified, and their uniqueness compared to existing
methods; (c) using the R package outselect, the outlier methods
suitable for a new dataset can be identified, and the new
instance can be plotted on the instance space. Equally
valuable, the instance space analysis can also reveal if a new
outlier method is similar to existing outlier methods, or offers
a unique contribution to the available repertoire of techniques.</p>
        <p>As a word of caution, we note that our current instance
space is computed using our collection of datasets, outlier
methods and features. Thus, we do not claim to have
constructed the definitive instance space for all unsupervised
outlier detection methods. Hence, the selected features for
the instance space may change with the expansion of the
corpus of datasets and outlier methods. Future research paths
include the expansion of the instance space by generating
new and diverse instances to fill the space, and considering
other classes of outlier detection methods, such as subspace,
clustering and PCA approaches. In addition, we have
transformed non-numeric data to numeric in this study; further
avenues of research include incorporating datasets with
non-numeric attributes in the instance space. To aid this
expansion and future research, we make all of our data and
implementation scripts available on our website.</p>
        <p>Broadening the scope of this work, we have been
adapting the instance space methodology to other problems
besides outlier detection, for example, machine learning [24]
and time series forecasting [35]. Part of this larger project
is to build freely accessible web-tools that carry out the
instance space analysis automatically, including testing of
algorithms on new instances. Such tools will be available at
matilda.unimelb.edu.au in the near future.</p>
        <p>Acknowledgements Funding was provided by the
Australian Research Council through grants FL140100012 and
LP160101885. This research was also supported by the
Monash eResearch Centre and eSolutions-Research Support
Services through the MonARCH HPC Cluster.</p>
        <p>References
[1] A. Zimek, E. Schubert, and H.-P. Kriegel, “A survey on
unsupervised outlier detection in high-dimensional numerical
data,” Statistical Analysis and Data Mining: The ASA Data
Science Journal, vol. 5, no. 5, pp. 363–387, 2012.
[2] D. H. Wolpert, W. G. Macready et al., “No free lunch
theorems for search,” Technical Report SFI-TR-95-02-010,
Santa Fe Institute, 1995.
[3] D. H. Wolpert and W. G. Macready, “No free lunch theorems
for optimization,” IEEE Transactions on Evolutionary
Computation, vol. 1, no. 1, pp. 67–82, 1997.
[4] G. O. Campos, A. Zimek, J. Sander, R. J. Campello,
B. Micenková, E. Schubert, I. Assent, and M. E. Houle, “On
the evaluation of unsupervised outlier detection: measures,
datasets, and an empirical study,” Data Mining and Knowledge
Discovery, vol. 30, no. 4, pp. 891–927, 2016.
[5] K. Smith-Miles, D. Baatar, B. Wreford, and R. Lewis,
“Towards objective measures of algorithm performance across
instance space,” Computers &amp; Operations Research, vol. 45,
pp. 12–24, 2014.
[6] K. Smith-Miles and S. Bowly, “Generating new test instances
by evolving in instance space,” Computers &amp; Operations
Research, vol. 63, pp. 102–113, 2015.
[7] K. Smith-Miles and T. T. Tan, “Measuring algorithm footprints
in instance space,” in Evolutionary Computation (CEC), 2012
IEEE Congress on. IEEE, 2012, pp. 1–8.
[8] S. Kandanaarachchi, M. Munoz Acosta, K. Smith-Miles, and
R. Hyndman, “Datasets for outlier detection,” Feb 2019.
[Online]. Available: https://monash.figshare.com/articles/
Datasets 12338 zip/7705127/4
[9] S. Kandanaarachchi, outselect: Algorithm selection for
unsupervised outlier detection, 2018, R package version
0.0.0.9000. [Online]. Available: https://github.com/sevvandi/
outselect
[10] E. Achtert, H.-P. Kriegel, and A. Zimek, “ELKI: a software
system for evaluation of subspace clustering algorithms,” in
International Conference on Scientific and Statistical Database
Management. Springer, 2008, pp. 580–585.
[11] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient
algorithms for mining outliers from large data sets,” in ACM
Sigmod Record, vol. 29, no. 2. ACM, 2000, pp. 427–438.
[12] F. Angiulli and C. Pizzuti, “Fast outlier detection in high
dimensional spaces,” in European Conference on Principles
of Data Mining and Knowledge Discovery. Springer, 2002,
pp. 15–27.
[13] V. Hautamaki, I. Karkkainen, and P. Franti, “Outlier
detection using k-nearest neighbour graph,” in Pattern Recognition,
[20] L. J. Latecki, A. Lazarevic, and D. Pokrajac, “Outlier
detection with kernel density functions,” in International Workshop
on Machine Learning and Data Mining in Pattern Recognition.
Springer, 2007, pp. 61–75.
[21] E. Schubert, A. Zimek, and H.-P. Kriegel, “Generalized
outlier detection with flexible kernel density estimates,” in
Proceedings of the 2014 SIAM International Conference on
Data Mining. SIAM, 2014, pp. 542–550.
[22] H.-P. Kriegel, A. Zimek et al., “Angle-based outlier detection
in high-dimensional data,” in Proceedings of the 14th ACM
SIGKDD international conference on Knowledge discovery
and data mining. ACM, 2008, pp. 444–452.
[23] M. Goldstein and S. Uchida, “A comparative evaluation of
unsupervised anomaly detection algorithms for multivariate
data,” PloS one, vol. 11, no. 4, p. e0152173, 2016.
[24] M. A. Muñoz, L. Villanova, D. Baatar, and K. Smith-Miles,
“Instance spaces for machine learning classification,” Machine
Learning, vol. 107, no. 1, pp. 109–147, 2018.
[25] N. Craswell, “Precision at n,” in Encyclopedia of Database
Systems. Springer, 2009, pp. 2127–2128.
[26] E. Zhang and Y. Zhang, “Average precision,” in Encyclopedia
of Database Systems. Springer, 2009, pp. 192–193.
[27] P. Talagala, R. Hyndman, K. Smith-Miles,
S. Kandanaarachchi, M. Muñoz et al., “Anomaly detection in
streaming nonstationary temporal data,” Journal of Computational
and Graphical Statistics, 2019, accepted.
[28] C. Leigh, O. Alsibai, R. J. Hyndman, S. Kandanaarachchi,
O. C. King, J. M. McGree, C. Neelamraju, J. Strauss, P. D.
Talagala, R. D. Turner et al., “A framework for automated
anomaly detection in high frequency water-quality data from
in situ sensors,” Science of The Total Environment, 2019.
[29] B. Pfahringer, H. Bensusan, and C. G. Giraud-Carrier,
“Meta2004. ICPR 2004. Proceedings of the 17th International Con- learning by landmarking various learning algorithms.” in
ference on, vol. 3. IEEE, 2004, pp. 430–433. ICML, 2000, pp. 743–750.
[14] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, [30] Y. Peng, P. A. Flach, C. Soares, and P. Brazdil, “Improved
“Lof: identifying density-based local outliers,” in ACM sigmod dataset characterisation for meta-learning,” in International
record, vol. 29, no. 2. ACM, 2000, pp. 93–104. Conference on Discovery Science. Springer, 2002, pp. 141–
[15] E. Schubert, A. Zimek, and H.-P. Kriegel, “Local outlier 152.</p>
        <p>detection reconsidered: a generalized view on locality with [31] K. A. Smith-Miles, “Cross-disciplinary perspectives on
metaapplications to spatial, video, and network outlier detection,” learning for algorithm selection,” ACM Computing Surveys
Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. (CSUR), vol. 41, no. 1, p. 6, 2009.</p>
        <p>190–237, 2014. [32] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A
density[16] J. Tang, Z. Chen, A. W.-C. Fu, and D. W. Cheung, “Enhancing based algorithm for discovering clusters in large spatial
effectiveness of outlier detections for low density patterns,” in databases with noise.” in Kdd, vol. 96, no. 34, 1996, pp. 226–
Pacific-Asia Conference on Knowledge Discovery and Data 231.</p>
        <p>Mining. Springer, 2002, pp. 535–548. [33] G. Csardi and T. Nepusz, “The igraph software package for
[17] W. Jin, A. K. Tung, J. Han, and W. Wang, “Ranking out- complex network research,” InterJournal, Complex Systems,
liers using symmetric neighborhood relationship,” in Pacific- vol. 1695, no. 5, pp. 1–9, 2006.</p>
        <p>Asia Conference on Knowledge Discovery and Data Mining. [34] M. Mun˜oz and K. Smith-Miles, “Performance analysis of
Springer, 2006, pp. 577–593. continuous black-box optimization algorithms via footprints
[18] H.-P. Kriegel, P. Kro¨ger, E. Schubert, and A. Zimek, in instance space,” Evol. Comput., vol. 25, no. 4, pp. 529–554,
“Loop: local outlier probabilities,” in Proceedings of the 18th 2017.</p>
        <p>ACM conference on Information and knowledge management. [35] Y. Kang, R. Hyndman, and K. Smith-Miles, “Visualising
ACM, 2009, pp. 1649–1652. forecasting algorithm performance using time series instance
[19] K. Zhang, M. Hutter, and H. Jin, “A new local distance-based spaces,” Int. J. Forecast, vol. 33, no. 2, pp. 345–358, 2017.
outlier detection approach for scattered real-world data,” in
Pacific-Asia Conference on Knowledge Discovery and Data</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>