                Unsupervised Ensembles for Outlier Detection

                                                          Guilherme O. Campos
                                                Supervised by Prof. Dr. Wagner Meira Jr.
                                                   Department of Computer Science
                                                   Federal University of Minas Gerais
                                                         Belo Horizonte, Brazil
                                                        gocampos@dcc.ufmg.br

Proceedings of the VLDB 2018 PhD Workshop, August 27, 2018, Rio de Janeiro, Brazil. Copyright (C) 2018 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT
Ensemble techniques have been applied to the unsupervised outlier detection problem in several scenarios. The challenges are the generation of diverse ensemble members and the combination of individual results into an ensemble. For the latter challenge, some methods design smaller ensembles out of a wealth of possible ensemble members in order to improve the diversity and accuracy of the ensemble (relating to the ensemble selection problem in classification). In this paper, we propose a boosting strategy for the ensemble selection problem, called BoostSelect. We evaluate BoostSelect on a large benchmark of datasets for outlier detection, showing improvements over baseline approaches.

1. INTRODUCTION
The identification of outliers (i.e., data objects that do not fit well to the general data distribution) is very important in many practical applications. Examples are the detection of credit card fraud in financial transaction data, the identification of measurement errors in scientific data, or the analysis of sports statistics.
Recent research on the unsupervised problem of outlier detection advanced the area by applying ensemble techniques [31]. Ensemble methods, i.e., combining the findings or results of individual learners into an integrated, typically more reliable and better result, are well established in the supervised context of classification or regression [20]. In unsupervised learning, the theoretical underpinnings are less clear but can be drawn in analogy to the supervised context, as has been done for clustering ensembles [30].
The focus of my PhD thesis is on ensemble selection, which has been well studied in supervised scenarios [4] (also called selective ensembles [29] or ensemble pruning [15, 27, 30]). Ensemble selection is also related to boosting [22], which is often used to change training conditions for additionally sought, yet to be trained ensemble members, or to select the most suitable additional ensemble members from a pool of solutions.
Given many outlier detection results, how do we select which members to include in the ensemble? How do we combine the selected members into a single, more robust result? In an unsupervised environment, these challenges remain open, and more research is needed to increase our knowledge of this topic.
In this paper we propose an initial approach towards a more robust ensemble method by transferring the supervised boosting technique to the unsupervised scenario of outlier detection ensembles. The proposed outlier ensemble selection technique is called BoostSelect.

2. RELATED WORK
The ensemble approach to learning has been studied in outlier detection several times. In analogy to supervised learning, an ensemble can be expected to improve over its components if these components deliver results with a reasonable accuracy while being diverse [31]. The two main challenges for creating good ensembles are, therefore, (i) the generation of diverse (potential) ensemble members, and (ii) the combination (or selection) of members into an ensemble.
Several strategies to achieve diversity among ensemble members have been explored, such as feature bagging (i.e., combining outlier scores learned on different subsets of attributes) [13], different parameter choices for some base method [5], the combination of actually different base methods [16, 10, 23], the introduction of a random component in a given learner [14], the use of different subsamples of the data objects (parallel [33] or sequential [21, 19]), adding some random noise component to the data ("perturbation") [32], or using approximate neighborhoods for density estimates [8]. Likewise, different combination procedures have been proposed, based on outlier scores or on outlier ranks [13, 5, 10, 31].
Some methods have also been proposed to select the more diverse or the more accurate ensemble members [23, 18]. These unsupervised methods construct a target result vector from the unfiltered results and then sequentially select individual results that fit the target vector while being different from the already selected solutions. The "Greedy ensemble" approach [23] fails to generate a good target vector because it selects a percentage of top instances of each ensemble candidate to compose the vector; due to the imbalance between few outliers and many inliers, this procedure typically puts many inliers into the target vector. To address this problem, the "SelectV" approach [18] generates the target vector by taking the average over all outlier detection results (the possible ensemble members). The main problem with "SelectV" is that the algorithm does not consider diversity as an important factor in selecting ensemble members.
3. BOOSTING FOR ENSEMBLE SELECTION
Starting from the ideas discussed for the "Greedy ensemble" [23] and for "SelectV" [18], we propose here an improved outlier ensemble selection method that is amenable to the application of boosting techniques. Boosting is well studied in supervised contexts [22]. We design and apply an equivalent technique in the unsupervised setting, to select good components for an ensemble of outlier detectors, resulting in our method BoostSelect.

3.1 Construction of the Target Vector
As a prerequisite for the combination of different outlier score lists (i.e., individual results, potential ensemble members), we normalize the scores following established procedures [10]. The target vector is constructed by combining the scores of all available results. Different combination methods could be used here; without further assumptions, taking the average score is the most natural approach [31], i.e., the target vector lists the average score of all individual results for each data object. From this target vector, we preliminarily assume the top ⌊n · t⌋ objects (ranked by their combined score) to be outliers, where n is the dataset size and 0 < t ≪ 1 is a parameter capturing the expected percentage of outliers in the dataset (i.e., there are K = ⌊n · t⌋ outliers assumed to be present). The target vector thus becomes a binary vector, listing 1 for an (alleged) outlier and 0 for an (alleged) inlier, and serves as pseudo ground truth for the boosting approach to ensemble selection.
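For concreteness, this construction can be written as follows (a sketch in our own notation, not taken from the paper: s^{(i)} denotes the i-th of m normalized score lists):

\mathrm{target}_j = \frac{1}{m}\sum_{i=1}^{m} s^{(i)}_j, \qquad j = 1,\dots,n
b_j = \begin{cases} 1 & \text{if } \mathrm{target}_j \text{ is among the } K = \lfloor n \cdot t \rfloor \text{ largest combined scores,}\\ 0 & \text{otherwise.} \end{cases}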
3.2 Weights and Ensemble Diversity
Weighted Pearson correlation has been proposed as a similarity measure for outlier rankings [23]. We follow the procedure of Schubert et al. [23], setting the weights of the Pearson correlation for outliers and inliers. Different from previous approaches, though, these values are only the initial weights; the weights will be updated by the boosting procedure.
The potential ensemble members are sorted according to their weighted Pearson correlation to the target vector. The candidate that is most similar to the target vector is chosen as the first ensemble member.
The remaining potential ensemble members are iteratively re-sorted in ascending order of their similarity to the current prediction of the ensemble, resulting in a preference for the most different (i.e., most complementary) additional ensemble members. A potential member is included if its inclusion would increase the similarity of the ensemble prediction to the target vector; otherwise it is discarded. If the correlation improves, the ensemble is updated and the remaining lists are re-sorted by their weighted Pearson correlation to the updated prediction.
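Concretely, the weighted Pearson correlation between two score vectors x and y under weights w takes the standard weighted form (a sketch of the measure as we understand the procedure of [23]; the notation is ours):

\mathrm{wPC}(x, y; w) = \frac{\sum_i w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}{\sqrt{\sum_i w_i (x_i - \bar{x}_w)^2 \cdot \sum_i w_i (y_i - \bar{y}_w)^2}}, \qquad \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}

with initial weights w_i = 1/(2K) for the K alleged outliers and w_i = 1/(2(n − K)) for the inliers (cf. line 4 of Algorithm 1).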
3.3 Boosting Procedure
The boosting is performed upon the inclusion of a new member into the ensemble. The idea is to reduce the weights of those outliers that have already been identified by any ensemble member. The weights are reduced by a specified parameter 0 < d < 1 (drop rate).
The boosting effect is that the selection will prefer to include additional ensemble members that detect those outliers that have not yet been detected by any ensemble member, while very easy outliers that have been detected by many ensemble members already will get assigned smaller and smaller weights.
Algorithm 1 lists the steps of the overall framework BoostSelect in pseudo code. The boosting procedure is detailed in Algorithm 2.

Algorithm 1 BoostSelect
Input: P := set of normalized outlier score lists, d := drop rate (percentage), t := threshold (percentage), combination := combination technique
Output: E := ensemble members
 1: W := [n], E := ∅
 2: target := combination(P)                      ▷ Generate the target vector
 3: target := convertBinary(target, t)            ▷ Top K = ⌊n · t⌋ scores ← 1, others ← 0
 4: W := [out = 1/(2K), in = 1/(2(n − K))]        ▷ K = number of outliers, n = dataset size
 5: Sort P by weighted Pearson correlation (wPC) to target   ▷ Descending order
 6: f := getFirst(P)                              ▷ Remove f from P
 7: E := E ∪ f
 8: while P ≠ ∅ do
 9:    curr := combination(E)                     ▷ Current prediction
10:    Sort P by wPC to curr                      ▷ Ascending order
11:    f := getFirst(P)                           ▷ Remove f from P
12:    if wPC(combination(E ∪ f), target) > wPC(curr, target) then
13:       E := E ∪ f                              ▷ Include into ensemble
14:       Boosting(W, target, f, t, d)            ▷ Adapt the weights
15:    end if
16: end while

Algorithm 2 Boosting
Input: W := weight vector, target := target vector, f := new ensemble member, t := threshold (percentage), d := drop rate (percentage)
Output: W := updated weights
1: outliers := convertBinary(f, t)
2: for i ∈ 1 : size(target) do
3:    if target(i) == 1 ∧ outliers(i) == 1 then
4:       W(i) := W(i) · d
5:    end if
6: end for
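To make the control flow concrete, the following is a minimal NumPy sketch of Algorithms 1 and 2 (a sketch only, not the evaluated implementation; the helper names mirror the pseudo code, and the tie handling in convert_binary is our assumption):

import numpy as np

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation wPC(x, y; w)."""
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)

def convert_binary(scores, t):
    """Set the top floor(n*t) scores to 1, all others to 0."""
    n = len(scores)
    k = int(n * t)
    b = np.zeros(n)
    b[np.argsort(scores)[::-1][:k]] = 1
    return b

def boosting(w, target, f, t, d):
    """Algorithm 2: shrink the weights of target outliers already found by f."""
    found = convert_binary(f, t)
    w[(target == 1) & (found == 1)] *= d

def boost_select(P, t, d, combination=lambda S: np.mean(S, axis=0)):
    """Algorithm 1: select ensemble members from the score lists P (assumes K >= 1)."""
    P = [np.asarray(p, float) for p in P]
    n = len(P[0])
    k = int(n * t)
    target = convert_binary(combination(P), t)
    w = np.where(target == 1, 1 / (2 * k), 1 / (2 * (n - k)))
    P.sort(key=lambda p: weighted_pearson(p, target, w), reverse=True)
    E = [P.pop(0)]                      # most similar to the target vector
    while P:
        curr = combination(E)
        P.sort(key=lambda p: weighted_pearson(p, curr, w))
        f = P.pop(0)                    # most complementary candidate
        if (weighted_pearson(combination(E + [f]), target, w)
                > weighted_pearson(curr, target, w)):
            E.append(f)
            boosting(w, target, f, t, d)
    return E

Note that, as in Algorithm 1, a candidate that does not improve the correlation to the target is removed from P and never reconsidered.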
4. EXPERIMENTS

4.1 Datasets
For evaluation, we use a benchmark data repository for outlier detection [3]. The repository is based on 23 basic datasets, processed in different ways, mainly to provide variants with different percentages of outliers and with different handling of dataset characteristics such as duplicates, attribute normalization, and categorical values. As suggested for analysis [3], we focus on the normalized datasets without duplicates, which leaves us with 422 dataset variants.

4.2 Ensemble Members
As basic outlier detection results (i.e., potential ensemble members) we use the results provided along with the datasets [3], covering 12 neighborhood-based outlier detection algorithms with the neighborhood size k varied from 1 to 100. The algorithms are: KNN [17], KNNW [1], LOF [2], SimplifiedLOF [25], LoOP [9], LDOF [28], ODIN [6], FastABOD [11], KDEOS [24], LDF [12], INFLO [7], and COF [26]. For LDOF and KDEOS, k must be larger than 1, and for FastABOD, k must be larger than 2, resulting in 1196 results per dataset (fewer on some small datasets where k cannot reach 100). These results compose the set of potential ensemble members.
The outlier scores of these results are preprocessed (following Kriegel et al. [10]) by applying an inverse logarithmic scaling to FastABOD results and an inverse linear scaling to ODIN results, since FastABOD and ODIN give inverted scores (i.e., the lower the score, the higher the chance of an observation being an outlier). Then a simple linear scaling from 0 to 1 is applied to transform all scores into the same range.
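A minimal sketch of this preprocessing (the exact inversion formulas are those of Kriegel et al. [10]; the variants shown here, max(s) − s and −log(s/max(s)), are our assumption of the common forms):

import numpy as np

def inverse_linear(s):
    # linear inversion: low raw score = high outlierness (used for ODIN)
    return s.max() - s

def inverse_logarithmic(s):
    # logarithmic inversion (used for FastABOD); assumes positive scores
    return -np.log(s / s.max())

def unit_range(s):
    # final linear scaling of each score list to the common range [0, 1]
    return (s - s.min()) / (s.max() - s.min())

# e.g.: normalized_odin = unit_range(inverse_linear(raw_odin_scores))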
[Figure 1 (heat map): Summarization of pairwise comparisons over all 422 datasets. The number counts the wins in terms of ROC AUC (average ROC AUC in the case of random ensembles) of the ensemble listed in the row against the ensemble listed in the column.]

4.3 Competitors and Settings
We compare BoostSelect against the Greedy [23] and SelectV [18] ensembles. We also generate a "Naïve" ensemble and Random ensembles as baselines. The "Naïve" ensemble is a combination of all individual outlier results (i.e., a full ensemble without any selection procedure).
For each instance of an ensemble selection strategy (Greedy, SelectV, and BoostSelect, respectively, on each dataset), we generate 1000 "Random" ensembles consisting of the same number of members as the corresponding selective ensemble, where the ensemble members are randomly selected.
We set the Greedy ensemble rate parameter to 0.01, as suggested by the authors of the Greedy ensemble [23]. We test a range of parameters for BoostSelect: d ∈ {0.25, 0.5, 0.75} and t ∈ {0.05, 0.1, 0.15}.
As combination technique for the ensembles we use the average score.
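A random baseline of this kind could be drawn as follows (a sketch; the index-based representation and the seed are our assumptions):

import numpy as np

rng = np.random.default_rng(42)  # seed assumed for reproducibility

def random_ensemble(P, size, combination=lambda S: np.mean(S, axis=0)):
    """Combine `size` members drawn uniformly without replacement from P."""
    idx = rng.choice(len(P), size=size, replace=False)
    return combination([P[i] for i in idx])

# e.g., 1000 random counterparts of a selective ensemble E:
# baselines = [random_ensemble(P, size=len(E)) for _ in range(1000)]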
4.4 Results
Figure 1 shows pairwise comparisons between all ensembles over all datasets, considering the ROC AUC evaluation measure (area under the curve of the receiver operating characteristic). We compare the ensemble selection techniques "Naïve", "Greedy", "SelectV", and "BS" (BoostSelect). We include random ensembles for each ensemble selection strategy and for each parametrization of BoostSelect: RG (Random Greedy), RS (Random SelectV), and RBS (Random BoostSelect). The numbers report on how many datasets the ensemble listed in the row performed better than the ensemble listed in the column. Numbers representing the majority (more than 50%) of the datasets are printed in white, smaller numbers in black; the larger the number, the darker its background. For the random ensembles, we take the average performance over the 1000 instances.
The best overall method is BoostSelect with d = 0.75 and t = 0.05, which has more losses than wins only when competing against BoostSelect with d = 0.25 and t = 0.1. The Greedy ensemble does not perform well in general, having more losses than wins against every other competitor. SelectV is better than all random variants and Greedy, but worse than Naïve and worse than all BoostSelect results. The Naïve ensemble behaves very consistently, as it beats all random ensemble approaches by a large margin, but still has more losses when compared to BoostSelect. Even though neither the threshold t nor the drop rate d has a strong impact on the wins, setting a relatively large drop rate and a relatively small threshold overall seems to be a good choice of parameters for BoostSelect, although the optimal parameter choice differs from dataset to dataset.
Looking at the top left quadrant of the heat map (Figure 1), where the random ensembles compete against each other, we also see a broad dominance of the random ensembles based on BoostSelect. This suggests that the number of ensemble members selected by BoostSelect is a better choice than the numbers selected by the other strategies.
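The win counts of Figure 1 can be derived from per-dataset ROC AUC values along the following lines (a sketch; the dictionary layout is our assumption):

def pairwise_wins(auc):
    """auc: dict mapping ensemble name -> list of ROC AUC values over all datasets.
    Returns wins[row][col] = number of datasets where `row` beats `col`."""
    names = list(auc)
    return {r: {c: sum(a > b for a, b in zip(auc[r], auc[c]))
                for c in names if c != r}
            for r in names}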
5. CONCLUSION AND FUTURE DIRECTIONS
We proposed a new ensemble selection strategy for unsupervised outlier detection ensembles, using an unsupervised equivalent of a boosting strategy for ensemble selection. Experiments show the favorable behavior of the new ensemble selection strategy compared to existing methods (Greedy and SelectV) on a large set of benchmark datasets. The main differences between our method BoostSelect, the Greedy ensemble, and SelectV can be attributed to the different ways in which they balance diversity and accuracy of the ensemble members. Greedy goes all out for diversity and mostly disregards accuracy, while SelectV ignores diversity and maximizes the accuracy of the ensemble members. Our new method BoostSelect considers both diversity and accuracy in a balanced manner and performs competitively on average over a large selection of benchmark datasets, with strong improvements on many of them.
The behavior of BoostSelect is robust to the parameters on many datasets but depends strongly on the choice of parameters on some datasets. As future work, we are especially interested in this behavior and its potential relation to properties of the datasets. We are also interested in improving the target vector generation step and in a complete study of the combination step of outlier detection ensembles.
Acknowledgments. This work was partially supported by CAPES - Brazil, Fapemig, CNPq, and by the projects InWeb, MASWeb, EUBra-BIGSEA, INCT-Cyber, and Atmosphere.

6. REFERENCES
[1] F. Angiulli and C. Pizzuti. Fast outlier detection in high dimensional spaces. In Proc. PKDD, pages 15–26, 2002.
[2] M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. SIGMOD, pages 93–104, 2000.
[3] G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle. On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Min. Knowl. Disc., 30:891–927, 2016.
[4] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In Proc. ICML, 2004.
[5] J. Gao and P.-N. Tan. Converting output scores from outlier detection algorithms into probability estimates. In Proc. ICDM, pages 212–221, 2006.
[6] V. Hautamäki, I. Kärkkäinen, and P. Fränti. Outlier detection using k-nearest neighbor graph. In Proc. ICPR, pages 430–433, 2004.
[7] W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. In Proc. PAKDD, pages 577–593, 2006.
[8] E. Kirner, E. Schubert, and A. Zimek. Good and bad neighborhood approximations for outlier detection ensembles. In Proc. SISAP, pages 173–187, 2017.
[9] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: Local outlier probabilities. In Proc. CIKM, pages 1649–1652, 2009.
[10] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13–24, 2011.
[11] H.-P. Kriegel, M. Schubert, and A. Zimek. Angle-based outlier detection in high-dimensional data. In Proc. KDD, pages 444–452, 2008.
[12] L. J. Latecki, A. Lazarevic, and D. Pokrajac. Outlier detection with kernel density functions. In Proc. MLDM, pages 61–75, 2007.
[13] A. Lazarevic and V. Kumar. Feature bagging for outlier detection. In Proc. KDD, pages 157–166, 2005.
[14] F. T. Liu, K. M. Ting, and Z.-H. Zhou. Isolation-based anomaly detection. ACM TKDD, 6(1):3:1–39, 2012.
[15] D. D. Margineantu and T. G. Dietterich. Pruning adaptive boosting. In Proc. ICML, pages 211–218, 1997.
[16] H. V. Nguyen, H. H. Ang, and V. Gopalkrishnan. Mining outliers with ensemble of heterogeneous detectors on random subspaces. In Proc. DASFAA, pages 368–383, 2010.
[17] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proc. SIGMOD, pages 427–438, 2000.
[18] S. Rayana and L. Akoglu. Less is more: Building selective anomaly ensembles. ACM TKDD, 10(4):42:1–42:33, 2016.
[19] S. Rayana, W. Zhong, and L. Akoglu. Sequential ensemble learning for outlier detection: A bias-variance perspective. In Proc. ICDM, pages 1167–1172, 2016.
[20] L. Rokach. Ensemble-based classifiers. Artif. Intell. Rev., 33:1–39, 2010.
[21] M. Salehi, X. Zhang, J. C. Bezdek, and C. Leckie. Smart sampling: A novel unsupervised boosting approach for outlier detection. In Proc. Australasian Joint Conference on Artificial Intelligence (AI 2016), pages 469–481, 2016.
[22] R. E. Schapire and Y. Freund. Boosting: Foundations and Algorithms. MIT Press, Cambridge, MA, 2012.
[23] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047–1058, 2012.
[24] E. Schubert, A. Zimek, and H.-P. Kriegel. Generalized outlier detection with flexible kernel density estimates. In Proc. SDM, pages 542–550, 2014.
[25] E. Schubert, A. Zimek, and H.-P. Kriegel. Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection. Data Min. Knowl. Disc., 28(1):190–237, 2014.
[26] J. Tang, Z. Chen, A. W.-C. Fu, and D. W. Cheung. Enhancing effectiveness of outlier detections for low density patterns. In Proc. PAKDD, pages 535–548, 2002.
[27] G. Tsoumakas, I. Partalas, and I. Vlahavas. An ensemble pruning primer. In O. Okun and G. Valentini, editors, Applications of Supervised and Unsupervised Ensemble Methods, pages 1–13. Springer, 2009.
[28] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813–822, 2009.
[29] Z.-H. Zhou, J. Wu, and W. Tang. Ensembling neural networks: Many could be better than all. Artif. Intell., 137(1-2):239–263, 2002.
[30] Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. CRC Press, 2012.
[31] A. Zimek, R. J. G. B. Campello, and J. Sander. Ensembles for unsupervised outlier detection: Challenges and research questions. SIGKDD Explor., 15(1):11–22, 2013.
[32] A. Zimek, R. J. G. B. Campello, and J. Sander. Data perturbation for outlier detection ensembles. In Proc. SSDBM, pages 13:1–12, 2014.
[33] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428–436, 2013.