<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Partial Weighted Set Cover Problem with Applications to Outlier Detection and Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Bothe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamas Horvath</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Schloss Birlinghoven, 53754 St. Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We define the partial weighted set cover problem, a generic combinatorial optimization problem, that includes some classical data mining problems as special cases. We prove that it is computationally intractable and give a local search algorithm for this problem. As application examples, we then show how to translate clustering and outlier detection problems into this generic problem. Our experiments on synthetic and real-world datasets indicate that the quality of the solution produced by the generic local search algorithm is comparable to that obtained by state-of-the-art clustering and outlier detection algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>(i) In the first step we proceed as follows: For each of the k sets, we select one of
its supersets from 𝒮 that, together with the remaining k − 1 sets, maximizes
the utility function and fulfills the following conditions: The new utility value
is greater than the old one and the new candidate set does not cover any
new element already contained in any of the other k − 1 sets. If all these
conditions are satisfied, we replace the set by this maximal superset selected.
(ii) In the second step we then select O(log k) sets from the k sets obtained after
step (i) uniformly and independently at random. For each set selected, we
replace it either with one of its direct subsets (i.e., maximal proper subsets)
or direct supersets (i.e., minimal proper supersets) in 𝒮, selected uniformly
at random. If the new k sets obtained in this way have a better utility, or
they have the same utility and the outcome of a biased coin flip is heads, we
keep the new k sets; otherwise we use the ones obtained after step (i).
Regarding (i), note that by construction, if at least one of the k sets is
changed after this step, we have a strict increase in the utility. Furthermore, this
step is greedy, as the k sets are processed separately. Regarding (ii), this step
may not result in a strict increase in the utility, with a certain probability specified
by the user.</p>
<p>In the second part of the paper, we consider set systems 𝒮 over some finite
set S ⊆ ℝ^d. More precisely, a subset of S belongs to 𝒮 if and only if it can be
realized by a d-dimensional ball of radius r around a point p ∈ S, where r is
an element of a finite geometric progression for some scale factor specified by
the user. The relative weights of the elements in a ball are defined by a function
monotonically decreasing in their distances to the center. If a set can be realized
by more than one ball, we take the ball with the highest weight. Using these
weights, the reward term for a family of k sets in 𝒮 is defined as the sum of their
weights. Defining the generality of a set by the radius of the ball representing it,
the penalty terms are given by the sum of the radii of the k balls and by the sum
of all relative weights of the elements except for the highest one.</p>
      <p>Using the set system with the utility function as well as the generic algorithm
sketched above, we arrive at a combinatorial optimization problem and at an
algorithm solving it that are highly relevant e.g. for (soft) clustering and outlier
detection problems. In fact, as we experimentally demonstrate on synthetic and
real-world data, our empirical results on these two particular data mining
problems are comparable to those obtained by state-of-the-art algorithms. This is
especially remarkable because, as we will discuss in detail, the balls inducing the
set system are split into concentric annuli and two points belonging to the same
ball have the same relative weight if and only if they belong to the same annulus.
According to our experimental results, this type of distance discretization does
not seem to have a strong impact on the quality of the output.</p>
      <p>One of the main advantages of our approach is its generality: After the
domain dependent step of specifying the set system and the utility function for a
broad class of problems, we can solve the problem with a domain independent
algorithm. Further potential advantages will be discussed in the last section.</p>
<p>The rest of the paper is organized as follows. In Section 2 we formally define
the partial weighted set cover problem, show its computational intractability,
and give a local search algorithm for approximating its solution. In Section 3
we adapt our generic approach to set systems defined by d-balls and empirically
demonstrate the usefulness of our method on clustering and outlier detection
problems. Finally, in Section 4 we discuss our results and present some interesting
problems for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>The Partial Weighted Set Cover Problem</title>
<p>In this section we define the partial weighted set cover problem, show that it
is computationally intractable, and present a generic local search algorithm for
this problem.</p>
<p>Let S be some finite set and 𝒮 ⊆ 2^S be a set system over S. We assume that
there is a function w_X : X → ℝ≥0 for all X ∈ 𝒮, where ℝ≥0 denotes the set of
non-negative real numbers. Thus, for all sets X in 𝒮 and for all x ∈ X, w_X(x)
defines the relative weight of x with respect to X. In addition to the relative
weights on the elements of X, we assume that there are two set functions W
and G, both mapping 𝒮 to ℝ≥0. While</p>
      <p>W(X) = Σ_{x ∈ X} w_X(x)</p>
      <p>defines the weight for all X ∈ 𝒮, G(X) specifies the generality of X. In the
applications we consider, G(X) will be defined by the size of X, for some appropriate
notion of size depending on the particular problem at hand. Using the definitions
above, we are ready to define the following combinatorial optimization problem
considered in this work:</p>
      <p>The Partial Weighted Set Cover (PWSC) Problem: Given a weighted set
system 𝒮 over some finite set as defined above, a positive integer k, and a
function PO : 𝒮^k → ℝ≥0, find</p>
      <p>argmax_{S_k ⊆ 𝒮, |S_k| = k} U(S_k),</p>
      <p>with</p>
      <p>U(S_k) = Σ_{X ∈ S_k} W(X) − Σ_{X ∈ S_k} G(X) − PO(S_k)   (1)</p>
      <p>In the definition above, PO is used to penalize the elements of S that are covered
by more than one set from S_k. Thus, our goal is to select k sets from 𝒮 with
maximum weights, subject to the constraints that the sets must be as specific
as possible and the overlap amongst the k sets must be as small as possible.
There are many problems that can be regarded as special cases of the PWSC
problem. As an example, in the next section we show that soft clustering and
outlier detection problems can be viewed as such special cases, allowing these
classical data mining problems to be translated into combinatorial optimization
problems.</p>
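<p>To make equation (1) concrete, the following minimal sketch evaluates the utility U for a toy weighted set system. The unit relative weights, the zero generality term, and the multiplicity-based overlap penalty are illustrative assumptions of ours, not the ball-based definitions introduced in Section 3.</p>

```python
from collections import Counter

# Toy evaluation of the PWSC utility from equation (1):
# U(S_k) = sum_X W(X) - sum_X G(X) - PO(S_k).
# Unit weights, zero generality, and a multiplicity overlap
# penalty are illustrative assumptions, not the paper's choices.

def utility(family, w, G, PO):
    total_weight = sum(sum(w(X, x) for x in X) for X in family)  # sum of W(X)
    total_generality = sum(G(X) for X in family)                 # sum of G(X)
    return total_weight - total_generality - PO(family)

def overlap_penalty(family):
    # one unit is subtracted for every extra cover of a point
    counts = Counter(x for X in family for x in X)
    return float(sum(c - 1 for c in counts.values() if c > 1))

family = [frozenset({1, 2, 3}), frozenset({3, 4})]
u = utility(family, w=lambda X, x: 1.0, G=lambda X: 0.0, PO=overlap_penalty)
print(u)  # 5 unit weights - 0 - 1 doubly covered element = 4.0
```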
<p>Before giving our algorithm for the PWSC problem, we first discuss its
complexity. Not surprisingly, the PWSC problem is computationally intractable.
This is stated in the proposition below.</p>
      <p>Proposition 1. The PWSC problem is NP-hard.</p>
<p>Proof. We prove the claim by using the following polynomial reduction from the
decision version of the set cover problem¹. Let 𝒮 be a set system over some finite
ground set S and define w_X(x) = 1, W(X) = |X|, G(X) = 0, and</p>
      <p>PO(S_k) = Σ_{Y ∈ S_k} |Y| − |⋃_{Y ∈ S_k} Y|</p>
      <p>for all X ∈ 𝒮, for all x ∈ X, and for all S_k ⊆ 𝒮 with |S_k| = k. For the output
S_k of the PWSC problem for the weighted set system constructed, we have that
U(S_k) = |S| if and only if there exist k sets in S_k that cover S. ⊓⊔</p>
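<p>The reduction in the proof can be checked on a toy instance: with unit weights, W(X) = |X|, G(X) = 0, and the penalty above, the utility collapses to the size of the union of the chosen sets, so it reaches |S| exactly when the sets cover S. The instance below is our own illustration.</p>

```python
# Sketch of the reduction in Proposition 1: with w_X(x) = 1, W(X) = |X|,
# G(X) = 0 and PO(S_k) = sum |Y| - |union of Y|, the utility equals the
# number of covered elements. The ground set and families are toy examples.

def utility_reduction(family):
    total = sum(len(Y) for Y in family)      # sum of W(Y) = sum of |Y|
    covered = len(set().union(*family))      # |union of the chosen sets|
    penalty = total - covered                # PO from the reduction
    return total - 0 - penalty               # equals the number covered

S = set(range(6))
cover = [{0, 1, 2}, {2, 3}, {3, 4, 5}]       # these three sets cover S
no_cover = [{0, 1}, {1, 2}, {2, 3}]          # these do not
print(utility_reduction(cover), utility_reduction(no_cover))  # 6 4
```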
<p>To overcome the computational limitation stated in Proposition 1 above, we
resort to a local search algorithm for finding some approximate solution of the
PWSC problem (see Alg. 1). The parameters of the algorithm are a weighted set
system 𝒮 over some finite set S, a positive integer k, and a set function PO on
𝒮^k. In lines 1 and 2 of the main algorithm, we greedily select k sets maximizing
the utility function given in (1). Then, until some termination condition holds,
we iteratively call two update functions (lines 3–6).</p>
<p>The input to the first update function (Update A) is a family S_k ⊆ 𝒮 with
S_k = {S_1, …, S_k}. For every i = 1, …, k, it selects a proper superset S′_i of
S_i from 𝒮 such that the replacement of S_i by S′_i in S_k maximizes the utility
function. If the utility of the k sets after the replacement is greater than that
of S_k, we take the new configuration and continue the process with the next set
(lines 2–4 of Update A). Note that this function is greedy, as it updates the k
sets in S_k separately.</p>
<p>In function Update B we select O(log k) sets uniformly at random from the
input family S_k = {S_1, …, S_k} (line 3 of Update B) and try to shrink/enlarge
them without any decrease in the utility. More precisely, for each set S_i ∈ S_k
selected, we first calculate the family F_1 of maximal proper subsets (resp. the
family F_2 of minimal proper supersets) of S_i in 𝒮 that minimize the L1-distance
of the relative weights of the elements belonging to the intersection (line 5 resp.
line 6). We then select a set from F_1 ∪ F_2 uniformly at random (line 7) and
replace S_i in S_k by the set selected. After having processed all O(log k) sets in
this way, we compare the utility of the new k sets with that of the input ones.
We keep the new configuration if its utility is greater or it has the same utility
and the outcome of a biased coin flip is heads (lines 9–11). The rationale behind
the definition of F_1 and F_2 is that we would like to avoid big steps during local
search.
¹ The decision version of the set cover problem is defined as follows: Given a set system
𝒮 over some finite set S and a positive integer k, decide whether there exist k sets
in 𝒮 that cover S. This problem is known to be NP-complete.</p>
      <sec id="sec-2-1">
        <title>Algorithm 1 Main</title>
<p>Parameters: weighted set system 𝒮 over some finite set S, k ∈ ℕ, and PO : 𝒮^k → ℝ≥0
1: set S_0 = ∅
2: for i ∈ [k] do S_i = S_{i−1} ∪ { argmax_{X ∈ 𝒮 ∖ S_{i−1}} U(S_{i−1} ∪ {X}) }</p>
        <p>5: F′_1 = {Z ∈ 𝒮 : Z ⊊ S_i and Σ_{x ∈ Z} |w_Z(x) − w_{S_i}(x)| is minimum}
6: F′_2 = {Z ∈ 𝒮 : Z ⊋ S_i and Σ_{x ∈ S_i} |w_Z(x) − w_{S_i}(x)| is minimum}
7: select a set S′_i uniformly at random from F′_1 ∪ F′_2
8: set S′_k = (S_k ∖ {S_i}) ∪ {S′_i}
9: if U(S′_k) &gt; U(S_k) then S_k = S′_k
10: else if U(S′_k) = U(S_k) then
11: set S_k = S′_k if the outcome of a biased coin flip is heads
12: return S_k</p>
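<p>The following compact sketch renders the skeleton of Algorithm 1 under toy assumptions: an arbitrary family of frozensets as set system, unit relative weights, no generality term, and Update B simplified to pick any proper sub- or superset without the L1-distance filter. It illustrates the control flow, not the authors' exact implementation.</p>

```python
import math
import random
from collections import Counter

# Skeleton of the local search (greedy initialization plus Update B moves)
# on a toy set system; the utility uses unit weights and an overlap penalty.

def U(family):
    counts = Counter(x for X in family for x in X)
    weight = sum(counts.values())                    # sum of unit weights
    overlap = sum(c - 1 for c in counts.values() if c > 1)
    return weight - overlap

def greedy_init(system, k):
    """Lines 1-2 of Main: greedily add the set maximizing the utility."""
    chosen = []
    for _ in range(k):
        best = max((X for X in system if X not in chosen),
                   key=lambda X: U(chosen + [X]))
        chosen.append(best)
    return chosen

def update_b(system, family, rng):
    """Replace O(log k) random sets by a proper sub- or superset; keep the
    result if the utility does not drop (ties: biased coin flip)."""
    k = len(family)
    new = list(family)
    for i in rng.sample(range(k), max(1, math.ceil(math.log2(k)))):
        cands = [Z for Z in system if Z < new[i] or Z > new[i]]
        if cands:
            new[i] = rng.choice(cands)
    if U(new) > U(family) or (U(new) == U(family) and rng.random() < 0.5):
        return new
    return family

rng = random.Random(0)
system = [frozenset(s) for s in ({0, 1}, {0, 1, 2}, {2, 3},
                                 {3, 4}, {2, 3, 4}, {4, 5})]
family = greedy_init(system, 2)
for _ in range(20):       # a fixed iteration budget as termination condition
    family = update_b(system, family, rng)
print(U(family))          # the toy optimum covers 5 points without overlap
```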
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Applications</title>
      <p>
        To demonstrate the practical usefulness of our approach, in this section we
present the applications of the PWSC problem to two classical problems of
data mining: to clustering and outlier detection. In the case of clustering, the task
is to identify subsets of observations (i.e., clusters) minimizing the intra-cluster
distances (i.e., the distance between instances within the same subset) and
maximizing the inter-cluster distances (i.e., the distance between clusters). Note that
this informal definition applies to soft clustering as well. Regarding outlier
detection, we use the following definition: "An outlier is an observation that deviates
so much from other observations as to arouse suspicion that it was generated by
a different mechanism" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus, the goal of outlier detection is to distinguish
the set of outlier observations from that of the inlier ones. Similarly, for example,
to DBSCAN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we reduce the outlier detection problem to clustering; all
instances not belonging to any of the clusters are regarded as outliers. In order to
model the two problems above by the PWSC problem and to apply Algorithm 1,
we need to construct an appropriate weighted set system and define the penalty
function PO.
      </p>
<p>The PWSC Problem for Clustering and Outlier Detection</p>
      <p>Both clustering and outlier detection use a concept of similarity between
observations. We consider the case that the observations form a finite set S ⊆ ℝ^d for
some d and that the similarity between observations is defined by some metric
D on ℝ^d. For each point P ∈ S, we consider a set of d-balls around P for all
radii defined by the elements of a finite geometric progression for some scale
factor. The set system 𝒮 over S is then defined by the family of subsets of S,
each covered by such a ball centered around a point P for some P ∈ S. We will
refer to the resulting PWSC as BallCover.</p>
<p>More precisely, we assume without loss of generality that the smallest
distance between two different points in S is 1, i.e.,</p>
      <p>min_{P_1, P_2 ∈ S, P_1 ≠ P_2} D(P_1, P_2) = 1.</p>
      <p>Given some positive real number ε defining the scale factor 1 + ε, we define
L ∈ ℕ by</p>
      <p>L = ⌈log_{1+ε} R⌉,</p>
      <p>where R = max_{P_1, P_2 ∈ S} D(P_1, P_2). Thus,</p>
      <p>D(P, Q) ≤ (1 + ε)^L</p>
      <p>for all P, Q ∈ S. For all P ∈ S, we determine an integer 0 ≤ L_P ≤ L that gives
an upper bound on the set of balls of center P; the algorithm calculating L_P
is discussed below. Using these concepts, for S and ε above we define the set
system 𝒮 over S by</p>
      <p>𝒮 = {S_{P,l} : P ∈ S, 0 ≤ l ≤ L_P}</p>
      <p>with</p>
      <p>S_{P,l} = {P′ ∈ S : D(P, P′) &lt; (1 + ε)^l}.</p>
      <p>The definitions imply that S ∈ 𝒮 and that S_{P,L} = S for all P ∈ S.</p>
<p>For all P ∈ S and for all l = 0, 1, …, L_P, we define the relative weights of
the instances in S_{P,l} by</p>
      <p>w_{S_{P,l}}(Q) = 1 / (i + 1)</p>
      <p>for all Q ∈ S_{P,i} ∖ S_{P,i−1} and for all i = 0, 1, …, l, where S_{P,−1} = ∅. That is,
S_{P,l} is partitioned into l + 1 annuli, where the 0th annulus is the point P, and
the relative weight of a point Q belonging to the ith annulus is 1/(i + 1), i.e., it
is inversely proportional to the distance of the annulus from P. In this way, we
disregard the exact distance for any two points belonging to the same annulus
in S_{P,l}.</p>
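<p>A small sketch of this construction may be helpful. It builds, for every point, the balls for all radii of the geometric progression together with the annulus-based weights; the function name, the two-dimensional toy points, and the use of the global bound L for every center (instead of the data-dependent L_P discussed below) are our own assumptions.</p>

```python
import math
from itertools import combinations

# Ball set system S_{P,l} = {Q : D(P,Q) < (1+eps)^l} with annulus weights
# w(Q) = 1/(i+1), where i is the index of the annulus containing Q.
# Toy 2-D points with minimum pairwise distance 1, as the paper assumes.

def build_balls(points, eps, dist):
    R = max(dist(p, q) for p, q in combinations(points, 2))
    L = math.ceil(math.log(R, 1 + eps))     # radii in the geometric progression
    system = {}
    for P in points:
        for l in range(L + 1):              # the paper caps l at L_P instead
            ball = {Q for Q in points if dist(P, Q) < (1 + eps) ** l}
            # annulus index of Q: the smallest i with D(P,Q) < (1+eps)^i
            weights = {Q: 1.0 / (1 + next(i for i in range(l + 1)
                                          if dist(P, Q) < (1 + eps) ** i))
                       for Q in ball}
            system[(P, l)] = (frozenset(ball), weights)
    return system

dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
system = build_balls(points, eps=0.5, dist=dist)
ball, w = system[((0.0, 0.0), 2)]  # ball of radius 1.5^2 = 2.25 at the origin
print(sorted(ball), w[(1.0, 0.0)])  # center has weight 1, annulus 1 weight 1/2
```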
<p>We now specify the generality (G) and the penalty function (PO) for 𝒮.
Regarding the generality, we define it by</p>
      <p>G(S_{P,l}) = λ(1 + ε)^l</p>
      <p>for all P ∈ S and for all l = 0, 1, …, L_P, where λ ≥ 0 is some user-specified
parameter. Regarding PO, let S_k be a subset of 𝒮 and let S′ ⊆ S be the set of
points contained in at least two sets in S_k. Then PO(S_k) is defined by</p>
      <p>PO(S_k) = Σ_{x ∈ S′} ( Σ_{X ∈ S_k : x ∈ X} w_X(x) − max_{X ∈ S_k : x ∈ X} w_X(x) ).</p>
      <p>That is, according to the definition of the utility function in (1), we subtract all
relative weights of a point x covered by more than one set, except for the highest
one. Finally, we note that if a subset of S has more than one ball representation,
we take the ball (and the corresponding weights) that has the highest weight.</p>
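<p>The penalty PO can be sketched directly from its definition: for every point covered by at least two of the chosen sets, all of its relative weights except the highest one are subtracted. The weight tables below are toy values of ours, not weights derived from actual balls.</p>

```python
# Overlap penalty PO: for each multiply covered point, subtract all of its
# relative weights except the highest one. Sets are given with toy weights.

def overlap_penalty(family):
    """family: list of (set_of_points, {point: relative_weight}) pairs."""
    penalty = 0.0
    for x in set().union(*(s for s, _ in family)):
        ws = [w[x] for s, w in family if x in s]
        if len(ws) >= 2:                    # x lies in at least two sets
            penalty += sum(ws) - max(ws)    # keep only the highest weight
    return penalty

A = ({1, 2}, {1: 1.0, 2: 0.5})
B = ({2, 3}, {2: 1.0, 3: 0.5})
print(overlap_penalty([A, B]))  # point 2 doubly covered: 0.5 + 1.0 - 1.0 = 0.5
```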
<p>It remains to discuss the determination of the upper bound L_P for a point P ∈
S. Since balls having large annuli of low density are poor choices for clustering,
we need to disregard them as candidates. To find the maximum ball around
P that contains no annuli of low density, we observe the change in density
as a function of the radius while growing the ball. The density is measured
by the number of instances covered relative to the ball's radius. Accordingly,
we sort the instances by their distance to P. The position in this list then
provides the number of instances covered at the respective distance, giving rise
to a monotone function sampled at finitely many non-equidistant positions. Our
goal is to estimate the first plateau of this unknown continuous density function.
To achieve this, we first interpolate the function value at equidistant positions
using nearest neighbor interpolation. We then approximate the first derivative by
convolving the interpolated signal with a Sobel kernel. Finally, we smooth the result
and determine the first position that is numerically a zero point. This position
defines the maximal radius L_P we consider for the ball around P. Due to space
limitations we omit the formal definition of this algorithm.</p>
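<p>Although the formal definition is omitted here, the heuristic can be sketched roughly as follows. The grid resolution, the one-dimensional [-1, 0, 1] derivative kernel standing in for the Sobel kernel, the moving-average smoother, and the zero tolerance are all assumptions of ours.</p>

```python
# Rough sketch of the L_P heuristic: sample the cumulative point count over
# the radius on an equidistant grid (a simple step count stands in for the
# nearest-neighbor interpolation), differentiate with a [-1, 0, 1] kernel,
# smooth, and return the radius at the first numerically-zero slope.

def estimate_plateau_radius(dists, grid=64, tol=1e-9):
    dists = sorted(dists)                     # distances of the points to P
    R = dists[-1]
    xs = [R * i / (grid - 1) for i in range(grid)]
    ys = [sum(1 for d in dists if d <= x) for x in xs]   # count function
    deriv = [ys[i + 1] - ys[i - 1] for i in range(1, grid - 1)]
    smooth = [(deriv[i - 1] + deriv[i] + deriv[i + 1]) / 3
              for i in range(1, len(deriv) - 1)]
    for i, v in enumerate(smooth):
        if abs(v) <= tol:                     # first plateau of the density curve
            return xs[i + 2]
    return R

# a dense cluster around P followed by a gap: the plateau starts after 0.3
dists = [0.1, 0.15, 0.2, 0.25, 0.3, 5.0, 5.1]
r = estimate_plateau_radius(dists)
print(0.3 < r < 5.0)  # the estimated radius falls inside the gap
```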
      <sec id="sec-3-1">
        <title>Experiments</title>
        <p>
          To demonstrate the usefulness of our BallCover approach to the tasks of outlier
detection and clustering, we have conducted a series of experiments. We have
compared our results achieved on synthetic and real-world data from the UCI
machine learning repository [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to those obtained by state-of-the-art algorithms.
The problems of outlier detection and clustering have been studied extensively in
the past. As a result, a wide range of different concepts and algorithms have been
proposed. For instance, there are outlier detection algorithms using information-theoretic
criteria, spectral decomposition, clustering, proximity, or density. For
a recent overview of outlier detection algorithms, the reader is referred to [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          An exhaustive comparison to all relevant outlier detection and clustering
algorithms is beyond the scope of this work. Therefore, we focus only on
algorithms using a density criterion to identify outliers or clusters, as these are the
methods most similar to the algorithm proposed in this paper. More precisely,
we consider the following algorithms:
Local outlier factor (LOF) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] The algorithm identifies outliers by
comparing the density of a point with that of its surrounding points. The size of
the surrounding neighborhood is specified by the user-supplied parameter
MinPts. Within the MinPts neighborhood around a point, the local
outlier factor (LOF) is calculated as the average density of all points in the
neighborhood normalized by the point's own density. Points with density
much lower than their neighbors produce a high LOF value and are
considered outliers. For our experiments we use the implementation available with
the ELKI [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] toolkit. To identify the set of outliers, we order the instances
according to their LOF value and select the top n instances, where n is the
true number of outliers. In our experiments on synthetic data we set the
MinPts parameter (i.e., the number of instances in a cluster) to 1000; other
choices (10, 20, 100) of this parameter have not led to better results. For
the UCI datasets we follow [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and set MinPts = 10.
        </p>
        <p>
          Support vector novelty detection (SVND) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is an extension of support
vector machines (SVM) to the case of unlabeled data. In SVM the
maximal separating hyperplane is determined by the location of instances with
different labels in the feature space. However, there are no labels in SVND.
Therefore, the goal is to find a simple subset of instances such that the
probability of an instance falling into this set meets a probability threshold
parameter ν. The boundary of the set is expressed in terms of a kernel
expansion and its complexity is controlled by empirical risk minimization. The
algorithm takes the probability threshold ν and the kernel as parameters.
For our experiments we set ν to the true fraction of outliers and use the
Gaussian kernel. The Gaussian kernel itself requires the specification of the
variance σ, which we set to the median distance of all points in the dataset,
following the recommendation in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In our experiments we use the open
source implementation libsvm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
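<p>The bandwidth heuristic mentioned above (setting the Gaussian kernel's width to the median pairwise distance of the data) can be sketched in a few lines; the function name and the toy points are our own.</p>

```python
import math
from itertools import combinations

# Median-distance heuristic for the Gaussian kernel width: compute all
# pairwise distances and take their median. Toy points: a unit square.

def median_pairwise_distance(points):
    d = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    n = len(d)
    return d[n // 2] if n % 2 else 0.5 * (d[n // 2 - 1] + d[n // 2])

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
sigma = median_pairwise_distance(points)
print(sigma)  # four side pairs of length 1, two diagonals: the median is 1.0
```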
        <p>
          DBSCAN [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] constructs a clustering by expanding clusters around dense points,
called core points. A point is dense if it has at least MinPts neighbors within a
distance of at most ε. All points in the neighborhood are recursively added
to the cluster as long as they have MinPts neighbors. Points that do not
belong to any cluster are considered as outliers. For our experiments we use
the implementation available from sklearn [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], i.e., we leave the parameter
MinPts at the default choice of 10 and set ε to the median of pairwise
distances between two points in the data.
        </p>
        <p>
          Isolation Forest [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] constructs an ensemble of trees by randomly choosing
attributes and splits at each inner node of a tree. The tree growing stops once
each instance is isolated in a single leaf or the tree exceeds a height
threshold. Each tree is grown on a random sample of the data, and the tree height is
restricted to the height of a binary tree with a number of leaves equal to the
sample size. Each instance is scored by the expected average path length
between the tree root and the node containing the instance. Instances with
a short path length are isolated earlier from the rest of the data and are
considered as outliers. For our experiments we use the implementation available
in sklearn [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and keep all parameters at their defaults (ensemble size 100,
data sample size of 256). The authors propose that instances with a path length
measure much smaller than 0.5 be considered as inliers. We therefore use
this threshold to determine the prediction of outliers.
        </p>
        <p>
          For our empirical evaluation, we need datasets with known ground truth, i.e.,
for which we know which instances are outliers within the data. Therefore, we
create synthetic datasets and use publicly available classification datasets from
the UCI repository for comparison. For the synthetic data, we place k Gaussians
uniformly at random in a hypercube, each with a random diagonal covariance
matrix. The centers may be generated close to each other and the resulting
distributions may have arbitrary overlap. We draw the same number of instances
from each Gaussian and add uniform noise over the hypercube extended by the
largest 3σ. The parameters for the data we generated here are as follows: the
center coordinates of the Gaussians are drawn uniformly at random from the
interval [0, 20] and the variances σ² from [0, 1]. We generated ten 2-dimensional
datasets with four Gaussians, having 1000 samples each, and added 20% uniform
noise samples as outliers. Regarding the real-world data from the UCI repository,
we follow the transformation used in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to construct an outlier detection task
from the corresponding multi-class classification task, i.e., we consider classes
3, 4, 5, 7, 8, 9, 14, 15 as outliers for the arrhythmia set and classes 1, 2 for annthyroid.
The pima and ionosphere datasets are binary classification tasks; the minority
class is considered as outliers. We further take wilt and a random sample of
the adult dataset into account using the same procedure. Our selection includes
datasets of small up to large size (350–7000) and of low and high dimension
(5–274), to cover a broad range of application scenarios.
        </p>
<p>We tested all algorithms described above by applying them to the full datasets,
including the outliers, during the training stage. The ground truth labels are not
presented to the learner; they are only used in the calculation of the performance
measure. We use the F1-score, with the normal instances as the positive class, to assess
the quality, as it accounts for the class imbalance. For our algorithm, we use the
true number of balls with the synthetic datasets and report results for different
choices of k on the UCI datasets. Further, we fix the parameters ε = 0.1 and
λ = 0.05 for all experiments. The results achieved for the different combinations
of the datasets and algorithms are summarized in Table 1. As we can see, the
performance of our approach depends on the choice of k, but in most cases, there
is a choice that matches the quality of the competitors (adult, ionosphere, pima,
wilt, synthetic data). Only for the annthyroid dataset is the performance of our
algorithm clearly worse than that of any reference. In turn, we can report a
slightly better performance than the best competitor for the arrhythmia dataset.</p>
<p>For the clustering application we use the synthetic Gaussian datasets. Each
of the Gaussians forms a separate cluster and each instance is labeled accordingly.
The uniform noise represents an additional component and is identified as
another cluster id. For comparison, we use the DBSCAN and K-Means
clustering algorithms as references. We follow the same parameter selection scheme
for DBSCAN as used within our outlier experiments. In contrast to DBSCAN,
the K-Means algorithm creates a complete partition of all instances, in particular
without excluding any noise. Therefore we consider different choices of k, setting
it at least to the true number of Gaussians plus one additional cluster for the noise. We
treat the cluster id as the target of a multiclass classification problem and assess the
performance in terms of the weighted average over the F1-scores for each of the
classes. To match the cluster id used by the algorithm with that of the ground
truth, we apply the following mapping: For the K-Means and DBSCAN
algorithms we assign to each cluster the ground truth label given by a majority vote of
the instances in that cluster. For our algorithm, we use the center points to
select the ground truth label for each ball. The results indicate that our algorithm
performs better than K-Means for small choices of k. Increasing k leads to
results comparable to our approach for most datasets. There are some cases where
K-Means performs better and some where our approach is slightly better (cf.
Table 2). Further, across all sets used in this experiment, the performance of our
algorithm competes with the results of DBSCAN; in some cases, our algorithm
performs much better. A closer look at those cases reveals that our algorithm
has an advantage if the centers of the Gaussians are very close to each other,
resulting in large overlaps. In such cases, those clusters are merged by DBSCAN,
while our algorithm is able to separate the corresponding data instances. An
example of this situation is depicted in Figure 1. The true association of points to
clusters is shown on the left. Note the two overlapping Gaussians at the middle bottom
region. Our algorithm seeks dense balls with low overlap and small radii and can
detect the structure correctly. For DBSCAN, the two Gaussians are too close
to each other and are joined into one cluster.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
<p>The approach presented in this paper is a first step towards a systematic study of
its applications to other data mining/machine learning problems. The advantage
of translating such problems into the partial weighted set cover problem is that
it allows for the application of techniques developed for combinatorial
optimization problems. For example, one might transform the underlying problem into
set systems with some advantageous structural properties (e.g., into matroids or
greedoids) that can be utilized by the algorithm. In this way, new tractable
subclasses of the problem could be identified. Another interesting research direction
is the restriction of the utility function to set function classes with advantageous
algorithmic properties.</p>
<p>The distance discretization used in the applications considered in this work
raises the question whether our approach can be combined with state-of-the-art
techniques for improving speed, such as, for example, locality sensitive
hashing.</p>
      <p>
        Our utility function includes two penalty terms. One of them is concerned
with the generality of the elements in the set system. It is an interesting question
whether, and if so, how, we can control the complexity of the output via these
terms and the parameter k. Finally, we are going to investigate how to extend
our method to an interactive one, in which the set system and the utility function
are automatically adapted according to the feedback of an expert (cf. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
Acknowledgments: This research was supported by the EU FP7-ICT-2013-11
project under grant 619491 (FERARI).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hawkins</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Identification of Outliers</article-title>
          .
          <source>Monographs on applied probability and statistics. Chapman and Hall</source>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.:</given-names>
          </string-name>
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>In: Proc. of 2nd International Conference on Knowledge Discovery and Data Mining. (</source>
          <year>1996</year>
          )
          <volume>226</volume>
          {
          <fpage>231</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lichman</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>UCI machine learning repository (</article-title>
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Outlier Analysis</article-title>
          .
          <source>Springer New York</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Breunig</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>R.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LOF: Identifying density-based local outliers</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>29</volume>
          (
          <issue>2</issue>
          ) (May
          <year>2000</year>
          )
          <fpage>93</fpage>
          –
          <lpage>104</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Schubert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emrich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Zu e,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.A.</given-names>
            ,
            <surname>Zimek</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>A framework for clustering uncertain data</article-title>
          .
          <source>PVLDB</source>
          <volume>8</volume>
          (
          <issue>12</issue>
          ) (
          <year>2015</year>
          )
          <fpage>1976</fpage>
          –
          <lpage>1979</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ting</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.H.</given-names>
          </string-name>
          :
          <article-title>Isolation forest</article-title>
          .
          <source>In: 2008 Eighth IEEE International Conference on Data Mining</source>
          . (Dec
          <year>2008</year>
          )
          <fpage>413</fpage>
          –
          <lpage>422</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Scholkopf,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Platt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.C.</given-names>
            ,
            <surname>Shawe-Taylor</surname>
          </string-name>
          , J.C.,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williamson</surname>
          </string-name>
          , R.C.
          <article-title>: Estimating the support of a high-dimensional distribution</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>13</volume>
          (
          <issue>7</issue>
          )
          (July
          <year>2001</year>
          )
          <fpage>1443</fpage>
          –
          <lpage>1471</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furesjö</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Appearance-based object recognition using SVMs: which kernel should I use?</article-title>
          <source>In: Proc. of NIPS workshop on Statistical methods for computational experiments in visual processing and computer vision, Whistler</source>
          (
          <year>2002</year>
          )
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          (
          <year>2011</year>
          )
          <fpage>27:1</fpage>
          –
          <lpage>27:27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Boley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mampaey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokmakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrobel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>One click mining – interactive local pattern discovery through implicit preference and performance learning</article-title>
          .
          <source>In: KDD 2013 Workshop on Interactive Data Exploration and Analytics (IDEA)</source>
          .
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>