Validating One-Class Active Learning with User
  Studies – a Prototype and Open Challenges

            Holger Trittenbach, Adrian Englhardt, and Klemens Böhm

             Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
         {holger.trittenbach, adrian.englhardt, klemens.boehm}@kit.edu



        Abstract. Active learning with one-class classifiers involves users in the
        detection of outliers. The evaluation of one-class active learning typically
        relies on user feedback that is simulated, based on benchmark data. This
        is because validations with real users are elaborate. They require the de-
        sign and implementation of an interactive learning system. But without
        such a validation, it is unclear whether the value proposition of active
        learning does materialize when it comes to an actual detection of out-
        liers. User studies are necessary to find out when users can indeed provide
        feedback. In this article, we describe important characteristics and pre-
        requisites of one-class active learning for outlier detection, and how they
        influence the design of interactive systems. We propose a reference ar-
        chitecture of a one-class active learning system. We then describe design
        alternatives regarding such a system and discuss conceptual and techni-
        cal challenges. We conclude with a roadmap towards validating one-class
        active learning with user studies.

        Keywords: Active learning · One-class classification · Outlier detection
        · User study.


1     Introduction

Active Learning is the paradigm to involve users in classifier training, to improve
classification results by means of feedback. An important application of active
learning is outlier detection. Here, one-class classifiers learn to discern inliers
from outliers. Examples are network security [17, 37] or fault monitoring [45]. In
these cases, one-class classifiers with active learning (OCAL) ask users to pro-
vide a binary label (“inlier” or “outlier”) for some of the observations. Then they
use these labels in subsequent training iterations to learn an accurate decision
boundary. OCAL differs from other active learning applications such as balanced
binary or multi-class classification. This is because the strategies to select ob-
servations for feedback (“query strategies”) take into account that outliers are
rare to non-existent.
    Over the last years, several OCAL-specific query strategies have been pro-
posed. They focus on improving classification accuracy with minimal feedback [3,
15, 16, 18, 45]. To evaluate them experimentally, authors generally rely on data
sets with an available ground truth, in order to simulate user feedback. On the

© 2019 for this paper by its authors. Use permitted under CC BY 4.0.



one hand, this seems to be convenient, since it allows one to evaluate algorithmic
 improvements without a user study. On the other hand, simulating user feed-
 back requires some assumptions on how users give feedback. These assumptions
are generally implicit, and authors do not elaborate on them, since they have
 become widely accepted in the research community. In particular, there are two
 fundamental assumptions behind active learning:
    (Feedback) Users provide accurate feedback independently from the pre-
               sentation of the classification result and from the observation
               selected for feedback (the “query”).
  (Acceptance) Users do not question how the feedback provided changes
               the classification results. Their motivation to provide feed-
               back, even for many observations, does not hinge on their
               understanding of how their feedback influences the classifier.
 Think of an OCAL system that asks a user to provide a class label on a
 20-dimensional real-valued vector, where features are the result of some pre-
 processing, such as principal component analysis. We argue that, without further
 information, users are unlikely to comprehend the query, and cannot provide ac-
 curate feedback. This is in line with observations from literature [14,33]. Even if
 they could provide the feedback, any change in the classifier is not tangible. This
is because there is no suitable visualization or description of a 20-dimensional decision
 boundary. We argue that users may question whether their feedback has a posi-
 tive effect on the classifier, or even any effect at all, lose interest, and eventually
stop providing feedback.
     This is a peculiar situation: On the one hand, the value proposition of active
 learning is to obtain helpful information from users that is not yet contained in
 the training data. On the other hand, there currently is no validation whether one
 can realize this value in an actual application. Such a validation would require
 to implement an interactive OCAL system and to conduct a user study.
     However, such an experimental validation is difficult, since there are sev-
 eral conceptual and technical issues. We have experienced this first hand, when
we have looked at smart meter data of an industrial production site [7, 42] to
 identify unusual observations, and to collect respective ground truth labels from
 human experts. In our preliminary experiments with this data, we found that
neither of the two active-learning assumptions holds in practice. In particular, we have
 observed that domain experts ask for additional information such as visualiza-
 tions and explanations that go way beyond a simple presentation of classification
results. Since there is an over-reliance on active-learning assumptions, little
 effort has been spent on making OCAL interpretable, comprehensible, and us-
 able. So it is unclear what the minimal requirements behind an OCAL system
 are to carry out a user study. Second, there are conceptual issues that are in
 the way of implementing OCAL systems. One issue is that the design space of
OCAL systems is huge. It requires defining a learning scenario, choosing a suitable classifier and a learning strategy, and selecting multiple hyperparameter values. In addition, there may be several conflicting objectives: One may
 strive to improve classification accuracy. Another objective may be to use OCAL






[Figure 1 shows the system overview: a backend with one-class classifiers, query strategies, explanations, data, and experiments, exposed via OcalAPI.jl to an application server; the frontend comprises the operator interface (used by the operator) and the participant interface (used by the participants).]

                                             Fig. 1. System overview



 as an exploratory tool, to present users with as many interesting instances as
possible. Another issue is that the objectives of a user study are diverse. One may
 want to collect a reliable ground truth for a novel data set, or to evaluate spe-
 cific components of the active learning system, e.g., how well users respond to a
 particular visualization. Third, there are technical issues which we have experi-
 enced first hand when implementing an OCAL system prototype. For instance,
 runtimes of training state-of-the-art classifiers may be too long for interactivity.
 Another example is that it is unclear how to visualize decision boundaries in
 multi-dimensional data sets.
      Although there are many difficulties, we deem user studies imperative to
 understand the determining factors behind realizing the value of OCAL. These
 factors can serve as guidelines for data mining research and can eventually lead
 to a more differentiated evaluation of novel query strategies and classifiers. The
 objective of our article is to point out important characteristics and prerequisites
 of OCAL and how they influence the design of interactive systems. To our knowl-
 edge, this is the first overview on conceptual and technical challenges regarding
OCAL systems. We derive these challenges based on an architectural sketch of
 the components of an existing OCAL system, which we have implemented as a
 prototype. We conclude by proposing a roadmap towards validating OCAL with
 user studies.


 2      OCAL System Architecture

 The purpose of an OCAL system is to facilitate experiments with several users.
 An experiment is a specific technical configuration, i.e., a data set, a classifier, a
 query strategy, and one or more users, the participants, who provide feedback.
    An OCAL system consists of several modules. Participants interact with
 the system through a participant interface that visualizes information on active





 learning iterations, such as the classification result and the progress of the ex-
 periment. The training of the classifier, query selection, and the preparation of
 additional information such as visualizations and explanations take place in an
 algorithm backend. Finally, there is a human operator who configures, monitors
 and evaluates the experiments through an operator interface. This typically is
 the researcher who conducts the experiments. Figure 1 is an overview of the
 system architecture. In the following, we describe the different modules and link
 them to our prototype implementation.


 Algorithm Backend: On a technical level, the algorithm backend consists of
 a classifier module SVDD.jl 1 and a module OneClassActiveLearning.jl 2 , which
 implements active learning components such as the query strategies. A third
 module provides additional information, e.g., classifier visualizations. For our
 prototype, we have implemented the classifiers, query strategies and basic vi-
 sualization information in OcalAPI.jl 3 , a ready-to-use JSON REST API. This
decoupling allows re-using the algorithm backend independently of the participant and operator interfaces.
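
To illustrate the decoupling, here is a minimal sketch of how a frontend could talk to such a JSON REST API. The endpoint path and the payload and response fields are hypothetical placeholders for illustration; they do not describe the actual OcalAPI.jl interface.

```python
# Minimal sketch of a frontend-to-backend call. The endpoint path and the
# payload/response fields are hypothetical placeholders, not the actual
# OcalAPI.jl interface.
import requests

BACKEND = "http://localhost:8080"  # hypothetical address of the algorithm backend

def request_next_query(session_id: str, labels: dict) -> dict:
    """Send the labels collected so far and ask the backend to retrain the
    classifier and select the next observation to query (hypothetical API)."""
    payload = {
        "session": session_id,   # which experiment run this belongs to
        "labels": labels,        # e.g. {"42": "outlier", "17": "inlier"}
    }
    response = requests.post(f"{BACKEND}/query", json=payload, timeout=30)
    response.raise_for_status()
    # Hypothetical response: id of the next query plus current predictions.
    return response.json()

# Example usage (assumes a backend is running at BACKEND):
# result = request_next_query("exp-run-1", {"42": "outlier"})
# print(result["query_id"], result["predictions"])
```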


 Operator Interface: The operator interface allows an operator to configure
 so-called experiment setups. A setup consists of a data set, a parameterized
 classifier and a query strategy. Depending on the research question, the operator
 may also configure which information is displayed in the participant interface.
This enables A/B tests, to, say, validate whether a certain visualization has an
 effect on feedback quality. The operator can invite several users to participate in
 an experiment run, i.e., an instantiation of an experiment setup. He can monitor
 and inspect the experiment runs in an overview panel and export experiment
 data for further analysis.


 Participant Interface: The participant interface has two functions. First, it
 is an input device to collect feedback during the experiment. Second, it provides
 the participants with information that supports them to provide educated feed-
 back. For instance, this may be a visualization of a classifier, a view on the raw
 data or a history of classification accuracy over the past iterations. The par-
 ticipant then provides feedback for some observations. During this process, the
interface captures user interactions, e.g., mouse movement and selection. As long as neither the query budget nor the time limit is exhausted, the participant proceeds with the next iteration.

    Our implementation of the interfaces is preliminary, since there are several
 open challenges, both conceptual and technical (see Section 3). We plan to make
1 https://github.com/englhardt/SVDD.jl
2 https://github.com/englhardt/OneClassActiveLearning.jl
3 https://github.com/englhardt/OcalAPI.jl





 it publicly available in the future as well. An important takeaway from this sec-
 tion is an intuition about how OCAL systems can be designed, on an architec-
 tural level. This intuition may be useful to understand the following discussions
 on the design space of OCAL systems and the challenges related to the three
 modules.


 3     Design Decisions for OCAL Systems
 The design and implementation of OCAL systems are inherently interdisci-
 plinary and require expertise from several areas, including data mining, human-
 computer interaction, UX-design, and knowledge of the application domain. Al-
 though all disciplines are important, we now focus on the data mining perspec-
 tive. We first discuss different types of interaction and elaborate on the design
 options for one-class classifiers and query strategies. We then present different
 options to prepare information for users during the learning iterations. Finally,
 we elaborate on several technical challenges when implementing OCAL systems.

 3.1   Type of Interaction
 The common definition of active learning is that a query strategy selects one or
 more observations for feedback. So, strictly speaking, a user does not have the
 option to also give feedback on other observations not selected by the system.
 However, there are related disciplines that do away with this restriction. For
 instance, one research direction is Visual Interactive Analytics (VIA) [8, 25, 43],
 where a user interactively explores outliers in a data set. VIA systems provide
 different kinds of visualization to assist users in identifying outliers, in particular
 with high-dimensional data sets. The unification of active learning and VIA is
 Visual Inter-Active Labeling (VIAL) [6,27]. VIAL combines active learning with
 user-supporting visualizations from the VIA community. Variants of VIAL and
 active learning are conceivable as well. For instance, instead of asking for labels
 of specific observations, the query strategy could provide a set of observations
 from which users can select one or more to label.
     It is an open question in which cases one should use VIAL or active learning.
 A user study in [5] indicates that users label more observations if they are free
 to choose the observations. However, the resulting classifier accuracy is higher
 with an AL query strategy. It is unclear whether these insights transfer to out-
 lier detection where classes are unbalanced. In fact, we see this as one of the
 overarching questions to answer with user studies.

 3.2   Type of Feedback
 In this article, feedback is binary, i.e., users decide whether an observation be-
 longs to the inlier or outlier class. However, other types of feedback are conceiv-
 able as well. For instance, in multi-class settings, the system may ask users to
 state to which classes an observation does not belong [10]. Another example is to





 ask users for feedback on features, as opposed to instances [13]. Existing OCAL
 approaches in turn focus on binary feedback. It is an open question if and how
 OCAL can benefit from allowing for different types of feedback.


 3.3   OCAL Design Space

 An OCAL system consists of three building blocks: the learning scenario, the
classifier, and the query strategy. In brief, a learning scenario is the set of underlying
assumptions about the application and the user interaction. This includes the feed-
 back type, e.g., sequential labels, the budget available for feedback, as well as
 assumptions on the data distribution, the objective of the learning process, e.g.,
 to improve the accuracy of the classifier, and an initial setup, which includes how
 many labels are available when the active learning starts. In addition, there are
 several semi-supervised classifiers, such as SVDDneg [39], and query strategies,
 e.g., high-confidence sampling [3], which one can combine almost arbitrarily with
 any of the learning scenarios. Almost all classifiers and query strategies require
to set additional hyperparameters. Their values can have a significant influence on result quality, and a poor choice may degrade it severely. Moreover, a good query strategy and good hyperparameter values may also depend on the active learning progress,
 i.e., the number of labels already provided by the user.
      Navigating this design space is challenging, and it is generally not feasible to
 consider and evaluate all possible combinations. Although there is an overview
 and a benchmark on OCAL [41], a good solution still is application-specific and
 may require fine-tuning of several components.
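
To make the size of the design space tangible, the following sketch enumerates a coarse grid of setups. The option names are merely illustrative, drawn from the examples above (SVDDneg, high-confidence sampling); the list is not an exhaustive taxonomy.

```python
# A sketch of making the OCAL design space explicit as a configuration object.
# The concrete option names are illustrative, not an exhaustive taxonomy.
from dataclasses import dataclass, field
from itertools import product

@dataclass
class OcalSetup:
    classifier: str          # e.g. "SVDD", "SVDDneg", "SSAD"
    query_strategy: str      # e.g. "random", "high-confidence", "decision-boundary"
    feedback_type: str       # e.g. "sequential", "batch"
    initial_labels: int      # labels available before active learning starts
    hyperparameters: dict = field(default_factory=dict)

# Even this coarse grid illustrates why evaluating all combinations is infeasible:
classifiers = ["SVDD", "SVDDneg", "SSAD"]
strategies = ["random", "high-confidence", "decision-boundary"]
feedback = ["sequential", "batch"]
setups = [OcalSetup(c, q, f, initial_labels=0)
          for c, q, f in product(classifiers, strategies, feedback)]
print(len(setups), "setups before any hyperparameter tuning")
```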


 3.4   Preparation of Information

 Classifier training and query selection produce a lot of data. On a fine-granular
 level, this includes the parameterized decision function for the classifier and in-
formativeness scores for the query strategy. After processing this data, the query strategy selects the most informative instances, and the classifier predicts a label for each ob-
 servation. In general, this data can be processed and enriched in many ways
 before presenting it to a user. On a coarse level, one can provide users with
 additional information, such as explanations of the classifier or contextual infor-
 mation on the learning progress. We now discuss several types of information
 to present during an active learning iteration: the query, the result, black-box
 explanations and contextual information.


Query presentation: Once observations have been selected for feedback (“queries” for short), they must be presented to the user. In general, there are two representa-
 tions of a query. First, the query has a raw-data representation. Examples are
 text documents, multimedia files, multi-dimensional time series of real-valued
 sensors, or sub-graphs of a network. Second, the data often is pre-processed to
 a feature representation, a real-valued vector that the classifier can process. In





 principle, queries can be presented to users in either representation. Our expe-
 rience is that domain experts are more familiar with raw data and demand it
 even if the feature representation is interpretable.
     Next, one can provide context information for queries. For an individual
 instance, one can show the nearest neighbors of the query or a difference to
 prototypes of both classes. Another approach is to use visualization techniques
 for high-dimensional data [28, 32] to highlight the query. One can also visualize
 the score distribution over all candidate queries. Depending on the type of the
 query strategy, it also is possible to generate heatmaps that indicate areas in the
 data space with high informativeness [44] together with the query.
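
As a simple illustration of context information for an individual query, the following sketch retrieves the nearest neighbors of the query in feature space and lists their current predicted labels. Data, predictions, and the query index are synthetic placeholders.

```python
# A minimal sketch of one way to provide context for a query: show its nearest
# neighbors in feature space together with their current predicted labels.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # feature representation (synthetic)
predictions = np.where(rng.random(200) < 0.05, "outlier", "inlier")
query_idx = 17                                 # observation selected by the query strategy

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X[query_idx:query_idx + 1])
neighbors = [i for i in idx[0] if i != query_idx][:5]  # drop the query itself

print("Query:", query_idx)
for i in neighbors:
    print(f"  neighbor {i}: predicted {predictions[i]}")
```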

 Result presentation: The presentation of a classification result largely de-
 pends on the classifier. With OCAL, the classifiers predominantly used rely on
 the notion of Support Vector Data Description (SVDD) [39]. In a nutshell, SVDD
 is an optimization problem to fit a hypersphere around the data, while allowing
 a small percentage of the data, the outliers, to lie outside the hypersphere. By
using the kernel trick, the decision boundary can take an arbitrary shape.
 So a natural presentation of SVDD is a contour plot that shows distances to
 the decision boundary. However, when data has more than two dimensions, con-
 tour plots are not straightforward. The reason is that contour plots rely on the
 distance to the decision boundary for a two-dimensional grid of observations
(x1, x2). However, the distance depends on the full vector (x1, x2, ..., xn) and
 thus cannot be computed for low-dimensional projections. One remedy would be
 to train a classifier for each of the projections to visualize. However, the classifier
 trained on the projection may differ significantly from the classifier trained on
 all dimensions. So a two-dimensional contour plot may have very little benefit.
 With common implementations of one-class classifiers, one is currently restricted
 to present results as plain numeric values, raw data, and predicted labels.
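
The following sketch illustrates the contour-plot presentation for the two-dimensional case, where it is straightforward. It uses scikit-learn's OneClassSVM on synthetic data as a stand-in for SVDD; both are closely related kernel-based one-class models.

```python
# Contour-plot presentation of a kernel one-class model in the 2D case.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X)

# Evaluate the decision function on a 2D grid; this is exactly the step that
# breaks down for projections of higher-dimensional data, because the decision
# function needs the full feature vector.
xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, levels=20, cmap="RdBu")
plt.contour(xx, yy, zz, levels=[0], colors="black")   # decision boundary
plt.scatter(X[:, 0], X[:, 1], s=8, c="white", edgecolors="k")
plt.title("Distance to the decision boundary (2D case)")
plt.show()
```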

 Black-Box Explanations: Orthogonal to inspecting the queries and the clas-
 sification result, there are several approaches to provide additional explanations
 of the classification result. The idea is to treat the classifier, or more generally
 any predictive model, as a black box, and generate post-hoc explanations for the
 prediction of individual observations. This is also called local explanation, since
 explanations differ between instances. Recently, CAIPI, a local explainer based
 on the popular explanation framework LIME [30], has been proposed to explain
 classification results in an active learning setting [40]. The idea behind CAIPI is
 to provide the user with explanations for the prediction of a query and ask them
 to correct wrong explanations. Another application of LIME is to explain why an
 observation has been selected as a query [29]. The idea behind this approach is to
 explain the informativeness of a query by its neighborhood. The authors use un-
 certainty sampling, and this approach may also work with other query strategies,
 such as high-confidence sampling [3]. However, with more complex query strate-
 gies, for instance ones that incorporate local neighborhoods [45] or probability
 densities [16], applying LIME may not be straightforward. For outlier detection,





 there exist further, more specific approaches to generate explanations for outlier-
 ness. An example is to visualize two-dimensional projections for input features
 that contribute most to an outlier score [19]. Other examples are methods from
 the VIA community that allow users to explore outliers interactively [8, 25, 43].
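
As an illustration of a post-hoc local explanation, the following sketch applies the tabular LIME explainer to the prediction of a one-class model for a single query. Wrapping the one-class decision score into pseudo-probabilities is our own simplification for this sketch; it does not reproduce the CAIPI [40] or query-explanation [29] procedures.

```python
# Local explanation of a one-class prediction with the LIME tabular explainer.
# The probability wrapper is a simplification for this sketch.
import numpy as np
from sklearn.svm import OneClassSVM
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
clf = OneClassSVM(nu=0.05, gamma="scale").fit(X)

def predict_proba(data):
    # Map the signed distance to the boundary to pseudo-probabilities
    # for the classes ["outlier", "inlier"].
    score = clf.decision_function(data)
    p_inlier = 1.0 / (1.0 + np.exp(-score))
    return np.column_stack([1.0 - p_inlier, p_inlier])

explainer = LimeTabularExplainer(
    X, feature_names=[f"f{i}" for i in range(4)],
    class_names=["outlier", "inlier"], discretize_continuous=True)

query = X[0]                                   # the observation selected for feedback
explanation = explainer.explain_instance(query, predict_proba, num_features=4)
print(explanation.as_list())                   # feature contributions near the query
```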

 Contextual Information: The participant interface can also provide addi-
 tional information that spans several active learning iterations. For instance, the
 interface can give users access to the classification history, allow them to revisit
 their previous responses, and give them access to responses of other users, if
 available. This can entail several issues, such as how to combine possibly di-
 verging responses from different users, and the question whether users will be
 biased by giving them access to feedback of others. Studying such issues is fo-
 cus of collaborative interactive learning [9]. Others have proposed to give users
 access to 2D scatter plots of the data, the confusion matrix and the progress of
 classification accuracy on labeled data [24]. In this case, accuracy measures may
 be biased. For instance, after collecting a ground truth for the first few labels,
 accuracy may be very high. It may decrease when more labels become available,
 and the labeled sample covers a larger share of the data space. So it remains
 an open question whether contextual information will indeed support users to
 provide accurate feedback.

     To conclude, one faces many options in the design of OCAL systems. In
 particular, there are many approaches to support users with information so that
 they can make informed decisions on the class label. However, the approaches
 discussed have not yet been evaluated by means of user studies. Instead, they
 are limited to a theoretical discussion, simulated feedback based on benchmark
 data, or pen and paper surveys [40]. It is largely unclear which methods do
 enable users to provide feedback and indeed improve the feedback collected.

 3.5   Technical Challenges
 Active learning induces several technical requirements to make systems inter-
active, and to collect user feedback. Most requirements apply to active learning systems in general, but their realization with one-class classifiers is difficult.

 Cold Start In most cases, active learning starts with a fully unsupervised set-
 ting, i.e., there is no labeled data available. This restricts the possible combina-
 tions of classifiers and query strategies in two cases. First, some query strategies,
 e.g., sampling close to the decision boundary, require a trained one-class classi-
 fier to calculate informativeness. In this case, the classifier must be applicable
 both in an unsupervised and a supervised setting. Second, some query strategies
 rely on labeled data, e.g., when estimating probability densities for the inlier
 class [15, 16]. In this case, one cannot calculate informativeness without labels.
 Current benchmarks mostly avoid this issue by simply assuming that some ob-
 servations from each class are already labeled. In a real system, one must think





 about how to obtain the initially labeled observations [2, 21]. One option would
 be to start with a query strategy that does not require any label, such as random
 sampling, and switch to a more sophisticated strategy once there are sufficiently
 many labels. Another option is to let users pick the observations to label in
 the beginning, and then switch to an active learning strategy [2, 6]. However,
deciding when to switch between query strategies with OCAL is an open question.
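
A minimal sketch of the first option, under the assumption that a fixed minimum number of labels per class triggers the switch. The switching criterion is illustrative only; when to switch is exactly the open question.

```python
# Cold-start handling: query at random until a minimum number of labels from
# each class is available, then hand over to a label-dependent strategy.
import random

def select_query(unlabeled_ids, labels, informed_strategy, min_per_class=1):
    """labels: dict mapping observation id -> 'inlier' or 'outlier'."""
    n_inlier = sum(1 for v in labels.values() if v == "inlier")
    n_outlier = sum(1 for v in labels.values() if v == "outlier")
    if n_inlier < min_per_class or n_outlier < min_per_class:
        return random.choice(list(unlabeled_ids))      # cold-start phase
    return informed_strategy(unlabeled_ids, labels)    # e.g. density-based strategy
```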



 Batch Query Selection Currently, query selection for one-class classifiers is
 sequential, i.e., for one observation at a time. However, this sequentiality may
 have several disadvantages, such as frequent updating and re-training of the one-
 class classifier. Further, it might be easier for users to label several observations in
 a batch than one observation at a time [34]. This may be the case when showing a
 diverse set of observations helps a user to develop an intuition regarding the data
 set. However, there currently are no strategies to select multiple observations in
 batches with one-class classifiers. An open question is whether strategies that
 have been proposed for other use cases, such as multi-class classification [12], are
 applicable with one-class classifiers.
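
For illustration, a naive batch heuristic could rank candidates by informativeness and greedily skip candidates that lie too close to already selected ones. The sketch below implements such a heuristic; whether it carries over to one-class classifiers is the open question raised above.

```python
# Naive batch selection: take the most informative candidates, but greedily
# skip candidates that are too close to observations already in the batch.
import numpy as np

def select_batch(X, informativeness, k=5, min_dist=1.0):
    order = np.argsort(-informativeness)          # most informative first
    batch = []
    for i in order:
        if all(np.linalg.norm(X[i] - X[j]) >= min_dist for j in batch):
            batch.append(i)
        if len(batch) == k:
            break
    return batch
```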



 Incremental Learning The runtime for updating a classifier constrains the
 frequency of querying the user. In particular, excessive runtimes for classifier
training result in long waiting times and undermine interactivity. Intuitively,
there is an upper limit on how long users are willing to wait, but the specific limit depends on the application.
      Several strategies are conceivable to mitigate runtime issues. First, one can
 rely on incremental learning algorithms [20]. However, state-of-the-art one-class
 classifiers like SSAD [18] have been proposed without any feature for incre-
 mental learning. Second, one can sub-sample to reduce the number of training
 observations. Several strategies have been proposed explicitly for one-class clas-
 sifiers [23, 26, 38]. But to our knowledge, there are no studies that combine sub-
 sampling with OCAL. Finally, one can use speculative execution to pre-compute
 the classifier update for both outcomes (inlier or outlier) while the user is de-
 ciding on a label [36]. While such a strategy requires additional computational
 resources, it might reduce waiting times significantly and improve interactivity.
 The open question is how to proceed with pre-computing when the look-ahead
 l is more than one feedback iteration. This is a combinatorial problem, and pre-
computing all 2^l learning paths is intractable. Instead, one may use conditional
 probabilities to pre-compute only the most likely search paths. However, there
 currently is no method to plan pre-computation beyond l = 1. If users select
observations to label by themselves, pre-computation would require computing the classifier update for all observations and outcomes, which is infeasible. Thus,
 there is a trade-off between giving users flexibility to decide freely on which
 observations to label, and the capabilities of pre-computation.
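
The following is a minimal sketch of speculative execution with look-ahead l = 1, assuming a placeholder retraining routine: both possible classifier updates are started in the background while the user decides, and the branch that matches the eventual label is kept.

```python
# Speculative execution with look-ahead 1: pre-compute the classifier update
# for both possible answers while the user decides on the label.
from concurrent.futures import ThreadPoolExecutor

def speculative_update(retrain, labels, query_id):
    """Start retraining for both possible answers to `query_id` in the
    background; returns a callable that waits for the chosen branch."""
    pool = ThreadPoolExecutor(max_workers=2)
    futures = {
        answer: pool.submit(retrain, {**labels, query_id: answer})
        for answer in ("inlier", "outlier")
    }

    def resolve(answer):
        result = futures[answer].result()   # ready immediately if pre-computation finished
        pool.shutdown(wait=False)           # release the executor; the other branch is ignored
        return result

    return resolve
```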





 Evaluation at Runtime Without a good quality estimate, it is impossible to
 know whether the feedback obtained from a user already is sufficient [2], i.e., the
one-class classifier has converged, and additional feedback would not alter the decision
 boundary any further. However, evaluating the classification quality of OCAL
at runtime is difficult [22]. This issue exists both when benchmarking with simulated feedback and in real systems – here, we focus on the latter. Users
 may become frustrated if they face periods where their feedback does not have
 any effect.

     However, showing users any estimated classification quality is difficult for two
reasons. First, there might be a short-term bias, i.e., the classifier performance
 might fluctuate significantly. This may be irritating, and it may be difficult to
 assess for the user. Second, the number of observations in the ground truth
 increases over time. With only a few labeled observations, the quality estimates
may have a large error. This error may decrease with more iterations. So the open
 question is how to estimate classification quality reliably, and how to adapt
 these quality estimates during learning. One conceivable option is to switch
 between exploration and exploitation, i.e., switch from querying for examples
 that improve classification quality to selection strategies that improve the quality
 estimate of the classifier. However, there currently is no such switching method
 for OCAL.
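
One conceivable way to dampen the short-term fluctuation, sketched below, is to display an exponentially smoothed accuracy on the labels collected so far instead of the raw per-iteration value. This is our own illustrative suggestion; it does not address the bias from a small, growing ground truth.

```python
# Exponentially smoothed quality estimate: the displayed value changes
# gradually even if the raw per-iteration accuracy fluctuates strongly.
def smoothed_accuracy(history, alpha=0.3):
    """history: list of per-iteration accuracies on the labeled data so far."""
    smoothed = None
    for acc in history:
        smoothed = acc if smoothed is None else alpha * acc + (1 - alpha) * smoothed
    return smoothed

# Example: raw values fluctuate strongly, the smoothed value changes gradually.
print(smoothed_accuracy([1.0, 0.5, 0.9, 0.6, 0.8]))
```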




 Management of Data Flows Developing an active learning system also re-
 quires a sound software architecture. Although this is not a research challenge
 per se, there are several aspects to consider when implementing OCAL systems.
 One key aspect is the management of data flows. In particular, with a distributed
 application, see Section 2, there are several locations where one has to retain the
 data set, the classifier, the predictions, and the informativeness scores. For large
 data sets in particular, transferring data between a client and a backend or load-
ing data sets from disk may affect runtimes significantly. This calls for efficient
 data caching. Further, one must decide where computations take place. For in-
 stance, to visualize contour plots, one must predict the decision boundary for a
 grid of observations, possibly in multiple projections of the data. In this case,
transferring the model over the network may incur very little overhead. This can be
 an efficient strategy when evaluating the model for an observation is cheap. This
 is the case with SVDD, since the model consists of only a few support vectors.
 With multi-user studies, one may even reuse trained classifiers and informative-
 ness scores from other user sessions with an equivalent feedback history. In this
 case, it might be more efficient to pre-compute grid predictions in the backend.
 So there are several trade-offs and factors that determine an efficient data flow.
 There currently is no overview on these trade-offs. It also is unclear how they
 affect design decisions for OCAL systems.
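
To make the "transfer the model" option concrete: for an SVDD-like model with an RBF kernel, the client only needs the support vectors, their coefficients, the kernel parameter, and an offset to evaluate the decision function on a grid. The sketch below uses a generic kernel expansion; the parameter names are not tied to a specific library.

```python
# Client-side evaluation of an RBF-kernel one-class decision function on a
# grid, given only the support vectors, coefficients, kernel parameter, and offset.
import numpy as np

def decision_values(grid, support_vectors, coefficients, gamma, offset):
    """Evaluate sum_i a_i * exp(-gamma * ||x - sv_i||^2) - offset for each grid point."""
    diff = grid[:, None, :] - support_vectors[None, :, :]
    kernel = np.exp(-gamma * np.sum(diff ** 2, axis=-1))
    return kernel @ coefficients - offset

# With only a few support vectors, transferring them and computing the grid
# client-side can be cheaper than shipping grid predictions over the network.
```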





 4     Validating OCAL with User Studies
A few active learning user studies have been conducted for special use cases, such as text corpus annotation [1, 11, 31] and network security [4].
 However, it is unclear how findings relate to outlier detection with OCAL –
 the previous sections illustrate the peculiarities of this application. Further, the
plethora of design options makes user studies with OCAL systems particularly
 challenging.
     Addressing all of the design options at once is not feasible, since there are
 too many combinations of classifiers, query strategies and ways to prepare infor-
 mation for users. So we propose to start with a narrow use case and to increase
 the complexity of the OCAL system step-wise. Specifically, we have identified
 the following steps towards a validation of OCAL in real applications.
       (i) Simplified Use Case: Much of the value of active learning is in domains
           where obtaining labels is difficult, even for domain experts. However, we
           argue that one should identify a use case that many people can easily
           relate to. This has several advantages. First, we deem reproducibility
            more important than obtaining sophisticated insights on very special use
           cases. User studies are easier to reproduce when they do not depend
            on specific domain expertise. Further, when relationships in the data are
            well understood, one can more easily judge whether the presentation of
            queries and results is accurate. So we argue for basing a validation of OCAL on
           standard benchmark data, for instance the hand-written digit image data
            set MNIST4. Such a simplification also includes fixing the details of the
           feedback process, for instance to “sequential feedback” and “no initial
           labels”. If necessary, one should downsample data sets so that runtimes
           of classifiers and query strategies are not a bottleneck.
      (ii) Validation of Information Presented: The next step is to identify situa-
           tions when users can give accurate feedback. Since the focus is to vali-
           date a learning system with users, one should start with a data set with
           available ground truth and select the best combination of classifier and
           query strategy in an experimental benchmark. This might seem counter-
           intuitive at first sight. In a real application, there generally are not suffi-
           ciently many labels available to conduct such a benchmark – in fact, this
           may even be the motivation for active learning in the first place [2, 35].
           However, we argue that this is a necessary step to break the mutual de-
           pendency between selecting a good setup and collecting labels. Given a
           combination of classifier and query strategy, one can then apply different
           query and result presentations and work with explanations and contex-
           tual information. By evaluating this step with user experiments, one can
           derive assumptions which, if met, enable users to provide accurate feed-
           back.
     (iii) Validation of Classifier and Learning Strategy: Based on these assump-
            tions, one can vary the dimensions that have been fixed beforehand. That
4 http://yann.lecun.com/exdb/mnist/





          is, one fixes the information presented to the user and varies the query
          strategies and classifiers. Further, one may validate specific extensions
          such as batch query strategies.
     (iv) Generalization: The first step of generalization is to scale the experiments
          to a large number of observations, using the techniques discussed in Sec-
          tion 3.5. Finally, one can then validate the approach on similar data sets,
          e.g., on different image data.

     We expect the findings from these steps to be two-fold. On the one hand, we
 expect insights that are independent from the use case. For instance, whether
 scalability techniques are useful is likely to be use-case independent. On the other
 hand, many findings may depend on the type of data at hand. Explanations based
 on image data may be very different from the ones for, say, time series data.
     Our OCAL system prototype already includes different classifiers and query
 strategies, see Section 2. So, in general, any researcher can already use our system
 to conduct Step (i) and the pre-selection of the query strategy and classifier
 information required for Step (ii). Regarding our prototype, the next steps are
 to select and implement a working set of query and result presentations, as well
 as to include black-box explainers and contextual information.


 5     Conclusions
 Validating One-Class Active Learning through user studies is challenging. One
 reason is that there are several open conceptual and technical challenges in
 the design and implementation of interactive learning systems. This article has
 featured a systematic overview of these challenges, and we have pointed out
 open research questions with one-class active learning. Next, we have sketched
 an architecture of a one-class active learning system, which we have implemented
 as a prototype. Based on it, we propose a roadmap towards validating one-class
 active learning with user studies.

 Acknowledgement This work was supported by the German Research Foun-
 dation (DFG) as part of the Research Training Group GRK 2153: Energy Status
 Data – Informatics Methods for its Collection, Analysis and Exploitation.


 References
  1. Arora, S., Nyberg, E., Rosé, C.P.: Estimating annotation cost for active learning
     in a multi-annotator environment. In: NAACL Workshop. pp. 18–26. ACL (2009)
  2. Attenberg, J., Provost, F.: Inactive learning?: Difficulties employing ac-
     tive learning in practice. SIGKDD Explor. Newsl. 12(2), 36–41 (2011).
     https://doi.org/10.1145/1964897.1964906
  3. Barnabé-Lortie, V., Bellinger, C., Japkowicz, N.: Active learning
     for One-Class classification. In: ICMLA. pp. 390–395. IEEE (2015).
     https://doi.org/10.1109/ICMLA.2015.167





  4. Beaugnon, A., Chifflier, P., Bach, F.: End-to-end active learning for computer
     security experts. In: AAAI Workshop (2018)
  5. Bernard, J., Hutter, M., Zeppelzauer, M., Fellner, D., Sedlmair, M.: Comparing
     visual-interactive labeling with active learning: An experimental study. Trans. Vis.
     Comput. Graph. 24(1), 298–308 (2017)
  6. Bernard, J., Zeppelzauer, M., Sedlmair, M., Aigner, W.: Vial: a unified process for
     visual interactive labeling. The Visual Computer 34(9), 1189–1207 (2018)
  7. Bischof, S., Trittenbach, H., Vollmer, M., Werle, D., Blank, T., Böhm, K.: Hipe:
     An energy-status-data set from industrial production. In: Proceedings of the Ninth
     International Conference on Future Energy Systems. pp. 599–603. ACM (2018)
  8. Bögl, M., Filzmoser, P., Gschwandtner, T., Lammarsch, T., Leite, R.A., Miksch, S.,
     Rind, A.: Cycle plot revisited: Multivariate outlier detection using a distance-based
     abstraction. Computer Graphics Forum 36(3), 227–238 (2017)
  9. Calma, A., Leimeister, J.M., Lukowicz, P., Oeste-Reiss, S., Reitmaier, T., Schmidt,
     A., Sick, B., Stumme, G., Zweig, K.A.: From active learning to dedicated col-
     laborative interactive learning. In: ARCS 2016; 29th International Conference on
     Architecture of Computing Systems. pp. 1–8 (Apr 2016)
 10. Cebron, N., Richter, F., Lienhart, R.: “I can tell you what it’s not”: ac-
     tive learning from counterexamples. Prog. Artif. Intell. 1(4), 291–301 (2012).
     https://doi.org/10.1007/s13748-012-0023-9
 11. Choi, M., Park, C., Yang, S., Kim, Y., Choo, J., Hong, S.R.: Aila: Attentive inter-
     active labeling assistant for document classification through attention-based deep
     neural networks. In: CHI. ACM (2019). https://doi.org/10.1145/3290605.3300460
 12. Demir, B., Persello, C., Bruzzone, L.: Batch-Mode Active-Learning methods for
     the interactive classification of remote sensing images. Trans. Geosci. Remote Sens.
     49(3), 1014–1031 (2011). https://doi.org/10.1109/TGRS.2010.2072929
 13. Druck, G., Settles, B., McCallum, A.: Active learning by labeling features. In:
     EMNLP. pp. 81–90. ACL (2009)
 14. Endert, A., Hossain, M.S., Ramakrishnan, N., North, C., Fiaux, P., Andrews, C.:
     The human is the loop: new directions for visual analytics. J. Intell. Inf. Syst.
     43(3), 411–435 (2014). https://doi.org/10.1007/s10844-014-0304-9
 15. Ghasemi, A., Manzuri, M.T., Rabiee, H.R., Rohban, M.H., Haghiri, S.: Active one-
     class learning by kernel density estimation. In: MLSP Workshop. pp. 1–6 (2011).
     https://doi.org/10.1109/MLSP.2011.6064627
 16. Ghasemi, A., Rabiee, H.R., Fadaee, M., Manzuri, M.T., Rohban, M.H.: Active
     learning from positive and unlabeled data. In: ICDM Workshop. pp. 244–250
     (2011). https://doi.org/10.1109/ICDMW.2011.20
17. Görnitz, N., Kloft, M., Rieck, K., Brefeld, U.: Active learning for network intrusion detection. In: AiSec Workshop. ACM (2009). https://doi.org/10.1145/1654988.1655002
 18. Görnitz, N., Kloft, M., Rieck, K., Brefeld, U.: Toward supervised anomaly detec-
     tion. J. Artif. Intell. Res. 46, 235–262 (2013)
 19. Gupta, N., Eswaran, D., Shah, N., Akoglu, L., Faloutsos, C.: Beyond outlier de-
     tection: LOOKOUT for pictorial explanation. In: ECML. pp. 122–138. Springer
     (2018). https://doi.org/10.1145/1235
 20. Kefi-Fatteh, T., Ksantini, R., Kaâniche, M.B., Bouhoula, A.: A novel incremental
     one-class support vector machine based on low variance direction. Pattern Recog-
     nit. 91, 308–321 (2019). https://doi.org/10.1016/j.patcog.2019.02.027
 21. Kottke, D., Calma, A., Huseljic, D., Krempl, G.M., Sick, B.: Challenges of reli-
     able, realistic and comparable active learning evaluation. In: Proceedings of the
     Workshop and Tutorial on Interactive Adaptive Learning. pp. 2–14 (2017)





 22. Kottke, D., Schellinger, J., Huseljic, D., Sick, B.: Limitations of assessing active
     learning performance at runtime. arXiv:1901.10338 (2019)
23. Krawczyk, B., Triguero, I., García, S., Woźniak, M., Herrera, F.: Instance re-
     duction for one-class classification. Knowl. Inf. Syst. 59(3), 601–628 (2019).
     https://doi.org/10.1007/s10115-018-1220-z
 24. Legg, P., Smith, J., Downing, A.: Visual analytics for collaborative human-machine
     confidence in human-centric active learning tasks. Hum. Cent. Comput. Inf. Sci.
     9(1), 5 (2019). https://doi.org/10.1186/s13673-019-0167-8
 25. Leite, R.A., Gschwandtner, T., Miksch, S., Kriglstein, S., Pohl, M., Gstrein, E.,
     Kuntner, J.: Eva: Visual analytics to identify fraudulent events. Trans. Vis. Com-
     put. Graph. 24(1), 330–339 (2017). https://doi.org/10.1109/TVCG.2017.2744758
26. Li, Y.: Selecting training points for one-class support vector machines. Pattern Recognit. Lett. 32(11), 1517–1522 (2011). https://doi.org/10.1016/j.patrec.2011.04.013
 27. Lin, H., Gao, S., Gotz, D., Du, F., He, J., Cao, N.: Rclens: Interactive rare cate-
     gory exploration and identification. Trans. Vis. Comput. Graph. 24(7), 2223–2237
     (2017). https://doi.org/10.1109/TVCG.2017.2711030
 28. Liu, S., Maljovec, D., Wang, B., Bremer, P.T., Pascucci, V.: Visualizing high-
     dimensional data: Advances in the past decade. Trans. Vis. Comput. Graph. 23(3),
     1249–1268 (2016). https://doi.org/10.1109/TVCG.2016.2640960
 29. Phillips, R.L., Chang, K.H., Friedler, S.A.: Interpretable active learning. arXiv
     preprint arXiv:1708.00049 (2017)
 30. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why Should I Trust You?”: Ex-
     plaining the predictions of any classifier. In: SIGKDD. pp. 1135–1144 (2016).
     https://doi.org/10.1145/2939672.2939778
 31. Ringger, E.K., Carmen, M., Haertel, R., Seppi, K.D., Lonsdale, D., McClanahan,
     P., Carroll, J.L., Ellison, N.: Assessing the costs of machine-assisted corpus anno-
     tation through a user study. In: LREC. pp. 3318–3324 (2008)
 32. Sacha, D., Zhang, L., Sedlmair, M., Lee, J.A., Peltonen, J., Weiskopf, D., North,
     S.C., Keim, D.A.: Visual interaction with dimensionality reduction: A struc-
     tured literature analysis. Trans. Vis. Comput. Graph. 23(1), 241–250 (2016).
     https://doi.org/10.1109/TVCG.2016.2598495
 33. Seifert, C., Granitzer, M.: User-based active learning. In: ICDM Workshop. pp.
     418–425. IEEE (2010). https://doi.org/10.1109/ICDMW.2010.181
 34. Settles, B.: From theories to queries: Active learning in practice. In: AISTATS
     Workshop. pp. 1–18 (2011)
 35. Settles, B.: Active learning. Synthesis Lectures on Artificial Intelligence and Ma-
     chine Learning pp. 1–114 (2012)
 36. Sperrle, F., Bernard, J., Sedlmair, M., Keim, D., El-Assady, M.: Speculative exe-
     cution for guided visual analytics. In: VIS Workshop (2018)
 37. Stokes, J.W., Platt, J.C., Kravis, J., Shilman, M.: Aladin: Active learning of
     anomalies to detect intrusions. Tech. rep., Microsoft Research (2008)
 38. Sun, W., Qu, J., Chen, Y., Di, Y., Gao, F.: Heuristic sample reduction method
     for support vector data description. Turkish Journal of Electrical Engineering &
     Computer Sciences 24(1), 298–312 (2016)
 39. Tax, D.M.J., Duin, R.P.W.: Support vector data description. Mach. Learn. 54(1),
     45–66 (2004). https://doi.org/10.1023/B:MACH.0000008084.60811.49
 40. Teso, S., Kersting, K.: Explanatory interactive machine learning. In: AAAI (2019)
 41. Trittenbach, H., Englhardt, A., Böhm, K.: An overview and a benchmark of active
     learning for outlier detection with One-Class classifiers. arXiv:1808.04759 (2018)





 42. Vollmer, M., Englhardt, A., Trittenbach, H., Bielski, P., Karrari, S., Böhm, K.: En-
     ergy time-series features for emerging applications on the basis of human-readable
     machine descriptions. In: Proceedings of the Tenth ACM International Conference
     on Future Energy Systems. pp. 474–481. ACM (2019)
 43. Wilkinson, L.: Visualizing big data outliers through distributed aggregation. Trans.
     Vis. Comput. Graph. 24(1), 256–266 (2017)
 44. Yang, Y., Loog, M.: A benchmark and comparison of active learn-
     ing for logistic regression. Pattern Recognit. 83, 401–415 (2018).
     https://doi.org/10.1109/TVCG.2017.2744685
 45. Yin, L., Wang, H., Fan, W.: Active learning based support vector data descrip-
     tion method for robust novelty detection. Knowl. Based. Syst. 153, 40–52 (2018).
     https://doi.org/10.1016/j.knosys.2018.04.020



