<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Summary Extraction on Data Streams in Embedded Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Buschjager</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katharina Morik</string-name>
          <email>katharina.morikg@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maik Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Dortmund University Computer Science VIII: Arti cial Intelligence Unit</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>More and more data is created by humans and cyber-physical systems having sensing, acting and networking capabilities. Together, these systems form the Internet of Things (IoT). The realtime analysis of its data may provide us with valuable insights about the complex inner processes of the IoT. Moreover, these insights o er new opportunities ranging from sensor monitoring to actor control. The volume and velocity of the data at the distributed nodes challenge human as well as machine monitoring of the IoT. Broadcasting all measurements to a central node might exceed the network capacity as well as the resources at the central node or the human attention span. Hence, data should be reduced already at the local nodes such that the submitted information can be used for e cient monitoring. There are several methods that aim at data summarization ranging from clustering, aggregation to compression. Where most of the approaches transform the representation, we want to select unchanged data items from the data stream, already while they are generated by the cyberphysical system and at the cyber-physical system. The observations are selected independent of their frequencies. They are meant to be e ciently transmitted. The ideal case is that no important measurement is missing in the selection and that no redundant items are transmitted. The data summary is easily interpreted and is available in realtime. We focus on submodular function maximization due to its strong theoretical background. We investigate its use for data summarization and enhance the Sieve-Streaming algorithm for data summarization on data streams such that it delivers smaller sets with high recall.</p>
      </abstract>
      <kwd-group>
        <kwd>Data summarization</kwd>
        <kwd>Embedded systems</kwd>
        <kwd>Streaming Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        With increasing processing capabilities, more and more data is gathered every
day at virtually any place on earth. Most of this data is produced by small
embedded electronics with sensing, acting and networking capabilities [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        To capitalize on the vast amount of data produced by these IoT devices,
machine learning has proven a valuable tool, but is usually limited to large
server systems due to resource constraints. In order to speed up the analysis for
monitoring, we wish to move parts of the analysis to the measuring device by the
means of data summarization. For numerical data streams, moving averages have
been used. For categorial values, data summarization describes the extraction of
representative items from a data set and has been used for texts [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ], speech [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
and images [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Informally, a summary should contain a few, but informative
items, where each item re ects one hidden state or class of the underlying system.
Given such a summary, we can use it to
{ perform certain actions, once a new item is added to the summary
{ compare summaries of di erent devices and points in time
{ provide a quick overview for human experts
      </p>
      <p>
        Data summarization has been viewed as a submodular function maximization
problem, which o ers an appealing theoretical framework with strong quality
bounds. In this paper, we focus on data summarization on data streams by
the means of submodular function maximization, namely the Sieve-Streaming
algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The main question when computing a data summary is, what items are
\novel" enough to be included into the summary and which are too similar
to already seen items. If this novelty threshold is chosen too small, the
summary will contain redundant items, whereas if the novelty is expected to be too
high, we will not add any item to the summary. Sieve-Streaming deals with this
challenge by managing multiple summaries in parallel, each with its own novelty
threshold.</p>
      <p>We will exploit the ideas of Sieve-Streaming for monitoring of distributed
measuring devices. Hence, we rst investigate, whether the Sieve-Streaming is
appropriate for our setting. Additionally, we extend the algorithm for the use in
embedded systems:
{ We reduce the number of summaries by tighter bounds of the utility
threshold used by Sieve-Streaming without sacri cing its theoretical guarantees.
{ By dynamic thresholding, we further improve the quality of Sieve-Streaming
without increasing its computational costs.
{ We evaluate the enhancements empirically.</p>
      <p>The paper is organized as the following. The next section will give a brief
overview of the Sieve-Streaming algorithm. Section 3 shows our enhancements
of the Sieve-Streaming algorithm in detail. Section 4 presents experiments on
three di erent data sets to evaluate data summarization in general and our
enhancements of Sieve-Streaming in detail. We conclude the paper in section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Sieve-Streaming</title>
      <p>
        In this section, we shortly present the Sieve-Streaming algorithm and introduce
the notation used in this paper. A more detailed overview of submodular function
maximization can be found in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Submodular function maximization considers
the problem of selecting a set S V with xed size jSj = K from a ground
set V , such that a set function f : 2V ! R is maximized, which assigns a utility
score to each possible subset S V . For set functions, the marginal gain is given
by the increase in f (S) when adding an element e 2 V to S:
f (ejS) = f (S [ feg)
f (S)
      </p>
      <p>The function f is called monotone, i for all e 2 V and for all S
that f (ejS) 0. Furthermore, we call f submodular i for all A
e 2 V n B it holds that</p>
      <p>V it holds</p>
      <p>B V and
f (ejA)</p>
      <p>f (ejB)</p>
      <p>Submodular functions incorporate a notion of diminishing returns: Adding
a new element to a smaller set will increase the utility value more than adding
the same element to a larger set. This property intuitively captures one aspect
of summarization, in which we seek small but expressive summaries and thus
adding elements to a summary should be done with care.</p>
      <p>
        Now, let us apply monotone submodular function maximization in a
streaming setting, so that data items e arrive one at a time and we have to decide
immediately, whether we want to add e to S, or not. A number of di erent
approaches for streaming settings have been proposed in literature with di erent
constraints and assumptions about the data (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for a very recent overview).
The approach of Badanidiyuru and colleagues ts our settings best [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Submodularity and monotony allows us to greedily increase the utility value
by simply adding new items to the summary. Since the maximum summary
size is restricted to K, this approach simply picks the rst K elements without
considering other items in the stream. Thus, we should add items to the summary
only, if they o er a reasonable increase in the summary's utility value, i.e. add
enough novelty.</p>
      <p>Let OP T = maxS V;jSj=K f (S) denote the maximum possible function value.
Assume we know OP T beforehand and we have already selected some points in
S. In this situation we could expect the next item e to add at least OP T f(S) to
K jSj
the utility so that we can reach OP T with the remaining K jSj elements. In
a sense, this approach sieves out elements with marginal gains below the given
threshold.</p>
      <p>
        However, this approach does not work, if the optimal summary contains many
elements with a gain just below the threshold and one element with large marginal
gain. In this case, we could miss the items with smaller marginal gain waiting
for an item with large gain. To counter this behavior, the authors of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] propose
to lower the threshold by 21 , adding e to S if the following holds:
f (ejS)
      </p>
      <p>OP T =2</p>
      <p>K</p>
      <p>f (S)
jSj</p>
      <p>Knowing the optimal utility value OP T is part of the problem we would like
to solve. Usually, this value is unknown beforehand and needs to be estimated.
In order to guarantee a good estimate we can generate multiple estimates for
OP T before running the algorithm and manage multiple sieves each with their
own threshold values in parallel.</p>
      <p>We can narrow down the value of OP T by using the properties of submodularity.
To do so, let m = maxe2V f (feg) denote the maximum value of a singleton item.
Given e occurs at least once in the data stream, the worst optimal summary
would contain e at least once. If e however occurs at least K times in the data
stream, then the best summary would contain e K-times, so that:
(1)
m</p>
      <p>OP T</p>
      <p>K m
(2)</p>
      <p>Therefore, knowing the maximum singleton value f (feg) already gives a
rough estimate of the range of values we can expect from f . This knowledge
can now be used to sample di erent threshold values from the interval [m; Km]
such that one of these thresholds will be close to OP T . More formally, we
manage di erent summaries in parallel, each using one threshold from the set
O = f(1 + ")i j i 2 Z; m (1 + ")i k mg, so that for at least one v 2 O it
holds: (1 ")OP T v OP T . Algorithm 1 summaries the proposed method.
It can be shown, that algorithm 1 extracts a summary with:</p>
      <p>1
f (S) ( ")OP T (3)</p>
      <p>2
2.1</p>
      <p>
        Utility function
Given a summary S = fe1; : : : ; eK g we can express the similarity between
elements ei and ej using a kernel function k(ei; ej ), which gives rise to the kernel
matrix = [k(ei; ej )]ij containing all similarity pairs. This matrix has the largest
Algorithm 1 Sieve-Streaming with known singleton maximum value m.
values on the diagonal as items are the most similar to themselves, whereas
values on the o -diagonal indicate the similarity between distinct elements and thus
are usually smaller. Intuitively, we seek an expressive summary, so that
contains many values near 0 on its o -diagonal. This intuition is formally captured
by the Informative Vector Machine [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposing the function:
In the previous section, we presented the Sieve-Streaming algorithm. Now, we
will introduce the changes that make it better suited for summarization in
embedded systems. First we show how to reduce the number of sieves without
sacri cing performance guarantees. Second, we introduce dynamic thresholding
which further increases the utility value. Last, we consider the trade-o between
memory consumption and runtime for an e cient implementation.
Reduce the number of sieves: Sieve-Streaming needs to keep track of
multiple sieves in parallel with corresponding thresholds chosen from the interval
[m; Km]. By taking the special structure of f into account, we can reduce the
size of this interval and thus the overall memory requirements. To do so, we
assume that the kernel function k( ; ) and its corresponding kernel matrix are
positive de nite.
      </p>
      <p>
        The upper bound Km of the threshold v is sharp and thus cannot be improved.
Intuitively, a summary reaches its maximum function value when all entries on
the o -diagonal are 0. In this case logdet(I + 2 ) equals log((1 + )K ) =
Klog(1 + ) = K m, which is exactly what we derived using submodularity.
The lower bound m, however, can be improved substantially: A summary reaches
its minimum utility when all entries on the o -diagonal equal the largest
possible kernel value C, that is, we picked the same element K times. W.l.o.g we
assume that C = 1 in the following, e.g. we use the RBF kernel k(x; y) =
exp( 21 jjx yjj2). Note, that it is possible to scale every positive de nite
kernel to the interval [0; 1] without damaging its positive de nite property (cf.
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). By using Sylvester's determinant theorem, we see that logdet(I + ) =
log(I + 1T 1) = log(1+ K) &gt; m = log(1+ ) where 1 denotes the vector, where
all entries are 1. In order to show that this intuition really creates a lower bound,
we use the characteristic polynomial det( I (I + )) = PiK=1( 1)k K iai.
The determinant can be computed in terms of characteristic coe cients ai, that
is det(I + ) = PiK=0 ai = 1 + det( ) + tr( ) + PiK=21 ai. Since I + is
positive de nite, each coe cient ai can be expressed as the sum of eigenvalues
of I + , which are also all positive [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since is also positive de nite, we
see that det( ) 0. This leads then to
det(I +
) = 1 + det(
      </p>
      <p>) + tr(
and thus: logdet(I +
)
log(1 +</p>
      <p>K).</p>
      <p>K 1
) + X ai
i=2
1 + tr(
) = 1 +</p>
      <p>K
Dynamic Thresholding: Sieve-Streaming creates a xed number of sieves
jOj in the beginning of the algorithm. Due to the thresholding approach, sieves
with smaller thresholds will accept more items and thus ll-up more quickly,
whereas sieves with large thresholds are more picky. Once a summary is full the
corresponding sieve is not used anymore. This allows us to create another sieve
in its place with a di erent threshold while keeping the overall number of sieves
constant.</p>
      <p>In turn, we can use this new threshold to re ne the estimate of OP T . Let
vc denote the largest threshold corresponding to a full summary and let vo
denote the smallest threshold corresponding to a non-full summary. Note that
by construction vo &gt; vc and that - given the current summaries - OP T seems
to be in the interval [vc; vo]. Therefore, we can re ne the estimate of OP T if we
create a new sieve in [vc; vo].</p>
      <p>With this approach \older" sieves with larger threshold may contain more
items than \newer" sieves. In turn, one of these \older" sieves may become full
at one point, whereas the newly created sieves with smaller threshold are still
accepting elements. Since the thresholds for these sieves are smaller than that
the sieve that just became full, we know that these sieves will not o er any
better utility values. Therefore, we can close these sieves once a sieve with larger
threshold closes.</p>
      <p>In practice one rather unintuitive e ect may happen, in which all summaries
of sieves might be full at some point in time. In pure Sieve-Streaming, sieves
are created in the interval [m; Km], where the upper bound Km represents the
largest utility possible. This bound is independent from the data and therefore
we expect that at least one sieve will be open all the time aiming for an extreme
case with utility Km.</p>
      <p>Recall however, that only the half of each threshold (see eq. 1) is used when
the marginal gain of a new item is evaluated. Thus, Sieve-Streaming e ectively
uses 12 Km as upper bound. This is enough to ensure a ( 12 ") approximation,
but might be sub-optimal in some cases. To counter this, we gradually increase
the interval of thresholds to 2Km, once we notice that all sieves are closed.
Implementation: In a naive approach, each sieve needs to store K elements
and compute logdet(I + ) on-the- y as required. Computation of the
determinant is generally performed in cubic time, whereas the computation of takes
O(K2 d) leading to a total of O(K3 + K2 d) per element.</p>
      <p>To improve the overall speed, we can trade some runtime with memory if we
store permanently. Since k( ; ) is positive de nite, I + is also positive
de nite. Thus, we may use a Cholesky decomposition I + = LT L where L
denotes a lower triangular matrix to compute the determinant.</p>
      <p>Adding a new row / column vector to a Cholesky decomposition can be
performed in O(K2) time. Thus, when adding a new element to the summary, we
compute all necessary kernel values and update the Cholesky decomposition
accordingly leading to O(K2 + K d) computations per element. However, with
this method the memory requirements increase, since and L need to be stored
continuously, leading to O(2K2 + K) memory instead of O(K) memory.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In this section we experimentally evaluate the enhancements introduced to
SieveStreaming. More speci cally, we want to answer the following questions:
1. Are extracted summaries meaningful in a sense, that they represent the
hidden states of a system?
2. How does the runtime decrease when a reduced number of sieves is used?
3. How does the utility value increase when dynamic thresholding is used and
how does this a ect the quality of the summary?</p>
      <p>Answering the rst question is di cult, because we need to know the data
generation process in detail to detect the hidden states of a system. For
realworld data this is nearly impossible. Hence, we use data sets D = f(xi; yi)ji =
1; : : : ; N g usually found in classi cation tasks. Here, each observation xi 2 Rd
is also associated with a label yi. We assume that each label yi represents one
hidden state of the system in the data generation process. Thus, we compute
a summary of the training data xi without considering the labels yi and then
report the number of di erent labels represented by the best summary found
across all sieves. With respect to the notation used in classi cation tasks, we
denote this measure as recall.</p>
      <p>To answer the second and third question, we measure the computation time
per data item and the best utility value of each algorithm (pure, reduced, and
dynamic).</p>
      <p>All experiments were performed on an Intel i7-6700 machine with 3:4 Ghz
and 32 GB RAM. We repeated each experiment 10 times with shu ed data sets
and always report average results. Our implementation is available at https:
//bitbucket.org/sbuschjaeger/iotstreaming2017.</p>
      <p>Figure 3 contains all experimental results. Each column represents one data set
whereas each row represents one measure. We use one synthetic data set and
two real-world data sets.</p>
      <p>Synthetic data: We composed a synthetic data set containing eight
fourdimensional Gaussian distributions as a Gaussian mixture model. Each
Gaussian mean value = ( i)i is uniform randomly generated with 2 i 3 for
i = 1; 2; 3; 4. Variance values are chosen uniformly random from (0; 0:8].</p>
      <p>For the data generation process we rst randomly pick one of the Gaussian
distributions and then sample a new data item x 2 R4 from this distribution.
We repeat this process 10000 times. To simulate infrequent states, e.g. failure,
we introduced di erent probabilities for each Gaussian. Two Gaussian are rare
to be selected with a probability of 0:001, whereas the remaining six Gaussian
receive the remaining probability mass equally, that is each has a probability of
0:133 to be used for data generation.</p>
      <p>We use the RBF Kernel K(xi; xj ) = exp( jjxi 10xjjj22 ) and set = 1 and
" = 0:1. Since there are eight hidden states in total, a summary of size K = 8
should be enough in theory to detect all states of the system. To account for
some error however, we vary K from 10 to 24.</p>
      <p>The rst image in the rst row of Figure 3 shows the recall of all three
algorithms on the synthetic data set. Sieve-Streaming and its tuned counterpart
achieve roughly 60% recall for K = 10 and then steadily increase their recall
with rising K reaching nearly 100% at K = 20. Dynamic Sieve-Streaming shows
a similar performance, but behaves slightly better for smaller K. Interestingly,
for K = 16 all variants produce the same output and for K = 18 Dynamic
Sieve-Streaming is outperformed by pure Sieve-Streaming. For larger K however,
Dynamic-Sieve Streaming detects all states with 100% recall for K = 24.
Note that a potential \failure" state has a probability of 0:001, that is roughly
every 1000 measurements such \failure" may occur. Usually, a human operator
would need to examine all 1000 states in this case to determine if there is a
failure or not. Using a summary with K = 24 however, a human operator only
needs to examine 24 di erent measurments, reducing the overall workload for
the human operator by 99:976%.</p>
      <p>Looking at the second row in Figure 3, we see that Sieve-Streaming and
its implementation on a smaller interval o ers the same utility, which can be
expected. Dynamic Sieve-Streaming consistently achieves higher utility values
explaining the higher recall.</p>
      <p>The last row of plots depict the runtime results. Sieve-Streaming needs
around 0:45 ms per element, whereas its reduced variant is faster for all K.
Dynamic Sieve-Streaming on the other hand needs up to 0:52 ms per element,
due to its dynamic sieve management.</p>
      <p>
        UJIndoor Location: As a second data set, we use the UJIndoorLoc data set
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This data set contains the GPS position and RSSI ngerprints of nearby
WiFi hotspots as well as the corresponding building, oor and room number of
17 smartphone users throughout their daily routine at the university of Jaume
in Spain. The dataset in total contains 19937 measurements and is usually used
for prediction tasks in which one tries to infer the GPS position based on the
RSSI ngerprints.
      </p>
      <p>We abandon the RSSI ngerprints and try to predict the semantic location,
that is the building, oor and room number directly based on the current GPS
position. We normalize the GPS data and use the RBF Kernel K(xi; xj ) =
exp( jjx0i:0x05jjj22 ). We set = 1 and " = 0:1. On average, each user visits 79
di erent locations, thus we try di erent K from 80 to 130.</p>
      <p>The second column in Figure 3 shows the results for this data set. The recall
is displayed in row one. Standard Sieve-Streaming and its reduced counterpart
achieve between 50% and 60% recall, whereas Dynamic Sieve-Streaming
manages to detect roughly 10% more items o ering a recall of 60% to 70%. Again,
this increase in recall can be explained with a higher utility value: Dynamic
Sieve-Streaming increases the utility value consistently by roughly 10 points
compared to Sieve-Streaming. The runtime of the three algorithms (second
column and third row of Figure 3) paints a di erent picture than before. Again,
pure Sieve-Streaming is the slowest algorithm, whereas Sieve-Streaming on the
smaller interval reduced the overall computational costs by around 0:2ms per
element. For smaller K, Dynamic Sieve-Streaming can be found in the middle
of the two algorithms. For larger K however, Dynamic Sieve-Streaming seems
to become faster than the reduced version of Sieve-Streaming. The reasons for
that is, that Sieve-Streaming may exit a data set early if all sieves are full. Since
Dynamic Sieve-Streaming reopens new sieves, it will run through the complete
data set. This in turn, enables cache locality and branch prediction to take full
e ect, so that the runtime per element decreases.</p>
      <p>
        MNIST: As a third data set, we use the well-known MNIST data set [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This
data set contains 60000 28 28 grayscale images of the handwritten digits 0
to 9 and is usually used as a baseline measure for image classi cation. Much
in line with the usual classi cation task, we try to extract a summary
containing one representative image for each digit. However, we will do this in an
unsupervised manner. Again, we normalize the data and use the RBF kernel
2
K(xi; xj ) = exp( jjxi78x4jjj2 ). We set = 1 and " = 0:1. Since there are 10
classes in total, we vary K from 8 to 16.
      </p>
      <p>The results are shown in the third column of gure 3. All algorithms performed
nearly the same, with Dynamic Sieve-Streaming being slightly better for all
values of K reaching a recall between 60 and 80%. Again, the utility value for
Dynamic Sieve-Streaming is consistently larger than for pure and reduced
SieveStreaming explaining the increased recall performance. Figure 2 displays two
summaries for K = 8 computed by pure Sieve-Streaming and dynamic
SieveStreaming respectively. Both algorithms show a similar summary, but pure
SieveStreaming selects multiple ones and ves. Dynamic Sieve-Streaming also selects
multiple ones, but does not choose to include two ves into the summary and
thus can add six, seven and eight to the summary increasing the recall
significantly. Looking at the runtime, standard Sieve-Streaming is again the slowest
algorithm variant reaching a computation time of 1ms per element, whereas
reduced and dynamic Sieve-Streaming are both faster with a maximum of 0:65
ms per element. Note, that we again observe the e ect, that Dynamic
SieveStreaming seems to be faster than its reduced counterpart due to cache locality
and branch prediction.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        In this paper we tackled the problem of data summarization on data streams
with small, embedded systems. More speci cally, we tried to answer the following
questions:
{ Is submodular function maximization a suitable framework for data
summarization and can summarization help us to detect the current state of a
system?
{ How can we make data summarization on data streams more suitable for
small embedded systems frequently found in IoT settings?
In order to answer these questions, we rst presented the Sieve-Streaming
algorithm from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Sieve-Streaming works on the assumption of submodularity and
uses this to sieve out unwanted items. The main advantage of Sieve-Streaming is
its strong and universal theoretical foundation o ering the approximation
guarantee of at least ( 21 ").
      </p>
      <p>This theoretical foundation however, leaves room for a more specialized version of
Sieve-Streaming tailored towards data summarization. Therefore, we introduced
a more re ned version of Sieve-Streaming for data summarization by reducing
the overall number of Sieves needed. Additionally, we introduced a dynamic
version of Sieve-Streaming independent from the summarization task which is able
to reuse sieves with di erent thresholds. We note, that both enhancements do
not jeopardize the theoretical analysis of Sieve-Streaming and still maintain the
( 21 ") approximation guarantee.</p>
      <p>We evaluated our algorithmic changes on three data sets. The experiments have
shown that we are able to detect the states of systems using the presented data
summarization approach and that an increased utility function usually comes
with an increased recall of system states. Thus, Dynamic Sieve-Streaming is a
valuable enhancement of the standard Sieve-Streaming algorithm consistently
increasing the recall leading to a 99:976% reduction of workload for human
operators in our experiments.</p>
      <p>Dynamic Sieve-Streaming has its limitations. We did not always detect all
hidden states in the experiments. Future work will investigate other kernel
functions, which could have been tuned more.</p>
      <p>(a) Pure Sieve-Streaming
(b) Dynamic Sieve-Streaming
Fig. 3: Experiments on synthetic data (left column), UJIndoor location data (middle
column) and MNIST (right column). The rst row depicts recall (higher is better),
the second row shows the maximum utility value (higher is better) and the third row
displays the runtime (in milliseconds) per element (smaller is better) on the datasets.</p>
      <p>Additionally, we observe that Sieve-Streaming and its newer enhancements
do not deal with concept drift. Here, further work is also needed.</p>
      <p>In sum, dynamic Sieve-Streaming is a helpful tool for exploring and
monitoring data streams with a solid theoretical foundation. At the same time, we also
see a potential for new research directions including concept drift and specialized
kernel functions.</p>
      <p>Notes and Comments. Part of the work on this paper has been supported
by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research
Center SFB 876 (http://sfb876.tu-dortmund.de) "Providing Information by
Resource-Constrained Analysis", project A1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Badanidiyuru</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirzasoleiman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Streaming submodular maximization: Massive data summarization on the y</article-title>
          .
          <source>In: ACM SIGKDD</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>B.P.:</given-names>
          </string-name>
          <article-title>The coe cients of the characteristic polynomial in terms of the eigenvalues and the elements of an n n matrix</article-title>
          .
          <source>Applied mathematics letters</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Graf</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borer</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Normalization in support vector machines</article-title>
          .
          <source>In: DAGM Symposium of Pattern Recognition</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Submodular function maximization</article-title>
          . In: Tractability: Practical Approaches to Hard Problems. Cambridge University Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seeger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herbrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Fast sparse gaussian process methods: The informative vector machine</article-title>
          .
          <source>In: NIPS</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            <given-names>ner</given-names>
          </string-name>
          , P.:
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilmes</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A class of submodular functions for document summarization</article-title>
          . In:
          <article-title>Meeting of the Association for Computational Linguistics (</article-title>
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mirzasoleiman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Badanidiyuru</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Fast constrained submodular maximization: Personalized data summarization</article-title>
          . In: ICML (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mirzasoleiman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deletion-robust submodular maximization: Data summarization with "the right to be forgotten"</article-title>
          . In: ICML (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Seeger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Greedy forward selection in the informative vector machine</article-title>
          .
          <source>Tech. rep.</source>
          , University of California at Berkeley (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stolpe</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The internet of things: Opportunities and challenges for distributed data analysis</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Torres-Sospedra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoliu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mart</surname>
            nez-Uso,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avariento</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arnau</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benedito-Bordonau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huerta</surname>
          </string-name>
          , J.:
          <article-title>Ujiindoorloc: A new multi-building and multioor database for wlan ngerprint-based indoor localization problems</article-title>
          . In: IPIN (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Tschiatschek</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyer</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilmes</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Learning mixtures of submodular functions for image collection summarization</article-title>
          .
          <source>In: NIPS</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirchho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilmes</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Using document summarization techniques for speech data subset selection</article-title>
          .
          <source>In: HLT-NAACL</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>