<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Hartl</string-name>
          <email>alexander.hartl@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Félix Iglesias Vázquez</string-name>
          <email>felix.iglesias@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tanja Zseby</string-name>
          <email>tanja.zseby@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Inst. of Telecom., TU Wien</institution>
          ,
          <addr-line>Gusshausstraße 25 / E389, 1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Le Studium</institution>
          ,
          <addr-line>1 Rue Dupanloup, 45000 Orléans</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present SDOoop, a streaming data analysis algorithm that spots contextual anomalies undetectable by traditional methods, while enabling the inspection of data geometries, clusters and temporal patterns. We used SDOoop to model real network communications in critical infrastructures. We also evaluated SDOoop with data from intrusion detection and natural science domains and obtained performance equivalent or superior to state-of-the-art approaches. SDOoop is ideal for big data, being able to instantly process large volumes of information.</p>
      </abstract>
      <kwd-group>
        <kwd>Contextual Anomalies</kwd>
        <kwd>Streaming Data Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A contextual (aka conditional or out-of-phase) anomaly “occurs if a point deviates in its local context” [<xref ref-type="bibr" rid="ref1">1</xref>], i.e., if it happens outside its usual time. Consider a method whose observation horizon spans a one-week period. If a cluster occurs exclusively during weekends, but a data point of this cluster accidentally appears on Wednesday, this method will not identify it as an anomaly, but as a normal inlier instead. Most traditional approaches are unable to identify contextual anomalies, which have been tackled mainly in time series analysis [<xref ref-type="bibr" rid="ref2">2</xref>], and even there experts emphasize how little attention they receive despite their relevance for cybersecurity, healthcare and fraud detection [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
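      <p>The weekend example can be made concrete with a small sketch. It is not the SDOoop algorithm: it uses a plain k-nearest-neighbour score as a stand-in for a traditional detector and a phase-restricted variant as a stand-in for a context-aware one. All names (knn_score, phase_gap, contextual_knn_score), the one-day phase tolerance and the synthetic data are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import dist

WEEK = 7.0  # assumed observation horizon, with time measured in days


def knn_score(point, history, k=3):
    """Plain kNN score: distance to the k-th nearest stored point, ignoring time."""
    distances = sorted(dist(point, p) for _, p in history)
    return distances[k - 1]


def phase_gap(t1, t2, period=WEEK):
    """Circular distance between two timestamps within one period."""
    delta = abs(t1 - t2) % period
    return min(delta, period - delta)


def contextual_knn_score(t, point, history, k=3, max_gap=1.0):
    """kNN score computed only over points seen at a similar phase of the week."""
    distances = sorted(
        dist(point, p) for s, p in history if phase_gap(t, s) <= max_gap
    )
    return distances[k - 1] if len(distances) >= k else float("inf")


# Four weeks of synthetic data: weekdays cluster near (0, 0), weekends near (5, 5).
history = []
for day in range(28):
    weekend = day % 7 >= 5
    centre = (5.0, 5.0) if weekend else (0.0, 0.0)
    jitter = 0.1 * (day % 3)  # small deterministic spread
    history.append((float(day), (centre[0] + jitter, centre[1] - jitter)))

# A point from the weekend cluster that arrives on a Wednesday (30 % 7 == 2).
t, x = 30.0, (5.0, 5.0)
plain = knn_score(x, history)
contextual = contextual_knn_score(t, x, history)
print(f"plain score: {plain:.2f}, contextual score: {contextual:.2f}")
```
]]></preformat>
      <p>In this toy setting the plain score stays close to zero, because the weekend cluster is present in the one-week horizon, whereas the phase-restricted score is large, because no similar point has been observed around Wednesdays. The same point arriving on a weekend receives a low score from both variants.</p>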
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>We conducted exhaustive testing of SDOoop (described in [6] and https://github.com/CN-TU/tpsdos-experiments), including: (a) a proof of concept (PoC) of the contextual outlier detection, (b)
anomaly detection comparisons with established algorithms on public datasets, and (c) evaluations
of SDOoop's ability to discover and model temporal patterns in real communications from critical
infrastructures (smart metering) and the darkspace [7].</p>
      <p>Fig. 1 shows the distinctive ability of SDOoop to detect contextual outliers. Table 1 compares
the accuracy (AAP and ROC-AUC [8]) of consolidated SAD algorithms on the SWAN-SF [9] and KDD
Cup'99 [10] datasets, related to solar flares and network security respectively. SDOoop performs
excellently in both cases. While the anomalies defined in the SWAN-SF dataset are not contextual,
some of the U2R (User to Root) attacks in the KDD Cup'99 dataset are, hence the notable advantage of
SDOoop. Table 2 shows a qualitative comparison of the main SDA methods, SW-kNN and SW-LOF being
the streaming (i.e., sliding window) versions of the popular kNN [11] and LOF [12] algorithms<sup>1</sup>.</p>
      <p>[Figure 1: data plotted over Dimension 1 and Dimension 2; (b) Performance: ROC-AUC versus the fraction of out-of-phase outliers for SDOoop, SW-KNN and RRCT.]</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
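      <p>As a rough illustration of what a sliding-window version of kNN means, the following sketch scores each arriving point by its distance to the k-th nearest neighbour among the most recent points only. It is a minimal toy under stated assumptions, not the dSalmon implementation used in the evaluation; the class name SlidingWindowKNN, its parameters and the synthetic stream are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import deque
from math import dist


class SlidingWindowKNN:
    """Toy sliding-window kNN outlier scorer (illustrative only)."""

    def __init__(self, window, k):
        # A bounded deque drops the oldest point once `window` points are stored.
        self.window = deque(maxlen=window)
        self.k = k

    def score_and_update(self, point):
        """Score the point against the current window, then add it to the window."""
        distances = sorted(dist(point, p) for p in self.window)
        score = distances[self.k - 1] if len(distances) >= self.k else float("inf")
        self.window.append(point)
        return score


detector = SlidingWindowKNN(window=50, k=3)

# Sixty points near the origin followed by one distant point.
stream = [(0.01 * (i % 5), 0.0) for i in range(60)] + [(4.0, 4.0)]
scores = [detector.score_and_update(p) for p in stream]
print(f"last regular point: {scores[59]:.3f}, distant point: {scores[-1]:.3f}")
```
]]></preformat>
      <p>Because the window forgets old points, such a detector adapts to drift, but it has no notion of phase: a point that matches any cluster still inside the window receives a low score regardless of when it arrives.</p>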
      <p>In tests with real communications, SDOoop discovered and modeled the main temporal patterns of traffic
from critical infrastructures, corresponding to: ICMP pings (device checking), DNS lookups (name
resolution for meter reading transmissions), DNS caching, and heartbeat messages. As for the darkspace,
SDOoop captured anomalies through their diurnal and semi-diurnal periodicities, identified in previous
research [19] with Conficker.C worms, BitTorrent misconfigurations, horizontal scan, vertical scan and
UDP probing activities.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions</title>
      <p>SDOoop conforms to next-generation machine learning, which, besides accuracy and speed, must
provide interpretable and informative models.</p>
      <p><sup>1</sup> Algorithm implementations used in the evaluation are from the dSalmon Python package [13], while synthetic data were
generated with MDCgen [14].</p>
      <p>[3] K. Golmohammadi, O. R. Zaiane, Time series contextual anomaly detection for detecting market manipulation in stock market, in: IEEE Int. Conf. on Data Sci. and Adv. Analytics (DSAA), 2015. doi:10.1109/DSAA.2015.7344856.</p>
      <p>[4] F. Iglesias, T. Zseby, A. Zimek, Outlier detection based on low density models, in: ICDMW, 2018, pp. 970–979. doi:10.1109/ICDMW.2018.00140.</p>
      <p>[5] A. Hartl, F. Iglesias, T. Zseby, SDOstream: Low-density models for streaming outlier detection, in: ESANN 2020 proceedings, 2020, pp. 661–666.</p>
      <p>[6] A. Hartl, F. Iglesias, T. Zseby, SDOoop: Capturing periodical patterns and out-of-phase anomalies in streaming data analysis (2024). arXiv:2409.02973, https://arxiv.org/abs/2409.02973.</p>
      <p>[7] CAIDA, The UCSD network telescope "patch tuesday" dataset, http://www.caida.org/data/passive/telescope-patch-tuesday_dataset.xml. Accessed: 2021-03-09.</p>
      <p>[8] G. O. Campos, A. Zimek, J. Sander, R. J. Campello, B. Micenková, E. Schubert, I. Assent, M. E. Houle, On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study, DAMI 30 (2016) 891–927. doi:10.1007/s10618-015-0444-8.</p>
      <p>[9] R. A. Angryk, P. C. Martens, B. Aydin, D. Kempton, S. S. Mahajan, S. Basodi, A. Ahmadzadeh, X. Cai, S. Filali Boubrahimi, S. M. Hamdi, M. A. Schuh, M. K. Georgoulis, Multivariate time series dataset for space weather data analytics, Scientific Data 7 (2020).</p>
      <p>[10] KDD Cup 1999 data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.</p>
      <p>[11] S. Ramaswamy, R. Rastogi, K. Shim, Efficient algorithms for mining outliers from large data sets, SIGMOD Rec. 29 (2000) 427–438.</p>
      <p>[12] M. M. Breunig, H.-P. Kriegel, R. T. Ng, J. Sander, LOF: Identifying density-based local outliers, SIGMOD Rec. 29 (2000) 93–104.</p>
      <p>[13] A. Hartl, F. Iglesias, T. Zseby, dSalmon: High-speed anomaly detection for evolving multivariate data streams, in: Performance Evaluation Methodologies &amp; Tools, Springer, 2024, pp. 153–169. doi:10.1007/978-3-031-48885-6_10.</p>
      <p>[14] F. Iglesias, T. Zseby, D. Ferreira, A. Zimek, MDCgen: Multidimensional dataset generator for clustering, Journal of Classification 36 (2019) 599–618. doi:10.1007/s00357-019-9312-3.</p>
      <p>[15] T. Pevný, Loda: Lightweight on-line detector of anomalies, Machine Learning 102 (2016) 275–304. doi:10.1007/s10994-015-5521-0.</p>
      <p>[16] S. Sathe, C. C. Aggarwal, Subspace outlier detection in linear time with randomized hashing, in: IEEE 16th ICDM, IEEE, 2016, pp. 459–468.</p>
      <p>[17] S. Guha, N. Mishra, G. Roy, O. Schrijvers, Robust random cut forest based anomaly detection on streams, in: Int. Conf. on Mach. Learn., PMLR, 2016, pp. 2712–2721.</p>
      <p>[18] E. Manzoor, H. Lamba, L. Akoglu, xStream: Outlier detection in feature-evolving data streams, in: 24th ACM SIGKDD, 2018, pp. 1963–1972.</p>
      <p>[19] F. Iglesias, T. Zseby, Pattern discovery in internet background radiation, IEEE Trans. on Big Data 5 (2017) 467–480. doi:10.1109/TBDATA.2017.2723893.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ruff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Kauffmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Vandermeulen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Montavon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kloft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>A unifying review of deep and shallow anomaly detection</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>109</volume>
          (
          <year>2021</year>
          )
          <fpage>756</fpage>
          -
          <lpage>795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shaukat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shabbir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Hameed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <article-title>A review of time-series anomaly detection techniques: A step to future perspectives</article-title>
          , in: K. Arai (Ed.), Adv. in Inf. &amp; Com., Springer,
          <year>2021</year>
          , pp.
          <fpage>865</fpage>
          -
          <lpage>877</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Golmohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. R.</given-names>
            <surname>Zaiane</surname>
          </string-name>
          ,
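          <article-title>Time series contextual anomaly detection for detecting market manipulation in stock market</article-title>
          , in: IEEE Int. Conf. on Data Sci. and Adv. Analytics (DSAA),
          <year>2015</year>
          . doi:10.1109/DSAA.2015.7344856.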
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>