<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Provenance-Aware LOD Datasets for Detecting Network Inconsistencies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leslie F. Sikos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dean Philp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shaun Voigt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catherine Howard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Stumptner</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Mayer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Defence Science and Technology Group</institution>
          ,
          <addr-line>Adelaide</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of South Australia</institution>
          ,
          <addr-line>Adelaide</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Contextualized knowledge graphs (CKGs) have been gaining importance in recent years by providing context-aware datasets in various knowledge domains. In communication network analysis, for example, CKGs can be used to improve cyber-situational awareness or to reason about network topologies. Despite the potential of these graphs, there is a lack of published CKG-based datasets for communication networks. The complexity, scale, and rapid changes of real-world communication networks make it crucial to capture not only network knowledge in network datasets, but also additional metadata. Therefore, this paper presents communication network datasets, enriched with provenance, timestamps, and location data, which can be used for benchmarking and in silico experiments, and which are intended to serve as the basis for further applications and research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Cyber-situational awareness applications rely on heterogeneous data sources, ranging from routing messages and router configuration files to open datasets, all of which have different file formats and data structures [<xref ref-type="bibr" rid="ref1">1</xref>]. The Resource Description Framework (RDF)<sup>1</sup> can be used to provide a uniform representation for network data derived from heterogeneous resources [<xref ref-type="bibr" rid="ref2">2</xref>]; however, automatically generated data may not be considered authoritative, verifiable, and reproducible unless data provenance (the source or origin of data) is captured [<xref ref-type="bibr" rid="ref3">3</xref>], optionally complemented by other types of metadata and by the uncertainty and vagueness of statements about dynamic network knowledge [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <p>Providing provenance for RDF statements is a long-standing, non-trivial problem in the Semantic Web research community, which has led to different approaches. Some extended the standard RDF data model (e.g., RDF+ [<xref ref-type="bibr" rid="ref5">5</xref>], SPOTL [<xref ref-type="bibr" rid="ref6">6</xref>], and RDF* [<xref ref-type="bibr" rid="ref7">7</xref>]) or the RDFS semantics (Annotated RDF Schema [<xref ref-type="bibr" rid="ref8">8</xref>], G-RDF [<xref ref-type="bibr" rid="ref9">9</xref>]); others proposed alternate data models (e.g., N3Logic [10]), decomposed RDF graphs (RDF molecule [11]), encapsulated provenance with RDF triples (e.g., Provenance Context Entity (PaCE) [12], singleton property [13]), captured context (e.g., named graphs [14], RDF triple coloring [15], nanopublications [16]), or utilized vocabularies and ontologies, such as the Provenir ontology [17] and NdFluents [18].</p>
      <p>While there are several ontologies described in the literature for network knowledge representation, very few, such as the Situational Awareness (SAW) Ontology [19] and the Communication Network Topology and Forwarding Ontology (CNTFO)<sup>2</sup> [20], are purposefully designed for capturing provenance-aware network knowledge for applications that require cyber-situational awareness. With the need for CKG-based communication network datasets in mind, as well as the lessons learned from popular datasets (e.g., DARPA '99 [21]), this paper presents novel CKG-based datasets. The presented datasets utilize named graphs to capture provenance, thereby differentiating between network knowledge statements (by source type), CNTFO terms to capture network knowledge and network-specific provenance, and PROV-O<sup>3</sup> to describe general provenance.</p>
      <p><sup>1</sup> https://www.w3.org/RDF/</p>
      <p>Copyright 2018 for this paper by its authors. Copying permitted for private and academic purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Provenance-Aware Network Knowledge Datasets</title>
      <p>These datasets model realistic scenarios, built with the publicly available Common Open Research Emulator (CORE)<sup>4</sup>, in which two Australian businesses, each with sites in Adelaide and in Melbourne, require two Internet Service Providers (ISPs) and 24/7 Internet access (dual-homing). The underlying model consists of 60 devices in total, each with several network interfaces. Two types of network models have been constructed (8 models in total), covering IPv4 and IPv6 base cases and well-documented deliberate misconfigurations, the latter of which are errors that impact both network performance and security. These models were used to generate context-aware RDF datasets, collectively called ISPnet. These datasets are compliant with Semantic Web best practices and constitute LOD data. All nodes of the corresponding RDF graphs are globally dereferenceable. The integrity of the datasets has been checked with HermiT<sup>5</sup>, FaCT++<sup>6</sup>, and Pellet<sup>7</sup>. This paper focuses on two of these publicly released datasets: 1) IPv4 base<sup>8</sup> and 2) IPv4 overlapping subnets<sup>9</sup>. They cover heterogeneous network data derived from device configurations, traceroutes, OSPF LSAs, and arpings. The DL expressivity of both datasets is ALU. The first dataset defines 55 classes and 322 individuals with 1,595 axioms. The second dataset has 14 classes and 295 individuals defined in the form of 1,264 axioms. The dataset files are accompanied by standard-compliant VoID<sup>10</sup> descriptions.</p>
      <p><sup>2</sup> http://purl.org/ontology/network/; <sup>3</sup> http://www.w3.org/ns/prov-o; <sup>4</sup> https://www.nrl.navy.mil/itd/ncs/products/core; <sup>5</sup> http://www.hermit-reasoner.com; <sup>6</sup> http://owl.cs.manchester.ac.uk/tools/fact/; <sup>7</sup> https://github.com/stardog-union/pellet; <sup>8</sup> http://purl.org/dataset/ispnet/base/; <sup>9</sup> http://purl.org/dataset/ispnet/overlap/; <sup>10</sup> https://www.w3.org/TR/void/</p>
    </sec>
    <sec id="sec-3">
      <title>Case Study</title>
      <p>We provide an excerpt from our two datasets, namely, ISPnet and ISPnetOL. The ISPnet dataset was generated using our base network model, whereas ISPnetOL was generated using a deliberate misconfiguration of the base network model. Both datasets contain four types of named graphs that correspond to heterogeneous network data sources (CORE, traceroute, arping, and OSPF LSAs). The datasets demonstrate three levels of provenance: triple-level, graph-level, and dataset-level provenance. Triple-level provenance includes statements such as ispnet:C1-ADL-R1 prov:atLocation dbpedia:Adelaide, indicating that Router 1 of Customer 1 is geographically located in Adelaide. Graph-level provenance includes statements such as ispnet:TRACEROUTE4 net:importHost "C1-ADL-PC3", which indicates that Computer 3 of Customer 1 is where the traceroute command was executed. Dataset-level provenance includes statements such as &lt;http://purl.org/dataset/ispnet/base/&gt; prov:wasAssociatedWith "DST Group Australia".</p>
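      <p>As a minimal sketch of how the three levels can coexist in one quad-based representation, the following Python snippet stores each statement as a (subject, predicate, object, graph) tuple. The three example statements are the ones quoted above; the assignment of statements to the named graphs ispnet:CORE and ispnet:PROVENANCE, the Quad type, and the helper provenance_level are assumptions made for illustration and do not necessarily reflect the layout of the released datasets.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # name of the named graph that holds the statement


DATASET = "<http://purl.org/dataset/ispnet/base/>"

# Named graphs mentioned in the case study.
GRAPHS = {"ispnet:CORE", "ispnet:TRACEROUTE4", "ispnet:ARPING1", "ispnet:PROVENANCE"}

# The graph each statement is placed in is an assumption of this sketch.
quads = [
    Quad("ispnet:C1-ADL-R1", "prov:atLocation", "dbpedia:Adelaide", "ispnet:CORE"),
    Quad("ispnet:TRACEROUTE4", "net:importHost", '"C1-ADL-PC3"', "ispnet:PROVENANCE"),
    Quad(DATASET, "prov:wasAssociatedWith", '"DST Group Australia"', "ispnet:PROVENANCE"),
]


def provenance_level(q: Quad) -> str:
    """Classify a statement by what its subject describes."""
    if q.subject == DATASET:
        return "dataset"   # statement about the dataset as a whole
    if q.subject in GRAPHS:
        return "graph"     # statement about a named graph
    return "triple"        # statement about an individual network entity


levels = [provenance_level(q) for q in quads]
for q, level in zip(quads, levels):
    print(f"{level:8} {q.subject} {q.predicate} {q.obj}")
```
]]></preformat>
      <p>Classifying by subject is only one possible convention; it works here because named graphs and the dataset are themselves identified by IRIs and can therefore appear as subjects of provenance statements.</p>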
      <p>By comparing the CORE graphs between the two datasets, it can be inferred that C1-ADL-R1 eth1 was connected to 10.10.0.164/30 on 13 May 2018, whereas on 14 May the connection changed to 10.10.0.185/29; this is the first indication of a configuration error (see Fig. 1).</p>
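      <p>The comparison described above can be sketched with Python's standard ipaddress module. The two snapshots below contain only the interface address quoted in the text; the dictionary layout and the function name changed_interfaces are hypothetical, and a real comparison would read the corresponding statements from the CORE named graphs of the two datasets.</p>
      <preformat><![CDATA[
```python
import ipaddress

# Interface addresses as reported in the CORE graph of each dataset.
# The values come from the case study; the dict layout is an assumption.
base = {"C1-ADL-R1 eth1": "10.10.0.164/30"}      # 13 May 2018
overlap = {"C1-ADL-R1 eth1": "10.10.0.185/29"}   # 14 May 2018


def changed_interfaces(before, after):
    """Return {interface: (old_network, new_network)} for changed addresses."""
    changes = {}
    for iface, old in before.items():
        new = after.get(iface)
        if new is not None and new != old:
            old_net = ipaddress.ip_interface(old).network
            new_net = ipaddress.ip_interface(new).network
            changes[iface] = (old_net, new_net)
    return changes


changes = changed_interfaces(base, overlap)
for iface, (old_net, new_net) in changes.items():
    print(f"{iface}: {old_net} ({old_net.num_addresses} addresses) -> "
          f"{new_net} ({new_net.num_addresses} addresses)")
```
]]></preformat>
      <p>In this sketch the change from a /30 to a /29 prefix widens the subnet from 4 to 8 addresses, which is the kind of difference an analyst would flag for closer inspection.</p>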
      <p>By comparing the PROVENANCE graphs in conjunction with the
TRACEROUTE, ARPING and LSDB graphs, it can be inferred that C1-ADL-PC3
could previously reach 10.10.0.169 (the customer gateway), but subsequently
could access only 10.10.0.173 (not the gateway).</p>
      <p>This allows another inference, namely, that Customer 1 in Adelaide has lost Internet access, which is important for cyber-situational awareness. Without statements of three facets of provenance, i.e., time, location, and importHost, we could not have performed the required information fusion and reasoning to make this inference. Importantly, this inference is indeed correct: our specific deliberate misconfiguration example actually does cause Customer 1 to lose Internet access.</p>
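      <p>The role of the three provenance facets in this inference can be illustrated with a small, hedged Python sketch. The host name, location, and IP addresses are taken from the case study; the timestamps are placeholders, and the record layout and the function lost_gateway are hypothetical simplifications of what would otherwise be queried from the TRACEROUTE and PROVENANCE graphs.</p>
      <preformat><![CDATA[
```python
from datetime import datetime

GATEWAY = "10.10.0.169"  # customer gateway named in the case study

# Each record mimics the graph-level provenance of one traceroute graph.
# The timestamps are placeholders, not values from the datasets.
observations = [
    {"import_host": "C1-ADL-PC3", "location": "dbpedia:Adelaide",
     "import_time": datetime(2018, 5, 13, 12, 0), "reached": ["10.10.0.169"]},
    {"import_host": "C1-ADL-PC3", "location": "dbpedia:Adelaide",
     "import_time": datetime(2018, 5, 14, 12, 0), "reached": ["10.10.0.173"]},
]


def lost_gateway(obs, host, gateway):
    """True if host reached gateway earlier but not in its latest observation."""
    own = sorted((o for o in obs if o["import_host"] == host),
                 key=lambda o: o["import_time"])
    if len(own) < 2:
        return False  # nothing to compare against
    earlier, latest = own[:-1], own[-1]
    reached_before = any(gateway in o["reached"] for o in earlier)
    return reached_before and gateway not in latest["reached"]


host = "C1-ADL-PC3"
result = lost_gateway(observations, host, GATEWAY)
if result:
    site = observations[-1]["location"]
    print(f"{host} ({site}) can no longer reach gateway {GATEWAY}")
```
]]></preformat>
      <p>The import host selects the observations to compare, the import time orders them, and the location ties the result to a site; removing any of the three leaves the comparison underdetermined.</p>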
      <p>Figure 2 shows a small part of the RDF graph of the first dataset file of the case study, demonstrating statements derived from three different data sources (CORE, a traceroute, and an arping), and some of the associated provenance statements.</p>
      <p>[Fig. 2. Excerpt of the RDF graph of http://purl.org/dataset/ispnet/base/. Labels recoverable from the figure: named graphs ispnet:CORE, ispnet:PROVENANCE, ispnet:TRACEROUTE4, and ispnet:ARPING1; nodes ispnet:I10.10.0.169 and ispnet:I10.10.0.65; class net:Interface; properties net:importHost, net:importUser, and net:importTime; literals "10.10.0.169", "C1-ADL-PC3", "root", "2018-05-14T16:42:38", and "2018-05-14T16:43:04.828578".]</p>
      <p>The statements about the IP address associated with interface C1-ADL-GWY eth0 and I10.10.0.169 suggest that these entities are actually identical (a link can be created between the two using owl:sameAs); they were merely named differently at different stages of network knowledge discovery, based on the information available at the time. The automated identification of such relationships is beneficial for network analysts and enables the generation of useful, non-trivial RDF statements that help understand network element connectivity and traffic flow.</p>
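      <p>One simple way to identify such relationships automatically is to group entities by the IP address they are associated with and to propose an owl:sameAs link for every pair that shares an address. The Python sketch below illustrates this heuristic only; the identifier ispnet:C1-ADL-GWY_eth0, the graph assigned to each statement, and the function same_as_candidates are assumptions for illustration rather than part of the datasets or of CNTFO.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations

# (entity, ip_address, source_graph). Entity names follow the case study;
# the exact identifiers and graph assignments are assumptions.
address_statements = [
    ("ispnet:C1-ADL-GWY_eth0", "10.10.0.169", "ispnet:CORE"),
    ("ispnet:I10.10.0.169", "10.10.0.169", "ispnet:TRACEROUTE4"),
    ("ispnet:I10.10.0.65", "10.10.0.65", "ispnet:ARPING1"),
]


def same_as_candidates(statements):
    """Propose owl:sameAs links between entities that share an IP address."""
    by_ip = defaultdict(set)
    for entity, ip, _graph in statements:
        by_ip[ip].add(entity)
    links = []
    for ip in sorted(by_ip):
        # Every unordered pair of distinct entities with the same address.
        for a, b in combinations(sorted(by_ip[ip]), 2):
            links.append((a, "owl:sameAs", b))
    return links


links = same_as_candidates(address_statements)
for s, p, o in links:
    print(s, p, o)
```
]]></preformat>
      <p>Sharing an address is only a heuristic: in a network with overlapping subnets the same address may legitimately appear on distinct interfaces, so candidate links would need to be checked against provenance such as import time and import host before being asserted.</p>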
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Because CKG-based datasets for communication networks have not been available, practitioners and researchers lack standard datasets to compare, contrast, and build upon in furthering both practical applications and research. This paper presented such context-aware network knowledge datasets, which can be used for modeling communication networks and testing semantic formalisms for capturing metadata-enriched network knowledge statements with RDF quadruples. These datasets are novel in terms of complexity, statement-level and statement group-level metadata, realistic environment model, and configuration parameters. They cover heterogeneous network data derived from a variety of sources, which can be utilized for facilitating information fusion.
      </p>
      <p>10. Berners-Lee, T. (2008) Notation 3 Logic. https://www.w3.org/DesignIssues/N3Logic. Accessed 3 April 2018</p>
      <p>11. Ding, L., Finin, T., Peng, Y., Da Silva, P.P., McGuinness, D.L. (2005) Tracking RDF graph provenance using RDF molecules. Fourth International Semantic Web Conference, Galway, Ireland, 6–10 November 2005</p>
      <p>12. Sahoo, S.S., Bodenreider, O., Hitzler, P., Sheth, A., Thirunarayan, K. (2010) Provenance Context Entity (PaCE): scalable provenance tracking for scientific RDF data. In: Gertz, M., Ludäscher, B. (eds.) Scientific and statistical database management. Lect. Notes Comput. Sci., vol. 6187, pp. 461–470. Heidelberg: Springer. https://doi.org/10.1007/978-3-642-13818-8_32</p>
      <p>13. Nguyen, V., Bodenreider, O., Sheth, A. (2014) Don't like RDF reification? Making statements about statements using singleton property. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 759–770. New York: ACM. https://doi.org/10.1145/2566486.2567973</p>
      <p>14. Carroll, J.J., Bizer, C., Hayes, P., Stickler, P. (2005) Named graphs, provenance and trust. In: Proceedings of the 14th International Conference on World Wide Web, pp. 613–622. New York: ACM. https://doi.org/10.1145/1060745.1060835</p>
      <p>15. Flouris, G., Fundulaki, I., Pediaditis, P., Theoharis, Y., Christophides, V. (2009) Coloring RDF triples to capture provenance. In: Bernstein, A., Karger, D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., Thirunarayan, K. (eds.) The Semantic Web – ISWC 2009. Lect. Notes Comput. Sci., vol. 5823, pp. 196–212. Heidelberg: Springer. https://doi.org/10.1007/978-3-642-04930-9_13</p>
      <p>16. Groth, P., Gibson, A., Velterop, J. (2010) The anatomy of a nanopublication. Inform. Serv. Use 30(1–2):51–56. https://doi.org/10.3233/ISU-2010-0613</p>
      <p>17. Sahoo, S.S., Sheth, A. (2009) Provenir ontology: towards a framework for eScience provenance management. Microsoft eScience Workshop, Pittsburgh, PA, USA, 15–17 October 2009</p>
      <p>18. Giménez-García, J.M., Zimmermann, A., Maret, P. (2017) NdFluents: an ontology for annotated statements with inference preservation. In: Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) The Semantic Web. Lect. Notes Comput. Sci., vol. 10249, pp. 638–654. Cham: Springer. https://doi.org/10.1007/978-3-319-58068-5_39</p>
      <p>19. Sheth, A. (2007) Leveraging Semantic Web techniques to gain situational awareness. Can Semantic Web techniques empower perception and comprehension in cyber situational awareness? Cyber Situational Awareness Workshop, Fairfax, VA, USA, 14–15 November 2007</p>
      <p>20. Sikos, L.F., Stumptner, M., Mayer, W., Howard, C., Voigt, S., Philp, D. (2018) Representing Network Knowledge Using Provenance-Aware Formalisms for Cyber-Situational Awareness. Procedia Computer Science</p>
      <p>21. Thomas, C., Sharma, V., Balakrishnan, N. (2008) Usefulness of DARPA dataset for intrusion detection system evaluation. In: Proceedings of the 2008 SPIE Defense and Security Symposium, Orlando, FL, USA, 17 March 2008. https://doi.org/10.1117/12.777341</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sikos</surname>
            ,
            <given-names>L. F</given-names>
          </string-name>
          . (ed.) (
          <year>2018</year>
          ).
          <article-title>AI in Cybersecurity</article-title>
          . Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-98842-9
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Sikos</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumptner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voigt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philp</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Summarizing Network Information for Cyber-Situational Awareness via Cyber-Knowledge Integration</article-title>
          .
          <article-title>AOC 2018 Convention</article-title>
          , Adelaide, Australia, May 2018
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sikos</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumptner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voigt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philp</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Automated reasoning over provenance-aware communication network knowledge in support of cyber-situational awareness</article-title>
          . In:
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (eds.) Knowledge Science, Engineering, and Management. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-99247-1_12
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sikos</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          <article-title>Handling Uncertainty and Vagueness in Network Knowledge Representation for Cyberthreat Intelligence</article-title>
          .
          <source>2018 IEEE World Congress on Computational Intelligence</source>
          , Rio de Janeiro, Brazil,
          July
          <year>2018</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dividino</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sizov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schueler</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>Querying for provenance, trust, uncertainty and other meta knowledge in RDF</article-title>
          .
          <source>Web Semant. Sci. Serv. Agents World Wide Web</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>204</fpage>
          –
          <lpage>219</lpage>
          . https://doi.org/10.1016/j.websem.2009.07.004
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hoffart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berberich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2012</year>
          )
          <article-title>YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia</article-title>
          .
          <source>Artif. Intell.</source>
          <volume>194</volume>
          :
          <fpage>28</fpage>
          –
          <lpage>61</lpage>
          . https://doi.org/10.1016/j.artint.2012.06.001
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>Foundations of an alternative approach to reification in RDF</article-title>
          . https://arxiv.org/abs/1406.3399
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straccia</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          (
          <year>2012</year>
          )
          <article-title>A general framework for representing, reasoning and querying with annotated Semantic Web data</article-title>
          .
          <source>Web Semant. Sci. Serv. Agents World Wide Web</source>
          <volume>11</volume>
          :
          <fpage>72</fpage>
          –
          <lpage>95</lpage>
          . https://doi.org/10.1016/j.websem.2011.08.006
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Analyti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Damásio</surname>
            ,
            <given-names>C.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pachoulakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>Why-provenance information for RDF, rules, and negation</article-title>
          .
          <source>Ann. Math. Artif. Intell.</source>
          <volume>70</volume>
          (
          <issue>3</issue>
          ):
          <fpage>221</fpage>
          –
          <lpage>277</lpage>
          . https://doi.org/10.1007/s10472-013-9396-0
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>