=Paper=
{{Paper
|id=None
|storemode=property
|title=Intrinsically Resilient Energy Control Systems
|pdfUrl=https://ceur-ws.org/Vol-966/STIDS2012_T12_SheldonEtAl_ResilientEnergySystems.pdf
|volume=Vol-966
|dblpUrl=https://dblp.org/rec/conf/stids/SheldonHDFGKMMW12
}}
==Intrinsically Resilient Energy Control Systems==
<pdf width="1500px">https://ceur-ws.org/Vol-966/STIDS2012_T12_SheldonEtAl_ResilientEnergySystems.pdf</pdf>
<pre>
            Using Semantic Web Technologies to Develop
            Intrinsically Resilient Energy Control Systems

Frederick Sheldon and Daniel Fetzer
Oak Ridge National Laboratory
Oak Ridge, TN 37831, U.S.A.
{sheldonft, fetzerdt}@ornl.gov

Jingshan Huang
University of South Alabama
Mobile, AL 36688, U.S.A.
huang@southalabama.edu

Jiangbo Dang and Dong Wei
Siemens Corporation, Corporate Research and Technology
Princeton, NJ 08540, U.S.A.
{jiangbo.dang, dong.w}@siemens.com

David Manz
Pacific Northwest National Laboratory
Richland, WA 99354, U.S.A.
david.manz@pnnl.gov

Thomas Morris
Mississippi State University
Mississippi State, MS 39762, U.S.A.
morris@ece.msstate.edu

Jonathan Kirsch and Stuart Goose
Siemens Corporation, Corporate Research and Technology
Berkeley, CA 94704, U.S.A.
{jonathan.kirsch, stuart.goose}@siemens.com

  Abstract—To preserve critical energy control functions while under attack, it is necessary
to perform a comprehensive analysis of the root causes and impacts of cyber intrusions without
sacrificing the availability of energy delivery. We propose to design an intrinsically
resilient energy control system where we extensively utilize Semantic Web technologies, which
play critical roles in knowledge representation and acquisition. While our ultimate goal is to
ensure availability/resiliency of energy delivery functions and the capability to assess root
causes and impacts of cyber intrusions, the focus of this paper is to demonstrate a proof of
concept of how Semantic Web technologies can significantly contribute to resilient energy
control systems.
    Index Terms—cybersecurity, energy control system, ontology, knowledge base, semantic
annotation, data integration.

This manuscript has been authored by contractors of the U.S. Government (USG) under contract
DE-AC05-00OR22725. Accordingly, the USG retains a nonexclusive, royalty-free license to publish
or reproduce the published form of this contribution, or allow others to do so, for USG
purposes.

                         I. INTRODUCTION
    Our energy infrastructure depends on energy delivery systems composed of complex and
geographically dispersed network architectures with vast numbers of interconnected
components. These systems provide information and automated control over a large, complex
network of processes that collectively ensure the reliable and safe production and
distribution of energy. Energy utilities are modernizing these vast networks with millions of
smart meters, high-speed sensors, advanced control systems, and a supporting communications
infrastructure. This additional complexity brings benefits, but it also increases the risk of
cyber attacks that could disrupt energy delivery. These systems must maintain high
availability and reliability even when under attack. After a security incident has been
detected, the incident response team needs the ability to investigate and determine the root
cause, attack methods, consequences, affected assets, impacted stakeholders, and other
information in order to inform an effective response. In the short term, the response team
needs this information to contain or eradicate the attack, recover compromised equipment, and
restore normal operation. The team also needs to determine countermeasures to prevent
recurrence and possibly collect evidence to legally prosecute the offenders. This analysis
and response must be done without interrupting the availability of the energy delivery
systems.
    To address these challenges, this paper presents the design and architecture of InTRECS,
an InTrinsically Resilient Energy Control System. The ultimate goal of InTRECS is to provide
tools and technologies to ensure the availability/resiliency of energy delivery functions,
along with the capability to assess root causes and impacts of cyber intrusions. To meet
these goals, InTRECS extensively applies Semantic Web technologies, including cybersecurity
domain ontologies, a comprehensive knowledge base, and semantic data annotation & integration
techniques. Semantic Web technologies are built upon ontologies, which are formal, declarative
knowledge models and have been shown to play critical roles in knowledge representation and
acquisition.
    In this paper, we argue that applying Semantic Web technologies in InTRECS affords several
benefits compared to typical approaches that utilize relational databases:
  • While relational databases focus on the syntactic representation of data and lack the
    ability to explicitly encode semantics, Semantic Web technologies support rich semantic
    encoding, which is critical in automated knowledge acquisition.
  • Powerful tools exist for capturing and managing ontological knowledge, including an
    abundance of reasoning tools readily available for ontological models, making it much
    more convenient to query, manipulate, and reason over available data sets. As a result,
    semantics-based queries, instead of SQL queries, become possible.
  • Advances in an energy delivery system (EDS) require regular changes to the underlying
    data models. In addition, it is often preferable to represent data at different levels
    and/or with different abstractions. There are no straightforward methods for performing
    such updates if relational models are adopted. Semantic Web technologies better enable
    EDS researchers to append additional data into repositories in a more
    flexible and efficient manner. The formal semantics encoded in ontologies make it possible
    to reuse data in unplanned and unforeseen ways, especially when data users are not data
    producers, which is now very common.
    While our ultimate goal is to ensure availability/resiliency of energy delivery functions
and the capability to assess root causes and impacts of cyber intrusions, the focus of this
paper is to demonstrate a proof of concept of how Semantic Web technologies can significantly
contribute to resilient energy control systems. The rest of the paper is organized as follows.
Section II reviews related research on ontologies and on semantic annotation & integration.
Section III describes the overall architecture of InTRECS, followed by methodology details for
developing the domain ontologies & knowledge base and for performing data annotation &
integration. Section IV demonstrates our preliminary experimental results. Finally, Section V
concludes with future research directions.

                         II. RELATED WORK
A. Ontologies in Energy Delivery Control and Cybersecurity
    Energy delivery control systems comprise complex network architectures that may contain
hundreds of specialized cyber components and may extend across wide geographical regions.
Cyber attack investigation involves examining large volumes of data from heterogeneous
sources. Researchers face the challenge of maintaining the integrity of data derived from
diverse sources across distributed geographic areas [1-7]. These research efforts have
resulted in various ad-hoc proprietary formats for storing and analyzing data and maintaining
the respective metadata. Different parties are likely to adopt different formats according to
their specific needs. Therefore, seamless communication among different parties, along with
the knowledge sharing and reuse that follow, becomes a non-trivial problem. Turnitsa and Tolk
[8] discussed in depth the multi-resolution, multi-scope, and multi-structure challenges that
arise during data exchange between different models.
    Semantic Web technologies based on domain ontologies can help considerably. Ontologies are
declarative knowledge models, defining essential characteristics and relationships for
specific domains of interest. As a semantic foundation, ontologies greatly help domain experts
to formally define domain knowledge in terms of data semantics (intended meanings) rather than
data syntax (the forms in which data are represented). Reasons for developing ontologies
include, but are not limited to: (i) to share domain information among people and software;
(ii) to enable reuse of domain knowledge; (iii) to analyze domain knowledge and make it more
explicit; and (iv) to separate domain knowledge from its implementation. Some domain
ontologies exist in cybersecurity and related areas, e.g., Intrusion Detection System Ontology
[1], Network Security Ontology [2], Process Control Ontology [4], INSPIRE Ontology [5], and GE
SADL Host Defense Ontology [7]. These ontologies provide metadata and standard terminologies
in their respective domains.

B. Semantic Data Annotation & Integration
    Semantic data annotation & integration can bring critical benefits to data analysis and
management. Semantic annotation (tagging) systems can be divided into manual, semi-automatic,
and automatic ones [9]. In manual tagging systems (Sema-Link [10], for example), users employ
controlled vocabularies from some ontology to tag documents. Such a manual process is
time-consuming, requires deep domain expertise, and is prone to inconsistency. Semi-automatic
tagging systems improve on manual ones by automatically parsing documents and recommending
potential tags; human annotators only need to select tags from the candidates suggested by the
system. Automatic semantic tagging systems offer further improvement by parsing and tagging
documents with ontological concepts and instances in a fully automatic way. Zemanta [11] is
such an example. By suggesting content from various sources, such as Wikipedia, YouTube,
Flickr, and Facebook, Zemanta disambiguates terms and maps them to the Common Tag Ontology
[12]. Dang et al. have developed one of the largest comprehensive, domain-independent
ontological knowledge bases, UNIpedia+ [13], which covers around 11 million named English
entities. Based on UNIpedia+, they further developed an automatic tagging system [14] to
produce semantically linked tags for given data. The information system architecture in the
Los Angeles Smart Grid project [15] enabled analytical tools and algorithms to forecast energy
load and identify load curtailment response through semantically meaningful data.
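The Python sketch below makes integration through a shared vocabulary concrete. It takes records from two sources that use different field names and annotates them with common ontology properties. It then emits subject-predicate-object triples, so that a single query can span both sources. Everything here is hypothetical: the source names (`scada_log`, `ids_alert`), the `coeds:` property names, and the sample records are not terms from CoEDS or any cited system. A real system would use an RDF library and ontology-derived mappings rather than hand-written dictionaries.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical per-source mappings from local field names to shared ontology
# properties; in practice these correspondences would be derived from the ontology.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "scada_log": {"dev": "coeds:hasDevice", "evt": "coeds:hasEventType",
                  "ts": "coeds:hasTimestamp"},
    "ids_alert": {"target_host": "coeds:hasDevice", "signature": "coeds:hasEventType",
                  "time": "coeds:hasTimestamp"},
}


def integrate(source: str, records: List[Dict[str, str]]) -> List[Triple]:
    """Annotate each record with shared ontology properties; return RDF-style triples."""
    mapping = SOURCE_MAPPINGS[source]
    triples: List[Triple] = []
    for i, record in enumerate(records):
        subject = f"{source}:record{i}"  # source-scoped identifier for the record
        triples.append((subject, "rdf:type", "coeds:SecurityEvent"))
        for field, value in record.items():
            if field in mapping:  # unmapped fields are dropped in this sketch
                triples.append((subject, mapping[field], value))
    return triples


# Made-up sample records that use different local schemas.
scada = [{"dev": "RTU-7", "evt": "unexpected_write", "ts": "10:00:00"}]
ids = [{"target_host": "RTU-7", "signature": "modbus_flood", "time": "10:00:05",
        "sensor": "s1"}]

unified = integrate("scada_log", scada) + integrate("ids_alert", ids)

# A single query now spans both sources: which events involve RTU-7?
events_on_rtu7 = sorted(s for s, p, o in unified
                        if p == "coeds:hasDevice" and o == "RTU-7")
print(events_on_rtu7)  # ['ids_alert:record0', 'scada_log:record0']
```

Fields with no mapping (here `sensor`) are simply dropped. A fuller design might keep them under a source-specific namespace so that no information is lost.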


                         Fig. 1. Overall architecture of InTRECS system.

                         III. METHODOLOGY
A. InTRECS Overall Architecture
    Figure 1 illustrates the overall architecture of InTRECS, which is decomposed into six
subsystems.
  • Intrusion-Tolerant SCADA (InTRADA)
    We will develop a survivable SCADA system based on intrusion-tolerant replication [16].
    InTRADA will be capable of guaranteeing correct operations and excellent performance even
    when part of the system has been compromised and is under the control of an intelligent
    attacker.
  • Cybersecurity Ontologies and Knowledge Base for Energy Delivery Systems (CoEDS)
    The CoEDS knowledge base (KB) contains domain ontologies, a resource description framework
    (RDF) repository, a SPARQL RDF query engine, and an inference engine. The KB will provide
    end users with a unified and consistent data layer for analyzing data at the semantic
    level.
  • Semantic Data Integration and Processing (SeDIEP)
    Our focus is to develop an automatic semantic data annotation & integration engine for
    tagging data sources based on the metadata defined in CoEDS ontologies. An
    event-processing engine will handle dynamic events and generate security alerts.
  • Root Cause and Impact Analysis (RoCIA)
    RoCIA provides the basis to detect cyber incidents and investigate the root cause, attack
    methods, consequences, affected assets, impacted stakeholders, attackers’ identity, and
    other metrics to inform an effective response. RoCIA will leverage the Cyber Security
    Econometrics System (CSES) and the inference and query engines provided within CoEDS KB to
    assist EDS stakeholders in evaluating cybersecurity investments and to provide an economic
    impact assessment of ongoing cyber intrusions.
  • Dashboard Analytics and Situation Awareness (DaSA)
    Dashboard analytics includes a graphical user interface (GUI) to support interactions
    between end users and InTRECS. Situational awareness will be provided to end users. We
    will also support reasoning through the inference engine in CoEDS.
  • Test and Evaluation (TnE)
    Implemented modules will automatically configure the test suite environment to the
    appropriate start state for each test case. A portal will provide information and
    documentation and will execute the test case. We will also develop a test suite in an
    end-user setting, including a set of denial of service (DoS), reconnaissance, and network
    packet integrity exploits targeting SCADA, remote terminal unit (RTU), and network
    architecture vulnerabilities.
    InTRECS will be constantly active to intrinsically provide resiliency, i.e., correct
operations and excellent performance. At the same time, a DaSA GUI will guide end users to
generate queries over data derived from diverse sources. Query results, e.g., the root cause,
extent, and impacts of the cyber intrusion, can then be provided back to end users. InTRECS
will also push security alerts to end users. Both query results and alerts are regarded as
semantic decision support to end users because they extensively utilize Semantic Web
technologies, namely, domain ontologies, RDF triples resulting from semantic annotation, and
inferences & analysis performed at the semantic level.

B. CoEDS Domain Ontologies and Knowledge Base
    There are four components in CoEDS KB: (i) CoEDS domain ontologies, (ii) an RDF
repository, (iii) a SPARQL RDF query engine, and (iv) an inference engine. Through automatic
data integration and logic reasoning, CoEDS KB will be able to provide a unified and
consistent data layer for analyzing data at the semantic level. It will thus assist end users
to effectively obtain real-time decision support, so that they can (i) obtain health status
updates of SCADA replicas, (ii) analyze and better understand the root cause, extent, and
impacts of an attack, (iii) acquire situational awareness, and (iv) recommend courses of
action.
    1) Interaction between CoEDS and other InTRECS subsystems: CoEDS KB actively exchanges
information with other subsystems of InTRECS on a regular basis.
  • InTRADA receives system health and status information from CoEDS KB and incorporates such
    knowledge to enhance its fault-detection algorithms. This will enable InTRADA to
    reconfigure itself more rapidly in the event of a cyber attack by helping it distinguish
    between performance faults caused by a malicious application and those caused by more
    benign issues such as transitory network problems.
  • InTRADA sends CoEDS KB status updates regarding the health of the replicas, hence
    providing data for future cyber attack analysis.
  • SeDIEP obtains the data semantics, i.e., ontological metadata, from CoEDS KB and utilizes
    such metadata during automatic semantic annotation. Annotated data, including
    cybersecurity econometrics, dynamic events, etc., are stored back into CoEDS KB to
    construct and continuously update the central data repository in the KB.
  • CoEDS KB provides RoCIA with topology data as well as the data semantics essential for
    performing root cause and impact analysis. RoCIA supplies CoEDS KB with root cause and
    impact analysis data, including attack signatures, attack locations, exploits,
    consequences, countermeasures, model parameters, network components, security
    requirements, threats, vulnerabilities, and stakeholders.
  • CoEDS KB furnishes DaSA with dynamic events and electric grid components and topology
    data, both of which are in annotated form. DaSA sends situational awareness data back to
    CoEDS KB. In addition, the KB provides the Correlation Layers for Information Query and
    Exploration (CLIQUE) and Traffic Circle, two visual analytics tools in DaSA, with
    interoperability for behavior model-based anomaly detection.
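A toy in-memory model can illustrate how a repository, a query engine, and an inference engine fit together. The Python sketch below is not Sesame or SPARQL. It keeps triples in a set and answers a single triple pattern, with `None` acting as a wildcard (loosely like a SPARQL variable). It then applies two RDFS-style rules until no new facts appear:

- transitivity of `rdfs:subClassOf`;
- propagation of `rdf:type` to superclasses.

The class `ToyKB` and the replica class names are hypothetical. The example loosely mirrors the replica health exchange described above: the fact that a replica is unhealthy is implied by the data but never stated.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


class ToyKB:
    """Hypothetical in-memory stand-in for an RDF repository plus a tiny reasoner."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Match one triple pattern; None acts as a wildcard in that position."""
        return {t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}

    def infer(self) -> int:
        """Apply two RDFS-style rules to a fixpoint; return the number of new triples."""
        start = len(self.triples)
        while True:
            subs = self.query(p=SUBCLASS)
            new: Set[Triple] = set()
            # Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
            for a, _, b in subs:
                for b2, _, c in subs:
                    if b == b2:
                        new.add((a, SUBCLASS, c))
            # Rule 2: x type A and A subClassOf B  =>  x type B
            for x, _, a in self.query(p=TYPE):
                for a2, _, b in subs:
                    if a == a2:
                        new.add((x, TYPE, b))
            if new <= self.triples:  # nothing new: fixpoint reached
                return len(self.triples) - start
            self.triples |= new


kb = ToyKB()
kb.add("coeds:CompromisedReplica", SUBCLASS, "coeds:UnhealthyReplica")
kb.add("coeds:UnhealthyReplica", SUBCLASS, "coeds:Replica")
kb.add("replica1", TYPE, "coeds:Replica")
kb.add("replica2", TYPE, "coeds:CompromisedReplica")  # a made-up health report

before = {s for s, _, _ in kb.query(p=TYPE, o="coeds:UnhealthyReplica")}
added = kb.infer()
after = {s for s, _, _ in kb.query(p=TYPE, o="coeds:UnhealthyReplica")}
print(before, after, added)  # set() {'replica2'} 3
```

Before inference, the query for unhealthy replicas returns nothing, because that fact was never asserted; after inference, `replica2` is returned. The naive fixpoint loop re-derives everything on each pass and is only meant to show the idea; a production reasoner would evaluate such rules far more efficiently.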
    2) Motivation for developing CoEDS ontologies: Among existing ontologies in cybersecurity
and related areas (mentioned in Section II), not a single one is comprehensive enough to cover
a complete set of concepts and relationships for the purpose of this research. In particular,
for SCADA status, root cause analysis, situational awareness, electric grid components and
topology, cybersecurity econometrics, cost benefit analysis, and complex event processing, all
of the aforementioned ontologies are missing some necessary concepts. Even when a specific
concept of interest is contained in some existing ontology, more often than not the semantics
defined in that ontology need to be extended and customized before the concept can be
utilized within the InTRECS system. In brief, Energy Control Systems (ECS) end users lack a
comprehensive, customized conceptual model, which prevents the energy sector from leveraging
the enhanced knowledge acquisition processes brought by Semantic Web technologies. This
situation motivates us to develop CoEDS domain ontologies.
    3) Ontology development principles: We have observed seven practices suggested by Smith
et al. [17]: the ontology should (i) be freely available; (ii) be expressed using a standard
language or syntax; (iii) provide tracking and documentation for successive versions; (iv) be
orthogonal to existing ontologies; (v) include natural language specifications of all
concepts; (vi) be developed collaboratively; and (vii) be used by multiple researchers. In
particular, we propose a decomposition methodology as the strategy for arriving at orthogonal
ontologies. Our methodology is similar to database normalization theory, e.g., third normal
form (3NF). We began with concepts from possibly many sub-domains in one large set, then
identified dependencies or overlaps among these concepts, and finally decomposed all concepts
based on their identified dependencies. Our preliminary design is to develop seven
sub-ontologies in CoEDS: SCADA status, root cause & impact, situational awareness, grid
component & topology, cybersecurity econometrics, cost benefit, and complex event processing.
Consequently, we achieved orthogonality, i.e., no overlap, among the CoEDS domain ontologies.
    4) Knowledge-driven ontology development procedure: The ontology development did not start
from scratch. Instead, to (i) take advantage of the knowledge already contained in existing
ontologies and (ii) reduce the possibility of redundant efforts, we have reused, extended, and
customized a set of well-established concepts from existing domain ontologies. In addition,
popular upper ontologies, e.g., the Basic Formal Ontology (BFO), were imported into our
ontologies. The ontology development was driven by domain knowledge and decomposed into five
stages, as suggested by Uschold and Gruninger [18]: (i) specification of content; (ii)
informal documentation of concept definitions (by domain experts); (iii) logic-based
formalization of concepts and relationships between concepts; (iv) implementation of the
ontology in a computer language; and (v) evaluation of the ontology, including its internal
consistency and its ability to answer logical queries. As illustrated in Figure 2, these five
stages are essentially ongoing and iterative because end users’ needs will change as their
understanding of the domain evolves. In this iterative, knowledge-driven approach, both
ontology engineers and domain experts have been involved, working together to capture domain
knowledge, develop a conceptualization, and implement the conceptual model. The ontology
construction process has taken place over a number of iterations, involving a series of
interviews, evaluation strategies, and refinements. Standard revision-control procedures have
been utilized.

                 Fig. 2. Knowledge-driven, iterative ontology development.

    5) Ontology format and development tool: There are different formats and languages for
describing ontologies, all of which are popular and based on different logics: Web Ontology
Language (OWL) [19], Open Biological and Biomedical Ontologies (OBO) [20], Knowledge
Interchange Format (KIF) [21], and Open Knowledge Base Connectivity (OKBC) [22]. We have
chosen the OWL format recommended by the World Wide Web Consortium (W3C). OWL is designed for
use by applications that need to process the content of information instead of just
presenting information to humans. As a result, OWL facilitates greater machine
interpretability of Web contents. We have chosen Protégé, an open-source ontology editor
developed by
Stanford [23], as our development tool over other available      framework for the storage and querying of RDF data. The
tools such as CmapTools and OntoEdit.                            framework is fully extensible and configurable with respect
                                                                 to storage mechanisms, inferencers, RDF file formats, query
   6) CoEDS KB components – RDF Repository, Query
                                                                 result formats, and query languages. In addition, Sesame
Engine, and Inference Engine: Based on the formal
                                                                 offers a JBDC-like user API, streamlined system APIs, and
knowledge defined in CoEDS ontologies, heterogeneous
                                                                 a RESTful HTTP interface supporting the SPARQL
data sources can be annotated and integrated into a central
                                                                 protocol for RDF. Moreover, Sesame contains a built-in
repository. Note that data sources to be integrated include
                                                                 inference engine, and various reasoning tasks, e.g.,
structured, semi-structured, or unstructured data, the
                                                                 subsumption and contradiction reasoning, can be performed.
interoperability thus becomes an obstacle during knowledge
discovery. We adopt RDF, a model for data interchange            C. Semantic Data Annotation and Event Processing
recommended by the W3C, to handle such a challenge. RDF              According to the formal domain knowledge, including a
specifically supports the evolution of schemas over time         global metadata model, defined in CoEDS, heterogeneous
without requiring all the data consumers to be changed. The      data sources can be annotated and seamlessly integrated into
generic structure of RDF allows structured, semi-structured,     a central RDF data repository, which will serve as a unified
and unstructured data to be mixed, exposed, and shared           and consistent data layer for data analytics applications.
across different applications, thus helping to handle the data
interoperability challenge. Following automatic semantic                                      SKMT                                   Event Engine
                                                                                    (Semantic Knowledge Management Tool)
data annotation (see Section III.C), RDF triples will be                                                                             Event Stream
indexed and accumulated into a central repository. SPARQL              Knowledge
                                                                                         Data bases         Query Interface
Protocol and RDF Query Language (SPARQL) [24] is a                      Sources

query language recommended by W3C to retrieve and                                                               Repository          Event Processing
                                                                           Indexing Manager
manipulate RDF data. End users of InTRECS system will be                                                         Manager

guided by a GUI to automatically generate RDF queries
                                                                          Content              Semantic
across semantically integrated sources. These queries will                                     Annotation                                   Alerts

then be executed by a SPARQL-based query engine.
    The RDF data repository and query answering alone are not
enough for effective and comprehensive knowledge acquisition. If
some facts do not exist in any original data source, they will not
be stored in the RDF repository, yet such information may be
critical to end users. To acquire such previously implicit
knowledge, we will incorporate an inference engine (a.k.a. logic
reasoner). Compared with traditional relational database
techniques, inference engines provide a more expressive method for
querying and reasoning over available data sets. Ontology-based
(a.k.a. semantics-based) queries thus become possible in place of
traditional SQL queries. Ontology-based queries improve on
traditional keyword-based queries in two ways. (i) Both synonymous
terms (different terms sharing the same meaning) and polysemous
terms (a single term with several meanings) can be taken into
account to obtain more results relevant to the user query. (ii)
Semantic relationships among terms often reveal extra clues hidden
in disparate data sources; discovering these relationships
explicitly further improves the quality of query answering.
Consequently, we will be able to acquire hidden knowledge and
information that was originally implicit and unclear, yet critical,
to end users. With a logic reasoner, the CoEDS repository will work
as a comprehensive knowledge base.

    7) Sesame framework for RDF repository, SPARQL RDF query
engine, and inference engine: We have preliminarily chosen the
Sesame framework [25] to store and manage the RDF repository.
Sesame is an open-source Java framework for the storage and
querying of RDF data.

    [Figure: Semantic TagPrint (Named Entity Detection, Ontology
    Mapping, Concept Weighting); CoEDS Ontologies (concepts,
    properties); CoEDS Knowledge Base (RDF Store).]
    Fig. 3. Semantic data annotation and event processing (SeDIEP).

    1) System overview: The semantic data annotation and event
processing (SeDIEP) subsystem manages various data sources and
automatically annotates and integrates data at the semantic level.
As shown in Figure 3, the subsystem has three major components:
(i) Semantic TagPrint, (ii) the Semantic Knowledge Management Tool
(SKMT), and (iii) the Event Engine. Semantic TagPrint is an
automatic semantic tagging engine that annotates structured data
and free text using ontological entities from the CoEDS ontologies.
SKMT manages heterogeneous data sources for semantic annotation
and integration. The Event Engine feeds the semantic tagging engine
with dynamic events; it also generates alerts, with support from
CoEDS, through modified RDF queries and semantic reasoning.
    Heterogeneous data sources will be annotated and seamlessly
integrated into a central RDF data repository based on the CoEDS
ontologies. This data repository will serve as a unified and
consistent data layer for further analysis of data at the semantic
level. Our core technologies can substantially reduce
design-to-execution time in the application domains of data
integration, visualization, and analysis.
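To make the idea of annotating heterogeneous sources into one repository, and then reasoning over it, more concrete, here is a minimal sketch under stated assumptions. Two hypothetical sources with different schemas and vocabularies are aligned to shared concepts, and a forward-chaining loop applies only the subclass type-propagation part of RDFS-style inference. The sources, column names, codes, and concept names are invented, and the actual SeDIEP components (Semantic TagPrint, SKMT) are not modeled.

```python
# Hypothetical sketch: two heterogeneous sources are annotated with shared
# ontology concepts, merged into one set of RDF-style triples, and a simple
# subclass reasoner lets a single query span both sources. All names, codes,
# and alignments are invented for illustration; none come from CoEDS.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Toy subclass axioms (child, parent).
SUBCLASS: Set[Tuple[str, str]] = {
    ("ex:RemoteTerminalUnit", "ex:FieldDevice"),
    ("ex:ProtectiveRelay", "ex:FieldDevice"),
    ("ex:FieldDevice", "ex:GridComponent"),
    ("ex:Transformer", "ex:GridComponent"),
}

# Source A: SCADA status log using vendor codes in a "dev_type" column.
SCADA_LOG = [{"dev": "RTU7", "dev_type": "RTU"}, {"dev": "R3", "dev_type": "RLY"}]
# Source B: asset inventory using plain words in a "category" column.
ASSETS = [{"asset_id": "XF12", "category": "transformer"},
          {"asset_id": "RTU9", "category": "remote terminal unit"}]

# Per-source alignment of local values to ontology concepts.
ALIGN_A = {"RTU": "ex:RemoteTerminalUnit", "RLY": "ex:ProtectiveRelay"}
ALIGN_B = {"remote terminal unit": "ex:RemoteTerminalUnit",
           "transformer": "ex:Transformer"}


def to_triples(rows: List[Dict[str, str]], id_col: str, type_col: str,
               align: Dict[str, str]) -> List[Triple]:
    """Annotate each row with its ontology concept as an rdf:type triple."""
    return [("ex:" + row[id_col], RDF_TYPE, align[row[type_col]])
            for row in rows if row[type_col] in align]


def infer_types(triples: List[Triple]) -> Set[Triple]:
    """Propagate rdf:type up the subclass hierarchy until a fixed point."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for child, parent in SUBCLASS:
                if child == o and (s, RDF_TYPE, parent) not in closed:
                    closed.add((s, RDF_TYPE, parent))
                    changed = True
    return closed


store = infer_types(to_triples(SCADA_LOG, "dev", "dev_type", ALIGN_A)
                    + to_triples(ASSETS, "asset_id", "category", ALIGN_B))


def instances_of(concept: str) -> List[str]:
    """Return sorted subjects typed with the given concept (stated or inferred)."""
    return sorted(s for s, p, o in store if p == RDF_TYPE and o == concept)


print(instances_of("ex:FieldDevice"))    # ['ex:R3', 'ex:RTU7', 'ex:RTU9']
print(instances_of("ex:GridComponent"))  # ['ex:R3', 'ex:RTU7', 'ex:RTU9', 'ex:XF12']
```

A query for `ex:FieldDevice` now returns devices from both sources, including type facts that neither source stated explicitly, which is the kind of implicit knowledge an inference engine is meant to surface.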
• Meaningful data. Our system will annotate terms in text with
  their corresponding concepts in the CoEDS ontologies by finding
  their meanings and analyzing their context.
• Scalability. Indexed data are stored and managed in a
  repository. Collected and initially processed data can be
  incrementally analyzed and indexed.
• Easy integration. Various data sources can be seamlessly
  integrated along with their semantic indexes.

    2) Deep annotation and integration: Data sources to be
integrated contain structured, semi-structured, or unstructured
data. As discussed in the previous section, we adopt RDF to handle
the data interoperability challenge. Semantic data annotation is
the process of tagging source files with metadata predefined in
ontologies, such as names, entities, attributes, definitions, and
descriptions. Herein, we use the terms “semantic annotation” and
“semantic tagging” interchangeably. The annotation attaches the
extra information contained in metadata to existing pieces of data.
Metadata are usually drawn from a set of ontological entities
(including concepts and instances of concepts) predefined in
ontologies. For unstructured data such as free text, we will use a
tagging engine to align the text with ontological entities and
generate semantic annotations. For structured data, including
database data, the annotation will take two successive steps: (i)
we will first annotate data source schemas by aligning their
metadata with ontological entities; (ii) according to the
annotated schemas, we will then transform original data instances
into RDF triples. We refer to such annotation as “deep”
annotation, a term coined by C. Goble at the Semantic Web Workshop
of WWW 2002. Annotating more than just data source schemas is
necessary because there are situations where the opposite,
“shallow” annotation (i.e., annotation of schemas alone) cannot
provide users with the desired knowledge. Following semantic data
annotation, RDF triples will be indexed and accumulated into a
central repository.

    3) Unified view over original data sources and cost-efficient
analysis: Because all semantic tags will be generated from a global
metadata model, i.e., the CoEDS ontologies, our tool provides a
unified view over original data sources at the semantic level. As
discussed before, our RDF query and reasoning engines will provide
users with more meaningful and relevant information from
semantically annotated and integrated data sources. In addition,
semantic relationships among tags provide additional clues that
will further improve the quality of retrieved results. Given a set
of candidate results to be returned to users, we will calculate the
semantic similarity between each result and the user query using
semantic features such as (i) hypernym, which defines the
superClassOf relationship, and (ii) holonym, which defines the
partOf relationship. We will then rank these results by their
respective semantic similarities, so that users are presented with
more relevant query results.

    4) Semantic event processing: Dynamic events will be fed to our
Semantic TagPrint, which will annotate these events with semantic
tags. The events are then represented as RDF triples, accompanied
by event attributes such as timestamps and probabilities. With
support from CoEDS, SeDIEP will transform these tagged events into
SPARQL queries. We will perform event filtering, correlation, and
aggregation or abstraction using semantic matching, rules, and
similarity evaluations. Moreover, we will detect event patterns on
event streams with temporal semantic rules. As a result, high-risk
vulnerabilities and threats can be predicted, and security alerts
will be automatically generated and presented to users when
potential cyber intrusions occur.

    5) Core components in SeDIEP: Figure 3 shows the three major
components in SeDIEP that semantically integrate various data
sources and event streams.
    a) Component one: Semantic TagPrint is an automatic semantic
tagging engine that annotates structured data and free text using
ontological entities. Three modules were designed for this
component.
    • Named Entity Detection: This module extracts named entities
      (in general, noun phrases) from the input text. We adopt the
      Stanford Parser [26] to detect and tokenize sentences and to
      assign Part-of-Speech (PoS) tags to tokens. Entity names will
      be extracted based on the PoS tags.
    • Ontology Mapping: This module maps extracted entity names to
      CoEDS concepts and instances in two steps: phrase mapping and
      sense mapping. Phrase mapping will match the noun phrase of an
      entity name to a predefined concept or instance. Sense mapping
      will use a linear-time lexical chain algorithm to disambiguate
      terms that have several senses defined in the ontologies.
    • Ontology Weighting: This module uses statistical and
      ontological features of concepts to weight semantic tags. We
      then annotate the input text using the semantic tags with
      higher weights.
    b) Component two: SKMT collects original text and sends
annotation results to the Repository Manager, whose main role is to
manage the RDF repository (store) and to communicate with the Query
Interface. Together, these components provide a unified view over
original data sources at the semantic level. Users will be guided
by a GUI to automatically generate RDF queries across semantically
integrated data sources. These queries will then be executed by a
SPARQL-based RDF query engine. As discussed earlier in this
subsection, we can calculate the semantic similarity between each
candidate query result and the user query using semantic features
such as hypernym and holonym, and then rank the query results by
their respective semantic similarities. Consequently, we are able
to present users with more accurate and relevant query results.
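The similarity-based ranking described above can be sketched as follows. The paper does not specify a similarity formula, so this sketch assumes one of the simplest options, the inverse of the shortest path length over hypernym (superClassOf) and holonym (partOf) links, applied to a hypothetical concept graph.

```python
# Hypothetical sketch of similarity-based ranking over hypernym (superClassOf)
# and holonym (partOf) links. The inverse-path-length measure and the concept
# graph are assumptions for illustration, not the measure used in SeDIEP.
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# (narrower or part, broader or whole, relation)
EDGES: List[Tuple[str, str, str]] = [
    ("ex:RemoteTerminalUnit", "ex:FieldDevice", "hypernym"),
    ("ex:ProtectiveRelay", "ex:FieldDevice", "hypernym"),
    ("ex:FieldDevice", "ex:GridComponent", "hypernym"),
    ("ex:Transformer", "ex:GridComponent", "hypernym"),
    ("ex:FieldDevice", "ex:Substation", "holonym"),
    ("ex:Transformer", "ex:Substation", "holonym"),
]


def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Treat both relation types as undirected links between concepts."""
    graph: Dict[str, Set[str]] = {}
    for a, b, _relation in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph: Dict[str, Set[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search for the number of links between two concepts."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # concepts are not connected


def similarity(graph: Dict[str, Set[str]], a: str, b: str) -> float:
    """1 / (1 + path length); 0.0 when no path exists."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def rank(graph: Dict[str, Set[str]], query_concept: str,
         candidates: List[Tuple[str, str]]) -> List[str]:
    """Order (result_id, concept) candidates by similarity to the query concept."""
    ordered = sorted(candidates,
                     key=lambda rc: similarity(graph, query_concept, rc[1]),
                     reverse=True)
    return [result_id for result_id, _concept in ordered]


graph = build_graph(EDGES)
results = [("doc1", "ex:Transformer"), ("doc2", "ex:ProtectiveRelay"),
           ("doc3", "ex:RemoteTerminalUnit"), ("doc4", "ex:WeatherReport")]
print(rank(graph, "ex:RemoteTerminalUnit", results))
# prints ['doc3', 'doc2', 'doc1', 'doc4']
```

Path-based measures like this are cheap to compute but treat every link as equally informative; weighting hypernym and holonym links differently would be a natural refinement.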
    c) Component three: The Event Engine annotates dynamic events
and stores them as RDF triples. It will then generate SPARQL
queries and perform event filtering, correlation, and aggregation
or abstraction with the semantics defined in the CoEDS ontologies.

               IV. PRELIMINARY EXPERIMENTAL RESULTS
    In this ongoing research, we have developed a preliminary
version of the CoEDS domain ontologies and knowledge base to
demonstrate a proof of concept of how Semantic Web technologies
can significantly contribute to resilient energy control systems.
We also exported instances into an RDF data repository within the
Sesame framework.

    Fig. 4. Protégé GUI screen shot exhibiting some CoEDS concepts.

A. CoEDS Ontologies
    As discussed earlier in Section III.B, we have developed seven
sub-ontologies in CoEDS: SCADA Status Ontology, Root Cause &
Impact Ontology, Situational Awareness Ontology, Grid Component &
Topology Ontology, Cybersecurity Econometrics Ontology, Cost
Benefit Ontology, and Complex Event Processing Ontology. The
purpose of this decomposition strategy is to achieve
orthogonality, i.e., no overlap among the CoEDS sub-ontologies.
After the individual sub-ontologies were developed, we imported
them into CoEDS. If any sub-ontology needs to be modified in the
future, the changed schema information will be automatically
integrated into the CoEDS ontologies. Figure 4 shows a screen shot
from the Protégé GUI exhibiting a portion of the CoEDS concepts.
Note that the well-defined, general-purpose structure of the Basic
Formal Ontology (BFO), a popular upper ontology across different
disciplines and research areas, was preserved in the ontology
schema. Statistics for all seven sub-ontologies are summarized in
Table I. In total, the CoEDS ontologies contain 269 concepts, 232
object properties, and 110 data properties.

               TABLE I. STATISTICS FOR COEDS ONTOLOGIES

 Sub-Ontology                          Total Number   Total Number of     Total Number of
                                       of Concepts    Object Properties   Data Properties
 SCADA Status Ontology                      35              23                  12
 Root Cause & Impact Ontology               37              21                   9
 Situational Awareness Ontology             39              27                  15
 Grid Component & Topology Ontology         51              39                  17
 Cybersecurity Econometrics Ontology        38              25                  20
 Cost Benefit Ontology                      33              19                  18
 Complex Event Processing Ontology          36              28                  19

B. CoEDS Knowledge Base
    The current CoEDS KB contains a total of 1,223 facts (a.k.a.
axioms in Protégé). Details are given in Table II.

          TABLE II. STATISTICS FOR COEDS KNOWLEDGE BASE AXIOMS

 Axiom Category                        Count
 Class Axioms                            460
    Subclass Axioms                      268
    Equivalent Class Axioms               57
    Disjoint Class Axioms                135
 Object Property Axioms                  217
 Data Property Axioms                    108
 Individual Axioms                       236
 Annotation Axioms                       202

C. Sesame Framework to Manage Data Repository
    Within the Sesame framework, we exported all ontological
instances into an RDF data repository for future storage and
management. Figure 5 is a screen shot from the Sesame GUI showing
the seven sub-ontologies and the overall CoEDS ontologies. As an
open-source Java framework, Sesame can be readily extended and
configured for the storage and querying of RDF data. Sesame also
offers a JDBC-like user API, streamlined system APIs, and a RESTful
HTTP interface.

    Fig. 5. Screen shot from Sesame repository management.
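Exporting instances to an RDF repository requires serializing them in an RDF syntax the repository accepts. The sketch below writes instances as N-Triples, one of the formats Sesame can import. The namespace, property names, and instances are hypothetical, and the actual upload (for example, through the Sesame workbench or its HTTP interface) is not shown.

```python
# Hypothetical sketch of exporting ontology instances as N-Triples, one of
# the RDF formats Sesame can import. The namespace and instance data are
# invented; uploading the file to a Sesame repository is not shown.
from typing import List, NamedTuple, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
EX = "http://example.org/coeds#"  # hypothetical namespace


class Literal(NamedTuple):
    value: str
    datatype: str  # full datatype IRI


Term = Union[str, Literal]  # a plain str is treated as a full IRI


def format_term(term: Term) -> str:
    """Render an IRI as <...> or a typed literal as "..."^^<...>."""
    if isinstance(term, Literal):
        # Escape backslash first, then quotes and line breaks.
        escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"^^<{term.datatype}>'
    return f"<{term}>"


def to_ntriples(triples: List[Tuple[str, str, Term]]) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(f"{format_term(s)} {format_term(p)} {format_term(o)} .\n"
                   for s, p, o in triples)


instances = [
    (EX + "RTU7", RDF_TYPE, EX + "RemoteTerminalUnit"),
    (EX + "RTU7", EX + "hasLabel", Literal('Feeder "A" RTU', XSD_STRING)),
]
print(to_ntriples(instances), end="")
```

N-Triples is chosen here only because it is line-based and trivial to generate; any RDF syntax the repository accepts would serve the same purpose.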
                         V. CONCLUSION
    To preserve critical energy control functions while under
attack, it is necessary to perform comprehensive analysis of the
root cause, extent, and impacts of cyber intrusions without
sacrificing the availability of energy delivery. We proposed
InTRECS, an intrinsically resilient energy control system, to
address these challenges. Semantic Web technologies, which play
critical roles in knowledge representation and acquisition, have
been extensively adopted in our system. The focus of this ongoing
research is to demonstrate a proof of concept of how Semantic Web
technologies can significantly contribute to resilient energy
control systems. We justified the research motivation, described
our methodology in detail, and presented preliminary experimental
results. Future research directions include, but are not limited
to, (i) continuing CoEDS ontology development toward a highly
stable and more usable version; (ii) incorporating query and
inference engines into the knowledge base so that end users can
better analyze root causes and impacts of cyber intrusions; and
(iii) implementing the SeDIEP subsystem.

                        ACKNOWLEDGMENT
    This research was partially supported through the U.S.
Department of Energy (DOE) Higher Education Research Experiences
(HERE) program for Faculty at the Oak Ridge National Laboratory,
Oak Ridge, Tennessee, sponsored by the U.S. Department of Homeland
Security (DHS).

                          REFERENCES
[1]  J. Undercoffer, A. Joshi, and J. Pinkston, “Modeling Computer
     Attacks: An Ontology for Intrusion Detection,” RAID 2003, LNCS
     2820, pp. 113-135, Springer-Verlag Berlin Heidelberg, 2003.
[2]  A. Simmonds, P. Sandilands, and L. Ekert, “An Ontology for
     Network Security Attacks,” Proc. the 2nd Asian Applied Computing
     Conference (AACC-04), LNCS 3285, pp. 317-323, 2004.
[3]  W. Wang and T. Daniels, “A Graph Based Approach toward Network
     Forensic Analysis,” ACM Transactions on Information and System
     Security, vol. 12, no. 1, article 4, Oct. 2008.
[4]  J. Hieb, J. Graham, and J. Guan, “An Ontology for Identifying
     Cyber Intrusion Induced Faults in Process Control Systems,”
     Critical Infrastructure Protection III, IFIP AICT 311,
     pp. 125-138, 2009.
[5]  G. Isaza, A. Castillo, M. Lopez, L. Casillo, and M. Lopez,
     “Intrusion Correlation Using Ontologies and Multi-agent
     Systems,” Proc. the 4th International Conference on Information
     Security and Assurance (ISA-10), pp. 355-361, Miyazaki, Japan,
     June 23-25, 2010.
[6]  M. Choras, R. Kozik, A. Flizikowski, and W. Holubowicz,
     “Ontology Applied in Decision Support System for Critical
     Infrastructures Protection,” IEA/AIE 2010, LNAI, pp. 671-680,
     2010.
[7]  B. Barnett, A. Crapo, and P. O’Neil, “Experiences in Using
     Semantic Reasoners to Evaluate Security of Cyber Physical
     Systems,” General Electric Internal Report GridSec, 2012.
[8]  C. Turnitsa and A. Tolk, “Knowledge Representation and the
     Dimensions of a Multi-Model Relationship,” Proc. the 40th
     Conference on Winter Simulation (WSC-08), pp. 1148-1156, 2008.
[9]  L. Reeve and H. Han, “Semantic Annotation for Semantic Social
     Networks Using Community Resources,” AIS SIGSEMIS Bulletin,
     vol. 2, pp. 52-56, 2005.
[10] S. Wiesener, W. Kowarschick, and R. Bayer, “SemaLink: An
     Approach for Semantic Browsing through Large Distributed
     Document Spaces,” Proc. the 3rd International Forum on Research
     and Technology Advances in Digital Libraries, p. 86, 1996.
[11] Zemanta. http://www.zemanta.com/.
[12] Common Tag. http://www.commontag.org/.
[13] K. Murat, J. Dang, and S. Uskudarli, “UNIpedia: A Unified
     Ontological Knowledge Platform for Semantic Content Tagging and
     Search,” Proc. the 4th IEEE International Conference on Semantic
     Computing, Pittsburgh, PA, USA, 2010.
[14] K. Murat, J. Dang, and S. Uskudarli, “Semantic TagPrint:
     Indexing Content at Semantic Level,” Proc. the 4th IEEE
     International Conference on Semantic Computing, Pittsburgh, PA,
     USA, 2010.
[15] Y. Simmhan, Q. Zhou, and V.K. Prasanna, “Semantic Information
     Integration for Smart Grid Applications,” Chapter 19, Green IT:
     Technologies and Applications, pp. 361-380, 2011.
[16] J. Kirsch, S. Goose, Y. Amir, and P. Skare, “Toward Survivable
     SCADA,” Proc. the Annual Cyber Security and Information
     Intelligence Research Workshop (CSIIRW-11), Oak Ridge, 2011.
[17] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters,
     L. Goldberg, K. Eilbeck, A. Ireland, C. Mungall, N. Leontis,
     P. Rocca-Serra, A. Ruttenberg, S. Sansone, R. Scheuermann,
     N. Shah, P. Whetzel, and S. Lewis, “The OBO Foundry: coordinated
     evolution of ontologies to support biomedical data
     integration,” Nature Biotechnology, 25(11):1251-1255, 2007.
[18] M. Uschold and M. Gruninger, “Ontologies: principles, methods,
     and applications,” Knowledge Engineering Review, 11(2):93-155,
     1996.
[19] OWL. http://www.w3.org/2004/OWL/.
[20] OBO. http://www.obofoundry.org/.
[21] KIF (Knowledge Interchange Format). http://logic.stanford.edu/kif/.
[22] OKBC. http://www.ai.sri.com/okbc/.
[23] Protégé. http://protege.stanford.edu/.
[24] SPARQL. http://www.w3.org/TR/rdf-sparql-query/.
[25] Sesame. http://www.openrdf.org/doc/sesame/.
[26] D. Klein and C.D. Manning, “Accurate Unlexicalized Parsing,”
     Proc. the 41st Meeting of the Association for Computational
     Linguistics, pp. 423-430, 2003.

</pre>