=Paper=
{{Paper
|id=None
|storemode=property
|title=Intrinsically Resilient Energy Control Systems
|pdfUrl=https://ceur-ws.org/Vol-966/STIDS2012_T12_SheldonEtAl_ResilientEnergySystems.pdf
|volume=Vol-966
|dblpUrl=https://dblp.org/rec/conf/stids/SheldonHDFGKMMW12
}}
==Intrinsically Resilient Energy Control Systems==
Using Semantic Web Technologies to Develop Intrinsically Resilient Energy Control Systems

Frederick Sheldon and Daniel Fetzer
Oak Ridge National Laboratory
Oak Ridge, TN 37831, U.S.A.
{sheldonft, fetzerdt}@ornl.gov

Jingshan Huang
University of South Alabama
Mobile, AL 36688, U.S.A.
huang@southalabama.edu

Jiangbo Dang and Dong Wei
Siemens Corporation, Corporate Research and Technology
Princeton, NJ 08540, U.S.A.
{jiangbo.dang, dong.w}@siemens.com

David Manz
Pacific Northwest National Laboratory
Richland, WA 99354, U.S.A.
david.manz@pnnl.gov

Thomas Morris
Mississippi State University
Mississippi State, MS 39762, U.S.A.
morris@ece.msstate.edu

Jonathan Kirsch and Stuart Goose
Siemens Corporation, Corporate Research and Technology
Berkeley, CA 94704, U.S.A.
{jonathan.kirsch, stuart.goose}@siemens.com
Abstract: To preserve critical energy control functions while under attack, it is necessary to perform comprehensive analysis of the root causes and impacts of cyber intrusions without sacrificing the availability of energy delivery. We propose to design an intrinsically resilient energy control system in which we extensively utilize Semantic Web technologies, which play critical roles in knowledge representation and acquisition. While our ultimate goal is to ensure availability/resiliency of energy delivery functions and the capability to assess root causes and impacts of cyber intrusions, the focus of this paper is to demonstrate a proof of concept of how Semantic Web technologies can significantly contribute to resilient energy control systems.

Index Terms: cybersecurity, energy control system, ontology, knowledge base, semantic annotation, data integration.

This manuscript has been authored by contractors of the U.S. Government (USG) under contract DE-AC05-00OR22725. Accordingly, the USG retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for USG purposes.

I. INTRODUCTION

Our energy infrastructure depends on energy delivery systems composed of complex and geographically dispersed network architectures with vast numbers of interconnected components. These systems supply information and automated control over a large, complex network of processes that collectively ensure reliable and safe production and distribution of energy. The energy utilities are modernizing these vast networks with millions of smart meters, high-speed sensors, advanced control systems, and a supporting communications infrastructure. This additional complexity brings benefits, but also increases the risk of cyber attacks that could disrupt our energy delivery. These systems must maintain high availability and reliability even when under attack. After a security incident has been detected, the incident response team needs the ability to investigate and determine the root cause, attack methods, consequences, affected assets, impacted stakeholders, and other information in order to inform an effective response. The response team needs this information in the short term in order to contain or eradicate the attack, recover compromised equipment, and restore normal operation. The team also needs to determine countermeasures to prevent recurrence and possibly collect evidence to legally prosecute the offenders. This analysis and response must be done without interrupting the availability of the energy delivery systems.

To address the aforementioned challenges, this paper presents the design and architecture of InTRECS, an InTrinsically Resilient Energy Control System. The ultimate goal of InTRECS is to provide tools and technologies to ensure the availability/resiliency of energy delivery functions, along with the capability to assess root causes and impacts of cyber intrusions. To meet these goals, InTRECS extensively applies Semantic Web technologies, including cybersecurity domain ontologies, a comprehensive knowledge base, and semantic data annotation & integration techniques. Semantic Web technologies are built upon ontologies, which are formal, declarative knowledge models and have been shown to play critical roles in knowledge representation and acquisition.

In this paper, we argue that applying Semantic Web technologies in InTRECS affords several benefits compared to typical approaches that utilize relational databases:

• While relational databases focus on syntactic representation of data and lack the ability to explicitly encode semantics, Semantic Web technologies support rich semantic encoding, which is critical in automated knowledge acquisition.
• Powerful tools exist for capturing and managing ontological knowledge, including an abundance of reasoning tools readily supplied for ontological models, making it much more convenient to query, manipulate, and reason over available data sets. As a result, semantics-based queries, instead of SQL queries, are made possible.
• Advances in an energy delivery system (EDS) require regular changes to the underlying data models. In addition, more often than not, it is preferable to represent data at different levels and/or with different abstractions. There are no straightforward methods for performing such updates if relational models are adopted. Semantic Web technologies better enable EDS researchers to append additional data into repositories in a more flexible and efficient manner. The formal semantics encoded in ontologies make it possible to reuse data in unplanned and unforeseen ways, especially when data users are not data producers, which is now very common.

While our ultimate goal is to ensure availability/resiliency of energy delivery functions and the capability to assess root causes and impacts of cyber intrusions, the focus of this paper is to demonstrate a proof of concept of how Semantic Web technologies can significantly contribute to resilient energy control systems. The rest of the paper is organized as follows. Section II gives a brief review of related research in ontologies and in semantic annotation & integration. Section III describes the overall architecture of InTRECS, followed by methodology details for developing domain ontologies & knowledge base and performing data annotation & integration. Section IV demonstrates our preliminary experimental results. Finally, Section V concludes with future research directions.

II. RELATED WORK

A. Ontologies in Energy Delivery Control and Cybersecurity

Energy delivery control systems comprise complex network architectures that may contain hundreds of specialized cyber components and may extend across wide geographical regions. Cyber attack investigation involves examining large volumes of data from heterogeneous sources. Researchers face the challenge of how to maintain the integrity of data derived from diverse sources across distributed geographic areas [1-7]. These research efforts have resulted in various ad hoc proprietary formats for storing and analyzing data and maintaining the respective metadata. Different parties are likely to adopt different formats according to their specific needs. Therefore, seamless communication among different parties, along with the knowledge sharing and reuse that follow, becomes a non-trivial problem. Turnitsa and Tolk [8] discussed in depth the multi-resolution, multi-scope, and multi-structure challenges during data exchange between different models.

Semantic Web technologies that are based on domain ontologies can render tremendous help. Ontologies are declarative knowledge models, defining essential characteristics and relationships for specific domains of interest. As a semantic foundation, ontologies greatly help domain experts to formally define domain knowledge in terms of data semantics (intended meanings) rather than data syntax (forms in which data are represented). Reasons for developing ontologies include, but are not limited to: (i) to share domain information among people and software; (ii) to enable reuse of domain knowledge; (iii) to analyze domain knowledge and make it more explicit; and (iv) to separate domain knowledge from its implementation. There exist some domain ontologies in cybersecurity and related areas, e.g., Intrusion Detection System Ontology [1], Network Security Ontology [2], Process Control Ontology [4], INSPIRE Ontology [5], and GE SADL Host Defense Ontology [7]. These ontologies provide metadata and standard terminologies in their respective domains.

B. Semantic Data Annotation & Integration

Semantic data annotation & integration can bring critical impacts and benefits to data analysis and management. Semantic annotation (tagging) systems can be divided into manual, semi-automatic, and automatic ones [9]. In manual tagging systems (Sema-Link [10] for example), users employ controlled vocabularies from some ontology to tag documents. Such a manual process is time-consuming, requires deep domain expertise, and is prone to inconsistency. Semi-automatic tagging systems improve on manual tagging systems by automatically parsing documents and recommending potential tags. Human annotators only need to select tags from candidates suggested by the system. Automatic semantic tagging systems offer further improvement by parsing and tagging documents with ontological concepts and instances in a fully automatic way. Zemanta [11] is such an example. By suggesting contents from various sources, such as Wikipedia, YouTube, Flickr, and Facebook, Zemanta disambiguates terms and maps them to the Common Tag Ontology [12]. Dang et al. have developed one of the largest comprehensive, domain-independent ontological knowledge bases, UNIpedia+ [13], which covers around 11 million named English entities. Based on UNIpedia+, they further developed an automatic tagging system [14] to produce semantically linked tags for given data. The information system architecture in the Los Angeles Smart Grid project [15] enabled analytical tools and algorithms to forecast energy load and identify load curtailment response through semantically meaningful data.
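The semi-automatic tagging workflow summarized above (parse a document, recommend candidate tags from a controlled vocabulary, and let a human annotator choose among them) can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the vocabulary, the concept names, and the functions `recommend_tags` and `annotate` are hypothetical, and simple term counting stands in for whatever ranking a real tagger would use.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: ontology concept -> surface terms.
VOCABULARY = {
    "RemoteTerminalUnit": ["rtu", "remote terminal unit"],
    "DenialOfService": ["denial of service", "dos attack"],
    "SmartMeter": ["smart meter"],
}


def recommend_tags(text, vocabulary=VOCABULARY, limit=3):
    """Return up to `limit` (concept, mention count) pairs, most mentioned first."""
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for concept, terms in vocabulary.items():
        for term in terms:
            # Whole-word match so that "rtu" does not fire inside "virtual".
            hits = re.findall(r"\b" + re.escape(term) + r"\b", normalized)
            counts[concept] += len(hits)
    ranked = [(concept, n) for concept, n in counts.most_common() if n > 0]
    return ranked[:limit]


def annotate(text, accepted):
    """Record the tags that a human annotator accepted for a document."""
    return {"text": text, "tags": sorted(accepted)}


report = "The RTU stopped responding after a denial of service burst; a second RTU followed."
candidates = recommend_tags(report)
# The annotator accepts every suggestion in this toy run.
document = annotate(report, [concept for concept, _ in candidates])
```

A fully automatic tagger would skip the acceptance step and would also need to disambiguate terms with several senses, which this sketch does not attempt.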
III. METHODOLOGY

A. InTRECS Overall Architecture

Figure 1 illustrates the overall architecture of InTRECS, which is decomposed into six subsystems.

Fig. 1. Overall architecture of InTRECS system.

• Intrusion-Tolerant SCADA (InTRADA). We will develop a survivable SCADA system based on intrusion-tolerant replication [16]. InTRADA will be capable of guaranteeing correct operations and excellent performance even when part of the system has been compromised and is under the control of an intelligent attacker.
• Cybersecurity Ontologies and Knowledge Base for Energy Delivery Systems (CoEDS). The CoEDS knowledge base (KB) contains domain ontologies, a resource description framework (RDF) repository, a SPARQL RDF query engine, and an inference engine. The KB will provide end users with a unified and consistent data layer for analyzing data at the semantic level.
• Semantic Data Integration and Processing (SeDIEP). Our focus is to develop an automatic semantic data annotation & integration engine for tagging data sources based on the metadata defined in CoEDS ontologies. An event-processing engine will handle dynamic events and generate security alerts.
• Root Cause and Impact Analysis (RoCIA). RoCIA provides the basis to detect cyber incidents and investigate the root cause, attack methods, consequences, affected assets, impacted stakeholders, attackers' identity, and other metrics to inform an effective response. RoCIA will leverage the Cyber Security Econometrics System (CSES) and the inference and query engines provided within CoEDS KB to assist EDS stakeholders in evaluating cybersecurity investments and to provide an economic impact assessment of ongoing cyber intrusions.
• Dashboard Analytics and Situation Awareness (DaSA). Dashboard analytics includes a graphical user interface (GUI) to support interactions between end users and InTRECS. Situational awareness will be performed for end users. We will also support reasoning through the inference engine in CoEDS.
• Test and Evaluation (TnE). Implemented modules will automatically configure the test suite environment to the appropriate start state for the test case. A portal will provide the information and documentation and will execute the test case. We will also develop a test suite in an end-user setting, including a set of denial of service (DoS), reconnaissance, and network packet integrity exploits targeting SCADA, remote terminal unit (RTU), and network architecture vulnerabilities.

InTRECS will be constantly active to intrinsically provide resiliency, i.e., correct operations and excellent performance. At the same time, a DaSA GUI will guide end users to generate queries over data derived from diverse sources. Query results, e.g., the root cause, extent, and impacts of the cyber intrusion, can then be provided back to end users. InTRECS will also push security alerts up to end users. Both query results and alerts are regarded as semantic decision support to end users because they extensively utilize Semantic Web technologies, namely, domain ontologies, RDF triples resulting from semantic annotation, and inferences & analysis performed at the semantic level.

B. CoEDS Domain Ontologies and Knowledge Base

There are four components in CoEDS KB: (i) CoEDS domain ontologies, (ii) an RDF repository, (iii) a SPARQL RDF query engine, and (iv) an inference engine. Through automatic data integration and logic reasoning, CoEDS KB will be able to provide a unified and consistent data layer for analyzing data at the semantic level. It will thus assist end users to effectively obtain real-time decision support, so that they can (i) obtain health status updates of SCADA replicas, (ii) analyze and better understand the root cause, extent, and impacts of an attack, (iii) acquire situational awareness, and (iv) recommend courses of action.

1) Interaction between CoEDS and other InTRECS subsystems: CoEDS KB actively exchanges information with other subsystems of InTRECS on a regular basis.

• InTRADA receives system health and status information from CoEDS KB, and incorporates such knowledge to enhance its fault-detection algorithms. This will enable InTRADA to more rapidly reconfigure itself in the event of a cyber attack by helping it distinguish between performance faults caused by a malicious application and those caused by more benign issues such as transitory network problems.
• InTRADA sends to CoEDS KB status updates regarding the health of the replicas, hence providing data for future cyber attack analysis.
• SeDIEP obtains the data semantics, i.e., ontological metadata, from CoEDS KB and utilizes such metadata during the automatic semantic annotation. Annotated data, including cybersecurity econometrics, dynamic events, etc., are stored back into CoEDS KB to construct and continuously update the central data repository in the KB.
• CoEDS KB provides RoCIA with topology data as well as the data semantics essential for performing root cause and impact analysis. RoCIA supplies CoEDS KB with root cause and impact analysis data, including attack signatures, attack locations, exploits, consequences, countermeasures, model parameters, network components, security requirements, threats, vulnerabilities, and stakeholders.
• CoEDS KB furnishes DaSA with dynamic events and with electric grid components and topology data, both of which are in an annotated form. DaSA sends back situational awareness data to CoEDS KB. In addition, the KB also provides the Correlation Layers for Information Query and Exploration (CLIQUE) and Traffic Circle, two visual analytics tools in DaSA, with interoperability for behavior model-based anomaly detection.
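To make the four KB components named at the start of this subsection more concrete, the following minimal sketch models them as an in-memory toy: a set of triples (the repository), a wildcard pattern matcher (standing in for a query engine), and two RDFS-style rules (standing in for an inference engine). It is an illustration under stated assumptions, not the CoEDS implementation; the class `TinyKB` and every `coeds:` and `ex:` identifier are hypothetical names invented for this example.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


class TinyKB:
    """Toy stand-in for an RDF repository with query and inference support."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def infer(self):
        """Apply two RDFS-style rules until no new triple appears."""
        changed = True
        while changed:
            new = set()
            for (a, p1, b) in self.triples:
                if p1 not in (SUBCLASS, TYPE):
                    continue
                for (b2, p2, c) in self.triples:
                    if b2 == b and p2 == SUBCLASS:
                        # subClassOf is transitive; rdf:type propagates upward.
                        new.add((a, p1, c))
            changed = not new <= self.triples
            self.triples |= new


kb = TinyKB()
# Ontology fragment (hypothetical class names).
kb.add("coeds:Replica", SUBCLASS, "coeds:ScadaComponent")
kb.add("coeds:ScadaComponent", SUBCLASS, "coeds:GridAsset")
# Instance data, e.g. a replica status update.
kb.add("ex:replica3", TYPE, "coeds:Replica")
kb.add("ex:replica3", "coeds:hasHealthStatus", "degraded")

explicit = kb.match(p=TYPE, o="coeds:GridAsset")   # nothing stated directly
kb.infer()
implicit = kb.match(p=TYPE, o="coeds:GridAsset")   # derived by the rules
```

The point of the sketch is the difference between `explicit` and `implicit`: the fact that `ex:replica3` is a grid asset is never stored by any subsystem, yet it becomes queryable once the inference step has run.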
2) Motivation for developing CoEDS ontologies: Among existing ontologies in cybersecurity and related areas (mentioned in Section II), there is not a single one that is comprehensive enough to cover a complete set of concepts and relationships for the purpose of this research. In particular, with regard to the fields of SCADA status, root cause analysis, situational awareness, electric grid components and topology, cybersecurity econometrics, cost benefit analysis, and complex event processing, all aforementioned existing ontologies are missing some necessary concepts within these critical fields. Even when a specific concept of interest is contained in some existing ontology, more often than not the semantics defined in that ontology need to be extended and customized before the concept can be utilized within the InTRECS system. In brief, Energy Control Systems (ECS) end users lack a comprehensive, customized conceptual model, which prevents the energy sector from leveraging the enhanced knowledge acquisition processes brought by Semantic Web technologies. Such a situation motivates us to develop CoEDS domain ontologies.

3) Ontology development principles: We have observed seven practices suggested by Smith et al. [17]: the ontology should (i) be freely available; (ii) be expressed using a standard language or syntax; (iii) provide tracking and documentation for successive versions; (iv) be orthogonal to existing ontologies; (v) include natural language specifications of all concepts; (vi) be developed collaboratively; and (vii) be used by multiple researchers. In particular, we propose a decomposition methodology as the strategy for arriving at orthogonal ontologies. Our methodology is similar to those used in database normalization theory, third normal form (3NF) for example. We first began with concepts from possibly many sub-domains in one large set, followed by the identification of dependencies or overlaps among these concepts, and we finally proceeded to decompose all concepts based on their identified dependencies. Our preliminary design is to develop seven sub-ontologies in CoEDS: SCADA status, root cause & impact, situational awareness, grid component & topology, cybersecurity econometrics, cost benefit, and complex event processing. Consequently, we achieved the orthogonality feature, i.e., the non-overlapping feature, for CoEDS domain ontologies.

4) Knowledge-driven ontology development procedure: The ontology development did not start from scratch. Instead, to (i) take advantage of the knowledge already contained in existing ontologies and (ii) reduce the possibility of redundant efforts, we have reused, extended, and customized a set of well-established concepts from existing domain ontologies. In addition, popular upper ontologies, e.g., the Basic Formal Ontology (BFO), were imported into our ontologies. The ontology development was driven by domain knowledge and decomposed into five stages, as suggested by Uschold and Gruninger [18]: (i) specification of content; (ii) informal documentation of concept definitions (by domain experts); (iii) logic-based formalization of concepts and relationships between concepts; (iv) implementation of the ontology in a computer language; and (v) evaluation of the ontology, including the internal consistency and the ability to answer logical queries. As illustrated in Figure 2, these five stages are essentially ongoing and iterative because end users' needs will change as their understanding of the domain evolves. In this iterative, knowledge-driven approach, both ontology engineers and domain experts have been involved, working together to capture domain knowledge, develop a conceptualization, and implement the conceptual model. The ontology construction process has taken place over a number of iterations, involving a series of interviews, evaluation strategies, and refinements. Standard revision-control procedures have been utilized.

Fig. 2. Knowledge-driven, iterative ontology development.

5) Ontology format and development tool: There are different formats and languages for describing ontologies, all of which are popular and based on different logics: Web Ontology Language (OWL) [19], Open Biological and Biomedical Ontologies (OBO) [20], Knowledge Interchange Format (KIF) [21], and Open Knowledge Base Connectivity (OKBC) [22]. We have chosen the OWL format recommended by the World Wide Web Consortium (W3C). OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. As a result, OWL facilitates greater machine interpretability of Web contents. We have chosen Protégé, an open-source ontology editor developed by
Stanford [23], as our development tool over other available tools such as CmapTools and OntoEdit.

6) CoEDS KB components (RDF repository, query engine, and inference engine): Based on the formal knowledge defined in CoEDS ontologies, heterogeneous data sources can be annotated and integrated into a central repository. Because the data sources to be integrated include structured, semi-structured, and unstructured data, interoperability becomes an obstacle during knowledge discovery. We adopt RDF, a model for data interchange recommended by the W3C, to handle this challenge. RDF specifically supports the evolution of schemas over time without requiring all the data consumers to be changed. The generic structure of RDF allows structured, semi-structured, and unstructured data to be mixed, exposed, and shared across different applications, thus helping to handle the data interoperability challenge. Following automatic semantic data annotation (see Section III.C), RDF triples will be indexed and accumulated into a central repository. SPARQL Protocol and RDF Query Language (SPARQL) [24] is a query language recommended by the W3C to retrieve and manipulate RDF data. End users of the InTRECS system will be guided by a GUI to automatically generate RDF queries across semantically integrated sources. These queries will then be executed by a SPARQL-based query engine.

The RDF data repository and query answering alone are not enough for effective and comprehensive knowledge acquisition. Facts that do not exist in any original data source will not be stored in the RDF repository, yet such information may be critical to end users. To obtain the ability to acquire previously implicit knowledge, we will incorporate an inference engine (a.k.a. logic reasoner). Compared with traditional relational database techniques, inference engines provide a more expressive method for querying and reasoning over available data sets. Thus, ontology-based (a.k.a. semantics-based) queries, instead of traditional SQL queries, are possible. Ontology-based queries improve on traditional keyword-based queries in several ways. (i) Both synonymous terms (those having the same meaning) and polysemous terms (those having several meanings) can be included to obtain more results that are relevant to the user query. (ii) Semantic relationships among terms often reveal extra clues hidden in disparate data sources. Such relationships can be explicitly discovered to further improve the quality of query answering. Consequently, we will be able to acquire hidden knowledge and information that was originally implicit and unclear, yet critical, to end users. With a logic reasoner, the CoEDS repository will work as a comprehensive knowledge base.

7) Sesame framework for RDF repository, SPARQL RDF query engine, and inference engine: We have preliminarily chosen the Sesame framework [25] to store and manage the RDF repository. Sesame is an open-source Java framework for the storage and querying of RDF data. The framework is fully extensible and configurable with respect to storage mechanisms, inferencers, RDF file formats, query result formats, and query languages. In addition, Sesame offers a JDBC-like user API, streamlined system APIs, and a RESTful HTTP interface supporting the SPARQL protocol for RDF. Moreover, Sesame contains a built-in inference engine, and various reasoning tasks, e.g., subsumption and contradiction reasoning, can be performed.

C. Semantic Data Annotation and Event Processing

According to the formal domain knowledge, including a global metadata model, defined in CoEDS, heterogeneous data sources can be annotated and seamlessly integrated into a central RDF data repository, which will serve as a unified and consistent data layer for data analytics applications.

Fig. 3. Semantic data annotation and event processing (SeDIEP).

1) System overview: The semantic data annotation and event processing (SeDIEP) subsystem manages various data sources and automatically annotates and integrates data at the semantic level. As shown in Figure 3, there are three major components in the subsystem: (i) Semantic TagPrint, (ii) Semantic Knowledge Management Tool (SKMT), and (iii) Event Engine. Semantic TagPrint is an automatic semantic tagging engine that annotates structured data and free text using ontological entities from CoEDS ontologies. SKMT manages heterogeneous data sources for semantic annotation and integration. The Event Engine feeds the semantic tagging engine with dynamic events. It also generates alerts with the support from CoEDS through modified RDF queries and semantic reasoning.

Heterogeneous data sources will be annotated and seamlessly integrated into a central RDF data repository based on CoEDS ontologies. This data repository will serve as a unified and consistent data layer for further analyzing data at the semantic level. Our core technologies can substantially reduce design-to-execution time for application domains of data integration, visualization, and analysis.
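The integration idea in the system overview, in which differently structured sources end up in one repository under a shared vocabulary, can be sketched as follows. This is a minimal illustration, assuming two made-up sources (`ids_log` and `asset_db`) whose fields have already been aligned with ontology properties; the mapping table, the `coeds:` property names, and the helper functions are all hypothetical and are not part of SeDIEP or Sesame.

```python
# Hypothetical schema annotations: source field -> ontology property.
SCHEMA_MAPPINGS = {
    "ids_log": {"src": "coeds:ipAddress", "sig": "coeds:attackSignature"},
    "asset_db": {"ip_addr": "coeds:ipAddress", "kind": "coeds:componentType"},
}


def to_triples(source, record_id, record):
    """Turn one source record into (subject, property, value) triples."""
    mapping = SCHEMA_MAPPINGS[source]
    subject = f"ex:{source}/{record_id}"
    # Fields without a schema annotation are skipped rather than guessed.
    return [
        (subject, mapping[field], str(value))
        for field, value in record.items()
        if field in mapping
    ]


def integrate(batches):
    """Merge records from every source into one shared set of triples."""
    repository = set()
    for source, records in batches.items():
        for record_id, record in records.items():
            repository.update(to_triples(source, record_id, record))
    return repository


def subjects_with(repository, prop, value):
    """Find all subjects, from any source, that share a property value."""
    return sorted(s for s, p, o in repository if p == prop and o == value)


batches = {
    "ids_log": {"e1": {"src": "10.0.0.7", "sig": "syn-flood", "raw": "unparsed"}},
    "asset_db": {"rtu9": {"ip_addr": "10.0.0.7", "kind": "RTU"}},
}
repository = integrate(batches)
related = subjects_with(repository, "coeds:ipAddress", "10.0.0.7")
```

Because both sources map their address field to the same property, a single lookup links the intrusion-detection event to the affected asset even though neither source knows about the other's schema.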
• Meaningful data. Our system will annotate terms in text with their corresponding concepts in CoEDS ontologies by finding their meanings and analyzing their context.
• Scalability. Indexed data are stored and managed in a repository. Collected and initially processed data can be incrementally analyzed and indexed.
• Easy integration. Various data sources can be seamlessly integrated along with their semantic indexes.

2) Deep annotation and integration: Data sources to be integrated contain structured, semi-structured, or unstructured data. As discussed in the previous section, we adopt RDF to handle the data interoperability challenge. Semantic data annotation is the process of tagging source files with metadata predefined in ontologies such as names, entities, attributes, definitions, and descriptions. Herein, we use the terms "semantic annotation" and "semantic tagging" interchangeably. The annotation attaches the extra information contained in metadata to existing pieces of data. Metadata usually come from a set of ontological entities (including concepts and instances of concepts) predefined in ontologies. For unstructured data such as free text, we will use a tagging engine to align them with ontological entities and generate semantic annotations. For structured data, including database data, the annotation will take two successive steps: (i) first we will annotate data source schemas by aligning their metadata with ontological entities; (ii) according to the annotated schemas we will then transform original data instances into RDF triples. We refer to such annotation as "deep" annotation (a term coined by C. Goble at the Semantic Web Workshop of WWW 02). It is necessary to annotate more than just data source schemas because there are situations where the opposite, "shallow" annotation (i.e., annotation on schemas alone) cannot provide users with the desired knowledge. Following semantic data annotation, RDF triples will be indexed and accumulated into a central repository.

3) Unified view over original data sources and cost-efficient analysis: Because all semantic tags will be generated from a global metadata model, i.e., CoEDS ontologies, our tool provides a unified view over original data sources at the semantic level. As discussed before, our RDF query and reasoning engines will provide users with more meaningful and relevant information from semantically annotated and integrated data sources. In addition, semantic relationships among tags provide us with additional clues and will further improve the quality of retrieved results. Given a set of candidate results to be returned to users, we will calculate the semantic similarity between each result and the user query using semantic features such as (i) hypernym, which defines the superClassOf relationship, and (ii) holonym, which defines the partOf relationship. We will then rank these results by their respective semantic similarities. Consequently, users can be presented with more relevant query results.

4) Semantic event processing: Dynamic events will be fed to our Semantic TagPrint, which will annotate these events with semantic tags. Events are then represented as RDF triples, accompanied by event attributes such as timestamps and probabilities. With the support from CoEDS, SeDIEP will transform these tagged events into SPARQL queries. We will perform event filtering, correlation, and aggregation or abstraction using semantic matching, rules, and similarity evaluations. Moreover, we will detect event patterns on event streams with temporal semantic rules. As a result, high-risk vulnerabilities and threats can be predicted, and security alerts will then be automatically generated and rendered to users when facing potential cyber intrusions.

5) Core Components in SeDIEP: Figure 3 shows three major components in SeDIEP to semantically integrate various data sources and event streams.

a) Component one: Semantic TagPrint is an automatic semantic tagging engine that annotates structured data and free text using ontological entities. Three modules were designed for this component.

• Named Entity Detection: This module extracts named entities, noun phrases in general, from the input text. We adopt the Stanford Parser [26] to detect and tokenize sentences, and to assign Part-of-Speech (PoS) tags to tokens. Entity names will be extracted based on PoS tags.
• Ontology Mapping: This module maps extracted entity names to CoEDS concepts and instances in two steps: phrase mapping and sense mapping. Phrase mapping will match the noun phrase of an entity name to a predefined concept or instance. Sense mapping will utilize a linear-time lexical chain algorithm to disambiguate terms that have several senses defined in ontologies.
• Ontology Weighting: This module utilizes statistical and ontological features of concepts to weigh semantic tags. We then annotate the input text using the semantics with higher weights.

b) Component two: SKMT collects original text and sends annotation results to the Repository Manager, whose main role is to manage the RDF repository (store) and to communicate with the Query Interface. These components together provide a unified view over original data sources at the semantic level. Users will be guided by a GUI to automatically generate RDF queries across semantically integrated data sources. These queries will then be executed by a SPARQL-based RDF query engine. As discussed earlier in this subsection, we can calculate the semantic similarity between each candidate query result and the user query using semantic features such as hypernym and holonym. These query results can then be ranked by their respective semantic similarities. Consequently, we are able to give users more accurate and relevant query results.
c) Component three: The Event Engine annotates dynamic events and stores them as RDF triples. It will then generate SPARQL queries and perform event filtering, correlation, and aggregation or abstraction with the semantics defined in CoEDS ontologies.

IV. PRELIMINARY EXPERIMENTAL RESULTS

In this ongoing research, we have developed a preliminary version of CoEDS domain ontologies and knowledge base to demonstrate a proof of concept of how Semantic Web technologies can significantly contribute to resilient energy control systems. We also exported instances into an RDF data repository within the Sesame framework.

Fig. 4. Protégé GUI screen shot exhibiting some CoEDS concepts.

A. CoEDS Ontologies

As discussed earlier in Section III.B, we have developed seven sub-ontologies in CoEDS: SCADA Status Ontology, Root Cause & Impact Ontology, Situational Awareness Ontology, Grid Component & Topology Ontology, Cybersecurity Econometrics Ontology, Cost Benefit Ontology, and Complex Event Processing Ontology. The purpose of such a decomposition strategy is to achieve the [...] summarized in Table I. In total, CoEDS ontologies contain 269 concepts, 232 object properties, and 110 data properties.

TABLE I. STATISTICS FOR COEDS ONTOLOGIES

Sub-Ontology                            Total Number    Total Number of       Total Number of
                                        of Concepts     Object Properties     Data Properties
SCADA Status Ontology                   35              23                    12
Root Cause & Impact Ontology            37              21                    9
Situational Awareness Ontology          39              27                    15
Grid Component & Topology Ontology      51              39                    17
Cybersecurity Econometrics Ontology     38              25                    20
Cost Benefit Ontology                   33              19                    18
Complex Event Processing Ontology       36              28                    19

B. CoEDS Knowledge Base

The current CoEDS KB contains a total of 1,223 facts (a.k.a. axioms in Protégé). Details can be found in Table II.

TABLE II. STATISTICS FOR COEDS KNOWLEDGE BASE AXIOMS

Axiom Category                  Statistic Information
Class Axioms                    460
  Subclass Axioms               268
  Equivalent Class Axioms       57
  Disjoint Class Axioms         135
Object Property Axioms          217
Data Property Axioms            108
Individual Axioms               236
Annotation Axioms               202

C. Sesame Framework to Manage Data Repository
Within the Sesame framework we exported all
orthogonality feature, i.e., the non-overlapping feature
ontological instances into an RDF data repository for future
among different CoEDS sub-ontologies. After individual
sub-ontologies were developed, we then imported them into storage and management. Figure 5 is a screen shot from
Sesame GUI, where the seven sub-ontologies and the overall
CoEDS. If future modifications are needed for any sub-
CoEDS ontologies were clearly demonstrated. Being an
ontology, the changed schema information will be
open-source Java framework, Sesame framework can be
automatically integrated into CoEDS ontologies. Figure 4
readily extended and configured for the storage and querying
demonstrates a screen shot from Protégé GUI, which exhibits
of RDF data. Moreover, a JBDC-like user API, streamlined
a portion of CoEDS concepts. Note that the well-defined,
system APIs, and a RESTful HTTP interface are offered in
general-purpose structure from the Basic Formal Ontology
Sesame as well.
(BFO), a popular upper ontology across different disciplines
and research areas, was preserved in the ontology schema.
Statistic information for all seven sub-ontologies is
Fig. 5. Screen shot from Sesame repository management.
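To make the idea of storing ontological instances as RDF triples and querying them more concrete, the following is a minimal Python sketch of an in-memory triple store with wildcard pattern matching. It does not use Sesame (a Java framework) or its API; the `TripleStore` class, the `ex:` identifiers, and the example instances are all hypothetical and chosen only for illustration.

```python
class TripleStore:
    """A toy in-memory RDF-style store: a set of (subject, predicate, object)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one triple; duplicates are ignored because a set is used."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(p is None or p == v for p, v in zip(pattern, triple))
        )


store = TripleStore()
# Hypothetical instances; the identifiers are illustrative, not CoEDS terms.
store.add("ex:relay42", "rdf:type", "ex:Relay")
store.add("ex:relay42", "ex:partOf", "ex:substationA")
store.add("ex:xfmr7", "rdf:type", "ex:Transformer")
store.add("ex:xfmr7", "ex:partOf", "ex:substationA")

# Which instances are part of substation A?
parts_of_a = [
    s for s, _, _ in store.match(predicate="ex:partOf", obj="ex:substationA")
]
```

The call to `match` plays roughly the role of a single SPARQL triple pattern such as `?s ex:partOf ex:substationA`; a real RDF repository adds persistence, indexing, joins across several patterns, and inference on top of this basic operation.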
V. CONCLUSION
To preserve critical energy control functions while under attack, it is necessary to perform comprehensive analysis on the root cause, extent, and impacts of cyber intrusions without sacrificing the availability of energy delivery. We proposed to develop InTRECS, an intrinsically resilient energy control system, to address these challenges. Semantic Web technologies, which play critical roles in knowledge representation and acquisition, have been extensively adopted in our system. The focus of this ongoing research is to demonstrate a proof of concept of how Semantic Web technologies can significantly contribute to resilient energy control systems. We justified the research motivation, described our methodology in detail, and exhibited preliminary experimental results. Future research directions include, but are not limited to, (i) continuing CoEDS ontology development towards delivering a highly stable and more usable version; (ii) incorporating query and inference engines into the knowledge base for end users to better analyze root causes and impacts of cyber intrusions; and (iii) implementing the SeDIEP subsystem.
ACKNOWLEDGMENT
This research was partially supported through the U.S. Department of Energy (DOE) Higher Education Research Experiences (HERE) program for Faculty at the Oak Ridge National Laboratory, Oak Ridge, Tennessee, sponsored by the U.S. Department of Homeland Security (DHS).
REFERENCES
[1] J. Undercoffer, A. Joshi, and J. Pinkston, “Modeling Computer Attacks: An Ontology for Intrusion Detection,” RAID 2003, LNCS 2820, pp. 113-135, Springer-Verlag Berlin Heidelberg, 2003.
[2] A. Simmonds, P. Sandilands, and L. Ekert, “An Ontology for Network Security Attacks,” Proc. the 2nd Asian Applied Computing Conference (AACC-04), LNCS 3285, pp. 317-323, 2004.
[3] W. Wang and T. Daniels, “A Graph Based Approach toward Network Forensic Analysis,” ACM Transactions on Information and Systems Security, Vol. 12, No. 1, Article 4, Oct. 2008.
[4] J. Hieb, J. Graham, and J. Guan, “An Ontology for Identifying Cyber Intrusion Induced Faults in Process Control Systems,” Critical Infrastructure Protection III, IFIP AICT 311, pp. 125-138, 2009.
[5] G. Isaza, A. Castillo, M. Lopez, L. Casillo, and M. Lopez, “Intrusion Correlation Using Ontologies and Multi-agent Systems,” Proc. 4th International Conference on Information Security and Assurance (ISA-10), pp. 355-361, Miyazaki, Japan, June 23-25, 2010.
[6] M. Choras, R. Kozik, A. Flizikowski, and W. Holubowicz, “Ontology Applied in Decision Support System for Critical Infrastructures Protection,” IEA/AIE2010, LNAI, pp. 671-680, 2010.
[7] B. Barnett, A. Crapo, and P. O’Neil, “Experiences in Using Semantic Reasoners to Evaluate Security of Cyber Physical Systems,” General Electric Internal Report GridSec, 2012.
[8] C. Turnitsa and A. Tolk, “Knowledge Representation and the Dimensions of a Multi-Model Relationship,” Proc. the 40th Conference on Winter Simulation (WSC-08), pp. 1148–56, 2008.
[9] L. Reeve and H. Han, “Semantic Annotation for Semantic Social Networks Using Community Resources,” AIS SIGSEMIS Bulletin, vol. 2, pp. 52-56, 2005.
[10] S. Wiesener, W. Kowarschick, and R. Bayer, “SemaLink: An Approach for Semantic Browsing through Large Distributed Document Spaces,” Proc. the 3rd International Forum on Research and Technology Advances in Digital Libraries, p. 86, 1996.
[11] Zemanta. http://www.zemanta.com/.
[12] Common Tag. http://www.commontag.org/.
[13] K. Murat, J. Dang, and S. Uskudarli, “UNIpedia: A Unified Ontological Knowledge Platform for Semantic Content Tagging and Search,” Proc. the 4th IEEE International Conference on Semantic Computing, Pittsburgh, PA, USA, 2010.
[14] K. Murat, J. Dang, and S. Uskudarli, “Semantic TagPrint: Indexing Content at Semantic Level,” Proc. the 4th IEEE International Conference on Semantic Computing, Pittsburgh, PA, USA, 2010.
[15] Y. Simmhan, Q. Zhou, and V.K. Prasanna, “Semantic Information Integration for Smart Grid Applications,” Chapter 19, Green IT: Technologies and Applications, pp. 361–80, 2011.
[16] J. Kirsch, S. Goose, Y. Amir, and P. Skare, “Toward Survivable SCADA,” Proc. the Annual Cyber Security and Information Intelligence Research Workshop (CSIIRW-11), Oak Ridge, 2011.
[17] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. Goldberg, K. Eilbeck, A. Ireland, C. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S. Sansone, R. Scheuermann, N. Shah, P. Whetzel, and S. Lewis, “The OBO foundry: coordinated evolution of Ontologies to support biomedical data integration,” Nature Biotechnology, 25(11):1251–1255, 2007.
[18] M. Uschold and M. Gruninger, “Ontologies: principles, methods, and applications,” Knowledge Engineering Review, 11(2):93-155, 1996.
[19] OWL. http://www.w3.org/2004/OWL/.
[20] OBO. http://www.obofoundry.org/.
[21] KIF (Knowledge Interchange Format). http://logic.stanford.edu/kif/.
[22] OKBC. http://www.ai.sri.com/okbc/.
[23] Protégé. http://protege.stanford.edu/.
[24] SPARQL. http://www.w3.org/TR/rdf-sparql-query/.
[25] Sesame. http://www.openrdf.org/doc/sesame/.
[26] D. Klein and C.D. Manning, “Accurate Unlexicalized Parsing,” Proc. the 41st Meeting of the Association for Computational Linguistics, pp. 423-430, 2003.