<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine-assisted Cyber Threat Analysis using Conceptual Knowledge Discovery - Position Paper -</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martín Barrère</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustavo Betarte</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Codocedo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcelo Rodríguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hernán Astudillo</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcelo Aliquintuy</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Baliosian</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rémi Badonnel</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Festor</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Raniery Paula dos Santos</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeferson Campos Nobre</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisandro Zambenedetti Granville</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Imperial College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>InCo, Facultad de Ingeniería, Universidad de la República</institution>
          ,
          <country country="UY">Uruguay</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Informatics, Federal University of Rio Grande do Sul</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LORIA/INRIA/CNRS - Nancy</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidad Técnica Federico Santa María</institution>
          ,
          <addr-line>Valparaíso</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Over the last years, computer networks have evolved into highly dynamic and interconnected environments, involving multiple heterogeneous devices and providing a myriad of services on top of them. This complex landscape has made it extremely difficult for security administrators to remain accurate and effective in protecting their systems against cyber threats. In this paper, we describe our vision and scientific posture on how artificial intelligence techniques and a smart use of security knowledge may assist system administrators in better defending their networks. To that end, we put forward a research roadmap involving three complementary axes, namely, (I) the use of FCA-based mechanisms for managing configuration vulnerabilities, (II) the exploitation of knowledge representation techniques for automated security reasoning, and (III) the design of a cyber threat intelligence mechanism as a CKDD process. Then, we describe a machine-assisted process for cyber threat analysis which provides a holistic perspective of how these three research axes are integrated together.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The goal of this paper is to introduce some novel applications of formal concept
analysis [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], knowledge discovery in databases and, in a broader sense, artificial
intelligence techniques to support security analysis of computer networks
and systems. Computer networks are very dynamic environments composed of
diverse entities which, on a daily basis, host thousands of virtual activities.
Additionally, they often require configuration changes to satisfy existing or new
operational requirements (e.g. adding new services, upgrading existing versions,
replacing faulty hardware). Such dynamicity greatly increases the complexity of security
management. Even if automated tools help to simplify security tasks, there is a
need for advanced and flexible solutions able to assist security analysts in better
understanding what is happening inside their networks.
      </p>
      <p>
        The research work we put forward is being developed in the context of the
AKD (Autonomic Knowledge Discovery) project [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a research collaboration
effort involving five teams with different areas of expertise. We have identified several
key aspects in which the use of artificial intelligence techniques, and particularly
formal concept analysis (FCA), can quickly improve on the current state of
affairs for processes and tasks in the field of computer and network security. We
describe how we envision an adaptation of the conceptual knowledge discovery
in databases (CKDD) machinery to provide support in developing scientifically
grounded techniques for the domain of cyber threat intelligence. In particular,
we are concerned with vulnerability management and cyber threat analysis. We
also motivate the benefits of using ontology engineering methods and tools to
improve the state of the art of security-oriented automated reasoning.
      </p>
      <p>The remainder of this paper is organized as follows: Section 2 points out the
scientific challenges of the research that is being developed in the context of the
AKD project. Section 3 motivates three different research fields in which artificial
intelligence techniques can be used to provide machine-assisted support to the
domain of cyber security. Section 4 describes a cyber threat analysis process
aimed at detecting and recognizing security threats within computer systems
and points out how and where the techniques previously discussed apply. Finally,
Section 5 concludes and summarizes research perspectives.</p>
    </sec>
    <sec id="sec-2">
      <title>Scientific challenges</title>
      <p>
        Vulnerabilities, understood as program flaws or configuration errors, are used
by attackers to bypass the security policies of computer systems. Therefore,
vulnerability management mechanisms constitute an essential component of any
system intended to be protected. Over the last decades, considerable research effort
has been invested and dozens of security tools have been proposed for dealing with security
vulnerabilities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, current security solutions still seem to work within
certain boundaries that prevent them from acting intelligently and flexibly, i.e. they stick
strictly to the available security information in order to analyze, report and
eventually remediate the problems found.
      </p>
      <p>In addition to this inflexibility, remediating vulnerabilities is already a
complex problem. Despite the great advances made in this area, remediation tasks
are reactive by nature and can be hard to perform due to costly activities
and performance degradation issues. They may also generate consistency
conflicts with other system policies. Therefore, our scientific posture in this context
is that instead of detecting vulnerable states and then applying several
corrective actions, it would be better to anticipate and avoid these vulnerable states in
the first place. This objective constitutes a challenging problem. Firstly,
mechanisms for understanding the behavior and dynamics of the system are needed.
Secondly, vulnerabilities are sometimes not known, so techniques for analyzing
the available knowledge and extracting measures that might allow the system to
make decisions are essential.</p>
      <p>The aforementioned security challenge becomes more complex when considered
in dynamic networked scenarios. The accelerated growth of highly heterogeneous
and interconnected computer networks has severely increased the complexity of
network management. This phenomenon has naturally affected network security,
where traditional solutions seem unable to cope with this evolving and
changing landscape. The main problem is that even when current security techniques
enable high levels of automation, they might fail to achieve their purpose
when certain aspects of a managed environment change slightly. We need to
provide systems with mechanisms to understand, reason about, and anticipate the
surrounding environment. In light of this, we firmly believe that an advanced,
flexible, and clever management of security knowledge constitutes one of the
key factors to take security solutions to the next level. Our vision is that,
independently of the nature of an automated solution (automatically assisting an
administrator or automatically making security decisions), the ability to
intelligently manage knowledge is essential.</p>
      <p>In the broad sense of knowledge management, several scientific areas within
the artificial intelligence domain can contribute to achieving our vision. In this
work, we identify domains such as formal concept analysis (FCA), ontological
engineering, information retrieval (IR), case-based reasoning (CBR), and
conceptual knowledge discovery in databases (CKDD) as sound scientific areas
that may support a new level of smart cyber security solutions. Fig. 1 illustrates
our research strategy for the short, medium and long term.</p>
      <p>[Fig. 1: research strategy. (I) Enriching vulnerability management techniques with FCA → (II) Improving security knowledge representation for automated reasoning → (III) Enhancing cyber threat intelligence mechanisms]</p>
      <p>
        In the short term (I), our objective is to understand to what extent FCA can
enrich and advance the state of the art of vulnerability management techniques.
Vulnerability management can usually be seen as the cyclical process of assessing
and remediating vulnerabilities. Anticipation techniques are not considered in
the classical definition, although the concept of foreseeing future vulnerabilities
perfectly fits the vision of flexible and adaptive systems. Therefore, the idea is
to begin by solving basic problems within the sub-area of vulnerability assessment
and to progress towards FCA-based mechanisms for anticipating and remediating
security vulnerabilities. We understand that a clever use of available knowledge
requires a formal and robust underlying machinery that allows systems to
process, reason about, extract, and extrapolate information and knowledge, among other
features. In the medium term (II), we aim at investigating the link between
current security standardization efforts such as the STIX language [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and knowledge
representation methods such as security ontologies. The results of this research
activity may provide robust support for dealing intelligently with security issues.
In the long term (III), the objective is to integrate the results and experience
obtained in (I) and (II) to develop novel approaches to deal with cyber
security threats supported by KDD-based techniques. In the following section, we
explain each of these stages in detail, their impact and importance, and how
we envision their development.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Research roadmap</title>
      <p><bold>3.1 Enriching vulnerability management techniques with FCA</bold></p>
      <p>
        One of the main objectives of our research is the study of vulnerability
anticipation mechanisms from the perspective of FCA. Usually, a vulnerability is
considered as a combination of conditions such that, if they are observed on a target system, the
security problem described by the vulnerability is present on that system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Each condition in turn is understood as the state that should be observed on a
specific object. When the object under analysis exhibits the specified state, the
condition is said to be true on that system. In this context, a vulnerability is a
logical combination of conditions and therefore, identifying known vulnerabilities
implies the evaluation of logical predicates over computer system states. In brief,
we characterize vulnerabilities and system states by the properties they present.
From a technical perspective, the OVAL language [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], maintained by MITRE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
is a standard XML-based security language which permits the treatment and
exchange of this type of vulnerability description in a machine-readable
manner.
      </p>
      <p>Table 1: V1 : c1 ∧ c2; V3 : ¬c2 ∨ c3 ∨ c4; V4 : ¬c3.</p>
      <p>Table 2: Semi-lattice representation of the vulnerability set.</p>
      <p>As an example, let us consider Table 1 depicting four vulnerabilities V =
{V1, V2, V3, V4} as logical formulæ, where ∧, ∨, ¬ represent the logical connectors
AND, OR, NOT respectively, and C = {c1, c2, c3, c4} are four system conditions
(e.g. "port 80 is open", "httpd server is up", "firewall is off", etc.). A system
state s is defined as the set of conditions ci ∈ C such that ci is true on s. Therefore,
the process of vulnerability assessment over a system state s can be defined as
follows:
        <disp-formula><tex-math><![CDATA[f(s) = \begin{cases} \mathit{vulnerable} & \exists V_i \in V \ \text{s.t.}\ V_i(s) = \mathit{true} \\ \mathit{safe} & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
      </p>
      <p>A system state s is considered vulnerable if there exists at least one
vulnerability that evaluates to true when taking the values from the system for the
involved conditions, and safe otherwise. For example, considering s = {c1, c3},
it can be observed that f(s) = vulnerable since V2(s) = V3(s) = true.</p>
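      <p>The assessment function f(s) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vulnerabilities are encoded as plain predicates over a set of true conditions, the names <monospace>VULNERABILITIES</monospace>, <monospace>matching</monospace> and <monospace>assess</monospace> are hypothetical, and only the formulas that are legible in Table 1 (V1, V3, V4) are encoded, so V2 is left out.</p>
      <preformat>
```python
from typing import Callable, Dict, FrozenSet, List

State = FrozenSet[str]  # the set of conditions that are true on the system

# Vulnerabilities as predicates over a state. Only the formulas legible in
# Table 1 are encoded; V2 is omitted because its formula is not available.
VULNERABILITIES: Dict[str, Callable[[State], bool]] = {
    "V1": lambda s: "c1" in s and "c2" in s,                   # c1 AND c2
    "V3": lambda s: "c2" not in s or "c3" in s or "c4" in s,   # NOT c2 OR c3 OR c4
    "V4": lambda s: "c3" not in s,                             # NOT c3
}


def matching(state: State) -> List[str]:
    """Labels of the vulnerabilities that evaluate to true on the state."""
    return sorted(label for label, holds in VULNERABILITIES.items() if holds(state))


def assess(state: State) -> str:
    """f(s): 'vulnerable' if at least one vulnerability holds, 'safe' otherwise."""
    return "vulnerable" if matching(state) else "safe"


s = frozenset({"c1", "c3"})
print(assess(s), matching(s))
```
      </preformat>
      <p>For s = {c1, c3} the sketch reports the state as vulnerable because of V3, in line with the example above; V2 does not appear only because it is not encoded here.</p>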
      <p>
        From the perspective of FCA [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and particularly using the formalization
of Logical Concept Analysis (LCA) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], this can be formalized as follows. Let V
be a set of vulnerability labels associated with formulæ in the logic L, with ∧, ∨, ¬
denoting the logical operators and atoms A containing a set of system conditions
ci ∈ C. A vulnerability label v ∈ V is associated with a formula in L through the
mapping function δ(v) ∈ L.
      </p>
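      <p>Let us define the logical context K = (V, (L, ⊑), δ) with the following
derivation operators for a subset of vulnerabilities A ⊆ V and a formula d ∈ L:
        <disp-formula><tex-math><![CDATA[A^{\square} = \bigvee_{v \in A} \delta(v) \qquad d^{\square} = \{ v \in V \mid \delta(v) \sqsubseteq d \}]]></tex-math></disp-formula>
      </p>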
      <p>For any two vulnerability labels v1, v2 ∈ V, we have that v1 ⊑ v2 ⟺
v1 ∨ v2 = v2, which denotes that v1 is a model of v2. A pair (A, d) is a formal concept if
and only if A<sup>□</sup> = d and d<sup>□</sup> = A. It can be shown that the derivation operators
generate a Galois connection between the power set ℘(V) of vulnerability labels
and the set of formulæ L and thus, a concept lattice can be obtained from the
logical context K. Within our approach, such a concept lattice generates the
search space for vulnerability assessment and correction.</p>
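      <p>To make the two derivation operators concrete, the following Python sketch computes them on the three formulas that are legible in Table 1. It is a brute-force illustration under our own assumptions, not the LCA machinery itself: each formula is represented semantically by the set of truth assignments that satisfy it, so that the order between formulas becomes set inclusion and the disjunction becomes set union. All function names are hypothetical.</p>
      <preformat>
```python
from itertools import chain, combinations

ATOMS = ("c1", "c2", "c3", "c4")


def powerset(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


# Every truth assignment, written as the set of atoms that are true.
STATES = [frozenset(c) for c in powerset(ATOMS)]


def models(predicate):
    """Semantic view of a formula: the set of states in which it holds."""
    return frozenset(s for s in STATES if predicate(s))


# delta maps each vulnerability label to (the models of) its formula.
# Only the formulas legible in Table 1 are used; V2 is left out.
DELTA = {
    "V1": models(lambda s: "c1" in s and "c2" in s),
    "V3": models(lambda s: "c2" not in s or "c3" in s or "c4" in s),
    "V4": models(lambda s: "c3" not in s),
}


def extent_to_formula(labels):
    """A^box: the disjunction of the formulas attached to the labels in A."""
    return frozenset().union(*(DELTA[v] for v in labels))


def formula_to_extent(d):
    """d^box: the labels whose formula entails d, i.e. delta(v) is below d."""
    return frozenset(v for v, m in DELTA.items() if m.issubset(d))


def concepts():
    """Enumerate every pair (A, d) with A^box = d and d^box = A."""
    found = set()
    for labels in powerset(DELTA):
        d = extent_to_formula(labels)
        found.add((formula_to_extent(d), d))
    return found


for extent, _ in sorted(concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent))
```
      </preformat>
      <p>On this toy context, closing the set {V3, V4} also brings in V1, because the disjunction of V3 and V4 holds in every state and is therefore entailed by V1 as well. Enumerating all subsets is exponential and is only meant to show the definitions at work.</p>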
      <p>
        Analogously to the Boolean model of Information Retrieval [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we can
use the concept lattice to classify the system state s and search for exact or
partial answers, i.e. vulnerabilities which affect or may affect the system. For
instance, the semi-lattice illustrated in Table 2 can be used to understand that
if a system is affected by vulnerabilities V2 and V3, then it may also be affected
by vulnerability V1. In particular, the formula labeled by v satisfies a formula
d in some context K if and only if the concept labeled with v is below the
concept labeled with d in the concept lattice of K [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Additionally, using the
classification algorithm inspired by case-based reasoning presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], it is
easy to show that the assessment process becomes a search in the hierarchy
generated by the semi-lattice, i.e. the assessment has sub-linear complexity.
      </p>
      <p>
        Vulnerability remediation, on the other hand, consists in changing the right
properties of a system (ci ∈ C) to bring it into a safe state. This is a combinatorially
explosive problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, we believe that a concept lattice can be
useful to guide the search for corrective actions that do not lead to new
vulnerable states. Furthermore, there might be no solution in some cases, so an
interesting approach would be to approximate safe solutions by weighting the
impact of vulnerabilities using scoring languages such as CVSS (Common
Vulnerability Scoring System) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Lastly, our final goal is to understand to what
extent FCA can contribute to the process of anticipating vulnerabilities, which
basically consists in predicting potential vulnerable states due to changes in the
system. Considering known vulnerabilities, a concept lattice can be used as an
approximation map to avoid unsafe configuration changes. Extrapolation and
pattern detection mechanisms are also worth exploring, though
ontological engineering and data mining techniques might better suit such objectives, as
discussed in the following section.
      </p>
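      <p>The remediation and anticipation ideas can be illustrated with a small exhaustive search. The sketch below is not the lattice-guided method envisioned above: it simply tries every set of at most <monospace>max_changes</monospace> condition flips and keeps the state with the lowest weighted risk. The weights are hypothetical stand-ins for CVSS scores, the function names are ours, and only the formulas legible in Table 1 are used.</p>
      <preformat>
```python
from itertools import combinations

CONDITIONS = ("c1", "c2", "c3", "c4")

# Formulas legible in Table 1 (V2 is not available and is left out).
VULNS = {
    "V1": lambda s: "c1" in s and "c2" in s,
    "V3": lambda s: "c2" not in s or "c3" in s or "c4" in s,
    "V4": lambda s: "c3" not in s,
}

# Hypothetical severity weights standing in for CVSS base scores.
WEIGHTS = {"V1": 9.0, "V3": 4.0, "V4": 6.5}


def risk(state):
    """Sum of the weights of the vulnerabilities that hold on the state."""
    return sum(WEIGHTS[v] for v, holds in VULNS.items() if holds(state))


def remediate(state, max_changes=2):
    """Try every set of at most max_changes condition flips and return
    (risk, flipped conditions, resulting state) with the lowest risk.
    On ties, the first candidate found (fewest flips) is kept."""
    best = (risk(state), (), state)
    for k in range(1, max_changes + 1):
        for flips in combinations(CONDITIONS, k):
            candidate = state.symmetric_difference(flips)
            r = risk(candidate)
            if best[0] > r:
                best = (r, flips, candidate)
    return best


def newly_exposed(state, flips):
    """Anticipation: vulnerabilities that hold after a change but not before."""
    after = state.symmetric_difference(flips)
    return sorted(v for v, holds in VULNS.items()
                  if holds(after) and not holds(state))


start = frozenset({"c1", "c2"})
print(remediate(start))
print(newly_exposed(start, ("c4",)))
```
      </preformat>
      <p>With these three formulas no state is completely safe, since V3 holds whenever c3 is true and V4 holds whenever it is not. The search therefore returns the least risky reachable state rather than a safe one, which is the kind of approximation that weighting by severity is meant to support.</p>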
      <p><bold>3.2 Improving security knowledge representation for automated reasoning</bold></p>
      <p>Several vocabularies have been proposed in the context of cyber security. Some of
the most important ones are: Structured Threat Information eXpression (STIX),
Common Attack Pattern Enumeration and Classification (CAPEC), Common
Vulnerabilities and Exposures (CVE), Cyber Observable eXpression (CybOX),
Malware Attribute Enumeration and Characterization (MAEC) and Common
Weakness Enumeration (CWE) [24]. Most of these vocabularies were defined by
particular organizations, such as MITRE and NIST, to facilitate the exchange of
information regarding vulnerabilities, security issues and attack descriptions.</p>
      <p>The benefits of introducing vocabularies are many and well known. They
establish a common language that can be used by different organizations to
describe the same concepts, and they provide a framework for documentation allowing
the structured and systematized creation of a body of knowledge. Vocabularies
have proven to be relevant not only for humans, but also for autonomous agents in
several applications. At the syntactic level, they enable different systems to
communicate in a common, pre-defined, structured manner. At the semantic level,
vocabularies have played a major role in the last decade, allowing autonomous
agents to reason about the information within a dataset. For example, let us
consider a security analyst looking through different databases for malware
that could affect a given system. Malware is a very generic term used to
identify a piece of software specially designed to violate the security integrity of a
computer system. Thus, the search task can be very difficult given that there
are several types of malware, namely trojan horses, spyware, backdoors and worms,
among others. A vocabulary could easily integrate these descriptions by
stating that trojan horses, spyware, backdoors and worms are types of
malware. An autonomous agent can profit from the vocabulary by automatically
inferring that an object catalogued as a "trojan horse" is relevant for the search
of "malware".</p>
      <p>In the semantic web, vocabularies are usually supported by ontologies, a
meta-model to provide a structured description of the concepts in a given
domain [21]. Ontologies can provide different levels of description, namely at the
entity level, at the relational level and at the instance level. The entity level
describes the concepts that compose a given domain (Malware, Trojan Horse,
Spyware) and their attributes (Malware has name, Trojan Horse has target os,
etc.). The relational level describes relations among concepts (Trojan is a type of
Malware, Trojan Horse has target operating system Windows, etc.) and their
attributes ("is a type of" is a non-symmetric, transitive relation). Finally, the
instance level describes the relations between instances, their types (trojan1 is a
Trojan), and their attributes (trojan1 has name "Zeus"). Furthermore,
ontologies support logical inference through their underlying formalism,
description logics, which correspond to fragments of first-order logic.</p>
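      <p>The malware search example can be mimicked with a toy subsumption hierarchy. The following Python sketch is only meant to show how a transitive "is a type of" relation lets an agent retrieve a trojan horse when searching for malware. The dictionaries and function names are hypothetical, the "Software" level is added purely to show transitivity, and a real ontology would be written in a description logic language and queried with a reasoner.</p>
      <preformat>
```python
# Relational level: "is a type of" links between concepts (child to parent).
IS_A = {
    "Trojan Horse": "Malware",
    "Spyware": "Malware",
    "Backdoor": "Malware",
    "Worm": "Malware",
    "Malware": "Software",  # hypothetical extra level to show transitivity
}

# Instance level: each instance has a type and some attributes.
INSTANCES = {
    "trojan1": {"type": "Trojan Horse", "name": "Zeus"},
}


def ancestors(concept):
    """Follow the is-a links upwards; transitivity yields every broader concept."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


def is_relevant(instance, query_concept):
    """An instance matches if its type is the query concept or one of its subtypes."""
    kind = INSTANCES[instance]["type"]
    return kind == query_concept or query_concept in ancestors(kind)


def search(query_concept):
    """Names of all instances that are relevant for the queried concept."""
    return sorted(i for i in INSTANCES if is_relevant(i, query_concept))


print(search("Malware"))
print(search("Worm"))
```
      </preformat>
      <p>Searching for "Malware" returns trojan1 although the instance is only catalogued as a "Trojan Horse", while searching for "Worm" returns nothing.</p>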
      <p>
        Several research communities have undertaken the task of formalizing their
domain knowledge with vocabularies, and many of them have moved forward
towards describing their vocabularies through ontology definitions. For example,
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] an ontology learning approach is proposed for the astronomical domain.
In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] the authors propose an ontology to document software architecture
decisions, providing an automated annotation process over software design
documents. In [22], the authors propose a knowledge discovery process to build and
populate an ontology for the cultural heritage domain using a relational database
schema. Extensive reviews on ontology learning and construction using formal
concept analysis can be found in [18, 20, 23].
      </p>
      <p>
        As mentioned before, the domain of cyber security has already acknowledged
the benefits of defining common vocabularies. Furthermore, initial steps have
been taken towards building a comprehensive ontology definition which
integrates the different vocabularies within the domain. In [24], the authors describe
the process through which they manually crafted a domain ontology with the
goal of supporting security analysts in the task of detecting cyber threats. This
work is indeed a big step forward; however, we are confident that the use of
state-of-the-art ontology learning techniques, particularly formal concept analysis, can
greatly improve the quality of an ontology for cyber security. For instance,
techniques like ontology alignment [23] can overcome overlapping issues in current
vocabularies for cyber security, a fact that is overlooked in [24]. The great
potential for automatically building description logic knowledge bases using FCA [
        <xref ref-type="bibr" rid="ref8">8,
20</xref>
        ] would make it possible to further extend the support provided to security analysts in a
more dynamic environment, which is a major drawback of manual approaches to
ontology building. Finally, the definition of a domain ontology for cyber security is
a necessary condition to support more advanced data mining techniques. In our
project, this represents a milestone that would enable us to provide security
analysts with advanced features for threat detection such as integrated search from
multiple repositories [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], partial matching based on case-based reasoning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or
document annotation [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p><bold>3.3 Enhancing cyber threat intelligence mechanisms</bold></p>
      <p>The traditional approaches to cyber security, which have mainly focused on
understanding and addressing vulnerabilities in computer systems, are still
necessary but no longer sufficient. Effective defense against current and
future threats requires a deep understanding of the behavior, capability and intent
of the adversary. Threat environments have evolved from widespread disruptive
activity to more targeted, lower-profile multi-stage attacks aiming at achieving
specific tactical objectives and establishing a persistent foothold in the
threatened organization. This is what is called an Advanced Persistent Threat (APT).
The nature of APTs calls for more proactive defense strategies in contrast to
the traditional reactive cyber security approach. To be proactive, defenders need
to move beyond traditional incident response methodologies and techniques. It is
necessary to stop the adversary before he can exploit the security weaknesses of
the system. In the cyber domain, cyber intelligence is the understanding of the
adversary's capabilities, actions and intent. According to [19]: "Cyber intelligence
seeks to understand and characterize things like: what sort of attack actions have
occurred and are likely to occur; how can these actions be detected and recognized;
how can they be mitigated; who are the relevant threat actors; what are they
trying to achieve; what are their capabilities, in the form of tactics, techniques and
procedures (TTP) they have leveraged over time and are likely to leverage in the
future; what sort of vulnerabilities, misconfigurations or weaknesses they are likely to
target; what actions have they taken in the past; etc."</p>
      <p>One important objective of our research is to develop techniques and tools for
providing assistance to accomplish different cyber threat intelligence procedures.
In particular, we are focused on processes aiming at leveraging capacities for
threat environment identification (type of attack, from where, how) and early
detection of vulnerability exploitation attempts. We also aim at the generation
and enrichment of (semantically structured) knowledge repositories, preferably
in a way that is decoupled from the specifics of a particular technology for
conducting threat analysis and correlation.</p>
      <p>For a threat analysis tool to be useful in practice, two features are crucial:
i) the model used in the analysis must be able to automatically integrate formal
vulnerability specifications from the bug-reporting community and formal attack
scenarios from the cyber security community; ii) the analysis should be able
to scale to complex networks involving numerous machines
and devices. As a more ambitious goal, we aim at developing a prototype of an
engine, in the spirit of MulVAL [17], able to consume low-level alerts (e.g. taken
from OVAL scanning activities) and produce high-level attack predictions based
on the scenario under analysis.</p>
    </sec>
    <sec id="sec-4">
      <title>A machine-assisted approach for cyber threat analysis</title>
      <p>
        In this section we put forward a cyber threat analysis process aimed at detecting
and/or recognizing (potential) security attacks. We explain the most relevant
procedures involved in the analysis and point out how and where automated
support can be provided using the techniques discussed in sections 3.1, 3.2 and
3.3. The cyber threat analysis process, depicted in Fig. 2, embodies procedures
that give support to the key phases of the search of compromise: derivation of
threat indicators, collection of evidence, evaluation of the results and decision.
In what follows we explain the process in further detail.
1. The process begins at step 1 with a security analyst providing information
about some identi ed threat or anomaly, and characteristics of the target
system. This information constitutes the initial seed for the cyber threat
analysis, and might specify for instance, a compromise involving a suspicious
le found on a Linux system. The involved information shall be represented
using the STIX language, in particular using the notion of indicator of
compromise. One such indicator allows to specify the di erent types of objects
that can be found on a computing system/network such as ports, processes,
threads, les, etc. Additionally, an indicator may capture metadata for the
involved objects as well as logical relations between them thus providing
further information to security analysts.
2. Once the seed has been provided, a search of compromise is performed at
step 2. To that end, the threat nder component queries a database
containing machine-readable descriptions of known threats speci ed in a formal
language such as STIX. Only those cyber threats which are found to be
related with the provided information are considered for subsequent analysis.
3. The retrieved threat descriptions are then used at step 3 by the evidence
collector component to gather all the relevant information from the target
system in order to decide whether the latter is compromised by at least one
of the identi ed related cyber threats. The process of information gathering
involves, for instance, collecting the list of open ports or running processes
in the system. Standard languages such as OVAL provide great support for
evidence speci cation and automated collection procedures [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
4. The collected evidence is then evaluated by the threat analyzer component
at step 4 in order to determine the level of compromise of the system. A
target system may be considered compromised by a speci c cyber threat if
it presents a combination of objects (threat indicator ) which are commonly
found on infected systems. The threat analyzer decides whether the collected
evidence is su cient enough to indicate that the target system has been
compromised or conversely whether more knowledge is needed to diagnose
its status. In the rst case, the process moves to step 6 where the
information about the detected cyber threats is provided to the security analyst.
Otherwise, the process continues at step 5 where a semantic machinery is
used to derive new indicators that may lead to cyber threats not previously
evaluated.
5. If none of the spotted cyber threats is found on the
system, a derivation process is triggered at step 5 in order to select new cyber
threats that were not analyzed before. This new selection is performed by
deriving threats related to the relevant evidence found on the system while
gathering information in the previous stage. Derivation mechanisms may
vary according to the available information and context, and they
constitute a key objective of this research work. The FCA-based technique
described in Section 3.1 may provide a map for finding vulnerable
configurations close to the current system state. Additionally, two sub-components
may semantically guide the search for new related threats. As discussed in
Section 3.2, a security ontology may relax strict descriptions, making context
awareness procedures more flexible, i.e., security information that is not
explicitly encoded a priori can be derived by considering semantic associations.
Data mining techniques, on the other hand, may provide the ability to
extrapolate information and extract security patterns, thus further increasing
detection capabilities. The process of derivation (step 5), threat
identification (step 2), collection (step 3) and analysis (step 4) shall be repeated until
a conclusion or a stop condition is reached.
6. The outcome of a finished search process may be either that the system
appears to be compromised or that not enough evidence has been found to
determine its compromise status. In either case, at step 6 the process reports
the tested cyber threats as well as the evidence found on the system
in order to assist the security analyst in proceeding with the analysis.
      </p>
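      <p>The loop formed by steps 2 to 6 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the tool mentioned in this paper: the names Threat, collect_evidence, is_compromised, derive_threats and analyze, the example indicators and the threshold parameter are all hypothetical. In particular, the derivation step simply selects untested threats that share at least one observed indicator with the evidence already found; this stands in for the FCA-based, ontology-based or data mining mechanisms, which are left open here.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    """Hypothetical threat description: a name plus its indicator objects."""
    name: str
    indicators: frozenset


def collect_evidence(system_objects, threat):
    """Step 3 (stub): keep the threat's indicators that are present on the system."""
    return threat.indicators.intersection(system_objects)


def is_compromised(evidence, threat, threshold=1.0):
    """Step 4: enough of the threat's indicators were observed (assumed criterion)."""
    return len(evidence) >= threshold * len(threat.indicators)


def derive_threats(catalog, relevant_evidence, already_tested):
    """Step 5 (assumed mechanism): untested threats sharing an observed indicator."""
    return [t for t in catalog
            if t.name not in already_tested
            and not t.indicators.isdisjoint(relevant_evidence)]


def analyze(system_objects, catalog, initial, max_rounds=5):
    """Repeat steps 2-5 until a threat is detected or a stop condition is reached."""
    tested, evidence_found, detected = set(), set(), []
    candidates = list(initial)
    for _ in range(max_rounds):        # stop condition: bounded number of rounds
        if not candidates:
            break                      # stop condition: nothing left to derive
        for threat in candidates:
            tested.add(threat.name)
            evidence = collect_evidence(system_objects, threat)
            evidence_found.update(evidence)
            if is_compromised(evidence, threat):
                detected.append(threat.name)
        if detected:
            break
        candidates = derive_threats(catalog, evidence_found, tested)
    # Step 6: report tested threats and evidence in either outcome.
    return {"detected": detected,
            "tested": sorted(tested),
            "evidence": sorted(evidence_found)}


# Hypothetical catalog and system state, for illustration only.
catalog = [
    Threat("A", frozenset({"reg:run_key_x", "file:a.dll", "port:4444"})),
    Threat("B", frozenset({"file:a.dll", "proc:svch0st.exe"})),
    Threat("C", frozenset({"file:c.sys"})),
]
system_objects = {"file:a.dll", "proc:svch0st.exe", "file:notes.txt"}

report = analyze(system_objects, catalog, initial=[catalog[0]])
print(report)
```
      </preformat>
      <p>In this example, threat A is tested first and only partially matches; the shared indicator then leads the derivation step to threat B, which matches completely, while threat C is never tested because it shares no observed indicator. Lowering the hypothetical threshold parameter would make the analyzer accept partial matches.</p>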
      <p>Open discussion. The selection of information and techniques for inferring
and discovering new knowledge might be assisted by a human being, the security
analyst in this case, thus following a methodology closer to CKDD. However,
interesting research questions arise from this scenario. One of them is to what
extent we can automate the whole process and let a security solution make
decisions for us. Going one step further, we pose the question of autonomic
solutions, where self-adaptive and self-governed approaches come into play. Our
vision is that, to achieve any of these objectives, clever knowledge
management is essential. In that context, we believe that FCA and CKDD may
contribute substantially to accomplishing this goal.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and perspectives</title>
      <p>In this paper we have motivated and explained how different artificial intelligence
techniques, in particular FCA and CKDD, can be used to enhance the state of
the art of machine-assisted cyber security analysis. In addition to the objectives
depicted in our research roadmap, we also target the construction of an
experimental testbed for emulating hostile and unsafe environments. This can provide
the ability to deploy implementation prototypes and anticipation solutions in
order to evaluate the feasibility, scalability and accuracy of our approach. We
have already experimented with a preliminary version of a tool that provides
mechanical support for conducting the cyber threat analysis process described
in Section 4. We are convinced that extending the tool with mechanisms
that make use of conceptual knowledge discovery techniques will greatly improve
the accuracy and efficiency of the process.
</p>
      <p>17. X. Ou, S. Govindavajhala, and A. W. Appel. MulVAL: A logic-based network
security analyzer. In Proceedings of the 14th Conference on USENIX Security
Symposium - Volume 14, SSYM'05, pages 8–8, Berkeley, CA, USA, 2005. USENIX
Association.</p>
      <p>18. J. Poelmans, D. I. Ignatov, S. O. Kuznetsov, and G. Dedene. Formal concept
analysis in knowledge processing: A survey on applications. Expert Syst. Appl.,
40(16):6538–6560, 2013.</p>
      <p>19. S. Barnum. Standardizing Cyber Threat Intelligence Information with the
Structured Threat Information eXpression (STIX). Technical report, The MITRE
Corporation, 2013.</p>
      <p>20. B. Sertkaya. A survey on how description logic ontologies benefit from formal
concept analysis. CoRR, abs/1107.2822, 2011.</p>
      <p>21. S. Staab and R. Studer. Handbook on Ontologies. Springer Publishing Company,
Incorporated, 2nd edition, 2009.</p>
      <p>22. R. Stanley, H. Astudillo, V. Codocedo, and A. Napoli. A conceptual-KDD approach
and its application to cultural heritage. In Concept Lattices and their Applications,
pages 163–174, 2013.</p>
      <p>23. G. Stumme. Formal concept analysis. In Handbook on Ontologies, pages 177–199.
2009.</p>
      <p>24. B. E. Ulicny, J. J. Moskal, M. M. Kokar, K. Abe, and J. K. Smith. Inference and
Ontologies. In Cyber Defense and Situational Awareness, Advances in Information
Security. 2014.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <collab>MITRE Corporation</collab>
          . http://www.mitre.org/.
          <source>Last visited on May 17</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>OVAL Language</article-title>
          . http://oval.mitre.org/.
          <source>Last visited on May 17</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Structured Threat Information eXpression</article-title>
          . http://stix.mitre.org/.
          <source>Last visited on May 17</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Badonnel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Festor</surname>
          </string-name>
          .
          <article-title>A SAT-based Autonomous Strategy for Security Vulnerability Management</article-title>
          .
          <source>In Proceedings of the IEEE/IFIP Network Operations and Management Symposium (NOMS'14)</source>
          , May
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Badonnel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Festor</surname>
          </string-name>
          .
          <article-title>Vulnerability Assessment in Autonomic Networks and Services: A Survey</article-title>
          .
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          ,
          <volume>16</volume>
          (
          <issue>2</issue>
          ):
          <fpage>988</fpage>
          –
          <lpage>1004</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Betarte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          .
          <article-title>Towards Machine-assisted Formal Procedures for the Collection of Digital Evidence</article-title>
          .
          <source>In Proceedings of the 9th Annual International Conference on Privacy, Security and Trust (PST'11)</source>
          , pages
          <fpage>32</fpage>
          –
          <lpage>35</lpage>
          , July
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrere</surname>
          </string-name>
          et al.
          <article-title>Autonomic Knowledge Discovery for Security Vulnerability Prevention in Self-governing Systems</article-title>
          . http://www.sticamsud.org/.
          <source>Last visited on May 17</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bendaoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Toussaint</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          .
          <article-title>Pactole: A methodology and a system for semi-automatically enriching an ontology from a collection of texts</article-title>
          .
          <source>In Proceedings of the 16th International Conference on Conceptual Structures: Knowledge Visualization and Reasoning</source>
          , pages
          <fpage>203</fpage>
          –
          <lpage>216</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>V.</given-names>
            <surname>Codocedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lykourentzou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          .
          <article-title>A semantic approach to concept lattice-based information retrieval</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>27</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>CVSS, Common Vulnerability Scoring System</article-title>
          . http://www.first.org/cvss/.
          <source>Last visited on April 12</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferre</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. D.</given-names>
            <surname>King</surname>
          </string-name>
          .
          <article-title>A dichotomic search algorithm for mining and learning in domain-specific logics</article-title>
          .
          <source>Fundam. Inform.</source>
          ,
          <volume>66</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>32</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferre</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Ridoux</surname>
          </string-name>
          .
          <article-title>A Logical Generalization of Formal Concept Analysis</article-title>
          . In B. Ganter and G. W. Mineau, editors,
          <source>ICCS</source>
          , volume
          <volume>1867</volume>
          of
          <series>LNCS</series>
          , pages
          <fpage>357</fpage>
          –
          <lpage>370</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer, Dec.
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Codocedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Astudillo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Cysneiros</surname>
          </string-name>
          .
          <article-title>Bridging the gap between software architecture rationale formalisms and actual architecture documents: An ontology-driven approach</article-title>
          .
          <source>Sci. Comput. Program.</source>
          ,
          <volume>77</volume>
          (
          <issue>1</issue>
          ):
          <fpage>66</fpage>
          –
          <lpage>80</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <source>Introduction to Information Retrieval</source>
          . July
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>N.</given-names>
            <surname>Messai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-D.</given-names>
            <surname>Devignes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Smaïl-Tabbone</surname>
          </string-name>
          .
          <article-title>BR-Explorer: A sound and complete FCA-based retrieval algorithm (Poster)</article-title>
          .
          <source>In ICFCA, Dresden/Germany</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>