Securing Intelligent Autonomous Systems Through Artificial
Intelligence
Ganapathy Mani (a), Bharat Bhargava (a), Jason Kobes (b), Justin King (b), James MacDonald (b)
(a) Purdue University, West Lafayette, Indiana, USA
(b) Northrop Grumman Corporation, McLean, Virginia, USA

                  Abstract
                  Intelligent Autonomous Systems (IAS) reconstruct their perception through adaptive learning
                  and meet mission objectives. IAS are highly cognitive, rich in knowledge discovery, reflective
                  through rapid adaptation, and provide security assurance. It is paramount to have effective
                  reasoning, decision-making, and understanding of operational context since IAS are exposed
                  to advanced multi-stage attacks during training and inference time. Advanced malware types
                  such as file-less malware with a benign initial execution phase can mislead IAS into accepting
                  them as normal processes and then execute malicious code later. IAS are also exposed to adaptive
                  poisoning attacks, where an adversary injects malicious data into the training/testing set to
                  manipulate the learning. Hence it is vital to monitor IAS activities and interactions to conduct
                  forensics. This project will advance the science of security in IAS through multifaceted advanced
                  analytics, cognitive and adversarial machine learning, and cyber attribution based on the
                  following approaches.
                      (a) Implement deep learning-based application profiling to categorize adaptive cyber-
                           attacks and poison attacks on machine learning models using contextual information
                           about the origin, trust, and transformation of data.
                      (b) Use HW/OS/SW data to develop perception algorithms based on LSTM deep neural
                           networks for detecting malware/anomalies and classifying dynamic attack contexts.
                      (c) Facilitate cyber attribution for forensics through a privacy-preserving provenance
                           structure for knowledge representation and perform intrusion detection sampling on
                           HW/OS/SW data.
                      (d) Employ advanced data analytics to aid ontological and semantic reasoning models to
                           enhance decision-making, attack adaptiveness, and self-healing.

                  Keywords
                  autonomy, machine learning, deep learning, cybersecurity, LSTM


International Semantic Intelligence Conference (ISIC 2021), Feb
25-27, 2021, New Delhi, India
EMAIL: bbshail@purdue.edu (A. 2);
            ©️ 2020 Copyright for this paper by its authors. Use permitted under Creative
            Commons License Attribution 4.0 International (CC BY 4.0).

            CEUR Workshop Proceedings (CEUR-WS.org)


1. Solution Overview

    Our focus is on constraints, barriers, and challenges such as poorly understood attack
surfaces, training data set availability and biases, processing latency, human understanding of
AI results, AI/ML countermeasures, human-machine disparity, and the measurement of effects. We
propose novel approaches for privacy-preserving cyber attribution, intrusion detection,
adversarial machine learning, malware/anomaly detection, reasoning, and decision-making. Cyber
attribution involves extracting software, hardware, and operating system data to perform
intrusion detection sampling (fixed or dynamic sampling), generating an efficient provenance
structure that is populated with the specific data required for a particular analysis or
learning task, and labeling and tagging to properly represent the information obtained. The
processed data is distributed to the cognitive module, where a poison attack filter checks it
for the presence of malicious data. The filtered data is transmitted to the cognitive computing
module and the knowledge discovery module, where it is fed into supervised, unsupervised, and
LSTM models to perform learning and advanced analytics. Based on these multifaceted dimensions
of data analytics, the reasoning and decision-making abilities of IAS are enhanced. The overall
architecture of the proposed model, secure intelligent autonomous systems with cyber
attribution, is shown in Figure 1.

Figure 1: Comprehensive Architecture of Secure Intelligent Autonomous Systems with Cyber

General characteristics of the proposed unified architecture are given as follows:

•   Intelligent autonomous systems receive large amounts of diverse data from various data
    sources. In addition, they operate in a dynamic operational context and interact with
    numerous entities such as other IAS, UAVs, satellites, sensors, cloud systems, analysts,
    malicious actors, and compromised systems.
•   The cyber attribution module constitutes a stream data processor in which data streams are
    labeled and tagged on the fly for better knowledge representation and categorization. This
    data is stored as monitored or provenance data together with its origin and historical
    information. To preserve privacy, detailed provenance data is reduced in scope to include
    only the data necessary for a particular analysis or learning task. This module uses the
    Provenance Ontology (PROV-O) structure (elaborated in a later section) to obscure
    unnecessary or privacy-compromising data. Furthermore, the attribution model monitors data
    generated by software (application parameters), hardware (memory bytes and instructions),
    and the operating system (system calls). This data is sampled periodically to identify
    signatures of intrusion activities.
•   Once the data is processed, it passes through the adversarial machine learning model.
    Attackers can insert malicious data into the training and testing datasets to influence
    machine learning models. To isolate poisonous data, the poison data filter applies methods
    such as classification of verified and unverified data as well as outlier extraction. Once
    the poisonous data is removed, the data (raw or provenance data) is sent to the cognitive
    computing module and the knowledge discovery module.
•   In the cognitive computing module, depending on the data and the efficiency of the machine
    learning methods, malware/anomaly detection is performed through either deep learning
    methodologies such as Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) or
    Convolutional Neural Networks (CNN), or light-weight yet powerful machine learning methods
    such as Support Vector Machines (SVM), Random Forests (RF), and K-Nearest Neighbors (KNN).
    In addition, the cognitive computing module consists of a reasoning
    engine, which is driven by rule sets and by semantic and ontological reasoning. Both the
    anomaly detection module and the reasoning engine module influence the attack adaptiveness
    (reflexivity) and self-healing of IAS, where decisions obtained through reasoning and
    learning are turned into actions. With these extensive cognitive computing modules, the
    final response from IAS to other interacting entities will be a secure and trusted one.
•   The knowledge discovery module facilitates multi-faceted dimensions of advanced data
    analytics including regression analysis, supervised learning, unsupervised learning, and
    pattern recognition. Discovered knowledge is shared with the cognitive computing module for
    further learning. The proposed structure provides robust cyber resilience and autonomous
    operation of the system.

2. Background on Cognitive Autonomy

   Cognitive computing is a vital part of security in autonomous systems. In particular,
malware and anomaly detection has become one of the biggest challenges with the increasing
sophistication of attacks such as file-less malware [1] and ransomware [2]. A behavior-based
malware detection system (pBMDS) was proposed in [3]. The technique observes unique behaviors
of applications as well as users and leverages a Hidden Markov Model (HMM) to learn application
and user behaviors based on two features: process state transitions and user operational
patterns. One of the drawbacks of the HMM is that it has very limited memory and is therefore
poorly suited to sequential data with long-range dependencies. In this project, we leverage
hardware, software, and operating system data and apply long short-term memory units to
identify anomalous behavior. We will also profile applications and malware using HW data
(memory bytes and instruction sequences) to whitelist benign processes and blacklist malicious
processes. To enable better results for LSTM deep learning methodologies, knowledge discovery
and representation are important. We proposed a metadata labeling scheme, BFC, for information
tagging and clustering by reversing the error correction coding technique known as Golay coding
[4][8]. The scheme utilizes 2^23 binary vectors of size 23 bits to profile features and cluster
the data items. Since the method is built on an error correction scheme, it exhibits fault
tolerance to wrongly labeled data. Similarly, we perform privacy-preserving knowledge discovery
through perturbed aggregation in an untrusted cloud [5]. In this project, we will use advanced
data analytics to enable the reasoning module to assist attack adaptation and reflexivity of
the system.

3. Cognitive Autonomy for Cybersecurity in Autonomous Systems

Decentralized machine learning is a promising emerging paradigm in view of global challenges of
data ownership and privacy. We consider learning of linear classification and regression
models, in the setting where the training data is decentralized over many user devices, and the
learning algorithm must run on device, on an arbitrary communication network, without a central
coordinator. We plan to utilize and advance COLA, a new decentralized training algorithm [23]
with strong theoretical guarantees and superior practical performance. This framework overcomes
many limitations of existing methods, and achieves communication efficiency, scalability,
elasticity as well as resilience to changes in data and participating devices. We will consider
fault tolerance to dropped nodes, to nodes oscillating between connected and disconnected
states, and to attacks on the nodes. The learning framework has to be decentralized,
communication-efficient, and free of parameter tuning. COLA offers full adaptivity to
heterogeneous distributed systems on arbitrary network topologies, is adaptive to changes in
network size and data, and offers fault tolerance and elasticity. IAS should have a clear
understanding of its operational context, its own processes, and its interactions with
neighboring entities. In this project, the cognitive computing module consists of three major
components: (1) Malware/anomaly detection module, (2) Reasoning engine, and (3) Reflexivity
engine. Cyber attribution data (system monitoring data or provenance data) is sent to the
cognitive computing engine for analysis, where the system profiles the applications based on
machine learning models. In this paper, we will focus on the cognitive autonomy property of the
autonomous systems.

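To make the decentralized setting in Section 3 concrete, the following is a minimal sketch of coordinator-free training of a linear regression model. It is not COLA: it swaps in plain decentralized gradient descent with neighbor averaging, a simpler and well-known scheme that, unlike COLA, still needs a hand-chosen step size. The function names, the four-node ring topology, the step size, and the noiseless toy data are all assumptions made for illustration.

```python
def local_gradient(w, data):
    """Gradient of the mean squared error of y ~ w[0] * x + w[1] on one node's data."""
    grad = [0.0, 0.0]
    for x, y in data:
        err = w[0] * x + w[1] - y
        grad[0] += 2.0 * err * x / len(data)
        grad[1] += 2.0 * err / len(data)
    return grad


def decentralized_gd(node_data, neighbors, steps=500, lr=0.05):
    """Alternate neighbor averaging and local gradient steps; returns one model per node."""
    models = [[0.0, 0.0] for _ in node_data]
    for _ in range(steps):
        updated = []
        for i, data in enumerate(node_data):
            # Combine: average this node's model with its direct neighbors only.
            group = [models[i]] + [models[j] for j in neighbors[i]]
            mixed = [sum(m[k] for m in group) / len(group) for k in range(2)]
            # Adapt: one gradient step on data that never leaves this node.
            grad = local_gradient(mixed, data)
            updated.append([mixed[k] - lr * grad[k] for k in range(2)])
        models = updated
    return models


# Hypothetical setup: four nodes on a ring, each holding five noiseless
# samples of y = 2x + 1 (illustrative data, not from the paper).
node_data = []
for i in range(4):
    xs = [-1.0 + 0.1 * (4 * j + i) for j in range(5)]
    node_data.append([(x, 2.0 * x + 1.0) for x in xs])
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

models = decentralized_gd(node_data, neighbors)
print(models[0])  # each node's [slope, intercept] should approach [2.0, 1.0]
```

No node ever sees another node's raw samples; only model parameters cross the network, and only between direct neighbors. Handling dropped or oscillating nodes, as discussed above, would require changing the `neighbors` map between iterations, which this sketch does not attempt.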
4. Malware and Anomalous Application Behavior Profiling with Deep Learning Model

Figure 2: Recurrent Neural Network (RNN) model for application behavior profiling

We use instruction sequences executed in memory by an application to understand the behavior of
each application.
Input: n-gram sequences of instructions from memory
Output: Binary classification of benign or malicious
•   Step 1: Define a finite set I of instructions {i1, i2, ..., in} in the system. Instructions
    are executed based on time epochs, i.e., they form time-series data.
•   Step 2: Given an observed sequence of {i1, i2, ..., in}, we find the set N of the top P
    sequences to be executed at time t. The size of the set N varies in each prediction and is
    determined by the n-grams input as well as the clusters in the output of the model.
•   Step 3: At time t, the sequence {i1, i2, ..., in} is benign if it is among the top P
    sequences in N; otherwise it is malicious.

     Algorithm 1: Application Behavioral Profiling Algorithm

5. Malware and Anomaly Detection with Light-weight Machine Learning Models

Figure 3: Malware/anomaly Detection with Light-weight Machine Learning Methods

Advanced malware such as ransomware encrypts IAS data without authorization. Since it does not
alter system configurations or leave a footprint, it is difficult to detect. However,
applications can be profiled based on the executed instruction sequences and the constants
(also known as magic constants) used by the encryption mechanism during malware execution.
First, we will sample the address spots for every 1,000,000 instructions (fixed sampling).
After a fixed period of time, we will calculate the frequently occurring addresses and their
relevant process IDs. A threshold T will be set for data extraction. For example, extract
memory bytes and instructions from the top T = 10% of the global list of sampled addresses
(sorted in descending order based on their frequency of occurrence). Once the opcode and memory
bytes data is collected, we will extract features such as n-gram, bigram, and unigram features,
the magic constants feature, cosine similarity with instruction occurrences, and standard
deviation. The cosine similarity metric is one of the most efficient methods to learn from
large datasets [20]. It plays a crucial role in understanding the similarity between two
feature vectors when the magnitude of the vectors is large or unspecified, i.e., the features
can be unigram, bigram, or n-gram features. Given two feature vectors V1 = {f11, f12, ...} and
V2 = {f21, f22, ...}, where f11, f21, ... are values of a particular feature, the cosine
similarity is given as

\[ \cos\theta = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}
             = \frac{\sum_k f_{1k} f_{2k}}{\sqrt{\sum_k f_{1k}^2} \, \sqrt{\sum_k f_{2k}^2}} \]

The cosine similarity lies between 0 and 1. If the orientation of the two feature vectors is
the same, then the similarity between them is cos 0° = 1, i.e., there is zero angle between
them.

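The sampling threshold, n-gram features, and cosine similarity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `top_addresses`, `ngram_counts`, and `cosine_similarity` are hypothetical, the opcode mnemonics are made up, and feature vectors are assumed to be sparse n-gram count vectors.

```python
import math
from collections import Counter


def top_addresses(sampled_addresses, fraction=0.10):
    """Return the most frequently sampled addresses (top `fraction` of the distinct ones)."""
    counts = Counter(sampled_addresses)
    keep = max(1, math.ceil(fraction * len(counts)))
    return [addr for addr, _ in counts.most_common(keep)]


def ngram_counts(opcodes, n):
    """Count the n-grams (as tuples) in a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty feature vector
    return dot / (norm_a * norm_b)


# Illustrative opcode traces (assumed data, not taken from a real sample).
trace = ["mov", "xor", "add", "mov", "xor"]
other = ["push", "pop", "ret"]

# Repeating a trace doubles every unigram count but keeps the orientation.
print(cosine_similarity(ngram_counts(trace, 1), ngram_counts(trace * 2, 1)))  # 1.0
# Traces with no opcode in common are orthogonal.
print(cosine_similarity(ngram_counts(trace, 1), ngram_counts(other, 1)))  # 0.0
```

The first call illustrates why the metric suits traces of different lengths: only the relative frequencies of the n-grams matter, not how many instructions were sampled.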

When the angle between the two feature vectors is 90°, the similarity is cos 90° = 0. The
similarity score therefore varies within [0, 1]. Once the features are extracted, we will
implement RF, SVM, and KNN learning models. KNN is one of the simplest yet most powerful
classifiers, with high computational efficiency as well as accuracy [6].

6. Conclusion

   We presented two approaches for detecting evasive malware applications through profiling. We
use both light-weight machine learning models and deep learning models to profile and
understand the behavior of autonomous systems. This multi-model approach is advantageous when
it comes to computational resources in mission-critical systems. Based on the data and sample
size, an appropriate model can be selected for analysis. In particular, light-weight machine
learning models use fewer computational resources and have considerably lower time complexity.
On the other hand, the LSTM model can provide robust classification with fundamental data,
which enables IAS to understand evasive malware at a basic level.

7. Acknowledgements

  This research is funded by Northrop Grumman Corporation.

8. References

[1] Hopkins, Michael, and Ali Dehghantanha. "Exploit Kits: The production line of the
    Cybercrime economy?" In Information Security and Cyber Forensics (InfoSec), 2015 Second
    International Conference on, pp. 23-27. IEEE, 2015.
[2] Kharraz, Amin, William Robertson, Davide Balzarotti, Leyla Bilge, and Engin Kirda. "Cutting
    the gordian knot: A look under the hood of ransomware attacks." In International Conference
    on Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 3-24. Springer,
    Cham, 2015.
[3] Xie, Liang, Xinwen Zhang, Jean-Pierre Seifert, and Sencun Zhu. "pBMDS: a behavior-based
    malware detection system for cellphone devices." In Proceedings of the third ACM conference
    on Wireless network security, pp. 37-48. ACM, 2010.
[4] Mani, Ganapathy, Bharat Bhargava, and Jason Kobes. "Scalable Deep Learning Through
    Fuzzy-based Clustering in Autonomous Systems." In IEEE International Conference on
    Artificial Intelligence and Knowledge Engineering (AIKE). IEEE, 2018.
    http://www.cs.purdue.edu/homes/bb/aike2.pdf
[5] Mani, Ganapathy, Denis Ulybyshev, Bharat Bhargava, Jason Kobes, and Puneet Goyal.
    "Autonomous Aggregate Data Analytics in Untrusted Cloud." In IEEE International Conference
    on Artificial Intelligence and Knowledge Engineering (AIKE). IEEE, 2018.
    http://www.cs.purdue.edu/homes/bb/aikel.pdf
[6] Prasath, V. B., Haneen Arafat Abu Alfeilat, Omar Lasassmeh, and Ahmad Hassanat. "Distance
    and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier - A
    Review." arXiv preprint arXiv:1708.04321 (2017).
[7] Bholowalia, Purnima, and Arvind Kumar. "EBK-means: A clustering technique based on elbow
    method and k-means in WSN." International Journal of Computer Applications 105, no. 9
    (2014).
[8] Mani, Ganapathy, Nima Bari, Duoduo Liao, and Simon Berkovich. "Organization of knowledge
    extraction from big data systems." In 2014 Fifth International Conference on Computing for
    Geospatial Research and Application, pp. 63-69. IEEE, 2014.