
Workshop on Deep Models and Artificial Intelligence for Defense Applications: Potentials, Theories, Practices, Tools, and Risks, November 2020

“The Holistic Battlespace: Why the Key to Resilience for AI/ML Algorithms is to Leverage Complexity Science”

By Dr. Joe Schaff

Mission Systems


Disclaimer: The work and related approaches in these slides are the opinions of the author, and do not reflect any policy, methods or approaches used by the US government.

Joe Schaff, NAVAIR / NAWCAD Mission Systems

1. Exploit the weaknesses of an adversary’s environmental dependencies.
2. Strengthen the dominant position by protecting key environmental factors.

• Currently, a battlespace consists of a heterogeneous mix of humans and machines, some with intelligent autonomous systems.
• Looking forward, the majority will be intelligent autonomy.
• Either of these will have a dependency on the judicious use of information – there will not be complete, but only limited data.
• To win, a dominant force needs to have awareness of its general objectives, the force laydown of both sides and any significant changes that may occur.
• Information can and should be communicated in a narrow channel as nature does – i.e. stigmergy.

Emergence

• Ecosystem- or Battlespace-sized interactions will by default have unexpected (emergent) behaviors.
• Intelligent autonomous systems (or Complex Adaptive Systems = CAS) will need to rapidly learn and adapt to their dynamically changing environment.

Effective learning must occur with limited experiences.
• Below is a list of some key issues with ML in general:
1. The need for an adequate (i.e. massive) number of samples for comprehensive training.
2. Long time scales for adaptive learning, partially due to massive sample size.
3. Large computational resources needed for training.
4. Brittleness due to lack of resilience, emergent misclassifications, and overfitting.
• Most of these are significantly different from human limitations.

Let’s look at the holistic picture to see how we can address some of these:

1. Massive embedded mobile ad-hoc (MANET) radios create the “smart swarm”.
• Both humans and machines, referred to as “entities”, communicate = interactions.
• Entities are heterogeneous and need to self-organize and be cognizant of order.
• Mathematically equivalent problem whether you assume either radios or UAVs.
2. Entities each can consist of one or more components.
• Components need to be resilient to attacks – i.e. self-healing and resistant.
• Components are “smart components” that embed AI / ML to augment sensor and route planning capabilities. World model is the abstract “awareness”.
• What does ETE look like at different scales (1 & 2 above)?

• Battlespace by necessity must be complex.
• Attempts to over-simplify result in easily targetable entities.
• Emergent behaviors will occur whether you want them or not.
• Best choice: “when you can’t beat ’em, join ’em”:
• Leverage these behaviors to produce tactical advantages.
• Use these to create self-healing resilient networks.
• Use the “creativity” that can emerge from nonlinear classifiers in AI.
• Choose wisely where you use emergent aspects of complexity, how you apply AI.
• Constrain other systems / components as needed to make best use – e.g. formal methods.
• Be the “lion tamer” of complexity to gain winning tactical advantages.


DISTRIBUTION STATEMENT A

[Figure: scales of the battlespace, from Components (use of AI/ML) through Platforms and the Platform Component Architecture up to Massive Swarms / Swarm Cloud (10,000’s objects). Red = degree of complexity being used.]

Swarm Technology Overlap: #1 – Massive Smart Swarm

Self-organizing mathematics = uses “deterministic chaos”

Video: https://youtu.be/iggsygNPEnU

How Can This Possibly Work?
• Randomly generated, but constrained topology.
• Does translation / rotation (mathematically = affine transformation).
• Implicitly self-similar.
• Computationally simple math iterations (Iterated Function System = IFS).
• In this particular function there is only one float multiplication per iteration: e.g. determining the topological layout of 10,000 entities would take about 10,000 floating-point operations (10 KFLOPs).
• Any IoT / edge device would have the computational power to get a topological picture of the battlespace / other in milliseconds or faster (e.g. ESP32 = 400 µs).
• So, what do we do with this? Distributed C2 / resilient comms in denied environments? Control massive swarms? How?

Human Immersion into Battlespace:
1) Put on Oculus / other headset.
2) Link controls (BCI / other) to one of the UxVs in proximity circle.
3) Pass token to first one to respond / arbitrary choice.
4) View what it “sees”, and fly in its “world”.
5) Hand off token when done / other location needed.

OK, but what is it??? Further details can be found in the chapter I wrote (Leveraging Deterministic Chaos to Mitigate Combinatorial Explosions) for the book “Engineering Emergence: A Modeling and Simulation Approach”, CRC Press, © 2019.
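For readers unfamiliar with iterated function systems, the following is a minimal generic sketch, not the function used in the slides (which is described only in the cited chapter). It assumes a textbook "chaos game" IFS with three hypothetical anchor points and a shared integer seed. Each iteration applies one affine contraction (one multiplication per coordinate), and any entity holding the same seed reproduces the same layout without further communication.

```python
import random

# Hypothetical anchor points (triangle vertices); an assumption, not from the slides.
ANCHORS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def ifs_layout(seed, n_entities, scale=0.5):
    """Generate n_entities positions with a chaos-game iterated function system.

    Each step picks one affine contraction (move a fixed fraction of the way
    toward a randomly chosen anchor). The seeded generator makes the result
    reproducible, so the seed alone is enough to regenerate the layout.
    """
    rng = random.Random(seed)   # deterministic pseudo-random choices
    x, y = 0.25, 0.25           # arbitrary start inside the triangle
    points = []
    for _ in range(n_entities):
        ax, ay = rng.choice(ANCHORS)
        # Affine map: contraction toward the chosen anchor.
        x = ax + scale * (x - ax)
        y = ay + scale * (y - ay)
        points.append((x, y))
    return points


layout = ifs_layout(seed=42, n_entities=10000)
```

Because the layout is a pure function of the seed and the entity count, two nodes only need to agree on those few bytes to compute identical positions. The resulting point set is self-similar and constrained to the triangle spanned by the anchors.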

Trust: Cyber Architecture for cyber-hardened smart components to learn & adapt while creating a greater trust in their autonomous decisions

• Trust and resilience go hand-in-hand.
• Must merge Cyber and A.I. holistically.
• Must allow free rein of A.I. (i.e. creativity) but use effective resiliency constraints.
• Meta-reasoning to prevent A.I. algorithms from being deceived.

Resilience & the desired attributes of behaviors create trust.

Patent disclosure was submitted and presented to Invention Evaluation Board.

Adversarial AI: Natural Adversarial Examples*

• Natural adversarial examples from IMAGENET-A. The red text is a ResNet-50 prediction with its confidence, and the black text is the actual class.

* from: arXiv:1907.07174v2 [cs.LG] 18 Jul 2019

How do we avoid some of these issues?
• We may never be able to design “foolproof” resilience into a system.
• There are good strategies to limit some of the weaknesses in AI/ML.
• Some aspects of transfer learning – IF the data is “clean” to begin with: “An Empirical Evaluation of Adversarial Robustness under Transfer Learning”
• Others may not be avoidable if data is “poisoned”. See: Poison frogs.

First steps: Architecting trustworthy resilience and validating these architectures
• DARPA started the Assured Autonomy program.
• This program looks at the methods for some AI / ML validation, but does not look at the battlespace “Big Picture”.
• Early stage - Focused on AI/ML specifically.
• Funding academic research for verifying / validating performance aspects of primarily NNs.
• Example:
• VerifAI/SCENIC = toolkit for design/analysis of AI systems (SCENIC = probabilistic programming language). D. Fremont et al., UC Berkeley.
• Study uses Grand Theft Auto 5 (GTA5).
• Download software here: https://github.com/BerkeleyLearnVerify/VerifAI
• Many more examples available from other schools.
• Formal Methods Approaches are frequently used.

Formal Methods for Trust(?)…but it doesn’t Scale well…

Can it work with “smart components”?
• Sometimes - complexity may rule it out.

Component Architecture Background

*Architectures have been designed in the past that address some but not all of these. Below are some of the attributes of the proposed architectural approach:
1. Is able to use heterogeneous AI/ML technologies.
2. Mitigates shortfalls in specific vision/other algorithms.
3. Does meta-reasoning (cognitive architecture).
4. Is Cyber-resilient.
5. Is fully scalable from low-cost expendable to high value platform.
6. Has a fully open architecture in hardware and software.
7. Allows exploration of algorithm internals for AI/ML and cyber analysis.

* Note: “architecture” is clearly an overloaded word - if you don’t like the word “architecture”, replace it with “framework”.

Quick Fix for Minimal Data and “basic” Rapid Learning

• What about DLNN issues: adequate (i.e. massive) number of samples for comprehensive training? Short time scale for adaptive learning?
• Transfer learning: take the trained weights / other parameters from a similar NN trained on a similar problem, and load them into the new NN.
• Issues include: is the problem domain sufficiently similar? Does this limit the items classified to only those close / exact enough to the original training data (i.e. overfitting)?
• Better way: Use “helper” algorithms and mathematical functions as coarse classifiers to “pre-train” the DLNN.
• Helper algorithms can work in a complementary manner with algorithms that are more accurate but challenging to train / adapt.
• More than just ensemble classifiers = these are matched complementary sets. The sets can also be combined with other classifiers for an ensemble.

Build a Scalable Prototype for ML & Cyber, and Future Advanced Threats.
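The “helper” pre-training idea from the Quick Fix slide can be illustrated with a toy example. This is one possible reading, not the author's implementation: a hand-written coarse rule (the hypothetical `helper_label`) labels plentiful unlabeled samples, a model is trained on those coarse labels, and the same weights are then fine-tuned on a handful of true labels. A single linear unit trained with perceptron updates stands in for the DLNN, and all data, thresholds, and names are assumptions made for the sketch.

```python
import random


def helper_label(x):
    """Hypothetical coarse 'helper' rule: a hand-written linear threshold."""
    return 1 if x[0] + x[1] > 0 else -1


def predict(w, x):
    """Linear unit standing in for the DLNN: returns +1 or -1."""
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else -1


def train(w, samples, labels, max_epochs=1000):
    """Perceptron updates starting from weights w; stops after a clean epoch."""
    w = list(w)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict(w, x) != y:
                w[0] += y * x[0]
                w[1] += y * x[1]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def accuracy(w, samples, labels):
    hits = sum(predict(w, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)


rng = random.Random(0)

# Plentiful unlabeled samples. Points very close to the helper's decision
# boundary are skipped so the perceptron has a margin and converges quickly.
unlabeled = []
while len(unlabeled) < 200:
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if abs(x[0] + x[1]) >= 0.2:
        unlabeled.append(x)

# Step 1: "pre-train" on labels produced by the cheap helper rule.
coarse_labels = [helper_label(x) for x in unlabeled]
pretrained = train([0.0, 0.0], unlabeled, coarse_labels)

# Step 2: fine-tune on a few true labels (the helper is wrong on the last one).
few_samples = [(0.9, -0.3), (-0.8, 0.4), (0.2, 0.6), (-0.3, -0.7), (-0.3, 0.6)]
few_labels = [1, -1, 1, -1, -1]
fine_tuned = train(pretrained, few_samples, few_labels)
```

The point of the sketch is the division of labor: the helper supplies a cheap, roughly correct starting point, so the few real labels are only needed to correct the cases where the coarse rule is wrong.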

Real-time Convolutional Neural Networks for Emotion and Gender Classification (academic pub.)

[Figure: every row starting from the top corresponds respectively to the emotions {e.g. “angry”, “happy”, “sad”, “surprise”, …}. Both left & right blocks represent the same pictures. Right = convolved using backpropagation variant algorithm.]

Prototype for expendable robot with deep-learning vision / object recognition. Cost: <$300.

From Prototype to Production: Overlaying a Technology Transition Architecture

Even if the ETE architecture is incomplete, now is the time to design a “universal” production system designed for adaptation and validation.

Questions to be asked:
1. Is the research current state of the art?
2. Who is doing various parts of this research?
3. How do we avoid the “valley of death” common to research transition?
4. Can information flow effectively to / from researchers and customers?
5. What conduits exist for resilient & consistent software to transition to customer use cases?

Enhanced DevSecOps = researchers, tools, taxonomy conduits, secure containers.

[Figure: Researchers and Customer Use Cases linked by taxonomy conduits, connected via automated UML generated from selected software containers, leading to a prototype ready for customer.]

Issues and What’s Next?

The “big picture” is currently incomplete:
• Segments of the ETE architecture exist and satisfy some gaps.
• Other gaps exist: both known and unknown.
• Where does complexity provide advantages? Where are deterministic solutions better?
• Must work in a multi-domain battlespace – the two ends (swarm, components) are designed specifically for that.
• What organizations can address the “big picture”?
• Now at a critical juncture for MUM-T and autonomy – an incomplete / delayed response could put us too far behind adversaries to catch up.

On-going work – things I am doing so far:
• Developed a class of algorithms that manage massive “smart” swarms:
• Similar approach to ecosystems in nature, “stigmergic” communication.
• Leverages “swarm intelligence” = AI, so that any entity “knows” where the others are positioned, as well as changes when broadcasted.
• Needs only a few bytes of data to reorganize / know relative positioning of all battlespace entities.
• Trivial math – e.g. a Raspberry Pi can calculate the positions & dynamics of 10,000+ entities in less than 100 µs.
• Developed the resilient meta-reasoning architecture for components:
• Uses heterogeneous AI / ML algorithms in a complementary manner = weakness of one type of algorithm is covered by another, + helper functions for learning as needed. Scales from Raspberry Pi to largest available.
• AI algorithms are given free rein in a “sandboxed” environment to allow the full creativity or innovative results for most effective tactical decisions.
• Meta-reasoner is the “rationalizer” or “adult supervision” that decides whether an algorithm has been deceived, choosing another algorithm’s results if needed. Periodically, meta-reasoner learns and adapts.
• Ongoing collaboration with NASA LaRC Formal Methods laboratory.
• Ongoing collaboration with academia, DARPA Assured Autonomy, OFFSET programs.
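The meta-reasoner's role, as described above, can be sketched in a few lines. The decision rule here is an assumption chosen for illustration (the slides do not specify one): if the primary algorithm disagrees with a strict majority of the other, heterogeneous algorithms, its result is treated as possibly deceived and the most confident agreeing algorithm is used instead. The algorithm names and labels are hypothetical.

```python
from collections import Counter


def meta_reason(results, primary):
    """Pick a final label from heterogeneous classifier outputs.

    results: dict mapping algorithm name -> (label, confidence).
    primary: name of the algorithm normally trusted first.
    Returns (label, algorithm_used, primary_overridden).
    """
    primary_label, _ = results[primary]
    others = {name: out for name, out in results.items() if name != primary}
    if not others:
        return primary_label, primary, False

    # Consensus among the other (complementary) algorithms.
    votes = Counter(label for label, _ in others.values())
    consensus, count = votes.most_common(1)[0]

    # Assumed rule: override only when a strict majority of the others
    # disagrees with the primary algorithm.
    if consensus != primary_label and count > len(others) / 2:
        agreeing = [n for n, (label, _) in others.items() if label == consensus]
        best = max(agreeing, key=lambda n: others[n][1])
        return consensus, best, True
    return primary_label, primary, False


# A highly confident but (possibly) deceived CNN versus two simpler helpers.
outputs = {
    "cnn": ("school bus", 0.99),
    "shape_matcher": ("tank", 0.70),
    "thermal_signature": ("tank", 0.80),
}
decision = meta_reason(outputs, primary="cnn")
```

Note that the override ignores the primary algorithm's own confidence on purpose: adversarial inputs often produce very confident wrong answers, so agreement across dissimilar algorithms is used as the signal instead.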

Adversarial AI

Adversarial AI Malware

1. Extracted from Hu and Tan: “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN”
• Works even when attackers have no access to the architecture and weights of the neural network to be attacked.
2. Extracted from paper by UMD researchers: “Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks”
• Data poisoning = attack on machine learning (ML).
• Attacker adds examples to the training set to manipulate the behavior of the model.
• Targeted to control the behavior of the classifier on a specific test instance without degrading overall classifier performance.
• Attacker adds a seemingly innocuous image (that is properly labeled) to a training set for face recognition, and controls the identity of a chosen person.
• Poisons could be entered into the training set simply by leaving them on the web and waiting for them to be scraped by a data collection bot.
3. Images in nature can confound machines.
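The targeted nature of data poisoning can be seen in a toy example. This is a deliberately simplified illustration and not the optimization procedure from the “Poison Frogs!” paper: it assumes a 1-nearest-neighbour classifier working directly on made-up 2-D feature vectors, and a single poison point whose features sit next to the target while carrying the other class's label.

```python
def nearest_label(training_set, x):
    """1-nearest-neighbour prediction; training_set holds (features, label)."""
    def sq_dist(item):
        (fx, fy), _ = item
        return (fx - x[0]) ** 2 + (fy - x[1]) ** 2
    return min(training_set, key=sq_dist)[1]


# Made-up feature vectors for two classes (assumptions for the sketch).
clean_training = [
    ((0.0, 0.0), "class_a"), ((0.4, 0.0), "class_a"), ((0.0, 0.4), "class_a"),
    ((5.0, 5.0), "class_b"), ((5.4, 5.0), "class_b"), ((5.0, 5.4), "class_b"),
]

# The specific test instance the attacker wants misclassified (truly class_a).
target = (1.0, 1.0)

# One poison example: labelled class_b, with features placed next to the target.
poison = ((1.1, 1.0), "class_b")
poisoned_training = clean_training + [poison]

# Ordinary test points used to check that overall performance is unaffected.
other_tests = [
    ((0.1, 0.1), "class_a"), ((0.3, 0.2), "class_a"),
    ((5.1, 5.2), "class_b"), ((4.8, 5.0), "class_b"),
]


def accuracy(training_set):
    hits = sum(nearest_label(training_set, x) == y for x, y in other_tests)
    return hits / len(other_tests)


before = nearest_label(clean_training, target)      # prediction without poison
after = nearest_label(poisoned_training, target)    # prediction with poison
```

In this toy setting the target flips from class_a to class_b while the ordinary test points are still classified correctly, which mirrors the "specific test instance without degrading overall classifier performance" property in the bullets above.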

Deconvolutional Network for Face Decomposition

• Top-down parts-based image decomposition with an adaptive deconvolutional network. Each column corresponds to a different input image under the same model.
• Low-level edges, mid-level edge junctions, high-level object parts and complete objects {extracted from: Zeiler, Taylor, Fergus; “Adaptive Deconvolutional Networks for Mid and High Level Feature Learning”}

Several methods exist to construct deep fakes – some use Generative Adversarial Networks (GANs), others deconstruct / reconstruct facial features.

Understanding Differences Between Cyber - {Security} and {Resilience}

Security:
1) Preserving data “at rest” and in-transit.
2) Privacy = encryption, least-privilege access.
3) Securing system against external attack – hostile takeover, network-based attacks, etc.

Resilience:
1) More AI / ML based problems.
2) Resilient to deception / misclassification.
3) Resilient to noise added to data.
4) Recovery from exploitation of known weaknesses in classifiers.
5) Recovery from unanticipated attacks.

Steps 1 and 2: Cyber-secure Kernel, Linux Containers

a) Use a microkernel OS = Example: Fuchsia (by Google – in development). Based on a new microkernel called “Zircon” secure computing environment. Similar approach used by DARPA High Assurance Cyber Military Systems (HACMS) program.
b) Use Linux Containers (e.g. “Docker”).

Why?
1. It “sandboxes” unstable or vulnerable, yet useful ML algorithms.
2. Sandbox can re-instantiate the algorithm if it “crashes” due to malicious attack or instability.
3. Allows full creativity or “emergent behaviors” of algorithms.
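The re-instantiation behavior in point 2 follows a familiar supervisor pattern, sketched below. This is an in-process illustration only: in the architecture described on this slide the isolation would come from the container and microkernel, which plain Python exception handling does not provide. `FlakyClassifier` and `Supervisor` are hypothetical names.

```python
class FlakyClassifier:
    """Hypothetical stand-in for a useful but unstable ML algorithm."""

    def __init__(self):
        self.calls = 0  # internal state that is lost on re-instantiation

    def classify(self, sample):
        self.calls += 1
        if not isinstance(sample, float):
            # Simulates a crash triggered by malformed or malicious input.
            raise ValueError("algorithm crashed on unexpected input")
        return "threat" if sample > 0.5 else "benign"


class Supervisor:
    """Runs an algorithm and replaces it with a fresh instance if it crashes."""

    def __init__(self, factory, max_restarts=3):
        self.factory = factory            # callable that builds a new instance
        self.max_restarts = max_restarts
        self.restarts = 0
        self.instance = factory()

    def classify(self, sample):
        try:
            return self.instance.classify(sample)
        except Exception:
            self.restarts += 1
            if self.restarts > self.max_restarts:
                raise                     # give up: something is badly wrong
            # Discard the crashed instance and start from a clean state.
            self.instance = self.factory()
            return None                   # no answer for the offending sample


supervisor = Supervisor(FlakyClassifier, max_restarts=2)
first = supervisor.classify(0.9)          # normal operation
crashed = supervisor.classify("garbage")  # crash, followed by a restart
second = supervisor.classify(0.1)         # fresh instance keeps serving
```

Returning no answer for the crashing input, rather than retrying it, is a design choice made for the sketch: retrying the same malicious input against a fresh instance would simply crash it again.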

Overhead and stability costs?
a) Almost identical to bare metal or native ML application without sandboxing.
b) If container crashes, then microkernel restarts container app with “sandboxed” algorithm.

Pipeline Architecture: R&D to customer

[Figure label: Conduits]

Pipeline Architecture: A Multi-pronged Approach
1. Foundation: create developer pipelines, i.e. remove any burden of operations so that researchers concentrate on research.
2. Latest technology advances from all available sources = follow the taxonomy tree.
3. Identify gaps and unfulfilled needs = where to invest in the research effort.
4. Map use cases to UML / MBSE language abstraction of software, for transition pipeline.

OSD DevSecOps & more?

Containers help visibility and sharability of products.

Pipelines to / from developers.

Hardened containers for algorithms or other software components.

MilCloud based = latest research in AI/ML may be shared with other researchers.

BUT...this pipeline is not enough. Need to insert taxonomy...


[Figure: the pipeline with taxonomy inserted. Labels: Additional labs (e.g. NAWCAD, NIWC, DHS); Additional algorithms (i.e. ant colony optimization or nature inspired algorithms with the swarms, LDA, PCA); add other architectures (CNNs); Taxonomy conduits; Prototype ready for customer; Connected via automated UML generated from selected software containers; Researchers.]

Course Outline
• Course will cover topics as diverse as the technology for biologically inspired robots, cognitive robotics, cultural, social and legal aspects of robotics, data mining, examples of human systems interfacing, machine learning principles and their limitations with respect to AI.
• Your objective as a student will be to integrate this interdisciplinary knowledge and perform out of the box thinking, demonstrating this in a term project.
• We're going to look at ideas like robot emotion, and collaborative robots that can form limited social interactions.
• You will design a robot that can implicitly determine the action it needs to take without explicit commands given to it, by observing its interaction with people.
• The term project: Think of creating a Kickstarter where you will be building the next generation of cognitive human-behaving robots.
• You need to show your product as something investors would buy into.
• I will provide course material and extensive reference sources for both hardware and software to design these robots.
• These robots could realistically be built with hardware and software for as little as $2000.
• The Kickstarter is only a goal to shoot for, and if you indeed want to create an actual one after the course is over, you are encouraged to do so either alone or in collaboration with others in your class.
• Unlike an actual Kickstarter, there's no penalty for not being sponsored - if you try and think out of the box, and apply whatever knowledge you're capable of finding as well as what I will provide, you will succeed.