=Paper= {{Paper |id=Vol-2040/paper1 |storemode=property |title= Dynamically Correlating Network Terrain to Organizational Missions |pdfUrl=https://ceur-ws.org/Vol-2040/paper1.pdf |volume=Vol-2040 |authors= A.E. Schulz,B. David O'Gwynn,J. Kepner,P.C. Trepagnier }} == Dynamically Correlating Network Terrain to Organizational Missions== https://ceur-ws.org/Vol-2040/paper1.pdf
      Dynamically Correlating Network Terrain to Organizational Missions
                                A. E. Schulz∗, B. David O’Gwynn†, J. Kepner, and P. C. Trepagnier
                                           MIT Lincoln Laboratory, 244 Wood St., Lexington, MA 02420




ABSTRACT                                                                  mission. Section §2 discusses our basic methodology. In section
A precondition for assessing mission resilience in a cyber context        §2.1, we outline the technology goals for a suite of mission mapping
is identifying which cyber assets support the mission. However,           tools developed at MIT Lincoln Laboratory. For brevity, we will
determining the asset dependencies of a mission is typically a man-       refer collectively to this suite of utilities as PCAMM, or Person
ual process that is time consuming, labor intensive and error-prone.      Centric Automated Mission Mapping. In section §2.2 we discuss
Automating the process of mapping between network assets and or-          one possible implementation. In section §2.3 we outline some of
ganizational missions is highly desirable but technically challeng-       the applications of PCAMM, with some examples drawn from data
ing because it is difficult to find an appropriate proxy within avail-    at MIT Lincoln Laboratory. In section §3 we draw conclusions,
able cyber data for an asset’s mission utilization. In this paper we      discuss the drawbacks and limitations of this approach, and outline
discuss strategies to automate the processes of both breaking an          open research questions for the future.
organization into its constituent mission areas, and mapping those
                                                                          2 METHODOLOGY
mission areas onto network assets, using a data-driven approach.
We have implemented these strategies to mine network data at MIT          The key challenge in this work is to develop mapping technology
Lincoln Laboratory, and provide examples. We also discuss ex-             that is automated and data driven. There are three principal insights
amples of how such mission mapping tools can help an analyst to           that guide our basic approach. The first and most fundamental is
identify patterns and develop contextual insight that would other-        that the link between network assets and organizational missions is
wise have been obscure.                                                   the workforce: people are required to carry out tasks and accom-
                                                                          plish goals, and people are also the users of an organization’s net-
1 INTRODUCTION                                                            work. If the people can be mapped to the organizational missions,
Situational awareness of cyber assets, as well as their function in       we can pivot through them to the network assets that they use in
supporting the organizational missions, is crucial to mission as-         executing their tasks.
surance, both for prioritizing cyber key terrain (KT-C) to defend            The second insight is that finance data provides a mechanism
and for understanding the attack surface presented to an adversary.       to find out what the people in an organization are doing without
Equally critical is understanding an organization’s sources of data,      manually interviewing them. All government and business entities
and how those data sources are utilized in support of the mission.        have a charging structure. The money they spend is subdivided
The resources for defense of cyber assets are always limited. An          according to what the money is for i.e. the missions and programs.
organization can only allocate those resources effectively if they        The money is also distributed to people who work on those missions
are aware of their cyber assets, and how those assets support the         and programs. Therefore the finance data provides a link between
organization’s goals.                                                     the people and the missions.
   Thus, mapping cyber assets onto organizational missions is crit-          The third insight is that different questions will require the mis-
ical for situational awareness, risk assessment and resource allo-        sion map to be enriched with different data sources. The finance
cation. Many researchers, both in the military and in industry,           data providing the link between people and missions is not “big”
have been developing manual processes to meet this critical need          in the sense of big-data. These can exist entirely in memory, even
[2, 3, 4, 5, 7, 13, 14]. Manually produced maps are better than           for a very large organization. However, enriching the mission map
nothing, but they have severe disadvantages. They only consist of         with all the network log data at once cannot be approached with the
a single snapshot in time, and do not dynamically evolve as cy-           fast in-memory hash-map technique used for the unenriched mis-
ber assets are reshuffled within an organization. Also, they do not       sion map. Since we need a forensic tool that is lightweight and
scale. As cyber assets become more multi-purposed and mobile, the         fast, we engineer a way to leverage the big-data assets and pivot on
timescale for reshuffling critical infrastructure will likely become      the fly to whichever network data sources help answer the specific
so short that a manually assembled map may become obsolete even           question.
before it has been completed. Automating the process of mapping           2.1 Technology Goals
between network assets and organizational missions is highly de-
sirable, but technically challenging because it is difficult to find an   The development of our suite of mission mapping tools, which we
appropriate proxy for asset importance within available cyber data.       call PCAMM, is guided by a number of overarching technology
However, although full automation may not be possible, most orga-         goals. The intent is to map network and cyber infrastructure to or-
nizations have access to data sources that can be used to partially       ganizational missions. The tools should provide a method to iden-
automate the mission mapping process, providing a starting point          tify mission critical assets: key users and accounts, as well as key
which can be manually refined.                                            infrastructure. They should provide insight on the mission impact
   Identification of appropriate live data sources that act as a proxy    of compromised accounts, compromised or unavailable machines,
for organizational missions is an open research question. In this         and both the network and organizational role of Information Tech-
work we focus on the use of financial data to serve as a proxy for        nology (IT) systems of interest.
                                                                             Although the PCAMM software is not limited to visualization of
  ∗ Corresponding author. e-mail: alexia.schulz@ll.mit.edu                the data, it is designed to conform to Ben Shneiderman’s mantra
   † Present address: Belhaven University, Jackson, MS 39202              [11]. It should provide overview data and big-picture mission con-
                                                                          text; it should provide search and filter capability; and it should
                                                                          provide details on demand, allowing the user to specify new layers
                                                                          of enrichment on the fly. The mission mapping tools are designed to
                                                                          allow real-time mission-driven sense-making and forensic capacity.



                                   Distribution A. Approved for public release: distribution unlimited
    Finally, PCAMM should be as automated as possible. The goal             available in a Network Operations Center (NOC). We emphasize,
is to use data to identify key network infrastructure, so that the tools    however, that any data associated with people can be used to enrich
can provide real-time context that doesn’t get stale as a network           the map. The data used in the prototype implementation of PCAMM
evolves.                                                                    is data assembled from the MIT Lincoln Laboratory network, which
                                                                            is stored and accessed in the Lincoln Research Network Operations
2.2     Implementation                                                      Center (LRNOC) [6, 9]. The data is accessed via the Lincoln Lab-
2.2.1    Prototype                                                          oratory Cyber Situational Awareness (LLCySA) platform [10].
                                                                                One of the organizational data sources is phone book or direc-
We have built a prototype implementation of the automated mis-              tory data; usually Lightweight Directory Access Protocol (LDAP)
sion mapping tool. We emphasize that this is just one possible im-          data in LDIF format (LDAP Data Interchange Format), though the
plementation; if an organization has the requisite data available in        directory data could be in any format. The directory data typically
their environment, these ideas could be executed in any of a num-           provides names, identification numbers, usernames, email handles,
ber of ways. Our implementation consists of three distinct pieces:          phone numbers, office locations, job titles, as well as information
a data layer consisting of an Accumulo database; a knowledge en-            from the organizational chart such as the divisions and groups for
gineering layer consisting of a domain specific language (DSL),             the people in an organization. The directory data is used to create
implemented in Python, used to interface with the database, and             a person-list that will form the backbone of the un-enriched mis-
a mission mapping interface, written in Python, that leverages the          sion map. We have enhanced the map by binning the job titles into
DSL to build the mission map and enrich as needed. We focus on              broader roles within an organization. For Lincoln Laboratory, the
the third layer in this work, but again wish to emphasize that the          roles we use are research, technical support, administrative support,
Python-based approach utilized in our proof-of-concept implemen-            leadership, information technology, security and students. This as-
tation was chosen tactically for speed of development, as it inter-         signment of titles to role categories is one example of a manual
faced easily with LRNOC’s existing LLCySA platform [10]. Many               process that fits into the mapping framework. For our work, we
other frameworks (e.g. the ELK stack or a SQL database) are also            have also used data from Human Resources to enrich the map with
possible.                                                                   university degrees, as a proxy for subject-matter expertise. (How-
   The primary class in the mission map utility is a person. A per-         ever, in military contexts much more precise occupational specialty
son has unique attributes, such as a user id, an email handle or a          data are available.)
network account name. A person also has attributes that are shared              To create the un-enriched mission map, the directory data back-
with other people, such as a research group or a job title. Some of         bone is fleshed out with organizational financial data. There are two
the attributes are one-to-one, such as a person’s first name. Others        main inputs that are required for this approach. The first is the la-
are one-to-many, such as the programs a person works on. In this            bor charging structure in the organization; for each person there is
latter case, the value stored in the attribute is a list. These person      a record of which programs were charged in each month, as well as
objects are combined to form another type of class: a person-list.          the fraction of time that was charged to the program. The second is
The person-list is fundamentally a hash-table which is keyed on one         finance data on the mission allocation of money used to fund each
of the unique attributes of the people in the organization, such as         program, that is, which mission area is supported by each program.
a badge id number. Most of the operations carried out by PCAMM              Although it is possible that this mapping may need to be manually
are actuated by manipulating person-list objects. The key feature           assembled, it is common practice in most business offices to main-
of the person-list class is the enrich method, which takes any dic-         tain such a list and often the data already exists, as it did in the case
tionary of data keyed on people and uses it to add attributes to the        of Lincoln Laboratory. Once these data sources are incorporated
people in the map.                                                          into the map, each person will have a list of programs they con-
   As with any data store, one critical tool built into PCAMM is the        tribute to, and a list of mission areas (and potentially sub-missions)
slicing tool. The slice feature allows the user to filter the mission       that they support. This is the un-enriched mission map.
map on any attribute of a person, and return a slice whose members              The mission mapping tool functions by taking this base-level
either all have, or don’t have, that attribute. It is useful to break the   mission map and enriching it in an ad-hoc fashion with network
mission map of an organization into sub-maps (slices), each map-            security data that is housed in some underlying database. In our
ping out some subset of the whole enterprise. The sub-maps can be           case we have used the Lincoln Laboratory LLCySA platform [10],
unique, or they can be overlapping. Making a slice for every pos-           but as previously mentioned any queryable database structure is an
sible value of an attribute is a process that we call organizational        option. There are several network data sources that we have incor-
breakdown. A breakdown is stored as a dictionary structure where            porated into PCAMM. Among the most important are the authenti-
the keys are the Nval possible values the attribute can take, and the       cation logs. These provide a mapping between users and machines
value is a sub-map: i.e. the corresponding person-list. The break-          that they utilize, either by directly logging on with a username and
down computation is an embarrassingly parallel process, and the             password, or authenticating with Kerberos credentials. Another im-
computation time scales approximately as                                    portant source is the property tracking data, which help us to map
                                                                            between users and the machines they own. We have also incorpo-
                $t_{breakdown} \sim O(t_{sg} \cdot N_{val} / N_{proc})$                   (1)
                                                                            rated Nessus [12] vulnerability reports, which tell us which network
                                                                            assets are potentially targets. Mail exchange metadata can reveal
where $N_{proc}$ is the number of processors the user has available,
                                                                            more informal communities within the organization. Other possi-
and $t_{sg}$ is the computation time to generate one slice, which scales
                                                                            ble data sources include web proxy logs, VPN logs, IDS alerts and
approximately as the number of people $N_p$ populating the map:
                                                                            host-based security system data such as logs from McAfee [1]. For
$t_{sg} \sim O(N_p)$.
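The person-list, enrich, slice and breakdown ideas described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch, not the PCAMM code: the class name `PersonList`, its method signatures, the badge ids and the mission labels are all assumptions made for the example, and the breakdown here runs serially rather than in parallel.

```python
class PersonList:
    """Hypothetical person-list: a hash-table of people keyed on a unique id."""

    def __init__(self, people=None):
        # Maps a unique key (e.g. a badge id) to a dict of attributes.
        self.people = dict(people or {})

    def enrich(self, attribute, data):
        """Add `attribute` to every person that appears in `data` (keyed on people)."""
        for key, value in data.items():
            if key in self.people:
                self.people[key][attribute] = value

    @staticmethod
    def _values(person, attribute):
        # Treat one-to-one and one-to-many attributes uniformly as lists.
        value = person.get(attribute)
        if value is None:
            return []
        return list(value) if isinstance(value, (list, tuple, set)) else [value]

    def slice(self, attribute, value, has=True):
        """Return the sub-map whose members all have (or all lack) `value`."""
        return PersonList({
            key: person for key, person in self.people.items()
            if (value in self._values(person, attribute)) == has
        })

    def breakdown(self, attribute):
        """Return {value: sub-map}: one slice per value the attribute takes."""
        values = {v for p in self.people.values() for v in self._values(p, attribute)}
        return {v: self.slice(attribute, v) for v in values}


staff = PersonList({
    "b001": {"role": "research"},
    "b002": {"role": "research"},
    "b003": {"role": "leadership"},
})
staff.enrich("missions", {"b001": ["M0", "M1"], "b002": ["M1"], "b003": ["M0"]})
by_mission = staff.breakdown("missions")
print(sorted(by_mission["M1"].people))  # ['b001', 'b002']
```

Each slice scans every person once, so the serial breakdown above costs roughly the number of values times the number of people, in line with the scaling quoted in the text for a single processor.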
                                                                            any of these network indicators, we can pivot through the people as-
2.2.2    Data sources                                                       sociated with the machines of interest to the missions and programs
                                                                            that are potentially affected.
The mission map is populated with data from the organization. The
data sources are divided into two categories, organizational data           2.3     Use cases
sources and network data sources. The organizational data sources
are used to build the un-enriched mission map, which we describe            2.3.1    Overview Utilities
shortly. The network data sources are used to enrich the mission            There are a number of utilities modeled on database queries that
map, at the operator’s discretion, with log data that is typically          build on the mission map infrastructure to provide added function-
                                                                                          that make up an organization. The tree([list of attributes]) method
                                                                                          provides a convenient way to explore the data to understand large
                                                                                          trends. An example, shown in Figure 2 with data from MIT Lincoln
                                                                                          Laboratory, might be to break down the organization by mission and
                                                                                          then by ip, to see which assets are most commonly utilized by each
                                                                                          mission area. It is particularly powerful that the user can specify
                                                                                          whatever branching order best answers the specific question.
                                                                                             Figure 2 reveals that some assets are commonly used by every-
                                                                                          one in the Laboratory, whereas others are used only by specific mis-
                                                                                          sion areas. The tree utility accepts a list of values to ignore, in case
                                                                                          the user would like to exclude the mail server or other common as-
                                                                                          sets. The method will build trees to arbitrary depth, which is com-
                                                                                          bined with a zoomable functionality in the visualization, that allows
                                                                                          the user to descend down to successive layers. Only two layers of
                                                                                          the tree are simultaneously visible in a given view. For example,
                                                                                          if we had built this tree with branching order [missions, ips, nessus-
                                                                                          plugins], the nessus id numbers would be hidden in the top level
                                                                                          view. However the analyst can click in any of the mission areas.
                                                                                          The visualization would zoom in and show one ip per colored area,
                                                                                          with subdivisions depicting nessus plugin id.
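The recursive breakdown behind the tree utility can be illustrated with a short sketch. The function below is not the PCAMM `tree` method; its name, signature, the leaf representation (a head count) and the sample records with documentation-range ip addresses are assumptions chosen for the example.

```python
def values_of(person, attribute):
    """Return the attribute as a list, whether it is stored one-to-one or one-to-many."""
    value = person.get(attribute, [])
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def tree(people, attributes, ignore=()):
    """Recursively break `people` down by each attribute, in the given branching order.

    Leaves hold the number of people in that branch; values listed in
    `ignore` are skipped at every level.
    """
    if not attributes:
        return len(people)
    first, rest = attributes[0], attributes[1:]
    branches = {}
    for key, person in people.items():
        for value in values_of(person, first):
            if value not in ignore:
                branches.setdefault(value, {})[key] = person
    return {value: tree(members, rest, ignore) for value, members in branches.items()}


people = {
    "b001": {"missions": ["M0"], "ips": ["192.0.2.10", "192.0.2.25"]},
    "b002": {"missions": ["M0", "M1"], "ips": ["192.0.2.25"]},
    "b003": {"missions": ["M1"], "ips": ["192.0.2.25", "192.0.2.40"]},
}

# Break down by mission, then by ip, excluding an asset that everyone uses.
result = tree(people, ["missions", "ips"], ignore={"192.0.2.25"})
print(result)  # {'M0': {'192.0.2.10': 1}, 'M1': {'192.0.2.40': 1}}
```

Changing the list passed as `attributes` changes the branching order, which is the flexibility the text describes; a treemap front end would then draw two adjacent levels of this nested dictionary at a time.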
Figure 1: Failed login event data, binned every hour, for a ∼3 day period in February
2014. Three of MIT Lincoln Laboratory’s mission areas are represented. On Feb 26th,
there was a spike in failed login attempts, and these predominantly affected one of the
mission areas while leaving the other two at levels close to the historic baseline.

[Figure 2 image: treemap with regions labeled Mission 0 through Mission 7; legend removed.]

Figure 2: The treemap functionality allows an analyst to get overview context of
the data in the mission map. This example was generated using data at MIT Lincoln
Laboratory to build a tree with branching order (missions, ips). Tree data structures
can be built to arbitrary depth, but the visualization shows only two layers at a time.
Deeper layers are revealed by clicking in a colored area. The legend and actual mission
names have been obfuscated.

ality to PCAMM. One such utility is the getall(attribute) method.
This method aggregates all values of a particular person-attribute
for people in the map, and returns a list of these values. For example,
an analyst could create a sub-map whose members were
owners of vulnerable machines identified in a Nessus scan. The
getall(emails) method could be used to retrieve a list of the owners’
email handles, quickly generating a distribution list to whom patch
information should be sent.
   A related utility is the gethist(attribute) method. This method
returns a dictionary: the keys are each possible value of an attribute
and the values are the number of people who possess that value of
the attribute. Calling gethist(missions) will enumerate how many
people in an organization work on each mission area. Calling geth-
ist(loginfails) on each sub-map in a mission breakdown can indi-
cate whether any particular mission area is the target of a password                         One of the goals of PCAMM is the discovery of relationships be-
spraying attack. For convenience the plothist(attribute) method                           tween network entities, people, missions and programs. To this end
displays a plot of the histogram.                                                         there are two useful utilities that quantify the extent to which enti-
   The PCAMM software allows the user to quickly enrich exter-                            ties are “connected”: the correlation method and the pattern method
nal data sources with programmatic or mission context. Figure 1                           (discussed in section 2.3.2). The correlation utility is best explained
demonstrates this: the raw data are active directory log events of                        in terms of a Venn diagram. If two populations overlap, one can
failed login attempts at MIT Lincoln Laboratory between 24-27                             compute the conditional probability that a person in set A is also a
Feb 2014, which have been enriched with mission context using                             member of set B with Bayes’ theorem (e.g. [8]).
PCAMM. We only plot three of the mission areas in Figure 1. On                            $P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{N_o/N_T}{N_A/N_T} = \frac{N_o}{N_A}$   (2)
Feb 26th around noon, there was an event generating failed login
counts several orders of magnitude higher than the baseline. Figure
1 shows that these events potentially have more impact on one of                          Here $N_o$ is the number of people in the overlap region of the Venn
the mission areas than the others. This is only one example. Since                        diagram, $N_T$ is the total number of people in the organization, and $N_A$ and $N_B$
PCAMM allows an analyst to enrich any new data source with mis-                           are the numbers of people in groups A and B, respectively. This prob-
sion context, any event data associated with hosts, ip addresses, or                      ability quantifies the correlation of the two properties represented
people can be enriched with programmatic or mission context in                            by A and B within the organization, and is equal to 1 if the two
this way.                                                                                 properties correlate perfectly (i.e. complete overlap of the Venn Di-
   The data in the mission map is inherently graph data, with nodes                       agram). The method correlate(attribute1,attribute2) will compute
and edges that connect the various attributes such as ip addresses,                       the correlation of every value of one attribute with every value of
missions and people. However graph representation of the data of-                         another. Computing the autocorrelation will quantify the clustering
ten does not convey sufficient contextual information to be useful.                       within an attribute, for example, correlating programs to programs
Our approach to this difficulty is to use a configurable treemap to                       will quantify what fraction of personnel are shared between any two
convey context in a flexible way. The breakdown method described                          programs.
in section 2.2.1 can be applied recursively. The result is the creation                      Because the correlate function is many-to-many, it
of a tree data structure that provides an overview of the components                      doesn’t lend itself well to visualization. Therefore, a plot-
[Figure 3 image: plot titled "Host IP Address Removed".]

Figure 3: The probability that a user of the host at ip xxx.xxx.xxx.xxx charges to
a particular program is shown for users who authenticated between April 29th and
30th. This is real data from MIT Lincoln Laboratory, although the ip and the program
numbers have been obfuscated.

[Figure 4 image: cartoon with regions labeled Homeland Protection, Spear-phish targets and Program 10, and persons labeled A, B, C and D.]

Figure 4: If A, B and C are associated with IT systems of interest, the pattern dis-
covery methods in PCAMM allow an analyst to determine what the people have in
common, and the pattern matching method facilitates the discovery of person D, whose
assets may also be compromised.



corr(attribute1,value,attribute2) method will display the correlation
of one specific value of attribute A with all possible values of
attribute B. Figure 3 shows data collected in a 24 hour period on
Apr 29th 2014. For each program along the x-axis, the plot shows the
probability that a user of the host at the obscured ip address works on
that program. The data depicted are real, but the ip and program numbers
have been obfuscated for MIT Lincoln Laboratory operational
security. This type of analysis could be useful in quantifying which
programs are potentially at risk, in the event the host at this ip
address were compromised.

        Attribute            Value              # People / 7   Unique
        missions             Space Control      22             No
        sponsors             Other DoD          14             No
        gender               Female             7              Yes
        role                 Research           7              Yes
        title                Technical Staff    4              Yes
        propertylocations    Z1-123             4              No

Table 1: Output of the pattern() method on a sub-map of MIT Lincoln Laboratory
data. The sub-map corresponds to owners and users of assets at several ip addresses.
The table quantifies what the individuals in the map have most in common.
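The conditional probability in Equation (2) reduces to counting people, which the following sketch illustrates for one value of attribute A against every value of attribute B. It is not the PCAMM correlation code: the function name `conditional_probabilities`, the assumption that attributes are stored as lists, and the sample records are all hypothetical.

```python
def conditional_probabilities(people, attribute_a, value_a, attribute_b):
    """Return {value_b: P(B|A)}, computed as N_o / N_A by counting people.

    Group A is everyone holding `value_a` for `attribute_a`; for each value of
    `attribute_b`, N_o is the number of people in A who also hold that value.
    Attributes are assumed to be stored as lists.
    """
    in_a = [p for p in people.values() if value_a in p.get(attribute_a, [])]
    if not in_a:
        return {}
    overlap = {}
    for person in in_a:
        for value_b in set(person.get(attribute_b, [])):
            overlap[value_b] = overlap.get(value_b, 0) + 1
    return {value_b: n_o / len(in_a) for value_b, n_o in overlap.items()}


people = {
    "b001": {"ips": ["192.0.2.10"], "programs": ["P1", "P2"]},
    "b002": {"ips": ["192.0.2.10"], "programs": ["P1"]},
    "b003": {"ips": ["192.0.2.40"], "programs": ["P3"]},
    "b004": {"ips": ["192.0.2.10", "192.0.2.40"], "programs": ["P2", "P3"]},
}

# Probability that a user of host 192.0.2.10 charges to each program.
probs = conditional_probabilities(people, "ips", "192.0.2.10", "programs")
for program in sorted(probs):
    print(program, round(probs[program], 2))
```

Three people use the host in this toy data, so a program charged by two of them gets probability 2/3; plotting `probs` as a bar chart would give a figure of the same kind as the one described in the text.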

2.3.2     Pattern Utilities
A critical part of situational awareness is the ability to recognize                                 on 22 programs in the Space Control mission area, many of which
and match patterns in the data. For example, if network assets are                                   are sponsored by Other DoD sponsors. Owners and users of these
found to be compromised, it is reasonable to inquire whether a par-                                  machines all happen to be female members of the research staff. If
ticular individual is associated with them. It is also useful to know                                these machines constituted a list of compromised assets, this pat-
whether these assets work in concert to support some programmatic                                    tern might help an analyst determine that the vector for the threat
or mission function. Such patterns may help an analyst to decipher                                   was somehow connected to a professional association of technical
the root cause of the infection, and whether there is a particular tar-                              women. The analyst can use the map technology to explore this
get of the compromise within the organization. Pattern matching is                                   hunch further. For example, she may be curious to see if these
critical for searching out other individuals who may be affected, or                                 women have recently attended some conference in common. She
identifying other network assets potentially at risk.                                                could use the enrich() method to add travel data to this sub-map, and
   The cartoon in Figure 4 illustrates this idea. Suppose malware is                                 potentially discover how many of these individuals had attended a
detected on the laptops of persons A, B and C. The pattern method                                    recent conference on women in STEM fields.
can help identify what these people have in common. In this ex-
                                                                                                        The functional inverse operation of the pattern() method is the
ample, they all support the Homeland Protection mission area, they
                                                                                                     pattern match() method. A user supplies a template with feature
all charge to a particular program, and they have all been targets in
                                                                                                     values, and the map is scanned to return a sub-map whose members
a recently identified spear-phishing campaign. Discovering this in-
                                                                                                     match the template. An example template used identify person D
formation is useful for potentially tracing the source of the infection
                                                                                                     (as well as A, B and C) in Figure 4 might be (missions = Homeland
to a phishing attack, identifying which program and mission areas
                                                                                                     protection, programs = 10, speardates != None). Pattern matching
are potentially impacted, and locating person D. This person has
                                                                                                     is accomplished by repeatedly applying the slicing algorithm. It
many features in common with A, B and C; their assets may also
                                                                                                     is an example of a utility that would be easier to optimize if the
be infected, or person D may inadvertently have been the vector
                                                                                                     mission map were implemented directly in a database, rather than
propagating the infection.
                                                                                                     the in-memory prototype discussed in this work.
   The pattern() method can be called on the mission map or any
sub-map. It systematically examines data with which the map has
been enriched. It returns a sorted list of tuples, containing the at-                                3       D ISCUSSION
tribute, the most common value found in the sub-map, and how                                         We present in this paper an important first step in dynamically cor-
many people share that value. An example is shown in Table 1.                                        relating cyber terrain to organizational missions. Such a mapping
The pattern() method was used on a sub-map created for this exam-                                    is critical for enhanced network situational awareness, for devel-
ple, which corresponds to owners and users of assets at several ip                                   oping forensic insight, for assessing risk and optimally allocating
addresses at MIT Lincoln Laboratory. There are seven individuals                                     resources to defend mission-critical assets. Our approach of in-
in the sub-map, and these results show that collectively they work                                   corporating finance data and identity stores with passive network
monitoring and log data demonstrates that an approximate map can
be built dynamically, providing insight that would not otherwise be
available. However, there are limitations to this approach. First, this
implementation uses authentication data as a proxy for asset
importance. While frequency of use is one important indicator, it will
overemphasize the importance of some assets, and will potentially
miss other critical assets such as servers and routing infrastructure,
whose use does not typically require authentication. One important
extension of this work that will greatly mitigate this inaccuracy
is the addition of logs from routers or switches, which lend insight
into internal connections between machines. With this new source
of information each person could be associated with a list of hosts
and/or ip addresses that they directly use, as well as a second layer
of hosts/ips observed to connect with the first layer in netflow data.
Not only will this provide more complete insight into the assets
used in a given mission area, it will also be critical for baselining
internal network connectivity, deviations from which can help
indicate potential insider threats or lateral movement.
   Another drawback of this approach is the heavy reliance on
finance data to identify the missions and programs. While this
mapping is likely to exist for most organizations and business entities,
it will be far more accurate in some cases than in others. Following
the money will always provide an approximate proxy for
organizational mission structure, but it would be best to combine this with
other indicators of a person’s role or work function. These indicia
will in many cases be directly available, but may be augmented by,
e.g., semantic analysis of their documents or email.
   Another potential improvement to the implementation presented
here will be to add data dependencies of the various missions and
programs to the map. This will provide a secondary indication of
asset importance, particularly for database and server infrastructure.
It will also help interpret the risk that adverse events, such as a
discovery of data exfiltration, pose to the missions and programs.
We reserve an analysis of mapping data dependencies for
future work.
   Despite these imperfections, the PCAMM implementation has
demonstrated that pivoting through employees to filter for attributes
of interest is a highly effective way to increase network situational
awareness and convey mission context. The PCAMM software
provides

   • Overview

         – Display histograms for a program, division or mission
         – Create treemaps showing the distribution of subgroups
           and sub-subgroups
         – Compute statistics and distributions for aggregated
           quantities
         – Identify rare attribute values

   • Zoom and Filter

         – Filter on a particular feature to create a sub-map
         – Correlate between any attribute and any other
         – Identify individuals who meet some profile

   • Pattern identification

         – Identify commonalities between persons (or machines)
           of interest
         – Identify other assets that fit the pattern

   • Details on Demand

         – Retrieve data for any person or group of interest

   Mapping network assets onto organizational missions is vital for
identifying infrastructure and quantifying its mission impact,
identifying interdependency of mission components, identifying
patterns and finding other assets that meet a profile. Mapping informs
the optimization of resource allocation, helps to quantify risk, and
provides the basis for mission assurance. We have demonstrated
some initial capabilities toward the ultimate realization of
dynamically achieving these goals, and are searching for appropriate
use cases on which to test and calibrate the method.

ACKNOWLEDGEMENTS
The authors wish to thank the members of the LLCySA team for
their support of this work. They also acknowledge useful
conversations with Jeffrey Gottschalk, Bill Streilein, Bill Campbell and
James Riordan. This work is sponsored by the Assistant Secretary
of Defense for Research & Engineering under Air Force Contract
#FA8721-05-C-0002. Opinions, interpretations, conclusions and
recommendations are those of the authors and are not necessarily
endorsed by the United States Government.