An Ontology for Insider Threat Indicators: Development and Applications

Daniel L. Costa, Matthew L. Collins, Samuel J. Perl, Michael J. Albrethsen, George J. Silowash, Derrick L. Spooner
Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, USA
insider-threat-feedback@cert.org

Abstract—We describe our ongoing development of an insider threat indicator ontology. Our ontology is intended to serve as a standardized expression method for potential indicators of malicious insider activity, as well as a formalization of much of our team's research on insider threat detection, prevention, and mitigation. This ontology bridges the gap between natural language descriptions of malicious insiders, malicious insider activity, and machine-generated data that analysts and investigators use to detect behavioral and technical observables of insider activity. The ontology provides a mechanism for sharing and testing indicators of insider threat across multiple participants without compromising organization-sensitive data, thereby enhancing the data fusion and information sharing capabilities of the insider threat detection domain.

Keywords—ontology; insider threat; data fusion; information sharing

I. BACKGROUND

The study of insider threat presents some of the most complex challenges in information security. Even defining the insider threat has proven difficult, with interpretations and scope varying depending on the problem space. The CERT® Division of Carnegie Mellon University's Software Engineering Institute defines a malicious insider as a current or former employee, contractor, or other business partner who has or had authorized access to an organization's network, system, or data and intentionally exceeded or misused that access in a manner that negatively affected the confidentiality, integrity, or availability of the organization's information or information systems [1].

Organizations have begun to acknowledge the importance of detecting and preventing insider threats, but there is a surprising lack of standards within the insider threat domain to assist in the development, description, testing, and sharing of these techniques. For many organizations, establishing an insider threat program and beginning to look for potentially malicious insider activity is a new business activity. In particular, Executive Order 13587 and the National Insider Threat Policy describe minimum standards for establishing an insider threat program and monitoring employee use of classified networks for malicious activity [2-4].

II. PURPOSE

A. Goals

The primary goal of this effort is to support the creation, sharing, and analysis of indicators of insider threat. Because insider data is sensitive, insider threat teams often work only with data from inside their own organizations. These records frequently include documented employee behaviors, intellectual property, employee activity on networks, and information on organizational proprietary networks and information technology (IT) architecture. Organizations and teams are hesitant to release this information due to the risk of breaching employee privacy, releasing sensitive organizational information, or unnecessarily losing a competitive advantage. A shared ontology will allow teams to share indicators of insider threat without disclosing their own sensitive data. Our desired outcome is to facilitate information sharing on effective indicators of malicious insider activity across organizations, with an emphasis on extensibility, semi-automation, and the ability for community members to benefit from investigations and analysis performed by others.

B. The Case for an Ontology

All entity and relationship data models, including semantic data models, have their limitations [5]. Models are extremely formal by design and can encounter problems when representing the variety of actions involved in a real-world insider threat case. In addition, the data on cases of insider threat is often gathered from legal judgments and outcomes whose documentation is highly variable. As a result, insider threat domain experts tend to rely on natural language to document their cases and findings. Though natural language is more expressive than a model, we believe the insider threat domain will benefit from the development of an ontology. Our interest in building an ontology, developed from our observations of the field today, is driven by the following factors:

• We expect rapid growth in the data being collected and shared by organizations, specifically about insider threats. Some organizations have already stated that overcoming this challenge is one of their top priorities [6].
• The insider threat research community lacks a defined, formal model that is machine readable, human understandable, and transferrable with limited sharing barriers. We felt that starting a model of this kind, based on the real-world case data we have already collected, could accelerate this process within the community, as has been done in other fields [7, 8].
• We are willing to accept some loss of descriptive power for individual cases, provided we can analyze large populations of cases using computation. We expect insider threat teams (both in research and in operations) to be asked to detect insider threat activity by analyzing a growing quantity of data from new sources in an increasingly limited amount of time.
C. Construction Method

Since 2001, the CERT® Insider Threat Center has collected over 800 cases in which insiders used IT to disrupt an organization's critical IT services, sabotage systems and data, commit fraud against an organization, steal intellectual property, or conduct national security espionage, as well as other cases of insiders using IT in a way that should have been a concern to an organization. This data provides the foundation for all of our insider threat research, our insider threat lab, insider threat assessments, workshops, exercises, and the models developed to describe how the crimes evolve over time. Our case collection involves gathering and analyzing data from public (e.g., media reports, court documents, and other publications) and nonpublic (e.g., law enforcement investigations, internal investigations from other organizations, interviews with victim organizations, and interviews with convicted insiders) sources.

This data collection, summarized in Figure 1, primarily focuses on gathering information about three entities: the organizations involved, the perpetrator of the malicious activity, and the details of the incident. Each case in our insider incident repository contains a natural language description of the technical and behavioral observables of the incident. We used these descriptions as the primary data source for our ontology.

Fig. 1. CERT model for insider incidents

III. APPROACH

A. Domain Identification

At first glance, defining the domain of our ontology appeared to be a trivial matter: representation of potential indicators of malicious insider activity. In practice, indicators of malicious insider activity involve complex interconnections of parts of several other domains:

• Human behavior: understanding insider threats involves understanding the people behind the malicious activity—the reasons why they attacked, their psychological characteristics, their emotions, and their intent.
• Social interactions and interpersonal relationships: modeling the relationships between insiders and their employers, colleagues, friends, and family is a crucial part of identifying stressors that are often associated with malicious insider activity.
• Organizations and organizational environments: the culture and policies of organizations factor heavily into the interpretation of malicious behavior within an organization.
• Information technology security: information and information systems can be both the targets of and tools used to perpetrate malicious insider activity. IT security also contains other concepts of interest in describing the insider threat domain, namely, confidentiality, integrity, and availability.
B. Domain Scoping

With a representative list of sub-domains for insider threat enumerated, our next challenge was determining the scope at which our ontology must provide support for each sub-domain. We chose to develop the following competency questions for our ontology to assist us in our scoping efforts [9, 10].

• What concepts and relationships comprise the technical and behavioral observables of potential indicators of malicious insider activity?
• What potential indicators of malicious insider threat activity are insider threat teams using for detection?
• To facilitate information sharing, at what level of detail should organizations describe their indicators of malicious insider activity without revealing organization-sensitive information?

1) Data-Driven Ontology Bootstrapping

To ensure full coverage of the information contained in our insider incident repository, we adopted an approach that utilizes concept maps as a first step in the development of an ontology [11]. Manually developing concept maps for over 800 individual insider threat cases required an infeasible level of effort, so we developed a semi-automated concept map extraction method adapted from several existing approaches [12, 13]. This method used part-of-speech and part-of-sentence tagging to extract [concept, concept, relationship] triples from the natural language description of each insider incident. We utilized additional text and natural language processing techniques to eliminate stop words, group similar triples, and sort the triple collection by frequency of occurrence. We then used this collection of triples as the basis for our class hierarchy, using our competency questions to set scope and optimize the arrangement of specific classes.
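To make the extraction step more concrete, the sketch below approximates this kind of triple extraction with spaCy's part-of-speech and dependency tags, grouping and ranking the resulting [concept, concept, relationship] candidates by frequency. It is a simplified illustration under assumed tooling, not the extraction method actually used to build the ontology, and the example sentence is hypothetical.

```python
# Illustrative approximation of the semi-automated concept map extraction:
# a simple subject-verb-object pass over incident text using spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(text):
    """Yield [concept, concept, relationship] candidates from incident text."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [t for t in token.lefts if t.dep_ in ("nsubj", "nsubjpass")]
        objects = [t for t in token.rights if t.dep_ in ("dobj", "attr", "dative")]
        for subj in subjects:
            for obj in objects:
                # Lemmatize and lowercase so similar triples group together.
                yield (subj.lemma_.lower(), obj.lemma_.lower(), token.lemma_.lower())

# Hypothetical incident description in the style of the repository entries.
description = ("The insider transferred proprietary engineering plans from the "
               "victim organization's computer systems to his new employer.")

# Group identical triples and sort the collection by frequency of occurrence.
for triple, count in Counter(extract_triples(description)).most_common():
    print(count, triple)
```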
2) Additional Data Sources

We supplemented the candidate classes and object properties derived from our insider incident repository with concepts and relations from the cyber threat and digital forensics domains. We reviewed the Structured Threat Information eXpression (STIX) and Cyber Observable eXpression (CybOX) languages [14, 15], as well as the SANS Institute's digital forensics artifact catalog [16], to fill gaps in our concepts for cyber threats, cyber observables, and their associated forensic artifacts.

IV. IMPLEMENTATION

A. Design Decisions

We adapted components from several existing ontologies for our work. To assist in the modeling of actors and their actions, we adapted several top-level ontology components from material available on schema.org [17]. We leveraged existing ontologies for filling gaps in our coverage of cyber assets, including concepts from the network services, IT systems, IT security, and mobile device domains [18-21]. To validate our design, we used the catalog of common ontology development pitfalls from the work titled "Validating ontologies with OOPS!" [22]. We provided support for modeling the temporality of actions and events relative to one another through use of the sequence design pattern [23]. We have chosen to implement our ontology using the Web Ontology Language (OWL), due to its maturity, wide use, and extensibility [24].

B. Overview of Top-Level Classes

The top level of our ontology, summarized in Figure 2, is composed of five classes: Actor, Action, Asset, Event, and Information. The Actor class contains subclasses for representing people, organizations, and organizational components such as departments. The Action class contains the subclasses that define the things that actors can perform. The Asset class provides subclasses that define the objects of actions. The Information class provides subclasses that provide support for modeling the information contained within some assets (examples include personally identifiable information, trade secrets, and classified information). The Event class provides support for multiple types of events of interest. Events are generally associated with one or more Actions. The creation of an individual event typically requires making some inference, as opposed to an individual Action, which can be created through direct observation. For example, moving a file is modeled in our ontology as an Action. A data exfiltration event, when associated with a file move action via the hasAction object property, expresses the fact that the associated action was unauthorized. Additionally, an object property hierarchy is provided to express various types of relationship roles, job roles, and event roles.

Fig. 2. Top-level ontology classes and object properties (Actor, Action, Asset, Event, and Information, related by hasActor, hasAction, hasObject, hasOwnership, hasInstrument, hasInformation, and precedes)
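To make the structure in Figure 2 concrete, the following sketch declares the five top-level classes and several of the pictured object properties as OWL axioms using rdflib. The namespace IRI, the choice of rdflib, and the domain and range assignments are illustrative assumptions rather than the published ontology.

```python
# Illustrative sketch: the five top-level classes and several object properties
# from Figure 2 expressed as OWL axioms with rdflib. The namespace IRI and the
# domain/range assignments are placeholders, not the published ontology.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

ITIO = Namespace("http://example.org/insider-threat-indicator-ontology#")
g = Graph()
g.bind("itio", ITIO)
g.bind("owl", OWL)

# Top-level classes: Actor, Action, Asset, Event, and Information.
for name in ("Actor", "Action", "Asset", "Event", "Information"):
    g.add((ITIO[name], RDF.type, OWL.Class))

# A few Actor subclasses mentioned in the text (people, organizations, departments).
for name in ("Person", "Organization", "Department"):
    g.add((ITIO[name], RDF.type, OWL.Class))
    g.add((ITIO[name], RDFS.subClassOf, ITIO.Actor))

def object_property(name, domain, range_):
    """Declare an object property with an assumed domain and range."""
    g.add((ITIO[name], RDF.type, OWL.ObjectProperty))
    g.add((ITIO[name], RDFS.domain, domain))
    g.add((ITIO[name], RDFS.range, range_))

# Object properties relating the top-level classes (names taken from Figure 2).
object_property("hasActor", ITIO.Action, ITIO.Actor)
object_property("hasObject", ITIO.Action, ITIO.Asset)
object_property("hasInstrument", ITIO.Action, ITIO.Asset)
object_property("hasOwnership", ITIO.Actor, ITIO.Asset)
object_property("hasInformation", ITIO.Asset, ITIO.Information)
object_property("hasAction", ITIO.Event, ITIO.Action)
object_property("precedes", ITIO.Action, ITIO.Action)

print(g.serialize(format="turtle"))
```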
C. Example Uses

To demonstrate use of the ontology to describe indicators of malicious insider activity, we present two examples of translating natural language descriptions of indicators of malicious insider activity from our insider threat incident repository into ontology individuals. The translation process is relatively straightforward: the concepts from each description are manually identified, individuals are created for each concept as instances of the appropriate ontology class, and individual object properties are added to relate the class instances to one another. Figure 3 and Figure 4, respectively, depict the ontology translation for the following insider threat indicator descriptions:

• The insider transferred proprietary engineering plans from the victim organization's computer systems to his new employer.
• The insider accessed a web server with an administrator account and deleted approximately 1,000 files.

Fig. 3. Data exfiltration example from insider incident repository translated into ontology individuals

Fig. 4. Information technology sabotage example from insider incident repository translated into ontology individuals
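The sketch below shows what the first description (the Figure 3 example) might look like as concrete ontology individuals, again using rdflib. Class and property names follow Figure 2 and the discussion above; the individual names, IRIs, and typing choices are assumptions made for illustration.

```python
# Illustrative sketch: the data exfiltration indicator description translated
# into ontology individuals with rdflib. Individual names, IRIs, and typing
# choices are assumptions for illustration, not the published Figure 3 content.
from rdflib import Graph, Namespace, RDF

ITIO = Namespace("http://example.org/insider-threat-indicator-ontology#")
EX = Namespace("http://example.org/incident-0001#")  # hypothetical incident namespace
g = Graph()
g.bind("itio", ITIO)
g.bind("ex", EX)

# "The insider transferred proprietary engineering plans from the victim
#  organization's computer systems to his new employer."
g.add((EX.insider, RDF.type, ITIO.Person))
g.add((EX.victimOrganization, RDF.type, ITIO.Organization))
g.add((EX.newEmployer, RDF.type, ITIO.Organization))
g.add((EX.engineeringPlans, RDF.type, ITIO.Information))
g.add((EX.victimComputerSystem, RDF.type, ITIO.Asset))
g.add((EX.victimOrganization, ITIO.hasOwnership, EX.victimComputerSystem))
g.add((EX.victimComputerSystem, ITIO.hasInformation, EX.engineeringPlans))

# The observable action: the insider transferred the plans off the victim system.
g.add((EX.transferAction, RDF.type, ITIO.Action))
g.add((EX.transferAction, ITIO.hasActor, EX.insider))
g.add((EX.transferAction, ITIO.hasObject, EX.victimComputerSystem))

# The inferred event: associating the action with a data exfiltration event
# expresses that the transfer was unauthorized.
g.add((EX.exfiltrationEvent, RDF.type, ITIO.Event))
g.add((EX.exfiltrationEvent, ITIO.hasAction, EX.transferAction))

print(g.serialize(format="turtle"))
```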
V. APPLICATIONS

A. Insider Threat Indicator Information Sharing

Our ontology provides two powerful concepts in the description of potential indicators of malicious insider activity: abstraction and extensibility. By abstraction, we mean that indicators can now be described at a level of detail that omits organization-sensitive information while still maintaining enough descriptive information to express the idea that given observable actions or conditions are potential indicators of malicious insider activity. By extensibility, we mean that we have provided the conceptual components that organizations can use to describe their existing indicators and develop new indicators.

Potential indicators of malicious insider activity often include qualifiers such as "excessive," "anomalous," "unauthorized," and "suspicious" to distinguish conditions that are potentially indicative of malicious insider activity from "normal" behavior and activity. Definitions and interpretations of these types of conceptual qualifiers vary greatly from organization to organization, and often vary within organizations based on variables such as job type, location, and time. To accommodate these variations, we introduce the idea of "policy packs" in our ontology: modular collections of ontology axioms that represent organization-agnostic concepts, definitions, and interpretations of indicator patterns. Our ontology specifically provides support for this via the Event class hierarchy. Organizations using our ontology can develop their own defined classes, or modify existing ones, to specify the necessary and sufficient restrictions for class membership.
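As a sketch of what a single policy pack axiom might look like, the following owlready2 code defines an after-hours login as an OWL defined class, that is, a class with necessary and sufficient membership conditions that a reasoner can apply automatically. The class and property names, the ontology IRI, the hour-of-day encoding, and the 20:00 cutoff are assumptions for illustration; an organization's own policy pack would supply its own definitions.

```python
# Illustrative sketch of a single "policy pack" axiom with owlready2: one
# organization's definition of an after-hours login as an OWL defined class
# (necessary and sufficient conditions). The IRI, class and property names,
# and the 20:00 cutoff are assumptions made for illustration.
from owlready2 import (get_ontology, Thing, DatatypeProperty,
                       FunctionalProperty, ConstrainedDatatype)

onto = get_ontology("http://example.org/itio-policy-pack-example#")

with onto:
    class LoginAction(Thing):
        pass

    class has_local_hour(DatatypeProperty, FunctionalProperty):
        domain = [LoginAction]
        range = [int]

    # Necessary and sufficient conditions: any LoginAction whose local hour is
    # 20:00 or later is classified as an AfterHoursLogin by a reasoner. A real
    # policy pack would also cover early-morning hours and could use full
    # timestamps instead of an hour-of-day property.
    class AfterHoursLogin(LoginAction):
        equivalent_to = [
            LoginAction
            & has_local_hour.some(ConstrainedDatatype(int, min_inclusive=20))
        ]

# Another organization's policy pack could redefine AfterHoursLogin with a
# different time window without touching the core ontology or its individuals.
onto.save(file="itio_policy_pack_example.owl")
```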
B. Automated Indicator Instance Extraction Framework

Insider threats can be detected by observing instances of indicators of malicious insider activity within an organization. Operationally, this involves the collection and analysis of large amounts of data on every employee in an organization. Without some level of automation, this detection practice becomes infeasible to perform effectively and efficiently. Using our ontology, we have designed a semi-automated approach for the detection of potential indicators of malicious insider activity that fuses data from multiple types of sources. The ontology provides an analysis hub that combines information from an organization's enterprise network activity and human resources data to provide a data-rich environment for the development and detection of robust, effective indicators of malicious insider activity.

1) Operational Data to Ontology Individuals

We use the term "operational data" to encapsulate the data and data sources that capture the user-based activity that occurs on an organization's information systems and networks. The technical observables associated with some potential indicators of malicious insider activity are found in operational data and during the analysis of trends in operational data. Some examples of operational data include:

• Host-based user activity logs
• Critical application audit logs
• Network activity logs
• Communication server logs
• System event logs

Since operational data is usually found in structured or semi-structured log files, we attempted to prove the concept of automatically translating the information contained in operational data sources into ontology individuals. Instead of direct translation into ontology individuals from operational data sources, we chose to translate the operational data into CybOX cyber observable files, and automatically create ontology individuals based on the contents of the CybOX files. This approach allowed us to focus on identifying the fields from CybOX that were applicable to our ontology classes, and provide a translation mechanism for only those applicable fields. Without the CybOX translation layer, we would have had to develop ontology translation mechanisms for each type of operational data source we wish to support, which would require an infeasible level of effort, support, and maintenance. Additionally, CybOX provides an API for its XML file format, which facilitates the automated translation of any input data source into the CybOX format. (CybOX currently supports over 60 input data sources.)

In our proof of concept, we were successful in automatically translating Windows system event logs into the CybOX format, and, using simple scripts, automatically generating the OWL XML code to create individuals for a small subset of our ontology classes. In a robust implementation, the automated ontology individual creation would provide configurable settings that would allow organizations to control the creation of ontology individuals for classes whose specific definitions may vary from organization to organization. For example, if the ontology contained a class representing after-hours logins, the automated individual creation mechanism should provide a way to specify a time range that is considered after-hours.
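A minimal sketch of the final step of that pipeline, creating ontology individuals from a normalized log record, is shown below using rdflib. The CybOX layer is abstracted away: the record is a hypothetical, already-parsed logon event standing in for the relevant CybOX observable fields, and names such as LoginAction and hasTimestamp are assumptions rather than classes from the published ontology.

```python
# Illustrative sketch of the last step of the ingest pipeline: mapping the
# fields of an already-parsed log record onto ontology individuals with rdflib.
# The CybOX layer is elided; the record below is a hypothetical, normalized
# logon event, and LoginAction/hasTimestamp are assumed names.
import uuid
from rdflib import Graph, Literal, Namespace, RDF, XSD

ITIO = Namespace("http://example.org/insider-threat-indicator-ontology#")
DATA = Namespace("http://example.org/operational-data#")

# Hypothetical pre-parsed Windows logon event (standing in for the fields that
# would be selected from the corresponding CybOX observable).
record = {
    "event_type": "logon",
    "account": "jdoe",
    "host": "web-server-01",
    "timestamp": "2014-06-15T22:43:10",
}

def record_to_individuals(rec, graph):
    """Create Action, Actor, and Asset individuals for one normalized record."""
    action = DATA[f"action-{uuid.uuid4().hex}"]
    actor = DATA[f"account-{rec['account']}"]
    asset = DATA[f"host-{rec['host']}"]

    graph.add((action, RDF.type, ITIO.LoginAction))   # assumed Action subclass
    graph.add((actor, RDF.type, ITIO.Person))
    graph.add((asset, RDF.type, ITIO.Asset))
    graph.add((action, ITIO.hasActor, actor))
    graph.add((action, ITIO.hasObject, asset))
    graph.add((action, ITIO.hasTimestamp,             # assumed datatype property
               Literal(rec["timestamp"], datatype=XSD.dateTime)))
    return action

g = Graph()
g.bind("itio", ITIO)
g.bind("data", DATA)
record_to_individuals(record, g)
print(g.serialize(format="turtle"))
```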
2) Human Resources Data to Ontology Individuals

We use the term "human resources data" to encapsulate data and data sources that provide contextual and behavioral information about employees. These records are typically stored in an unstructured format, and are locked within Human Resources departments to protect the privacy rights of employees. Examples of human resources data include:

• Organization charts
• Employee performance reviews
• Employee personnel files, including job title, supervisor, role, and responsibilities
• Employee behavior records, such as formal reprimands and policy violations
• Information from anonymous insider reporting channels
• Results of background checks

Human resources data provides a rich source of contextual, behavioral, and psychosocial information regarding employees. Human resources data is typically more fragmented and less structured than operational data, so the automated translation of this data into ontology individuals may be a challenge for some organizations. Enterprise solutions for human resource information management exist, and where they are used, a structured representation of human resources data could be used to develop an automated ontology translation process. In our proof of concept for the automated indicator instance extraction framework, we did not attempt to automatically create ontology individuals from human resources data, but in future work, we will apply an approach similar to the one we used for operational data.

3) Semantic Reasoner

If operational data and human resources data are both described using the ontology, and if indicator policy packs are in place, an organization can use a semantic reasoner to make inferences and automatically classify ontology individuals as instances of specific defined classes. Ontology individuals that meet the formal definitions of potential indicators of malicious insider activity can then be said to have "satisfied" some indicator. A collection of ontology individuals that satisfy threat indicators becomes a useful data set for insider threat analysts. With a robust set of indicators implemented as defined classes, analysts have the ability to see descriptions of potential indicators of malicious insider activity across previously disparate data sets and at larger scale. Satisfied indicators can be reviewed by analysts to identify false positives, refine indicators, develop new indicators to add back into the ontology via policy packs, or create threat reports that summarize the potential malicious insider activity found in the data.
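A sketch of this classification step using owlready2 is shown below: it loads the core ontology, a policy pack, and the translated individuals, runs a reasoner, and enumerates the individuals that satisfy a defined indicator class. The file paths and the AfterHoursLogin class are the assumptions carried over from the earlier sketches, not components of the actual framework.

```python
# Illustrative sketch of the classification step with owlready2: load the core
# ontology, a policy pack, and the translated individuals, run a reasoner, and
# list the individuals that now satisfy a defined indicator class. File paths
# and the AfterHoursLogin class carry over from the earlier sketches and are
# assumptions; sync_reasoner() invokes the bundled HermiT reasoner and needs Java.
from owlready2 import get_ontology, sync_reasoner

core = get_ontology("file:///opt/itio/itio-core.owl").load()
policy_pack = get_ontology("file:///opt/itio/itio_policy_pack_example.owl").load()
individuals = get_ontology("file:///opt/itio/operational-individuals.owl").load()

# Run the reasoner; inferred class memberships are asserted in place.
with core:
    sync_reasoner()

# Any individual inferred into a defined indicator class has "satisfied" that
# indicator and is queued for analyst review.
indicator = policy_pack.AfterHoursLogin
for individual in indicator.instances():
    print("Satisfied indicator:", indicator.name, "->", individual.name)
```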
4) Putting it All Together

The full framework—beginning with the development and maintenance of the ontology through the release of organizational threat reports based on the detected instances of potential indicators of malicious insider activity—is presented in Figure 5. This framework is meant to support detection of potential indicators of malicious insider activity that is then triaged. An effective implementation of the framework depends on the indicators it contains, and not all satisfied indicators necessarily warrant an investigation.

Fig. 5. Data flow diagram for automated indicator instance extraction framework

The evaluation of specific instances of indicators requires expert analysis and investigation to remove false positives, assess the severity of the satisfied indicator, and perform set and temporal analysis on the satisfied indicators. The framework can support a workflow-based analysis and incident escalation process. Specific implementations of the framework are expected to grow and change as the organization, its insider threat program, and the larger insider threat community and domain all do the same. The activities associated with the operations and maintenance of this framework include:

• Identifying new candidate indicators during the analysis of satisfied indicators
• Adding new indicators to the ontology as updates or additions to indicator policy packs
• Re-running the semantic reasoner as new ontology individuals are created and new indicators are added
• Adding automated ingest support for new operational data sources
• Extending the human resources data ingest process to include new data sources
• Updating the configuration for the automated ontology individual extractor as organizational policies change and new insights are gained

In addition to the activities mentioned above, the ontology itself will grow and change over time. The drivers for ontology changes will be the addition of new concepts and relationships based on analysis of new cases involving malicious insider activity, as well as feedback from the organizations that are using the ontology. Finally, indicator policy packs can be safely shared with other organizations as a means of identifying effective industry-specific and domain-wide detection strategies and patterns.

VI. CONCLUSION

With the initial development of our ontology, we have created a bridge between natural language descriptions of potential indicators of malicious insider activity in case data and the operational data that contains the technical and behavioral observables associated with malicious insider activity. We have provided a mechanism that allows sensitive information to be abstracted away while maintaining enough descriptive ability to effectively communicate actions and behaviors of interest across organizations. By introducing the application of our ontology as an analysis hub that combines operational and human resources data, we have laid the foundation for more effective fusion of these traditionally disparate data sources.

VII. FUTURE WORK

As we continue the development of our ontology, we will perform the following activities in future work:

• Provide enhanced support for behavioral components of potential indicators of malicious insider activity
• Collaborate with other organizations to improve the expression of insider threat indicators using the ontology
• Add support for additional indicator policy packs
• Mature the proof-of-concept automated indicator instance extractor and provide customization options for additional data sources and organization configurations
• Assess the feasibility of automating the creation of ontology individuals based on human resources data
• Evaluate formal ontology validation methods and apply them to our ontology

ACKNOWLEDGEMENT

The authors gratefully acknowledge support for this work from the Defense Advanced Research Projects Agency (DARPA) and the Federal Bureau of Investigation. The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Approved for Public Release; Distribution Unlimited.

Copyright 2014 Carnegie Mellon University. DM-0001586
REFERENCES

[1] D. M. Cappelli, A. P. Moore, and R. F. Trzeciak, The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud). Pearson Education, 2012.
[2] U.S. Government, "Executive Order 13587-Structural Reforms To Improve the Security of Classified Networks and the Responsible Sharing and Safeguarding of Classified Information," 2011.
[3] B. Obama, "National Insider Threat Policy and Minimum Standards for Executive Branch Insider Threat Programs," The White House, Office of the Press Secretary, 2012.
[4] Federation of American Scientists, "National Insider Threat Policy and Minimum Standards for Executive Branch Insider Threat Programs (Minimum Standards)," 2012. Available: www.fas.org
[5] M. West, Developing High Quality Data Models. Elsevier, 2011.
[6] Intelligence and National Security Alliance (INSA), in partnership with DHS and ODNI. (2014). Insider Threat Resource Directory. Available: http://www.insaonline.org/insiderthreat
[7] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, pp. 25-29, 2000.
[8] S. Schulze-Kremer, "Adding semantics to genome databases: towards an ontology for molecular biology," in ISMB, 1997, p. 5.
[9] M. Grüninger and M. S. Fox, "The role of competency questions in enterprise engineering," in Benchmarking—Theory and Practice, Springer, 1995, pp. 22-31.
[10] A. Gangemi, "Ontology design patterns for semantic web content," in The Semantic Web–ISWC 2005, Springer, 2005, pp. 262-276.
[11] R. R. Starr and J. M. P. de Oliveira, "Conceptual maps as the first step in an ontology construction method," in Enterprise Distributed Object Computing Conference Workshops (EDOCW), 2010 14th IEEE International, 2010, pp. 199-206.
[12] K. Žubrinic, "Automatic creation of a concept map."
[13] J. J. Villalon and R. A. Calvo, "Concept Map Mining: A definition and a framework for its evaluation," in Web Intelligence and Intelligent Agent Technology, 2008 (WI-IAT '08), IEEE/WIC/ACM International Conference on, 2008, pp. 357-360.
[14] S. Barnum, "Standardizing cyber threat intelligence information with the Structured Threat Information eXpression (STIX™)," MITRE Corporation, July 2012.
[15] The MITRE Corporation. (2014). Cyber Observable eXpression. Available: http://cybox.mitre.org/language/version2.1/
[16] R. Lee, "SANS Digital Forensics and Incident Response Poster Released," SANS Digital Forensics and Incident Response Blog, SANS, 2012.
[17] schema.org. Available: http://schema.org
[18] J.-b. Gao, B.-w. Zhang, X.-h. Chen, and Z. Luo, "Ontology-based model of network and computer attacks for security assessment," Journal of Shanghai Jiaotong University (Science), vol. 18, pp. 554-562, 2013.
[19] S. Fenz and A. Ekelhart, "Formalizing information security knowledge," in Proceedings of the 4th International Symposium on Information, Computer, and Communications Security, 2009, pp. 183-194.
[20] L. Obrst, P. Chase, and R. Markeloff, "Developing an ontology of the cyber security domain," in Proceedings of Semantic Technologies for Intelligence, Defense, and Security (STIDS), 2012, pp. 49-56.
[21] S. E. Parkin, A. van Moorsel, and R. Coles, "An information security ontology incorporating human-behavioural implications," in Proceedings of the 2nd International Conference on Security of Information and Networks, 2009, pp. 46-55.
[22] M. Poveda-Villalón, M. C. Suárez-Figueroa, and A. Gómez-Pérez, "Validating ontologies with OOPS!," in Knowledge Engineering and Knowledge Management, Springer, 2012, pp. 267-281.
[23] A. Gangemi. (2010). Submissions:Sequence. Available: http://ontologydesignpatterns.org/wiki/Submissions:Sequence
[24] G. Antoniou and F. van Harmelen, "Web Ontology Language: OWL," in Handbook on Ontologies, Springer, 2004, pp. 67-92.