=Paper=
{{Paper
|id=Vol-2484/paper5
|storemode=property
|title=Technology Assisted Analysis of Timeline and Connections in Digital Forensic Investigations
|pdfUrl=https://ceur-ws.org/Vol-2484/paper5.pdf
|volume=Vol-2484
|authors=Hans Henseler,Jessica Hyde
|dblpUrl=https://dblp.org/rec/conf/icail/HenselerH19
}}
==Technology Assisted Analysis of Timeline and Connections in Digital Forensic Investigations==
Hans Henseler∗, Magnet Forensics, Waterloo, Canada and University of Applied Sciences Leiden, The Netherlands, hans.henseler@magnetforensics.com

Jessica Hyde, Magnet Forensics, Waterloo, Canada and George Mason University, Fairfax VA, USA, jessica.hyde@magnetforensics.com

===ABSTRACT===
This article describes ongoing research on the application of AI techniques such as Graph Neural Networks to assist investigators with the discovery of relations and patterns in digital forensic evidence. Digital forensic analysis of smartphones and computers reveals forensic artifacts that are extracted from structured databases maintained by the operating system and applications. Such forensic artifacts are part of a forensic ontology which can be used to build a relational graph of identifiers (e.g. users, documents) and a timeline of events. This information can assist with answering key investigation questions such as who, when, where etc. We propose to use a graph database and query language to assist in this analysis. Further, using key identifiers and aliases we want to augment digital forensic artifacts with entities, relations and events by extraction from the full-text of unstructured electronic contents such as emails and documents.

===CCS CONCEPTS===
• Computing methodologies → Semantic networks; Neural networks; • Applied computing → Law; Investigation techniques; • Information systems → Users and interactive retrieval; • Human-centered computing → Visualization toolkits.

===KEYWORDS===
Digital Forensics, AI, Link analysis, Timeline, Technology Assisted Discovery, Graph databases, Text Mining, Graph Neural Networks

===1 INTRODUCTION===
Digital evidence continues to grow exponentially in investigations and prosecution of suspects in both criminal and civil cases. This holds not only for advanced cybercrime investigations, such as ransomware investigations or incident response, but also for the use of digital forensics in homicide cases or internal (corporate) investigations where the suspect's smartphone and/or laptop needs to be examined. Smartphones and other portable "wearable" electronics leave digital traces that can be linked to persons and locations. The exponential growth of digital traces, as well as the expansion of cybercrime, and digitization of investigative methods represent significant changes to society and lead to a broadening horizon of digital investigation [9].

This article presents work in progress on research that focuses on the use of Artificial Intelligence (AI) as an emerging technology that can assist forensic examiners in the discovery of patterns and relations in digital evidence. It builds further on the ideas presented in earlier work on computer assisted extraction of identities in digital forensics [15], on Semantic Search for E-Discovery [20] and [12], on finding digital evidence in mobile devices [14] and on the link and timeline analysis that is present in modern digital forensic tools [7].

Our vision differs from existing applications of AI in E-Discovery that typically rely on machine learning for classifying digital content, such as predictive coding and active learning [11] that filter and cluster emails, chats and documents, or classification of pictures with weapons, drugs and nudity. Instead we attempt to apply AI in the discovery of relevant relations in temporal connection graphs that are derived from extracted digital forensic artifacts.

Smartphones and Internet of Things (IoT) devices contain many other digital traces that are a treasure trove in a forensic investigation. Such traces can prove to be more personal than written communication because they do not only reveal our conscious but also our unconscious behavior. Also smartphones have become very personal because of their link with social media and biometric protection (e.g. fingerprint, iris). However, this type of information is machine generated and grows at an even faster pace than our personal communication. Forensic investigations are in need of more effective search strategies that can leverage the richness of detailed forensic information in modern digital evidence (e.g. from smartphones, cloud, IoT devices etc.). We propose that investigators are assisted with discovery through using semantic nets that are obtained from digital evidence. We refer to this as technology assisted discovery as opposed to technology assisted review that is very common in E-Discovery investigations.

This paper is structured as follows. Section 2 describes digital forensic investigations and the key questions that are relevant when investigating a case. It also describes related work on a digital forensics ontology that can assist when taking a semantic AI approach and explains some use cases showing why this is helpful when investigating digital evidence. In section 3 we explain modern digital forensic investigations and illustrate how link analysis and timeline visualisation are currently assisting forensic examiners in digital forensic investigations. Section 4 presents our vision on how AI techniques such as graph databases and entity extraction can help discovering patterns and relations in these semantic networks of digital artifacts. Finally, in section 5 we present conclusions and identify future research opportunities.

∗ Corresponding author.

In: Proceedings of the First International Workshop on AI and Intelligent Assistance for Legal Professionals in the Digital Workplace (LegalAIIA 2019), held in conjunction with ICAIL 2019. June 17, 2019. Montréal, QC, Canada. Copyright ©2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Published at http://ceur-ws.org.

===2 DIGITAL FORENSIC INVESTIGATION===
Digital forensic investigation typically has three phases: data collection, data examination and data analysis. Data collection involves the correct preservation and copying of digital data sources. Data examination relates to the investigation of copies of digital data sources to find files, extract fragments etc. without interpreting the resultant findings in the context of the case. Data analysis involves the analysis, reconstruction, interpretation and qualification of the evidence which is obtained from the digital data sources. The research proposed here focuses on the analysis of digital evidence.

====2.1 Investigation Questions====
In any investigation the investigators, regardless of whether they are senior legal counsel in legal E-Discovery or senior investigating officers or detectives in a criminal investigation, try to answer the following 'golden' investigation questions:

# Who – was involved?
# What – happened?
# Where – did it happen?
# How – was the crime committed?
# When – did the crime take place?
# With what – was the crime committed?
# Why – was the crime committed?

The analysis of digital evidence in E-Discovery investigations typically focuses on document review and analysis where reviewers and senior investigators analyse textual content. They are assisted by machine learning (also known as predictive coding and continuous active learning [11]) to identify relevant emails and documents to speed up their investigation. Digital forensic investigations on smartphones and computers are a bit different. Here investigators go beyond email and document analysis and study digital artifacts that can be quite pertinent when trying to answer these questions [14].

Who-questions can often be answered by investigating which person is using an e-mail address, user account or phone number. Communication via text messages, chat and email may help to understand what has happened. Call detail records, GPS locations and WiFi networks tell something about the location of a smartphone and consequently of its user. Pictures and video can provide visual clues as to how a crime was committed and with what kind of weapon. The date and time of a file or trace tell when data was last accessed, modified or created. Computers and smartphones maintain detailed records of when apps and users were active and which files were involved. Besides messages that a user communicated via emails and chat messages, search history from a browser or specific apps can help understand motive and premeditation.

====2.2 Digital Forensics Ontology====
Document analysis in E-Discovery heavily relies on the review and analysis of unstructured information that is contained in emails and documents. The analysis of digital forensic artifacts described above is more structured. In order to understand this structure and to be able to analyse it, it is useful to have a digital forensics ontology. The Cyber-investigation Analysis Standard Expression (CASE) [10] provides such an ontology. CASE is an open standard that is currently under development. It can be used to describe different types of digital evidence from various domains such as incident response, counter terrorism, criminal investigations, forensic investigations and gathering of intelligence. CASE enables better coordination of investigations in different jurisdictions so that criminal individuals and organisations are discovered faster while generating a more complete overall view on their criminal activities.

Once a semantic network has been formed based on a digital forensics ontology, it can assist with identifying possible crime scenarios and with testing hypotheses, which is becoming increasingly important in investigations. Sometimes it is more important to know with whom a victim, suspect or witness communicated, and where these persons were, than actually knowing what has been communicated. AI can assist investigators with detecting correlations that can lead to the discovery of relationships that were not known. This is called link analysis. Analysing a social network from a collection of emails is not new but link analysis based on digital forensic artifacts relies on a much richer set of data.

====2.3 Use cases====
Modern digital forensic tools have a feature that performs link analysis to assist forensic examiners with their investigations. Axiom is a commercial digital forensics processing tool created by Magnet Forensics that builds a connections database from relationships between discovered artifacts (e.g. users, files etc.). Triples (subject, predicate, object) are extracted following a forensic ontology similar to the CASE ontology introduced in the previous section. These triples define a forensics ontology that is used by Axiom to automatically generate relationship graphs.

{| class="wikitable"
|+ Table 1: Subset of triples of the forensic ontology that is used in Axiom
! Subject !! Predicate !! Object
|-
| file || accessed on || system
|-
| file || accessed on || USB
|-
| file || accessed by || user id
|-
| file || transferred with || program name
|-
| file || transferred by || user id
|-
| file || related || cloud
|-
| file || emailed to || email address
|-
| file || downloaded with || program name
|-
| file || downloaded by || user id
|-
| contact name || contacted with || device
|-
| contact name || contacted by || person
|-
| picture hit || similar to || picture hit
|-
| file/msg || contains || key words
|-
| file/msg || references || file name
|-
| call log || call to || contact name
|-
| user id || used || program name
|-
| user id || searched for || key words
|}

Link analysis has interesting use cases for forensic examiners:

(1) Given a hit, the examiner needs to see a visual representation of all related evidence, where the 'related' links are one of the concepts identified in the forensics ontology.
(2) Given a link to related evidence, the examiner should be able to follow the link and may want to pivot the data around the destination or choose a different visualisation. For example, the examiner identified a search query in browser history and then wants to review all events on the system before and after this query was executed.

Table 1 above lists a simplified digital forensics ontology illustrating a number of triples that make up the key forensics ontology that is used by Axiom to build a relation graph from digital evidence.

===3 EXPERIMENT===
To validate the idea and explore the potential of AI for assisting with the discovery of patterns and relations in digital forensics data, we have processed the M57-Jean scenario [6] in Axiom (version 3.0).

The M57-Jean scenario is a single disk image scenario involving the exfiltration of corporate documents from the laptop of a senior executive. The scenario involves a small start-up company, M57.Biz. A few weeks into inception a confidential spreadsheet that contains the names and salaries of the company's key employees was found posted to the "comments" section of one of the firm's competitors. The spreadsheet "m57biz.xls" only existed on the laptop of one of M57's officers, Jean. Jean says that she has no idea how the data left her laptop and that she must have been hacked. The investigator has been given a disk image of Jean's laptop and is asked to figure out how the data was stolen, or whether Jean isn't as innocent as she claims.

====3.1 Axiom Link analysis====
Link analysis in itself is not a new concept in digital forensics, as is reflected by work published in 2015 [8], and it was introduced earlier, in 2005, in the field of network forensics [21]. However, it tends to focus on traditional 'call chain analysis', focusing on phone calls, text messages, and/or social media connections or IP addresses between people or computers rather than the artifacts they create [7].

Artifact relationship analysis goes beyond visualizing relationships between people and computers. It applies the link analysis concept to files and operating system artifacts, helping a forensic examiner to visualize relationships within artifacts and across evidence sources such as computers, mobile devices, and even cloud-based accounts.

Figure 1: Link analysis example in Axiom on the M57-Jean scenario showing the "m57biz.xls" file in the center and highlighting one of the records in the Windows MRU (most recently used) list.

Figure 1 presents an example of link analysis in Axiom. This example was discussed in a Magnet Forensics webinar [16]. The tree-like structure on the left side shows the file name of a spreadsheet "m57biz.xls". It shows various relations to other elements, e.g., "Transferred by" and identifier "Jean User", "Hash hash" with an md5 as well as a sha1 hash value, an "Application name" relation with an application named "Outlook" etc. The right side of the picture displays matching results. This overview lists records from the Windows MRU (Most Recently Used) list, file system last accessed date, Outlook email record etc. Axiom allows the user to navigate the graph manually by selecting an end node and making it the center node by double clicking.

====3.2 Axiom Timeline analysis====
In [13] an overview is presented of the evolution of timeline analysis in digital forensics. Initially, timeline analysis was focused on file-based dates and times. Around 2010 the first tools became available that started using times from inside files. Modern digital forensic tools (both open source and commercial) have advanced timeline capabilities that visualise digital forensic artifacts.

Figure 2: Link analysis illustrating a relative time selection of the MRU artifact highlighted in figure 1

Figure 2 presents a screenshot of the new timeline visualisation and analysis feature in Axiom 3.0. The top section shows a timeline reflecting artifact counts for a period of 6 minutes starting from July 20, 2008 1:24:40am and ending 1:30:40am. The table below the timeline presents a detailed view of the artifacts presented in the graph. At the top is a "File download" record from email, followed by "File/folder opening", then the "File knowledge" reflecting the creation of a new file "m57biz.xls". Such a sequence of artifacts may help understand how a file came into existence on a computer and if it was opened on that computer.

Generation of timelines has also received much attention outside the field of digital forensics. Many applications exist that allow for creation of timelines in an investigation, for example building case chronologies with CaseFleet [2], creating a timeline for a court case with TrialLine [5] and assembling case facts in chronological order with CaseMap [3]. However, our first impression is that each one of these tools relies on manual development of case timelines without the help of artificial intelligence.

Both timeline analysis and link analysis are (separately from each other) considered powerful instruments in an investigation. However, we propose that in combination these features become even more powerful, enabling an examiner to analyse links in the relation graph in chronological order, which provides more meaning and context than simply filtering a timeline for selected entities or filtering the relation graph for a particular time frame.

===4 PROPOSED RESEARCH===
Our research focuses on the combination of timeline and link analysis. In order to accomplish this we propose to use a graph database with a graph query language. The graph can initially be constructed from forensic artifacts. With modern graph databases and graph query languages it becomes easy to augment this graph with additional data. This could include data from non-digital sources. In addition, by text mining the full-text of electronic documents and emails, new relations might be uncovered that previously would have required human inspection of the contents of such documents.

====4.1 Graph database and language====
Visualisation of traces in a network, on a map or on a timeline can assist a forensic investigator to understand the story that is behind the data. By ingesting the information that is extracted by Axiom in the M57-Jean case in a graph database, it becomes possible to experiment further with visualisations and discovering relations.

Cypher is a graph query language that allows for expressive and efficient querying of graph data [1]. It lets developers write graph queries by describing patterns in the data. Cypher is designed to be a human readable query language, so for a graph describing our digital forensic artifacts it is suitable for both developers and forensic examiners.

Cypher describes nodes, relationships and properties as ASCII art directly in the language, making queries easy to read and recognize as part of your graph data. Figure 3 below presents an example of a simple Cypher query.

Figure 3: Example of a Cypher query

Cypher is supported by a variety of graph databases. We intend to use Neo4j [4] for our experiments, which will start with modeling a relational graph based on a selection of the digital forensics ontology that is used by Axiom. Then we will investigate how easy it is for examiners to formulate Cypher queries and which standard queries can be formulated to identify interesting relationships that can be prioritized for review.

====4.2 Integration with other information sources====
Once the digital artifacts from a case have been imported in the graph database, it becomes quite simple to add relations and objects in the same case that were discovered through other sources. These can either be other sources of digital evidence, e.g., other cases or call detail records, or non-digital sources such as witness and victim statements, lawful interception, observation, open source intelligence or case timelines that were manually created with the assistance of software such as mentioned in section 3.

By leveraging the scalability of modern graph databases a great variety of additional information can be included in the automated analysis [18]. Further research is required to investigate what other information (that is typically available in a criminal investigation) can be combined with the digital forensics graph in a useful way. We expect that to some extent even scenarios and hypotheses can be formulated as (a set of) graph queries which can be tested against the graph containing all known information on the case.

====4.3 Extracting relations and timelines from full text====
More than 90% of the information around us is unstructured, e.g., documents, emails and chat messages. Text mining can help investigators by turning this unstructured information into structured data. Entity extraction can extract entities (e.g. names of people, organisations, places etc.) and events from full text. Unfortunately the extraction of entities is error prone and generates many false positives, making the results useless. By using identifiers that have been discovered from digital forensic analysis the entity extraction can be targeted, reducing the number of false positive identities.

Some interesting work in the field of entity-centric timeline extraction has been reported in [17]. A prototype tool is being developed that can extract structured information on events for a given entity of interest and place anchors on a timeline for these events. It uses massive streams of textual documents as input (e.g. online news, social media posts or any crawled web documents). With digital forensics it is already possible to extract identities from structured information through digital forensic analysis [15]. When an examiner identifies an interesting identity, this identity will probably have associated email aliases, accounts, phone numbers etc. Once this information is known it can be added to the relation graph and will help in extracting a timeline of events that are related to these identities.

====4.4 Using AI to understand graphs and timelines====
Analysing a graph using a visualisation tool seems simple enough. As graphs get bigger, traditional mathematics can help with the analysis of the graph but these methods also have their limitations. One of the problems is that there is no clear beginning or ending of a graph (assuming it is cyclic) and that large scale matrix operations that are typically required for graph analysis do not compute due to memory and time restrictions.

Ontologies, graph databases and graph query languages are well established and are hardly considered AI techniques. Extracting relations and timelines from full text are well established AI techniques that we hope to leverage in our research but we have no intention of improving them. The core idea in our innovation is to use Graph Neural Networks (GNNs) as a new AI technique that can assist with the analysis of large time-based graphs of relations. GNNs were first introduced in 2009 [19] and have recently gained increasing popularity in various domains, including social science (social networks), natural science and knowledge graphs [22]. Similar to the successful application of Convolutional Neural Networks (CNNs) in image classification and Recurrent Neural Networks (RNNs) in natural language processing, variations of GNNs have demonstrated ground-breaking performance on many tasks.

Our research hypothesis is that we can use GNNs to model interesting relation graphs which can assist investigators with the identification of relevant subgraphs from a highly complex case graph that is automatically constructed from digital forensic artifacts combined with other case data.

===5 CONCLUSIONS===
We propose to use a graph database and query language to assist in digital forensic investigations. We start with a relation graph that is based on connections from digital forensic artifacts. Further research and experiments are needed to study how forensic examiners can interact with this graph and how to extend the graph with other data sources. In particular we intend to study how events on a timeline can be added to the graph, how information from non-digital evidence can be added and how we can improve the performance of existing entity extraction techniques on unstructured data from emails and documents. Finally, we want to research if new machine learning techniques such as GNNs can be used to learn from investigators what link and event patterns are interesting from an investigator perspective.

===REFERENCES===
[1] [n. d.]. About Cypher. Adapted from https://www.opencypher.org/about, Accessed: 2019-04-22.
[2] [n. d.]. CaseFleet: Building Powerful Case Chronologies with CaseFleet. Company website, https://www.casefleet.com/timelines-case-timeline-software, Accessed: 2019-05-22.
[3] [n. d.]. CaseMap: Chronology best Practices. Product page, https://www.casesoft.com/download/chrons.pdf, Accessed: 2019-05-22.
[4] [n. d.]. Neo4j: The Internet-Scale Graph platform. Neo4J website, https://neo4j.com/product/, Accessed: 2019-04-23.
[5] [n. d.]. TrialLine: Legal Timelines for Your Court Case. Company blog, https://blog.trialline.net/legal-timelines-for-your-case-in-court, Accessed: 2019-05-22.
[6] 2012. M57-Jean Scenario. In Digital Corpora. Scenario published at the Digital Corpora website, https://digitalcorpora.org/corpora/scenarios/m57-jean, Accessed: 2019-04-15.
[7] 2018. Telling the Story of Digital Evidence.
(2018). Magnet Forensics blog, https://www.magnetforensics.com/blog/telling-the-story-of-digital-evidence/, Accessed: 2019-04-22.
[8] Fergal Brennan, Martins Udris, and Pavel Gladyshev. 2015. An Automated Link Analysis Solution Applied to Digital Forensic Investigations. https://doi.org/10.1007/978-3-319-14289-0_13
[9] E. Casey. 2017. The broadening horizons of digital investigation. Editorial of Digital Investigation 21 (2017), 1–2.
[10] E. Casey, S. Barnum, R. Griffith, J. Snyder, H. Van Beek, and A. Nelson. 2017. Advancing coordinated cyber-investigations and tool interoperability using a community developed specification language. Digital Investigation 22 (2017), 14–15.
[11] Gordon V. Cormack and Maura R. Grossman. 2015. Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review. In SIGIR 2015 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[12] David Graus. 2017. Entities of Interest. Ph.D. Dissertation. Informatics Institute, University of Amsterdam.
[13] C. Hargreaves and J. Patterson. 2012. An automated timeline reconstruction approach for digital forensic investigations. Digital Investigation 9 (2012), S69–S879.
[14] H. Henseler and V. Noort. 2017. Finding Digital Evidence in Mobile Devices. (2017). Presentation at DFRWS US 2017 conference, https://www.dfrws.org/conferences/dfrws-usa-2017/sessions/finding-digital-evidence-mobile-devices, Accessed: 2019-04-03.
[15] Jop Hofste, Hans Henseler, and Maurice van Keulen. 2013. Computer assisted extraction, merging and correlation of identities with Tracks Inspector. In Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL 2013). 247–248. https://doi.org/10.1145/2514601.2514639 Demo-paper.
[16] J. Hyde. 2018. Connecting the Dots Between Artifacts and User Activity. (2018). Recorded webinar, https://www.magnetforensics.com/resources/connecting-artifacts-property-theft-webinar/, Accessed: 2019-04-23.
[17] Jakub Piskorski, Vanni Zavarella, and Martin Atkinson. 2018. On the Development of an Entity-Centric Timeline Extraction Tool. 821–824. https://doi.org/10.1109/ASONAM.2018.8508798
[18] G. Sadowski and P. Rathle. 2016. Why Modern Fraud Detection Needs Graph Database Technology. (2016). Neo4j blog, https://neo4j.com/blog/fraud-detection-graph-database-technology/, Accessed: 2019-04-22.
[19] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The Graph Neural Network Model. Trans. Neur. Netw. 20, 1 (Jan. 2009), 61–80. https://doi.org/10.1109/TNN.2008.2005605
[20] David van Dijk, David Graus, Zhaochun Ren, Hans Henseler, and Maarten de Rijke. 2015. Who is involved? Semantic search for e-discovery. In ICAIL 2015 Workshop on Using Machine Learning and Other Advanced Techniques to Address Legal Problems in E-Discovery and Information Governance (DESI VI Workshop).
[21] W. Wang and T. Daniels. 2005. Network Forensics Analysis with Evidence Graphs. (2005). Published in the proceedings of the DFRWS US 2005 conference, https://www.dfrws.org/sites/default/files/session-files/paper-network_forensics_analysis_with_evidence_graphs.pdf, Accessed: 2019-04-22.
[22] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph Neural Networks: A Review of Methods and Applications. (12 2018). ResearchGate, https://www.researchgate.net/publication/329841448_Graph_Neural_Networks_A_Review_of_Methods_and_Applications, Accessed: 2019-05-31.
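
As a rough illustration of the combination of link and timeline analysis proposed in Sections 2.3 and 4, the following sketch stores timestamped triples in a small in-memory graph and lists the links around one node inside a relative time window. It is plain Python standing in for the Neo4j and Cypher setup the authors intend to use. The record layout, the function names `build_graph` and `links_in_window`, and all sample values (timestamps, the e-mail address) are assumptions made for this example; they are not taken from Axiom or from the M57-Jean image.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical artifact records: (subject, predicate, object, timestamp).
# Predicates follow Table 1; every value and timestamp below is invented.
TRIPLES = [
    ("m57biz.xls", "downloaded with", "Outlook", datetime(2008, 7, 20, 1, 25, 0)),
    ("m57biz.xls", "accessed by", "Jean", datetime(2008, 7, 20, 1, 27, 30)),
    ("m57biz.xls", "emailed to", "someone@example.com", datetime(2008, 7, 20, 1, 28, 10)),
    ("Jean", "searched for", "salaries", datetime(2008, 7, 19, 22, 0, 0)),
    ("Jean", "used", "Outlook", datetime(2008, 7, 20, 1, 24, 50)),
]


def build_graph(triples):
    """Index every triple under both of its end nodes (undirected view)."""
    graph = defaultdict(list)
    for subject, predicate, obj, when in triples:
        edge = (subject, predicate, obj, when)
        graph[subject].append(edge)
        if obj != subject:  # avoid listing a self-loop twice
            graph[obj].append(edge)
    return graph


def links_in_window(graph, node, anchor, before, after):
    """Return the edges touching `node` whose timestamp lies in
    [anchor - before, anchor + after], in chronological order."""
    start, end = anchor - before, anchor + after
    hits = [edge for edge in graph.get(node, []) if start <= edge[3] <= end]
    return sorted(hits, key=lambda edge: edge[3])


graph = build_graph(TRIPLES)
anchor = datetime(2008, 7, 20, 1, 27, 30)  # the artifact the examiner selected
window = links_in_window(
    graph, "m57biz.xls", anchor,
    before=timedelta(minutes=3), after=timedelta(minutes=3),
)
for subject, predicate, obj, when in window:
    print(when.isoformat(), subject, predicate, obj)
```

Pivoting to another node, as an examiner would when re-centering the graph, only changes the `node` argument. In a graph database the same selection would be expressed as a pattern match with a timestamp filter rather than a Python loop.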