An Approach to Human-Machine Teaming in Legal Investigations Using Anchored Narrative Visualisation and Machine Learning ∗

Simon Attfield†, Department of Computer Science, Middlesex University, London, UK, s.attfield@mdx.ac.uk
Bob Fields, Department of Computer Science, Middlesex University, London, UK, b.fields@mdx.ac.uk
David Windridge, Department of Computer Science, Middlesex University, London, UK, d.windridge@mdx.ac.uk
Kai Xu, Department of Computer Science, Middlesex University, London, UK, k.xu@mdx.ac.uk

In: Proceedings of the First International Workshop on AI and Intelligent Assistance for Legal Professionals in the Digital Workplace (LegalAIIA 2019). June 17, 2019. Montreal, QC, Canada. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Published at http://ceur-ws.org.

LegalAIIA, June 17, 2019, Montréal (Québec), Canada. Attfield, Fields, Windridge and Xu

ABSTRACT
During legal investigations, analysts typically create external representations of an investigated domain as a resource for cognitive offloading, reflection and collaboration. For investigations involving very large numbers of documents as evidence, creating such representations can be slow and costly, but is essential. We believe that software tools, including interactive visualisation and machine learning, can be transformative in this arena, but that design must be predicated on an understanding of how such tools might support and enhance investigator cognition and team-based collaboration. In this paper, we propose an approach to this problem by: (a) allowing users to visually externalise their evolving mental models of an investigation domain in the form of thematically organized Anchored Narratives; and (b) using such narratives as a (more or less) tacit interface to cooperative, mixed initiative machine learning. We elaborate our approach through a discussion of representational forms significant to legal investigations and discuss the idea of linking such representations to machine learning.

KEYWORDS
eDiscovery, TAR, anchored narratives, machine learning, sensemaking, distributed cognition.

1 Introduction
Legal investigations, particularly in regulatory and litigation contexts, tend to be characterised by the simultaneous challenge and opportunity of very large numbers of documents as a source of evidence. Given this complexity, investigators tend to create external representations or ‘models’ of the investigated domain as a means of cognitive offloading and of creating structures for supporting reflection, insight and collaboration. Interactive Visualisation and Machine Learning have created interest as tools for supporting the identification of relevant documents as a prelude to such investigations. However, less attention has perhaps been paid to the potential for combining these technologies within the investigation process itself. We argue that such an approach might support more rapid convergence on investigatory narratives that matter by:
a) allowing users to visually externalise their evolving mental models of an investigation domain in the form of thematically organized Anchored Narratives;
b) using such narratives as a (more or less) tacit interface to cooperative, mixed initiative machine learning.

We argue that the effect of this can be cooperative human-machine teaming through an evolving symbiotic relationship between three distinct but interconnected elements: user cognition, external representation and machine learning. We develop our case by reviewing the role of external representations in investigatory sensemaking, focussing on cognition and collaboration. We then consider harnessing machine learning as a tacit means of anticipating investigatory goals and enhancing access to relevant data.

2 Background - External Representations for Investigatory Sensemaking
The creation, augmentation and use of representations, whether internal (in the head) or external (in the world), are a central part of sensemaking. This idea is reflected in most significant theories and models of sensemaking. For example, Klein et al. [1] discuss the role of mental ‘frames’ in sensemaking, and Pirolli and Card [2] emphasise the way intelligence analysts externally structure information into representations as part of a wider sensemaking process (referring to this step as ‘schematization’).

External representations, when created, can be intimately involved in the cognitive processes of sensemaking. The approach of Distributed Cognition is predicated on the idea that cognitive activities make use of external as well as internal representations, with external representations seen not only as sources of information, but as structures that transform the cognitive task itself [3]. Having an effective representation can lead to different and better strategies for carrying out a task, better performance, and lower mental effort. The form and properties of external representations can lead to changes in cognitive processes as these become integrated into and participate within these processes. Distributed Cognition aims to dissolve the traditional division of inside/outside the individual when analysing cognition in order to explore the complex relationships between people, artefacts and technology when accounting for how thinking gets done.

In an attempt to render the concepts of distributed cognition more useful and applicable to the design of human-computer interaction, Wright et al. [4] identified a collection of ‘abstract information resources’ that can form a part of the process of carrying out activities. Such abstract structures can be represented in a variety of forms, embodied in physical media (possibly as a result of the design of interactive technologies) or located in the minds of members of a distributed cognitive system. More recently, Attfield et al. [5] applied this idea to sensemaking, identifying a taxonomy of abstract information resources that can be represented internally or externally during sensemaking and which are transformed during the process of sensemaking. These resources include representations of the domain (specific or general), intents (from high-level values to low-level goals), and representations of action (possible, planned or performed). Actors involved in the sensemaking activity may make use of any or all of these, and the nature of their representation determines how they may do so.

Narrative
External representations can take many forms depending on the entities and relationships being represented. Faisal, Attfield and Blandford [6] proposed six basic types: spatial, sequential (including narrative), networks, hierarchical, argumentation structures and faceted. Here we discuss two types which are important for constructing domain representations during investigatory sensemaking: narrative and argument. Later we extend this with a discussion of thematic organisation.

For example, Attfield and Blandford [7] reported a study of the cognitive work of lawyers involved in some large corporate investigations. As part of their work, the lawyers represented their analyses in the form of sequences of connected events or chronologies, created around different themes of an investigation. These narrative representations, which were ultimately very large, played a central role in the way that the lawyers thought about and collaborated around the investigations, and they were central in the generation of insights. The lawyers reported that this was a natural way for them to think about an investigation.

Research shows that narrative representations play a particularly important role in the way that people reason about evidence. For example, Pennington and Hastie [8] conducted a series of studies into the way that jurors mentally comprehend evidence in legal cases. They found that, irrespective of how evidence was presented, jurors structured it in terms of narratives that made sense to them. Not only that, they added information to make the stories make more sense. This finding is typical of studies into evidential reasoning and provided a basis for what Pennington and Hastie called their Story Model. According to the Story Model, people find it easiest to make sense of legal evidence through narratives that they construct in order to explain the evidence. Importantly, the resulting narrative is constructed not just from the evidence, but by reasoning from evidence to explanation.

Argumentation
Investigatory sensemaking involves drawing conclusions from evidence using generalised beliefs about the way the world works [9]. For example, an investigator may infer from reading an email in which person a thanks person b for a gift, that a gift was exchanged, with this inference depending on both the text in the email and the more general belief that people don’t usually express gratitude in this way when in reality no gift has been exchanged. This is an example of an abductive inference (reasoning to the best possible explanation), which is characteristic of investigatory sensemaking. Many thousands of such inferences may be made during an investigation, and given their generally defeasible nature, it can be important that they are amenable to review. For example, Attfield and Blandford [7] reported on the way that lawyers maintained links from chronology entries to supporting documentary evidence and traversed them frequently.

Based on a study of how Dutch judges reasoned about cases, Wagenaar [9] observed their prominent use of narrative connections and argumentation links and developed from this the notion of Anchored Narratives. An anchored narrative is a hybrid representational form combining narrative with argumentational links to supporting evidence. Bex [10] has used this approach to develop a formal theory that combines stories with evidential arguments in a hybrid framework for structured argumentation. Figure 1 shows an example of an Anchored Narrative in which events are represented as a connected narrative (from top to bottom in Figure 1) attached to supporting evidence (where available).

Significantly, events are anchored, not only in evidence, but within the context of the unfolding story. The plausibility of each event is then judged not solely in virtue of its supporting evidence, but also by the support of plausibility afforded by its position in the surrounding narrative and how this relates to generalised beliefs about how the world works. Figure 1 also shows the representation of multiple competing narratives with a point of divergence based on evidence from interview 1 and interview 2. Explicitly representing such competing conclusions can be helpful in a context of defeasible reasoning where multiple interpretations or claims may be explicitly considered.

Figure 1 - An example of an Anchored Narrative

Figure 2 - Model of the Visual Analytics Process from Kohlhammer et al. (2011)

3. Interactive Visualisation
Data visualisation has the capability to support insight from abstract data by leveraging the power of the human perceptual system to convert cognitive problems into perceptual problems [11]. It can reveal insights that are otherwise difficult to discover [12].
Interest has developed in extending data visualisation beyond the display of large datasets to support other aspects of sensemaking (including what Pirolli and Card [2] referred to as schematization) and also to enhance human sensemaking by coupling representations to computational components such as machine learning; this is an approach emphasised by Visual Analytics. Figure 2 shows Kohlhammer et al.’s [13] model of the Visual Analytics process. The main difference between this model and a data visualisation pipeline is the addition of the ‘model’ component (representing the product of automated data analysis such as machine learning) and its interactions with other components.

Visual Analytics tools can facilitate the process of constructing narratives from data and capturing the data and analysis that lead to them. Figure 3 shows a tool we have developed called SenseMap [14]. SenseMap provides the user with a freeform interactive space (right) which can be used for constructing anchored narratives from data. The user interacts with data and represents interesting discoveries as boxes in the main panel (right) by a simple click. Discoveries can be moved freely to form thematic groups or evolving narratives. SenseMap also captures the provenance of each discovery such that clicking on a discovery will restore the original data source, i.e. discoveries are anchored in source data.

Figure 3 - The SenseMap allows interactive construction of episodes or narratives from discoveries. Each discovery (or event) is represented as a box, which can be grouped or connected to form an episode/narrative.

In addition to organizing discoveries into evolving narratives, we see value in organising narratives into identifiable episodes and themes. Investigations can be complex. Investigation teams have been shown to divide analyses along the lines of episodes and themes as these become apparent. This has the value of reducing cognitive complexity and supporting the division of labour [7]. Different episodes and themes will also have different theories of relevance, and we anticipate that such structuring can be exploited by machine learning for the (further) identification of relevant information in large evidential collections. Hence, we propose structuring events at the interface into discrete episodes and by hierarchical theme. Figure 4 shows a conceptual model of this idea in which connected events form episodes, which in turn become components in anchored narratives. Similarly, discoveries can be grouped as hierarchically organised themes.

Figure 4 - Hierarchical structure of events and discovery based on time and theme.

Besides interfacing with users, there are many examples in which Visual Analytics can provide the interface between domain experts and machine learning algorithms [15]. Some of these allow users to provide feedback on the machine learning outcomes (such as classification or prediction), thereby improving the underlying machine learning model. These are often known as active/interactive learning. Other methods focus on exposing the inner workings of a machine learning model, i.e. how the model arrives at a classification or prediction. This is known as explainable AI (XAI) and is critical to issues related to model transparency such as model bias and user trust. These issues are closely related to the discussions in the next section.

4. Coupling with Machine Learning
The nature of the problem as defined implicates a unique nexus between machine learning, human computer interfacing (HCI) and machine representation. While domain summarisation is a well-established aspect of machine learning-based textual and image analytics, it is necessarily a passive, feedforward process unless explicit human-in-the-loop considerations are incorporated. Our problem, when cast in machine learning terms, can be specified as the building of a recommender system for returning evidence in relation to significant, or user-salient, aspects of the chronological data stream at arbitrary levels of hierarchical aggregation/representation. The problem of relevance has both 'vertical' (abstractive) and 'horizontal' (chronological) aspects, given that narrative sequences and events (evidence) exist in a subsumptive relationship.

Thus, we seek a system in which user and machine exist within a convergent hermeneutic feedback cycle, in which potentially supportive evidence is returned to the user on the basis of the current narrative representation at some appropriate level of hierarchical aggregation. In response, the user feeds back information on the utility of this evidence as part of the constructed narrative sequence (at its appropriate level of representation) in order either to further develop an existing representational frame or else to initiate a novel one.

The hierarchical aspect of the problem significantly multiplies the complexity of the machine learning methodology required to approach it. In particular, sequence-based recommender systems typically rely on query proximity within some appropriate metric (or quasi-metric) space. However, we here require that the proximal region to the user's query (anchor) within 'narrative space' takes into account arbitrary levels of aggregation (or narrative coarse-graining) in a way that both encompasses (potentially evolving) user preference and does not burden the user with excessive feedback requirements.

To this end, we propose to use active learning within the context of the querying of the sequential aggregation so as to achieve the optimal reduction in the bandwidth of user feedback required to obtain a convergent recommender platform for narrative construction. Active learning is a process by which machine learning hypotheses are fed back to the user (here via appropriate visualisation techniques) in a manner such that preference feedback to the machine learner is optimally exploited to improve learning performance. This typically provides a logarithmic improvement in user feedback requirements with respect to the labelling effort/user load associated with classical machine learning approaches. Maximally rapid mutual convergence on hypotheses of interest to the user is thus ensured, such that human and machine mutually adapt to take advantage of their respective capabilities in the most synergistic fashion.

The proposed system would thus exploit feedback from the user in its learning-loop in order to develop a better tailored model of narrative and chronological salience via the use of active learning to pro-actively present representation alternatives to the user across the interface. Crucial to bootstrapping this process is an initial 'seed' set of domain-annotated data, constituting an initial extraction of salient descriptors from the narrative stream.

5. Discussion/Conclusion
We believe that there is a prospect of achieving high quality, synergistic relationships between human and machine cognition in which one supports the other to enable rapid convergence on significant and important narratives during investigatory sensemaking. The approach that we propose involves the use of interactive visualisation to allow users to construct structured external representations of the investigated domain, coupled to machine learning models that might exploit this structure to model and predict investigators’ evolving interests around different parts of the investigation. This is essentially a mixed initiative approach to sensemaking in which computational and human agents establish common ground around investigatory goals through common access to a visualisation interface. In future work we seek to develop a prototype of this approach to provide proof-of-concept validation and to develop the techniques involved through iterative empirical trials.

REFERENCES
[1] Klein, G., Phillips, J. K., Rall, E. L., & Peluso, D. A. (2007). A data-frame theory of sensemaking. In Expertise out of context: Proceedings of the sixth international conference on naturalistic decision making (pp. 113-155). New York, NY, USA: Lawrence Erlbaum.
[2] Pirolli, P., & Card, S. (2005, May). The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of international conference on intelligence analysis (Vol. 5, pp. 2-4).
[3] Hutchins, E. (1995). Cognition in the Wild. MIT Press.
[4] Wright, P. C., Fields, R., & Harrison, M. D. (2000). Analyzing Human-Computer Interaction as Distributed Cognition: The Resources Model. Human-Computer Interaction, 15(1), 1–41.
[5] Attfield, S., Fields, B., & Baber, C. (2018). A resources model for distributed sensemaking. Cognition, Technology & Work, 20(4), 651–664.
[6] Faisal, S., Attfield, S., & Blandford, A. (2009). A classification of sensemaking representations.
[7] Attfield, S., & Blandford, A. (2011). Making sense of digital footprints in team-based legal investigations: The acquisition of focus. Human–Computer Interaction, 26(1-2), 38-71.
[8] Pennington, N., & Hastie, R. (1991). A cognitive theory of juror decision making: The story model. Cardozo L. Rev., 13, 519.
[9] Wagenaar, W. A. (1995). Anchored narratives: A theory of judicial reasoning and its consequences. Psychology, law and criminal justice: International developments in research and practice, 267-285.
[10] Bex, F. (2015). An Integrated Theory of Causal Stories and Evidential Arguments. In Proceedings of the 15th international conference on artificial intelligence and law (pp. 13-22). ACM.
[11] Few, S. (2013). Data Visualization for Human Perception. In The Encyclopedia of Human-Computer Interaction. Interaction Design Foundation.
[12] Card, S. K., Mackinlay, J., & Shneiderman, B. (Eds.). (1999). Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann.
[13] Kohlhammer, J., Keim, D., Pohl, M., Santucci, G., & Andrienko, G. (2011). Solving problems with visual analytics. Procedia Computer Science, 7, 117-120.
[14] Nguyen, P. H., Xu, K., Bardill, A., Salman, B., Herd, K., & Wong, B. W. (2016, October). Sensemap: Supporting browser-based online sensemaking through analytic provenance. In 2016 IEEE Conference on Visual Analytics Science and Technology (VAST) (pp. 91-100). IEEE.
[15] Endert, A., Ribarsky, W., Turkay, C., Wong, B. W., Nabney, I., Blanco, I. D., & Rossi, F. (2017, December). The state of the art in integrating machine learning into visual analytics. In Computer Graphics Forum (Vol. 36, No. 8, pp. 458-486).
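
As an illustration of the active-learning loop outlined in Section 4, the following is a minimal sketch and not the system proposed in the paper. It assumes a pool-based setting with uncertainty sampling, a tiny naive Bayes relevance model, and toy documents; the class `NaiveBayes`, the functions `active_learning` and `investigator`, and the `SEED` and `POOL` data are all hypothetical names introduced only for this example. The seed set stands in for the initial domain-annotated data, and `investigator` stands in for the feedback a user would give through the narrative interface.

```python
import math
from collections import Counter


def tokens(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny two-class multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def update(self, text, relevant):
        """Fold one labelled document into the counts."""
        self.doc_counts[relevant] += 1
        for word in tokens(text):
            self.word_counts[relevant][word] += 1
            self.vocab.add(word)

    def p_relevant(self, text):
        """Posterior probability that `text` is relevant."""
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for word in tokens(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = score
        top = max(log_score.values())  # subtract the max for numerical stability
        exp_true = math.exp(log_score[True] - top)
        exp_false = math.exp(log_score[False] - top)
        return exp_true / (exp_true + exp_false)


def active_learning(seed, pool, oracle, budget):
    """Query the `budget` pool items the model is least sure about."""
    model = NaiveBayes()
    for text, label in seed:          # bootstrap from the annotated seed set
        model.update(text, label)
    pool = list(pool)                 # work on a copy; the caller's pool is untouched
    queried = []
    for _ in range(min(budget, len(pool))):
        # Uncertainty sampling: pick the document whose score is closest to 0.5.
        doc = min(pool, key=lambda d: abs(model.p_relevant(d) - 0.5))
        pool.remove(doc)
        label = oracle(doc)           # ask the (simulated) user for feedback
        model.update(doc, label)
        queried.append((doc, label))
    return model, queried


# Toy data, assumed for illustration only.
SEED = [("thanks gift payment", True), ("lunch menu canteen", False)]
POOL = [
    "thanks for the gift",
    "canteen lunch rota",
    "payment schedule attached",
    "gift receipt for payment",
    "team offsite agenda",
]


def investigator(doc):
    """Hypothetical stand-in for the analyst's relevance judgement."""
    return any(word in tokens(doc) for word in ("gift", "payment"))


model, queried = active_learning(SEED, POOL, investigator, budget=3)
for doc, label in queried:
    print(label, doc)
```

The loop asks first about the document it knows least about, rather than about documents it can already classify with some confidence, which is the sense in which feedback is targeted. The sketch is flat: it does not attempt the hierarchical aggregation over events, episodes and themes that the paper identifies as the harder part of the problem.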