An Approach to Human-Machine Teaming in Legal Investigations
 Using Anchored Narrative Visualisation and Machine Learning ∗
Simon Attfield†, Bob Fields, David Windridge, Kai Xu
Department of Computer Science
Middlesex University
London, UK
s.attfield@mdx.ac.uk, b.fields@mdx.ac.uk, d.windridge@mdx.ac.uk, k.xu@mdx.ac.uk


ABSTRACT
During legal investigations, analysts typically create external
representations of an investigated domain as a resource for cognitive
offloading, reflection and collaboration. For investigations involving very
large numbers of documents as evidence, creating such representations can be
slow and costly, yet is essential. We believe that software tools, including
interactive visualisation and machine learning, can be transformative in this
arena, but that design must be predicated on an understanding of how such
tools might support and enhance investigator cognition and team-based
collaboration. In this paper, we propose an approach to this problem by: (a)
allowing users to visually externalise their evolving mental models of an
investigation domain in the form of thematically organised Anchored
Narratives; and (b) using such narratives as a (more or less) tacit interface
to cooperative, mixed initiative machine learning. We elaborate our approach
through a discussion of representational forms significant to legal
investigations and discuss the idea of linking such representations to machine
learning.

KEYWORDS
eDiscovery, TAR, anchored narratives, machine learning, sensemaking,
distributed cognition.

In: Proceedings of the First International Workshop on AI and Intelligent
Assistance for Legal Professionals in the Digital Workplace (LegalAIIA 2019).
June 17, 2019. Montreal, QC, Canada.
Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). Published at
http://ceur-ws.org.

1. Introduction
    Legal investigations, particularly in regulatory and litigation contexts,
tend to be characterised by the simultaneous challenge and opportunity of very
large numbers of documents as a source of evidence. Given this complexity,
investigators tend to create external representations or ‘models’ of the
investigated domain as a means of cognitive offloading and creating structures
for supporting reflection, insight and collaboration. Interactive
Visualisation and Machine Learning have attracted interest as tools for
supporting the identification of relevant documents as a prelude to such
investigations. However, perhaps less attention has been paid to the potential
for combining these technologies within the investigation process itself. We
argue that such an approach might support more rapid convergence on
investigatory narratives that matter by:
    a) allowing users to visually externalise their evolving mental models of
       an investigation domain in the form of thematically organised Anchored
       Narratives;
    b) using such narratives as a (more or less) tacit interface to
       cooperative, mixed initiative machine learning.

    We argue that the effect of this can be cooperative human-machine teaming
through an evolving symbiotic relationship between three distinct but
interconnected elements: user cognition, external representation and machine
learning. We develop our case by reviewing the role of external
representations in investigatory sensemaking, focussing on cognition and
collaboration. We then consider harnessing machine learning as a tacit means
of anticipating investigatory goals and enhancing access to relevant data.

2. Background - External Representations for Investigatory Sensemaking
    The creation, augmentation and use of representations, whether internal
(in the head) or external (in the world), are a central part of sensemaking.
This idea is reflected in most significant theories and models of sensemaking.
For example, Klein et al. [1] discuss the role of mental ‘frames’ in
sensemaking, and Pirolli and Card [2] emphasise the way intelligence analysts
externally structure
information into representations as part of a wider sensemaking process
(referring to this step as ‘schematization’).
    External representations, when created, can be intimately involved in the
cognitive processes of sensemaking. The approach of Distributed Cognition is
predicated on the idea that cognitive activities make use of external as well
as internal representations, with external representations seen not only as
sources of information, but as structures that transform the cognitive task
itself [3]. Having an effective representation can lead to different and
better strategies for carrying out a task, better performance, and lower
mental effort. The form and properties of external representations can lead to
changes in cognitive processes as these representations become integrated into
and participate within those processes. Distributed Cognition aims to dissolve
the traditional division of inside/outside the individual when analysing
cognition in order to explore the complex relationships between people,
artefacts and technology when accounting for how thinking gets done.
    In an attempt to render the concepts of distributed cognition more useful
and applicable to the design of human-computer interaction, Wright et al. [4]
identified a collection of ‘abstract information resources’ that can form a
part of the process of carrying out activities. Such abstract structures can
be represented in a variety of forms, embodied in physical media (possibly as
a result of the design of interactive technologies) or located in the minds of
members of a distributed cognitive system. More recently, Attfield et al. [5]
applied this idea to sensemaking, identifying a taxonomy of abstract
information resources that can be represented internally or externally during
sensemaking and which are transformed during the process of sensemaking. These
resources include representations of the domain (specific or general), intents
(from high-level values to low-level goals), and representations of action
(possible, planned or performed). Actors involved in the sensemaking activity
may make use of any or all of these, and the nature of their representation
determines how they may do so.

Narrative
    External representations can take many forms depending on the entities and
relationships being represented. Faisal, Attfield and Blandford [6] proposed
six basic types: spatial, sequential (including narrative), networks,
hierarchical, argumentation structures and faceted. Here we discuss two types
which are important for constructing domain representations during
investigatory sensemaking: narrative and argument. Later we extend this with a
discussion of thematic organisation.
    For example, Attfield and Blandford [7] reported a study of the cognitive
work of lawyers involved in some large corporate investigations. As part of
their work, the lawyers represented their analyses in the form of sequences of
connected events or chronologies, created around different themes of an
investigation. These narrative representations, which were ultimately very
large, played a central role in the way that the lawyers thought about and
collaborated around the investigations, and they were central in the
generation of insights. The lawyers reported that this was a natural way for
them to think about an investigation.
    Research shows that narrative representations play a particularly
important role in the way that people reason about evidence. For example,
Pennington and Hastie [8] conducted a series of studies into the way that
jurors mentally comprehend evidence in legal cases. They found that,
irrespective of how evidence was presented, jurors structured it in terms of
narratives that made sense to them. Not only that, they added information to
make the stories more coherent. This finding is typical of studies into
evidential reasoning and provided a basis for what Pennington and Hastie
called their Story Model. According to the Story Model, people find it easiest
to make sense of legal evidence through narratives that they construct in
order to explain the evidence. Importantly, the resulting narrative is
constructed not just from the evidence, but by reasoning from evidence to
explanation.

Argumentation
    Investigatory sensemaking involves drawing conclusions from evidence using
generalised beliefs about the way the world works [9]. For example, an
investigator may infer from reading an email in which person A thanks person B
for a gift that a gift was exchanged, with this inference depending on both
the text in the email and the more general belief that people don’t usually
express gratitude in this way when in reality no gift has been exchanged. This
is an example of an abductive inference (reasoning to the best possible
explanation), which is characteristic of investigatory sensemaking. Many
thousands of such inferences may be made during an investigation and, given
their generally defeasible nature, it can be important that they are amenable
to review. For example, Attfield and Blandford [7] reported on the way that
lawyers maintained links from chronology entries to supporting documentary
evidence and traversed them frequently.
    Based on a study of how Dutch judges reasoned about cases, Wagenaar [9]
observed their prominent use of narrative connections and argumentation links
and developed from this the notion of Anchored Narratives. An anchored
narrative is a hybrid representational form combining narrative with
argumentational links to supporting evidence. Bex [10] has used this approach
to develop a formal theory that combines stories with evidential arguments in
a hybrid framework for structured argumentation.
    Figure 1 shows an example of an Anchored Narrative in which events are
represented as a connected narrative (from top to bottom in Figure 1) attached
to supporting evidence (where available).
    Significantly, events are anchored not only in evidence, but within the
context of the unfolding story. The plausibility of each event is then judged
not solely in virtue of its supporting evidence, but also by the plausibility
afforded by its position in the surrounding narrative and how this relates to
generalised beliefs about how the world works. Figure 1 also shows the
representation of multiple competing narratives with a point of divergence
based on evidence from interview 1 and interview 2. Explicitly representing
such competing conclusions can be helpful in a context of defeasible reasoning
where multiple interpretations or claims may be explicitly considered.
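    One way to picture an anchored narrative as a data structure is sketched
below. This is a minimal illustration under our own assumptions rather than a
description of any existing tool: the class names (Evidence, Event), the
helper functions and the example events are all hypothetical. Events carry
links to supporting evidence (anchors) and to the events that follow them, so
competing narratives appear as alternative paths from a shared starting point.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str   # hypothetical identifier, e.g. a document or interview
    excerpt: str  # the passage that supports the event


@dataclass
class Event:
    description: str
    anchors: list[Evidence] = field(default_factory=list)    # links to evidence
    successors: list["Event"] = field(default_factory=list)  # narrative links


def narratives(start: Event) -> list[list[Event]]:
    """Enumerate every narrative (path of events) that begins at `start`."""
    if not start.successors:
        return [[start]]
    return [[start] + tail
            for nxt in start.successors
            for tail in narratives(nxt)]


def needs_review(narrative: list[Event]) -> list[Event]:
    """Events with no evidential anchor, supported only by narrative context."""
    return [event for event in narrative if not event.anchors]


# Invented example: two interviews support competing continuations.
paid = Event("Payment is made", [Evidence("interview 1", "a payment was made")])
not_paid = Event("No payment is made", [Evidence("interview 2", "no payment was made")])
meeting = Event("A and B meet", successors=[paid, not_paid])  # unanchored
gift = Event("B gives A a gift", [Evidence("email", "A thanks B for a gift")], [meeting])

paths = narratives(gift)
for path in paths:
    print(" -> ".join(event.description for event in path))
```

    In this sketch the unanchored event ("A and B meet") is plausible only by
virtue of its position in the story, which is why needs_review singles it out
for the kind of review discussed above.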
       Figure 1 - An example of an Anchored Narrative

3. Interactive Visualisation
    Data visualisation can support insight from abstract data by leveraging
the power of the human perceptual system to convert cognitive problems into
perceptual problems [11]. It can reveal insights that are otherwise difficult
to discover [12]. Interest has developed in extending data visualisation
beyond the display of large datasets to support other aspects of sensemaking
(including what Pirolli and Card [2] referred to as schematization) and also
to enhance human sensemaking by coupling representations to computational
components such as machine learning; this is an approach emphasised by Visual
Analytics. Figure 2 shows Kohlhammer et al.’s [13] model of the Visual
Analytics process. The main difference between this model and a data
visualisation pipeline is the addition of the ‘model’ component (representing
the product of automated data analysis such as machine learning) and its
interactions with other components.

       Figure 2 - Model of the Visual Analytics Process from
                  Kohlhammer et al. (2011)

    Visual Analytics tools can facilitate the process of constructing
narratives from data and capturing the data and analysis that lead to them.
Figure 3 shows a tool we have developed called SenseMap [14]. SenseMap
provides the user with a freeform interactive space (right) which can be used
for constructing anchored narratives from data. The user interacts with data
and represents interesting discoveries as boxes in the main panel (right) by a
simple click. Discoveries can be moved freely to form thematic groups or
evolving narratives. SenseMap also captures the provenance of each discovery,
such that clicking on a discovery will restore the original data source, i.e.
discoveries are anchored in source data.

       Figure 3 - SenseMap allows interactive construction of
       episodes or narratives from discoveries. Each discovery (or
       event) is represented as a box, which can be grouped or
       connected to form an episode/narrative.

    In addition to organising discoveries into evolving narratives, we see
value in organising narratives into identifiable episodes and themes.
Investigations can be complex. Investigation teams have been shown to divide
analyses along the lines of episodes and themes as these become apparent. This
has the value of reducing cognitive complexity and supporting the division of
labour [7]. Different episodes and themes will also have different theories of
relevance, and we anticipate that such structuring can be exploited by machine
learning for the (further) identification of relevant information in large
evidential collections. Hence, we propose structuring events at the interface
into discrete episodes and by hierarchical theme. Figure 4 shows a conceptual
model of this idea in which connected events form episodes, which in turn
become components in anchored narratives. Similarly, discoveries can be
grouped as hierarchically organised themes.
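    The structuring proposed above can be sketched as a simple nested data
model. The sketch is illustrative only and rests on our own assumptions: the
class names (Discovery, Episode, Theme), the discoveries_under helper and the
example content are hypothetical and are not taken from SenseMap. Each
discovery records its provenance, episodes hold discoveries in sequence, and
themes may contain both episodes and sub-themes.

```python
from dataclasses import dataclass, field


@dataclass
class Discovery:
    label: str
    source: str  # provenance: where the discovery was made (hypothetical id)


@dataclass
class Episode:
    name: str
    events: list[Discovery] = field(default_factory=list)  # in narrative order


@dataclass
class Theme:
    name: str
    episodes: list[Episode] = field(default_factory=list)
    subthemes: list["Theme"] = field(default_factory=list)


def discoveries_under(theme: Theme) -> list[Discovery]:
    """Collect every discovery beneath a theme, at any depth of the hierarchy."""
    found = [d for episode in theme.episodes for d in episode.events]
    for sub in theme.subthemes:
        found.extend(discoveries_under(sub))
    return found


# Invented example content.
gifts = Theme("Gifts", episodes=[
    Episode("Gift exchange", [Discovery("A thanks B for a gift", "email")]),
])
payments = Theme("Payments", subthemes=[gifts], episodes=[
    Episode("Disputed payment", [
        Discovery("Payment reported", "interview 1"),
        Discovery("Payment denied", "interview 2"),
    ]),
])

for d in discoveries_under(payments):
    print(f"{d.label} (from {d.source})")
```

    Because discoveries_under can be called on a theme at any level, the same
structure yields a different set of examples for each theme or sub-theme,
which is the property we anticipate a learning component could exploit.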
       Figure 4 - Hierarchical structure of events and discoveries
                  based on time and theme.

    Beyond supporting users directly, there are many examples in which Visual
Analytics provides the interface between domain experts and machine learning
algorithms [15]. Some of these allow users to provide feedback on machine
learning outcomes (such as classification or prediction) and thereby to
improve the underlying machine learning model. Such approaches are often known
as active/interactive learning. Other methods focus on exposing the inner
workings of a machine learning model, i.e. how the model arrives at a
classification or prediction. This is known as explainable AI (XAI) and is
critical to issues of model transparency such as model bias and user trust.
These issues are closely related to the discussions in the next section.

4. Coupling with Machine Learning
    The nature of the problem as defined implicates a unique nexus between
machine learning, human-computer interfacing (HCI) and machine representation.
While domain summarisation is a well-established aspect of machine
learning-based textual and image analytics, it is necessarily a passive,
feedforward process unless explicit human-in-the-loop considerations are
incorporated. Our problem, when cast in machine learning terms, can be
specified as the building of a recommender system for returning evidence in
relation to significant, or user-salient, aspects of the chronological data
stream at arbitrary levels of hierarchical aggregation/representation. The
problem of relevance has both a 'vertical' (abstractive) and a 'horizontal'
(chronological) aspect, given that narrative sequences and events (evidence)
exist in a subsumptive relationship.
    Thus, we seek a system in which user and machine exist within a convergent
hermeneutic feedback cycle, in which potentially supportive evidence is
returned to the user on the basis of the current narrative representation at
some appropriate level of hierarchical aggregation. In response, the user
feeds back information on the utility of this evidence as part of the
constructed narrative sequence (at its appropriate level of representation) in
order either to further develop an existing representational frame or else to
initiate a novel one.
    The hierarchical aspect of the problem significantly multiplies the
complexity of the machine learning methodology required to approach it. In
particular, sequence-based recommender systems typically rely on query
proximity within some appropriate metric (or quasi-metric) space. However, we
here require that the proximal region to the user's query (anchor) within
'narrative space' takes into account arbitrary levels of aggregation (or
narrative coarse-graining) in a way that both encompasses (potentially
evolving) user preference and does not burden the user with excessive feedback
requirements.
    To this end, we propose to use active learning within the context of the
querying of the sequential aggregation so as to achieve the optimal reduction
in the bandwidth of user feedback required to obtain a convergent recommender
platform for narrative construction. Active learning is a process by which
machine learning hypotheses are fed back to the user (here via appropriate
visualisation techniques) in a manner such that preference feedback to the
machine learner is optimally exploited to improve learning performance. Under
favourable conditions, this can provide a logarithmic improvement in user
feedback requirements relative to the labelling effort/user load associated
with classical (passive) machine learning approaches. The aim is thus
maximally rapid mutual convergence on hypotheses of interest to the user, such
that human and machine mutually adapt to take advantage of their respective
capabilities in the most synergistic fashion.
    The proposed system would thus exploit feedback from the user in its
learning loop in order to develop a better tailored model of narrative and
chronological salience, using active learning to pro-actively present
representation alternatives to the user across the interface. Crucial to
bootstrapping this process is an initial 'seed' set of domain-annotated data,
constituting an initial extraction of salient descriptors from the narrative
stream.

5. Discussion/Conclusion
    We believe that there is a prospect of achieving high quality, synergistic
relationships between human and machine cognition in which one supports the
other to enable rapid convergence on significant and important narratives
during investigatory sensemaking. The approach that we propose involves the
use of interactive visualisation to allow users to construct structured
external representations of the investigated domain, coupled to machine
learning models that might exploit this structure to model and predict
investigators’ evolving interests around different parts of the investigation.
This is essentially a mixed initiative approach to sensemaking in which
computational and human agents establish common ground around investigatory
goals through common access to a visualisation interface. In future work we
seek to develop a prototype of this approach to provide proof-of-concept
validation and to develop the techniques involved through iterative empirical
trials.
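    To make the active learning loop proposed in Section 4 more concrete, the
following minimal sketch shows pool-based active learning with uncertainty
sampling over a handful of documents. It is an illustration under stated
assumptions, not the proposed system: the bag-of-words cosine scoring, the
function names, the example documents and the simulated user (oracle) are all
hypothetical, and the sketch ignores hierarchical aggregation entirely. It
starts from a small 'seed' set, repeatedly asks the user about the document
whose relevance is least certain, and then ranks the remaining documents.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term counts (a deliberately simple stand-in model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def relevance(doc, relevant, irrelevant):
    """Signed score: positive looks relevant, near zero is uncertain."""
    vec = vectorise(doc)
    return cosine(vec, centroid(relevant)) - cosine(vec, centroid(irrelevant))


def active_learning(pool, seed_relevant, seed_irrelevant, oracle, budget):
    relevant = [vectorise(d) for d in seed_relevant]
    irrelevant = [vectorise(d) for d in seed_irrelevant]
    pool, asked = list(pool), []
    for _ in range(min(budget, len(pool))):
        # Uncertainty sampling: query the document the model is least sure of.
        query = min(pool, key=lambda d: abs(relevance(d, relevant, irrelevant)))
        pool.remove(query)
        asked.append(query)
        (relevant if oracle(query) else irrelevant).append(vectorise(query))
    ranked = sorted(pool, key=lambda d: relevance(d, relevant, irrelevant),
                    reverse=True)
    return asked, ranked


# Invented documents; the oracle simulates an investigator interested in gifts.
pool = ["gift delivered to the office", "friday lunch cancelled",
        "quarterly report attached", "thanks again for the generous gift"]
asked, ranked = active_learning(
    pool,
    seed_relevant=["thanks for the gift"],
    seed_irrelevant=["lunch menu for friday"],
    oracle=lambda doc: "gift" in doc,
    budget=2,
)
print("asked:", asked)
print("ranked:", ranked)
```

    In the setting described in Section 4, the seed sets would come from the
events already placed in a narrative or theme, and the oracle's answers from
the user's (possibly tacit) acceptance or rejection of suggested evidence.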

REFERENCES
[1] Klein, G., Phillips, J. K., Rall, E. L., & Peluso, D. A. (2007). A data-
frame theory of sensemaking. In Expertise out of context: Proceedings of
the sixth international conference on naturalistic decision making (pp. 113-
155). New York, NY, USA: Lawrence Erlbaum.

[2] Pirolli, P., & Card, S. (2005, May). The sensemaking process and
leverage points for analyst technology as identified through cognitive task
analysis. In Proceedings of international conference on intelligence
analysis (Vol. 5, pp. 2-4).

[3] Hutchins, E. (1995). Cognition in the Wild. MIT Press.

[4] Wright, P. C., Fields, R., & Harrison, M. D. (2000). Analyzing Human-
Computer Interaction as Distributed Cognition: The Resources Model.
Human-Computer Interaction, 15(1), 1–41.

[5] Attfield, S., Fields, B., & Baber, C. (2018). A resources model for
distributed sensemaking. Cognition, Technology & Work, 20(4), 651–664.

[6] Faisal, S., Attfield, S., & Blandford, A. (2009). A classification of
sensemaking representations.

[7] Attfield, S., & Blandford, A. (2011). Making sense of digital footprints
in team-based legal investigations: The acquisition of focus. Human–
Computer Interaction, 26(1-2), 38-71.

[8] Pennington, N., & Hastie, R. (1991). A cognitive theory of juror decision
making: The story model. Cardozo L. Rev., 13, 519.

[9] Wagenaar, W. A. (1995). Anchored narratives: A theory of judicial
reasoning and its consequences. Psychology, law and criminal justice:
International developments in research and practice, 267-285.

[10] Bex, F. (2015). An Integrated Theory of Causal Stories and Evidential
Arguments. In Proceedings of the 15th international conference on
artificial intelligence and law (pp. 13-22). ACM.

[11] Few, S. (2013). Data Visualization for Human Perception. In The
Encyclopedia of Human-Computer Interaction. Interaction Design
Foundation.

[12] Card, S. K., Mackinlay, J., & Shneiderman, B. (Eds.). (1999). Readings
in Information Visualization: Using Vision to Think. Morgan Kaufmann.

[13] Kohlhammer, J., Keim, D., Pohl, M., Santucci, G., & Andrienko, G.
(2011). Solving problems with visual analytics. Procedia Computer
Science, 7, 117-120.

[14] Nguyen, P. H., Xu, K., Bardill, A., Salman, B., Herd, K., & Wong, B.
W. (2016, October). Sensemap: Supporting browser-based online
sensemaking through analytic provenance. In 2016 IEEE Conference on
Visual Analytics Science and Technology (VAST) (pp. 91-100). IEEE.

[15] Endert, A., Ribarsky, W., Turkay, C., Wong, B. W., Nabney, I., Blanco,
I. D., & Rossi, F. (2017, December). The state of the art in integrating
machine learning into visual analytics. In Computer Graphics Forum (Vol.
36, No. 8, pp. 458-486).