Quick Check: A Legal Research Recommendation System
Merine Thomas
Center for AI and Cognitive Computing, Thomson Reuters, Eagan, MN, USA
merine.thomas@tr.com

Thomas Vacek
Center for AI and Cognitive Computing, Thomson Reuters, Eagan, MN, USA
thomas.vacek@tr.com

Xin Shuai∗
Wissee Inc., Sammamish, WA, USA
shuaixin.david@gmail.com

Wenhui Liao∗
Minneapolis, MN, USA
wendy.liao2009@gmail.com

George Sanchez
Center for AI and Cognitive Computing, Thomson Reuters, Eagan, MN, USA
george.sanchez@tr.com

Paras Sethia
Center for AI and Cognitive Computing, Thomson Reuters, Toronto, Canada
paras.sethia@tr.com

Don Teo
Center for AI and Cognitive Computing, Thomson Reuters, Toronto, Canada
don.teo@tr.com

Kanika Madan
Center for AI and Cognitive Computing, Thomson Reuters, Toronto, Canada
kanika.madan@tr.com

Tonya Custis∗
Autodesk AI Lab, San Francisco, CA, USA
tonya.custis@autodesk.com

ABSTRACT
Finding relevant sources of law that discuss a specific legal issue and support a favorable decision
is an onerous and time-consuming task for litigation attorneys. In this paper, we present Quick Check,
a system that extracts the legal arguments from a user’s brief and recommends highly relevant case law
opinions. Using a combination of full-text search, citation network analysis, clickstream analysis,
and a hierarchy of ranking models trained on a set of over 10K annotations, the system is able to
effectively recommend cases that are similar in both legal issue and facts. Importantly, the system
leverages a detailed legal taxonomy and an extensive body of editorial summaries of case law. We
demonstrate how recommended cases from the system are surfaced through a user interface that enables
a legal researcher to quickly determine the applicability of a case with respect to a given legal issue.

CCS CONCEPTS
• Information systems → Retrieval models and ranking; • Computing methodologies → Information extraction.

KEYWORDS
recommendation; learning to rank; legal research

ACM Reference Format:
Merine Thomas, Thomas Vacek, Xin Shuai, Wenhui Liao, George Sanchez, Paras Sethia, Don Teo, Kanika
Madan, and Tonya Custis. 2020. Quick Check: A Legal Research Recommendation System. In Proceedings of
the 2020 Natural Legal Language Processing (NLLP) Workshop, 24 August 2020, San Diego, US. ACM, New
York, NY, USA, 4 pages.

∗ Work done while at Thomson Reuters.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
NLLP @ KDD 2020, August 24th, San Diego, US
© 2020 Copyright held by the owner/author(s).

1    INTRODUCTION
When preparing or reviewing a legal brief, litigation attorneys spend a significant amount of time
searching for the most pertinent authority to bolster or refute a particular point of law. This
involves sifting through a collection of millions of primary and secondary sources of law, as well as
past briefs and memoranda. The task is particularly challenging given the need for high recall; an
incomplete legal research process can potentially miss a highly relevant source of law that would
adversely impact the litigation strategy.
   Early work in document recommendation for legal research focused on the retrieval of relevant
authority and briefs through a combination of explicit user query input and implicit user browsing
behavior [1] or by attempting to cluster legal issues into broader topics [8]. In this paper, we
present an approach that considers the task from a citation recommendation perspective [3, 5]. Our
system, Quick Check, complements the legal research process by extracting the core legal arguments of
interest directly from a user’s input brief document and recommending relevant primary and secondary
sources of law. In particular, the system leverages a combination of full-text search, citation
network analysis, and clickstream analysis to surface highly relevant case law opinions. Importantly,
apart from the user’s brief, no other user interaction is required by the system to interpret the
legal issues and locate relevant authority.
   While the structure and formatting styles of legal briefs in the U.S. federal and state court
systems will vary depending on the court level and jurisdiction, a typical document will include at
least the following main sections (or the equivalents thereof): (1) an Introduction articulating the
party’s claim and relief sought, (2) a
Statement of Facts that summarizes key factual elements at issue and the procedural history of the
case, (3) an Argument section containing the legal issues at hand and related supporting facts, and
(4) a Conclusion summarizing the main points and the specific relief sought. The Argument section is
typically further divided into subsections, each discussing a particular legal issue. We refer to each
subsection as an issue segment. The recommendation system we describe follows an issue-segment-centric
approach; potentially relevant cases are mined and ranked with respect to a particular issue segment
in the brief.

2    TRAINING DATA COLLECTION
The case ranking component of the system (Section 3.3) was trained on a large corpus of graded
issue-segment-to-case pairs. The initial pairs were collected from a combination of manual curation by
attorneys and an early prototype of the system, while the bulk of the dataset was collected from the
output of successive improvements to the system. The quality of a recommended case was graded on a
five-point Likert scale, reflecting the degree to which a case is relevant to the legal issue at hand.
A recommendation with a rating of 4 or 5 is considered highly relevant, while one with a rating of 1
is considered irrelevant. In total, we collected over 10K graded pairs from attorney-editors for model
training. The briefs were chosen to cover a variety of jurisdictions, practice areas, and motion types.

3    SYSTEM OVERVIEW
Figure 1 gives an overview of the Quick Check system architecture. The recommendation system consists
of three primary stages: Document Structure Extraction, Candidate Case Discovery, and Case Ranking.

Figure 1: Overview of Quick Check system architecture.

3.1    Document Structure Extraction
The first stage of the pipeline converts a user’s uploaded brief document into HTML, which is used for
all downstream document section parsing logic. Stylistic information contained in the HTML tags
provides an obvious indication of section headings. Therefore, the system searches for the presence of
a combination of bold, alignment, and heading elements. Of primary interest to the recommendation
system is the accurate identification of the Argument section of a brief. Thus, a set of
high-precision rules is applied against the extracted set of headings to capture the top-level
Argument heading, which may include terms such as "Discussion", "Memorandum", or "Analysis".
Subsection headings in the Argument section are identified through the presence of a numbering or
word capitalization convention.
   Each issue segment of the Argument section is a collection of paragraphs and citations describing a
particular legal issue. We consider each issue segment in isolation when discovering and ranking
candidate cases.

3.2    Candidate Case Discovery
Given an issue segment, the system first collects a large pool of potentially relevant cases. This is
done using both search-based and citation-based document discovery mechanisms.

3.2.1 Search-engine-based Candidate Discovery. Each paragraph within a segment discusses a particular
aspect of the legal issue at hand. For each of these paragraphs, we perform a full-text search across
a corpus of about 12M case law opinions using a proprietary search engine tuned for the legal domain.
To increase the jurisdictional relevance of results, the search is restricted to a subset of
jurisdictions based on the corresponding jurisdictions of the citations present within the segment or
the rest of the brief.
   In addition to the case law opinions themselves, we consider cases from a context-aware citation
recommendation perspective [3, 6]. In particular, we leverage an index of pseudo-documents, each
representing a case, constructed in the following manner. For a given case, we consider all cases and
previously filed briefs in which a citation to the case is made. The sentence preceding the citation
reference within the document is extracted and added to the pseudo-document corresponding to the case.
Thus, a case’s pseudo-document is an aggregate of all extracted reference texts and provides a
representation of the legal context in which a case is cited. A set of full-text searches using the
issue segment paragraphs is also performed over this index.

3.2.2 Citation-based Candidate Discovery. The set of case citations within an issue segment (hereafter
referred to as input citations) gives a valuable characterization of the legal issue being discussed.
The system leverages this citation "profile" to find potentially related cases through the following
means:
   • Case and brief citation network: The most directly related cases are those that are
     bibliographically coupled to the input citations (i.e., cases citing the same input citations).
     Similarly, a brief citation network is constructed by decomposing the corpus of past filed briefs
     into issue segments. We then consider all bibliographically coupled segments. For both the case
     and brief-issue-segment networks, we extract the set of other cases that are cited in the coupled
     case or issue segment as candidate recommendations.
   • Statutory annotations: Statutory annotations provide concise summaries of important cases that
     have interpreted a statute or regulation. They are organized editorially in a hierarchy of
     procedural topics. Candidate recommendations are extracted by considering the cases that are
     found within the same procedural topic as an input citation.
   • Pinpoint headnotes: An input citation will often be accompanied by a direct quote from the cited
     case or a page number pinpointing the relevant portion of the case. Moreover, a case will often
     have one or more editorial summaries, called headnotes, that highlight important points of law in
     the case. Headnotes contain reference links to the corresponding location within the case
     document where the point of law is discussed. Thus, one can correlate the input citation to one
     or more headnotes in the cited case based on a combination of the pinpoint information and
     headnote reference links.¹ This is useful because extensive editorial annotations exist that
     identify explicitly the point of law (i.e., headnote) for which a case is citing another case.
     Therefore, the system is able to retrieve cases that cite the same case for the same reason as
     the input citation of the issue segment.
   • Clickstream analysis: Within a particular research web session on our legal research platform, a
     user will interact with cases in a number of ways, including viewing the case, saving it to a
     folder, or printing the case document. Research session activity is aggregated across all users
     to provide implicit relevance feedback of cases. In particular, given the citation profile of the
     issue segment, the system finds cases that commonly appear within the same session.

¹ If more than one headnote is identified, the most relevant headnote is determined based on a
combination of text similarity and topic similarity measures, the latter of which leverages a legal
topic taxonomy.

3.3    Case Ranking
The pool of candidates collected from the discovery stage is passed through two ranking SVM models
[7]. The first ranker uses metadata information corresponding to each of the discovery methods as
features (e.g., how often the case was found in the top 5 results of searches, the number of input
citations the case is bibliographically coupled with, etc.) and acts as a filter to reduce the pool
size down to several hundred cases.
   The second ranker leverages an additional set of features that measure the textual and topical
similarity of the issue segment and the candidate case, where the issue segment is represented by
either its textual content or the pinpoint headnotes of its input citations (Section 3.2.2). Textual
similarity is computed using an edit-distance-based similarity measure, while topical similarity is
assessed from the hierarchical similarity of the segment and candidate case when classified under a
legal topic taxonomy using a legal topic classifier [1, 2]. Additionally, the recency of a case is
taken into account at this stage.
   Finally, the top-ranked candidates are fed to an ensemble-based pointwise ranker [4] leveraging
additional features that analyze the results of the search-based discovery component. The model
produces a probability score on the relevancy of a case, which is used to filter out poor quality
recommendations prior to surfacing to the user.

4    RESULTS
The quality of the output recommendations is measured against a test set of nearly 500 briefs
(corresponding to about 2K issue segments) using several metrics of varying granularity. Across all
recommendations, the percentages of highly relevant, relevant, and irrelevant recommendations are 39%,
60.5%, and 0.5%, respectively. At an issue segment level, the percentages of segments with at least
one highly relevant, at least one relevant or highly relevant, and at least one irrelevant
recommendation are 67%, 97%, and 1%, respectively, while the mean NDCG@5 per issue of relevant or
highly relevant recommendations is 0.66. For comparison, we note that the first ranker alone achieves
a mean NDCG@5 of 0.62. Finally, at a brief level, the percentage of briefs where at least one-third of
the recommendations are highly relevant is 55%.

5    DEMONSTRATION
Users can upload briefs that are in either an early draft or nearly completed state. They may also
choose to analyze an old brief with potentially outdated authority or even an opposing party’s
document. When a brief document has been uploaded, the recommendation system pipeline is run. The
entire pipeline completes in under a couple of minutes for a brief document of typical length. The
recommended cases are displayed and grouped by the corresponding issue segments. Each case is
accompanied by additional information that helps to put the recommendation in context for the user,
including the input citations that are related and the portion of text within the case found to be
most similar to the issue segment. The latter is determined using a combination of legal topic
classification (Section 3.3) and a vector space model representation of the issue segment and the
recommended case. A recommendation may also be marked with additional tags highlighting if the case is
from a high court, is frequently cited, or is less than 2 years old. Figure 2 shows the Quick Check
interface for a sample brief.
   After being presented with the recommended cases, a user may filter the results based on the issue
segment of interest, or by a specific date range or jurisdiction. The user can also choose to lower
the threshold of the final ranker model to explore more recommendations from the system. Recommended
cases can then be viewed in full or saved/downloaded for further review.

Figure 2: Quick Check interface showing the top-ranked recommended case related to a legal issue
extracted from the uploaded brief document.

6    CONCLUSION
We presented Quick Check, a commercially available system that recommends cases with highly similar
legal issues and facts given a user’s input brief document. The system leverages a multitude of case
discovery pathways and ranking models trained over a large annotated training set to extract the most
relevant cases to a given legal issue. The system is robust against the wide variety of brief
formatting styles and has been found to be effective across jurisdictions, practice areas, and motion
types.

REFERENCES
[1] Khalid Al-Kofahi, Peter Jackson, M. Dahn, Charles Elberti, William Keenan, and John Duprey. 2007.
    A Document Recommendation System Blending Retrieval and Categorization Technologies. In
    Proceedings of AAAI Workshop on Recommender Systems in e-Commerce. 9–16.
[2] Khalid Al-Kofahi, Alex Tyrrell, Arun Vachher, Tim Travers, and Peter Jackson. 2001. Combining
    Multiple Classifiers for Text Categorization. In Proceedings of the Tenth International Conference
    on Information and Knowledge Management (Atlanta, Georgia, USA) (CIKM ’01). Association for
    Computing Machinery, New York, NY, USA, 97–104. https://doi.org/10.1145/502585.502603
[3] Michael Färber and Adam Jatowt. 2020. Citation Recommendation: Approaches and Datasets. ArXiv
    abs/2002.06961 (2020).
[4] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning
    (second ed.). Springer New York Inc., New York, NY, USA, 339.
[5] Qi He, Daniel Kifer, Jian Pei, Prasenjit Mitra, and C. Lee Giles. 2011. Citation Recommendation
    without Author Supervision. In Proceedings of the Fourth ACM International Conference on Web
    Search and Data Mining (Hong Kong, China) (WSDM ’11). Association for Computing Machinery, New
    York, NY, USA, 755–764. https://doi.org/10.1145/1935826.1935926
[6] Qi He, Jian Pei, Daniel Kifer, Prasenjit Mitra, and Lee Giles. 2010. Context-Aware Citation
    Recommendation. In Proceedings of the 19th International Conference on World Wide Web (Raleigh,
    North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 421–430.
    https://doi.org/10.1145/1772690.1772734
[7] T. Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In ACM SIGKDD Conference on
    Knowledge Discovery and Data Mining (KDD). 133–142.
[8] Qiang Lu and Jack G. Conrad. 2012. Bringing Order to Legal Documents - An Issue-based
    Recommendation System Via Cluster Association. In KEOD.
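
As a rough illustration of the case citation network step in Section 3.2.2, the following sketch collects candidate cases through bibliographic coupling. It is a minimal sketch under stated assumptions and not the production implementation: the `citation_graph` dictionary, the `coupled_candidates` function, the document identifiers, and the scoring rule (each coupled document credits its other cited cases with the number of input citations it shares) are all hypothetical and are not specified in the paper.

```python
from collections import Counter


def coupled_candidates(input_citations, citation_graph):
    """Return candidate cases found through bibliographic coupling.

    input_citations: iterable of case ids cited in the issue segment.
    citation_graph:  dict mapping a citing document id (an opinion or a
                     past brief's issue segment) to the set of case ids
                     it cites. This structure is an assumption made for
                     illustration only.
    """
    inputs = set(input_citations)
    scores = Counter()
    for cited_cases in citation_graph.values():
        shared = inputs & cited_cases
        if not shared:
            continue  # document is not coupled to the issue segment
        # Every other case cited by a coupled document is a candidate.
        # Hypothetical scoring: credit it with the size of the overlap.
        for case_id in cited_cases - inputs:
            scores[case_id] += len(shared)
    # Highest score first; ties broken by case id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data with made-up identifiers.
example_graph = {
    "opinion_A": {"case_1", "case_2", "case_7"},
    "opinion_B": {"case_1", "case_8"},
    "brief_segment_C": {"case_2", "case_7"},
    "opinion_D": {"case_9"},
}
example_candidates = coupled_candidates(["case_1", "case_2"], example_graph)
```

In this toy example, `case_7` is reached through two coupled documents and `case_9` is never proposed because `opinion_D` shares no citation with the segment. In the described system, counts of this kind would only be one of several metadata features passed to the first ranker (Section 3.3).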