Return of the AI: An Analysis of Legal Research on Artificial Intelligence Using Topic Modeling

Constanta Rosca, Bogdan Covrig, Catalina Goanta, Gijs van Dijck, Gerasimos Spanakis
{constanta.rosca,b.covrig,catalina.goanta,gijs.vandijck,jerry.spanakis}@maastrichtuniversity.nl
Law & Tech Lab, Maastricht University, Maastricht, Netherlands

ABSTRACT
AI research finds itself in the third boom of its history, and in recent years AI-related themes have gained considerable popularity in new disciplines, such as law. This paper explores what legal research on AI consists of and how it has evolved, while addressing the issues of information retrieval and research duplication. Using Latent Dirichlet Allocation (LDA) topic modeling on a dataset of 3931 journal articles, we explore three questions: (a) Which topics within legal research on AI can be distinguished? (b) When were these topics addressed? and (c) Can similar papers be detected? The topic modeling results in a total of 32 meaningful topics. Additionally, it is found that legal research on AI drastically increased as of 2016, with topics becoming more granular and diverse over time. Finally, a comparison of the similarity assessments produced by the algorithm and a human expert suggests that the assessments often coincide. The results provide insight into how legal research on AI has evolved over time, and support the development of machine learning and information retrieval tools like LDA that assist in structuring large document collections and identifying relevant articles.

CCS CONCEPTS
• Computing methodologies → Topic modeling; • Applied computing → Law.

KEYWORDS
information retrieval, topic modeling, legal research

ACM Reference Format:
Constanta Rosca, Bogdan Covrig, Catalina Goanta, Gijs van Dijck, Gerasimos Spanakis. 2020. Return of the AI: An Analysis of Legal Research on Artificial Intelligence Using Topic Modeling. In Proceedings of the 2020 Natural Legal Language Processing (NLLP) Workshop, 24 August 2020, San Diego, US. ACM, New York, NY, USA, 8 pages.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Artificial intelligence (AI) research finds itself in the third boom of its history, fuelled by increased funding, scientific breakthroughs such as deep learning, and widespread public speculation about the scope and impact of these breakthroughs. While decades ago the interest in AI research was generally limited to specific disciplines (e.g. computer science, philosophy), during the past years AI-related themes have gained considerable popularity in new disciplines, such as law.

With over 2500 publications referring to 'artificial intelligence' already by the year 2015 [19] (see also Figure 3 in the Appendix), it may no longer be realistic to assume that researchers can keep up with legal research on AI, or with the number of publications in general. Moreover, the recent spike suggests that more and more authors have started writing about AI and law topics, including authors who have not previously published on such topics. This, in combination with the inability to keep up with legal research on AI due to its exponential growth, creates the risk that authors replicate previous work without being aware of similar earlier publications.

This paper aims to explore what legal research on AI consists of and how it has evolved, while addressing the issue of information retrieval and the risk of research duplication. We develop a methodology that distinguishes topics in a collection of documents (in this case journal publications), allows exploring the evolution of those topics over time, and detects similarity between documents, with the purpose of providing solutions for reading and analyzing publications in bulk in ways that humans cannot. Consequently, we aim to answer the following research questions (RQ):

RQ1: Which topics within the field of legal research on AI can be distinguished ('What')? A suitable methodology would provide insight into how legal research on AI is structured, and it would allow classifying publications into sub-topics, which would enhance information retrieval.

RQ2: What (i.e. about which topics) has been written when ('What - When')?
This question contributes to the understanding of which topics have emerged, remained, or lost the interest of legal scholars. The analysis of the What - When question will provide information about the evolution of legal research on AI.

RQ3: Can similar papers be detected? Considering the sharp increase of publications, it may be becoming increasingly difficult to find publications on similar research questions, which may even result in reproduction of scholarship because prior publications are overlooked. This question explores a methodology that allows detecting thematically similar documents in a given corpus.

2 BACKGROUND
One of the innovations in this paper is to use unsupervised machine learning to categorize legal research on AI and to map how legal scholarship on this topic has developed. The idea of smart literature reviews has received prior attention, on the grounds that manual searches for existing literature - especially in matured domains where scholarship is abundant - are not efficient and might have a negative impact on the quality of new research [2]. Similar approaches have been taken to map computer science literature [23], as well as communications research [29].

To the best of our knowledge, legal scholarship as such has not yet been analyzed using LDA. Within the narrow confines of the field of research labeled as AI and Law, a considerable amount of legal research on information retrieval has been published in the past decades. Notwithstanding that the volume of scholarship on information retrieval pertaining to the discipline of computer science alone is vast to say the least [13], particularly when applied to the legal field, such research has focused on the availability of legal information [5, 20, 21, 28, 32, 34, 46, 55], search systems and search strategies [16, 22, 30, 43, 47, 57], information processing [6, 17, 18, 25, 33, 36, 37, 50], and the role of legal publishers [1, 3, 26]. These publications address a variety of issues and questions, including the sustainability of publicly available legal repositories (e.g. AustLII), the importance of natural language processing when searching for legal information, the performance of online searches compared to searching through paper, how citation analysis may be used to improve search results, the role of legal publishers in this and, more generally, the impact of automation on how the law is analyzed and applied.

Still, the question of how to capture and visualize the development of an entire legal sub-field like legal research on AI has barely been explored. One avenue of exploration has been shaped by Bench-Capon et al., who have previously focused on the proceedings of the International Conference on AI and Law, first held in 1987, to make a 25-year retrospective of the research generated therein by describing the progress of scholarship through illustrative papers selected from various editions [4]. Other studies focused on traditional (systematic) literature reviews on the impact of AI on specific legal domains, such as administrative law [40] or intellectual property [24].

However, given the limitations of legal databases and the way they are used by researchers, making comprehensive overviews of existing literature remains a considerable hurdle. This is all the more so in the past four years, when the production of legal scholarship on artificial intelligence seems to have grown considerably (see Figure 3 in the Appendix). This paper aims to fill this research gap by proposing and testing an unsupervised machine learning approach to the clustering of literature in this field, namely topic modeling.
3 METHODOLOGY

3.1 Corpus
The corpus includes a total of 3931 journal articles obtained from the HeinOnline database. Absent a centralised, comprehensive, open access repository for international legal scholarship, we focused on one of the commercially available databases. HeinOnline is one of the leading international databases on legal materials; it contains over 170 million pages of literature and indexes over 2700 law journals. In the section 'Law Journal Library', section type 'Articles', the corpus covers literature available in the database between 1960 and 2018. Unlike arXiv, HeinOnline does not have a section on 'artificial intelligence'. The total number of retrieved articles reflects the results of a boolean search using the keywords 'artificial intelligence', namely all articles which include both terms. The resulting articles therefore discuss a wide array of aspects relating to artificial intelligence, and do not, as such, focus on specific technical or legal issues. For the purpose of this study, we assume that even one reference to the keywords is sufficient to include an article in the corpus.

The articles in the corpus follow a power law distribution, where a relatively large number of publications is written by a small number of authors, and few publications by many authors (see Figure 4 in the Appendix). The same distribution applies to the number of publications per author, with roughly 28 authors having more than 5 publications (see Figure 5 in the Appendix).
3.2 Topic Modeling

3.2.1 Latent Dirichlet Allocation. Latent Dirichlet Allocation (LDA) [8] was used to identify topics in the corpus. LDA has many different use cases: examples include organizing large document collections in order to improve search and retrieval of information, summarization of large textual data, and even image clustering. In the legal domain, LDA has been used to study the agenda of the US Supreme Court [27], the High Court of Australia [11] and the Court of Justice of the European Union (CJEU) [15]. Winkels [58] used LDA to build a recommender system for Dutch case law. Panagis and Sadl [39] combined network analysis and LDA to study the case-law generative process of the CJEU.

LDA is a generative, probabilistic model for a collection of documents, which are represented as mixtures of latent topics, where each topic is characterized by a distribution over all words of the collection of documents. The basic representation unit of the documents is the word, i.e. all distinct terms are extracted from the document collection along with their frequencies (per document), which is the so-called 'Bag-of-Words' model [41].

On a conceptual level, the algorithm tries to discover topics that can represent the collection of documents. Each document is generated from a mixture of these topics and each topic is generated from a probability vector (distribution) over all words. Assuming such a generative model for any collection of documents, LDA's goal is to 'backtrack' this process, i.e. to find a set of topics that are likely to have generated the whole document collection.

3.2.2 LDA Variants. In this paper, we used the model of Blei et al. [8]; however, we did explore other variations as well. Stevens et al. [48] present a qualitative comparison of different topic model algorithms, also using different evaluation metrics [31, 35], and conclude that, given the same data and the same number of topics, LDA is able to learn more coherent topics than the competing approaches.

Nevertheless, there are extensions of the LDA model towards topic tracking over time [53, 56]. However, according to Wang et al. [52], these methods deal with constant topics, and the timestamps are used for better discovery. Opposed to that, Blei et al. [7] present a model for the detection of evolving topics in a discrete time space (Dynamic Topic Modeling - DTM). Here, LDA is used on topics aggregated in time epochs and a state space model handles transitions of the topics from one epoch to another. Bruggermann et al. applied DTM on the RCV1 Reuters corpus (810,000 documents) with weekly time epochs [10]. Results showed that the variances within the topics among the time epochs are marginal. DTM still treats the corpus as a whole, and the number of topics is fixed over all time epochs. Chaney et al. introduce another extension to LDA that detects events in a large text collection [12]. Their model adds separate probability distributions for defined entities and time intervals to the generative process. The model accordingly consists of general topics, entity-related topics and topics specific to time intervals. Events are detected as anomalies, which are identified as temporary deviations from usual behavior, where usual behavior refers to the topics discussed by the entities. An anomaly can be detected whenever these entities change their topics of discussion significantly and at the same time in a similar way.
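To make the pipeline concrete, the following is a minimal sketch of how a tokenized corpus can be turned into a Bag-of-Words representation and passed to an LDA implementation. It is illustrative only and assumes the gensim library; the placeholder variable tokenized_docs and all parameter values are our assumptions, not the exact code used for this study.

```python
# Minimal LDA sketch (illustrative; assumes gensim is installed and that
# `tokenized_docs` is a list of token lists, one per journal article).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

tokenized_docs = [
    ["artificial_intelligence", "liability", "vehicle", "driver"],
    ["copyright", "author", "machine", "work"],
]  # placeholder documents standing in for the pre-processed corpus

dictionary = Dictionary(tokenized_docs)                            # term <-> id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]   # Bag-of-Words vectors

lda = LdaModel(
    corpus=bow_corpus,
    id2word=dictionary,
    num_topics=35,        # the value selected in Section 4.1
    passes=10,
    random_state=42,
)

# Each topic is a distribution over words; each document a distribution over topics.
for topic_id in range(lda.num_topics):
    print(topic_id, lda.print_topic(topic_id, topn=10))
```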
3.2.3 Output of LDA. LDA can be applied on a corpus, and the output model provides the following distributions:

• t_i = {t_ij}: topic-word vector-distribution, where i denotes the topic (in total there are N topics) and j denotes the word (in total there are P words in the collection). The component t_ij shows the relative weight of word j in topic i.
• d_k = {d_ki}: document-topic vector-distribution, where i denotes the topic (out of the total N topics) and k denotes a document. The component d_ki shows the relative weight of topic i in document k.
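Continuing the hypothetical gensim objects from the previous sketch, the two distributions above can be read off a fitted model as shown below; again, this is an illustration of the concept rather than the study's actual implementation.

```python
# Topic-word distribution t_i: relative weight of each word j in topic i.
topic_id = 2
for word_id, weight in lda.get_topic_terms(topic_id, topn=10):
    print(dictionary[word_id], round(weight, 4))

# Document-topic distribution d_k: relative weight of each topic i in document k.
doc_id = 0
d_k = lda.get_document_topics(bow_corpus[doc_id], minimum_probability=0.0)
print(d_k)  # e.g. [(0, 0.01), (1, 0.02), ..., (34, 0.003)], weights summing to ~1
```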
4 RESULTS AND ANALYSIS

4.1 Topics in Legal Research on AI (What?)
Pre-processing of the corpus involved filtering for articles written in the English language only and afterwards removing common English stop-words (the, of, etc.) as well as some very common corpus-specific words (subject, supra, part, etc.). Moreover, we considered the presence of either unigrams (single words) or bigrams (two-word sequences) so as to be able to capture concepts like 'dispute resolution'.

Subsequently, LDA was used to identify topics in the corpus. For identifying the number of topics (N), we used perplexity [51] and coherence [31] measures on a held-out set. We started from N = 5 and increased it in steps of 5 until N = 100. A plateau in both perplexity (178.4) and coherence (0.51) could be observed around 35 or 40 topics, which suggests a computationally optimal number of topics. Based on this, topic models with 30, 35, 40, and 45 topics were explored. Two legal researchers inspected the results of the various topic models in order to determine which number of topics was substantively the most meaningful. For this, the 20 terms with the highest weights for each topic were provided to the researchers (e.g. 'weapon', 'system', 'military', 'international', 'war', etc.). Based on this evaluation, a topic model was selected that consisted of 35 topics. A sketch of this model-selection loop is given after this section.

The topic validation consisted of two steps. First, three researchers inspected the 20 terms for each topic in order to label the topic (e.g. 'military technology'). Second, paper titles were inspected to determine whether they supported the assigned label; if not, the label would, if possible, be adjusted. In this respect, the researchers were presented with a list of paper titles for each topic. The validation process indicated that the vast majority of the LDA-produced topics were substantively meaningful.

Table 1 reveals the topics distinguished by the LDA model. It includes the topic IDs (not meaningful), the ten terms that have the highest weights in relation to the topic, and the labels the human coders assigned to the topics.

Three miscellaneous topics were identified: id21, id25, and id33. Neither the inspection of the 20 words nor of the titles resulted in a substantively meaningful label or description for these topics. It was decided not to remove the words in these topics and not to re-run the topic modeling algorithm, as it was expected that the removal of words could introduce selection bias in the corpus. Moreover, the identification of the three miscellaneous topics does not affect the relevance or interpretation of the other topics.

The results show a wide range of topics, varying from tax to military technology to copyright. Within each topic, diversity could still be observed. For example, for the topic of algorithmic decision-making and quantitative methods, paper titles such as 'An FDA for Algorithms' [49], 'A Simple Guide to Machine Learning' [54], and 'Lawyer as Soothsayer: Exploring the Important Role of Outcome Prediction in the Practice of Law' [38] can be observed.

An important issue concerned the initial selection of journal articles. The articles were selected based on the search string 'artificial intelligence'. The number of occurrences of this string is, however, presumably not an entirely accurate proxy for whether a paper actually is about artificial intelligence. It might be that papers on public policy or competition & markets mention the term 'artificial intelligence' but do not primarily focus on artificial intelligence, or even on technology or digital matters. An additional selection was therefore required. Consequently, three of the researchers (authors) independently went through the topic list and the related words as displayed in Table 1 in order to determine whether the labels and words are likely to be related to artificial intelligence, digital, and/or technology. Perfect agreement could be observed for the vast majority of topics. In the few instances of disagreement between the coders, the disagreements were resolved through a brief discussion. Ultimately, a total of 18 topics was selected (Table 1, in bold). These 18 topics were used in subsequent analyses.
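The model-selection loop described above could look roughly as follows, assuming gensim's LdaModel and CoherenceModel are used; the held-out split, the coherence variant, and the variable names (train_bow, train_texts, heldout_bow) are illustrative assumptions rather than the authors' exact code.

```python
from gensim.models import LdaModel, CoherenceModel

# Scan the number of topics N from 5 to 100 in steps of 5, recording coherence
# on the training texts and perplexity on a held-out Bag-of-Words set
# (train_bow, train_texts, heldout_bow, dictionary are assumed to exist).
scores = {}
for n in range(5, 105, 5):
    model = LdaModel(corpus=train_bow, id2word=dictionary,
                     num_topics=n, passes=10, random_state=42)
    coherence = CoherenceModel(model=model, texts=train_texts,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
    # log_perplexity returns a per-word likelihood bound; gensim defines
    # perplexity as 2 ** (-bound), so it is converted here for readability.
    perplexity = 2 ** (-model.log_perplexity(heldout_bow))
    scores[n] = (coherence, perplexity)

# A plateau in both curves (here around N = 35-40) suggests a reasonable N.
```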
Table 1: All 35 topics identified by LDA

id | words | label
0 | speech, public, amendment, court, medium, political, free, government, content, freedom | freedom of expression
1 | internet, network, computer, communication, cyberspace, technology, user, access, virtual, service | Internet governance
2 | weapon, system, military, international, war, human, state, attack, target, autonomous_weapon | military technology
3 | datum, information, privacy, personal, protection, data, individual, consumer, big, user | privacy
4 | work, company, employee, corporate, worker, labor, business, employer, corporation, economic | labour/corporate
5 | lawyer, legal, client, firm, service, practice, attorney, profession, professional, work | legal practice
6 | student, school, legal, education, learn, university, skill, practice, teach, research | legal education
7 | state, act, agency, public, government, federal, policy, congress, rule, national | public policy
8 | environmental, space, energy, nanotechnology, water, land, risk, plan, air, include | environment/space
9 | patent, claim, invention, method, application, process, art, inventor, court, technology | patents
10 | copyright, work, protection, author, program, copy, court, intellectual_property, fair, computer | copyright
11 | cognitive, behavior, process, theory, make, people, action, social, mental, mind | psychology & neuroscience
12 | human, robot, machine, technology, artificial_intelligence, robotic, agent, science, future, research | from humans to machines
13 | market, cost, consumer, service, economic, product, competition, price, trademark, firm | competition & markets
14 | datum, algorithm, model, decision, analysis, result, method, prediction, study, risk | algorithmic decision-making & quantitative methods
15 | surveillance, privacy, government, search, police, enforcement, fourth_amendment, court, information, intelligence | surveillance
16 | contract, party, electronic, agreement, agent, online, term, dispute, transaction, dispute_resolution | contract & dispute resolution
17 | legal, theory, social, science, society, system, political, power, economic, form | legal theory/philosophy
18 | evidence, probability, argument, theory, inference, case, reason, fact, expert, scientific | evidence
19 | international, state, country, european, national, article, trade, member, china, global | international law & relations
20 | medical, health, patient, physician, care, medicine, health_care, hospital, fda, device | health
21 | https, http, technology, www, online, user, pdf, digital, last_visit, platform | miscellaneous
22 | person, human, child, life, moral, legal, animal, state, property, interest | personhood
23 | tax, income, trust, taxpayer, property, asset, return, business, pay, interest | tax
24 | legal, rule, case, system, reason, knowledge, base, model, argument, fact | knowledge-based systems
25 | time, world, people, game, make, year, life, work, american, story | miscellaneous
26 | financial, market, bank, security, investor, regulation, risk, transaction, investment, trading | financial regulation & technology
27 | criminal, crime, police, sentence, justice, offender, sentencing, victim, commit, drug | crime
28 | technology, system, process, change, development, public, information, research, social, design | regulation of innovation
29 | information, search, document, library, legal, research, database, case, access, electronic | information retrieval
30 | liability, vehicle, product, car, tort, autonomous, risk, safety, driver, manufacturer | autonomous vehicles
31 | court, case, judge, judicial, rule, justice, decision, opinion, trial, litigation | courts
32 | software, code, license, open_source, source, program, standard, free, developer, computer | software licensing
33 | make, rule, problem, case, fact, question, reason, give, decision, view | miscellaneous
34 | computer, system, program, information, software, user, technology, datum, expert, process | trends in legal technology
4.2 How Did Legal Research on AI Evolve (What - When)?
To answer this question we needed to determine which of the 18 topics selected in the previous section have gained or lost attention over time. The first step is to extract the dominant topics of each paper by using the document-topic vectors d (as described in Section 3.2.3). More specifically, we denoted for each document k the first three dominant topics i_1-3, based on their relative weights (contributions) in descending order in the document-topic vector d_k. To assess the relevance of each document's topics, two researchers manually reviewed the first five topics - sorted by contribution - for a sample of documents. They agreed that for most documents the relevance dropped significantly after the third topic, since the later topics provide no substantive contribution to the paper. Based on the results of this inspection, the first three topics were denoted as dominant. We define the frequency of a topic as the number of times that a topic is dominant across all articles.

Linking the topic frequencies to the year of publication, the count of papers addressing each topic every year was computed. Figure 1 depicts how the 18 selected topics evolved over time (1960-2018).

Figure 1: Topic river (the label 'ADM and quantitative methods' in the figure refers to the topic algorithmic decision-making and quantitative methods, id14)

Figure 2 shows how the topics have evolved in relation to other topics across six major periods in AI history (taken from [19]). As a measure of relative topic popularity during a certain period, the ratio of papers concerned with each topic to the total number of papers written in that period is computed. Caution needs to be exercised when interpreting the relative topic popularity. Considering the relatively low number of publications before approximately 1986 (see Figure 1), just before the second AI winter, not much weight should be given to the topic popularity in those years, as small changes in the number of publications can have a substantial impact on the popularity of one topic relative to other topics.

Figure 2: Relative topic evolution across AI periods

For example, as visualized in Figure 2, scholarly output produced during the first AI boom concerned, among others, knowledge-based systems (13.16%) and algorithmic decision-making and quantitative methods (7.90%); however, trends in legal technology and information retrieval were the prominent topics in this period, forming the subject-matter of 45% and 21% of the papers, respectively. As time advances, topics like financial regulation & technology, autonomous vehicles, surveillance, software licensing and Internet governance appear.

Another illustration is reflected by the developments visible around the deep learning era. The topics that gained popularity (compared to the previous period, namely the third AI boom) with the rise of deep learning are regulation of innovation, privacy, algorithmic decision-making and quantitative methods, financial regulation & technology, military technology and autonomous vehicles. The largest decreases in popularity in this period are those of knowledge-based systems and trends in legal technology. The popularity of the other topics underwent changes below the average for this period. The interest in most topics grew during the deep learning era.

In addition, we observed several interesting trends regarding the evolution of pairs of topics (e.g. algorithmic decision-making and quantitative methods with knowledge-based systems, and information retrieval with trends in legal technology), as well as individual topics, e.g. privacy, which emerged as a topic during the first AI winter, but interest in which saw several relatively insignificant changes before the deep learning era, when its popularity underwent a strong increase.
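The dominant-topic counting described above could be sketched as follows, assuming the document-topic weights are available as a NumPy array; the names doc_topic, years, and selected are hypothetical stand-ins for the study's data structures.

```python
import numpy as np
from collections import Counter, defaultdict

def dominant_topics(d_k, n=3):
    """Return the ids of the n topics with the largest weight in d_k."""
    return list(np.argsort(d_k)[::-1][:n])

# Hypothetical inputs: doc_topic is an (n_docs x 35) array of document-topic
# weights, `years` holds each article's publication year, and `selected`
# is the set of the 18 substantively selected topic ids.
topic_year_counts = defaultdict(Counter)
for d_k, year in zip(doc_topic, years):
    for topic_id in dominant_topics(d_k):
        if topic_id in selected:
            topic_year_counts[topic_id][year] += 1

# topic_year_counts[14][2017], for example, would be the number of 2017 articles
# for which topic id14 is among the three dominant topics.
```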
4.3 Document Similarity
As mentioned in Section 3.2.3, each document k is represented by a vector d_k, where each component of the vector shows the weight of each topic in that specific document. For any pair of documents we can use cosine similarity [42] as a measure of how close two documents are, which in our case translates to similarity in the 'topic space', i.e. two very similar documents (cosine similarity of 1) are expected to have similar topic distributions.

The detection of publications that have similar topics enhances information retrieval and reduces the risk of reproduction of scholarship because prior publications are overlooked. Having computed the similarity between all document pairs in our corpus, we obtained some 7,436,296 similarity scores, after adjusting the algorithm to avoid computing the similarity between the same document pair twice.

To explore the substantive meaning of the ensuing similarity scores, the results produced by the cosine similarity algorithm were compared to substantive similarity as assessed by a legal expert - one of the authors. The goal of the inspection was to explore (1) the extent to which papers were similar and (2) whether differences in similarity scores produced by the cosine similarity measure reflect substantively meaningful differences. For this, five papers on different topics were selected - the seed papers. Each of these five seed papers was compared with five papers with different similarity scores - the comparison papers: one similarity score in the .55-.65 range, one in the .65-.75 range, one in the .75-.85 range, one in the .85-.95 range, and one in the .95-1.00 range. As a result, a total of five seed papers and 25 comparison papers were inspected.

Without knowledge of which similarity range the comparison paper would fall under, the legal expert ranked the similarity for each pair of papers (seed paper - comparison paper) for each topic separately. A Spearman's rank correlation test showed a high correlation (r = .62, p < .001) between the ranks based on the machine-generated similarity scores and the ranks provided by the expert.

The comparison took place on four levels: the paper title (i.e. to what extent do the paper titles suggest similarity?), the research question level (i.e. are the research questions similar?), the focus or sub-topic level (i.e. which sub-topics does the paper focus on, and which angle or perspective does it take?), and the citation level (i.e. is there a reference from one paper to the other?).

The inspection of the papers suggests that papers with similarity scores of .85 or lower may share substantive similarity, but in a limited way at best (e.g. pairs of publications where one article [14] focuses on whether (source) code is or should be copyright protected and the other paper [9] on whether results produced by machines are or should be subject to copyright protection). In contrast, articles with high similarity scores, particularly those with a score of .90 or higher, are also substantively similar, at least in the case of the inspected papers. For instance, two selected papers on humanitarian law with the highest similarity score ([44] and [45]) discuss legal principles such as proportionality and necessity, and they discuss the role of subjectivity and the capability of autonomous weapons systems (similarity in the >.95 range).

When inspecting citations between papers with very high similarity scores, references from one paper to another were sometimes found, and sometimes not. In some instances where no citation was found, this could be due to the fact that the papers were published in subsequent years and the authors did not have the opportunity to cite the published version of the similar paper. Nevertheless, there are instances where a reference was expected in the papers, but not found.
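A compact sketch of the pairwise cosine-similarity computation and the rank comparison is given below, using SciPy; the inputs machine_scores and expert_ranks are hypothetical stand-ins for the expert-validation data, not the study's actual values.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

# Hypothetical input: doc_topic is an (n_docs x 35) array of document-topic vectors d_k.
# pdist evaluates each unordered pair exactly once, i.e. n*(n-1)/2 scores,
# mirroring the 7,436,296 unique similarity scores reported above.
cosine_dist = pdist(doc_topic, metric="cosine")
cosine_sim = 1.0 - squareform(cosine_dist)   # full symmetric similarity matrix

# Rank agreement between machine similarity scores and the expert's rankings for
# one seed paper's comparison set (machine_scores and expert_ranks are illustrative).
rho, p_value = spearmanr(machine_scores, expert_ranks)
```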
5 DISCUSSION
The landscape of legal research on AI has undergone considerable changes with respect to the volume of scholarship tackling topics dealing with AI. In this context, we were interested in which exact topics authors focused on, and how these topics shifted over time. For this, we applied LDA topic modeling to identify topics and analyze how journal articles were distributed across topics as well as across time (1960-2018).

The main finding for the first research question, namely which topics can be distinguished in the corpus of legal papers on AI, is that 35 topics can be identified, 32 of them being meaningful (and three miscellaneous). Overall, the model performed considerably well in identifying latent topics in our corpus (see Table 1 above).

The second research question dealt with the evolution of topics throughout the different periods that can be identified in the history of AI. The topic river displayed in Figure 1 above shows fundamental changes in legal research on AI from two perspectives: on the one hand, the total number of papers referring to artificial intelligence sees a sharp increase as of 2016, and on the other hand, the diversity of topics also increases over time. This can mostly be contextualized by the emergence of new technologies (e.g. the Internet, leading to a new topic on Internet governance), but also by the granular development of existing technologies.

As for the third research question, namely how similar papers can be detected, we calculated similarity scores between pairs of articles and compared the scores for a selection of pairs to the assessments produced by a human. It was thus explored whether the similarity scores produced by the machine coincide with the expert assessment. A correlation test revealed a high correlation between the orders produced by the machine and by the human. The highest agreement levels were found for the papers with the highest similarity scores. The similarity predicted by the machine did, however, also sometimes deviate from the expert assessment, although there will undoubtedly also be disagreement between human experts when assessing the similarity of documents.

The results do not only provide insight into how legal research within a broader theme has evolved over time; they also provide support for the development of LDA tools that assist in structuring large document collections and in finding relevant papers. To aid researchers in either exploring or keeping up with such vast and complex research themes explored in parts of legal scholarship which do not necessarily overlap or interact, it is necessary to consider how a method such as the one presented in this paper (topic modeling) can be used to further visualize this body of legal research. To this end, we are currently working on a dashboard which we aim to make available to scholars (from any discipline) with an interest in exploring the evolution of legal literature on AI, or particular topics within it. Such a dashboard would make it easier, on the one hand, for legal researchers to get the bigger picture of all the fields of law, journals, and scholars tackling topics of interest relating to AI, and on the other hand for researchers from other disciplines (e.g. computer science) to have a bird's eye view of the vast legal literature which might have a direct impact on their research. Such publicly available resources could even be a new way to stimulate more awareness of the legal and ethical implications of technology on society, and be of interest to civil society as well.

Of course, this paper does not go without limitations. First, the corpus is not necessarily representative of all legal articles or legal publications in general. Although HeinOnline papers presumably do constitute a significant part of journal articles in law, there are other publishers and repositories containing publications that may be composed substantively differently than the publications in HeinOnline. Second, another limitation regarding the corpus concerns the initial selection. We selected journal articles that included the keywords 'artificial intelligence'. Had we selected different keywords, the corpus might have looked different. Additionally, different topics can be expected when running additional topic models for a subset of the corpus. Furthermore, the analyses that explored the increase or decrease of publications over time used the three most dominant topics to determine whether topics have become less or more relevant. The results might be different if less relevant topics are also taken into consideration, although such analyses would be empirically difficult to conduct, as they would require a measure to determine when certain topic weights are deemed insufficiently relevant.

We are currently exploring the possibility of applying and comparing different topic modeling algorithms and incorporating the latest NLP representation models (namely word embeddings, using either word2vec or BERT). Moreover, we want to apply our methodology to a different legal collection and identify whether our findings can be confirmed in a different corpus. Finally, after fully validating our approach, we are planning to release it as an open source website where people can explore and visualize our findings in an intuitive way.
REFERENCES
[1] Olufunmilayo Arewa. 2006. Open Access in a Closed Universe: Lexis, Westlaw, Law Schools, and the Legal Information Market. (03 2006).
[2] Claus Boye Asmussen and Charles Møller. 2019. Smart literature review: a practical topic modelling approach to exploratory literature review. Journal of Big Data 6, 1 (2019), 93. https://doi.org/10.1186/s40537-019-0255-7
[3] Steven Barkan. 1992. Can Law Publishers Change the Law? Legal Reference Services Quarterly 11 (03 1992), 29-35. https://doi.org/10.1300/J113v11n03_05
[4] Trevor Bench-Capon, Michal Araszkiewicz, Kevin Ashley, Katie Atkinson, Floris Bex, Filipe Borges, Daniele Bourcier, Paul Bourgine, Jack G. Conrad, Enrico Francesconi, Thomas F. Gordon, Guido Governatori, Jochen L. Leidner, David D. Lewis, Ronald P. Loui, L. Thorne McCarty, Henry Prakken, Frank Schilder, Erich Schweighofer, Paul Thompson, Alex Tyrrell, Bart Verheij, Douglas N. Walton, and Adam Z. Wyner. 2012. A history of AI and Law in 50 papers: 25 years of the international conference on AI and Law. Artificial Intelligence and Law 20, 3 (2012), 215-319. https://doi.org/10.1007/s10506-012-9131-x
[5] Robert C. Berring. 1986. Full-Text Databases and Legal Research: Backing into the Future.
[6] Jon Bing. 1986. The text retrieval system as a conversion partner. International Review of Law, Computers & Technology 2, 1 (1986), 25-39. https://doi.org/10.1080/13600869.1986.9966228
[7] David M. Blei and John D. Lafferty. 2006. Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06). ACM, New York, NY, USA, 113-120. https://doi.org/10.1145/1143844.1143859
[8] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993-1022. http://dl.acm.org/citation.cfm?id=944919.944937
[9] A. Bridy. 2012. Coding Creativity: Copyright and the Artificially Intelligent Author. Stanford Technology Law Review 5 (2012), 1-28.
[10] Daniel Brüggermann, Yannik Hermey, Carsten Orth, Darius Schneider, Stefan Selzer, and Gerasimos Spanakis. 2016. Storyline detection and tracking using Dynamic Latent Dirichlet Allocation. Computing News Storylines (2016), 9.
[11] David J. Carter, James Brown, and Adel Rahmani. 2016. Reading the high court at a distance: Topic modelling the legal subject matter and judicial activity of the high court of australia, 1903-2015. UNSWLJ 39 (2016), 1300.
[12] Allison June-Barlow Chaney, Hanna M. Wallach, Matthew Connelly, and David M. Blei. 2016. Detecting and Characterizing Events. In EMNLP. 1142-1152.
[13] Laura Dietz, Bhaskar Mitra, Jeremy Pickens, Hana Anber, Sandeep Avula, Asia J. Biega, Adrian Boteanu, Shubham Chatterjee, Jeff Dalton, Shiri Dori-Hacohen, John Foley, Henry Feild, Ben Gamari, Rosie Jones, Pallika Kanani, Sumanta Kashyapi, Widad Machmouchi, Matthew Mitsui, Steve Nole, Alexandre Tachard Passos, Jordan Ramsdell, Adam Roegiest, David Smith, and Alessandro Sordoni. 2019. Report on the First HIPstIR Workshop on the Future of Information Retrieval. SIGIR Forum 53, 2 (December 2019), 62-75. https://www.microsoft.com/en-us/research/publication/report-on-the-first-hipstir-workshop-on-the-future-of-information-retrieval/
[14] R. Dixon. 2003. Breaking into locked rooms to access computer source code: Does the dmca violate constitutional mandate when technological barriers of access are applied to software. Virginia Journal of Law & Technology 8, 1 (2003), 1-60.
[15] Arthur Dyevre and Nicolas Lampach. 2018. Issue Attention on the European Court of Justice: A Text-Mining Approach. SSRN (2018). http://dx.doi.org/10.2139/ssrn.3251186
[16] Ian Edwards. 2018. Search like a robot: Developing targeted search algorithms. Australian Law Librarian 26, 2 (2018), 104.
[17] Daphne Gelbart and J. Smith. 1993. Automating the Process of Abstracting Legal Cases. International Journal of Law and Information Technology 1 (03 1993), 324-334. https://doi.org/10.1093/ijlit/1.3.324
[18] Daphne Gelbart and J. C. Smith. 1994. The application of automated text processing techniques to legal text management. International Review of Law, Computers & Technology 8, 1 (1994), 203-210. https://doi.org/10.1080/13600869.1994.9966390
[19] Catalina Goanta, Gijs van Dijck, and Gerasimos Spanakis. 2019. Back to the Future: Waves of Legal Scholarship on Artificial Intelligence. Forthcoming in Sofia Ranchordás and Yaniv Roznai, Time, Law and Change (Oxford, Hart Publishing, 2019).
[20] Graham Greenleaf, Daniel Austin, Philip Chung, Andrew Mowbray, Madeleine Davis, and Jill Matthews. 2000. Solving the Problems of Finding Law on the Web: World Law and DIAL. Journal of Information, Law and Technology 2000 (01 2000). https://doi.org/10.1017/S0731126500009483
[21] Graham Greenleaf, Andrew Mowbray, Geoffrey King, and Geoffrey van Dijk. 1995. Public Access to Law via Internet: The Australian Legal Information Institute. Journal of Law and Information Science 6, 1 (1995), 50.
[22] F. Hanson. 2002. From Key Numbers to Keywords: How Automation Has Transformed the Law. Law Library Journal 94 (09 2002).
[23] Karen Hao. 2019. We analyzed 16,625 papers to figure out where AI is headed next. https://www.technologyreview.com/s/612768/we-analyzed-16625-papers-to-figure-out-where-ai-is-headed-next/
[24] Maria Iglesias, Sharon Shamuilia, and Amanda Anderberg. 2019. Intellectual Property and Artificial Intelligence - A literature review. EU Science Hub - European Commission (Dec. 2019). https://ec.europa.eu/jrc/en/publication/intellectual-property-and-artificial-intelligence-literature-review
[25] Aaron Kirschenfeld. 2017. Yellow Flag Fever: Describing Negative Legal Precedent in Citators. https://doi.org/10.31228/osf.io/dfjah
[26] Melanie Knapp and Rob Willey. 2016. Comparison of Research Speed and Accuracy Using WestlawNext and Lexis Advance. Legal Reference Services Quarterly 35 (05 2016), 1-11. https://doi.org/10.1080/0270319X.2016.1177428
[27] Michael A. Livermore, Allen Riddell, and Daniel Rockmore. 2016. Agenda formation and the US supreme court: A topic model approach. Arizona Law Review 1, 2 (2016).
[28] Peter Maggs. 1994. Legal Data Banks in the United States and Their Use in Comparative Law. International Journal of Legal Information 22 (01 1994), 214-227. https://doi.org/10.1017/S0731126500024926
[29] Daniel Maier, A. Waldherr, P. Miltner, G. Wiedemann, A. Niekler, A. Keinert, B. Pfetsch, G. Heyer, U. Reber, T. Häussler, H. Schmid-Petri, and S. Adam. 2018. Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures 12, 2-3 (2018), 93-118. https://doi.org/10.1080/19312458.2018.1430754
[30] Elizabeth M. McKenzie. 2001. Natural Language Searching. Legal Reference Services Quarterly 18, 4 (2001), 39-47. https://doi.org/10.1300/J113v18n04_04
[31] David Mimno, Hanna M. Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. 2011. Optimizing semantic coherence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 262-272.
[32] A. Moens. 2009. Is AustLII Sustainable? Australian Law Librarian 17, 3 (2009), 154-157.
[33] Marie-Francine Moens, Maarten Logghe, and Jos Dumortier. 2002. Legislative Databases: Current Problems and Possible Solutions. International Journal of Law and Information Technology 10 (03 2002). https://doi.org/10.1093/ijlit/10.1.1
[34] Elizabeth Moll-Willard. 2018. The use and perceptions of open Access resources by legal academics at the University of Cape Town (UCT) in South Africa. 6 (09 2018), 1-13.
[35] David Newman, Youn Noh, Edmund Talley, Sarvnaz Karimi, and Timothy Baldwin. 2010. Evaluating topic models for digital libraries. In Proceedings of the 10th Annual Joint Conference on Digital Libraries. ACM, 215-224.
[36] P. Ogden. 1993. "Mastering the lawless science of our law": A story of legal citation indexes. Law Library Journal 85 (01 1993), 1-48.
[37] Marc Opijnen, Hayo Schreijer, Ilja Andreas, and Maarten Kroon. 2015. Specialised Government Publishing: The Law Pocket and Linked Legal Data in the Netherlands.
[38] Mark K. Osbeck. 2018. Lawyer as Soothsayer: Exploring the Important Role of Outcome Prediction in the Practice of Law. Penn State Law Review 123, 1 (2018), 41-102.
[39] Yannis Panagis and Urska Sadl. 2015. The Force of EU Case Law: A Multi-dimensional Study of Case Citations. In JURIX. 71-80.
[40] João Reis, Paula Espírito Santo, and Nuno Melão. 2019. Impacts of Artificial Intelligence on Public Administration: A Systematic Literature Review. In 2019 14th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 1-7.
[41] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513-523.
[42] G. Salton, A. Wong, and C. S. Yang. 1975. A Vector Space Model for Automatic Indexing. Commun. ACM 18, 11 (Nov. 1975), 613-620. https://doi.org/10.1145/361219.361220
[43] Pamela Samuelson. 2010. Google Book Search and the Future of Books in Cyberspace. Minnesota Law Review 94 (01 2010).
[44] M. Sassoli. 2014. Autonomous Weapons and International Humanitarian Law: Advantages, Open Technical Questions and Legal Issues to Be Clarified. International Law Studies, US Naval War College 90 (2014), 308-340.
[45] Michael Schmitt and Jeffrey Thurnher. 2013. 'Out of the Loop': Autonomous Weapon Systems and the Law of Armed Conflict. Harvard National Security Journal 4, 2 (2013), 231-281.
[46] Cecilia Magnusson Sjöberg. 1997. Corpus Legis: A Legal Document Management Project. International Journal of Law and Information Technology 5, 1 (03 1997), 83-99. https://doi.org/10.1093/ijlit/5.1.83
[47] James A. Sprowl. 1976. Computer-Assisted Legal Research—An Analysis of Full-Text Document Retrieval Systems, Particularly the LEXIS System. Law & Social Inquiry 1, 1 (1976), 175-226. https://doi.org/10.1111/j.1747-4469.1976.tb00955.x
[48] Keith Stevens, Philip Kegelmeyer, David Andrzejewski, and David Buttler. 2012. Exploring topic coherence over many models and many topics. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 952-961.
[49] Andrew Tutt. 2017. An FDA for Algorithms. Administrative Law Review 69, 1 (2017), 83-123.
[50] Marc van Opijnen. 2017. Gaining Momentum. How ECLI Improves Access to Case Law in Europe.
[51] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. 2009. Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning. 1105-1112.
[52] Chong Wang, David M. Blei, and David Heckerman. 2008. Continuous Time Dynamic Topic Models. In UAI, David A. McAllester and Petri Myllymäki (Eds.). AUAI Press, 579-586. http://dblp.uni-trier.de/db/conf/uai/uai2008.html#WangBH08
[53] Xuerui Wang and Andrew McCallum. 2006. Topics over Time: A non-Markov Continuous-time Model of Topical Trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06). ACM, New York, NY, USA, 424-433. https://doi.org/10.1145/1150402.1150450
[54] E. Warren. 2017-2018. A Simple Guide to Machine Learning. SciTech Lawyer 14, 1 (2017-2018), 5-9.
[55] Clemens Wass. 2017. openlaws.eu – Building Your Personal Legal Network.
[56] Xing Wei, Jimeng Sun, and Xuerui Wang. 2007. Dynamic Mixture Models for Multiple Time-Series. In IJCAI, Manuela M. Veloso (Ed.). 2909-2914. http://dblp.uni-trier.de/db/conf/ijcai/ijcai2007.html#WeiSW07
[57] Robin Widdison. 2002. New Perspectives in Legal Information Retrieval. International Journal of Law and Information Technology 10, 1 (01 2002), 41-70. https://doi.org/10.1093/ijlit/10.1.41
[58] R. Winkels. 2015. Experiments in finding relevant case law. In NAIL 2015: 3rd International Workshop on Network Analysis in Law.

A APPENDIX

Figure 3: Number of journal articles with keywords per year

Figure 4: Distribution of the number of authors over the number of documents

Figure 5: Number of publications per author for authors with at least five publications