<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unveiling Themes in Judicial Proceedings: A Cross-Country Study Using Topic Modeling on Legal Documents from India and the UK</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krish Didwania</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Durga Toshniwal</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Agarwal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education</institution>
          ,
          <addr-line>Manipal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Indian Institute of Technology</institution>
          ,
          <addr-line>Roorkee</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Professor, Department of Computer Science, Indian Institute of Technology</institution>
          ,
          <addr-line>Roorkee</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Legal documents are indispensable in every country for legal practices and serve as the primary source of information regarding previous cases and employed statutes. In today's world, with an increasing number of judicial cases, it is crucial to systematically categorize past cases into subgroups, which can then be utilized for upcoming cases and practices. Our primary focus in this endeavor was to annotate cases using topic modeling algorithms such as Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and BerTopic for a collection of lengthy legal documents from India and the UK. This step is crucial for distinguishing the generated labels between the two countries, highlighting the differences in the types of cases that arise in each jurisdiction. Furthermore, an analysis of the timeline of cases from India was conducted to discern the evolution of dominant topics over the years.</p>
      </abstract>
      <kwd-group>
        <kwd>Topic modeling</kwd>
        <kwd>Unsupervised learning</kwd>
        <kwd>Judicial system</kwd>
        <kwd>Long legal documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A legal document holds paramount importance as a written testament, encapsulating
contractual agreements, commitments, and legally binding actions. Renowned for their meticulous
construction by legal experts, these documents ensure precision and accuracy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this study,
we delve into a collection of legal documents centered around court cases, capturing the
intricate proceedings, decisions, and rulings within the judicial system. Serving as comprehensive
records of legal disputes brought before courts, they document involved parties, issues, and
judicial outcomes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Through these documents, one can trace the evolution of legal arguments,
evidence presentation, and law application in addressing case complexities.
      </p>
      <p>
        In today’s era, abundant legal documents from accumulated judicial proceedings provide a
vast data repository. The Supreme Court of India has witnessed significant developments in
case disposal rates. In 2023, the apex court disposed of 52,191 cases, marking a 33% increase
compared to the previous year’s count of 39,800 cases. This achievement represents the highest
disposal rate in the past six years. Our primary goal is to explore strategies leveraging this
wealth of data to support future legal proceedings. Topic modeling emerges as a pivotal tool in
this endeavor, automatically identifying underlying themes or topics within extensive document
collections [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. By analyzing word distributions across documents, topic models eliminate the
need for manual annotation, offering an efficient means to organize, explore, and index large
datasets.
      </p>
      <p>
        In our study, we employ topic modeling algorithms, including Latent Dirichlet Allocation
(LDA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Non-Negative Matrix Factorization (NMF) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and BerTopic [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], to analyze legal
documents from India and the UK. Beyond topic modeling, our research includes an ablation study
examining judicial case types in both countries, comparing topic differences and semantic
similarities. Additionally, we conduct a timeline analysis of Indian legal documents, observing
trends in dominant topic changes over the years.
      </p>
      <p>This research not only aims to understand the prevalent legal topics within these documents
but also seeks to provide insights into the dynamics of legal proceedings and the evolving
nature of legal discourse. Uncovering patterns and trends can enhance our understanding of
legal systems and inform future legal practices and policies. Through this multidimensional
analysis, we aim to contribute to the ongoing dialogue surrounding legal document analysis
and its implications for the legal profession and society at large.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Previous research underscores the significance of legal documents and their widespread
implementation. Many of these studies focus on supervised learning techniques utilizing labels.
Shukla et al.[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] not only introduce the dataset used in our study but also provide summaries
of judgments or segment-wise details, including facts, statutes, and analysis, through various
supervised and unsupervised techniques. O et al.[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] concentrate on using topic models to
summarize and visualize British legislation to facilitate easier browsing and identification of
key legal topics and their associated terms. Wang et al.[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] demonstrate the effectiveness and
necessity of experiments to validate decision-making processes in the design, highlighting the
high performance of the LDA algorithm in measuring text similarity. Carter et al. [10]
conduct similar experiments on legal documents from the High Court of Australia, focusing on
case studies such as the Mabo litigation and the concept of ’proximity’ in tort law. Mohammadi
et al. [11] investigate the efficient handling of large-scale legal case law databases like Human
Rights Documentation[12], particularly focusing on Article 8 of the European Convention on
Human Rights, through topic modeling and citation networks. Kumar et al.[13] propose an
approach to generate concise summaries from legal judgments using topics obtained from LDA,
providing a notable method for summarization, especially as the first such approach for Indian
legal judgments.
      </p>
      <p>Priyadarshini et al.[14] address instability in topic modeling through an ensemble approach,
combining Semantic LDA and ensemble models, resulting in reduced processing time compared
to conventional methods for legal texts from the UK. Regarding the variety of algorithms
for topic modeling, Gonçalves et al.[15] conduct a systematic mapping study to classify and
analyze current literature, identifying trends and gaps in research areas and applied methods.
Additionally, efforts have been made to enhance the learning of topic models by proposing
regularization methods to improve coherence and interpretability, as suggested by Newman et
al[16].</p>
      <p>While prior studies have examined legal texts from individual countries, our research, to the
best of our knowledge, represents the first comparative study across multiple countries. Along
with this, no previous work has incorporated a timeline analysis of legal documents from India.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset utilized in this study comprises three sections: Indian Abstractive, Indian Extractive,
and UK Abstractive cases.</p>
        <p>The Indian Abstractive dataset (IN-Abs) consists of Indian Supreme Court judgments obtained
from the Legal Information Institute of India website, totaling 7,130 case documents with
corresponding abstractive summaries. These documents have an average length of 5,389 tokens.</p>
        <p>The Indian Extractive dataset (IN-Ext) was curated based on feedback from legal experts
dissatisfied with the IN-Abs summaries. Two LLB graduates annotated rhetorical segments in
50 Indian Supreme Court case documents and provided extractive summaries for each segment.</p>
        <p>The UK Abstractive dataset (UK-Abs) comprises 793 case documents and their official press
summaries from the UK Supreme Court website, segmented into abstractive summaries. These
documents have an average length of 14,296 tokens.</p>
        <p>Notably, abstractive and extractive case summaries were not utilized in this paper as topic
modeling employs unsupervised algorithms. During data examination, it was discovered that
Indian cases spanned from 1945 to 2020, while UK cases only covered the years 2009 to 2010.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing</title>
        <p>In the preprocessing stage of topic modeling, we followed common practices aimed at ensuring
the accuracy of the analysis. These steps include the removal of stop words, which are commonly
occurring words that contribute little semantic value and may distort the results. Additionally,
lemmatization has been applied to standardize words by reducing them to their base or root
form, thereby ensuring consistency among different inflections of the same word [17].</p>
        <p>During the implementation LDA and NMF, due to the extensive length of the documents, we
eliminated frequently occurring common words found in judicial documents, especially those
present in more than half of all cases. This process aimed to reduce the influence of ubiquitous
terms during topic modeling, thus improving the distinctiveness and relevance of the resulting
topics, which closely align with the underlying themes of the corpus. This preprocessing step
significantly enhanced the quality of the outcomes. This procedure was not employed for
BerTopic as better sentence embeddings would be generated for meaningful sentences.</p>
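<p>The document-frequency filter described above can be sketched as follows; the corpus, stop-word list, and 0.5 threshold are illustrative stand-ins, and the full pipeline additionally lemmatizes tokens (e.g., with NLTK or spaCy):</p>

```python
# Sketch of the preprocessing applied before LDA and NMF: stop-word
# removal plus filtering of terms present in more than half of all
# documents. Stop-word list and corpus are toy stand-ins; lemmatization
# is omitted here for brevity.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}

def preprocess(docs, max_doc_freq=0.5):
    # Tokenize and drop stop words.
    tokenized = [[w for w in d.lower().split() if w not in STOP_WORDS]
                 for d in docs]
    # Count in how many documents each term occurs.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    # Remove ubiquitous terms (present in more than max_doc_freq of all
    # cases), as done for LDA and NMF but not for BerTopic.
    cutoff = max_doc_freq * len(docs)
    return [[w for w in toks if doc_freq[w] <= cutoff] for toks in tokenized]

docs = ["the court held the appeal",
        "the appeal was dismissed by the court",
        "the court heard a land dispute",
        "tax judgment of the tribunal"]
cleaned = preprocess(docs)  # "the" and the ubiquitous "court" are gone
```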
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Topic Modelling</title>
        <p>This research work employed the following Topic Modelling algorithms for legal documents in
both datasets:
Latent Dirichlet Allocation (LDA): LDA is a probabilistic model widely used for topic
modeling in legal documents. It employs a two-step process: topic assignment and word
generation. LDA utilizes the term frequency-inverse document frequency (TF-IDF) [18] to prioritize
discriminative words in documents. In the context of our paper on legal document topic
modeling, LDA acts as an unsupervised learning algorithm, extracting hidden topics by iteratively
optimizing topic and word distributions to best explain the observed word occurrences. Overall,
LDA offers a powerful approach for uncovering topics within legal documents, leveraging
TF-IDF and probabilistic principles to capture their latent structure effectively.</p>
        <p>Non-Negative Matrix Factorization (NMF): NMF is a dimensionality reduction technique
widely used in topic modeling. In the context of legal documents, NMF decomposes the
document-term matrix into two non-negative matrices: one representing topics and their
distributions across words, and the other representing documents and their distributions across
topics. This decomposition helps identify latent topics within the corpus of legal documents.
Unlike LDA, NMF does not assume a probabilistic model but rather aims to factorize the input
matrix into lower-dimensional matrices that capture meaningful patterns. In our application
of NMF to legal document topic modeling, we utilized the TF-IDF vectorization technique in
conjunction with NMF. TF-IDF is employed to transform the raw text data into a numerical
representation that highlights the importance of words in individual documents relative to their
occurrence across the entire corpus. The TF-IDF vectorization process assigns higher weights to
words that are frequent within a document but relatively rare across the entire corpus, thereby
emphasizing discriminative terms that are likely to be indicative of specific topics or themes.
NMF is particularly suitable for legal document analysis as it ensures that all resulting factors
are non-negative, which aligns well with the intuitive notion that topics and document-topic
distributions should not contain negative values. By iteratively optimizing these matrices, NMF
effectively extracts coherent topics that are interpretable in the context of legal terminology
and concepts.</p>
        <p>BerTopic: In our research, we utilize BerTopic, a topic modeling algorithm leveraging
pretrained BERT (Bidirectional Encoder Representations from Transformers) models to generate
document embeddings from legal documents. These embeddings capture semantic meaning and
are subsequently reduced in dimensionality using Uniform Manifold Approximation and
Projection (UMAP) [19]. UMAP preserves local and global structure, enabling efficient visualization
and analysis of high-dimensional data. We employ MiniBatchKMeans clustering with 50 clusters
to group similar documents, facilitating the identification of coherent topics. Preprocessing
techniques, including OnlineCountVectorizer and ClassTfidfTransformer with BM25 weighting,
enhance the quality and interpretability of resulting topics. To overcome the maximum input
sequence limit of 512 tokens for models like SentenceBERT [20], we segment input data into chunks,
aggregating topics from different chunks for comprehensive topic extraction. This integration of
UMAP with BerTopic enhances topic model interpretability and utility while efficiently handling
BERT's input sequence limitation.</p>
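<p>The chunking workaround for the 512-token limit can be sketched in pure Python; here assign_topic is a hypothetical stand-in for the embed-reduce-cluster pipeline (SentenceBERT, UMAP, MiniBatchKMeans) used with BerTopic:</p>

```python
# Split a long document into encoder-sized chunks, assign a topic to
# each chunk, and aggregate by majority vote back to document level.
from collections import Counter

MAX_TOKENS = 512  # SentenceBERT-style input limit

def chunk(tokens, size=MAX_TOKENS):
    # Consecutive, non-overlapping slices of at most `size` tokens.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def document_topic(tokens, assign_topic):
    # Majority vote over chunk-level topic assignments.
    votes = Counter(assign_topic(c) for c in chunk(tokens))
    return votes.most_common(1)[0][0]

# Toy demonstration: a fake 1300-token document and a dummy "model"
# that labels a chunk by its majority token.
doc = ["tax"] * 1100 + ["land"] * 200
pieces = chunk(doc)
topic = document_topic(
    doc, lambda c: "tax" if c.count("tax") > len(c) // 2 else "land")
```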
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimentation</title>
      <sec id="sec-4-1">
        <title>4.1. Hyperparameter Tuning</title>
        <p>For LDA, the hyperparameters, namely the number of topics (K), α (the parameter controlling the sparsity
of document-topic distributions), and β (the parameter controlling the sparsity of topic-word
distributions), were meticulously fine-tuned. We tested the model’s performance using various
combinations of hyperparameters, including different values of α and β ranging from 0.01 to
0.99, both symmetric and asymmetric priors, and a range of K values from 4 to 11. In the India
dataset, optimal hyperparameters were determined as α = 0.46, β = 0.91, and K = 7, while
for the UK dataset, α was set to asymmetric, β = 0.01, and K = 6 [21].</p>
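<p>A minimal grid-search sketch over these LDA hyperparameters; the toy corpus, the reduced grid, and log-likelihood scoring are illustrative assumptions (the full search covered the ranges above, and [21] discusses search-based tuning strategies):</p>

```python
# Exhaustive search over a tiny (K, alpha, beta) grid, keeping the
# configuration with the highest approximate log-likelihood.
from itertools import product
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["income tax appeal tribunal", "tax revenue appeal",
        "land property dispute tenant", "tenant eviction land rights"]
X = CountVectorizer().fit_transform(docs)

best = None
for k, alpha, beta in product([2, 3], [0.01, 0.46], [0.01, 0.91]):
    lda = LatentDirichletAllocation(
        n_components=k, doc_topic_prior=alpha,
        topic_word_prior=beta, random_state=0).fit(X)
    ll = lda.score(X)  # higher approximate log-likelihood is better
    if best is None or ll > best[0]:
        best = (ll, k, alpha, beta)
```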
        <p>In implementing NMF, we opted for the same number of topics as LDA, as this algorithm
requires less reliance on hyperparameter tuning. Furthermore, for BerTopic utilizing
SentenceBERT, specifying the number of topics beforehand is unnecessary. Instead, we adjusted other
parameters related to dimensionality reduction and clustering to achieve optimal performance.</p>
        <p>After determining the optimal K, the resulting topics underwent expert annotation by legal
professionals. Expert annotations served as a vital validation mechanism, refining the topic
models by ensuring alignment with domain-specific nuances and requirements.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metrics</title>
        <p>Topic modeling is a powerful technique used to extract underlying themes or topics from
a collection of documents. However, assessing the quality of the topics generated by topic
modeling algorithms is essential for ensuring their utility and interpretability. Coherence
measures provide a quantitative assessment of the coherence and interpretability of topics by
evaluating the semantic similarity between words within topics[22]. It serves as a crucial metric
for evaluating the quality of topics and assisting in model selection. In this work, we have
evaluated all three models using two different coherence measures [23]:
a) C_V coherence: This coherence measure calculates coherence based on the cosine similarity
of word vectors. It evaluates the similarity between word pairs within topics by computing
the cosine of the angle between their corresponding word vectors. The c_v measure considers
both the intra-topic coherence, i.e., similarity between words within a topic, and inter-topic
coherence, i.e., similarity between words across different topics.</p>
        <p>C_V = (1/T) ∑_{t=1}^{T} [ 1/(N−1) ] ∑_{i=1}^{N−1} similarity(w_{t,i}, w_{t,i+1})   (1)</p>
        <p>similarity(w_{t,i}, w_{t,i+1}) = (w_{t,i} · w_{t,i+1}) / (‖w_{t,i}‖ · ‖w_{t,i+1}‖)   (2)</p>
        <p>Where C_V is the coherence score, T is the number of topics, N is the number of words in
topic t, w_{t,i} and w_{t,i+1} are two adjacent keywords in topic t, and similarity(w_{t,i}, w_{t,i+1}) is the word
pair cosine similarity between w_{t,i} and w_{t,i+1}.
b) U_MASS: The u_mass coherence measure quantifies coherence by measuring the pointwise
mutual information (PMI) between pairs of words. It computes the PMI between all word pairs
within topics and aggregates these scores to obtain the overall coherence score. The u_mass
measure assesses the semantic relatedness of words within topics based on their co-occurrence
in the corpus.</p>
        <p>It calculates how often two words, w_i and w_j, appear together in the corpus, and it is defined
as
score(w_i, w_j) = log [ (D(w_i, w_j) + 1) / D(w_i) ],   (3)
where D(w_i, w_j) indicates how many times words w_i and w_j appear together in documents,
and D(w_i) is how many times word w_i appeared alone. The greater the number, the better
the coherence score. Also, this measure is not symmetric, which means that score(w_i, w_j)
is not equal to score(w_j, w_i). We calculate the global coherence of the topic as the average of
pairwise coherence scores on the top N words that describe the topic.</p>
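<p>The pairwise UMass score of Eq. (3) can be implemented directly; the toy corpus and topic words stand in for the legal documents, and the global score averages the pairwise scores over the given top words:</p>

```python
# UMass coherence: score(w_i, w_j) = log((D(w_i, w_j) + 1) / D(w_i)),
# averaged over ordered pairs of the topic's top words.
import math
from itertools import combinations

def umass(topic_words, docs):
    doc_sets = [set(d) for d in docs]
    def D(*words):
        # Number of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)
    scores = [math.log((D(wi, wj) + 1) / D(wi))
              for wi, wj in combinations(topic_words, 2)]
    return sum(scores) / len(scores)

docs = [["tax", "appeal", "tribunal"], ["tax", "appeal"],
        ["land", "tenant"], ["tax", "revenue"]]
coherent = umass(["tax", "appeal"], docs)    # words that co-occur
incoherent = umass(["tax", "tenant"], docs)  # words that never co-occur
```

Higher (less negative) values indicate better coherence, so the co-occurring pair scores above the non-co-occurring one.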
        <p>In the context of long document topic modeling, we have assigned utmost importance to the
u_mass coherence score. This metric holds significant weight as it evaluates the co-occurrence
of keywords associated with topics throughout the entirety of long documents[24].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Quantitative Analysis</title>
        <p>In our statistical analysis, as displayed in Table 1, we delved into the diverse array of topics
present within legal documents from both India and the UK.</p>
        <p>Notably, all three algorithms demonstrated semantic coherence within individual topics,
showcasing the inherent similarity among words within each topic. Additionally, the analysis
underscored the substantial diversity existing between different topics, indicating the richness
and complexity of legal discourse. The bar graphs in Figure 1 show the equal distribution of
documents throughout the topics generated from the LDA model. This balanced distribution
indicates that our model effectively mitigated the possibility of class imbalance, ensuring a
comprehensive representation of various legal themes and issues within the dataset [25].</p>
        <p>We observed that the LDA algorithm achieved the highest u_mass score, highlighting its
notable performance. This outcome shows the model’s efficacy in capturing the underlying
structure within lengthy legal texts.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparison in topics of India and UK</title>
        <p>As demonstrated in Table 2, we collaborated with a legal expert to assign annotated labels based
on the generated keywords for both datasets. These annotations also reveal differences in the
predominant keywords between India and the UK.</p>
        <p>In the results section of legal documents focusing on topic modeling, it is worth noting the
significant distinctions observed in the keywords extracted from legal texts originating from
India and the UK [26]. This distinction emphasizes the notable diversity in the types of legal
cases encountered in the two countries which sheds light on the uniqueness of their respective
judicial systems and highlights the differences in the legal landscape, practices, and priorities
between India and the UK [27]. Such findings highlight the importance of considering regional
and jurisdictional nuances when analyzing legal texts and stress the necessity for customized
approaches in legal research and analysis. The heatmaps in Figure 2 (b) and (c) further confirm
the diversity and discrepancy among the topics generated individually for both countries, while
heatmap (d) illustrates the dissimilarity among the generated topics between India and the UK.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Timeline Analysis of Indian cases</title>
        <p>In this research, we also carried out a timeline analysis of Indian legal cases, examining the
evolving trends in the primary subject matter over successive years[28]. The dataset covering
Indian legal cases extended from 1945 to 2020, with the majority, constituting over 85%, gathered
between 1950 and 1990. We constructed a line graph as shown in Figure 3 illustrating document
counts over this time frame, with each topic’s involvement depicted distinctly.</p>
        <p>The generated graphs portray the temporal dynamics of topic prevalence within the dataset
spanning from 1950 to 1995. The graphs illustrate the trends for each topic, showcasing how
their prominence fluctuated over the years. Each line represents a specific topic, and the y-axis
indicates the number of documents pertaining to that topic. Both graphs together offer a
comprehensive view of the thematic evolution within the corpus, shedding light on the shifts in
focus and thematic trends across the specified time frame.</p>
        <p>The surge in legal cases in India between 1950 and 1990 can be attributed to several
intertwined factors[29]. Firstly, the era witnessed significant legislative reforms, possibly leading
to confusion and disagreements that resulted in more disputes being brought to court. Rapid
economic development spurred increased commercial activities, which in turn likely generated a
higher number of legal conflicts over contracts, property rights, and taxation. Social and political
changes, alongside a burgeoning population, may have further fueled civil unrest and disputes.
Notably, the years 1955-1965 saw a peak in the number of cases, potentially influenced by the
civil war at the time and the introduction of broad-based economic liberalization characterized
by a blend of caprice, status quo-ism, and unfavorable economic conditions.</p>
        <p>Despite the inconclusive nature of the line graphs, one can still discern notable trends. For
instance, there is a marked upsurge in cases associated with income tax and trade regulations
during those years, while topics such as land rights and criminal cases exhibit a significant decline
after a specific period. The occurrences of industrial and property disputes and election cases
fluctuated over time, experiencing periods of both surges and the absence of cases intermittently.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Works</title>
      <p>In this study, we have utilized multiple topic modeling algorithms to analyze legal judicial cases
from two countries: India and the UK. Within the legal domain, annotations of legal cases are
imperative, serving as valuable resources for future cases and referencing past statutes. Our
research demonstrates the effectiveness of employing topic modeling in automating annotation
tasks, wherein generated keywords facilitate the identification of relevant topics. Furthermore,
a noteworthy aspect of our study is the illustration of the disparities in the types of cases
prevalent in both countries, thereby shedding light on variations in living standards and legal
frameworks.</p>
      <p>In our upcoming efforts within this project, we aspire to delve into the utilization of
alternative transformer-based models and expansive language models, eliminating the necessity
for segmentation. This approach will enhance the precision in identifying the specific topics
relevant to each case. Moreover, our discoveries emphasize the vital need for a hierarchical
framework in topic modeling. Such a structure could prove invaluable in scenarios requiring
multi-label annotation, given that documents often relate to multiple subjects. Additionally, we
intend to extend our timeline analysis to include more recent years post-1990, overcoming the
limitations posed by the dataset’s constraints and providing insights into evolving topic trends.</p>
      <p>[10] D. J. Carter, A. Rahmani, Proximity and neighbourhood: Using topic modelling to read the development of law in the High Court of Australia, Monash University Law Review 45 (2019) 785–824.
[11] M. Mohammadi, L. M. Bruijn, M. Wieling, M. Vols, Combining topic modelling and citation network analysis to study case law from the European Court of Human Rights on the right to respect for private and family life, 2024. arXiv:2401.16429.
[12] G. Woods, Human rights set sail from Strasbourg (2017).
[13] R. Kumar, K. Raghuveer, Legal document summarization using latent Dirichlet allocation, International Journal of Computer Science and Telecommunications 3 (2012) 8–23.
[14] R. Priyadarshini, et al., LeDoCl: A semantic model for legal documents classification using ensemble methods, Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12 (2021) 1899–1908.
[15] L. Gonçales, K. Farias, M. Scholl, T. C. Oliveira, M. Veronez, Model comparison: a systematic mapping study, in: SEKE, 2015, pp. 546–551.
[16] D. Newman, E. V. Bonilla, W. Buntine, Improving topic coherence with regularized topic models, Advances in Neural Information Processing Systems 24 (2011).
[17] J. W. Johnsen, K. Franke, The impact of preprocessing in natural language for open source intelligence and criminal investigation, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 4248–4254.
[18] H. Christian, M. P. Agus, D. Suhartono, Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF), ComTech: Computer, Mathematics and Engineering Applications 7 (2016) 285–294.
[19] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426 (2018).
[20] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).
[21] A. Panichella, A systematic comparison of search-based approaches for LDA hyperparameter tuning, Information and Software Technology 130 (2021) 106411.
[22] D. Mimno, H. Wallach, E. Talley, M. Leenders, A. McCallum, Optimizing semantic coherence in topic models, in: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011, pp. 262–272.
[23] M. Röder, A. Both, A. Hinneburg, Exploring the space of topic coherence measures, in: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015, pp. 399–408.
[24] L. R. Scheuter, Does it make sense? Analyzing coherence in longer fictional discourse on a syntactic and semantic level, Master's thesis, University of Twente, 2021.
[25] S. I. Nikolenko, S. Koltcov, O. Koltsova, Topic modelling for qualitative studies, Journal of Information Science 43 (2017) 88–102.
[26] E. Alexander, M. Gleicher, Task-driven comparison of topic models, IEEE Transactions on Visualization and Computer Graphics 22 (2015) 320–329.
[27] T. Agrawal, Judicial review: A comparative study between USA, UK and India, Int'l JL Mgmt. &amp; Human. 5 (2022) 890.
[28] M. Linton, E. G. S. Teo, E. Bommes, C. Chen, W. K. Härdle, Dynamic topic modelling for cryptocurrency community forums, Springer, 2017.
[29] B. Ghosh, S. Marjit, C. Neogi, Economic growth and regional divergence in India, 1960 to 1995, Economic and Political Weekly (1998) 1623–1630.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Borah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <article-title>Summarization of legal documents: Where are we now and the way forward</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>40</volume>
          (
          <year>2021</year>
          )
          <fpage>100388</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W. F.</given-names>
            <surname>Dodd</surname>
          </string-name>
          ,
          <article-title>Modern constitutions: a collection of the fundamental laws of twenty-two of the most important countries of the world, with historical and bibliographical notes</article-title>
          , volume
          <volume>2</volume>
          , University of Chicago Press,
          <year>1908</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Brookes</surname>
          </string-name>
          ,
          <string-name>
            <surname>T. McEnery</surname>
          </string-name>
          ,
          <article-title>The utility of topic modelling for discourse studies: A critical evaluation</article-title>
          ,
          <source>Discourse Studies</source>
          <volume>21</volume>
          (
          <year>2019</year>
          )
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. I. Jordan</surname>
          </string-name>
          ,
          <article-title>Latent dirichlet allocation</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Seung</surname>
          </string-name>
          ,
          <article-title>Algorithms for non-negative matrix factorization</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>13</volume>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grootendorst</surname>
          </string-name>
          ,
          <article-title>BERTopic: Neural topic modeling with a class-based TF-IDF procedure</article-title>
          ,
          <source>arXiv preprint arXiv:2203.05794</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poddar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Legal case document summarization: Extractive and abstractive methods and their evaluation</article-title>
          ,
          <source>arXiv preprint arXiv:2210.07544</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>O'Neill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Robin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <article-title>An analysis of topic modelling for legislative texts</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Topic model based text similarity measure for Chinese judgment document</article-title>
          ,
          in:
          <source>Data Science: Third International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2017, Changsha, China, September 22-24, 2017, Proceedings, Part II</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>