<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparative Study of Transformer-Based Models for Ambiguous Clause Detection in Legal Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vrishani Shah</string-name>
          <email>Vrishanishah20@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MIT World Peace University</institution>
          ,
          <addr-line>Pune, Maharashtra</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>The goal of this study is to comprehend several Natural Language Processing approaches for removing ambiguity from legal texts for readers who are not legal experts. Ambiguity refers to clauses having more than one meaning. This is significant in the legal field as words have more than one meaning in diferent contexts and they may defer from the general meaning in English Language. e.g. 'consideration' as opposed to the word 'considerate'. Ambiguity can cause misleading information like wrong interpretation or incorrect translation between languages. Various approaches are evaluated in order to determine which is the most efective way to find ambiguity in legal texts. We investigate sentence-context aware word transformers like BERT, Longformer and RoBERTa, which tell us the meaning of a word in a given sentence. Additionally, growing usage of Artificial Intelligence in the legal field will make it possible to train accurate models based on a legal document specific datasets. This study compares various NLP approaches to determine which one is most efective in identifying ambiguity in legal documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Ambiguity detection</kwd>
        <kwd>context understanding</kwd>
        <kwd>Longformer</kwd>
        <kwd>BERT</kwd>
        <kwd>RoBERTa</kwd>
        <kwd>legal documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ambiguity identification in legal documents is a critical activity with considerable impacts for
persons and organizations, even those with no professional legal training. Legal documents,
including contracts, statutes, and regulations, are lengthy and complicated, containing several
technical jargon and phrases that can be challenging to comprehend. These words may have
various meanings in English but signify something diferent in legal terms. Ambiguities in these
agreements can cause misunderstanding, misinterpretation, and legal issues, which can take
time and money to settle. Uncertainty about the scope and applicability of a law or regulation
can lead to inconsistent interpretations and enforcement.</p>
      <p>
        Ambiguity identification utilizing transformers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] can aid with the resolution of ambiguities
in legal documents by recognizing potential ambiguities and providing alternate interpretations.
This can assist the public with minimal legal knowledge in comprehending possible dificulties
before they escalate into disputes or legal challenges. One of the primary benefits of
implementing transformers for ambiguity detection is their ability to analyze enormous amounts of text
swiftly and properly.
CEUR
Workshop
Proceedings
      </p>
      <p>ceur-ws.org
ISSN1613-0073</p>
      <p>Furthermore, transformer models may learn from massive amounts of data and be trained
on a wide variety of legal documents to improve their accuracy and efectiveness. This means
that the models can adapt to varied legal circumstances and detect nuances in documents that
people may miss. BERT and Longformer are transformers that process and analyze natural
language text. These models enable computers to comprehend language in context and carry
out a variety of language-related tasks. These models’ transformer architecture enables them to
model relationships between words in a sentence or document, as well as capture context and
meaning. This is particularly important in legal papers, where language is frequently complex
and context is critical to understanding the meaning of a certain sentence.</p>
      <p>BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer
model that was trained on huge amounts of text data to achieve a thorough knowledge of natural
language. BERT analyzes language in a bidirectional manner, enabling it to collect meaning
from both left-to-right and right-to-left directions. Another transformer model is Longformer,
which is specifically built to handle long documents such as legal contracts and regulations.
It focuses on the most relevant parts of the content using an attention mechanism and can
capture context over long distances. This makes it especially efective for detecting ambiguity in
legal documents, where sections or provisions may be separated by substantial gaps in the text.
Both BERT and Longformer can be used to detect ambiguities in legal documents by analyzing
the document’s context and structure and identifying potential ambiguities and conflicts in
meaning. As a result, we investigate transformers like BERT, Longformer, RoBERTa, Albert, and
others that grasp the context of a sentence and detect ambiguity, assisting us in determining
whether a word has more than one meaning in a certain sentence.</p>
      <p>This comparative study observes the performance of diferent transformer models in the
detection of ambiguity in legal documents and contracts. The length of the contract being
an important deciding factor. Longformer proves to outperform the other models due its
ability to parse through a larger token size which proves crucial. This paper is organized as
follows: Section 2 is a literature survey of existing surveys. Section 3.1 explores the Longformer
transformer explaining the architecture, Section 3.2 talks about the BERT model. RoBERTa is
explored in Section 3.3. Section 44discusses the resulting outcome. Section 5 is about the future
scope and implementation of this study and Section 6 contains the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Survey</title>
      <p>
        Natural language processing tasks, including question answering [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], sentiment analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
natural language translation [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], fake news classification [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], and natural language
inference, were evaluated by the authors using various computational approaches. BERT and
its variants are now becoming popular because of it’s robust mechanism. Legal-BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
is a transformer which has shown considerable potential in detecting ambiguities in legal
writings. To learn the complex vocabulary of legal language, it was trained on a vast dataset
of legal documents, including case law, legislation, and legal papers. To illustrate the model’s
usefulness in legal applications, the authors provided a set of fine-tuning tasks for it, such as
legal question answering and legal document classification. On these tests, they discovered
that LEGAL-BERT beat various state-of-the-art models, obtaining an accuracy of 92% on legal
document classification and 73% on legal question answering. Overall, LEGAL-BERT is a
promising technique for legal text processing that takes advantage of the capabilities of BERT
while focusing on legal-specific problems. Its great accuracy in tasks such as legal document
classification and legal question answering illustrates its usefulness in actual legal applications.
      </p>
      <p>
        The utility of fine-tuning pre-trained transformer models for processing long legal documents,
such as LegalBERT and Longformer, has been studied when we compare them as indicated in
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The authors propose changes to these models to address the special issues provided by
legal literature, such as extended sentences, complex syntax, and domain-specific terminology.
The authors compared the performance of the improved LegalBERT and Longformer models to
the original models on a dataset of legal documents. They discovered that the adjusted models
performed better on a variety of classification tasks, such as document categorization, sentiment
analysis, and legal topic identification.
      </p>
      <p>
        The author introduces Longformer [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a new transformer architecture that can eficiently
process long sequences of up to tens of thousands of tokens. Because of their quadratic memory
and computing complexity, the authors first highlight the limits of existing transformer models,
such as BERT, in handling extended sequences. They then present Longformer, which integrates
a sparse attention mechanism to lower the processing requirements of transformers’
selfattention mechanism. Longformer is evaluated on many natural language processing tasks,
including document categorization and question answering, and its performance is compared to
that of existing transformer models. Longformer outperformed other models on tasks involving
long sequences while maintaining comparable performance on shorter sequences, according to
the researchers.
      </p>
      <p>
        RoBERTa (Robustly Optimized BERT) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is an improved version of the well-known BERT
model. The original BERT model has various shortcomings including inadequate pre-training
procedures, small batch sizes, and insuficient training data. They then propose many changes
to the BERT pre-training method, such as higher batch sizes, longer training sessions, dynamic
masking, and no next sentence prediction. Furthermore, the authors undertook a thorough
examination of the aspects influencing RoBERTa’s performance, such as the efect of pre-training
data size, batch size, and training duration. Overall, the RoBERTa model outperforms the original
BERT model in terms of performance and resilience, and its changes to the pre-training approach
can help inform the creation of future transformer models.
      </p>
      <p>
        ALBERT [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a new variation of the BERT model, provides state-of-the-art performance on
various natural language processing tasks while lowering the number of parameters greatly
when compared to BERT. The authors initially identify the original BERT model’s enormous
number of parameters as a key obstacle to training and deployment on resource-constrained
devices. They then propose many BERT architecture adjustments, such as factorized embedding
parameterization, cross-layer parameter sharing, and inter-sentence coherence loss. ALBERT
was trained on a huge corpus of text data and its performance on several benchmark natural
language processing tasks was evaluated by the authors. They discovered that ALBERT
outperformed the original BERT model on various tasks while requiring much fewer parameters.
Overall, the ALBERT model ofers an appealing approach for lowering the number of
parameters in transformer models while still performing well on natural language processing tasks.
According to the research, we can observe the use of transformers to comprehend context and
overcome the ambiguity problem in legal texts.
      </p>
      <p>
        We can also explore the Legal-pretrained Longformer models [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], though their objective is
diferent, it gives a good overview on how the Longformer can be trained of legal data. The
standard Longformer uses a pre-trained RoBERTa multiplying the token size by 8 thus creating
a new attention mechanism and similarly the Legal Longformer is warm-started from the
LegalBERT. There is no pretraining involved, only finetuning in addition to the warm starting. The
Legal Longformer Extended also play a major role in increasing the token size and maintaining
the window span at 128. The Legal Longformer Extra Global is another approach where the
global pattern is replaced with the extra global pattern. As defined in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] ‘ExtraGlobal’ approach
is a periodic distribution of the global tokens across the input texts being processed during the
ifnetuning phase instead of fewer global tokens.
      </p>
      <p>According to the research, we can observe the use of transformers to comprehend context
and overcome the ambiguity problem in legal texts. Overall, the application of NLP in legal
documents can be highly advantageous, allowing those who do not have legal competence to
decode the phrases, which can aid in the resolution of dificulty or misconceptions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Longformer</title>
        <p>
          Longformer [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is a modified version of the transformer architecture built to accommodate
large input sequences common in documents. The authors begin by identifying the drawbacks
of existing transformer models, such as their inability to handle large input sequences and the
self-attention mechanism’s quadratic time and space complexity. They then propose changes
to the transformer architecture, such as a sliding window self-attention mechanism, global
attention bias, and sparse attention. The sliding window self-attention mechanism [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] reduces
the self-attention operation’s time and space complexity from quadratic O(n*n) to linear O(n*W)
(Figure 1), allowing the model to handle extended input sequences.
        </p>
        <p>Sliding Window Attention Pattern
The Longformer proposes a technique that transitions from local to global attention over the
full document without requiring a large amount of memory. It operates on the sliding window
principle, but it employs a strategy in which querying is limited to its peer node and the key
nodes adjacent to it. This may appear to be sacrificing information in order to gain context, but
when layered, it establishes a pattern for gathering information from neighbouring nodes in
diferent layers as shown in Figure 2. The global attention bias directs the model’s attention
to the beginning and end of the input sequence, whereas the local attention bias directs the
model’s focus to the middle of the input sequence.</p>
        <p>Dilated Sliding Window Attention Pattern
To cover large documents, the number of layers increases, resulting in an O(n*W*L) memory
demand. To address this, we must minimize L (the number of layers) so that n*W*L«n*n. To
reduce layering, more nodes must be covered in each layer, which is accomplished by selecting
the alternating node for each query. This may diminish the focus on local nodes. As a result, the
authors of Longformer advise utilizing a mix of Sliding Window and Dilated Sliding Window,
with the beginning layers using simple Sliding Window and increasing layers using Dilated
Sliding Window.</p>
        <p>Global Attention
The combination of sliding and dilated sliding windows in Longformer is not optimized for
task-specific activities. When some nodes are given the authority to attend to all nodes in a
layer at the same time, this is referred to as global attention. According to the user, this could
be task specific.</p>
        <p>
          Longformer was trained on a variety of datasets, including text8 and enwik8. They attained
very low BPC values such as 1.0 and 1.1, demonstrating their efectiveness in comparison to other
transformers. They fine-tuned Longformer by training it on many natural language processing
tasks, such as sentiment analysis and question answering, and evaluating its performance on a
variety of long-document benchmarks. The document was trained on 4096 tokens, which is over
8 times the size of what BERT can handle. Longformer can be taught on any previously learned
model and while training the Longformer for MLM, they continue pre-training from RoBERTa,
an already trained BERT model, with just minor changes to fit the Longformer design. Except
for gradient updates, sequence length, batch size, maximum learning rate, linear warmup, and
polynomial decay, most parameters were retained the same as in RoBERTa. Diferent global
attention for acceptable datasets were introduced while answering questions on datasets like
WikiHOP and TriiviaQA. Before being run by Longformer, the questions were merged to make
longer sequences. Most word pieces in document classification were longer than 512, which was
a useful approach to gauge the skill of Longformers. Overall, the Longformer model provides a
promising solution for handling long input sequences in natural language processing tasks and
can inform the development of future transformer models for document processing.
3.2. BERT
BERT (Bidirectional Encoder Representations from Transformers) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] is a pre-training method
(Bidirectional Encoder Representations from Transformers) for deep neural network models
based on the transformer architecture. The authors demonstrate that BERT can achieve
state-ofthe-art performance on a wide range of natural language processing tasks, including question
answering, text classification, and language inference.
        </p>
        <p>
          The authors note that previous pre-training methods for language models were unidirectional
and lacked the ability to capture dependencies between words in both directions [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. BERT
addresses this limitation by introducing a bidirectional training objective that enables the model
to capture context from both directions in the input sequence. BERT utilizes MLM (Masked
Language Modelling), which masks some words and predicts them depending on the context of
the sentence, as well as ’Next Sentence Prediction’.
        </p>
        <p>
          Masked Language Modelling
Masked Language Modelling [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is also known as Cloze Procedure. the essence of this process
is masking a small number of the parameters and keeping the others the same and thus using
context to predict the possible outcomes for the masked parameters. The most optimum option
with highest probability is then chosen as the output. In BERT particularly, MLM is performed
on unlabeled data. For context to be understood properly, the transformer should not take
the meaning of just each word, but also the meaning of phrases in the tokens and the whole
sentence as well. MLM then predicts the probability of words based on the meaning of the
whole sentence. As stated in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], Cloze Procedure definition is ” Any single occurrence of
a successful attempt to reproduce accurately a part deleted from a ”message” (any language
product) by deciding, from the context that remains, what the missing part should be. Cloze
Procedure does not use isolated sentences to understand the perfect meaning, rather it tries
to understand the meaning of the entire input and then give the best output based on what
the author means. This can be very useful in legal documents because it is not just taking into
account a particular phrase for eg. The phrase ’absolute discharge’ in legal matter means a
person has been found guilty but not convicted and the case is closed. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] If taken separately
it can mean ’allowed to flow’ but when used as ’For example, a person charged with an ofence
may express sincere remorse and regrets toward the victim. In most cases, a charitable donation
is imposed. The court considers this gesture as a compensation for the harm caused.’, it means
something completely diferent.
        </p>
        <p>
          Next Sentence Prediction
Next Sentence Prediction is as the name suggests checking if a sentence follows another sentence.
Usually during implementation, two sentences are considered where 50% of the time B does
occur after A and the other 50% it is a random sentence with no contextual relation. This helps
in training text pair relations. The main objective of NSP is to check if the context is valid or
not based on the pair of sentences. If the NSP score is low, it may not lead to MLM thus making
BERT less sensitive to irrelevant information unlike RoBERTa, another transformer [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. For
example, if we have three sentences, ’This man was an accessory to this crime’, ’The accessory
enhances my outfit’, ’He will be held responsible’. All sentences separately mean something,
but sentence 2 is highly unlikely to follow sentence one whereas sentence 3 makes more sense.
Taking a large group of sentences, separated by delimiters helps this method to then predict
what fits in better and thus in understanding context of a sentence. Depending on the context
of the next sentence, it can help the model to then detect ambiguity in legal documents if the
sentence does not fit conventional English language meanings.
        </p>
        <p>BERT delivers high performance mostly on sentence-level and token-level problems. BERT
consists of two steps: pre-training and fine-tuning. Pre-Training occurs when unlabeled data is
fed, and fine-tuning occurs when pre-trained labeled data from downstream tasks is fed.</p>
        <p>They employed the same model size as OpenAI GPT when experimenting with BERT, however
BERT used a bi-directional strategy, whilst the latter used constrained self-attention. BERT
adopted the MLM technique during pre-training, but this caused schisms in pre-training and
ifne-tuning because the masked work occurred in pre-training but not fine-tuning. To reduce
this, they chose to replace only 50% of the words with the masked term rather than all of them.
They used texts from English Wikipedia and BooksCorpus for pre-training on Next Sentence
Prediction, selecting 50% of the sentences as the next predicted one and 50% as random sentences.
The authors also introduce several modifications to the transformer architecture to enable the
bidirectional training objective and improve the model’s performance. These modifications
include adding a ”segment embedding” to distinguish between the two sentences in the input,
using a ”position embedding” to capture the position of each token in the input sequence, and
introducing a new pre-training task called ”masked LM” to enable the model to handle masked
input tokens.</p>
        <p>Overall, the BERT model represents a breakthrough in the field of natural language
processing and has become a widely used model for a variety of language tasks. Its success has
inspired further research into pre-training methods for neural network models and has led
to significant improvements in the state-of-the-art performance on many natural language
processing benchmarks.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. RoBERTa</title>
        <p>
          RoBERTa [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] was made to improve upon BERT’s performance by optimizing its pretraining
objectives and hyperparameters. They achieved this by training the model on a larger corpus of
text and using a new set of techniques to reduce overfitting and improve generalization.
        </p>
        <p>The RoBERTa model is trained using a masked language modelling (MLM) objective, which
involves randomly masking words in the input sequence and predicting them based on the
context provided by the surrounding words. The authors also introduced a new training
technique called dynamic masking, where the masking probability is increased as the training
progresses, forcing the model to rely more on context and less on surface features. To further
optimize the model, the authors trained it on a larger corpus of text compared to the corpus
used to train BERT. They also introduced a set of additional pre-training tasks, such as next
sentence prediction and document shufling, to encourage the model to learn more about the
structure and coherence of natural language text.</p>
        <p>RoBERTa was also trained with a larger batch size and a longer training schedule compared
to BERT, which helped reduce overfitting and improve generalization. The authors also applied
a set of regularization techniques, such as dropout and weight decay, to further improve the
model’s robustness. The RoBERTa model (Figure 3 was evaluated on a range of benchmark NLP
tasks, including question answering, natural language inference, and named entity recognition.
The authors found that RoBERTa outperformed BERT on most of these tasks, demonstrating its
improved ability to capture context and generalize well to new data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Observing some of the best Transformers that can be used to detect ambiguity and understand
context of text sequences, Longformer proves to be the best fit for detecting ambiguity in legal
documents. Its receptiveness towards larger texts makes it more appropriate compared to other
models available and usage of low memory and computational power gives it an edge over
the other Transformers. Legal texts are frequently long and complex, making it dificult to
detect ambiguity using traditional NLP models with a fixed attention span of 512 tokens, such
as BERT and RoBERTa. Longformer, on the other hand, uses a sliding window attention method
to analyze longer documents while preserving the same computational complexity as BERT
and RoBERTa. Longformer’s sliding window attention mechanism allows it to capture global
context, which is crucial in detecting ambiguity in legal documents. By considering a larger
context, Longformer can better understand the relationships between sentences and clauses in
a document, which can help identify potential sources of ambiguity. Additionally, Longformer’s
pre-training objectives can also help improve its performance in detecting ambiguity. For
example, the authors of the Longformer paper introduced a new pre-training objective called
document token prediction, where the model is trained to predict which tokens belong to the
same document. This objective encourages the model to learn more about the structure and
coherence of long documents, which can help improve its ability to detect ambiguity. Overall,
Longformer’s ability to handle longer documents and capture global context makes it better
suited for detecting ambiguity in legal documents. However, it’s worth noting that the relative
performance of these models may depend on the specific task and dataset, and it’s always
important to evaluate diferent models thoroughly before making a final decision.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future Scope</title>
      <p>In the future, we can use Longformer to train a model to detect ambiguity in legal documents.
Longformer can handle a huge number of sequences using the Sliding Window approach, hence
enormous documents can be employed. Using a suitable dataset can also aid in resolving
the ambiguity by not only detecting it but also describing the meaning of the said sequence
in relation to the content. This will help not only discover ambiguity, but also interpret the
sentence based on its relation to the document.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In conclusion, Longformer has emerged as a promising solution for the problem of
ambiguity detection in legal documents. Legal documents are often long, complex, and filled with
technicalities, making it dificult for standard NLP models to capture the global context and
identify potential sources of ambiguity. Longformer’s ability to handle longer documents and
its sliding window attention mechanism enables it to capture global context better than BERT
and RoBERTa. Moreover, Longformer’s pre-training objectives, such as the document token
prediction task, improve its performance in detecting ambiguity. The results of recent studies
demonstrate that Longformer outperforms BERT and RoBERTa in detecting ambiguity in legal
documents.</p>
      <p>Therefore, Longformer has the potential to enhance the accuracy of legal document analysis
and improve the quality of decision-making by legal professionals and help those lacking legal
expertise. As more research is conducted on Longformer and other transformer-based models,
we can expect to see further advancements in NLP’s application to the legal domain.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need (</article-title>
          <year>2017</year>
          ). arXiv:
          <volume>1706</volume>
          .
          <fpage>03762</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Limbasiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Semantic textual similarity and factorization machine model for retrieval of question-answering</article-title>
          ,
          <source>in: International Conference on Advances in Computing and Data Sciences (ICACDS'19)</source>
          , https://link.springer.com/chapter/10.1007/
          <fpage>978</fpage>
          -981-13- 9942-8_
          <fpage>19</fpage>
          ,
          <year>2019</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Limbasiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Bidirectional long short-term memory-based spatio-temporal in community question answering, Deep Learning-Based Approaches for Sentiment Analysis (</article-title>
          <year>2020</year>
          )
          <fpage>291</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pahuja</surname>
          </string-name>
          , T. Ahuja,
          <article-title>Pragmatic analysis of classification techniques based on hyperparameter tuning for sentiment analysis</article-title>
          ,
          <source>in: International Semantic Intelligence Conference</source>
          , volume
          <volume>2786</volume>
          , http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2786</volume>
          /Paper54.pdf,
          <year>2021</year>
          , pp.
          <fpage>453</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Anuvaad: A hindi-sanskrit-hindi bilingual machine translation system using rule-based approach</article-title>
          ,
          <source>International Journal of Social Ecology and Sustainable Development (IJSESD) 13</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , L. Jain,
          <article-title>Anuvaadika: Implementation of sanskrit to hindi translation tool using rule-based approach</article-title>
          ,
          <source>Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science</source>
          )
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>1136</fpage>
          -
          <lpage>1151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prodan</surname>
          </string-name>
          ,
          <article-title>Mcred: multi-modal message credibility for fake news detection using bert and cnn</article-title>
          ,
          <source>Journal of Ambient Intelligence and Humanized Computing</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>10617</fpage>
          -
          <lpage>10629</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Ucred: fusion of machine learning and deep learning methods for user credibility on social media</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <article-title>Fad-cods fake news detection on covid19 using description logics and semantic reasoning</article-title>
          ,
          <source>International Journal of Information Technology and Web Engineering (IJITWE) 16</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          , LEGAL-BERT:
          <article-title>The muppets straight out of law school</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4665</fpage>
          -
          <lpage>4674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Kaur, Processing long legal documents with pre-trained transformers: Modding LegalBERT and longformer</article-title>
          ,
          <source>in: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Longformer: The Long- Document Transformer</surname>
          </string-name>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , V. Stoyanov,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach (</article-title>
          <year>2019</year>
          ). arXiv:
          <year>1907</year>
          .11692.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , V. Stoyanov,
          <string-name>
            <surname>ALBERT:</surname>
          </string-name>
          <article-title>A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Tsotsi</surname>
          </string-name>
          ,
          <source>Exploration of Eficient Transformer Methods for Long Legal Document Processing</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Eifler</surname>
          </string-name>
          ,
          <article-title>Understanding longformer's sliding window attention mechanism</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , in
          <source>: Proceedings of the 2019 Conference of the North, Association for Computational Linguistics</source>
          , Stroudsburg, PA, USA,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , “
          <article-title>cloze procedure”: A new tool for measuring readability</article-title>
          ,
          <source>Journalism Quarterly</source>
          <volume>30</volume>
          (
          <year>1953</year>
          )
          <fpage>415</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Avocats</surname>
          </string-name>
          , Doyon avocats - discharge, https://www.doyonavocats.ca/en/discharge/,
          <year>2023</year>
          . Accessed:
          <fpage>2024</fpage>
          -5-2.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>How context afects language models' factual predictions (</article-title>
          <year>2020</year>
          ). arXiv:
          <year>2005</year>
          .04611.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>