<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Sharingans at SimpleText: Fine-Tuned LLM based approach to Scientific Text Simplification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Syed Muhammad Ali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hammad Sajid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Owais Aijaz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Owais Waheed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faisal Alvi</string-name>
          <email>faisal.alvi@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdul Samad</string-name>
          <email>abdul.samad@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Program, Dhanani School of Science and Engineering, Habib University</institution>
          ,
          <addr-line>Karachi-75290</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper reports Habib University's Team Sharingans' participation in the CLEF 2024 SimpleText track, which aims to simplify scientific texts for improved readability and comprehension by non-experts. Our goal is to use state-of-the-art language models to produce simple yet accurate explanations of scientific texts for the general public. Our solution is based on a multi-step approach utilizing the GPT-3.5 model to solve Tasks 1, 2, and 3, i.e., passage extraction, identification and explanation of difficult concepts, and summarization. Our approach for Task 1 involved a sentence-embedding-based vector database for narrowing the corpus, MS-Marco for document ranking, and GPT-3.5 for selecting informative passages. For Task 2, we fine-tuned the GPT-3.5 model to identify and explain difficult terms and generate explanations. For Task 3, we likewise fine-tuned the GPT-3.5 model, with a specific prompt to simplify given scientific abstracts and sentences. The effectiveness of our approach was assessed based on the quality of the results, demonstrating the potential of advanced language models to make scientific education more accessible to the general public. Our solution proposes fine-tuned large language models as a reliable aid for scientific education.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>GPT-3.5 Turbo</kwd>
        <kwd>Elastic Search</kwd>
        <kwd>BERT</kwd>
        <kwd>Text simplification</kwd>
        <kwd>SimpleText</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        • Task 1: What is in (or out)? Selecting passages to include in a simplified summary [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        • Task 2: What is unclear? Identify and explain difficult concepts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        • Task 3: Rewrite this! Given a query, simplify passages from scientific abstracts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>We review and analyze the approaches of the teams who participated in CLEF SimpleText 2023.
Specifically, the approaches of teams whose models were among the top-scoring models in their
respective tasks are discussed.</p>
      <p>
        For Task 01, the Elsevier [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] team fine-tuned the bi-encoder and cross-encoder ranking models
for ranking documents given a query in order of their relevance. Specifically, they used the Dense Passage Retrieval model. The AIIR and LIAAD Labs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed five systems for this task, including
cross-encoder with and without fine-tuning, Sentence-BERT bi-encoder models, and traditional IR
models like TF-IDF combined with PL2.
      </p>
      <p>
        For Task 2.1 and Task 2.2, diverse methodologies and tools were employed. The UBO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] team
utilized the pke package, along with statistical and graphical approaches such as YAKE!, TextRank, and
Tf-Idf, to extract keywords from the provided sentences, and subsequently extracted definitions from
Wikipedia for Task 2.2. The Sinai [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] team used the GPT-3 auto-regressive model for lexical complexity
prediction. They presented an approach for identifying the most challenging terms in the text which
leveraged zero-shot and few-shot learning prompts to assess term difficulty.
      </p>
      <p>
        For Task 03, the UBO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] team employed the SimpleT5 model and trained it on the datasets.
Subsequently, they utilized this trained model to generate simplified text from the test dataset. They also
utilized the BLOOM model, albeit requiring sample data input due to its few-shot learning nature, and
similarly applied it to generate simplified text. AIIR and LIAAD [
        <xref ref-type="bibr" rid="ref5">5</xref>
          ] team utilized OpenAI’s Davinci model with a straightforward prompt for text rewriting.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Approaches</title>
      <p>3.1. Task 1</p>
      <p>For Task 01, we had:</p>
      <p>• A corpus of DBLP abstracts. An Elastic Search index and a vector database with sentence embedding scores were provided through APIs for querying the corpus.</p>
      <p>• An input file containing the input queries and their topic texts.</p>
      <p>• A file containing the quality relevance scores of abstracts with respect to topics, on a scale of 0-2, for 25 topics and 64 queries.</p>
      <p>• A set of files containing the topics selected from The Guardian newspaper and the Tech Xplore website, along with their URLs and article content.</p>
      <p>The approaches used for this task are given below:</p>
      <sec id="sec-3-1">
        <title>3.1.1. MS-Marco + GPT-3.5 based re-ranking</title>
        <p>In this approach, we utilized the vector database to query the top 100 relevant abstracts from the corpus. To generate the query for the API, we used the query text. If the query text was a long phrase or a sentence, the "abstracts" parameter was used in the query to search inside abstracts; if the query text was a short phrase, the "title" parameter was used. Table 1 shows examples of phrases and the generated queries.</p>
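        <p>The field-selection rule above can be sketched as a small helper. This is a minimal illustration: the word-count threshold and the returned query shape are assumptions, not the exact API payload.</p>
        <preformat>
```python
def build_query(query_text, long_threshold=4):
    """Pick the search field by phrase length (threshold is hypothetical)."""
    # long phrases/sentences search inside abstracts; short phrases search titles
    field = "abstracts" if len(query_text.split()) > long_threshold else "title"
    return {"field": field, "text": query_text}

# e.g. a short phrase searches titles, a long one searches inside abstracts
</preformat>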
        <p>The abstracts retrieved from the search were then ranked using the "msmarco-MiniLM-L12" cross encoder with respect to both the query text and the topic text, which were concatenated with a period and a white space ". ". The top 10 re-ranked abstracts were provided to a fine-tuned GPT-3.5 model to select the most relevant abstract with reference to the query text, and then to extract the most relevant passage from the selected abstract. This two-step process is shown in Table 2.</p>
        <p>The GPT-3.5 model was fine-tuned on manually curated training data. The hyperparameters are
given in Table 3.</p>
        <p>The training data used to fine-tune GPT-3.5 comprised several examples, each having 10 manually
selected abstracts as input and a manually extracted passage as the output. Finally, the runs for this
task were submitted with the run id “Sharingans_Task1_marco-GPT3".</p>
        <p>Example sentence/phrase (Table 1): "how AI systems, especially virtual assistants, can perpetuate gender stereotypes". Example prompts (Table 2): "Select the abstract which gives the most relevant definition/explanation for the following term/phrase: (list of 10 abstracts)" and "Extract the most relevant part of the abstract explaining the given term/phrase in light of the topic (topic). (abstract)".</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.2. Keyword extraction with RAKE and ColBERT+GPT-3.5 based re-ranking</title>
        <p>
          For this approach, we utilized RAKE [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a keyword extraction algorithm, to identify relevant terms for
querying the corpus. We provided RAKE with the topic and query text to extract relevant keywords
from them. Then we used these terms to generate a query for the Elastic Search index, which in turn
narrowed down the corpus to a subset of documents. This subset was further refined using the ColBERT
neural ranker [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to choose the top 10 most relevant ones, given the topic text and the query. Finally,
GPT-3.5 helped in selecting the most informative and concise passage for inclusion in the summary. We did not include runs for this approach, since the MS-Marco + GPT-3.5 approach described above performed better.
        </p>
        <p>3.2. Task 2</p>
        <p>For Task 02, we were provided with:</p>
        <p>• A train file, along with some manual run files, containing the "source sentences" fields together with their corresponding extracted terms, definitions, difficulty scores, and explanations, with positive and negative definitions indicating what an acceptable definition should look like.</p>
        <p>• A validation file, with entries similar to those in the train file, for testing the trained model.</p>
        <p>• A test dataset of just over 500 entries, consisting only of the source sentences, for evaluating the model's output.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1. GPT-3.5 Turbo based approach</title>
        <p>To accomplish Task 02, we fine-tuned the GPT-3.5 Turbo model on the train dataset. GPT-3.5 Turbo is an advanced language model developed by OpenAI as part of the broader GPT-3.5 series. We chose this model for this task because of its enhanced natural language understanding and generation ability. Table 4 gives the details of the fine-tuning of our GPT-3.5 model.</p>
        <p>The effective use of 3 epochs alongside a batch size of one meant that the dataset was passed through the model only three times, which is relatively few for such a task. However, a batch size of one combined with a learning rate multiplier of 2 allowed a more stable adjustment of weights. We used a unit batch size so that it would have a regularizing effect and prevent our model from overfitting on the small dataset. The idea behind the small batch size was to have the model learn before seeing all the data.</p>
        <p>For this task, we observed good performance on the test set. This indicates that the mini-batch learning approach, although unconventional with a batch size of one, was effective in optimizing the model both for term extraction and for generating definitions. The small batch size and learning rate multiplier helped achieve better generalization over the small dataset.</p>
        <p>We passed the training dataset, consisting of the keywords, difficulty scores, and definitions for each sub-task, as a query to the GPT model to fine-tune it. The fine-tuned model was then used to extract keywords from the source sentences, assign them difficulty scores, generate definitions, and store the results in a data frame. Finally, we converted the output into a JSON file as required for the submission, with the run id "Sharingans_task2.2_GPT" for both sub-tasks.</p>
        <p>The effectiveness of this method can be attributed to tailoring the approach to the specific requirements of Task 2. The model's performance validated our decision, demonstrating that even with small batches, careful tuning can achieve the desired outcomes.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. KeyBert, Classification, and Prompt Engineering based approach</title>
        <p>
          Our second approach for Task 02 included utilizing the “KeyBert Model" [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for keyword extraction, Random Forest classification for assigning difficulties, and prompt engineering through the Mistral-7B-Instruct-v0.3 Large Language Model (LLM).
        </p>
        <p>The KeyBert model leverages BERT embeddings to extract keywords and key phrases. We utilized it to extract keywords from the source sentences. We then applied Random Forest classification to the extracted keywords with an 80%-20% training-test split. Using the Mistral-7B-Instruct-v0.3 Large Language Model (LLM), we sent requests through the Hugging Face API to perform prompt engineering and obtain the required definitions as the response.</p>
        <p>We did not submit runs for this approach because of a major limitation of the Hugging Face API, which restricts the number of requests to around 500 queries, far fewer than the number of extracted terms. Submitting this run would therefore have resulted in an extremely low score.</p>
        <p>3.3. Task 3</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.3.1. Data Description:</title>
        <p>For Task 03, we were provided with:</p>
        <p>• A parallel corpus of training data comprising source sentences/abstracts along with their query texts and simplified versions.</p>
        <p>• Test data which included source sentences (task 3.1), source abstracts (task 3.2), and a query text for each sentence/abstract.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.3.2. Fine-Tuned GPT-3.5 Turbo</title>
        <p>
          In this approach, we used OpenAI's GPT-3.5 model, since it has strong summarization capabilities. We first experimented with fine-tuning the GPT-3.5 model using the training data of tasks 3.1 and 3.2 together, shuffling the sentences and abstracts randomly. Then we experimented with fine-tuning the model for Task 3.1 and Task 3.2 separately. Using the EASSE scoring [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], we found that fine-tuning the model for tasks 3.1 and 3.2 separately yielded slightly better results than fine-tuning with the data for both tasks together, especially for task 3.2. The method used to train the model, however, remained the same for both tasks and is discussed below.
        </p>
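        <p>EASSE computes metrics such as SARI against reference simplifications. The toy proxy below only illustrates the idea of comparing a system output to its source; the real EASSE call is sketched in a comment, with argument shapes given to the best of our knowledge of that API.</p>
        <preformat>
```python
def keep_ratio(source, output):
    """Toy proxy (hypothetical): fraction of source words kept in the output."""
    src = set(source.lower().split())
    out = set(output.lower().split())
    return len(src.intersection(out)) / max(len(src), 1)

# Actual evaluation used EASSE (pip install easse); sketch of the SARI call:
#   from easse.sari import corpus_sari
#   sari = corpus_sari(orig_sents=originals,
#                      sys_sents=system_outputs,
#                      refs_sents=[references])   # one list per reference set
</preformat>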
        <p>The fine-tuning process was similar for both of the subtasks. We provided the model with a prompt
to simplify the sentences/abstracts along with the sentences/abstracts, the query text, and the reference
output sentences/abstracts. The hyperparameters used for fine-tuning the model are given in Table 6
and Table 7 for tasks 3.1 and 3.2 respectively.</p>
        <p>After training the model, we provided the same prompt with the test data (sentence/abstract and query text) to generate the simplified sentences/abstracts. These simplified sentences/abstracts were then evaluated using the EASSE score and were submitted with the run ids "Sharingans_task3.1_finetuned" and "Sharingans_task3.2_finetuned" for tasks 3.1 and 3.2 respectively.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.3.3. Fine-Tuned Bart Sequence-to-Sequence Model</title>
        <p>
          In this approach, we utilized Meta’s BART sequence-to-sequence pre-trained model. BART was
introduced by Meta (Facebook) as a Denoising Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Specifically, we use the "BART-large-cnn" sequence-to-sequence model from the Hugging Face Transformers library. We first tokenized the training input
sentences/abstracts and the reference outputs and used them to fine-tune the model. Then we provided
the model with test data to generate simplified sentences. We observed that although the model
performed well in summarizing the longer sentences and abstracts, it did not simplify them in many
cases. Moreover, for shorter sentences, the model generated outputs that were very similar or even the
same as the original sentence. Since this model did not perform well as compared to the GPT-3.5 model,
we did not include runs for this model.
        </p>
        <p>
          In another approach, we utilized the PEGASUS model for text simplification. PEGASUS is a pre-trained
encoder-decoder model tailored specifically for abstractive summarization, which we apply to text simplification [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We fine-tune this
model via the Hugging Face Transformers library using the same approach as for BART. This model
provides slightly better results than BART but still lags behind OpenAI’s GPT-3.5.
        </p>
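        <p>The BART generation step can be sketched as follows. The stock "facebook/bart-large-cnn" checkpoint stands in for our fine-tuned one, the truncation helper is a hypothetical convenience, and the generation parameters are illustrative; the model-loading lines are commented out since they download weights.</p>
        <preformat>
```python
def truncate_words(text, max_words=512):
    """Simple pre-truncation helper (hypothetical) applied before tokenization."""
    return " ".join(text.split()[:max_words])

# Generation (requires: pip install transformers torch):
#   from transformers import BartForConditionalGeneration, BartTokenizer
#   tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
#   model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
#   inputs = tok(truncate_words(abstract), return_tensors="pt", truncation=True)
#   ids = model.generate(**inputs, max_length=128, num_beams=4)
#   simplified = tok.decode(ids[0], skip_special_tokens=True)
</preformat>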
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>4.1. Task 01</p>
      <p>4.2. Task 02</p>
      <p>Our run for Task 02 retrieved a total of 1,501 keywords, assigned them difficulty scores, and then generated their definitions and explanations. Table 9 shows the official results for our Task 02 run (columns: recall, precision, BLEU, overall).</p>
      <p>The overall recall metric indicates the proportion of terms (independent of difficulty) that were found, while the precision metric indicates how accurately terms were labeled as difficult. The overall recall and precision scores show that our fine-tuned model was able to extract keywords and distinguish their difficulties quite satisfactorily, reflecting GPT-3.5 Turbo's ability to comprehend natural language tasks effectively. The BLEU scores, computed with n-grams of 1, 2, 3, and 4, lose precision at higher n-gram orders. This may be because the words chosen by our fine-tuned model to complete the definitions did not closely match the reference definitions; based on manual interpretation, however, the idea conveyed by each definition was correct to an extent.</p>
      <p>4.3. Task 03</p>
      <p>Tables 10 and 11 show the scores for the runs submitted for task 3.1 and task 3.2 respectively. Since an identical approach was taken for both tasks, the runs exhibit very similar scores. We observe that the fine-tuned GPT-3.5 model scores fairly high on the evaluation metrics. The FKGL, BLEU, and lexical complexity scores for tasks 3.1 and 3.2 are similar. The SARI score and compression ratio are slightly higher for task 3.2, which indicates that the documents in task 3.2 had to be modified more than the relatively shorter sentences in task 3.1 to achieve simplification. The FKGL scores for both sub-tasks, however, indicate that the text could be simplified further, though this should be done without losing information from the original text. Overall, this suggests that our approach has fairly good potential for scientific text simplification and summarization.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
We utilized several models and techniques to solve SimpleText Tasks 1, 2, and 3. For Task 1, we resorted to extracting keywords, sorting through documents, and ranking their relevance, finally using GPT-3.5 to pick out the most relevant passages for our summary. Task 2 mostly involved fine-tuning the GPT-3.5 Turbo model to generate definitions of complex terms. We also experimented with the KeyBert model to extract terms, Random Forest classification to assign complexities, and prompt engineering with the Mistral-7B model to generate definitions; however, the GPT approach turned out to be much better. Since Task 3 was text-generation based, we utilized curated data to fine-tune the GPT API and generate summaries. We also experimented with the PEGASUS and BART models for abstractive summarization, but GPT-3.5 exhibited better performance. In conclusion, we found that out of all the approaches, OpenAI's GPT-3.5 language model gave the best results for Tasks 2 and 3. However, the Task 1 pipeline that utilized GPT-3.5 did not perform well. Further research can investigate the cause of the poor performance of the Marco-GPT pipeline, and can further improve the approaches for Tasks 2 and 3 for better simplification of scientific texts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to acknowledge the support provided by the Office of Research (OoR) at Habib University, Karachi, Pakistan, for funding this project through the internal research grant IRG-2235. We would also like to thank the SimpleText@CLEF-2024 chairs for their guidance and organization.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>SanJuan</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 1: Retrieve passages to include in a simplified summary</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>G. M. D. Nunzio</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Capari</surname>
          </string-name>
          , et al.,
          <article-title>Elsevier at simpletext: Passage retrieval by fine-tuning gpl on scientific documents</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2023</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          , et al.,
          <article-title>Aiir and liaad labs systems for clef 2023 simpletext</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2023</year>
          ), volume
          <volume>3497</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dubreuil</surname>
          </string-name>
          ,
          <article-title>Ubo team @ clef simpletext 2023 track for task 2 and 3 - using ia models to simplify scientific texts</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2023</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ortiz-Zambrano</surname>
          </string-name>
          , et al.,
          <article-title>Sinai participation in simpletext task 2 at clef 2023: Gpt-3 in lexical complexity prediction for general audience</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2023</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cowley</surname>
          </string-name>
          ,
          <source>Automatic Keyword Extraction from Individual Documents</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . doi:10.1002/9780470689646.ch1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>Colbert: Efficient and effective passage search via contextualized late interaction over bert</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2004.12832. arXiv:2004.12832.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grootendorst</surname>
          </string-name>
          , Maartengr/keybert: Bibtex,
          <year>2021</year>
          . URL: https://doi.org/10.5281/zenodo.4461265. doi:10.5281/zenodo.4461265.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alva-Manchego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scarton</surname>
          </string-name>
          , L. Specia, EASSE:
          <article-title>Easier automatic sentence simplification evaluation</article-title>
          , in: S. Padó, R. Huang (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</source>
          (
          <article-title>EMNLP-IJCNLP): System Demonstrations, Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          . URL: https://aclanthology.org/D19-3009. doi:10.18653/v1/D19-3009.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer, Bart:
          <article-title>Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>in: Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          . doi:10.18653/v1/2020.acl-main.703.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Pegasus:
          <article-title>Pre-training with extracted gap-sentences for abstractive summarization</article-title>
          , ArXiv abs/1912.08777 (2019). URL: https://api.semanticscholar.org/CorpusID:209405420.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>