<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Simplification of Scientific Texts using Pre-trained Language Models: A Comparative Study at CLEF Symposium 2023</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aftab Anjum</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikolaus Lieberum</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Christian-Albrechts-Universität zu Kiel</institution>
          ,
          <addr-line>Christian-Albrechts-Platz 4, 24118 Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The complexity of scientific texts often creates a barrier to understanding for non-specialist readers. This barrier inhibits the democratization of knowledge and prevents the wider public from engaging with scientific discourse. We argue that applying artificial intelligence to the tasks of identifying and explaining difficult concepts (Complexity Spotting), and simplifying scientific text, has the potential to democratize access to scientific knowledge. We investigate a range of cutting-edge deep learning models for their efficacy in these tasks. The models are trained and evaluated on a dataset of scientific articles, annotated for complex concepts and their simpler explanations. We present a comparative analysis of the performance of these models, illuminating the strengths and weaknesses of each. Our findings reveal promising avenues for future research and development in the field of automated text simplification, contributing to the broader goal of making scientific knowledge accessible to all.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Difficult Term Extraction</kwd>
        <kwd>Difficult Term Explanation</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Scientific texts are known for their complexity, technical jargon, and specialized vocabulary,
which can often pose challenges for a wide range of readers. Researchers, scientists, and students
alike often struggle to comprehend and digest the content of scientific papers, making it difficult
for them to stay up-to-date with the latest advancements in their respective fields. To bridge this
gap and promote accessibility of scientific information, there is a growing need for automatic
simplification techniques that can transform intricate scientific texts into more comprehensible
versions without compromising the integrity and accuracy of the original content. In recent
years, significant progress has been made in natural language processing (NLP) and machine
learning, enabling the development of various text simplification techniques. One such
technique, called SimpleText, focuses specifically on the automatic simplification of scientific texts.
SimpleText aims to address the linguistic and structural complexities present in scientific writing
while preserving the essential scientific concepts and ensuring the accuracy of the simplified
content. However, the unique characteristics of scientific texts present additional challenges for
automatic simplification. Scientific texts often contain domain-specific terminology, complex
sentence structures, and intricate logical reasoning, which require a deep understanding of the
underlying concepts. Existing text simplification approaches, designed for general-purpose
texts, may not adequately capture the nuanced relationships and meaning in scientific content.</p>
      <p>For the Blended Intensive Program (BIP) Artificial Intelligence (AI) for Humanities: from
Text Simplification to Automatic Humor Analysis, we explore the application of advanced deep
learning models namely AI21, ST5, and BLOOM to address the challenges of text simplification.
These models, each with their unique capabilities, are applied to two interconnected tasks:
’Complexity Spotting,’ where the objective is to identify and explain difficult concepts in scientific
texts, and ’Text Simplification,’ where the aim is to convert complex scientific sentences into
simpler ones that are easier for a general audience to understand.</p>
      <p>The overarching aim of this paper is to investigate the efficacy of these deep learning models
in simplifying scientific texts and spotting complex concepts. By doing so, we hope to contribute
valuable insights to the field of automated text simplification, ultimately promoting the wider
accessibility and understanding of scientific knowledge.</p>
      <p>The structure of the paper is as follows: Section 2 provides a concise overview of the related
work in the field. Section 3 presents the experiments conducted in this study, including detailed
descriptions of the tasks, attributes of the dataset used, a discussion on the utilization of existing
models, and a comprehensive analysis of the obtained results. Finally, Section 4 concludes the
paper by summarizing the key findings and highlighting the significance of the advancements
made. It also addresses the future directions and potential strategies for further improving the
performance of the models.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Automatic text simplification is a research area that focuses on developing computational
methods to simplify complex texts and make them more accessible to a wider audience, including
individuals with cognitive or linguistic challenges, non-native speakers, or people with low
literacy levels. This field combines techniques from natural language processing (NLP), machine
learning, and linguistics to analyze and modify the structure, vocabulary, and syntax of texts.</p>
      <p>There has been significant research and development in automatic text simplification, aiming
to create algorithms and models that can effectively simplify texts while preserving their
meaning. We discuss some common approaches below.</p>
      <sec id="sec-2-1">
        <title>2.1. Lexical-Based Automatic Text Simplification</title>
        <p>Lexical-based automatic text simplification is an approach that focuses on simplifying texts
by replacing complex words or phrases with simpler alternatives while preserving the overall
meaning. This technique leverages lexical resources, such as dictionaries, thesauri, and word
frequency lists, to identify and substitute complex terms with simpler equivalents.</p>
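The core substitution loop of such systems can be sketched in a few lines. The frequency table and synonym list below are toy stand-ins, of our own making, for real resources such as WordNet and corpus frequency lists:

```python
# Toy stand-ins for a corpus frequency list and a thesaurus (illustrative only).
WORD_FREQ = {"use": 9000, "utilize": 120, "help": 8000, "facilitate": 90}
SYNONYMS = {"utilize": ["use"], "facilitate": ["help"]}

def simplify(sentence, freq_threshold=500):
    """Replace words rarer than the threshold with their most frequent synonym."""
    out = []
    for word in sentence.split():
        # Unknown words default to the threshold and are therefore kept as-is.
        if freq_threshold > WORD_FREQ.get(word, freq_threshold):
            candidates = SYNONYMS.get(word, [])
            if candidates:
                # Pick the synonym with the highest corpus frequency.
                word = max(candidates, key=lambda w: WORD_FREQ.get(w, 0))
        out.append(word)
    return " ".join(out)

print(simplify("we utilize tools to facilitate research"))
# -> "we use tools to help research"
```

Real systems add the safeguards this sketch omits: checking that the substitute fits the context and preserves the sentence's grammaticality.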
        <p>
          The research community has made notable strides in the domain of lexical-based automatic
text simplification. For instance, Truică and Stan [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] introduce
SimpLex, a lexical text simplification architecture designed to automatically simplify complex texts.
The architecture focuses on lexical substitutions, where complex words or phrases are replaced
with simpler alternatives. SimpLex leverages a combination of linguistic resources, such as
WordNet and SimpleWiki, along with machine learning techniques to identify suitable
substitutions. The system is evaluated on a large corpus of news articles and achieves significant
improvements in readability while preserving essential content. The research demonstrates the
effectiveness of a lexical-based approach to text simplification and provides a valuable resource
for improving the accessibility of written texts for a wider range of readers.
        </p>
        <p>
          Another work, proposed by Swain and Tambe [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], presents a lexical text simplification
approach based on WordNet. The authors propose a method that identifies complex words in
a given text and replaces them with simpler synonyms from WordNet. The approach involves
measuring semantic relatedness between words and selecting the most suitable substitution
based on a combination of contextual and lexical cues. Evaluation results demonstrate the
effectiveness of the proposed method in improving the readability of complex texts while preserving
the core meaning. The research contributes to the field of text simplification by providing a
valuable and accessible solution, leveraging the rich lexical information offered by WordNet to
automatically simplify complex vocabulary.
        </p>
        <p>
          In a related context, Qiang et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] introduces LSBert, a lexical simplification method based
on BERT, a popular language representation model. The authors propose an approach that
leverages BERT’s contextualized embeddings to generate simplified versions of complex words
or phrases. LSBert employs a two-step process: first, it identifies complex words in the input
text, and then it generates simpler alternatives by selecting candidate substitutions based on
their semantic similarity to the original word. The simplification is performed by fine-tuning
BERT on a large corpus of simplified pairs. Evaluation results demonstrate the effectiveness
of LSBert in simplifying complex vocabulary while maintaining the overall coherence and
meaning of the text. This research contributes to the field by harnessing the power of BERT in
lexical simplification tasks and offers a promising solution for enhancing the accessibility and
understandability of written content.
        </p>
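LSBert's candidate-ranking step can be illustrated with a small sketch: substitutes are ordered by cosine similarity to the complex word. The hand-made three-dimensional vectors below are toy stand-ins for BERT's contextualized embeddings:

```python
import math

# Toy word vectors standing in for BERT's contextualized embeddings.
VECTORS = {
    "ameliorate": [0.9, 0.1, 0.2],
    "improve":    [0.85, 0.15, 0.25],
    "remove":     [0.1, 0.9, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def rank_candidates(complex_word, candidates):
    """Order candidate substitutions by semantic similarity to the complex word."""
    target = VECTORS[complex_word]
    return sorted(candidates, key=lambda w: cosine(target, VECTORS[w]), reverse=True)

print(rank_candidates("ameliorate", ["remove", "improve"]))
# -> ['improve', 'remove']
```

In LSBert itself this similarity is computed in context, so the same complex word can receive different substitutes in different sentences.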
        <p>
          Moreover, in 2019, Štajner and Saggion [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] focus on improving the lexical coverage of text
simplification systems specifically designed for the Spanish language. The authors address the
challenge of limited lexical resources available for Spanish text simplification by proposing a
method that combines rule-based strategies and machine learning techniques. They leverage
existing resources, such as WordNet and specialized corpora, to build a comprehensive lexicon
specifically tailored for Spanish text simplification. The proposed method effectively identifies
complex lexical items and suggests appropriate substitutions. Evaluation results demonstrate
that the enhanced lexical coverage significantly improves the performance of Spanish text
simplification systems, leading to more accurate and effective simplifications. This research
provides a valuable contribution to the field by addressing the lexical challenges specific to the
Spanish language and enhancing the accessibility and understandability of Spanish texts for a
wider audience.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Semantic-Based Automatic Text Simplification</title>
        <p>Semantic-based automatic text simplification is an approach that aims to simplify texts while
preserving their underlying meaning and intention. By leveraging semantic analysis, this
technique identifies complex linguistic structures and replaces them with simpler alternatives,
enhancing the accessibility of the text for a broader readership.</p>
        <p>
          Numerous research studies have delved into the realm of semantic-based automatic text
simplification, yielding valuable contributions to the field. For instance, Sulem et al. present a
novel approach [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to evaluate the effectiveness of text simplification techniques. The authors
propose a framework that combines semantic analysis and structural evaluation to assess the
quality of simplified texts. The framework considers both the preservation of the original
meaning and the improvement in readability. The authors conduct experiments on a large
dataset of simplified texts and demonstrate that their approach outperforms existing evaluation
methods in capturing both semantic and structural changes. The findings of this study contribute
to the development of more accurate and reliable evaluation metrics for text simplification
systems, ultimately leading to improved accessibility and comprehension for diverse readers.
        </p>
        <p>
          In another study, Štajner and Glavaš propose a novel approach [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to automated
text simplification by leveraging event-based semantics. The authors recognize that complex
sentence structures pose significant challenges to comprehension, especially for individuals with
limited language proficiency. To address this, they introduce a method that focuses on identifying
key events and their participants in a sentence. By simplifying sentence structures while
preserving the fundamental meaning conveyed by these events, the proposed approach aims to
improve the accessibility and understandability of complex texts. Evaluation results demonstrate
promising performance, with the method successfully simplifying sentences while maintaining
their semantic coherence and preserving critical information. This research provides valuable
insights into the use of event-based semantics for text simplification, offering a potentially
effective solution to enhance the accessibility of complex texts for various user groups.
        </p>
        <p>
          Similarly, Shuming and Xu Sun introduce a method [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] that leverages event-based semantics
for automated text simplification. The authors propose an approach that focuses on simplifying
complex sentences by representing their semantic structure in terms of events and their
participants. By identifying the main event and its semantic roles, the method generates simpler
versions of the sentences while maintaining the core meaning. Evaluation results demonstrate
that the proposed approach effectively simplifies complex sentences while preserving semantic
coherence. This research contributes to the field by providing a novel perspective on text
simplification, emphasizing the importance of event-based semantics in simplifying complex
texts and making them more accessible to a wider range of readers.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Transformer Models for Automatic Text Simplification</title>
        <p>
          Transformer models have emerged as powerful tools for automatic text simplification, offering
state-of-the-art performance in various natural language processing tasks. In the context of
text simplification, transformer models have been widely applied and have shown promising
results. We highlight several studies in this domain below. In 2018, Sanqiang and Rui Meng
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] propose an approach for sentence simplification that integrates Transformer models with
paraphrase rules. The authors acknowledge the challenges of simplifying sentences while
maintaining their meaning and grammatical correctness. To address this, they combine the
power of Transformer models, known for their ability to learn contextual representations,
with manually curated paraphrase rules. The method involves generating multiple simplified
versions of a source sentence using the Transformer model and then applying the paraphrase
rules to ensure simplicity and coherence. Evaluation results demonstrate that the proposed
approach outperforms existing methods in terms of simplicity and grammaticality. This research
contributes to the field by presenting a comprehensive approach that combines the strengths
of Transformer models and human-created paraphrase rules, offering a promising solution for
sentence simplification.
        </p>
        <p>
          Similarly, Robert-Mihai proposed a study [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] that explores the application of
sequence-to-sequence (Seq2Seq) models for automated text simplification. The authors recognize the
importance of enhancing the accessibility of complex texts for individuals with lower reading
abilities. To address this, they propose a method where a Seq2Seq model is trained to generate
simplified versions of input sentences. The model is trained on a large dataset of sentence
pairs, consisting of complex and simplified versions. By learning to map complex sentences
to simpler equivalents, the Seq2Seq model offers a promising solution for text simplification.
Evaluation results demonstrate that the proposed approach significantly improves readability
while maintaining the core meaning of the original text. This research contributes to the field by
showcasing the effectiveness of Seq2Seq models for automated text simplification, highlighting
their potential to make complex texts more understandable and inclusive.
        </p>
        <p>
          Moreover, Takumi Maruyama [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] focuses on the challenging problem of text simplification
for languages with extremely low resources. The authors propose an approach that leverages
pre-trained Transformer-based language models, such as BERT, to overcome the limitations of
data scarcity. By fine-tuning these models on small amounts of labeled simplification data, they
are able to generate simplified versions of complex sentences. Evaluation results demonstrate
the effectiveness of this approach, with the generated simplifications achieving high levels of
simplicity and readability. The research demonstrates the potential of pre-trained Transformer
models to address low resource scenarios, opening up possibilities for text simplification in
languages with limited available data. This work contributes to the field by providing insights
into adapting large-scale language models for low resource text simplification, making it a
valuable contribution for improving the accessibility of complex texts in low resource language
contexts.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Task description for CLEF (2023) Simple-Text</title>
        <p>The Simple-Text dataset and benchmarks contribute to the research on automatic text
simplification by introducing three interconnected tasks.</p>
        <p>Task 1: What is in (or out)? Select passages to include in a simplified summary, given a
query.</p>
        <p>Task 2: What is unclear? Given a passage and a query, rank terms/concepts that are required
to be explained for understanding this passage (definitions, context, applications, etc.).</p>
        <p>Task 3: Rewrite this! Given a query, simplify passages from scientific abstracts.</p>
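A naive baseline for Task 2 is to rank a passage's terms by how rare they are in a background corpus. The sketch below scores terms by inverse document frequency over a few invented documents; a real system would compute these statistics over the DBLP abstracts:

```python
import math
from collections import Counter

# A tiny hypothetical background corpus (a real system would use DBLP abstracts).
corpus = [
    "drones fly over fields",
    "autonomous drones navigate without a pilot",
    "fields of study in science",
]

# Document frequency: in how many documents each term occurs.
doc_freq = Counter()
for doc in corpus:
    doc_freq.update(set(doc.split()))

def rank_terms(passage):
    """Rank passage terms from rarest (likely hardest) to most common."""
    terms = list(dict.fromkeys(passage.split()))  # dedupe, keep order
    idf = {t: math.log(len(corpus) / (1 + doc_freq.get(t, 0))) for t in terms}
    return sorted(terms, key=lambda t: idf[t], reverse=True)

print(rank_terms("autonomous drones over fields"))
# 'autonomous' should rank first: it occurs in only one background document.
```

Such a frequency baseline ignores context and multi-word terms, which is exactly where learned models are expected to improve on it.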
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>In our research, we leverage the DBLP abstracts corpus as the main source of our data.
Specifically, we utilize the Citation Network Dataset, known as DBLP+Citation, which is the
12th version released in 2020. This dataset consists of a vast collection of 4,894,063 scientific
articles. By using this comprehensive corpus, we ensure a diverse and extensive range of
scholarly publications for our analysis and experimentation. The DBLP+Citation dataset serves
as a valuable resource, providing us with the necessary information and content to investigate
various research questions and explore the intricacies of scientific articles in our study.</p>
        <p>For the task of Complexity Spotting, we rely on an annotated database that includes the
following columns: query_id (e.g., G11.1), query_text (e.g., drones), snt_id (e.g., G11.1_2892036907_1),
source_snt, term (e.g., autonomous), difficulty, and definition. This database serves as a
comprehensive resource for training and evaluating our models, providing diverse examples of complex
terms along with their definitions and levels of difficulty.</p>
        <p>For the Text Simplification task, we utilize another dataset which includes the columns:
query_id, query_text, doc_id, snt_id, source_snt, and simplified_snt. This dataset offers a
variety of complex sentences from scientific texts alongside their simplified versions, providing a
strong foundation for the evaluation and improvement of our models’ simplification capabilities.</p>
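Both annotation files are flat tables, so loading them is straightforward. The sketch below parses a hypothetical tab-separated row using the Complexity Spotting columns listed above; the sample values echo the examples in the text, and the definition string is our own placeholder:

```python
import csv
import io

# A hypothetical row mimicking the Complexity Spotting annotation format.
raw = io.StringIO(
    "query_id\tquery_text\tsnt_id\tsource_snt\tterm\tdifficulty\tdefinition\n"
    "G11.1\tdrones\tG11.1_2892036907_1\tAutonomous drones patrol the fields."
    "\tautonomous\t2\tacting without direct human control\n"
)

# DictReader maps each data row onto the header columns.
rows = list(csv.DictReader(raw, delimiter="\t"))
for row in rows:
    print(row["term"], "->", row["definition"])
```

Keeping the rows as dictionaries keyed by column name makes it easy to feed (source_snt, term) pairs to a model and compare its output against the annotated definition.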
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Utilization of Existing Models</title>
        <p>In the following, the approach used for the two tasks involved in this research will be illustrated.
Detailed procedures for data gathering and subsequent preprocessing are covered, along with
the process for extracting features. Moreover, we elucidate the choice and preparation of the
machine learning models, coupled with the performance evaluation metrics utilized for their
assessment.</p>
        <p>
          SimpleT5 SimpleT5 is a library built on top of PyTorch Lightning and Transformers. It
allows users to quickly train T5-family models, including T5, mT5, and byT5, with only
a few lines of code [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The T5 models, which can be trained using SimpleT5, are versatile
and can be used for a variety of natural language processing (NLP) tasks. These tasks include
summarization, question answering (QA), question generation (QG), translation, text generation,
and more [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          AI21 Labs - Jurassic-2 Grande Instruct The J2-Grande-Instruct model is a variation of
the Jurassic-2 series developed by AI21. It is an auto-regressive language model based on
the Transformer architecture and designed with modifications for improved efficiency. The
models diverge from their GPT-3 counterparts in several aspects, including vocabulary size
and the depth/width ratio of the neural net [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] This model is specifically trained to handle
instructions-only prompts, also known as "zero-shot" prompts, without the need for examples
or "few-shot" prompts. It aims to provide a natural way to interact with large language models
and is designed to give users an idea of the optimal output for their task without needing any
examples.
        </p>
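A zero-shot prompt is simply the task instruction followed by the input, with no examples. The following sketch shows one way such a prompt might be assembled; the wording is our own illustration, not AI21's recommended template:

```python
def build_zero_shot_prompt(passage):
    """Assemble an instruction-only prompt: task description plus input, no examples."""
    return (
        "Rewrite the following scientific sentence so that a general "
        "audience can understand it. Keep the meaning unchanged.\n\n"
        f"Sentence: {passage}\nSimplified:"
    )

prompt = build_zero_shot_prompt(
    "The intervention attenuated inflammatory cytokine expression."
)
print(prompt)
```

A few-shot variant would differ only in prepending worked (complex, simplified) pairs before the final sentence; the instruct-tuned J2 models are designed to make that unnecessary.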
        <p>
          BLOOM (BigScience Large Open-science Open-access Multilingual Language Model)
          The BLOOM model is an autoregressive Large Language Model (LLM) that leverages a
decoder-only transformer architecture, derived from Megatron-LM GPT-2. It underwent training on
approximately 366 billion tokens between March and July 2022, utilizing 1.6 Terabytes of
preprocessed text. This extensive dataset included 350 billion unique tokens, encompassing 46
natural languages and 13 programming languages, enabling BLOOM to grasp a wide range of
linguistic and programming contexts [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Results Analysis</title>
        <p>In this section, we present a detailed analysis of the results obtained from our experiments. We
begin by providing an overview of the experimental setup and methodology employed for the
evaluation. Subsequently, we delve into the quantitative analysis of the performance metrics,
followed by a qualitative assessment of the generated outputs.</p>
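As one concrete, model-agnostic readability check (not the official evaluation metric of the track), the Flesch Reading Ease score can be computed with a rough vowel-group syllable heuristic:

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

complex_snt = "The methodology necessitates comprehensive hyperparameter optimization."
simple_snt = "The method needs careful tuning."
print(flesch_reading_ease(simple_snt) > flesch_reading_ease(complex_snt))
# -> True
```

Higher scores indicate easier text, so a successful simplification should raise the score of a sentence relative to its source while leaving the meaning intact.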
        <p>For Bloom, AI21 and T5, the training processes of these models involve large-scale language
modeling, leveraging vast amounts of text data. Bloom utilizes a combination of unsupervised
and supervised training techniques, incorporating linguistic knowledge and fine-tuning on
specific downstream tasks. AI21 adopts a similar approach, employing a Transformer-based
architecture and training on a diverse dataset. T5, on the other hand, employs a unified framework
that incorporates both supervised and unsupervised learning, enabling it to perform multiple
tasks. These pre-trained models can be utilized by fine-tuning them on specific downstream
tasks, such as text classification, summarization, or question-answering. By adapting the
pre-trained models to target tasks, researchers and practitioners can benefit from their powerful
language understanding capabilities and achieve improved performance in a range of NLP
applications.</p>
        <p>This study we did not optimised the hyper parameters of the above utilized models for
simple tasks. we used the default parameters for Task 2.2 and Task 2.1. we focused on utilizing
pre-trained language models without performing fine-tuning. The models, including Bloom,
AI21, and T5, were accessed through their respective APIs to obtain results for the tasks at
hand. Specifically, Table 1 and Table 2 present the performance of the T5 model for Task 2.2.
Similarly, for Task 2.1, no fine-tuning was conducted, and the pre-trained models were utilized
as is. Upon analyzing the provided tables (1, 2, 3, 4), it becomes evident that the achieved
performance is not particularly high. This can be attributed to the fact that the models used are
not specifically tailored to the domain of the study. However, it is important to note that by
undertaking the process of fine-tuning the models with our specific training data, the results
are anticipated to exhibit significant improvements. Fine-tuning the models to align with the
domain-specific requirements of the study would enhance their performance and generate more
favorable outcomes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The comprehension of scientific texts can be challenging for non-specialist readers, acting as a
barrier that restricts access to scientific knowledge and inhibits public engagement in scientific
discussions. To address this issue and foster the democratization of scientific information, we
propose leveraging artificial intelligence techniques for identifying and explaining complex
concepts (referred to as Complexity Spotting) and simplifying scientific texts. This study
investigates state-of-the-art deep learning models for their effectiveness in these tasks. We train
and evaluate the models using a dataset of scientific articles that have been annotated to
identify complex concepts and their corresponding simpler explanations. Through a comparative
analysis, we provide insights into the strengths and weaknesses of each model’s performance.
Our findings highlight promising opportunities for future research and development in
automated text simplification, which contributes to the overarching objective of making scientific
knowledge more accessible to a wider audience.</p>
      <p>In this study, we did not perform fine-tuning on the large language models such as T5, Bloom,
and AI21. Despite this, the performance of the models remained reasonable. For our future
investigations, we plan to conduct thorough data exploration and analysis as a preliminary step.
Subsequently, we intend to fine-tune these models using the provided dataset. Given that these
models have already been trained on extensive data, fine-tuning can be achieved with a smaller
amount of additional data. Additionally, these models exhibit some capability in handling data
imbalance, albeit to a certain extent, if the dataset is not heavily skewed. By pursuing these
steps, we aim to further enhance the performance and applicability of the models in the context
of scientific text simplification.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.-O.</given-names>
            <surname>Truică</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-I.</given-names>
            <surname>Stan</surname>
          </string-name>
          , E.-S. Apostol,
          <article-title>Simplex: a lexical text simplification architecture</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>6265</fpage>
          -
          <lpage>6280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Swain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tambe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ballal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dolase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rajmane</surname>
          </string-name>
          ,
          <article-title>Lexical text simplification using wordnet</article-title>
          ,
          <source>in: Advances in Computing and Data Sciences: Third International Conference, ICACDS</source>
          <year>2019</year>
          , Ghaziabad, India,
          <source>April 12-13</source>
          ,
          <year>2019</year>
          ,
          Revised Selected Papers
          ,
          <source>Part II 3</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          , Lsbert:
          <article-title>Lexical simplification based on bert</article-title>
          ,
          <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>
          <volume>29</volume>
          (
          <year>2021</year>
          )
          <fpage>3064</fpage>
          -
          <lpage>3076</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <article-title>Improving lexical coverage of text simplification systems for Spanish</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>118</volume>
          (
          <year>2019</year>
          )
          <fpage>80</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Abend</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rappoport</surname>
          </string-name>
          ,
          <article-title>Semantic structural evaluation for text simplification</article-title>
          ,
          <source>arXiv preprint arXiv:1810.05022</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Glavaš</surname>
          </string-name>
          ,
          <article-title>Leveraging event-based semantics for automated text simplification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>82</volume>
          (
          <year>2017</year>
          )
          <fpage>383</fpage>
          -
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>A semantic relevance based neural network for text summarization and text simplification</article-title>
          ,
          <source>arXiv preprint arXiv:1710.02318</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Andi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bambang</surname>
          </string-name>
          ,
          <article-title>Integrating transformer and paraphrase rules for sentence simplification</article-title>
          ,
          <source>arXiv preprint arXiv:1810.11193</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.-M.</given-names>
            <surname>Botarleanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dascalu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Crossley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence models for automated text simplification</article-title>
          ,
          <source>in: Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6-10, 2020, Proceedings, Part II 21</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Maruyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <article-title>Extremely low-resource text simplification with pre-trained transformer language model</article-title>
          ,
          <source>International Journal of Asian Language Processing</source>
          <volume>30</volume>
          (
          <year>2020</year>
          )
          <fpage>2050001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          , simplet5, https://pypi.org/project/simplet5/,
          <year>2022</year>
          . Accessed: 2023-06-05.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sharir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shoham</surname>
          </string-name>
          , Jurassic-1:
          <source>Technical Details and Evaluation</source>
          ,
          <source>Technical Report, AI21 Labs</source>
          ,
          <year>2023</year>
          . URL: https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Workshop</surname>
          </string-name>
          ,
          <article-title>BLOOM: A 176B-parameter open-access multilingual language model</article-title>
          ,
          <year>2023</year>
          . arXiv:2211.05100.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>