<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Academic Abstracts: Controlled Text Generation Using Metrics from a Target Text Allows for Transparency in the Absence of Specialized Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Callegari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Vajdecka</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Desara Xhura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SageWrite ehf.</institution>
          ,
          <addr-line>Grettisgata 55, 101 Reykjavík</addr-line>
          ,
          <country country="IS">Iceland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Iceland</institution>
          ,
          <addr-line>Saemundargata 2, 102 Reykjavík</addr-line>
          ,
          <country country="IS">Iceland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The lack of specialized linguistic knowledge in end users can make it difficult to develop more transparent natural language generation applications. This paper introduces a novel approach to controlled text generation, with a particular emphasis on controlling the stylistic properties of the generated text. By extracting numerical metrics representing various stylistic properties from a reference text crafted by the target author, and concatenating them into the text input of our generation model, we enhance the stylistic alignment between generated and target texts. Our proposed method successfully improves this alignment, surpassing the baseline, and represents a promising first step towards striking a balance between explainable AI and a lack of specialized knowledge.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Generation</kwd>
        <kwd>Controllable Text Generation</kwd>
        <kwd>Rule-based Generation</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Academic Abstracts</kwd>
        <kwd>Stylistic Metrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial Intelligence (AI) and Machine Learning (ML) have achieved remarkable success across
various practical applications such as natural language processing, face recognition, autonomous
driving, image classification, automated clinical diagnosis, and more [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Deep learning
approaches have even surpassed human performance in numerous domains [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. However,
a significant drawback of deep learning methods is their inherent lack of interpretability
[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], which makes them opaque “black boxes”. This opacity poses substantial challenges,
particularly when attempting to interpret predictive results that might later be found incorrect
[
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. Consequently, there is an urgent necessity to add transparency, comprehensibility, and
explainability to the outcomes and decisions made by machine learning systems [
        <xref ref-type="bibr" rid="ref10 ref5 ref8 ref9">5, 8, 9, 10</xref>
        ].
      </p>
      <p>
        The need for increased transparency can also be appreciated in the domain of Natural
Language Generation (NLG). NLG is a subfield of artificial intelligence that focuses on generating
human-like text from structured data or other inputs. Most studies at the interface between
Explainable AI (XAI) and NLG have focused on using NLG to generate those additional
explanations needed to make a given AI application more transparent [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16 ref17 ref18">11, 12, 13, 14, 15, 16, 17, 18</xref>
        ].
An example is [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], who use NLG to generate text that justifies and explains the rationale for
the different classification decisions made in a deep visual recognition task. However, text
generation for the sake of text generation (e.g. for question-answering, or to automatically
generate the summary of an article) could also benefit from additional transparency [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ],
particularly with the arrival of large, pre-trained models.
      </p>
      <p>
        The advent of large language models (LLMs) such as GPT [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], T5 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and BART [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] has
made it possible to generate text that is more diverse, sounds more natural and is more relevant.
However, LLMs are essentially still black boxes, lacking interpretability: LLMs always generate
text according to the latent representation of the context, making it difficult to control the
generation. This has led to the rapid rise of Controlled Text Generation (CTG) studies with
transformer-based LLMs. Various methodologies have emerged in the last 3-4 years, with topic
and sentiment control being particularly popular areas of application of CTG [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Controlling Style in NLG: User Knowledge</title>
      <p>
        Despite the impressive progress in NLG, the challenge of producing text outputs that conform
to specific writing styles or stylistic preferences persists. Efforts in this direction often employ
costly methods such as reinforcement learning [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], or oversimplify the style transfer issue by
framing style as a classification problem, categorizing writing styles into specific classes such as
Formal, Poetic, or Topical [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. However, writing styles encompass a uniqueness and complexity
that extend beyond simple categorical classes. Research in stylometry [27, 28, 29, 30, 31]
demonstrates that specific writing features, such as punctuation use patterns [28], can help
distinguish whether a text was written by a particular individual, underscoring writing style as
a distinctive attribute of the author that cannot easily be reduced to a fixed set of style classes.
      </p>
      <p>Unless deliberate measures are employed to govern text generation, the resulting style
remains implicitly influenced by the training data [32], potentially leading to unnatural and
monotonous-sounding text that often deviates significantly from an intended target text, or
from the intentions of the end user. Hence, it becomes desirable to have explicit control over
various stylistic aspects of text generation. For instance, an author aiming to generate text might
want the generation to reflect their aversion to the use of an excessive number of adjectives, or
their preference for sentences featuring at most two embedded clauses.</p>
      <p>Being able to control generation down to these fine-grained parameters would improve user
satisfaction with respect to the outputs of a text-generation model: if users could define and
modify fine-grained parameters such as “max. number of adjectives per nominal phrase” and
“max. number of embedded clauses per sentence” according to their preferences, they would
be able to generate NLG outputs that are truly tailored to their individual style. Controlled
text generation would also enhance the transparency of deep-learning models: by controlling
generation through different linguistic parameters, users would be in a better position to
understand the decision-making process behind the generated text. Moreover, by examining
these parameters, users could gain insights into how different linguistic parameters are utilized
to produce specific text elements. For example, modifying and observing the impact of specific
linguistic parameters would enable users to trace how alterations influence the text output,
providing an observable cause-and-effect relationship in the text-generation process.</p>
      <p>The main obstacle preventing us from implementing such fine-grained parameters, and from
allowing users to control them, is the fact that a significant portion of the population
lacks the necessary linguistic background to comprehend and adjust them. Users might not
know what the difference between a sentence and a clause is, or might struggle to remember what
an adjective is. Moreover, knowledge of linguistic terms does not guarantee an ability to
deconstruct one's style into specific sub-parameters: we can easily tell whether we like, dislike
or are indifferent to the style of a given paper, but determining whether that is due to particular
lexical, syntactic or punctuation choices is considerably harder.</p>
      <p>In this study, a novel method is presented to incorporate stylistic metrics into text generation
control. The objective is to achieve text outputs that closely match an intended style. To achieve
this, we extract a set of stylistic metrics, represented as numerical values with decimal points,
from a reference text written by a target author. These metrics are then incorporated into the
input provided to our text-generation model.</p>
      <p>The idea behind this methodology is to render NLG more transparent and customizable
without the need for the user to have any specialized linguistic knowledge to fine-tune or
understand the parameters behind the generation. Instead of letting the user set fine-grained
parameters to control text generation, we extract them automatically from a reference text
that is representative of the style of the target author, and incorporate these parameters into
the input of our text-generation model. We believe ours to be a promising approach towards
improving and democratizing the explainability of NLG, as no specialized linguistic knowledge
is required from the user, who is nonetheless still able to control the output of the generation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Existing Work</title>
      <p>We test our approach for controlling text generation on the task of automatically generating
academic abstracts using the article as input. Our goal is to produce abstracts that harmonize
with the overall style of the article, creating a seamless and coherent extension of the paper
itself.</p>
      <p>Abstract generation can be seen as a type of summarization problem, a challenge that can be
approached using a variety of techniques [33]. Extractive summarization involves identifying
key sentences or fragments in the original text and piecing them together to form a summary
[33]. In contrast, abstractive summarization or generative summarization aims to generate
novel sentences, potentially using new phrasing or condensation, to provide an overview of the
contents of the original text [33]. A hybrid approach can combine both techniques, leveraging
their respective strengths [34]. Additionally, some researchers have proposed citation-based
summarization, where the content of citations is used to help produce the summary [35]. The
length and complexity of scientific articles have historically made it difficult to train purely
abstractive summarization models, and there is a distinct lack of research on generating abstracts
from scientific articles using abstractive summarization techniques directly.</p>
      <p>Concerning existing studies that have attempted to control the style of an automatically
generated text, important mentions are Syed et al. (2020) ([36]), who control for stylistic
properties by fine-tuning on a target author’s corpus using denoising autoencoder loss, Wang et
al. 2019 ([37]), who incorporate GPT-2 with a rule-based system for formality style transfer, and
Singh et al. (2020) ([38]), who, using reinforcement learning, attempt to induce certain lexical
target-author attributes by incorporating continuous multi-dimensional lexical preferences of
the target author into the language model.</p>
      <p>To our knowledge, no one has yet attempted to incorporate text together with decimal
numbers as the input of LLM-based summarization systems to enhance the quality of the final
summary in the form of an abstract. This current study seeks to bridge this gap by investigating
the application and performance of such an approach.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Our Approach</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>As our initial dataset, we decided to use the Huggingface ArXiv dataset1, which features 215,913
scientific articles along with their respective abstracts. We then eliminated the bibliography
section from each article. Next, we computed the word-piece token length [39] and word length
for every article. All articles exceeding 2800 words or 3650 word-piece tokens were excluded
from the dataset. This step was necessary to adhere to the maximum token length allowed as
input, considering the limitations of our Nvidia H100 GPUs, which are the GPUs we used for
training.</p>
        <p>The resulting dataset consists of 18,175 scientific articles and their corresponding abstracts,
and can be accessed here2. To develop and evaluate an unbiased generative model, we divided
this dataset using a 60:20:20 split.</p>
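<p>The length filter and split described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the whitespace word count and the externally supplied word-piece length stand in for the actual tokenizer, and the shuffling seed is an assumption.</p>

```python
import random

MAX_WORDS = 2800             # word threshold from the paper
MAX_WORDPIECE_TOKENS = 3650  # word-piece token threshold from the paper

def keep_article(article_text, wordpiece_len):
    """Filter rule: drop any article exceeding either length budget.

    `wordpiece_len` would come from the model tokenizer in practice;
    it is passed in here so the rule itself stays tokenizer-agnostic."""
    too_long = len(article_text.split()) > MAX_WORDS or wordpiece_len > MAX_WORDPIECE_TOKENS
    return not too_long

def split_60_20_20(items, seed=42):
    """60:20:20 train/validation/test split, as stated in the paper."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    return items[:int(n * 0.6)], items[int(n * 0.6):int(n * 0.8)], items[int(n * 0.8):]
```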
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Stylistic Metrics</title>
        <p>Academic authors bring their own distinctive writing styles, which can be influenced by their
educational training, research experiences, and personal writing preferences. These variations in
style become apparent in multiple aspects, encompassing sentence organization, word selection,
formality levels, use and frequency of specialized terminology, and punctuation preferences.
The diversity of academic writing styles is made even more significant because of disciplinary
differences, as each research field may adhere to specific conventions and norms concerning
writing approaches.</p>
        <p>In order to replicate a particular author’s style, one should ideally account for all these factors.
However, in this paper, we only attempt to control the parameters that are outlined below:
1. Average, mean, max, min number of words per sentence, and st. dev. value;
2. Average, mean, max, min word length, and st. dev. value;
3. Average, mean, max, min paragraph length, and st. dev. value;
1https://huggingface.co/datasets/scientific_papers
2https://www.kaggle.com/datasets/desaraxhura/arxiv-dataset-enhanced-with-stylometric-features</p>
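<p>A minimal sketch of how the three metric groups above could be computed. The sentence, word, and paragraph splitting here is deliberately naive (the paper does not name its tokenizer), and we read the list's five statistics as mean, median, max, min, and standard deviation.</p>

```python
import re
import statistics

def describe(values):
    """The five summary statistics reported for each dimension."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "stdev": statistics.pstdev(values),
    }

def stylistic_metrics(text):
    """Compute the three metric groups from Section 4.2 with naive splitting."""
    words = lambda s: re.findall(r"[A-Za-z']+", s)
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {
        "words_per_sentence": describe([len(words(s)) for s in sentences]),
        "word_length": describe([len(w) for w in words(text)]),
        "paragraph_length": describe([len(words(p)) for p in paragraphs]),
    }
```

Serializing such a dictionary as text, numbers included, yields a report of the kind shown in Figure 1.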
        <p>Although these metrics cover just a portion of all conceivable dimensions that could be
analyzed, they present a balanced combination of syntactic features (e.g., maximum number
of subordinates), morphological aspects (average word length), lexical characteristics (lexical
diversity), and purely stylistic elements (use of Oxford commas).</p>
        <p>We extracted these metrics from the texts in our dataset, resulting in stylistic reports similar
to the one illustrated in Figure 1. As the figure shows, these reports contain not only text but
also numerical values with decimal points.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Model Selection</title>
        <p>
          After considering several large language models, we ultimately opted for T5 due to its exceptional
performance on numerical reasoning tasks [40] and proficiency in numeracy tasks with integer
numbers [41]. Nevertheless, the original T5 model was not specifically trained to handle
inputs containing both text and numbers with decimal points [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. To overcome this limitation,
we explored the possibility of using Flan T5, a modified version of T5 known for its good
performance in few-shot learning [42]. We ran multiple experiments and observed that Flan T5
outperforms the original T5 base model on various summarization tasks, a similar conclusion to
the one reached by Chung et al. (2022) ([42]). Based on these observations, we opted to further
fine-tune Flan T5 instead of T5.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Logic &amp; Experiments</title>
        <p>Our aim was to investigate whether and how the inclusion of stylistic reports, computed from
various types of target texts (either the original abstract or the full article text), impacted our
model’s performance.</p>
        <p>Our approach consisted of three phases. In the first phase, we trained the model using raw
article text as input, without incorporating any stylistic report. This served as our baseline,
enabling us to evaluate the model’s performance without the influence of stylistic metrics. The
model solely relied on the patterns within the input text for making inferences.</p>
        <p>Having established the baseline, we proceeded to the second phase of our approach, where
we integrated stylistic reports calculated from the original abstract of each paper. The stylistic
report and the raw text of each article were combined to create a new input for the model.</p>
        <p>In the last phase of the experiment, we built upon the second phase by employing stylistic
reports calculated from the raw text of each article (i.e., the article text without the abstract and
bibliography), instead of relying on the original abstract. Our hypothesis was that incorporating
the stylistic report from the entire article would offer a more comprehensive representation of
the article’s style compared to a report derived solely from the original abstract. Just like in the
previous phase, we concatenated the stylistic report with the raw article text to create the input
for the Flan T5 model.</p>
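<p>Concretely, the concatenation step in phases two and three might look like the sketch below. The prompt layout (ordering, labels, separator) is our assumption; the paper only states that the report and the raw article text are combined into a single input.</p>

```python
def build_model_input(stylistic_report, article_text):
    """Concatenate a stylistic report with the raw article text.

    The labels and separators here are illustrative, not the paper's exact format."""
    return (
        "Stylistic report:\n" + stylistic_report.strip()
        + "\n\nArticle:\n" + article_text.strip()
    )
```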
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Fine-tuning Flan T5 models</title>
        <p>For fine-tuning, we utilized PyTorch as the framework, running the process concurrently on
three Nvidia H100 GPUs with 80 gigabytes of GPU memory.</p>
        <p>To train our models, we experimented with multiple different training parameters,
acknowledging the constant need for optimization. We reached the best results when we opted for 3
epochs, using a learning rate of 1e-5, a batch size of 3, and the Adam optimizer [43]. We set the
maximum input sequence length to 4,000 word-piece tokens, with the stylistic report taking up
350 tokens, allowing for a maximum of 3650 tokens for the article input. For generated abstracts,
the maximum output sequence length was set to 400 word-piece tokens. To promote diversity
and exploration during training, we employed a sampling parameter set to True. To ensure
reproducibility and maintain control over randomization during training, we set the random
seed to 42.</p>
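<p>The 4,000-token input budget (350 tokens for the stylistic report, 3,650 for the article) amounts to a simple truncation rule, sketched below over lists of token IDs; how overflowing inputs are actually truncated is not specified in the paper.</p>

```python
MAX_INPUT_TOKENS = 4000
REPORT_BUDGET = 350
ARTICLE_BUDGET = MAX_INPUT_TOKENS - REPORT_BUDGET  # 3650, matching the paper

def truncate_to_budget(report_token_ids, article_token_ids):
    """Enforce the 350/3650 word-piece token split used during fine-tuning."""
    return report_token_ids[:REPORT_BUDGET] + article_token_ids[:ARTICLE_BUDGET]
```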
        <p>Despite our best efforts, the inherent GPU memory limitations constrained us from training
on even longer scientific articles, restricting us to papers of up to 4,000 word-piece tokens.
Though these articles, at 4,000 word-piece tokens, are on the longer end of what we categorize
as “short papers”, this limitation underscores the severity of the imposed hardware constraints.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>To evaluate our experiments, we employed Rouge metrics [44], assessing the quality of the
generated abstracts by examining their word overlap with the original abstracts. The outcomes are presented
in Table 2. The inclusion of a stylistic report calculated from the original abstract results in
an increased overlap compared to the baseline, as evidenced by all Rouge metrics. However,
integrating a stylistic report calculated from the article text does not lead to any improvement
over the baseline model’s results.</p>
      <p>We also conducted cosine similarity calculations between the numerical values present in
the stylistic reports of the generated abstracts and those derived from the original abstracts.
This allowed us to determine the effectiveness of incorporating a stylistic report into the Flan
T5 model as a CTG technique, aligning the stylistic attributes of the generated abstracts with
those of the originals.</p>
      <p>To compute the cosine similarity, we transformed each stylistic report (illustrated in Fig. 1)
into a vector representation (demonstrated in Fig. 1). Subsequently, we computed the cosine
similarity between the stylistic vectors of the generated abstracts and those of the original
abstracts. In Table 3, we present the cosine similarity statistics obtained from the test dataset,
with the mean and standard deviation being the most relevant values. Notably, the model
that incorporates stylistic reports from the original abstracts produces abstracts that are most
stylistically similar to the original ones. Furthermore, this model achieves the lowest variability
around the mean, indicated by the standard deviation, ensuring that the stylistic similarities are
the least divergent compared to the other models.</p>
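<p>The evaluation step reduces to a standard cosine similarity between the two stylistic vectors, as in this sketch (the vector layout itself follows whatever order the metrics are serialized in):</p>

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two stylistic metric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```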
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Explainable AI is crucial for fostering trust, transparency, and interpretability of NLG systems.
Rule-based control in NLG provides a practical approach to increase explainability by leveraging
explicit rules to govern the text-generation process. However, most potential end users of NLG
applications may lack the specialized linguistic knowledge and introspection needed to control
text generation down to truly fine-grained linguistic parameters.</p>
      <p>In this paper, we have introduced a novel approach to control the stylistic features of text
generation by incorporating stylistic metrics, featuring decimal numbers, from a target text into
Transformer-based language models. Our results demonstrate the potential of using stylistic
metrics extracted from the original abstract as control mechanisms for abstract generation;
this method allowed us to achieve better stylistic alignment with the target text than what we
achieved with the baseline alone. In the future, we would like to experiment with vectors that
include more metrics than the 11 dimensions we included in this first study, to see how much
alignment improves when additional linguistic dimensions are included in the stylistic report
that is used as the input for our text-generation model.</p>
      <p>While we were successful in improving alignment when using the original abstract as target
text, we observed a slight decrease in Rouge and vector similarity scores when calculating
stylistic metrics on the full article instead. Several factors might be responsible for this difference.
Firstly, the presence of multiple authors in STEM articles (articles in the ArXiv dataset are
all STEM articles) might mean that different sections of an article are written by different
individuals, resulting in a multitude of different styles being used throughout the paper. To
explore this further, it would be interesting to conduct a study solely using single-authored
articles as input text, to examine whether this nullifies the observed difference between using
the original abstract and the full article text to calculate the stylistic report.</p>
      <p>Secondly, it is possible that different sections of an article require distinct writing styles. For
example, the Methodology section might differ in style from the Introduction or Conclusions
sections. Investigating variations in writing styles across different sections and exploring
stylistic patterns that better capture each section could be a valuable avenue for future research.</p>
      <p>Overall, our findings suggest that the introduced CTG technique has the potential to assist
researchers in refining specific sections of an article to align them with their desired writing
style. Additionally, our methods offer the flexibility to finely adjust certain sections to match a
target writing style, providing researchers with greater control over the stylistic properties of
their generated text. Finally, our proposed CTG technique offers an effective way of enhancing
the transparency of NLG applications without the need for end users to have any specialized
linguistic knowledge, as the fine-grained parameters that are used to control the text generation
are extracted automatically from a reference text that the end user can pick themselves.</p>
      <p>A limitation of this study is the lack of a qualitative human evaluation of the generated
abstracts and their alignment with the original abstract. We have not resorted to human
evaluation due to the high-level expertise required to evaluate abstracts of highly technical
papers. This is an often-cited issue in NLG studies [36, 38]. Despite the lack of such evaluation,
our results are promising and offer a plausible line of research for controlling text generation
using a target text to extract the control parameters.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgements</title>
      <p>This study was partly supported by a grant from Rannís, the Icelandic Centre for Research, and
a grant from the European Union (Women TechEU, European Innovation Ecosystems programme,
Horizon Europe). Computational resources were provided by the e-INFRA CZ project (ID: 90254),
supported by the Ministry of Education, Youth and Sports of the Czech Republic.</p>
      <p>[27] E. Stamatatos, A survey of modern authorship attribution methods, Journal of the
American Society for Information Science and Technology 60 (2009) 538–556.
[28] H. Gómez-Adorno, J.-P. Posadas-Duran, G. Ríos-Toledo, G. Sidorov, G. Sierra,
Stylometry-based approach for detecting writing style changes in literary texts, Computación y
Sistemas 22 (2018) 47–53.
[29] Y. Sari, M. Stevenson, A. Vlachos, Topic or style? Exploring the most useful features for
authorship attribution, in: Proceedings of the 27th International Conference on Computational
Linguistics, 2018, pp. 343–353.
[30] R. Sarwar, C. Yu, N. Tungare, K. Chitavisutthivong, S. Sriratanawilai, Y. Xu, D. Chow,
T. Rakthanmanon, S. Nutanong, An effective and scalable framework for authorship
attribution query processing, IEEE Access 6 (2018) 50030–50048.
[31] K. Lagutina, N. Lagutina, E. Boychuk, I. Vorontsova, E. Shliakhtina, O. Belyaeva,
I. Paramonov, P. Demidov, A survey on stylometric text features, in: 2019 25th Conference of
Open Innovations Association (FRUCT), IEEE, 2019, pp. 184–195.
[32] K.-H. Zeng, M. Shoeybi, M.-Y. Liu, Style example-guided text generation using generative
adversarial transformers, arXiv preprint arXiv:2003.00674 (2020).
[33] N. I. Altmami, M. E. B. Menai, Automatic summarization of scientific articles: A survey,
Journal of King Saud University-Computer and Information Sciences 34 (2022) 1011–1028.
[34] L. Xiao, L. Wang, H. He, Y. Jin, Copy or rewrite: Hybrid summarization with hierarchical
reinforcement learning, in: Proceedings of the AAAI Conference on Artificial Intelligence,
volume 34, 2020, pp. 9306–9313.
[35] M. Yasunaga, J. Kasai, R. Zhang, A. R. Fabbri, I. Li, D. Friedman, D. R. Radev, ScisummNet:
A large annotated corpus and content-impact models for scientific paper summarization
with citation networks, in: Proceedings of the AAAI Conference on Artificial Intelligence,
volume 33, 2019, pp. 7386–7393.
[36] B. Syed, G. Verma, B. V. Srinivasan, A. Natarajan, V. Varma, Adapting language models
for non-parallel author-stylized rewriting, in: Proceedings of the AAAI Conference on
Artificial Intelligence, volume 34, 2020, pp. 9008–9015.
[37] Y. Wang, Y. Wu, L. Mou, Z. Li, W. Chao, Harnessing pre-trained neural networks with
rules for formality style transfer, in: Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint Conference on
Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3573–3578.
[38] H. Singh, G. Verma, B. V. Srinivasan, Incorporating stylistic lexical preferences in generative
language models, arXiv preprint arXiv:2010.11553 (2020).
[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30
(2017).
[40] P.-J. Yang, Y. T. Chen, Y. Chen, D. Cer, NT5?! Training T5 to perform numerical reasoning,
arXiv preprint arXiv:2104.07307 (2021).
[41] K. K. Pal, C. Baral, Investigating numeracy learning ability of a text-to-text transfer
model, in: Findings of the Association for Computational Linguistics: EMNLP 2021,
Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp.
3095–3101. URL: https://aclanthology.org/2021.findings-emnlp.265.
doi:10.18653/v1/2021.findings-emnlp.265.
[42] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang,
M. Dehghani, S. Brahma, et al., Scaling instruction-finetuned language models, arXiv preprint
arXiv:2210.11416 (2022).
[43] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint
arXiv:1412.6980 (2014).
[44] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization
Branches Out, 2004, pp. 74–81.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghorpade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chaudhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <article-title>Navigating the confluence of machine learning with deep learning: Unveiling cnns, layer configurations, activation functions, and real-world utilizations (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chandak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Katwyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deac</surname>
          </string-name>
          , et al.,
          <article-title>Scientific discovery in the age of artificial intelligence</article-title>
          ,
          <source>Nature</source>
          <volume>620</volume>
          (
          <year>2023</year>
          )
          <fpage>47</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shakeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehfuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <article-title>Deep learning models for cloud, edge, fog, and IoT computing paradigms: Survey, recent advances, and future directions</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>49</volume>
          (
          <year>2023</year>
          )
          <fpage>100568</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <article-title>Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          ,
          <source>Nature machine intelligence</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Why black box machine learning should be avoided for high-stakes decisions, in brief</article-title>
          ,
          <source>Nature Reviews Methods Primers</source>
          <volume>2</volume>
          (
          <year>2022</year>
          )
          <fpage>81</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Petch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Di</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <article-title>Opening the black box: the promise and limitations of explainable machine learning in cardiology</article-title>
          ,
          <source>Canadian Journal of Cardiology</source>
          <volume>38</volume>
          (
          <year>2022</year>
          )
          <fpage>204</fpage>
          -
          <lpage>213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Malle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kieseberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reihs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zatloukal</surname>
          </string-name>
          ,
          <article-title>Towards the augmented pathologist: Challenges of explainable-AI in digital pathology</article-title>
          ,
          <source>arXiv preprint arXiv:1712.06657</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Crisan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-M.</given-names>
            <surname>Pintea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Palade</surname>
          </string-name>
          ,
          <article-title>A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop</article-title>
          ,
          <source>arXiv preprint arXiv:1708.01104</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>McGovern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lagerquist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Gagne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Jergensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Elmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Homeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Making the black box more transparent: Understanding the physical implications of machine learning</article-title>
          ,
          <source>Bulletin of the American Meteorological Society</source>
          <volume>100</volume>
          (
          <year>2019</year>
          )
          <fpage>2175</fpage>
          -
          <lpage>2199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramos-Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>van Deemter</surname>
          </string-name>
          ,
          <article-title>An exploratory study on the benefits of using natural language for explaining fuzzy rule-based systems</article-title>
          ,
          <source>in: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Hendricks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Akata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rohrbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <article-title>Generating visual explanations</article-title>
          ,
          <source>in: Computer Vision - ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Anjomshoae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Främling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Najjar</surname>
          </string-name>
          ,
          <article-title>Explanations of black-box model predictions by contextual importance and utility</article-title>
          ,
          <source>in: Explainable, Transparent Autonomous Agents and Multi-Agent Systems: First International Workshop, EXTRAAMAS 2019, Montreal, QC, Canada, May 13-14, 2019, Revised Selected Papers</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <article-title>Natural language generation challenges for explainable AI</article-title>
          ,
          <source>arXiv preprint arXiv:1911.08794</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bugarín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>van Deemter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gardent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          , et al.,
          <article-title>Interactive natural language technology for explainable artificial intelligence</article-title>
          ,
          <source>in: International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mariotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gatt</surname>
          </string-name>
          ,
          <article-title>Towards harnessing natural language generation to explain black-box models</article-title>
          ,
          <source>in: 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Donadello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <article-title>Bridging signals to natural language explanations with explanation graphs</article-title>
          ,
          <source>in: Proceedings of the 2nd Italian Workshop on Explainable Artificial Intelligence</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoo</surname>
          </string-name>
          , J.-G. Lou,
          <article-title>Explainable automated debugging via large language model-driven scientific debugging</article-title>
          ,
          <source>arXiv preprint arXiv:2304.02195</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <article-title>Explainability for large language models: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2309.01029</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Danilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kawas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <article-title>A survey of the state of explainable AI for natural language processing</article-title>
          ,
          <source>arXiv preprint arXiv:2010.00711</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>5485</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>arXiv preprint arXiv:1910.13461</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>A survey of controllable text generation using transformer-based pre-trained language models</article-title>
          ,
          <source>arXiv preprint arXiv:2201.05337</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <article-title>Data boost: Text data augmentation through reinforcement learning guided conditional generation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.18653%2Fv1%2F2020.emnlp-main.726.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <article-title>FUDGE: Controlled text generation with future discriminators</article-title>
          ,
          <source>ArXiv abs/2104.05218</source>
          (
          <year>2021</year>
          ). URL: https://api.semanticscholar.org/CorpusID:233210709.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>