

                         Advances in neural text generation: A systematic review
                         (2022-2024)
                         Artem V. Slobodianiuk1 , Serhiy O. Semerikov1,2,3,4,5
                         1 Kryvyi Rih State Pedagogical University, 54 Universytetskyi Ave., Kryvyi Rih, 50086, Ukraine
                         2 Institute for Digitalisation of Education of the NAES of Ukraine, 9 M. Berlynskoho Str., Kyiv, 04060, Ukraine
                         3 Zhytomyr Polytechnic State University, 103 Chudnivsyka Str., Zhytomyr, 10005, Ukraine
                         4 Kryvyi Rih National University, 11 Vitalii Matusevych Str., Kryvyi Rih, 50027, Ukraine
                         5 Academy of Cognitive and Natural Sciences, 54 Universytetskyi Ave., Kryvyi Rih, 50086, Ukraine


                                     Abstract
                                      Recent years have witnessed significant advancements in neural text generation driven by the emergence of
                                      large language models and growing interest in this field. This systematic review aims to identify and summarize
                                      current trends, approaches, and methods in neural text generation from 2022 to 2024, complementing the findings
                                      of a previous review covering 2015-2021. Following the PRISMA methodology, 43 articles were selected from the
                                      Scopus database for analysis. The review reveals a shift towards innovative model architectures like Transformer-
                                      based models (GPT-2, GPT-3, BERT), attention mechanisms, and controllable text generation. While BLEU,
                                      ROUGE, and human evaluation remain the most popular evaluation metrics, new metrics like BERTScore have
                                      emerged. Datasets span diverse domains and data types, with growing interest in unlabeled data. Applications
                                      have expanded to areas such as table-to-text generation, knowledge graph-based generation, and medical text
                                      generation. Although English dominates, there is increasing research on low-resource languages. The findings
                                      highlight the rapid evolution of neural text generation methods, the broadening of application areas, and promising
                                      avenues for future research.

                                      Keywords
                                      neural text generation, deep learning, systematic review, natural language processing, evaluation metrics, datasets,
                                      applications, low-resource languages




                         1. Introduction
                         1.1. Problem statement
                         Natural Language Processing (NLP) is an interdisciplinary field of computer science and linguistics [1,
                         p. 1]; a classification of its main tasks is shown in figure 1.
                            Text content generation is a branch of NLP that combines computational linguistics and artificial
                         intelligence to generate text [2, p. 53490].
                            In 2022, OpenAI [3] introduced ChatGPT, a chatbot based on the GPT model that provided a natural
                         language interface to the user. Most systematic reviews on the topic consider similar questions, which
                         explains the choice of our review type.
                            The previous review "A Systematic Literature Review on Text Generation Using Deep Neural Network
                         Models" [2] covered 90 sources from 2015 to 2021. Public access to large language models emerged in
                         2022-2023 [4], increasing interest in them (figure 2) and creating a need to supplement the previous
                         review, whose main result is the classification shown in figure 3:

                         1) by neural network architecture:
                                 • traditional:
                                     – RNN – Recurrent Neural Network, used for sequential data;

 CS&SE@SW 2024: 7th Workshop for Young Scientists in Computer Science & Software Engineering, December 27, 2024, Kryvyi
 Rih, Ukraine
 Email: minekosdid@kdpu.edu.ua (A. V. Slobodianiuk); semerikov@gmail.com (S. O. Semerikov)
 URL: https://acnsci.org/semerikov (S. O. Semerikov)
 ORCID: 0009-0007-9425-1255 (A. V. Slobodianiuk); 0000-0003-0789-0272 (S. O. Semerikov)
                                     © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).





Figure 1: Taxonomy of popular NLP tasks for text generation (based on [1, p. 4]): NLP divides into
analysis and generation; the generation-side tasks include text summarization, question answering,
machine translation, language modelling, and natural language generation.


          – LSTM – Long Short-Term Memory network, works better than RNN for larger data
            volumes;
          – GRU – Gated Recurrent Unit (simplified version of LSTM);
          – CNN – Convolutional Neural Network.
      • innovative:
           – Attention Based – networks that use an attention mechanism to weight the importance of
             different parts of the input data;
           – Transformer – networks that use an attention mechanism without recurrent or convolutional
             layers;
           – BERT – a neural network developed by Google that combines the attention-only Transformer
             architecture with bidirectional encoders.
2) by quality metrics:
      • human-centered:
          – Domain-Expert – involving a person who is an expert in the given field to validate the
            results.
      • machine-centered (automatic):
           – BLEU (bilingual evaluation understudy) – compares the number and value of tokens (lexemes)
             in machine and human translation; the meaning of words is not taken into account;
           – ROUGE (Recall-Oriented Understudy for Gisting Evaluation) – compares machine-generated
             and human-generated summaries/translations;
           – Cosine Similarity – the cosine of the angle between two non-zero vectors: +1 corresponds to
             vectors pointing in the same direction, -1 to vectors pointing in opposite directions (a toy
             computation of BLEU and cosine similarity is sketched after this list);





          – Content Selection – a metric similar to ROUGE that uses an attention mechanism for a
             given task;
          – Diversity Score – a metric for evaluating diversity.
3) by application of the neural network:
      • AMR (Abstract Meaning Representation) – extracting semantic relationships from text;
      • Language Generation – generating human-like text;
      • Speech-to-text – converting speech to text;
      • Script Generation – generating scripts based on given words;
      • Machine Translation – generating machine translation of text from one language to another;
      • Text Summarization – generating a summary for a given text;
      • Image Captioning – generating a description for a given image;
      • Shopping Guide – generating an advertising description for a given product image;
      • Weather Forecast – generating a weather forecast text.
4) by generation language:
      • well-resourced: English, Chinese;
      • low-resourced: Bengali, Korean, Balinese, Spanish, Hindi, Slovak, Macedonian.
5) by dataset for training the neural network:
      • by annotation type:
          – Labeled (labeled data);
          – Unlabeled (unlabeled data);
      • by type:
           – Sentence – sentence type data;
           – Paragraph – paragraph type data;
           – Question/answer – question and answer type data;
          – Document – document type data.
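
   To make the machine-centered metrics above more concrete, the sketch below computes BLEU and
cosine similarity for toy inputs. It is a minimal illustration rather than the setup used in any reviewed
study, and it assumes that the NLTK and NumPy packages are installed.

     import numpy as np
     from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

     # BLEU: n-gram overlap between a candidate and a reference token sequence;
     # the meaning of words is not taken into account.
     reference = "the cat sat on the mat".split()
     candidate = "the cat is on the mat".split()
     bleu = sentence_bleu([reference], candidate,
                          smoothing_function=SmoothingFunction().method1)

     # Cosine similarity: cosine of the angle between two non-zero vectors
     # (e.g. sentence embeddings); +1 = same direction, -1 = opposite direction.
     def cosine_similarity(u, v):
         return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

     print(f"BLEU = {bleu:.3f}")
     print(f"cosine = {cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])):.1f}")  # 1.0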


1.2. Research tasks and questions
To obtain the results presented in figure 3, Fatima et al. [2] set the following tasks:
   1. To investigate the existing traditional and advanced deep learning-based text generation ap-
      proaches/techniques.
   2. To explore various performance metrics used for evaluating text generation models.
   3. To investigate various evaluation methods for measuring the quality of generated text.
   4. To review the recent application domains where text generation is being applied.
   5. To discuss the major challenges and future research directions in the text generation domain.
  To supplement the results obtained by Fatima et al. [2], these research tasks were refined:
   1. To explore deep learning methods (approaches, architectures) for text generation that have
      appeared or were mentioned in the works of 2022-2024.
   2. To consider metrics for evaluating the effectiveness of text generation models that have appeared
      or were mentioned in the works of 2022-2024.
   3. To identify text generation datasets described in the works of 2022-2024.
   4. To explore new text generation applications described in the works of 2022-2024.
   5. To determine which natural languages were used for text generation in the works of 2022-2024.
  Similarly, the research questions were refined:
RQ1. What advanced deep learning methods are used for text generation in the literature of 2022-2024?
RQ2. What new metrics for evaluating the effectiveness of text generation models are there in the
     literature of 2022-2024?
RQ3. What text generation datasets are described in the literature of 2022-2024?
RQ4. What new text generation applications are described in the literature of 2022-2024?
RQ5. What natural languages are used for text generation in the literature of 2022-2024?







Figure 2: Dynamics of search queries for the term “large language models” [4].


2. Methodology
Systematic literature analysis is the main method of this research, which allows generalising and
synthesising information from a large number of scientific publications (secondary sources) according
to a clearly defined methodology. The PRISMA (Preferred Reporting Items for Systematic Reviews
and Meta-Analyses) methodology, which is a generally recognised standard for systematic reviews
and meta-analyses in various fields of science, was chosen for conducting the review [5]. Systematic
analysis according to the PRISMA methodology involves clear research planning, defining criteria for
the selection of publications, conducting a thorough literature search in leading scientific databases,
selecting relevant studies, extracting and synthesising data. This approach ensures the completeness,
reliability and reproducibility of the obtained results.
   The chosen method fully corresponds to the aim and objectives of the research, making it possible to
obtain a generalised picture of the current state of research in the field of text content generation based
on the analysis of a significant array of scientific publications in recent years.








Figure 3: Taxonomy of text generation [2, p. 53493].


2.1. Information sources and search strategy
Fatima et al. [2] in the previous review used 2 scientometric databases (Web of Science and Scopus)
and 4 libraries (IEEE Xplore, SpringerLink, ScienceDirect and ACM Digital Library) as reliable data
sources. The search query for article titles, abstracts and keywords used by Fatima et al. [2] is presented
in table 1.

Table 1
Groups of selected keywords [2, p. 53494].
          Group 1 (words related to deep learning): deep learning OR natural language processing OR NLP
          OR neural network OR RNN OR Recurrent OR Recursive OR LSTM OR GAN OR GPT-2
          OR generative adversarial network
          Group 2 (words related to text generation): text generation OR language generation OR language
          modelling OR natural language generation OR neural language generation
          Search query: (Group 1) and (Group 2)

   Currently, Scopus covers about 90% of IEEE Xplore and the ACM Digital Library, while Web of Science
covers about 50%; ScienceDirect and Scopus have the same owner, Elsevier. Given that Scopus indexes a
significant part of these libraries, a single database (Scopus) was used instead of 2 databases and 4 libraries.
Applying the search query from the previous review (table 1) yields 2580 documents for 2015–2020
(versus 100 documents specified in [2, p. 53494]). When searching only in article titles, the number of
documents decreases to 109, with a partial match with the list of sources [2, p. 53500-53503].
   The inability to reproduce the previous results for the query from table 1 prompted the creation of a
new query:

     (
           TITLE-ABS-KEY(neural network)
           OR
           TITLE-ABS-KEY(machine learning)
           OR
           TITLE-ABS-KEY(deep learning)





      )
      AND
      TITLE("text generation")

   The first part of the query was simplified to three key phrases, two of which (“neural network”
and “deep learning”) match the first group of table 1, and the third (“machine learning”) generalizes
all other keywords of the first group, including those that did not exist at the time of the previous
review. The second part of the query included only the key phrase “text generation”, the search
for which is performed in document titles (TITLE), and not in titles, abstracts and author keywords
(TITLE-ABS-KEY).
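
   For reproducibility, such a query can also be run programmatically. The sketch below is a minimal
illustration, not the script used in this study: it assumes access to Elsevier's Scopus Search API, the
requests package, and a valid API key (the key below is a placeholder, and the endpoint and response
fields should be verified against the current API documentation).

     import requests

     QUERY = ('(TITLE-ABS-KEY("neural network") OR TITLE-ABS-KEY("machine learning") '
              'OR TITLE-ABS-KEY("deep learning")) AND TITLE("text generation")')

     response = requests.get(
         "https://api.elsevier.com/content/search/scopus",
         params={"query": QUERY, "count": 25},
         headers={"X-ELS-APIKey": "YOUR_API_KEY", "Accept": "application/json"},
         timeout=30,
     )
     response.raise_for_status()
     # The total number of matching documents is reported in the OpenSearch envelope.
     total = response.json()["search-results"]["opensearch:totalResults"]
     print(f"Documents matching the query: {total}")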

2.2. Document inclusion and exclusion criteria
  Inclusion criteria:
   1. Documents published between 2022 and 2024.
   2. Documents related to text generation using artificial neural networks.
   3. Documents describing approaches, architectures, quality metrics, languages, datasets or applica-
      tions of text generation.
  Exclusion criteria:
   1. Documents published before 2022 or those that do not contain data for 2022-2024.
   2. Documents that are not related to text generation or do not use artificial neural networks.
   3. Documents that do not contain relevant information regarding the posed research questions (new
      methods, metrics, datasets, applications, natural languages).

2.3. Document selection process
The Scopus query on 04.03.2024 returned 248 documents, whose distribution by year is shown in
figure 4. Of these, 2 were duplicates and 157 were dated before 2022, so they were excluded from the
retrieval list.
   Figure 5 presents a scheme of data selection for the systematic review.
   An attempt was made to obtain 89 documents from publishers’ websites, the scientific social network
ResearchGate, and preprint servers (primarily arXiv). 41 documents (primarily from the ACM Digital
Library and IEEE Xplore) could not be obtained. Thus, 48 documents were selected for evaluation, the
review of which revealed 1 document that did not contain data for 2022-2024, and 4 documents that did
not contain relevant information regarding the posed research questions.
   43 documents were selected for review: [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]. The review of
each document was performed according to the review map (appendix A). To automate data extraction
for the map questions, a large language model (LLM) Claude 3 Sonnet [49] was used, to which the
document file in PDF format was fed with the following prompt:

Describe the article according to the following characteristics:
Document type:
journal article (ARTICLE) or conference proceedings article (CONFERENCE)
Title
Year of publication
Countries represented by the authors
Article purpose
Used neural network architectures
Used quality metrics
Characteristics of the used datasets - name



Figure 4: Distribution of search results by year (bar chart of the number of documents per year,
1984-2024).

Characteristics of the used datasets - data type:
sentence, paragraph, document, question-answer, not specified
Characteristics of the used datasets - size
Characteristics of the used datasets - format:
CSV, JSON, XML, files, not specified
Characteristics of the used datasets - by annotation type:
labeled data, unlabeled data
Characteristics of the used datasets - data quality:
raw (unprocessed), preprocessed
Characteristics of the used datasets - by availability:
publicly available, private, not specified
Characteristics of the used datasets - link
Solved text generation task (what was the neural network used for)
Language of text generation
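
   A minimal sketch of how such an extraction step could be scripted is given below. It is an assumption
about tooling rather than the authors' exact pipeline: the PDF text is extracted client-side with pypdf
and sent, together with the prompt above, to Claude 3 Sonnet through the anthropic Python SDK
(an ANTHROPIC_API_KEY environment variable is assumed, and the file name is hypothetical).

     from pypdf import PdfReader
     import anthropic

     PROMPT = "Describe the article according to the following characteristics: ..."  # full prompt as above

     # Convert the selected document to plain text.
     reader = PdfReader("paper47.pdf")  # hypothetical file name
     paper_text = "\n".join(page.extract_text() or "" for page in reader.pages)

     # Ask the model to fill in the review map.
     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
     message = client.messages.create(
         model="claude-3-sonnet-20240229",
         max_tokens=1500,
         messages=[{"role": "user", "content": f"{PROMPT}\n\n{paper_text}"}],
     )
     print(message.content[0].text)  # the completed review map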

   An example of a response is shown in figure 6.
   For external verification, the LLM GPT-4 was used through the Scholar GPT [50] extension. Table 2
compares the results of the automated review of the article [47].
   Comparing the description of the article [47], produced with the help of another LLM and verified by
a human, with the results in table 2 shows that the description is highly consistent with the outputs of
both Claude 3 Sonnet and GPT-4. Both models accurately identified the document type, title, year of
publication, countries of the authors, purpose of the article, neural network architectures used, quality
metrics, and the text generation tasks addressed. Regarding dataset characteristics, both models indicated
that details about specific datasets are not provided, with the exception of the Quora dataset used for
paraphrasing. They also noted that the article focuses on reducing dependence on labeled or publicly
available datasets, although both labeled and unlabeled data are discussed in different contexts.







Figure 5: Scheme of data selection for the systematic review (according to the PRISMA methodology [5]).
Identification: 248 records identified from Scopus; 2 duplicate records removed before screening; 246
records screened; 157 records published before 2022 excluded. Screening: 89 documents sought for
retrieval; 41 documents not retrieved; 48 documents assessed for eligibility, of which 1 did not contain
data for 2022-2024 and 4 did not contain information relevant to the posed research questions. Included:
43 documents included in the review.



          Table 2: Comparison of the results of the large language models Claude 3 Sonnet and GPT-4.

Document type: journal article (ARTICLE) or conference proceedings article (CONFERENCE)
    Claude 3 Sonnet: ARTICLE (journal article)
    GPT-4: Article in a journal
Title
    Claude 3 Sonnet: Search and learning for unsupervised text generation
    GPT-4: Search and Learning for Unsupervised Text Generation
Year of publication
    Claude 3 Sonnet: 2022
    GPT-4: 2022
Countries represented by the authors
    Claude 3 Sonnet: Canada
    GPT-4: Canada (the author, Lili Mou, is affiliated with the University of Alberta in Edmonton, Canada)
Article purpose
    Claude 3 Sonnet: To introduce a search and learning approach for unsupervised text generation, where
    the task of text generation is formulated as a search problem with a heuristic function for evaluating
    the quality of a candidate sentence, and discrete local search algorithms are used to maximize this
    evaluation function. A machine learning model is also trained on the search results to smooth out
    noise and improve efficiency.
    GPT-4: The article aims to introduce approaches to unsupervised text generation using search and
    learning methodologies. It focuses on overcoming the need for massive labeled datasets, which are
    required in supervised methods, thereby making text generation feasible for low-resource languages
    and reducing the human labor required for data annotation.
Used neural network architectures
    Claude 3 Sonnet: Recurrent Neural Networks, Transformers, in particular GPT-2
    GPT-4: Neural architectures mentioned include recurrent neural networks (RNNs), the Transformer
    architecture, and language models such as RoBERTa and pretrained language models by Radford et
    al., 2019.
Used quality metrics
    Claude 3 Sonnet: iBLEU (for evaluating paraphrasing)
    GPT-4: Metrics such as language fluency, semantic coherency, and task-specific constraints (like
    sentence simplicity scores) are used to evaluate the quality of generated texts.
Characteristics of the used datasets:
- name
    Claude 3 Sonnet: Quora dataset (for the task of paraphrasing); no names specified for other tasks
    GPT-4: Not specified in detail for individual datasets.
- data type (sentence, paragraph, document, question-answer, not specified)
    Claude 3 Sonnet: sentence, paragraph (for the task of summarization)
    GPT-4: Sentences, as the text generation involves tasks like paraphrasing, summarization, etc.
- size
    Claude 3 Sonnet: Not specified
    GPT-4: Specific sizes are not mentioned, but the use of large-scale generic corpora is implied.
- format (CSV, JSON, XML, files, not specified)
    Claude 3 Sonnet: Not specified
    GPT-4: Not explicitly mentioned, but it includes labeled input-output pairs in various formats typical
    for NLP tasks.
- by annotation type (labeled data, unlabeled data)
    Claude 3 Sonnet: unlabeled data
    GPT-4: Primarily deals with labeled data in supervised contexts, though the focus here is on reducing
    dependence on such datasets.
- data quality (raw (unprocessed), preprocessed)
    Claude 3 Sonnet: Not specified
    GPT-4: The discussion includes the use of both raw and preprocessed data in different contexts.
- by availability (publicly available, private, not specified)
    Claude 3 Sonnet: Not specified
    GPT-4: The text suggests a shift towards methods that do not require publicly available or parallel
    datasets.
- link
    Claude 3 Sonnet: Not specified
    GPT-4: No specific dataset URLs provided.
Solved text generation task (what was the neural network used for)
    Claude 3 Sonnet: Paraphrasing, summarization, text simplification, sentence generation from keywords,
    grammar error correction, style transfer
    GPT-4: The neural networks are applied to generate text for various applications including paraphras-
    ing, summarization, text simplification, and creating sentences from keywords, all in an unsupervised
    manner.
Language of text generation
    Claude 3 Sonnet: English
    GPT-4: The tasks are primarily demonstrated with English examples, though the methodologies are
    applicable to multiple languages.

   Thus, LLMs can accurately extract key information from articles, although they may sometimes omit
details that are not explicitly stated in the text. To minimize the risk of incorrect information extraction,
a human review of Claude 3 Sonnet’s results was performed. To avoid issues related to terminology
translation, LLM responses were additionally requested in the language of the selected documents
(English).

Figure 6: Claude 3 Sonnet’s response to the request to compile a review map of the article [47].

2.4. Quality assessment
To assess the quality of the research selection and analysis process in this review, the following criteria
were applied:

   1. Clarity and relevance of the research inclusion and exclusion criteria to the purpose of the review.
   2. Completeness and systematic nature of the search for relevant research in the selected databases.
   3. Consistency and reproducibility of the research selection process according to the inclusion and
      exclusion criteria.
   4. Application of a standardized review map for collecting and systematizing data from selected
      studies.
   5. Involvement of at least two independent researchers in the process of data selection, analysis,
      and synthesis to minimize the risk of bias.
   6. Consideration and description of any discrepancies or uncertainties in the process of research
      selection and analysis.
   7. Ensuring transparency and reproducibility of the review process by detailed description of each
      stage in the report.

  Adherence to these quality criteria made it possible to ensure the reliability and validity of the results
and conclusions of this systematic review.
  PRISMA also provides for the following additional components of the research methodology:

    • assessment of the risk of bias in the selected studies is not relevant because this review considers
      different approaches and methods of text generation, and does not compare the results of individual
      studies;
    • determination of the effect size for each outcome (or type of outcome) is not performed because this
      review does not aim to conduct a meta-analysis or quantitative synthesis of the results;
    • description of the methods of synthesizing research results, such as meta-analysis, is not performed
      because the review does not involve a quantitative synthesis of the results;
    • assessment of the risk of bias due to incomplete presentation of the results in publications is not
      performed because this review focuses on describing and classifying existing approaches and
      methods;
    • assessments of the reliability and trustworthiness of the results obtained from publications are not
      performed due to the use of reliable sources: publications selected by Scopus.


3. Results
3.1. Distribution of selected documents by year
In [51], the completed review maps for each article are presented. The results of individual studies are
not provided because this review does not aim to conduct a meta-analysis or quantitative synthesis of
the results.
   As can be seen from figure 7, the number of articles in journals (ARTICLE) prevails over the number
of conference proceedings articles (CONFERENCE) during 2022-2024. In 2022, the number of conference
proceedings documents (15) was significantly higher than the number of articles in journals (4), but in
2023 there is an increase in the number of articles in journals (16) compared to conference proceedings
articles (6). For January and February 2024, there are only articles in journals (2), and conference
proceedings articles are absent. In total, for the period 2022-2024, the number of articles in journals





(22) slightly exceeds the number of conference proceedings articles (21). This shift may indicate more
thorough coverage of the issue in scientific journals compared to conference proceedings in recent
years.

Figure 7: Number of CONFERENCE and ARTICLE document types by year.



3.2. RQ1: What advanced deep learning methods are used for text generation in the
     literature of 2022-2024?
Table 3 presents an overview of neural network architectures used for text generation, according to
data from 2022-2024 studies.

                   Table 3: Neural network architectures for text generation.

Traditional approaches:
- RNN (Recurrent Neural Networks): recurrent neural networks used for processing sequential data.
  Representatives: –. Articles: [6, 9, 47, 10, 11, 29, 30].
- LSTM (Long Short-Term Memory): a variant of RNN that better remembers long-term dependencies.
  Representatives: –. Articles: [6, 10, 13, 14, 15, 29, 30, 33, 11, 41, 42].
- GRU (Gated Recurrent Unit): a simplified variant of LSTM with fewer parameters. Representatives: –.
  Articles: –.
- CNN (Convolutional Neural Networks): convolutional neural networks, often used for image processing.
  Representatives: YOLOv5. Articles: [6, 9, 38, 16].
- Graph Neural Networks: models that work with graph data structures. Representatives: GraphWriter,
  CGE-LW. Articles: [7, 9].

Innovative approaches:
- Autoencoders: networks used for learning efficient encodings of unlabeled data. Representatives: AE,
  VAE, iVAE, clVAE+MI, β0.4 VAE, SaVAE, LagVAE. Articles: [17, 15, 29].
- Transformer: architecture that uses an attention mechanism for processing sequential data.
  Representatives: T5, CodeT5, TrICY, DETR. Articles: [7, 9, 18, 19, 31, 47, 8, 20, 32, 22, 27, 41, 39, 43,
  34, 44, 48].
- BERT (Bidirectional Encoder Representations from Transformers): a Transformer-based model trained
  on large amounts of unlabeled text. Representatives: PubmedBERT, BioLinkBERT, RoBERTa,
  XLM-RoBERTa. Articles: [8, 37, 13, 18, 19, 20, 26, 28, 30, 35, 32, 12, 9, 39, 40, 45].
- GPT-2, GPT-3 (Generative Pre-trained Transformer): Transformer-based models used for text generation.
  Representatives: OPT, Llama, CodeBERT. Articles: [6, 8, 10, 11, 13, 47, 15, 18, 21, 22, 12, 23, 24, 26, 19,
  33, 25, 32, 37, 34, 45, 36, 39, 43, 44].
- Attention-based models: models that use an attention mechanism to improve the quality of generated
  text. Representatives: –. Articles: [47, 8, 20, 26, 43, 44].
- Seq2Seq (Sequence-to-Sequence): architecture that uses an encoder and decoder to generate sequences.
  Representatives: S2ST, S2SL, S2SG, S2ST+, D+ Full, DSG. Articles: [39, 15, 42, 28, 31, 46, 43].
- GAN (Generative Adversarial Networks): generative adversarial networks consisting of a generator and
  a discriminator. Representatives: EGAN, TILGAN, DoubAN-Full, WRGAN, CatGAN, SeqGAN, DGSAN.
  Articles: [6, 29, 25].
- Memory Networks: models that use external memory for storing and accessing information.
  Representatives: DM-NLG (with memory), MemNNs, Mem2Seq, GLMP. Articles: [34, 9].
- Diffusion Models: models that use a diffusion process to generate text. Representatives: GENIE, NAT,
  iNAT, ELMER, MASS, ProphetNet, InsT, CMLM, LevT, BANG, ConstLeven. Articles: [41].
- Prompt-based models: models that use prompt-engineering fine-tuning to control text generation.
  Representatives: –. Articles: [23].

  Table 4 presents a summary of text generation approaches based on the data from table 3.

Table 4
Approaches to text generation.
- Traditional approaches: [14, 38, 16]
- Innovative approaches: [18, 19, 31, 8, 20, 32, 22, 27, 39, 43, 34, 44, 48, 37, 26, 28, 35, 12, 40, 45, 21, 23,
  24, 25, 36, 46]
- Combination of traditional and innovative approaches: [6, 9, 47, 10, 11, 29, 30, 13, 15, 33, 41, 42, 7]

   Among the innovative approaches, the most popular is the use of models based on the Transformer
architecture, in particular GPT-2, GPT-3, BERT and their variations. These models demonstrate high
efficiency in generating coherent and semantically relevant text. Approaches using attention mechanisms
and controllable text generation are also gaining popularity.





   Traditional approaches, although used less frequently, still find their application in certain tasks,
such as image-based text generation, machine translation and others.
   Overall, there is a trend towards the transition from traditional approaches to more innovative
and efficient models based on the Transformer architecture and attention mechanisms. This allows
improving the quality of the generated text and expanding the scope of application of these technologies.
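
   As a concrete illustration of how readily such Transformer-based generators can now be applied, the
sketch below (a minimal example, not tied to any single reviewed paper, assuming the Hugging Face
transformers package and the public gpt2 checkpoint are available) generates a continuation for a short
prompt.

     from transformers import pipeline, set_seed

     set_seed(42)  # make the toy example reproducible
     generator = pipeline("text-generation", model="gpt2")
     output = generator("Neural text generation has recently",
                        max_new_tokens=30, num_return_sequences=1)
     print(output[0]["generated_text"])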
   Figure 8 shows that in 2022 and 2023, innovative approaches to text generation prevail, while
traditional approaches and a combination of approaches are less common. In 2024, there are articles that
use innovative and combined approaches in equal numbers, but the sample for this year is incomplete,
since data were collected only for part of the year. In general, there is a trend towards an increase
in the number of studies applying innovative approaches, such as models based on the Transformer
architecture and attention mechanisms.

Figure 8: Distribution of articles by year according to categories of text generation approaches.


  Comparing the obtained results with the data from the previous systematic review [2], the following
conclusions can be drawn:

    • Traditional approaches, such as RNN, LSTM, CNN, are still used for text generation, but to a
      lesser extent compared to innovative approaches.
    • The Transformer architecture and its variants (GPT-2, GPT-3, BERT) have gained significant
      popularity in 2022-2024, demonstrating high efficiency in generating coherent and semantically
      relevant text.
    • New architectures and approaches have emerged, such as Diffusion Models and Memory Networks
      models, which were not presented in the previous review.
    • Considerable attention is paid to models that use attention mechanisms and controllable text
      generation.
    • There is a trend towards combining traditional and innovative approaches to achieve better
      results in text generation.
    • Overall, in 2022-2024, there is a transition from traditional approaches to more innovative
      and efficient models based on the Transformer architecture and attention mechanisms, which
      allows improving the quality of generated text and expanding the scope of application of these
      technologies.






  Thus, comparing the results of the two reviews demonstrates that although traditional architectures,
such as RNN, LSTM and CNN, remain in use, the literature of 2022-2024 is dominated by Transformer-
based models and attention mechanisms, and new families such as diffusion models, memory networks
and prompt-based models have emerged. This indicates the rapid evolution of text generation methods
and a shift towards more efficient and controllable models.

3.3. RQ2: What new metrics for evaluating the effectiveness of text generation
     models are there in the literature of 2022-2024?
Table 5 presents an overview of quality metrics used to evaluate text generation. The metrics are divided
into two categories: human-centred and machine-centred. Human-centred metrics include Human
Evaluation and Turing Test, which involve evaluating the quality of generated text by human experts or
testing a model’s ability to generate text similar to that written by a human. Machine-centred metrics
include a wide range of automatic metrics such as BLEU, ROUGE, METEOR, Perplexity, Distinct-n,
BERTScore, and others. These metrics evaluate different aspects of generated text quality, such as
similarity to the reference text, fluency, meaningfulness, lexical and syntactic diversity, etc.
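
   Among the automatic metrics, Distinct-n is simple enough to state directly: it is the ratio of unique
n-grams to all n-grams in the generated text. The sketch below is a minimal illustration of this common
definition, not the implementation used in the reviewed studies.

     def distinct_n(tokens, n):
         """Ratio of unique n-grams to all n-grams; higher values mean more diverse text."""
         ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
         return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

     sample = "the cat sat on the mat and the cat slept".split()
     print(distinct_n(sample, 1))  # Dist-1: unique unigrams / all unigrams
     print(distinct_n(sample, 2))  # Dist-2: unique bigrams / all bigrams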


                     Table 5: Main quality metrics for evaluating text generation.

Human-centred metrics:
- Human Evaluation: evaluation of the quality of generated text by human experts. Representatives: –.
  Articles: [9, 10, 11, 25, 30, 32, 33, 36, 31, 37].
- Turing Test: a test of a model's ability to generate text indistinguishable from that written by a human.
  Representatives: –. Articles: [33].

Machine-centred metrics:
- BLEU: evaluates the quality of generated text by comparing it with reference text. Representatives:
  BLEU-1, BLEU-2, BLEU-3, BLEU-4, BLEU-5. Articles: [7, 8, 18, 9, 10, 13, 15, 19, 23, 27, 29, 31, 32, 33, 34,
  36, 37, 41, 42, 43, 44, 45, 46, 47].
- ROUGE: evaluates the quality of automatic text summarization. Representatives: ROUGE-1, ROUGE-2,
  ROUGE-3, ROUGE-L. Articles: [7, 8, 9, 10, 13, 18, 19, 23, 27, 28, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 48].
- METEOR: evaluates the quality of machine translation. Representatives: –. Articles: [18, 27, 32, 34, 36,
  42, 43, 44, 46, 48].
- BERTScore: evaluates the quality of generated text using a pre-trained BERT model. Representatives: –.
  Articles: [8, 13, 18, 19, 26, 34, 32, 37].
- CIDEr: evaluates the quality of automatic image captioning by comparing machine-generated captions
  with sets of reference captions. Representatives: –. Articles: [14, 18, 23, 36, 37, 41, 42, 46].
- Perplexity: evaluates the quality of a language model. Representatives: –. Articles: [8, 9, 15, 17, 26, 29,
  36, 39].
- F1-score: evaluates the quality of classification, particularly in binary classification tasks.
  Representatives: –. Articles: [13, 20, 21, 26, 34, 40].
- CHRF++: evaluates the quality of machine translation based on character and n-gram matches.
  Representatives: –. Articles: [7, 32, 37, 48].
- Distinct-n: evaluates the diversity of generated text. Representatives: Dist-1, Dist-2, Dist-3, Dist-4.
  Articles: [8, 9, 15].

   Table 6 provides an overview of the quality evaluation metrics applied in the articles. Most studies
use machine-centred metrics for automatic evaluation of generated text quality. A significantly smaller
number of studies apply human-centred metrics, which may be due to the labour-intensive and subjective
nature of human quality assessment. However, the use of human-centred metrics remains important
for obtaining a more complete and reliable evaluation of text generation quality. Some studies do not
apply any quality metrics, which may be related to the focus on other aspects of text generation, such
as efficiency or speed of model operation.

Table 6
Overview of quality evaluation metrics applied in the articles.
- Machine-centred: [7, 8, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 26, 27, 28, 29, 34, 35, 38, 39, 40, 41,
  42, 43, 44, 45, 46, 47, 48]
- Human-centred: [11, 30]
- Both: [9, 10, 25, 32, 33, 36, 31, 37]
- Not applied: [24, 12, 6]

   The use of diverse quality metrics is important for a comprehensive evaluation of the effectiveness of
models and approaches to text generation. Combining machine-centred and human-centred metrics
allows obtaining more reliable and valid evaluation results.
   The diagram in figure 9 shows that the most frequently used quality metrics are BLEU (55.8% of
articles) and ROUGE (48.8% of articles). Human Evaluation is also quite common – it is applied in
23.3% of articles. Other metrics, such as Perplexity, METEOR, BERTScore, and Distinct-n, are used less
frequently but still have a significant share of mentions in articles. The least common metrics are the
Turing Test, Fluency, Coherence, Diversity, N-gram Overlap, and Embedding Similarity, each of which
is mentioned in only one article (2.3%).

Figure 9: Distribution of quality metrics by the number of articles in which they are mentioned (metrics
shown: Human Evaluation, BLEU, ROUGE, METEOR, CIDEr, BERTScore, Perplexity, F1, Distinct n-grams,
CHRF++, NIST, Recall, and other metrics).






   Automatic quality metrics, such as BLEU and ROUGE, are the most widely used for evaluating the
effectiveness of text generation models, while human quality evaluation is used less frequently but
remains an important component for obtaining a more complete and reliable assessment of generated
text quality.
   Comparing the obtained results with the data from the previous systematic review [2], the following
observations can be made:

    • BLEU and ROUGE remain the most popular metrics for evaluating the quality of generated text
      both in 2015-2021 and in 2022-2024.
    • Human Evaluation is still widely used to obtain a more complete and reliable evaluation of text
      generation quality, despite the labour-intensiveness and subjectivity of this approach.
    • In 2022-2024, new metrics appeared, such as BERTScore, Fluency, Coherence, Diversity, N-gram
      Overlap, and Embedding Similarity, which were not presented in the previous review. This
      indicates the active development of methods for evaluating the quality of generated text and the
      search for more effective and informative metrics.
    • Perplexity has gained more popularity in 2022-2024 compared to the previous period, which may
      be related to its effectiveness in assessing the quality of language models.
    • The METEOR metric, which evaluates the quality of machine translation, is also used more
      frequently in 2022-2024, which may indicate a growing interest in applying text generation to
      machine translation tasks.
    • In general, there is a trend towards combining different types of metrics (machine-centred and
      human-centred) to obtain more reliable and valid results when evaluating the effectiveness of
      text generation models.

  Thus, comparing the results of the two reviews demonstrates that while traditional metrics, such as
BLEU and ROUGE, remain widely used, new metrics appear in 2022-2024 that take into account various
aspects of generated text quality. This indicates the active development of quality assessment methods
and the search for more effective and informative approaches to evaluating text generation models.

3.4. RQ3: What text generation datasets are described in the literature of 2022-2024?
Table 7 presents the datasets mentioned in the reviewed articles, sorted in descending order by the
number of mentions and alphabetically in case of an equal number of mentions. The E2E dataset is
mentioned most frequently – in 7 articles, followed by XSum (4 articles), CNN/DailyMail (4 articles),
CommonGen (4 articles), ToTTo (4 articles), WebNLG (4 articles), WikiBio (3 articles), AGENDA (2 articles),
DDI (2 articles), NIST (2 articles), PubMed (2 articles), Quora (2 articles), ROCStories (2 articles), Snips
(2 articles), SST-2 (2 articles), WMT’14 English-German (2 articles), WMT’16 Romanian-English (2 articles),
and Yelp (2 articles). The remaining datasets are mentioned once each, listed in order of appearance in the
review.


                       Table 7: Datasets mentioned in the reviewed articles.
                             Dataset name                                  Articles
         E2E                                                      [19, 23, 30, 31, 34, 36, 44]
         CNN/DailyMail (CNN/DM)                                          [9, 23, 41, 45]
         ToTTo                                                          [18, 31, 43, 46]
         CommonGen                                                       [9, 18, 36, 41]
         WebNLG                                                          [7, 31, 37, 44]
         XSum                                                            [9, 18, 23, 41]
         WikiBio                                                          [34, 18, 31]
         Abstract Generation Dataset (AGENDA)                                 [7, 9]
         DDI                                                                 [9, 12]
         NIST                                                              [9, 27]
         PubMed                                                           [12, 23]
         Quora                                                             [47, 9]
         ROCStories                                                        [36, 9]
         Snips                                                            [19, 39]
         SST-2                                                            [21, 45]
         WMT’14 English-German                                            [18, 27]
         WMT’16 Romanian-English                                          [18, 27]
         Yelp                                                             [17, 40]
         Baidu Tieba                                                         [9]
         PersonaChat                                                         [9]
         Gigawords                                                           [9]
         Yahoo! Answers                                                      [9]
         NLPCC                                                               [9]
         Tencent                                                             [9]
         SQuAD                                                               [9]
         ComVE                                                               [9]
         𝛼NLG-ART                                                            [9]
         EntDesc                                                             [9]
         VisualStory                                                         [9]
         PaperWriting                                                        [9]
         Reddit-10M                                                          [9]
         EMNLP dialog                                                        [9]
         ICLR dialog                                                         [9]
         NarrativeQA                                                         [9]
         Wizard of Wikipedia (WoW)                                           [9]
         MS-MARCO                                                            [9]
         ELI5                                                                [9]
         ChangeMyView                                                        [9]
         Amazon books                                                        [9]
         Foursquare                                                          [9]
         Scratch online community comments                                  [11]
         BC5-Chemical                                                       [12]
         BC5-Disease                                                        [12]
         NCBI-Disease                                                       [12]
         BC2GM                                                              [12]
         JNLPBA                                                             [12]
         EBM PICO                                                           [12]
         ChemProt                                                           [12]
         GAD                                                                [12]
         BIOSSES                                                            [12]
         HoC                                                                [12]
         PubMedQA                                                           [12]
         BioASQ                                                             [12]
         Logic2Text                                                         [13]
         Concadia                                                           [14]
         REDIAL                                                             [15]
         Custom dataset for Bangla word sign language                      [16]
         Synthetic dataset                                                 [17]
         Penn Treebank                                                     [17]
         IWSLT’14 De-En                                                    [18]
         WMT16 English-German                                              [45]
         WMT17 English-German                                              [36]
         WMT20                                                             [37]
         WMT21                                                             [37]
         WMT’14 German-English                                             [27]
         Multi-News                                                        [18]
         Java                                                              [18]
         Python                                                            [18]
         English ATIS                                                      [19]
         ViGGO                                                             [19]
         TREC                                                              [19]
         Korean Weather                                                    [19]
         Rest                                                              [19]
         KLUE-TC                                                           [19]
         C4                                                                [20]
         M2D2                                                              [20]
         Political Slant                                                   [20]
         Layoff                                                            [21]
         MC                                                                [21]
         M&A                                                               [21]
         Flood                                                             [21]
         Wildfire                                                          [21]
         Boston Bombings                                                   [21]
         Bohol Earthquake                                                  [21]
         West Texas Explosion                                              [21]
         Dublin                                                            [21]
         New York City                                                     [21]
         WSC                                                               [22]
         CBT-CN                                                            [22]
         CBT-NE                                                            [22]
         Wikihow                                                           [23]
         SAMSum                                                            [23]
         DART                                                              [23]
         Custom dataset composed of tweets labeled with emotions           [25]
         AFQMC                                                             [26]
         CHIP-STS                                                          [26]
         QQP                                                               [26]
         MRPC                                                              [26]
         ParaNMT-small                                                     [27]
         NIST Chinese-English                                              [27]
         GTZAN                                                             [28]
         Minions                                                           [29]
         Japanimation                                                      [29]
         WikiArt                                                                  [29]
         Nottingham                                                               [29]
         Lakh MIDI                                                                [29]
         TheoryTab                                                                [29]
         Poem-5                                                                   [29]
         Poem-7                                                                   [29]
         Synthetic date generation dataset                                        [30]
         LDC2020T02 (AMR 3.0 release)                                             [32]
         One Million Urdu News Dataset                                            [33]
         Australian Broadcasting Corporation (ABC) news dataset                   [33]
         DailyMed drug labels                                                     [35]
         COCO Image Captioning                                                    [37]
         German and French commercial datasets                                    [39]
         MASSIVE                                                                  [39]
         Gold-PMB                                                                 [42]
         Silver-PMB                                                               [42]
         numericNLG                                                               [43]
         Custom dataset related to text messaging applications                    [44]
         TweetEval                                                                [45]
         AGnews                                                                   [45]
         QNLI                                                                     [45]
         IMDB                                                                     [45]
         CC-News                                                                  [45]
         WITA                                                                     [46]
         XWIKIREF                                                                 [48]

   The analysed studies use a wide range of datasets covering various domains and types of texts, from
user reviews and news articles to medical and technical texts. This indicates the active development
and application of text generation methods in diverse fields.
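   As an illustration of how such corpora are typically accessed in practice, the sketch below loads two of
the frequently mentioned datasets through the Hugging Face datasets library. The hub identifiers and the
field names noted in the comments are assumptions about the publicly hosted copies, not details reported in
the reviewed articles.

    # Minimal sketch of loading two frequently mentioned corpora; hub identifiers are assumptions
    # and may differ between mirrors or library versions.
    from datasets import load_dataset

    e2e = load_dataset("e2e_nlg", split="train")                  # meaning representations paired with references
    xsum = load_dataset("EdinburghNLP/xsum", split="validation")  # BBC articles with one-sentence summaries

    print(e2e[0])   # e.g. fields like "meaning_representation" and "human_reference"
    print(xsum[0])  # e.g. fields like "document", "summary" and "id"
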
   Table 8 presents the data types used in the reviewed articles, sorted in descending order by the number
of mentions. Datasets containing sentences are used most often – they are mentioned in 26 articles. In
4 articles, the data type is not explicitly specified. Other data types, such as paragraphs (18 articles),
documents (11 articles), question-answer (10 articles), descriptive tables (9 articles), translations (7
articles), stories (4 articles), images (4 articles), and others, are less common.
   The prevalence of datasets with sentences may be due to the fact that many text generation tasks,
such as machine translation, paraphrasing, question answering, etc., often work at the sentence level.
At the same time, the presence of various data types, including paragraphs, documents, images, music,
and others, indicates that text generation methods can be applied to a wide range of tasks and domains.
   Table 9 presents the data annotation types used in the reviewed articles, sorted in descending order
by the number of mentions. Labeled datasets are used most often – they are mentioned in 22 articles.
In 20 articles, the annotation type is not explicitly specified. In 5 articles, unlabeled data are used. In 4
articles, both labeled and unlabeled data are used.
   The prevalence of labeled datasets may be due to the fact that many text generation tasks, especially
those that use controlled approaches or require compliance with certain templates or structures, require
labeled data for training models. Annotation can include elements such as parts of speech, syntactic
structures, semantic roles, tags for controlled generation, etc.
   At the same time, the presence of studies that use unlabeled data or a combination of labeled and
unlabeled data indicates the active development of unsupervised and semi-supervised learning methods
in the field of text generation. These approaches allow using large volumes of unlabeled text data for
pre-training models and improving their ability to generate coherent and meaningful text.

Table 8
Data types used in the reviewed articles.
      Data type                                                    Articles
 Sentence           [7, 9, 11, 12, 14, 15, 17, 18, 19, 20, 22, 23, 25, 26, 29, 30, 31, 32, 33, 36, 39, 40, 41, 45, 46, 48]
 Paragraph          [7, 9, 12, 15, 17, 18, 19, 20, 23, 29, 37, 14, 39, 40, 41, 45, 46, 48]
 Document           [9, 12, 18, 19, 20, 29, 35, 40, 41, 42, 45]
 Question-answer    [7, 9, 12, 17, 18, 19, 22, 26, 45, 47]
 Descriptive tables [13, 21, 30, 31, 33, 34, 36, 43, 44]
 Translations       [18, 27, 31, 33, 36, 37, 45]
 Stories            [9, 31, 33, 36]
 Images             [14, 29, 37, 38]
 Audio files        [29, 28]
 Video clips        [16]
 Computer programs [18]
 Not specified      [6, 8, 10, 24]

Table 9
Data annotation types used in the reviewed articles.
        Annotation type                                          Articles
        Labeled data    [12, 13, 14, 16, 17, 18, 21, 27, 28, 31, 33, 34, 35, 39, 40, 42, 43, 44, 46, 48, 47, 25]
        Unlabeled data [11, 12, 39, 40, 47]
        Not specified   [6, 7, 8, 9, 10, 15, 19, 20, 22, 24, 23, 26, 29, 30, 32, 36, 37, 38, 41, 45]

Table 10
Data quality used in the reviewed articles.
       Data quality                                              Articles
       Preprocessed [13, 14, 16, 17, 18, 44, 47, 48, 39, 34, 31, 42]
       Raw           [7, 11, 28, 33, 35, 37, 39, 34, 31, 42]
       Not specified [6, 8, 9, 10, 12, 15, 20, 19, 21, 22, 24, 25, 23, 26, 27, 29, 30, 32, 36, 38, 40, 41, 43, 45, 46]


   Table 10 presents the data quality used in the reviewed articles, sorted in descending order by the
number of mentions. In 25 articles, the data quality is not explicitly specified. In 12 articles, preprocessed
data are used, while in 10 articles – raw data. In 4 articles, both preprocessed and raw data are used.
   Preprocessed data usually go through the stages of cleaning, normalization, tokenization, and some-
times additional annotation before being used in model training. This improves the quality and
consistency of the data, as well as facilitates the learning process. Examples of preprocessed data can
be datasets obtained from existing corpora or databases that have already undergone some processing.
   Raw data, on the other hand, are data obtained directly from real sources, such as web pages, social
networks, unprocessed texts, etc. They can contain noise, incorrect formatting, errors, and other
artifacts. Using raw data can be useful for training models that need to be robust to real conditions and
able to process unstructured data.
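   A minimal preprocessing sketch along the lines described above (Unicode normalization, cleaning, and
tokenization of raw web-style text) is shown below; the regular expressions and the example string are
illustrative assumptions rather than a pipeline used in any reviewed study.

    # Minimal raw-text preprocessing sketch: normalization, cleaning, whitespace tokenization.
    import re
    import unicodedata

    def preprocess(raw: str) -> list[str]:
        text = unicodedata.normalize("NFKC", raw)          # unify Unicode forms
        text = re.sub(r"<[^>]+>", " ", text)               # strip leftover HTML tags
        text = re.sub(r"https?://\S+", " ", text)          # drop URLs
        text = re.sub(r"\s+", " ", text).strip().lower()   # collapse whitespace, lowercase
        return text.split()                                # simple whitespace tokenization

    print(preprocess("Check   <b>THIS</b> out: https://example.com  !"))
    # ['check', 'this', 'out:', '!']
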
   The lack of information about data quality in a significant part of the analyzed articles may indicate
that the authors do not pay enough attention to this aspect or consider it less important for the research.
At the same time, data quality is a critical factor that affects the efficiency and generalizability of text
generation models, so it is worth paying more attention to the description and analysis of the quality of
the data used in future research.
   Comparing the results of the 2022-2024 review with the previous review [2], the following conclusions
can be drawn:

    • In 2022-2024, new datasets appeared, such as XWIKIREF, DailyMed, numericNLG, WITA, DIST-
      ToTTo, which were not presented in the previous review. This indicates the active development
      of resources for research and application of text generation methods.
    • The datasets E2E, WikiBio, ToTTo, CommonGen, CNN/DailyMail, and XSum remain popular and
      widely used in research both in 2015-2021 and in 2022-2024.
    • There is a trend towards the use of more diverse data types, such as descriptive tables, images, mu-
      sic, translations, question-answer, video clips, and computer programs, in addition to traditional
      types such as sentences, paragraphs, and documents.
    • Labeled data remain the most widely used, but there is a growing interest in using unlabeled data
      and a combination of labeled and unlabeled data for training text generation models.
    • Although data quality is a critical factor affecting model efficiency, a significant part of the
      2022-2024 research does not cover this aspect, which may indicate the need to pay more attention
      to the description and analysis of the quality of the data used in future research.

   Thus, comparing the results of the two reviews demonstrates that text generation datasets continue
to actively develop, covering new domains and data types. At the same time, some popular datasets
remain relevant and widely used in research. There is a trend towards the use of more diverse data
types and a growing interest in unlabeled data and combined approaches. However, the description of
data quality still requires more attention in future research to ensure the reliability and reproducibility
of the results.

3.5. RQ4: What new text generation applications are described in the literature of
     2022-2024?
Table 11 shows the text generation applications found in the analyzed articles, sorted in descending
order by the number of references. The most common applications are text summarization (8 articles),
machine translation (8 articles), table-to-text generation (5 articles), data-to-text generation (5 articles),
and paraphrasing and data augmentation (4 articles each). Other applications, such as controllable text
generation, image-based text generation, text generation from knowledge graphs, etc., are mentioned in a
smaller number of articles.

Table 11
Text generation applications.
                                   Application                            Articles
                     Machine Translation                      [9, 16, 17, 18, 27, 36, 37, 47]
                     Text Summarization                       [9, 18, 23, 37, 41, 45, 47, 48]
                     Data-to-Text Generation                  [8, 31, 34, 44, 46]
                     Table-to-Text Generation                 [13, 30, 34, 36, 43]
                     Data Augmentation                        [8, 21, 40, 42]
                     Paraphrasing                             [9, 27, 39, 47]
                     Controllable Text Generation             [8, 19, 30]
                     Image-based Text Generation              [14, 29, 37]
                     Emotional Text Generation                [11, 25]
                     Medical Text Generation                  [12, 35]
                     Music Text Generation                    [28, 29]
                     Question Answering                       [9, 15]
                     Script Generation                        [9, 29]
                     Text Generation from Knowledge Graphs    [7, 9]
                     Cybersecurity                            [45]
                     Encyclopedic Text Generation             [48]
                     News Headline Generation                 [33]
                     Sign Language to Text Translation        [16]
                     Technical Documentation Generation       [10]

   Figure 10 visualizes the text generation applications listed in table 11 as a diagram. The diagram
clearly shows the prevalence of machine translation, text summarization, table-to-text generation, and
data-to-text generation compared to other application areas.





[Pie chart omitted: number of articles per application, corresponding to the applications listed in table 11.]

Figure 10: Text generation applications in the analyzed articles of 2022-2024.


   The analysis of text generation applications demonstrates a wide range of possibilities for using this
technology in various fields, from processing structured data to creating emotionally colored texts and
translating sign language into text. The development of new methods and neural network architectures
opens up new prospects for further expanding the areas of text generation application.
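   As an illustration of one of the most common applications, the sketch below runs abstractive
summarization with an off-the-shelf pretrained model through the transformers pipeline API. The checkpoint
facebook/bart-large-cnn and the input text are illustrative choices, not a system evaluated in the reviewed
articles.

    # Minimal summarization sketch with a pretrained model; checkpoint choice is illustrative.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    article = (
        "Neural text generation systems based on Transformer architectures are now applied to "
        "summarization, machine translation, table-to-text generation and many other tasks, and "
        "are evaluated with metrics such as BLEU and ROUGE."
    )
    result = summarizer(article, max_length=40, min_length=10, do_sample=False)
    print(result[0]["summary_text"])
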
   Comparing the results of the 2022-2024 review with the previous review [2], the following observations
can be made:

    • Machine Translation and Text Summarization have gained more popularity in 2022-2024 compared
      to the previous period. In addition, text generation from tables and structured data has joined
      them among the most common applications, which may indicate a growing interest in processing
      structured information using text generation methods.
    • Controllable Text Generation has also become more common, indicating a growing interest in
      methods that allow controlling the text generation process and obtaining more relevant and
      high-quality results.
    • Medical Text Generation has emerged as a new area of text generation application in 2022-2024,
      which may be related to the active development of methods for processing medical data and the
      need to automate the creation of medical documentation.
    • New applications have emerged, such as Emotional Text Generation, Encyclopedic Text Genera-
      tion, Technical Documentation Generation, and Sign Language to Text Translation, indicating an
      expansion of the areas of text generation use.
    • Paraphrasing and Data Augmentation remain relevant text generation applications both in 2015-
      2021 and in 2022-2024.
    • Some applications that were popular in the previous review, such as poetry generation, dialogue
      systems, text classification, topic modeling, do not appear among the most frequently mentioned
      in the new review. This may be related to a change in research focus and the emergence of new
      promising directions.
    • Overall, there is a trend towards an increasing diversity of text generation applications compared
      to the previous period, which indicates the active development of this area of research and the
      expansion of the possibilities of using generative models to solve applied problems in various
      subject areas.

   Thus, comparing the results of the two reviews demonstrates that the field of text generation
application continues to actively expand, covering new areas and directions. The popularity of such
applications as text generation from tables and knowledge graphs, controllable text generation, and
medical text generation indicates a growing interest in methods that allow efficiently processing
structured data and obtaining more relevant and high-quality results. At the same time, traditional
applications, such as paraphrasing, text summarization, and machine translation, remain relevant and
widely used in research.

3.6. RQ5: What natural languages are used for text generation in the literature of
     2022-2024?
Table 12 presents an extended annual summary of the languages used for text generation in 2022-2024.
English is the most widely used language, with 38 articles covering all three years. Various neural
network architectures are used for generating English texts, including Transformer, BERT, GPT-2,
GPT-3, RNN, LSTM, CNN, GAN, and Seq2Seq.
   German is represented in 5 articles using GAN architectures (Conditional GAN, StyleGAN, DCGAN).
Chinese is represented in 4 articles using Graph Neural Networks and B2T architecture. Bengali
is represented in 2 articles (one in 2022 and one in 2023) dedicated to recognition using CNN and
YOLO. Romanian is represented in 2 articles (one in 2022 and one in 2023) using DCGAN and BART
architectures. French, Urdu, Shakespearean English, and Korean are each mentioned in one article,
using various architectures such as Conditional GAN, StyleGAN, DCGAN, and GPT-2.
   In 2023, a study by Taunk et al. [48] appears dedicated to generating texts in several Indian lan-
guages (Hindi, Malayalam, Marathi, Oriya, Punjabi, and Tamil) using HipoRank, mBART, and mT5
architectures.

               Table 12: Extended annual summary of text generation languages.
   English. 2022: 17 articles [6, 14, 16, 17, 18, 22, 23, 25, 28, 30, 31, 36, 37, 39, 42, 43, 47]; 2023: 19 articles
   [7, 8, 11, 12, 15, 19, 20, 21, 26, 27, 32, 33, 34, 35, 40, 41, 45, 46, 48]; 2024: 2 articles [13, 44]. Total: 38.
   Architectures: Transformer, BERT, GPT-2, GPT-3, RNN, LSTM, CNN, GAN, Seq2Seq.
   German. 2022: 3 articles [18, 37, 39]; 2023: 2 articles [27, 45]. Total: 5. Architectures: Conditional GAN,
   StyleGAN, DCGAN.
   Chinese. 2022: 1 article [37]; 2023: 3 articles [10, 27, 29]. Total: 4. Architectures: Graph Neural Networks, B2T.
   French. 2022: 1 article [39]. Total: 1. Architectures: Conditional GAN, StyleGAN, DCGAN.
   Bengali. 2022: 1 article [16]; 2023: 1 article [48]. Total: 2. Architectures: CNN, YOLO, mBART.
   Urdu. 2023: 1 article [33]. Total: 1. Architectures: GPT-2.
   Hindi, Malayalam, Marathi, Oriya, Punjabi, Tamil. 2023: 1 article [48]. Total: 1. Architectures: HipoRank,
   mBART, mT5.
   Shakespearean English. 2022: 1 article [29]. Total: 1. Architectures: Modified DCGAN.
   Romanian. 2022: 1 article [18]; 2023: 1 article [27]. Total: 2. Architectures: DCGAN, BART.
   Korean. 2023: 1 article [19]. Total: 1. Architectures: Modified DCGAN.

  Comparing the results of the 2022-2024 review with the previous review [2], the following observations
can be made:

    • English remains the most widely used language for text generation in both 2015-2021 and 2022-
      2024. However, there is a trend towards an increase in the number of studies dedicated to other
      languages, especially low-resource languages.
    • In 2022-2024, studies appeared dedicated to generating texts in languages that were not repre-
      sented in the previous review, such as Urdu, Hindi, Malayalam, Marathi, Oriya, Punjabi, and Tamil.
      This indicates a growing interest in developing text generation models for diverse languages.
    • The study [48] demonstrates the possibility of generating texts in several Indian languages
      simultaneously using modern architectures such as HipoRank, mBART, and mT5, which was not
      presented in the previous review.
    • Both traditional architectures (RNN, LSTM, CNN) and more modern approaches, such as Trans-
      former, BERT, GPT-2, GPT-3, GAN, and Graph Neural Networks, are used for generating texts in
      different languages.
    • Overall, there is a trend towards expanding the range of languages for which text generation
      models are being developed and using more diverse neural network architectures for this task.

   Thus, comparing the results of the two reviews demonstrates that although English remains the
dominant language in text generation research, there is a growing interest in developing models for
other languages, especially low-resource languages. The emergence of studies dedicated to generating
texts in languages such as Urdu, Hindi, Malayalam, Marathi, Oriya, Punjabi, and Tamil indicates an
expansion of the possibilities for applying text generation to diverse languages. Furthermore, the use
of modern neural network architectures such as Transformer, BERT, GPT-2, GPT-3, GAN, and Graph
Neural Networks makes it possible to improve the quality and efficiency of text generation for various
languages.
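   As an illustration of multilingual generation with one of the architectures mentioned above, the sketch
below uses a many-to-many mBART checkpoint to generate Hindi output from English input. The checkpoint
name and the language pair are illustrative assumptions and do not reproduce the setup of [48].

    # Minimal multilingual generation sketch with mBART-50; checkpoint and language pair are illustrative.
    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

    model_name = "facebook/mbart-large-50-many-to-many-mmt"
    tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX")
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer("Text generation research is expanding to many languages.", return_tensors="pt")

    # Force the decoder to start with the Hindi language code token.
    generated = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"])
    print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
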
   Comparing the language distributions in the old and new reviews with the distribution of languages
by the number of models on Hugging Face [52], the following observations can be made:

    • English dominates in all three distributions. In the old and new reviews, it is the most widely used
      for text generation, and on Hugging Face, the largest number of models (51738) are available for
      it. This indicates significant attention from researchers and developers to the English language
      and the availability of a large number of resources for it.
    • Chinese ranks second in the number of models on Hugging Face (4546) and is mentioned in
      several articles in the new review. This points to a growing interest in Chinese text generation
      and the development of relevant resources.
    • Languages such as French, Spanish, Russian, and German have a significant number of models
      on Hugging Face (from 2326 to 4049) but are less frequently mentioned in the reviews. This may
      indicate that despite the availability of resources for these languages, text generation research for
      them is not as widely represented in the literature.
    • Low-resource languages such as Bengali, Urdu, Arabic, and Hindi are mentioned in the new
      review, indicating a growing interest in developing text generation models for these languages.
      However, the number of available models on Hugging Face for these languages is significantly
      lower compared to English (from 670 to 1674).
    • Hugging Face represents significantly more languages (over 200) than are mentioned in the
      reviews. This indicates that text generation research covers only a portion of the languages for
      which models and resources are available.
    • Some languages, such as Japanese, Korean, Indonesian, and Arabic, have a significant number of
      models on Hugging Face (from 1674 to 2920) but are rarely mentioned in the reviews. This may
      indicate the potential for further research on text generation in these languages.

   Comparing the language distributions shows that despite the dominance of English in research
and available resources, there is a growing interest in text generation in other languages, especially
low-resource ones. However, the number of available models and resources for these languages is still
significantly lower compared to English. Furthermore, the presence of a large number of models for
some languages on Hugging Face that are rarely mentioned in the reviews indicates the potential for
further research and development in this field.


4. Conclusions
The paper presented the results of a systematic review of the application of artificial neural networks
for generating textual content in 2022-2024 and compared them with the results of the previous review
[2] for 2015-2021. The main conclusions can be summarized as follows:

   1. There is a trend towards an increase in the number of articles in scientific journals compared to
      conference proceedings, which may indicate a more thorough coverage of text generation issues
      in journals.
   2. Among the advanced deep learning methods for text generation, the most popular are models
      based on the Transformer architecture, such as GPT-2, GPT-3, BERT, and their variations. Ap-
      proaches using attention mechanisms and controlled text generation are also gaining popularity.
      Overall, there is a shift from traditional approaches to more innovative and efficient models.
   3. Among the metrics for evaluating the effectiveness of text generation models, BLEU and ROUGE
      are the most widely used, along with human evaluation. In 2022-2024, new metrics such as
      BERTScore, Fluency, Coherence, Diversity, N-gram Overlap, and Embedding Similarity appeared,
      indicating active development of methods for assessing the quality of generated text.
   4. Datasets for text generation continue to actively develop, covering new domains and types of
      data. There is a trend towards using more diverse types of data (tables with descriptions, images,
      music, translations, etc.) and a growing interest in unlabeled data and combined approaches.
   5. The field of text generation applications continues to actively expand, covering new areas and
      directions. The popularity of applications such as text generation from tables and knowledge
      graphs, controlled text generation, and medical text generation indicates a growing interest
      in methods that allow efficient processing of structured data and obtaining more relevant and
      high-quality results.
   6. Although English remains the dominant language in text generation research, there is a growing
      interest in developing models for other languages, especially low-resource languages. The use
      of modern neural network architectures makes it possible to improve the quality and efficiency of
      text generation for various languages.

  The results of this review demonstrate the active development of the field of text generation in
2022-2024, characterized by the emergence of new approaches, metrics, datasets, and the expansion of
application areas.
  Despite significant progress in the development of text generation technologies, questions remain
open regarding the assessment of the quality of generated text, the adaptation of models to different
subject domains and languages, and the ethical aspects of using these technologies. Further research can
be aimed at solving these problems and developing more effective, universal, and safe text generation
models.
Declaration on Generative AI: During the preparation of this work, the authors used Claude 3 Opus for text translation,
abstract drafting, and formatting assistance. After using this tool, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.


References
 [1] T. Ganegedara, Natural Language Processing with TensorFlow: Teach language to machines
     using Python’s deep learning library, Packt Publishing, Birmingham – Mumbai, 2018. URL: https:
     //tinyurl.com/3xps3c5u.
 [2] N. Fatima, A. S. Imran, Z. Kastrati, S. M. Daudpota, A. Soomro, A systematic literature review
     on text generation using deep neural network models, IEEE Access 10 (2022) 53490 – 53503.
     doi:10.1109/ACCESS.2022.3174108.
 [3] OpenAI, Introducing ChatGPT, 2022. URL: https://openai.com/blog/chatgpt.
 [4] large language models - Google Trends, 2023. URL: https://trends.google.com/trends/explore?
     date=2022-01-01%202023-12-21&q=large%20language%20models&hl=en.
 [5] M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer,
     J. M. Tetzlaff, E. A. Akl, S. E. Brennan, R. Chou, J. Glanville, J. M. Grimshaw, A. Hróbjartsson, M. M.
     Lalu, T. Li, E. W. Loder, E. Mayo-Wilson, S. McDonald, L. A. McGuinness, L. A. Stewart, J. Thomas,
     A. C. Tricco, V. A. Welch, P. Whiting, D. Moher, The PRISMA 2020 statement: an updated guideline
     for reporting systematic reviews, BMJ 372 (2021) n71. doi:10.1136/bmj.n71.
 [6] A. Bas, M. O. Topal, Ç. Duman, I. Van Heerden, A Brief History of Deep Learning-Based Text
     Generation, in: J. M. Alja’Am, S. AlMaadeed, S. A. Elseoud, O. Karam (Eds.), Proceedings of the
     International Conference on Computer and Applications, ICCA 2022 - Proceedings, Institute of Elec-
     trical and Electronics Engineers Inc., 2022, pp. 1–4. doi:10.1109/ICCA56443.2022.10039545.
 [7] J. Zhu, X. Ma, Z. Lin, P. De Meo, A quantum-like approach for text generation from knowledge
     graphs, CAAI Transactions on Intelligence Technology (2023). doi:10.1049/cit2.12178.
 [8] H. Zhang, H. Song, S. Li, M. Zhou, D. Song, A Survey of Controllable Text Generation Using
     Transformer-based Pre-trained Language Models, ACM Computing Surveys 56 (2023) 64. doi:10.
     1145/3617680.
 [9] W. Yu, C. Zhu, Z. Li, Z. Hu, Q. Wang, H. Ji, M. Jiang, A Survey of Knowledge-enhanced Text
     Generation, ACM Computing Surveys 54 (2022) 227. doi:10.1145/3512467.
[10] J. Wu, Y. Guo, C. Gao, J. Sun, An automatic text generation algorithm of technical disclosure for
     catenary construction based on knowledge element model, Advanced Engineering Informatics 56
     (2023) 101913. doi:10.1016/j.aei.2023.101913.
[11] H. Du, W. Xing, B. Pei, Automatic text generation using deep learning: providing large-scale
     support for online learning communities, Interactive Learning Environments 31 (2023) 5021–5036.
     doi:10.1080/10494820.2021.1993932.
[12] Q. Chen, H. Sun, H. Liu, Y. Jiang, T. Ran, X. Jin, X. Xiao, Z. Lin, H. Chen, Z. Niu, An extensive
     benchmark study on biomedical text generation and mining with ChatGPT, Bioinformatics 39
     (2023) btad557. doi:10.1093/bioinformatics/btad557.
[13] I. Alonso, E. Agirre, Automatic logical forms improve fidelity in table-to-text generation, Expert
     Systems with Applications 238 (2024). doi:10.1016/j.eswa.2023.121869.
[14] E. Kreiss, F. Fang, N. D. Goodman, C. Potts, Concadia: Towards Image-Based Text Generation with
     a Purpose, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on
     Empirical Methods in Natural Language Processing, EMNLP 2022, Association for Computational
     Linguistics (ACL), 2022, pp. 4667–4684. doi:10.18653/v1/2022.emnlp-main.308.
[15] K. Y. Rao, K. S. Rao, S. V. S. Narayana, Conditional-Aware Sequential Text Generation In Knowledge-
     Enhanced Conversational Recommendation System, Journal of Theoretical and Applied In-
     formation Technology 101 (2023) 2820–2836. URL: http://www.jatit.org/volumes/Vol101No7/
     30Vol101No7.pdf.
[16] T. Tazalli, Z. A. Aunshu, S. S. Liya, M. Hossain, Z. Mehjabeen, M. S. Ahmed, M. I. Hossain,
     Computer Vision-Based Bengali Sign Language To Text Generation, in: 5th IEEE International
     Image Processing, Applications and Systems Conference, IPAS 2022, Institute of Electrical and
     Electronics Engineers Inc., 2022, pp. 1–6. doi:10.1109/IPAS55744.2022.10052928.
[17] Z. Teng, C. Chen, Y. Zhang, Y. Zhang, Contrastive Latent Variable Models for Neural Text
     Generation, in: J. Cussens, K. Zhang (Eds.), Proceedings of Machine Learning Research, volume 180,
     ML Research Press, 2022, pp. 1928–1938. URL: https://proceedings.mlr.press/v180/teng22a.html.
[18] C. An, J. Feng, K. Lv, L. Kong, X. Qiu, X. Huang, CONT: contrastive neural text generation, in:
     Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS
     ’22, Curran Associates Inc., Red Hook, NY, USA, 2022, p. 160. URL: https://dl.acm.org/doi/10.5555/
     3600270.3600430.
[19] H. Seo, S. Jung, J. Jung, T. Hwang, H. Namgoong, Y.-H. Roh, Controllable Text Generation Using
     Semantic Control Grammar, IEEE Access 11 (2023) 26329–26343. doi:10.1109/ACCESS.2023.
     3252017.
[20] W. Zhou, Y. E. Jiang, E. Wilcox, R. Cotterell, M. Sachan, Controlled Text Generation with Natural
     Language Instructions, in: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett (Eds.),
     Proceedings of Machine Learning Research, volume 202, ML Research Press, 2023, pp. 42602–42613.
[21] M. Bayer, M.-A. Kaufhold, B. Buchhold, M. Keller, J. Dallmeyer, C. Reuter, Data augmentation in
     natural language processing: a novel text generation approach for long and short text classifiers,
     International Journal of Machine Learning and Cybernetics 14 (2023) 135–150. doi:10.1007/
     s13042-022-01553-3.
[22] S. Hong, S. Moon, J. Kim, S. Lee, M. Kim, D. Lee, J.-Y. Kim, DFX: A Low-latency Multi-FPGA
     Appliance for Accelerating Transformer-based Text Generation, in: Proceedings of the Annual
     International Symposium on Microarchitecture, MICRO, volume 2022-October, IEEE Computer
     Society, 2022, pp. 616–630. doi:10.1109/MICRO56248.2022.00051.
[23] M. Ghazvininejad, V. Karpukhin, V. Gor, A. Celikyilmaz, Discourse-Aware Soft Prompting for Text
     Generation, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on
     Empirical Methods in Natural Language Processing, EMNLP 2022, Association for Computational
     Linguistics (ACL), 2022, pp. 4570–4589. doi:10.18653/v1/2022.emnlp-main.303.
[24] J. J. Koplin, Dual-use implications of AI text generation, Ethics and Information Technology 25
     (2023) 32. doi:10.1007/s10676-023-09703-z.
[25] A. Pautrat-Lertora, R. Perez-Lozano, W. Ugarte, EGAN: Generatives Adversarial Networks for
     Text Generation with Sentiments, in: F. Coenen, A. Fred, J. Filipe (Eds.), International Joint
     Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management,
     IC3K - Proceedings, volume 1, Science and Technology Publications, Lda, 2022, pp. 249–256.
     doi:10.5220/0011548100003335.
[26] T. Wu, H. Wang, Z. Zeng, W. Wang, H.-T. Zheng, J. Zhang, Enhancing Text Generation with
     Cooperative Training, Frontiers in Artificial Intelligence and Applications 372 (2023) 2704–2711.
     doi:10.3233/FAIA230579.
[27] Y. Li, L. Cui, J. Yan, Y. Yin, W. Bi, S. Shi, Y. Zhang, Explicit Syntactic Guidance for Neural
     Text Generation, in: Proceedings of the Annual Meeting of the Association for Computational
     Linguistics, volume 1, Association for Computational Linguistics (ACL), 2023, pp. 14095–14112.
     doi:10.18653/v1/2023.acl-long.788.
[28] X. Chu, Feature extraction and intelligent text generation of digital music, Computational
     Intelligence and Neuroscience 2022 (2022). doi:10.1155/2022/7952259.
[29] S. Shahriar, GAN computers generate arts? A survey on visual arts, music, and literary text
     generation using generative adversarial network, Displays 73 (2022) 102237. doi:10.1016/j.
     displa.2022.102237.
[30] H. Strobelt, J. Kinley, R. Krueger, J. Beyer, H. Pfister, A. M. Rush, GenNI: Human-AI Collaboration
     for Data-Backed Text Generation, IEEE Transactions on Visualization and Computer Graphics 28
     (2022) 1106–1116. doi:10.1109/TVCG.2021.3114845.
[31] X. Yin, X. Wan, How Do Seq2Seq Models Perform on End-to-End Data-to-Text Generation?, in:
     S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the Annual Meeting of the Association
     for Computational Linguistics, volume 1, Association for Computational Linguistics (ACL), 2022,
     pp. 7701–7710. doi:10.18653/v1/2022.acl-long.531.
[32] S. Montella, A. Nasr, J. Heinecke, F. Bechet, L. M. Rojas-Barahona, Investigating the Effect of
     Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters, in: EACL
     2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics,
     Proceedings of the Conference, Association for Computational Linguistics (ACL), 2023, pp. 727–736.
     doi:10.18653/v1/2023.eacl-main.51.
[33] N. Fatima, S. M. Daudpota, Z. Kastrati, A. S. Imran, S. Hassan, N. S. Elmitwally, Improving news
     headline text generation quality through frequent POS-Tag patterns analysis, Engineering Appli-
     cations of Artificial Intelligence 125 (2023) 106718. doi:10.1016/j.engappai.2023.106718.
[34] E. Seifossadat, H. Sameti, Improving semantic coverage of data-to-text generation model using
     dynamic memory networks, Natural Language Engineering 30 (2024) 454–479. doi:10.1017/
     S1351324923000207.
[35] C. Meyer, D. Adkins, K. Pal, R. Galici, A. Garcia-Agundez, C. Eickhoff, Neural text generation
     in regulatory medical writing, Frontiers in Pharmacology 14 (2023). doi:10.3389/fphar.2023.
     1086913.
[36] X. Lu, S. Welleck, P. West, L. Jiang, J. Kasai, D. Khashabi, R. Le Bras, L. Qin, Y. Yu, R. Zellers, N. A.
     Smith, Y. Choi, NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead
     Heuristics, in: NAACL 2022 - 2022 Conference of the North American Chapter of the Association
     for Computational Linguistics: Human Language Technologies, Proceedings of the Conference,
     Association for Computational Linguistics (ACL), 2022, pp. 780–799. doi:10.18653/v1/2022.
     naacl-main.57.
[37] W. Xu, Y. Tuan, Y. Lu, M. Saxon, L. Li, W. Y. Wang, Not All Errors Are Equal: Learning Text Genera-
     tion Metrics using Stratified Error Synthesis, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings
     of the Association for Computational Linguistics: EMNLP 2022, Association for Computational
     Linguistics (ACL), 2022, pp. 6588–6603. doi:10.18653/v1/2022.findings-emnlp.489.
[38] A. Hanafi, M. Bouhorma, L. Elaachak, Machine Learning-Based Augmented Reality For Improved
     Text Generation Through Recurrent Neural Networks, Journal of Theoretical and Applied Informa-
     tion Technology 100 (2022) 518–530. URL: http://www.jatit.org/volumes/Vol100No2/18Vol100No2.
     pdf.
[39] H. Le, D.-T. Le, V. Weber, C. Church, K. Rottmann, M. Bradford, P. Chin, Semi-supervised Adversar-
     ial Text Generation based on Seq2Seq models, in: EMNLP 2022 - Proceedings of the 2022 Conference
     on Empirical Methods in Natural Language Processing: Industry Track, Association for Computa-
     tional Linguistics (ACL), 2022, pp. 264–272. doi:10.18653/v1/2022.emnlp-industry.26.
[40] X. Yue, H. A. Inan, X. Li, G. Kumar, J. McAnallen, H. Shajari, H. Sun, D. Levitan, R. Sim, Synthetic
     Text Generation with Differential Privacy: A Simple and Practical Recipe, in: Proceedings of
     the Annual Meeting of the Association for Computational Linguistics, volume 1, Association for
     Computational Linguistics (ACL), 2023, pp. 1321–1342. doi:10.18653/v1/2023.acl-long.74.
[41] Z. Lin, Y. Gong, Y. Shen, T. Wu, Z. Fan, C. Lin, N. Duan, W. Chen, Text generation with diffusion
     language models: a pre-training approach with continuous paragraph denoise, in: Proceedings of
     the 40th International Conference on Machine Learning, ICML’23, JMLR.org, 2023. URL: https:
     //dl.acm.org/doi/abs/10.5555/3618408.3619275.
[42] M. S. Amin, A. Mazzei, L. Anselma, Towards Data Augmentation for DRS-to-Text Generation,
     CEUR Workshop Proceedings 3287 (2022) 141–152. URL: https://ceur-ws.org/Vol-3287/paper14.pdf.
[43] M. Chen, X. Lu, T. Xu, Y. Li, J. Zhou, D. Dou, H. Xiong, Towards Table-to-Text Generation with
     Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach,
     in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical
     Methods in Natural Language Processing, EMNLP 2022, Association for Computational Linguistics
     (ACL), 2022, pp. 8199–8210. doi:10.18653/v1/2022.emnlp-main.562.
[44] V. Agarwal, S. Ghosh, H. BSS, H. Arora, B. R. K. Raja, TrICy: Trigger-Guided Data-to-Text
     Generation With Intent Aware Attention-Copy, IEEE/ACM Transactions on Audio, Speech, and
     Language Processing 32 (2024) 1173–1184. doi:10.1109/TASLP.2024.3353574.
[45] W. M. Si, M. Backes, Y. Zhang, A. Salem, Two-in-One: A Model Hijacking Attack Against Text Gen-
     eration Models, in: 32nd USENIX Security Symposium, USENIX Security 2023, volume 3, USENIX
     Association, 2023, pp. 2223–2240. URL: https://www.usenix.org/system/files/usenixsecurity23-si.
     pdf.
[46] H. Gong, X. Feng, B. Qin, Quality Control for Distantly-Supervised Data-to-Text Generation via
     Meta Learning, Applied Sciences 13 (2023) 5573. doi:10.3390/app13095573.
[47] L. Mou, Search and learning for unsupervised text generation, AI Magazine 43 (2022) 344–352.
     doi:10.1002/aaai.12068.
[48] D. Taunk, S. Sagare, A. Patil, S. Subramanian, M. Gupta, V. Varma, XWikiGen: Cross-lingual
     Summarization for Encyclopedic Text Generation in Low Resource Languages, in: ACM Web
     Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023, Association for
     Computing Machinery, Inc, 2023, pp. 1703–1713. doi:10.1145/3543507.3583405.
[49] Introducing the next generation of Claude, 2024. URL: https://www.anthropic.com/news/
     claude-3-family.
[50] awesomegpts.ai, Scholar GPT, 2024. URL: https://chatgpt.com/g/g-kZ0eYXlJe-scholar-gpt?oai-dm=
     1.
[51] A. V. Slobodianiuk, Ohliad statei [Papers’ review], 2024. URL: https://docs.google.
     com/spreadsheets/d/e/2PACX-1vR6ZUaeeBjVgVl-do6QXm-Pua-HdztOxjC4DUqunrSDZ_
     -YSRz-Ng9xktYH9b0LDT502SiVy3YePx9F/pubhtml.
[52] Hugging Face, Languages, 2024. URL: https://huggingface.co/languages.



A. Review map for an article
   1. Bibliographic reference
   2. Document type: journal article or conference paper
   3. Title
   4. Year of publication
   5. Countries represented by the authors
   6. Purpose of the article
   7. Neural network architectures used
   8. Quality metrics used
   9. Characteristics of the datasets used
         • name
         • data type: sentence, paragraph, document, question-answer, not specified
         • size
         • format: CSV, JSON, XML, files, not specified
         • labeling type: labeled data, unlabeled data
         • data quality: raw, pre-processed
         • accessibility: publicly available, private, not specified
         • link
  10. Text generation task solved (what the neural network was used for)
  11. Language of text generation



