<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>marization Through Post-Processing Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A Fathima Firose</string-name>
          <email>fathimafiroseofficial@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reshma Sheik</string-name>
          <email>rezmasheik@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>KathijaAfrose A</string-name>
          <email>kathijaafrose@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Jaya Nirmala</string-name>
          <email>sjaya@nitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>News Article Summarization, Post-Processing Techniques, Redundancy, Repetition Avoidance, Sentence Length</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Control</institution>
          ,
          <addr-line>Pre-Trained Language Models, ROUGE Metrics, BERTScore, Summary Quality, Informative Summaries</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Jodhpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Institute of Technology</institution>
          ,
          <addr-line>Trichy</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Thangal Kunju Musaliar College of Engineering</institution>
          ,
          <addr-line>Kollam</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Thiagarajar College of Engineering</institution>
          ,
          <addr-line>Madurai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This study explores the enhancement of automatic news article summarization by incorporating a series of post-processing techniques into pre-trained language models. The work focuses on improving summary quality by addressing common issues such as redundancy and lack of conciseness through repetition avoidance, sentence length control, and the inclusion of article headlines in the summarization process. Using fine-tuned models across four languages (English, Tamil, Telugu, and Kannada), the study benchmarks the impact of these techniques on summary quality. Evaluation metrics such as ROUGE and BERTScore were used to measure improvements in both informativeness and fluency. The results show that applying post-processing techniques significantly enhances the performance of the models, yielding more coherent, concise, and contextually relevant summaries. These findings provide a strong framework for improving summarization models in multilingual settings, with implications for more effective news content generation and summarization in various languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>summarization of news articles across multiple languages. A notable gap exists in the effective
summarization of low-resource Indic languages, where current models often struggle to produce coherent
and informative summaries due to limited training data and resources. This research addresses that
challenge by systematically evaluating the impact of the proposed techniques on the summarization
performance of Indic languages. The study seeks to contribute to the growing body of knowledge in
multilingual text summarization and provide actionable insights for improving summary quality in
underrepresented languages. The findings are expected to demonstrate significant improvements in
metrics such as ROUGE and BERTScore, underscoring the importance of post-processing in enhancing
the performance of language models in generating informative and coherent summaries.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Text summarization has evolved dramatically, with various approaches addressing both extractive and
abstractive techniques. Extractive methods select important sentences directly from the text, while
abstractive methods generate summaries that may rephrase or condense the content. In recent years,
transformer-based models like BERT, GPT, and PEGASUS have shown considerable improvements
in text summarization. For example, BERT-based models have been used in generating summaries
by pre-training on large corpora of documents, improving the fluency and coherence of generated
summaries [4]. PEGASUS introduced gap-sentence generation as a novel pre-training strategy for
abstractive summarization, enhancing the quality of generated summaries across multiple domains [5].
However, the issue of repetition, particularly in neural models, has gained increasing attention. Fu
et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] analyzed the repetition problem and its causes within neural sequence-to-sequence models,
providing key insights into this issue. Xu et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] later proposed solutions by incorporating penalties
for repetitive tokens, leading to better performance in abstractive summarization. In addition, Nair and
Singh [6] discussed techniques for reducing repetition in abstractive summarization, emphasizing the
role of linguistic properties in causing repeated phrases.
      </p>
      <p>
        Moreover, research in multilingual summarization, especially for low-resource languages such as
Indic languages, has been relatively sparse. IndicBART, introduced by Dabre et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], addresses this
gap by providing a pre-trained model specifically designed for Indic natural language generation. This
advancement is crucial, as low-resource languages often suffer from a lack of comprehensive datasets,
which affects model performance. The model demonstrated improved summarization capabilities
by utilizing the shared characteristics of Indic languages, thereby filling a critical void in the field.
Recent work has also focused on eliminating redundancy in multi-document summarization. Mani and
Bloedorn [7] introduced a contextual parameter that avoids redundancy by selecting each term in a
phrase only once, improving the summary’s informativeness. Additionally, Calvo et al. [8] proposed
techniques to handle redundancy in multi-document summarization, highlighting the importance of
sentence-level diversity in generated summaries.
      </p>
      <p>
        While substantial progress has been made in improving text summarization, several gaps remain,
especially for low-resource languages. Much of the existing research focuses on resource-rich languages
like English, which benefit from large training datasets. This creates a performance disparity when
models are applied to low-resource languages, such as those found in the Indic language family. Although
IndicBART [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has made strides in this area, there is still a notable lack of research focused on refining
summarization techniques for these languages. Indic languages present unique challenges, such as
complex syntactic structures and limited annotated data, which are not adequately addressed by current
models. Furthermore, repetition in generated summaries remains an ongoing challenge. While studies
like those by Fu et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Nair and Singh [6] propose methods for reducing repetition, there is still
room for improvement, particularly in multilingual contexts. This research aims to bridge these gaps
by focusing on advanced post-processing techniques, including repetition avoidance, sentence length
control, and headline integration, specifically tailored for low-resource Indic languages. These
post-processing techniques, largely underexplored for Indic languages, are critical for improving the
fluency, coherence, and overall quality of generated summaries.
      </p>
      <p>The ILSUM shared tasks have significantly contributed to advancing Indian language summarization.
The first shared task, introduced in 2022, focused on summarization challenges and methodologies
across multiple Indian languages, establishing a benchmark for future research [9, 10]. In 2023, the
task expanded with additional datasets and improved evaluation strategies, providing insights into key
challenges and solutions for Indian language summarization [11, 12]. These efforts have paved the way
for this year’s task, which further enhances the dataset with new languages and additional documents.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset used in this study is derived from the ILSUM 2024 shared task introduced as part of the
FIRE 2024 conference. This year’s task expanded the dataset to include additional documents for
Dravidian languages such as Tamil, Telugu, and Kannada, alongside enhancements for other languages.
These new additions provide a richer and more diverse dataset to advance research in Indian language
summarization [13], [14]. The datasets are split into training, validation, and testing sets, as shown in
Table 1. For Tamil, Telugu, and Kannada, the IndicBARTSS model was fine-tuned, while for English,
the T5-Base model was used. These splits align with the dataset structure provided for the ILSUM 2024
task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <p>This study primarily employs IndicBARTSS, a multilingual, sequence-to-sequence pre-trained model
fine-tuned separately for each language. IndicBARTSS is specifically designed for Indic languages and
is built on the mBART architecture. It supports 11 Indian languages and offers a lighter, more
computationally efficient alternative to larger models like mBART or mT5. IndicBARTSS
was trained on an extensive corpus of over 452 million sentences and 9 billion tokens, covering various
linguistic nuances. The model can be adapted to tasks such as summarization, machine translation, and
question generation.</p>
        <p>The fine-tuning process for IndicBARTSS involved using the aforementioned training, validation, and
test sets for Tamil, Telugu, and Kannada. Each language was handled separately, using the respective
script to avoid any transliteration challenges. The model was optimized to generate summaries that
maintain the integrity of the source language while minimizing redundancy and repetition, which is
often a challenge in multilingual summarization tasks. For English, the T5-Base model was employed,
featuring 220 million parameters. This model reframes natural language processing tasks into a unified
text-to-text format, where the input and output are always text. The T5-Base model has been applied
to a wide range of NLP tasks, including document summarization, machine translation, and question
answering, making it a strong candidate for summarization tasks in this study. Both models were
optimized using supervised fine-tuning, focusing on minimizing the loss between predicted and
ground-truth summaries. The models were evaluated using standard metrics such as ROUGE to assess the
quality and faithfulness of the generated summaries.</p>
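        <p>The text does not state the exact training objective. Assuming the standard token-level cross-entropy commonly used for sequence-to-sequence fine-tuning, and writing x for the source article, y = (y_1, ..., y_T) for the reference summary, and θ for the model parameters (notation introduced here purely for illustration), the loss being minimized can be sketched as:</p>
        <disp-formula><tex-math><![CDATA[
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\left(y_t \mid y_{1:t-1},\, x\right)
]]></tex-math></disp-formula>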
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Summarization Techniques</title>
        <p>The summarization experiments were conducted in four different runs, each utilizing distinct
post-processing techniques to enhance the quality and diversity of generated summaries. The same
procedures were applied to all the languages in the dataset: Tamil, Telugu, Kannada, and English (the latter
utilizing the T5-Base model).</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Run 0: Summarization Without Post-Processing</title>
          <p>In this configuration, summaries were generated directly from the model without any additional
post-processing techniques. The IndicBARTSS model, fine-tuned on each respective language dataset, was
employed to generate the summaries. The model’s output was kept at a maximum token length of 128
to ensure brevity. This run serves as a baseline to evaluate the intrinsic capabilities of the model in
summarizing text without the interference of external constraints. The main goal was to assess how
effectively the model can produce coherent, contextually accurate summaries with no extra operations
for handling issues like repetition or sentence length control.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Run 1: Post-Processing with Repetition Avoidance</title>
          <p>For the first run, a post-processing step was applied to minimize repetition in the generated summaries.
Repetition is a known issue in sequence-to-sequence models, especially for tasks involving longer texts.
To address this, diverse beam search was incorporated. Specifically, a beam search mechanism with
6 beams was utilized, divided into 3 beam groups, and a diversity penalty of 0.7 was introduced to
encourage diversity across beams. Additionally, the no_repeat_ngram_size parameter was set to 3,
preventing the model from repeating the same trigrams. This configuration aimed to produce more
varied and non-repetitive outputs while preserving the meaning of the original articles.</p>
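          <p>The Run 1 settings can be written down as decoding parameters. The sketch below is illustrative only. The keyword names follow the Hugging Face <monospace>generate</monospace> interface, which is an assumption, since the text names only <monospace>no_repeat_ngram_size</monospace>, and their availability may depend on the library version. The helper <monospace>banned_next_tokens</monospace> is a hypothetical pure-Python illustration of what trigram blocking does at each decoding step, not the authors' code.</p>
          <preformat><![CDATA[
```python
# Decoding settings reported for Run 1. Keyword names are assumed to match
# the Hugging Face `generate` API; they would be passed as
# model.generate(**RUN1_GENERATION_KWARGS) in such a setup.
RUN1_GENERATION_KWARGS = {
    "max_length": 128,          # token limit reported for Run 0 (assumed to carry over)
    "num_beams": 6,             # total beams
    "num_beam_groups": 3,       # diverse beam search groups (6 is divisible by 3)
    "diversity_penalty": 0.7,   # penalty for repeating tokens chosen by other groups
    "no_repeat_ngram_size": 3,  # forbid repeating any trigram
}


def banned_next_tokens(generated, ngram_size=3):
    """Return the tokens that would complete an n-gram already in `generated`.

    `generated` is the list of tokens produced so far. A decoder applying
    n-gram blocking would exclude the returned tokens at the next step.
    """
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    prefix_len = ngram_size - 1
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[-prefix_len:]) if prefix_len else ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        if tuple(generated[i:i + prefix_len]) == prefix:
            banned.add(generated[i + prefix_len])
    return banned


if __name__ == "__main__":
    history = ["the", "cat", "sat", "on", "the", "cat"]
    print(banned_next_tokens(history, ngram_size=3))
```
]]></preformat>
          <p>In the usage example, the history ends in "the cat" and the trigram "the cat sat" has already occurred, so "sat" is the only token blocked at the next step.</p>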
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Run 2: Post-Processing with Sentence Length Control</title>
          <p>In the second experimental setup, a different post-processing constraint was applied, focusing on
controlling the length of the generated summaries. Limiting the summary to two sentences was a key
objective. Given that some summaries might be excessively verbose, this post-processing step sought
to enhance conciseness while retaining the essential information. The model was modified to terminate
early when two sentences were generated, employing a stop condition within the text generation
process. The result of this approach was a set of summaries that aligned with length restrictions,
providing succinct overviews while maintaining clarity.</p>
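          <p>The text describes a stop condition inside generation. As a simpler stand-in, the following sketch truncates an already generated summary to its first two sentences after decoding. This is a swapped-in technique, not the authors' mechanism. The function name <monospace>keep_first_sentences</monospace> and the choice of ".", "!" and "?" as sentence terminators are assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
import re

# A sentence is a run of non-terminator characters followed by optional
# terminators. Assumption: '.', '!' and '?' end sentences in the target text;
# abbreviations and decimals such as "0.7" would be split incorrectly.
_SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def keep_first_sentences(text, max_sentences=2):
    """Return `text` truncated to at most `max_sentences` sentences."""
    sentences = [m.group(0).strip() for m in _SENTENCE.finditer(text)]
    sentences = [s for s in sentences if s]
    return " ".join(sentences[:max_sentences])


if __name__ == "__main__":
    summary = "First point. Second point! Third point."
    print(keep_first_sentences(summary))
```
]]></preformat>
          <p>Truncating after decoding is easier to test than an in-generation stop condition, but it still spends compute on tokens that are later discarded.</p>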
        </sec>
        <sec id="sec-3-3-4">
          <title>3.3.4. Run 3: Heading Integration and Sentence Length Control</title>
          <p>In the third run, the heading of the article is integrated during the preprocessing step to enrich the
context for the model before generating the summary. This step aims to align the summary more closely
with the article’s main theme or headline. Additionally, the sentence length control mechanism from
Run 2 is retained, ensuring that the summaries are concise and focused on the most important content.
The combination of heading integration and sentence length control enables the model to generate
more focused summaries that incorporate the article’s title as a contextual anchor, while limiting output
length to enhance clarity and relevance.</p>
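          <p>Heading integration amounts to building the model input from two fields. The sketch below shows one plausible way to do this. The function name <monospace>build_model_input</monospace> and the single-space separator are hypothetical, since the text does not specify how the heading and the article body were joined.</p>
          <preformat><![CDATA[
```python
def build_model_input(heading, article, separator=" "):
    """Prepend the article heading to the body before summarization.

    The separator is an assumption; a special token or newline could be
    used instead, depending on the tokenizer.
    """
    heading = (heading or "").strip()
    article = (article or "").strip()
    if not heading:
        # Fall back to the article alone when no heading is available.
        return article
    return f"{heading}{separator}{article}"


if __name__ == "__main__":
    print(build_model_input("Floods hit city", "Heavy rain fell overnight."))
```
]]></preformat>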
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Metrics</title>
        <p>The evaluation of summarization models in the conducted experiments employs several established
metrics, including ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BERTScore.
These metrics provide quantitative assessments of the alignment between generated summaries and
reference summaries.</p>
        <p>ROUGE includes a set of measures that evaluate the quality of generated summaries by comparing
them to reference summaries, primarily focusing on n-gram overlap. ROUGE-1 measures the overlap of
unigrams (individual words), ROUGE-2 assesses bigram (two-word) overlaps, and ROUGE-L evaluates
the longest common subsequence between the generated and reference summaries. These metrics
emphasize recall, precision, and F1 scores, offering insights into the relevance and completeness of the
summaries produced by the models. Due to its effectiveness and widespread use, ROUGE remains a key
tool for summarization evaluation [15]. BERTScore utilizes pre-trained BERT models to assess the
quality of generated text. Unlike ROUGE, which relies on exact word matching, BERTScore compares
generated and reference texts using contextual embeddings, capturing deeper semantic similarities.
This approach is particularly effective for abstractive summarization, where maintaining semantic
integrity is important. BERTScore provides precision, recall, and F1 scores, indicating how well the
generated summaries convey the intended meaning of the reference texts [16].</p>
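        <p>To make the two metric families concrete, the sketch below computes ROUGE-N and ROUGE-L on whitespace tokens and a BERTScore-style greedy matching score on token vectors. It is a simplified illustration, not the evaluation code used in the experiments. Whitespace tokenization, the absence of stemming, and the toy vectors standing in for contextual BERT embeddings are all assumptions, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def precision_recall_f1(overlap, candidate_total, reference_total):
    """Turn an overlap count into (precision, recall, F1)."""
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


def rouge_n(candidate, reference, n):
    """ROUGE-N from clipped n-gram overlap (whitespace tokens, no stemming)."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum counts
    return precision_recall_f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(candidate, reference):
    """ROUGE-L from the longest common subsequence."""
    cand, ref = candidate.split(), reference.split()
    return precision_recall_f1(lcs_length(cand, ref), len(cand), len(ref))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_embedding_score(candidate_vectors, reference_vectors):
    """BERTScore-style greedy matching on precomputed token vectors."""
    if not candidate_vectors or not reference_vectors:
        return 0.0, 0.0, 0.0
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vectors) for c in candidate_vectors
    ) / len(candidate_vectors)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vectors) for r in reference_vectors
    ) / len(reference_vectors)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat on the mat"
    print("ROUGE-1", rouge_n(candidate, reference, 1))
    print("ROUGE-2", rouge_n(candidate, reference, 2))
    print("ROUGE-L", rouge_l(candidate, reference))
```
]]></preformat>
        <p>In the example, the candidate drops one word from the reference, so unigram precision is 1.0 while recall is 5/6. The bigram score falls further because removing "sat" breaks two reference bigrams.</p>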
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>The experiments were conducted using Google Colab, utilizing a T4 GPU with high RAM capacity,
which facilitated efficient model training and evaluation. The process began with fine-tuning the
IndicBARTSS model and the T5-Base model on the respective datasets, which included Tamil, Telugu,
Kannada, and English text. A batch size of 2 was employed during training and inference to balance
memory usage and processing speed. This choice allowed the models to handle the complexity of the
tasks while ensuring that the GPU resources were utilized effectively. After fine-tuning, the models
were saved to the Hugging Face Model Hub, providing an accessible format for further experimentation.
Subsequent tests were conducted across four distinct runs, labeled from Run 0 to Run 3, to evaluate the
impact of various summarization techniques and post-processing methods on the quality of generated
summaries. Each run applied a different approach, ranging from generating summaries without any
modifications to integrating post-processing techniques such as repetition avoidance, sentence length
control, and the combination of headings with articles.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>The performance of the multilingual summarization system was evaluated across four languages: Tamil,
Telugu, Kannada, and English. The evaluation metrics used include ROUGE-1, ROUGE-2, ROUGE-4, and
ROUGE-L scores to assess the overlap between system-generated summaries and reference summaries,
while BERTScore was employed to gauge the semantic similarity between the summaries. Tables 2 and 3
present a detailed breakdown of the results for each language and run.</p>
        <p>For Tamil, the results reveal that Run1, which employed post-processing with repetition avoidance,
achieved the highest ROUGE scores across all metrics, particularly in ROUGE-1 (0.218) and ROUGE-L
(0.2091). This suggests that the repetition avoidance technique applied in this run proved to be the most
effective in fine-tuning sentence structure while maintaining content integrity. Although Run2, which
applied post-processing with sentence-length control, demonstrated close performance, it fell slightly
behind, particularly in ROUGE-4 and ROUGE-L, indicating that while it handled n-gram overlap well, its
ability to capture the overall structure of the summaries was slightly reduced. Run3, which focused on
heading integration and sentence-length control, showed a notable decline in performance (ROUGE-1
of 0.208), suggesting that this method may have resulted in oversimplified summaries that missed key
content aspects (see Table 2). In terms of BERTScore, Run2 emerged as the strongest performer, with
an F1 score of 0.729, indicating a slight edge in semantic similarity (see Table 3). This suggests that the
combination of sentence simplification with repetition avoidance led to better alignment with reference
summaries in terms of meaning. While Run1 followed closely with an F1 score of 0.7267, Run3 lagged
behind, confirming the limitations of its post-processing technique for Tamil summaries.</p>
        <p>For Kannada, the results show that Run1 achieved the highest ROUGE scores across all metrics,
particularly in ROUGE-1 (0.2284) and ROUGE-L (0.2218). This suggests that the post-processing
technique applied in Run1, which focused on repetition avoidance while maintaining content accuracy,
was the most effective. Although Run2 showed close performance, it fell slightly behind, particularly in
ROUGE-2 and ROUGE-4, indicating that while it controlled sentence length well, it had a reduced ability
to capture the overall structure and detail of the summaries. Run3, which also applied sentence-length
control, showed a slight decline in performance, with ROUGE-1 at 0.2243 and ROUGE-L at 0.2167,
suggesting that stricter sentence-length constraints might have led to summaries with fewer details
and a less coherent structure (see Table 2). In terms of BERTScore, Run1 again performed the best,
achieving an F1 score of 0.7349, indicating the best alignment with the reference summaries in terms of
meaning. Run3 followed with an F1 score of 0.7317 and Run2 with 0.7289, both below Run1, suggesting
that limiting sentence length, without focusing on repetition avoidance, slightly reduced the semantic quality
of the summaries (see Table 3).</p>
        <p>For Telugu, the results show that Run1, which employed post-processing with repetition avoidance,
performed the best across all ROUGE metrics, with ROUGE-1 at 0.7488 and ROUGE-L at 0.7555. This
indicates that the repetition avoidance technique effectively captured the main content and structure
of the summaries. Run2, which applied post-processing with sentence-length control, had slightly
lower scores, with ROUGE-L at 0.7486. While it helped manage sentence lengths, it might have slightly
affected the overall flow and coherence. Run3, which combined heading integration and sentence-length
control, had the lowest scores, with ROUGE-1 at 0.7313 and ROUGE-L at 0.7284, suggesting that this
approach, while structured, led to less detailed summaries (see Table 2). In terms of BERTScore, Run1
again had the highest F1 score (0.7555), showing the best alignment with the reference summaries in
terms of meaning. Run2 followed closely with an F1 score of 0.7486, while Run3 had the lowest F1
(0.7284), confirming that the combined heading integration and sentence-length control reduced the
semantic quality of the summaries (see Table 3).</p>
        <p>For English, the results indicate that Run1 achieved the highest ROUGE scores across all metrics,
particularly in ROUGE-1 (0.3435) and ROUGE-L (0.2964). This suggests that the post-processing
technique used in Run1, which focused on reducing repetition while maintaining key content, was
the most effective for English summaries. Run2 showed slightly lower performance, particularly in
ROUGE-2 and ROUGE-4, with ROUGE-2 at 0.1589 and ROUGE-4 at 0.1011, indicating that although
it controlled sentence length effectively, it slightly compromised the capture of n-gram overlap and
summary structure. Run3, which also applied sentence-length control, showed a slight decline in
performance (ROUGE-1 of 0.3388 and ROUGE-L of 0.2834), suggesting that stricter length constraints
led to less detailed and less coherent summaries (see Table 2). In terms of BERTScore, the three runs
were nearly tied: Run3 reported an F1 score of 0.8742, Run1 0.8735, and Run2 0.8702. The differences
in semantic similarity between the post-processing variants are therefore small for English
(see Table 3).</p>
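        <p>The BERTScore comparisons in this section can be checked mechanically. The sketch below ranks the runs per language using only the F1 values quoted in the prose of this section. The Tamil entry omits Run3 because no value is quoted for it. The helper names are hypothetical, and Table 3 remains the authoritative source for the figures.</p>
        <preformat><![CDATA[
```python
# BERTScore F1 values transcribed from the prose of this section.
# Tamil Run3 is not quoted in the text and is therefore left out.
BERTSCORE_F1 = {
    "Tamil": {"Run1": 0.7267, "Run2": 0.729},
    "Kannada": {"Run1": 0.7349, "Run2": 0.7289, "Run3": 0.7317},
    "Telugu": {"Run1": 0.7555, "Run2": 0.7486, "Run3": 0.7284},
    "English": {"Run1": 0.8735, "Run2": 0.8702, "Run3": 0.8742},
}


def rank_runs(scores):
    """Return run names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def top_margin(scores):
    """Difference between the best and second-best score, rounded to 4 places."""
    ordered = sorted(scores.values(), reverse=True)
    return round(ordered[0] - ordered[1], 4)


if __name__ == "__main__":
    for language, scores in BERTSCORE_F1.items():
        print(language, rank_runs(scores), top_margin(scores))
```
]]></preformat>
        <p>Listing the margin next to the ranking makes clear how small the gaps are. In every language the best and second-best runs differ by less than 0.01 F1.</p>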
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Implications</title>
        <p>The results of our multilingual summarization task provide several key implications for future research
and applications in text summarization, particularly in the context of Indic languages. The findings
indicate that post-processing techniques significantly influence the quality of generated summaries,
with varying effectiveness across languages. For example, in Tamil and Telugu, balancing sentence
restructuring and repetition avoidance proved most beneficial, leading to higher ROUGE scores and
improved semantic coherence. This suggests that language-specific fine-tuning of post-processing
methods is critical, as general approaches may not capture the unique linguistic nuances inherent in
different languages.</p>
        <p>In the case of Kannada and English, techniques that prioritized simplicity without oversimplification
produced better results, emphasizing that reducing sentence complexity while maintaining critical
content is crucial. For future research, these insights highlight the need to develop more adaptive
post-processing techniques that can dynamically adjust to the linguistic and structural requirements of the
target language. Additionally, the strong performance of certain runs in terms of BERTScore points to the
potential of exploring semantic-focused evaluation metrics further, which could complement traditional
n-gram-based evaluations like ROUGE in producing more meaningful and human-readable summaries.
The findings have practical implications for real-world applications, especially for industries or platforms
relying on automated summarization of multilingual content, such as news articles, governmental
reports, and legal documents. The ability to fine-tune summarization systems for specific languages
will enhance the reliability and readability of summaries in low-resource languages like Tamil, Telugu,
and Kannada, making it feasible to deploy these systems in diverse linguistic environments.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations</title>
        <p>While the study offers valuable insights into post-processing techniques for multilingual summarization,
certain limitations should be acknowledged. First, the evaluation was limited to four languages: Tamil,
Telugu, Kannada, and English, which may constrain the generalizability of the findings to other Indic
or non-Indic languages. Different languages, particularly those with complex grammatical structures
or varying sentence patterns, may require further adjustments to the post-processing methods tested
here. As a result, our approach may not be universally applicable without modifications for these
other languages. Another limitation concerns the dataset size and domain. The training and validation
data for each language were limited to specific text domains, which may impact the summarization
system’s generalizability across various content types, such as technical documents, literary texts, or
conversational content. Future research could address this by incorporating more diverse datasets to evaluate
whether the post-processing strategies remain effective across broader contexts.</p>
        <p>Moreover, while we focused on ROUGE and BERTScore metrics for evaluation, these metrics, though
widely used, may not capture all aspects of summary quality. ROUGE primarily focuses on n-gram
overlap, which might overlook summaries that are semantically equivalent but phrased differently.
Similarly, BERTScore focuses on semantic similarity but may not always correlate with human judgment
of summary readability or informativeness. Expanding the evaluation criteria to include human
assessment of summaries could provide a more comprehensive understanding of the summarization
system’s performance and practical utility. Finally, computational constraints posed a limitation in
terms of the number of post-processing variations tested. Future research could explore a broader range
of techniques, such as sentence fusion or controlled generation, to further enhance the quality of the
generated summaries.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study investigated the influence of various post-processing techniques on the performance of
multilingual summarization systems, specifically focusing on Tamil, Telugu, Kannada, and English
datasets. The results reveal that the effectiveness of summarization strategies is markedly contingent
upon the specific language and the post-processing methodologies employed. Notably, a balanced
approach that integrates sentence restructuring with repetition avoidance yielded optimal outcomes
for Tamil and Telugu, as evidenced by superior ROUGE and BERTScore metrics. In contrast, the
simplification of sentence structure while preserving critical content emerged as the most effective
strategy for Kannada and English. These findings highlight the necessity for tailored, language-specific
approaches in multilingual summarization tasks, underscoring the importance of adapting techniques
to accommodate linguistic and structural peculiarities inherent in each language.</p>
      <p>Future research should focus on developing more flexible post-processing techniques that can adapt
to the unique characteristics of different languages. Expanding the study to include more languages,
especially those that are less studied, will help determine how well these techniques work across various
linguistic contexts. Additionally, incorporating human evaluations alongside traditional metrics like
ROUGE and BERTScore could provide deeper insights into the quality of the generated summaries.
Further exploration of cross-lingual summarization, where summaries are created in one language
based on content in another, also presents an interesting direction for future studies.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>[4] Y. Liu, M. Lapata, Text summarization with pretrained encoders, arXiv preprint arXiv:1908.08345
(2019).
[5] J. Zhang, Y. Zhao, M. Saleh, P. Liu, Pegasus: Pre-training with extracted gap-sentences for
abstractive summarization, in: International conference on machine learning, PMLR, 2020, pp.
11328–11339.
[6] P. Nair, A. K. Singh, On reducing repetition in abstractive summarization, in: Proceedings of the
Student Research Workshop Associated with RANLP 2021, 2021, pp. 126–134.
[7] I. Mani, Summarizing similarities and differences among related documents, Advances in Automatic
Text Summarization (1999).
[8] H. Calvo, P. Carrillo-Mendoza, A. Gelbukh, On redundancy in multi-document summarization,
Journal of Intelligent &amp; Fuzzy Systems 34 (2018) 3245–3255.
[9] S. Satapara, P. Mehta, S. Modha, D. Ganguly, Indian language summarization at FIRE 2023, in:
D. Ganguly, S. Majumdar, B. Mitra, P. Gupta, S. Gangopadhyay, P. Majumder (Eds.), Proceedings
of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Panjim,
India, December 15-18, 2023, ACM, 2023, pp. 27–29. URL: https://doi.org/10.1145/3632754.3634662.
doi:10.1145/3632754.3634662.
[10] S. Satapara, P. Mehta, S. Modha, A. Hegde, S. HL, D. Ganguly, Overview of the third shared task
on indian language summarization (ilsum 2024), in: K. Ghosh, T. Mandl, P. Majumder, D. Ganguly
(Eds.), Working Notes of FIRE 2024 - Forum for Information Retrieval Evaluation, Gandhinagar,
India. December 12-15, 2024, CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[11] S. Satapara, P. Mehta, S. Modha, A. Hegde, S. HL, D. Ganguly, Key insights from the third ilsum
track at fire 2024, in: Proceedings of the 16th Annual Meeting of the Forum for Information
Retrieval Evaluation, FIRE 2024, Gandhiinagar, India. December 12-15, 2024, ACM, 2024.
[12] S. Satapara, P. Mehta, D. Ganguly, S. Modha, Fighting fire with fire: Adversarial prompting to
generate a misinformation detection dataset, CoRR abs/2401.04481 (2024). URL: https://doi.org/10.
48550/arXiv.2401.04481. doi:10.48550/ARXIV.2401.04481. arXiv:2401.04481.
[13] S. Satapara, B. Modha, S. Modha, P. Mehta, FIRE 2022 ILSUM track: Indian language summarization,
in: D. Ganguly, S. Gangopadhyay, M. Mitra, P. Majumder (Eds.), Proceedings of the 14th Annual
Meeting of the Forum for Information Retrieval Evaluation, FIRE 2022, Kolkata, India, December
9-13, 2022, ACM, 2022, pp. 8–11. URL: https://doi.org/10.1145/3574318.3574328. doi:10.1145/
3574318.3574328.
[14] S. Satapara, P. Mehta, S. Modha, D. Ganguly, Key takeaways from the second shared task on indian
language summarization (ILSUM 2023), in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.),
Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation (FIRE-WN 2023), Goa,
India, December 15-18, 2023, volume 3681 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp.
724–733. URL: https://ceur-ws.org/Vol-3681/T8-1.pdf.
[15] C.-Y. Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization
branches out, 2004, pp. 74–81.
[16] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, Bertscore: Evaluating text generation with
bert, arXiv preprint arXiv:1904.09675 (2019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.-C.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>A theoretical analysis of the repetition problem in text generation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>12848</fpage>
          -
          <lpage>12856</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dabre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shrotriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kunchukuttan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Puduppully</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Khapra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Indicbart: A pre-trained model for indic natural language generation</article-title>
          ,
          <source>arXiv preprint arXiv:2109.02903</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning to break the loop: Analyzing and mitigating repetitions for neural text generation</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>3082</fpage>
          -
          <lpage>3095</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>