<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Code-Mixing and Script-Mixing in Indian Language Summarization with Transformer Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pulkit Chatwal</string-name>
          <email>Pulkitchatwal@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Agarwal</string-name>
          <email>aagarwal3@cs.iitr.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankush Mittal</string-name>
          <email>dr.ankush.mittal@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AICoE Wells Fargo International Solutions Private Limited</institution>
          ,
          <addr-line>Bangalore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>COER University</institution>
          ,
          <addr-line>Roorkee</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Rajiv Gandhi Institute of Petroleum Technology</institution>
          ,
          <addr-line>Jais</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the growing need for accessible information across diverse linguistic backgrounds, text summarization has become an increasingly essential task in natural language processing, particularly in multilingual settings involving Indian languages. This paper presents our approach for the FIRE 2024 task, where we leverage large transformer-based language models for summarizing Indian languages, addressing the linguistic diversity and frequent instances of code-mixing and script-mixing unique to this context. Our methodology incorporates both extractive and abstractive summarization techniques, optimized for Indian languages through advanced fine-tuning of models such as mT5, IndicBART, and BART. While prompt engineering has predominantly been applied to English tasks, we adapt it alongside fine-tuning to enhance summarization performance and computational efficiency. Our models achieved top results in five languages (Hindi, Gujarati, English, Tamil, and Bengali) and ranked second in Telugu. These results demonstrate substantial improvements in summarization accuracy, underscoring our approach's efficacy in handling the complexities of Indian languages and advancing text processing in multilingual, mixed-language environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Indian Languages</kwd>
        <kwd>Text Summarization</kwd>
        <kwd>Pre-Trained Model</kwd>
        <kwd>Sequence-to-Sequence models</kwd>
        <kwd>Multilingual Text</kwd>
        <kwd>Summarization</kwd>
        <kwd>Transformer Models</kwd>
        <kwd>Fine-Tuning</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the vast amount of information generated daily across multiple languages, the ability to automatically summarize text has become essential for efficient information consumption and accessibility. In multilingual and multicultural societies like India, where linguistic diversity includes hundreds of languages and dialects, automatic summarization solutions are crucial for bridging communication gaps and ensuring equitable access to information. This task is further complicated by the prevalence of code-mixing (the blending of two or more languages within a single text) and script-mixing (the use of multiple writing systems), making conventional summarization methods insufficient. Addressing these complexities is essential for developing summarization tools that serve diverse user groups and linguistic contexts effectively.</p>
      <p>
        Traditional summarization approaches often fall short in managing these complexities, particularly in multilingual settings. For instance, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] highlights the limitations of conventional summarization techniques when dealing with code-mixed text. They emphasize the need for advanced, automatic summarization methods capable of processing complex, multi-modal data, such as text, images, and audio, to meet strategic information needs. Similarly, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] explore the challenges faced by multilingual users who frequently engage in code-mixing and underscore the necessity for conversational agents designed to process mixed-language content effectively.
      </p>
      <p>To address these challenges, we propose a dual approach that leverages both fine-tuning and prompt-based techniques applied to transformer-based models. Our solution involves fine-tuning multilingual models (mT5 and IndicBART) and English models (BART) on a diverse dataset of Indian languages to capture unique linguistic patterns, including code-mixing and script-mixing. Additionally, we employ prompt engineering techniques to optimize summarization performance and computational efficiency, particularly for English-language content. This combined approach creates a robust, adaptable solution capable of generating accurate, concise summaries across a spectrum of Indian languages and mixed-language inputs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Recent advancements in text summarization have employed a range of methods and datasets, particularly focusing on fine-tuning transformer-based models for improved summarization performance. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] introduced WikiLingua, a multilingual dataset with article-summary pairs in 18 languages, and fine-tuned the mBART model on this dataset. While effective, these efforts mainly focused on language pairs and did not address the complexities associated with code-mixing and script-mixing, both of which are crucial for multilingual contexts such as Indian languages. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] utilized transformer-based models like RoBERTa-Base and Flan T5 Base for cross-platform age classification on social media, achieving impressive accuracy. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] explored leadership traits during natural hazards by analyzing personality and emotional characteristics, uncovering key differences between local and global leaders. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] introduced AgriLLM, leveraging transformers to automate query resolution for farmers and bridge information gaps in agriculture. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed an SMS-based FAQ retrieval system using machine learning to refine noisy text and improve information access. Similarly, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] fine-tuned T5, BART, and Pegasus for abstractive summarization of medical documents using the SUMPUBMED dataset, but their research did not extend to the multilingual, mixed-language contexts typical of Indian languages.
      </p>
      <p>
        In the domain of extractive summarization, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a graph-based approach that transformed text into a network of interconnected sentences and used a selectivity measure to assess node significance. This graph-based model, while innovative, is limited in handling language-specific nuances and lacks adaptability for abstractive summarization in mixed-language texts. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] provided a comparative analysis of major extractive summarization techniques such as TF-IDF, Clustering, Fuzzy Logic, Neural Networks, and Graph-based methods, yet these methods typically struggle with capturing abstracted meaning, particularly in mixed-language settings.
      </p>
      <p>
        For English text summarization, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] demonstrated the effectiveness of BERT-based models across various datasets, showcasing strong summarization quality. Similarly, [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] explored reinforcement learning for abstractive summarization, optimizing both readability and content fidelity. However, these methods were focused on monolingual English text and did not address the unique challenges of Indian languages, where code-mixing and script-mixing are common.
      </p>
      <p>
        In language-specific research, [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used TextRank and Fuzzy C-means for Bengali text summarization, highlighting the importance of customized models for individual languages. For Gujarati, [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] combined pre-trained language models with clustering techniques, showing promising results for low-resource languages. However, these methods were largely extractive and were limited in their ability to handle code-mixed content across multiple Indian languages.
      </p>
      <p>
        Studies from shared tasks organized by FIRE ([
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]; [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]; [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]; [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]; [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]; [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) have examined summarization challenges in Indian languages, employing diverse methodologies and models. Although these contributions have advanced summarization for Indian languages, they primarily focused on single-language settings or basic multilingual scenarios, lacking robust solutions for complex, code-mixed inputs.
      </p>
      <p>Our Approach: Our work differs from previous studies by specifically targeting the multilingual, code-mixed, and script-mixed text summarization challenges inherent in Indian languages. We adopt a dual approach that combines fine-tuning and prompt-based methods for transformer-based models, such as mT5, IndicBART, and BART. By fine-tuning these models on a dataset of Indian languages and leveraging prompt engineering for computational efficiency, our approach is designed to capture the unique linguistic patterns of each language, including mixed-language constructs. This combined methodology allows us to generate coherent, high-quality summaries that address the specific complexities of Indian language contexts, setting our work apart as a comprehensive solution for multilingual summarization.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
      <p>Let D represent the dataset of news articles across multiple Indian languages, where:
D = {(x_i, y_i) ∣ x_i ∈ X, y_i ∈ Y}, i = 1, …, N
where: x_i is the i-th article in the dataset, containing a mixture of text from multiple languages and potential code-mixing and script-mixing; y_i is the corresponding reference summary for x_i; X is the space of input texts (articles) and Y is the space of target summaries; N is the total number of article-summary pairs in the dataset.</p>
      <p>The goal is to learn a mapping function f : X → Y that generates concise and informative summaries for each article x_i such that the generated summary ŷ_i = f(x_i) approximates y_i.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem Objective</title>
        <p>The objective is to minimize the error between the generated summaries ŷ_i and the reference summaries y_i, typically measured using evaluation metrics such as ROUGE, BLEU, or cosine similarity in embedding space. Formally:</p>
        <p>min_f ∑_{i=1}^{N} ℒ(f(x_i), y_i)
where ℒ is the loss function representing the error between the generated summary ŷ_i = f(x_i) and the reference summary y_i.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Additional Constraints and Considerations</title>
        <p>• Multilingual and Code-Mixed Text: x_i may contain tokens from multiple languages L = {l_1, l_2, …, l_k}, where each l_j corresponds to a distinct language script. The model f must therefore handle cross-lingual transfer effectively.
• Length Constraint: Each generated summary ŷ_i should ideally satisfy a fixed length constraint |ŷ_i| ≈ T, where T is the desired summary length.
• Semantic Fidelity: The mapping f should retain the essential semantic information from x_i in ŷ_i, aligning with the reference summaries y_i in terms of main facts and insights.</p>
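        <p>As an illustration (not part of the shared-task system itself), the objective above can be sketched with a deliberately simple stand-in summarizer f and loss ℒ:</p>

```python
# Toy illustration: the objective is the sum of per-example losses between
# f(x_i) and y_i. The summarizer and loss below are simple stand-ins; a real
# system would use model outputs and a differentiable token-level loss.

def toy_loss(generated, reference):
    """1 minus unigram overlap with the reference: 0 means full coverage."""
    gen, ref = set(generated.split()), set(reference.split())
    if not ref:
        return 0.0
    return 1.0 - len(gen.intersection(ref)) / len(ref)

def empirical_objective(f, dataset):
    """Sum of loss(f(x_i), y_i) over all article-summary pairs."""
    return sum(toy_loss(f(x), y) for x, y in dataset)

def first_three_words(x):
    """A trivial 'summarizer' that keeps the first three words."""
    return " ".join(x.split()[:3])

data = [("the match was stopped by rain", "rain stopped the match")]
print(empirical_objective(first_three_words, data))  # -> 0.5
```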
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Model Description</title>
      <p>
        English Language Model: BART For English-language summarization, we employ the facebook/bart-large-cnn model, a state-of-the-art transformer-based encoder-decoder (seq2seq) architecture. BART combines a bidirectional encoder, akin to BERT, with an autoregressive decoder, similar to GPT. The model is pre-trained with a denoising autoencoder objective, where it learns to reconstruct text corrupted by noise functions. This pre-training process equips BART to handle various downstream tasks, including summarization and translation, as well as comprehension tasks like text classification and question answering. For our experiments, we fine-tuned the model on the CNN/Daily Mail dataset, which contains a substantial collection of paired text and summary samples, ensuring robust performance in text summarization tasks [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Multilingual Language Model: mT5 To address summarization for multiple Indian languages, such as Gujarati, Telugu, and Bengali, we leverage the csebuetnlp/mT5_multilingual_XLSum model. mT5 is a multilingual variant of T5, designed to handle diverse languages by using a shared vocabulary and multilingual training data. For this task, we utilize XL-Sum, a high-quality multilingual dataset curated for abstractive summarization, with approximately 1 million article-summary pairs sourced from BBC News across 44 languages, including low-resource languages. XL-Sum emphasizes abstractive summarization with a high level of brevity, abstraction, and quality, as indicated by both human judgments and intrinsic metrics [22].</p>
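      <p>For illustration, inference with this model can be sketched as follows; the snippet follows the usage published on the model card, but the generation settings shown are illustrative rather than our tuned competition values, and the script downloads model weights on first run:</p>

```python
# Hedged sketch of inference with csebuetnlp/mT5_multilingual_XLSum,
# following the model card's published usage. Requires `transformers` and
# `sentencepiece`; generation parameters are illustrative defaults.
import re

def normalize_whitespace(text):
    """Collapse newlines and repeated whitespace before tokenization."""
    return re.sub(r"\s+", " ", re.sub(r"\n+", " ", text.strip()))

if __name__ == "__main__":
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "csebuetnlp/mT5_multilingual_XLSum"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    article = "..."  # a Gujarati, Telugu, or Bengali news article goes here
    inputs = tokenizer(normalize_whitespace(article), return_tensors="pt",
                       truncation=True, max_length=512)
    ids = model.generate(inputs["input_ids"], max_length=84,
                         num_beams=4, no_repeat_ngram_size=2)
    print(tokenizer.decode(ids[0], skip_special_tokens=True))
```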
      <p>Hindi and Tamil Language Model: IndicBART For Hindi and Tamil summarization tasks, we employ ai4bharat/IndicBART, a multilingual sequence-to-sequence model focused on Indian languages. IndicBART, based on the mBART architecture, is specifically tailored for 11 Indian languages and supports natural language generation tasks like summarization and machine translation. The model has been pre-trained on an extensive corpus of Indic languages, containing 452 million sentences and 9 billion tokens, where all languages are transcribed into the Devanagari script to facilitate cross-lingual transfer learning. This approach enhances its performance in resource-constrained Indic languages by effectively leveraging syntactic and semantic similarities across languages [23].</p>
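      <p>A minimal sketch of fine-tuning such a model with the Hugging Face Seq2SeqTrainer is shown below; the CSV file name, the "article"/"summary" column names, and all hyperparameters are assumptions for illustration, not the exact configuration used in our experiments:</p>

```python
# Hedged sketch of supervised fine-tuning with Hugging Face Seq2SeqTrainer.
# File name, column names, and hyperparameters are illustrative assumptions.

def build_features(batch, tokenize, max_input=512, max_target=96):
    """Tokenize articles and summaries; `tokenize` is any HF-style tokenizer."""
    features = tokenize(batch["article"], max_length=max_input, truncation=True)
    labels = tokenize(text_target=batch["summary"], max_length=max_target,
                      truncation=True)
    features["labels"] = labels["input_ids"]
    return features

if __name__ == "__main__":
    from datasets import load_dataset
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("ai4bharat/IndicBART")
    model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")
    data = load_dataset("csv", data_files={"train": "hindi_train.csv"})
    tokenized = data.map(lambda b: build_features(b, tokenizer), batched=True)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="out", num_train_epochs=3),
        train_dataset=tokenized["train"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
```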
    </sec>
    <sec id="sec-5">
      <title>5. Dataset Description</title>
      <p>The dataset assigned to the ILSUM 2024 task is comprehensive and extends the groundwork laid by previous editions, adding support for three Dravidian languages: Kannada, Tamil, and Telugu. The added datasets enhance the coverage of regional Indian languages in the text summarization space and continue the trend of previous years [24]. Each dataset is collected from major newspapers and arranged to support both extractive and abstractive summarization methodologies. The number of document-summary pairs provides a solid basis for model formulation and evaluation in this task.</p>
      <p>A characteristic of this year's dataset is the prevalence of code-mixing and script-mixing, which poses a unique challenge to language models. Code-mixing here refers to the use of English phrases within articles that are essentially composed in Indian languages, a common occurrence within the country's media environment. This happens quite frequently in headlines and news stories, making it a significant challenge for summarization models. For example:</p>
      <p>Examples of code-mixing in news articles:
• IND vs SA, T20 તસવીરોમાં : વરસાદે વલન બની મજા બગાડી! (India vs SA, 5th T20 in pictures: rain spoils the match)
• LIC IPO में पैसा लगाने वालाें का टूटा दल, आई एक और मुक़सानदे ह खबर (Investors of LIC IPO left broken-hearted, yet another piece of bad news)
• Hubballi Special Trains: ಹుುಿಯಂದ ದೆಹಲಿ ಈ ನಗರ ಿವ ೕ ಷ ೖ ಲು ಆರಂಭ (Special train starts from Hubballi to this city of the country)</p>
      <p>The dataset is divided into separate CSV files for each language: Hindi, Gujarati, Bengali, Tamil, Telugu, Kannada, and English. Each file contains columns representing the source text and its corresponding summary, which provides a strong foundation for training and testing the models. The integration of the three Dravidian languages is a significant step in this year's edition, indicating an ongoing effort to increase the diversity of language representation in the Indian language summarization task [25].</p>
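      <p>Reading one such per-language file into article-summary pairs can be sketched as follows; the column names "Article" and "Summary" are assumptions for illustration and should be matched to the actual CSV headers:</p>

```python
# Sketch of loading one per-language file into (article, summary) pairs.
# Column names here are illustrative assumptions, not the real headers.
import csv
import io

def load_pairs(csv_text, text_col="Article", summary_col="Summary"):
    """Parse CSV text and return a list of (article, summary) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[text_col], row[summary_col]) for row in reader]

sample = "Article,Summary\nLong news story here,Short summary here\n"
print(load_pairs(sample))
```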
    </sec>
    <sec id="sec-6">
      <title>6. Method</title>
      <sec id="sec-6-1">
        <title>6.1. Task Description</title>
        <p>The task involves generating concise, informative, and fixed-length summaries for news articles in multiple Indian languages, addressing the complexities of code-mixing and script-mixing, where languages often blend within the same text. Our dataset comprises headline-article pairs sourced from major newspapers in languages including Tamil, Gujarati, Telugu, Bengali, and Kannada. This multilingual dataset introduces diverse linguistic structures, making it ideal for evaluating and refining summarization capabilities across mixed-language content.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Core Methodology</title>
        <p>To address the challenges of multilingual summarization, we implemented a dual approach involving model fine-tuning and prompt engineering. This methodology facilitated efficient handling of diverse linguistic inputs while preserving the quality of generated summaries.</p>
        <p>Model Fine-Tuning: mT5, IndicBART, and BART</p>
        <p>We fine-tuned three pre-trained models (mT5, IndicBART, and BART) on our dataset containing thousands of document-summary pairs across various languages. Fine-tuning enabled the models to adapt to specific linguistic features, handling both code-mixing and script-mixing effectively.
• mT5: Targeted for Gujarati, Telugu, and Bengali summarization, mT5 leverages its multilingual architecture to support cross-lingual summarization. By fine-tuning mT5 on our dataset, we utilized its cross-lingual transfer capabilities, which enhanced performance in lower-resource settings, particularly for languages where English words often appear within the native-language text.
• IndicBART: Applied for Hindi and Tamil, IndicBART, designed specifically for text generation tasks in Indic languages, demonstrated computational efficiency and strong summarization performance. Fine-tuning this model on our dataset allowed it to handle code-mixing by leveraging its foundational understanding of Indic language syntax and semantics.
• BART: For English summaries, we used BART, a transformer-based seq2seq model pre-trained on news articles. Fine-tuning BART on our dataset optimized its capability to produce coherent and compact summaries of English content, capturing complex information effectively.</p>
        <p>Prompt Engineering for English Summarization</p>
        <p>Alongside fine-tuning, we utilized prompt engineering specifically for English summarization to reduce computational overhead. This approach uses task-specific prompts to guide BART's summarization capabilities without additional model retraining. By designing prompts tailored to the summarization task, we achieved efficient, high-quality summaries with reduced resource demands.</p>
        <p>Example Prompts:
• "Summarize the following article clearly and concisely, emphasizing the main facts, insights, and key points. Exclude extraneous details, aiming for a natural and human-like flow. Target summary length: 45-90 words."
• "Create a semantically rich summary of the following article, ensuring coverage of core messages, facts, and meaning. The summary should be concise yet comprehensive, maintaining accuracy and coherence in a way that retains the essence of the original content."</p>
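        <p>Schematically, the prompt-based pipeline prepends one of these instructions to the article and sends the combined text to the summarizer with no retraining; the pipeline call below is a hedged sketch using facebook/bart-large-cnn rather than our exact setup:</p>

```python
# Sketch of the prompt-based pipeline: a fixed instruction (the first example
# prompt above) is prepended to the article before summarization. The
# pipeline invocation is illustrative and downloads weights on first run.

PROMPT = ("Summarize the following article clearly and concisely, emphasizing "
          "the main facts, insights, and key points. Exclude extraneous "
          "details, aiming for a natural and human-like flow. Target summary "
          "length: 45-90 words.")

def build_prompt(article, instruction=PROMPT):
    """Combine the fixed task instruction with the article text."""
    return f"{instruction}\n\nArticle:\n{article.strip()}"

if __name__ == "__main__":
    from transformers import pipeline  # downloads facebook/bart-large-cnn
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    print(summarizer(build_prompt("..."), max_length=90, min_length=45))
```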
        <p>Comparison of Fine-Tuning and Prompt Engineering Approaches While fine-tuning produced slightly higher accuracy, prompt engineering was highly efficient, particularly for English articles, reducing computational time and resource usage. By combining both approaches, we achieved an effective balance between performance and computational efficiency.</p>
        <p>Generated Summary Examples:
• Fine-Tuning: "Despite significant investments in star players like Cristiano Ronaldo, Neymar, and Karim Benzema, the Saudi Pro League was unable to secure Lionel Messi, even after high-value offers."
• Prompt Engineering: "Lionel Messi expressed interest in joining Cristiano Ronaldo in the 'powerful' Saudi Pro League before transferring to MLS. Following his departure from Paris Saint-Germain, Messi joined Inter Miami on a free transfer. Recently, TIME magazine honored him as 'Athlete of the Year.'"</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Evaluation Metrics</title>
      <sec id="sec-7-1">
        <title>The summarization models are evaluated using ROUGE and BERT scores:</title>
        <sec id="sec-7-1-1">
          <title>7.1. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)</title>
          <p>Measures n-gram overlap between generated and reference summaries, capturing how much key information from the reference is present.</p>
          <p>ROUGE-N = ∑ Count_matched(n-gram) / ∑ Count_total(n-gram)
• Count_matched: Number of matching n-grams between the generated and reference summaries.
• Count_total: Total number of n-grams in the reference summary.</p>
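          <p>The ROUGE-N recall formula above can be implemented directly; the sketch below uses clipped n-gram counts, as in the standard metric:</p>

```python
# Direct implementation of the ROUGE-N recall formula above, with clipped
# n-gram counts as in the standard metric.
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated, reference, n=1):
    """Matched n-grams over total reference n-grams."""
    gen = ngrams(generated.split(), n)
    ref = ngrams(reference.split(), n)
    matched = sum(min(count, gen[g]) for g, count in ref.items())
    total = sum(ref.values())
    return matched / total if total else 0.0

print(rouge_n("rain stopped the match", "rain spoils the match"))  # -> 0.75
```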
        </sec>
        <sec id="sec-7-1-1b">
          <title>7.2. BERT Score</title>
          <p>Compares the semantic similarity between generated and reference summaries using BERT embeddings.</p>
          <p>BERT Score = (1/N) ∑_i max_j cosine_similarity(BERT(g_i), BERT(r_j))
• g_i: Token in the generated summary.
• r_j: Token in the reference summary.
• Cosine similarity: Measures how similar the meaning of tokens is.</p>
        </sec>
        <sec id="sec-7-1-2">
          <title>7.3. BERT Precision &amp; Recall</title>
          <p>Precision: Measures how much the generated summary matches the reference.
Precision = (1/|G|) ∑_{g ∈ G} max_{r ∈ R} cosine_similarity(BERT(g), BERT(r))
where G represents the tokens in the generated summary.</p>
          <p>Recall: Measures how much of the reference is covered by the generated summary.
Recall = (1/|R|) ∑_{r ∈ R} max_{g ∈ G} cosine_similarity(BERT(r), BERT(g))
where R represents the tokens in the reference summary.</p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>7.4. BERT F1 Score</title>
        <p>Balances precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)</p>
        <p>This evaluation combines both n-gram overlap (ROUGE) and semantic understanding (BERT).</p>
      </sec>
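      <p>The precision, recall, and F1 formulas above can be illustrated with toy embedding vectors in place of real BERT token embeddings:</p>

```python
# Toy illustration of the BERTScore-style precision/recall/F1 formulas above,
# using hand-made 2-d vectors instead of real BERT token embeddings.
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bert_style_scores(gen_emb, ref_emb):
    """Greedy matching: each token is paired with its most similar counterpart."""
    precision = sum(max(cosine(g, r) for r in ref_emb) for g in gen_emb) / len(gen_emb)
    recall = sum(max(cosine(r, g) for g in gen_emb) for r in ref_emb) / len(ref_emb)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gen = [(1.0, 0.0), (0.0, 1.0)]              # "generated" token embeddings
ref = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # "reference" token embeddings
print(bert_style_scores(gen, ref))
```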
    </sec>
    <sec id="sec-8">
      <title>8. Results</title>
      <p>The official results for the ILSUM 2024 [26] challenge demonstrate the strong performance of our team, Data Lovers, in multilingual summarization across several Indian languages. We participated in the Hindi, English, Tamil, Telugu, Bengali, and Gujarati subtracks of Task 1. Our model achieved first place in Hindi, English, Tamil, Bengali, and Gujarati, while we ranked second in Telugu. These high ranks underscore the robustness and adaptability of our approach in handling diverse languages with varied linguistic features. The official ROUGE and BERT scores for each language are presented in Table 2 and Table 3.</p>
      <p>Table 2 highlights our ROUGE score performance across different languages. Our model achieved the highest ROUGE-1 scores in Hindi (0.3659) and English (0.3644), closely followed by Telugu (0.3022). This suggests that our model effectively captures essential information and meaning across languages. Notably, the performance in Gujarati, Bengali, and Tamil was comparatively lower but still competitive, reflecting the effectiveness of our approach in low-resource languages as well. The high ROUGE-L scores, particularly in Hindi (0.3388) and English (0.3133), indicate that our model maintained coherence and fluency in its generated summaries, a crucial aspect in multilingual summarization tasks.</p>
      <p>Table 3 shows the BERT scores, which measure semantic similarity between the generated summaries and reference summaries. The English subtrack achieved the highest BERTScore-F1 (0.8781), showcasing the model's superior ability to retain semantic meaning in English. For Indian languages, Telugu (0.7532), Hindi (0.7396), and Gujarati (0.7398) displayed strong performance, indicating the model's capacity to handle linguistic nuances across Indian languages. The consistently high BERTScore-Precision and Recall across languages reflect our model's reliability in generating summaries that closely match the reference texts in meaning and structure.</p>
      <p>In the English subtrack, we conducted experiments comparing two methodologies: fine-tuning and prompt engineering. While fine-tuning provided the highest accuracy in both ROUGE and BERT metrics, the prompt engineering approach was more computationally efficient, achieving a rank of 4th overall in ROUGE and 5th in BERT scores. Table 4 and Table 5 present a comparative analysis of the two methods. In Table 4, fine-tuning with BART achieved higher ROUGE-1 and ROUGE-L scores (0.3644 and 0.3133, respectively), highlighting its effectiveness in producing summaries with high lexical similarity to the reference. Prompt engineering, despite lower ROUGE scores (ROUGE-1 of 0.3238 and ROUGE-L of 0.2806), demonstrated considerable potential for applications requiring lower computational costs without substantial quality trade-offs.</p>
      <p>As shown in Table 5, fine-tuning outperformed prompt engineering in terms of BERTScore-F1 (0.8781 compared to 0.8687), indicating its superior ability to maintain semantic fidelity. However, prompt engineering achieved comparable BERTScore-Recall (0.8847), indicating that it captures essential information well, albeit with slightly less precision.</p>
      <p>Overall, our approach effectively balances performance and computational efficiency, making it adaptable for diverse use cases. The results affirm that fine-tuning is highly effective for high-quality summarization, while prompt engineering offers a viable alternative in resource-constrained settings.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion &amp; Future Work</title>
      <p>In this study, we evaluated several large language models, including mT5, IndicBART, and BART, on the task of generating fixed-length summaries of news articles in multiple Indian languages. Our results demonstrate that fine-tuned models consistently outperformed other methods, achieving top ROUGE and BERT metrics across five out of six languages. Specifically, fine-tuning yielded ROUGE-1 scores as high as 0.3659 for Hindi and 0.3644 for English, while BERTScore-F1 reached 0.8781 for English, underscoring the models' robustness in handling the complexities of multilingual summarization, particularly with challenges like code-mixing and script-mixing.</p>
      <p>This research also highlights the potential of prompt engineering, especially for English summarization, where it achieved competitive BERTScore-Recall (0.8847) and a notable 4th rank in ROUGE scores. Although prompt engineering slightly underperformed compared to full fine-tuning, it reduced computational costs and processing times by approximately 30%, establishing it as a cost-effective alternative for resource-constrained settings. However, the results also indicated that prompt engineering was less effective for Indian languages, especially those with complex code-mixed and script-mixed text. Future efforts should focus on adapting prompt-based methods specifically for Indian languages to improve performance in mixed-language contexts.</p>
      <p>Future work will explore larger, more advanced models like GPT-4, LLaMA 2, and BLOOM to further improve accuracy and efficiency, particularly for low-resource languages. We aim to fine-tune these models for specific Indian languages and continue experimenting with prompt-based techniques to maximize summarization quality. Additionally, we plan to develop refined prompt engineering strategies tailored to code-mixing and script-mixing challenges in Indian languages. By combining prompt-driven approaches with advanced models, we aim to build an efficient summarization framework that balances high quality and computational efficiency, expanding accessibility and effectiveness across diverse linguistic settings.</p>
      <sec id="sec-9-1">
        <title>10. Declaration on Generative AI</title>
        <p>No generative AI tools were used during the preparation of this work.</p>
        <p>[22] T. Hasan, A. Bhattacharjee, M. S. Islam, K. Samin, Y.-F. Li, Y.-B. Kang, M. S. Rahman, R. Shahriyar, XL-Sum: Large-scale multilingual abstractive summarization for 44 languages, arXiv preprint arXiv:2106.13822 (2021).
[23] R. Dabre, H. Shrotriya, A. Kunchukuttan, R. Puduppully, M. M. Khapra, P. Kumar, IndicBART: A pre-trained model for Indic natural language generation, arXiv preprint arXiv:2109.02903 (2021).
[24] S. Satapara, P. Mehta, S. Modha, D. Ganguly, Indian language summarization at FIRE 2023, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023, pp. 27--29.
[25] S. Satapara, P. Mehta, S. Modha, A. Hegde, S. HL, D. Ganguly, Overview of the third shared task on Indian language summarization (ILSUM 2024), in: K. Ghosh, T. Mandl, P. Majumder, D. Ganguly (Eds.), Working Notes of FIRE 2024 - Forum for Information Retrieval Evaluation, Gandhinagar, India, December 12-15, 2024, CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[26] S. Satapara, P. Mehta, S. Modha, A. Hegde, S. HL, D. Ganguly, Key insights from the third ILSUM track at FIRE 2024, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2024, Gandhinagar, India, December 12-15, 2024, ACM, 2024.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jangra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanuzzaman</surname>
          </string-name>
          ,
          <article-title>A survey on multi-modal summarization</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          –
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Toward a multilingual conversational agent: Challenges and expectations of code-mixing multilingual users</article-title>
          ,
          <source>in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Durmus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <article-title>Wikilingua: A new benchmark dataset for crosslingual abstractive summarization</article-title>
          ,
          <source>arXiv preprint arXiv:2010.03093</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toshniwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , Iitroorkee@smm4h 2024:
          <article-title>Cross-platform age detection in twitter and reddit using transformer-based model</article-title>
          ,
          <source>in: Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>101</fpage>
          –
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toshniwal</surname>
          </string-name>
          ,
          <article-title>Identifying leadership characteristics from social media data during natural hazards using personality traits</article-title>
          ,
          <source>Scientific reports 10</source>
          (
          <year>2020</year>
          )
          <fpage>2624</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Didwania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Seth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasliwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , Agrillm:
          <article-title>Harnessing transformers for farmer queries</article-title>
          ,
          <source>arXiv preprint arXiv:2407.04721</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>Construction of a semi-automated model for faq retrieval via short message service</article-title>
          ,
          <source>in: Proceedings of the 7th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>35</fpage>
          –
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lalitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shahida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. V. S.</given-names>
            <surname>Deepak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Bindu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shaikshavali</surname>
          </string-name>
          ,
          <article-title>Text summarization of medical documents using abstractive techniques</article-title>
          ,
          <source>in: 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>939</fpage>
          –
          <lpage>943</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gowhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <article-title>Advancing human­like summarization: Approaches to text summarization</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>747</fpage>
          –
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Jewani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Damankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Janyani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mhatre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gangwani</surname>
          </string-name>
          ,
          <article-title>A brief study on approaches for extractive summarization</article-title>
          ,
          <source>in: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>601</fpage>
          –
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          , Divanshi,
          <string-name>
            <given-names>K.</given-names>
            <surname>Seeja</surname>
          </string-name>
          ,
          <article-title>Text summarisation using bert</article-title>
          ,
          <source>in: International Conference On Innovative Computing And Communication</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>229</fpage>
          –
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <article-title>A deep reinforced model for abstractive summarization</article-title>
          ,
          <source>arXiv preprint arXiv:1705.04304</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rafiq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rafian</surname>
          </string-name>
          ,
          <article-title>Bengali text summarization using TextRank, Fuzzy C­means and aggregated scoring techniques</article-title>
          ,
          <source>Ph.D. thesis</source>
          , BRAC University,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <article-title>An extractive approach for automated summarization of indian languages using clustering techniques</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>418</fpage>
          –
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <article-title>Fire 2022 ilsum track: Indian language summarization</article-title>
          ,
          <source>in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>8</fpage>
          –
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <article-title>Findings of the first shared task on indian language summarization (ilsum): Approaches challenges and the path ahead</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>369</fpage>
          –
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deepak</surname>
          </string-name>
          ,
          <article-title>Deep learning based abstractive summarization for english language</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>383</fpage>
          –
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sonawane</surname>
          </string-name>
          ,
          <article-title>Abstractive text summarization for hindi language using indicbart</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>409</fpage>
          –
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Urlana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Surange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <article-title>Indian language summarization using pretrained sequence-to-sequence models</article-title>
          ,
          <source>arXiv preprint arXiv:2303.14461</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <article-title>Key takeaways from the second shared task on indian language summarization (ilsum 2023)</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>724</fpage>
          –
          <lpage>733</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <article-title>Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>arXiv preprint arXiv:1910.13461</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>