                                Key Takeaways from the Second Shared Task on
                                Indian Language Summarization (ILSUM 2023)
                                Shrey Satapara1 , Parth Mehta2 , Sandip Modha3 and Debasis Ganguly4
                                1
                                  Indian Institute of Technology Hyderabad, India
                                2
                                  Parmonic, USA
                                3
                                  LDRP-ITR, Gandhinagar, India
                                4
                                  University of Glasgow, Scotland, UK


                                                                         Abstract
                                                                         This paper provides an overview of the second edition of the shared task on Indian Language Summa-
                                                                         rization (ILSUM) organized at the 15th Forum for Information Retrieval Evaluation (FIRE 2023). This
                                                                         edition builds upon ILSUM 2022 by creating additional benchmark data for text summarization in Indian
                                                                         languages. Apart from expanding the datasets of the three languages from the previous edition, namely
                                                                         Hindi, Gujarati and Indian English, a new Bengali dataset was introduced this year. In addition to this, a
                                                                         new misinformation detection subtask was introduced. ILSUM 2023 saw an enthusiastic response, with
                                                                         registrations from over 35 teams. A total of 6 teams submitted runs across both subtasks and 4 teams
submitted working notes. Standard ROUGE metrics as well as BERTScore were used as the evaluation metrics for the summarization subtask, while the macro F1 score was used for the misinformation detection subtask.

                                                                         Keywords
                                                                         Automatic Text Summarization, Indian Languages, Headline Generation, Misinformation Detection




                                1. Introduction
The second shared task on Indian Language Summarization was a continuation of efforts to bridge the gap in NLP research progress between resource-rich languages like English, Spanish and Chinese and the more resource-constrained Indian languages. Platforms like the Forum for Information Retrieval Evaluation (FIRE) [1] have been consistently trying to bridge this gap by building reusable and open-source test collections. The progress has been noteworthy in several language-dependent tasks like hate speech detection [2, 3, 4, 5, 6], sentiment analysis [7, 8], mixed-script IR [9, 10], fake news detection [11, 12] and authorship attribution [13, 14], as well as language-independent tasks like Indian legal document retrieval and summarization [15, 16, 17, 18, 19, 20], IR from microblogs [21], IR for software engineering [22], etc. Several large-scale datasets and pre-trained models have become publicly available.


                                Forum for Information Retrieval Evaluation, December 15-18, 2023, India
Email: shreysatapara@gmail.com (S. Satapara); parth.mehta126@gmail.com (P. Mehta); sjmodha@gmail.com (S. Modha); debforit@gmail.com (D. Ganguly)
ORCID: 0000-0001-6222-1288 (S. Satapara); 0000-0002-4509-1298 (P. Mehta); 0000-0003-2427-2433 (S. Modha); 0000-0003-0050-7138 (D. Ganguly)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




AI4Bharat1 is another initiative that is playing a pivotal role in bridging this gap, especially in machine translation and Indian-language LLMs.

   With the series of ILSUM tasks [23, 24, 25] we aim to replicate this for automatic text summarization, where research is heavily skewed towards English [26, 27, 28] and other resource-rich languages, while the focus on resource-poor languages is almost negligible [29]. Previous attempts at building test collections for Indian language summarization were limited in scope, with at most a few dozen documents [30, 31, 32, 33, 34, 35]. Moreover, most of these datasets are either not public or too small to be useful. In contrast, the ILSUM 2023 dataset consists of over 15,000 article-summary pairs each for Hindi, Gujarati, Bengali and Indian English. Table 1 presents the details of the ILSUM dataset. The task is to generate a meaningful summary, either extractive or abstractive, for each article.

   We also introduce a new subtask on misinformation detection in LLM-generated summaries; in the current edition this subtask was limited to Indian English. The recent success of large language models (LLMs) [36] in language generation, such as GPT [37], Llama [38], etc., has raised concerns about their possible misuse for generating fake news and spreading misinformation. This problem easily extends to summaries: instead of fabricating an entire story, miscreants can take a real news article and generate a summary tailored to suit their purpose. In this subtask, participants are given a machine-generated summary, and the task is to identify whether its content is correct or whether it falls into one of four categories of misinformation, namely incorrect numerical quantities, fabrication, false attribution or misrepresentation. Both subtasks are explained in detail in the next section, followed by a description of the approaches used by the participating teams.


2. Task Definition
The second shared task on Indian Language Summarization continued the effort of creating benchmark datasets for text summarization in Indian languages. The current edition saw the inclusion of Bengali alongside Hindi, Gujarati and Indian English. Bengali is one of the most widely spoken languages in the world, with over 250 million speakers, the majority of them in India and Bangladesh. The datasets for all languages from ILSUM 2022 [23, 25] were extended to include more articles and summaries. Apart from this, we also introduced a new subtask on misinformation detection in machine-generated summaries. In the following subsections, we discuss both tasks and the corresponding datasets in detail.

2.1. Task 1: Text Summarization for Indian Languages
The objective of this task is the same as in the first edition of ILSUM and follows the standard definition of text summarization: given an article, participants are asked to generate a fixed-length summary, in either an abstractive or an extractive way. This year, we extended the dataset by


   1
       https://ai4bharat.iitm.ac.in
adding approximately 15,000 more articles on top of the previous edition's dataset and introduced one more language. As in the previous edition, the dataset poses the unique challenge of code-mixing and script-mixing: it is very common for news articles to borrow phrases from English, even when the article itself is written in an Indian language. A small script-detection sketch follows the examples below.
  Examples like these are a common occurrence, both in headlines and in article bodies.
    • Gujarati: "IND vs SA, 5મી T20 તસવીરોમાં: વરસાદે વિલન બની મજા બગાડી" (India vs SA,
      5th T20 in pictures: rain plays villain and spoils the fun)

    • Hindi: "LIC के IPO में पैसा लगाने वालों का टूटा दिल, आई एक और नुकसानदेह खबर" (Investors
      in the LIC IPO left broken-hearted by yet another piece of bad news)
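  To make the script-mixing challenge concrete, below is a minimal sketch (our illustration, not part of any official ILSUM tooling) that estimates how much of a headline is written in each script using Unicode character names; it flags Latin-script borrowings like "IND vs SA" or "LIC" inside otherwise Indic-script text.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Coarse script label for a character, derived from its Unicode name."""
    if not ch.isalpha():
        return "other"
    name = unicodedata.name(ch, "")
    for script in ("DEVANAGARI", "GUJARATI", "BENGALI", "LATIN"):
        if name.startswith(script):
            return script
    return "other"

def script_mix_ratio(text: str) -> dict:
    """Fraction of alphabetic characters belonging to each detected script."""
    counts: dict = {}
    for ch in text:
        s = script_of(ch)
        if s != "other":
            counts[s] = counts.get(s, 0) + 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}

# The Hindi headline above mixes Latin acronyms ("LIC", "IPO") with Devanagari.
print(script_mix_ratio("LIC के IPO में पैसा लगाने वालों का टूटा दिल"))
```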


                           Language     Training Set     Test Set   Total
                             Hindi         21225           3000     24225
                            Gujarati       33630           2999     36629
                            Bengali        12356           2951     15307
                            English        28342           2895     31237
Table 1
Training and Test Data Distribution for Different Languages in Task 1



2.2. Task 2: Detecting Factual Incorrectness in Machine-Generated Summaries
This task aims to identify factual incorrectness in machine-generated summaries, an important step in ensuring the reliability and accuracy of information. When evaluating these summaries against the original article, the key focus is to detect and classify different types of incorrectness. For this task, we provide a dataset covering four different types of inaccuracy, along with a fifth class containing correct summaries. The GPT-4 model was used to generate an incorrect summary of each type and the GPT-3.5 model to produce the correct summaries, using carefully crafted prompts and no manual intervention; a minimal sketch of this prompting setup is given after the list below. The types of incorrectness present in the dataset are the following; a detailed description of how the dataset was created is available in [39].

    • Misrepresentation: This involves presenting information in a way that is misleading
      or that gives a false impression. This could be done by exaggerating certain aspects,
      understating others, or twisting facts to fit a particular narrative.
    • Inaccurate Quantities or Measurements: Factual incorrectness can occur when pre-
      cise quantities, measurements, or statistics are misrepresented, whether through error
      or intent.
    • False Attribution: Incorrectly attributing a statement, idea, or action to a person or
      group is another form of factual incorrectness.
    • Fabrication: Making up data, sources, or events is a severe form of factual incorrectness.
      This involves creating "facts" that have no basis in reality.
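   The exact prompts used to build the dataset are documented in [39] and are not reproduced here; the following is a minimal sketch, under assumed prompt wording, of how such class-specific adversarial summaries could be requested through the OpenAI chat API.

```python
from openai import OpenAI  # assumes the openai Python package, v1 API

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt templates, one per incorrectness type; the actual
# prompts used to build the ILSUM 2023 dataset are described in [39].
PROMPTS = {
    "misrepresentation": "Summarize the article, but twist one fact so it gives a misleading impression.",
    "incorrect_quantities": "Summarize the article, but alter one numerical quantity or statistic.",
    "false_attribution": "Summarize the article, but attribute one statement to the wrong person.",
    "fabrication": "Summarize the article, but invent one event that does not appear in it.",
}

def generate_incorrect_summary(article: str, error_type: str) -> str:
    # GPT-4 generated the incorrect summaries; GPT-3.5 produced the correct ones.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PROMPTS[error_type]},
            {"role": "user", "content": article},
        ],
    )
    return response.choices[0].message.content
```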
   For this task, the training data provides each article and machine-generated summary with a single label indicating the type of incorrectness. Participants, however, are asked to predict all labels associated with each summary in the test data, as one summary can contain multiple types of incorrectness. An example article with all types of incorrectness is available at https://ilsum.github.io/ilsum/2023/index.html. Table 2 contains the statistics of the Task 2 dataset. The class predictions on the test data are evaluated using the macro F1 score; an evaluation sketch follows Table 2.
                              Class           Training Set    Test Set    Total
                      Misrepresentation           294            25        319
                      Inaccurate Quantities       195            10        205
                      False Attribution           250            13        263
                      Fabrication                 250            32        282
                      Correct                     5000          143       5143
Table 2
Task 2 Dataset Statistics
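
As a reference for how the ranking metric behaves, here is a minimal evaluation sketch using scikit-learn; the label names and example predictions are hypothetical. Macro averaging weights all five classes equally, so the large "Correct" class cannot dominate the score.

```python
from sklearn.metrics import classification_report, f1_score

LABELS = ["correct", "misrepresentation", "incorrect quantities",
          "false attribution", "fabrication"]

# Hypothetical gold and predicted labels for a handful of test summaries.
y_true = ["correct", "fabrication", "false attribution",
          "correct", "misrepresentation"]
y_pred = ["correct", "fabrication", "correct",
          "correct", "incorrect quantities"]

# Macro F1 averages the per-class F1 scores with equal weight, so the
# "correct" class (5,000 training examples) does not dominate the metric.
print(f1_score(y_true, y_pred, labels=LABELS, average="macro"))
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
```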



3. Results and Discussion
In this section, we discuss the results of the participating teams. Compared to the last edition, where we used only the ROUGE score for evaluation, we added another ranking based on BERTScore for a fairer evaluation of abstractive summaries. However, we observe a very high correlation between BERTScore and ROUGE; in particular, the system rankings are exactly the same irrespective of the choice of metric. A minimal sketch of this two-metric evaluation is shown below, followed by the results and approaches for each task and language.
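   This sketch assumes the widely used rouge-score and bert-score Python packages; the paper does not specify the exact implementation used for the official evaluation.

```python
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Rain played spoilsport in the fifth T20 between India and South Africa."
candidate = "Rain spoiled the fifth India vs South Africa T20 match."

# ROUGE-1/2/4/L F1 scores, mirroring the columns in the result tables.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rouge4", "rougeL"],
                                  use_stemmer=True)
print(scorer.score(reference, candidate))

# BERTScore precision/recall/F1; lang selects a suitable model
# (e.g. lang="hi" falls back to a multilingual model for Hindi).
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(P.item(), R.item(), F1.item())
```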

3.1. Task 1: Hindi
For text summarization in Hindi, two teams submitted a total of six runs. Team Irlab-IITBHU used named-entity-aware text summarization: treating NER as an important source of in-depth information, they extracted entities with a pre-trained MuRIL-based Hindi NER model and prioritised these key entities in the summary. On top of this, they fine-tuned mBART-50 (rank 1), mT5 with named entities (rank 2), IndicBART (rank 3), IndicBARTSS (rank 4) and IndicBART with named entities (rank 6). A sketch of the entity-aware setup is given after Table 3, which contains the results of all submissions for text summarization in Hindi.
                                     BERT SCORE                          ROUGE (F1 Scores)
 rank    Team Name
                            Precision Recall F1 Score        Rouge-1     Rouge-2 Rouge-4     Rouge-L
   1     Irlab-IITBHU         0.8226    0.8048  0.813         0.5625      0.4715   0.4032     0.5373
   2     Irlab-IITBHU         0.797     0.8073  0.8017        0.5409      0.4592   0.4007     0.5153
   3     Irlab-IITBHU         0.8085    0.7948  0.8008        0.5359      0.4551   0.3973     0.5128
   4     Irlab-IITBHU         0.8005    0.8003  0.7998        0.5328      0.4496   0.3912     0.5084
   5      BITS Pilani         0.7609     0.682  0.7186        0.2988      0.1707   0.1196     0.2476
   6     Irlab-IITBHU         0.7153    0.7037  0.7089        0.2801      0.1568   0.0836     0.2423
Table 3
Performance of teams on text summarization in Hindi
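
The following is a hedged sketch of what such an entity-aware input construction could look like; "ai4bharat/IndicNER" is an assumed stand-in for the team's MuRIL-based Hindi NER model, and the entity-prefixing scheme is our illustration rather than the team's exact method.

```python
from transformers import pipeline

# Assumed stand-in model; the team used a pre-trained MuRIL-based Hindi NER model.
ner = pipeline("token-classification", model="ai4bharat/IndicNER",
               aggregation_strategy="simple")

def entity_aware_input(article: str) -> str:
    """Prepend detected named entities so the summarizer is nudged to keep them."""
    entities = {ent["word"] for ent in ner(article)}
    return " ".join(sorted(entities)) + " </s> " + article

# The augmented input would then be fed to a fine-tuned mBART-50 / mT5 summarizer.
```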
3.2. Task 1: Gujarati and Bengali
For Gujarati and Bengali text summarization, only one team made a submission. Team BITS Pilani fine-tuned an mT5 model (mT5-multilingual-XLSum) on the ILSUM dataset for all four languages. Results for text summarization in Gujarati and Bengali are available in Table 4 and Table 5, respectively; a minimal inference sketch with this checkpoint follows Table 5.
                                 BERT SCORE                          ROUGE (F1 Scores)
 rank    Team Name
                        Precision Recall F1 Score         Rouge-1    Rouge-2 Rouge-4        Rouge-L
   1      BITS Pilani     0.7423    0.688   0.7135         0.174      0.0747   0.0333        0.1655
Table 4
Performance of teams on text summarization in Gujarati


                                 BERT SCORE                          ROUGE (F1 Scores)
 rank    Team Name
                        Precision Recall F1 Score         Rouge-1    Rouge-2 Rouge-4        Rouge-L
   1      BITS Pilani     0.7058    0.6554  0.679           0.12      0.0567   0.0254        0.1087
Table 5
Performance of teams on text summarization in Bengali
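
As a sketch of this model family, the public mT5_multilingual_XLSum checkpoint can be loaded directly from the Hugging Face Hub for inference; the generation hyperparameters below are illustrative, and the team's additional fine-tuning on ILSUM data is not reproduced here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Public XL-Sum checkpoint of mT5, the starting point the team fine-tuned.
NAME = "csebuetnlp/mT5_multilingual_XLSum"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(NAME)

article = "..."  # a Gujarati or Bengali news article from the ILSUM dataset
inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], max_length=84,
                             num_beams=4, no_repeat_ngram_size=2)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```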



3.3. Task 1: English
For English, four teams submitted one run each. Team NITK - AI outperformed the other teams by fine-tuning T5-base on the ILSUM English dataset. Team Eclipse also fine-tuned a T5-base model, finishing second on the leaderboard. Results of all four submissions are available in Table 6.
                                 BERT SCORE                          ROUGE (F1 Scores)
 rank    Team Name
                        Precision Recall F1 Score         Rouge-1    Rouge-2 Rouge-4        Rouge-L
   1       NITK - AI      0.8752    0.8684  0.8716         0.3321     0.1731    0.121         0.282
   2        Eclipse       0.8505    0.8733  0.8616         0.3022     0.1111    0.042        0.2504
   3      BITS Pilani     0.8724    0.8462  0.8589         0.2354     0.0604   0.0147         0.182
   4         ASH          0.8277    0.8036  0.8153          0.137      0.017   0.0004        0.1181
Table 6
Performance of teams on text summarization in English



3.4. Task 2: Detecting Factual Incorrectness in Machine-Generated Summaries
In this subtask, only one team participated, submitting five runs that explored zero-shot prompting with GPT-3.5 Turbo. They prompted the model to decide whether an article-summary pair belongs to a particular class, varying the order in which the classes were presented. Their best result was obtained with an ensemble of the predictions from the four different class orders they explored; a minimal sketch of this setup is given after Table 7. The results obtained on this task are available in Table 7.
                                        Class           F1 Score
                                      Fabrication        0.152
                                   False Attribution     0.093
                                 Incorrect Quantities    0.291
                                  Misrepresentation      0.335
                                     MACRO F1            0.527
Table 7
Performance of the participating team on the misinformation detection task
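
A minimal sketch of this zero-shot setup follows, with hypothetical prompt wording and a simple majority vote over class orderings standing in for the team's ensemble; the team's actual prompts and aggregation are described in their working notes.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CLASSES = ["correct", "misrepresentation", "incorrect quantities",
           "false attribution", "fabrication"]

def classify(article: str, summary: str, class_order: list[str]) -> str:
    # Hypothetical zero-shot prompt; the team's actual wording may differ.
    prompt = (
        "Given a news article and a machine-generated summary, answer with "
        f"exactly one label from: {', '.join(class_order)}.\n\n"
        f"Article: {article}\n\nSummary: {summary}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

def ensemble_predict(article: str, summary: str, orders: list[list[str]]) -> str:
    # Majority vote over predictions from different class orderings,
    # standing in for the team's ensemble of four class orders.
    votes = [classify(article, summary, order) for order in orders]
    return Counter(votes).most_common(1)[0][0]
```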


4. Conclusion and Future Work
The Indian Language Summarization (ILSUM) track at FIRE 2023 continued the efforts to cre-
ate benchmark corpora for text summarization in Indian languages. Two major updates from
last year were inclusion of Bengali in the summarization task, and inclusion of a new subtask
on misinformation detection in machine generated summaries. Like previous edition major-
ity of the summarization systems for task 1 were based on pre-trained large language models
like MT5, MBart, and IndicBART. A notable exception was the approach proposed by IIT-BHU
who used a combination of NER and pretrained language models. It was also the best perform-
ing approach for Hindi, highlighting scope for improvements over pre-trained LLMs. In the
next edition of the ILSUM we plan to extend the summarization subtask to new languages, es-
pecially Dravidian languages. For the misinformation detection subtask we aim at providing
fine-grain annotations about the part of summaries which are factually incorrect instead of
simply labelling the entire summary as incorrect.


References
 [1] P. Mehta, T. Mandl, P. Majumder, S. Gangopadhyay, Report on the FIRE 2020 evaluation
     initiative, SIGIR Forum 55 (2021) 3:1–3:11. URL: https://doi.org/10.1145/3476415.3476418.
     doi:10.1145/3476415.3476418 .
 [2] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranas-
     inghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at
     FIRE 2021: Hatespeech and offensive content identification in english and indo-aryan
     languages, in: P. Mehta, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE
     2021 - Forum for Information Retrieval Evaluation, Gandhinagar, India, December 13-17,
     2021, volume 3159 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 1–19. URL:
     http://ceur-ws.org/Vol-3159/T1-1.pdf.
 [3] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer,
     Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identifi-
     cation in indo-european languages, in: P. Mehta, T. Mandl, P. Majumder, M. Mitra (Eds.),
     Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad,
     India, December 16-20, 2020, volume 2826 of CEUR Workshop Proceedings, CEUR-WS.org,
     2020, pp. 87–111. URL: http://ceur-ws.org/Vol-2826/T2-1.pdf.
 [4] S. Modha, T. Mandl, P. Majumder, D. Patel, Overview of the HASOC track at FIRE
     2019: Hate speech and offensive content identification in indo-european languages, in:
     P. Mehta, P. Rosso, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2019 - Fo-
     rum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, vol-
     ume 2517 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 167–190. URL: http:
     //ceur-ws.org/Vol-2517/T3-1.pdf.
 [5] H. Madhu, S. Satapara, S. Modha, T. Mandl, P. Majumder, Detecting offensive speech in
     conversational code-mixed dialogue on social media: A contextual dataset and benchmark
     experiments, Expert Systems with Applications (2022) 119342.
 [6] S. Modha, P. Majumder, T. Mandl, C. Mandalia, Detecting and visualizing hate speech in
     social media: A cyber watchdog for surveillance, Expert Syst. Appl. 161 (2020) 113725.
     URL: https://doi.org/10.1016/j.eswa.2020.113725. doi:10.1016/j.eswa.2020.113725 .
 [7] M. Subramanian, R. Ponnusamy, S. Benhur, K. Shanmugavadivel, A. Ganesan, D. Ravi,
     G. K. Shanmugasundaram, R. Priyadharshini, B. R. Chakravarthi, Offensive language
     detection in tamil youtube comments by adapters and cross-domain knowledge trans-
     fer, Comput. Speech Lang. 76 (2022) 101404. URL: https://doi.org/10.1016/j.csl.2022.101404.
     doi:10.1016/j.csl.2022.101404 .
 [8] B. R. Chakravarthi, P. K. Kumaresan, R. Sakuntharaj, A. K. Madasamy, S. Thavareesan,
     B. Premjith, S. K, S. C. Navaneethakrishnan, J. P. McCrae, T. Mandl, Overview of the
     hasoc-dravidiancodemix shared task on offensive language detection in tamil and malay-
     alam, in: P. Mehta, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE
     2021 - Forum for Information Retrieval Evaluation, Gandhinagar, India, December 13-17,
     2021, volume 3159 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 589–602. URL:
     http://ceur-ws.org/Vol-3159/T3-1.pdf.
 [9] S. Banerjee, K. Chakma, S. K. Naskar, A. Das, P. Rosso, S. Bandyopadhyay, M. Choudhury,
     Overview of the mixed script information retrieval (MSIR) at FIRE-2016, in: P. Majumder,
     M. Mitra, P. Mehta, J. Sankhavara, K. Ghosh (Eds.), Working notes of FIRE 2016 - Forum
     for Information Retrieval Evaluation, Kolkata, India, December 7-10, 2016, volume 1737
     of CEUR Workshop Proceedings, CEUR-WS.org, 2016, pp. 94–99. URL: http://ceur-ws.org/
     Vol-1737/T3-1.pdf.
[10] P. Gupta, K. Bali, R. E. Banchs, M. Choudhury, P. Rosso, Query expansion for mixed-
     script information retrieval, in: S. Geva, A. Trotman, P. Bruza, C. L. A. Clarke, K. Järvelin
     (Eds.), The 37th International ACM SIGIR Conference on Research and Development
     in Information Retrieval, SIGIR ’14, Gold Coast , QLD, Australia - July 06 - 11, 2014,
     ACM, 2014, pp. 677–686. URL: https://doi.org/10.1145/2600428.2609622. doi:10.1145/
     2600428.2609622 .
[11] M. Amjad, G. Sidorov, A. Zhila, Data augmentation using machine translation for fake
     news detection in the urdu language, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri,
     C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno,
     J. Odijk, S. Piperidis (Eds.), Proceedings of The 12th Language Resources and Evalua-
     tion Conference, LREC 2020, Marseille, France, May 11-16, 2020, European Language Re-
     sources Association, 2020, pp. 2537–2542. URL: https://aclanthology.org/2020.lrec-1.309/.
[12] M. Amjad, N. Ashraf, A. Zhila, G. Sidorov, A. Zubiaga, A. F. Gelbukh, Threatening
     language detection and target identification in urdu tweets, IEEE Access
     9 (2021) 128302–128313. URL: https://doi.org/10.1109/ACCESS.2021.3112500.
     doi:10.1109/ACCESS.2021.3112500.
[13] P. Mehta, P. Majumder, Optimum parameter selection for K.L.D. based authorship attribu-
     tion in gujarati, in: Sixth International Joint Conference on Natural Language Processing,
     IJCNLP 2013, Nagoya, Japan, October 14-18, 2013, Asian Federation of Natural Language
     Processing / ACL, 2013, pp. 1102–1106. URL: https://aclanthology.org/I13-1155/.
[14] P. Mehta, P. Majumder, Large scale quantitative analysis of three indo-aryan languages, J.
     Quant. Linguistics 23 (2016) 109–132. URL: https://doi.org/10.1080/09296174.2015.1071151.
     doi:10.1080/09296174.2015.1071151 .
[15] P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, P. Mehta, A. Bhattacharya, P. Majumder,
     Overview of the FIRE 2019 AILA track: Artificial intelligence for legal assistance, in:
     P. Mehta, P. Rosso, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2019 - Forum
     for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, volume 2517
     of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 1–12. URL: http://ceur-ws.org/
     Vol-2517/T1-1.pdf.
[16] P. Bhattacharya, P. Mehta, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, FIRE
     2020 AILA track: Artificial intelligence for legal assistance, in: P. Majumder, M. Mitra,
     S. Gangopadhyay, P. Mehta (Eds.), FIRE 2020: Forum for Information Retrieval Evaluation,
     Hyderabad, India, December 16-20, 2020, ACM, 2020, pp. 1–3. URL: https://doi.org/10.
     1145/3441501.3441510. doi:10.1145/3441501.3441510 .
[17] V. Parikh, U. Bhattacharya, P. Mehta, A. Bandyopadhyay, P. Bhattacharya, K. Ghosh,
     S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, AILA 2021: Shared task on artificial in-
     telligence for legal assistance, in: D. Ganguly, S. Gangopadhyay, M. Mitra, P. Majumder
     (Eds.), FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, India, De-
     cember 13 - 17, 2021, ACM, 2021, pp. 12–15. URL: https://doi.org/10.1145/3503162.3506571.
     doi:10.1145/3503162.3506571 .
[18] V. Parikh, V. Mathur, P. Mehta, N. Mittal, P. Majumder, Lawsum: A weakly supervised
     approach for indian legal document summarization, CoRR abs/2110.01188 (2021). URL:
     https://arxiv.org/abs/2110.01188. arXiv:2110.01188.
[19] S. Ghosh, A. Wyner, Identification of rhetorical roles of sentences in indian legal judg-
     ments, in: Legal Knowledge and Information Systems: JURIX 2019: The Thirty-second
     Annual Conference, volume 322, IOS Press, 2019, p. 3.
[20] S. Parashar, N. Mittal, P. Mehta, Casrank: A ranking algorithm for legal statute retrieval,
     Multimedia Tools and Applications (2023) 1–18.
[21] M. Basu, S. Ghosh, K. Ghosh, Overview of the fire 2018 track: Information retrieval from
     microblogs during disasters (irmidis), in: Proceedings of the 10th annual meeting of the
     Forum for Information Retrieval Evaluation, 2018, pp. 1–5.
[22] S. Majumdar, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder,
     Overview of the irse track at fire 2022: Information retrieval in software engineering, in:
     Forum for Information Retrieval Evaluation, ACM, 2022.
[23] S. Satapara, B. Modha, S. Modha, P. Mehta, Fire 2022 ilsum track: Indian language summa-
     rization, in: Proceedings of the 14th Forum for Information Retrieval Evaluation, ACM,
     2022.
[24] S. Satapara, P. Mehta, S. Modha, D. Ganguly, Indian language summarization at fire 2023,
      in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Eval-
      uation, FIRE 2023, Goa, India. December 15-18, 2023, ACM, 2023.
[25] S. Satapara, B. Modha, S. Modha, P. Mehta, Findings of the first shared task on in-
      dian language summarization (ILSUM): approaches, challenges and the path ahead, in:
      K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2022 - Fo-
      rum for Information Retrieval Evaluation, Kolkata, India, December 9-13, 2022, vol-
      ume 3395 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 369–382. URL: https:
     //ceur-ws.org/Vol-3395/T6-1.pdf.
[26] P. Mehta, From extractive to abstractive summarization: A journey, in: H. He, T. Lei,
     W. Roberts (Eds.), Proceedings of the ACL 2016 Student Research Workshop, Berlin, Ger-
      many, August 7-12, 2016, Association for Computational Linguistics, 2016, pp. 100–106.
      URL: https://doi.org/10.18653/v1/P16-3015. doi:10.18653/v1/P16-3015.
[27] P. Mehta, P. Majumder, Effective aggregation of various summarization techniques, Inf.
      Process. Manag. 54 (2018) 145–158. URL: https://doi.org/10.1016/j.ipm.2017.11.002. doi:10.
     1016/j.ipm.2017.11.002 .
[28] S. Modha, P. Majumder, T. Mandl, R. Singla, Design and analysis of microblog-based
      summarization system, Social Network Analysis and Mining 11 (2021) 1–16. URL: https:
     //doi.org/10.1007/s13278-021-00830-3.
[29] S. Sinha, G. N. Jha, An overview of indian language datasets used for text summa-
      rization, CoRR abs/2203.16127 (2022). URL: https://doi.org/10.48550/arXiv.2203.16127.
      doi:10.48550/arXiv.2203.16127 . arXiv:2203.16127.
[30] S. Barve, S. Desai, R. Sardinha, Query-based extractive text summarization for san-
      skrit, in: S. Das, T. Pal, S. Kar, S. C. Satapathy, J. K. Mandal (Eds.), Proceedings of
      the 4th International Conference on Frontiers in Intelligent Computing: Theory and
     Applications, FICTA 2015, Durgapur, India, 16-18 November 2015, volume 404 of Ad-
     vances in Intelligent Systems and Computing, Springer, 2015, pp. 559–568. URL: https:
      //doi.org/10.1007/978-81-322-2695-6_47. doi:10.1007/978-81-322-2695-6_47.
[31] R. R. Chowdhury, M. T. Nayeem, T. T. Mim, M. S. R. Chowdhury, T. Jannat, Unsuper-
     vised abstractive summarization of bengali text documents, in: P. Merlo, J. Tiedemann,
      R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the
     Association for Computational Linguistics: Main Volume, EACL 2021, Online, April 19
     - 23, 2021, Association for Computational Linguistics, 2021, pp. 2612–2619. URL: https:
      //doi.org/10.18653/v1/2021.eacl-main.224. doi:10.18653/v1/2021.eacl-main.224.
[32] J. D’Silva, U. Sharma, Development of a konkani language dataset for automatic text
      summarization and its challenges, International Journal of Engineering Research and
     Technology. International Research Publication House. ISSN (2019) 0974–3154.
[33] V. R. Embar, S. R. Deshpande, A. Vaishnavi, V. Jain, J. S. Kallimani, saramsha-a kannada
      abstractive summarizer, in: 2013 International Conference on Advances in Computing,
      Communications and Informatics (ICACCI), IEEE, 2013, pp. 540–544.
[34] S. Gandotra, B. Arora, Feature selection and extraction for dogri text summarization, in:
      Rising Threats in Expert Applications and Solutions, Springer, 2021, pp. 549–556.
[35] R. Kabeer, S. M. Idicula, Text summarization for malayalam documents - an experience,
      in: International Conference on Data Science & Engineering, ICDSE 2014, Kochi, India,
     August 26-28, 2014, IEEE, 2014, pp. 145–150. URL: https://doi.org/10.1109/ICDSE.2014.
     6974627. doi:10.1109/ICDSE.2014.6974627 .
[36] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language un-
     derstanding with unsupervised learning, 2018. URL: https://openai.com/research/
     language-unsupervised.
[37] OpenAI, Gpt-4 technical report, 2023. arXiv:2303.08774.
[38] R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, T. B. Hashimoto,
     Stanford Alpaca: An Instruction-following LLaMA model, 2023. URL: https://github.com/
     tatsu-lab/stanford_alpaca, publication Title: GitHub repository.
[39] S. Satapara, P. Mehta, D. Ganguly, S. Modha, Fighting fire with fire: Adversarial prompting
      to generate a misinformation detection dataset, 2024. arXiv:2401.04481.