Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

Rahul Tangsali 1,3,*,†, Aabha Pingle 1,3,†, Aditya Vyawahare 1,3,†, Isha Joshi 1,3,† and Raviraj Joshi 2,3

1 SCTR's Pune Institute of Computer Technology, Pune
2 Indian Institute of Technology Madras, Chennai
3 L3Cube, Pune

Forum for Information Retrieval Evaluation, December 9-13, 2022, India
* Corresponding author.
† These authors contributed equally.
Email: rahuul2001@gmail.com (R. Tangsali); aabhapingle@gmail.com (A. Pingle); aditya.vyawahare07@gmail.com (A. Vyawahare); joshiishaa@gmail.com (I. Joshi); ravirajoshi@gmail.com (R. Joshi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This paper summarizes the work of our team, Next Gen NLP, submitted to ILSUM 2022, a shared task co-located with FIRE 2022 and focused on Indian language summarization of news articles. The shared task was to carry out either extractive or abstractive summarization of news articles written in Indian English, Hindi, and Gujarati. We achieved rank 2 in English, rank 4 in Hindi, and rank 3 in Gujarati in the validation phase of the competition, and rank 3 in English and rank 4 in both Hindi and Gujarati in the test phase. We experimented with pre-trained models and fine-tuned them on the ILSUM 2022 datasets. The fine-tuned SoTA PEGASUS model worked best for English, the fine-tuned IndicBART model with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. The generated inferences were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.

Keywords
Extractive text summarization, Indian Languages, NLP, Pretrained models

1. Introduction

Text summarization is a trending research domain that has gained popularity with a plethora of emerging use cases seeking its application [1, 2]. The last few decades have witnessed tremendous growth in NLP research, especially in text summarization. Text summarization has applications in a wide range of domains, including medicine, politics, and news. With the massive influx of news data in the form of newspaper articles, digital media, social media platforms, and so on, there is a need to automate the news summarization process so that useful insights can be obtained far faster than by employing human workers for the same task. Effective summarization approaches investigated recently have hastened the process and made their mark in the NLP research community by achieving state-of-the-art (SoTA) accuracies.

Three distinct types of text summarization techniques exist: extractive, abstractive, and hybrid. In extractive text summarization, key sentences and phrases are picked from the original document and integrated to generate the final summary [3]. This summarization technique is easier to perform, but it may overlook the text's overall context or omit some essential information; summaries of this type are helpful for taking notes. Abstractive summarization analyses the full text and generates a summary based on the fundamental concepts of the text [4]. This summary is written in wording that differs entirely from the original text.
Unlike the extractive summarization methodology, sentences from the original text are not picked up directly. Abstractive summarization provides an intelligently curated summary using phrases that are not native to the input text. However, even with deep learning methodologies, preparing abstractive summaries can be difficult and time-consuming when human judgment is involved. The hybrid text summarization approach utilizes both extractive and abstractive methods to generate the final summary [5, 6].

With the emergence of NLP research worldwide, research on text summarization has been conducted both in high-resource languages such as English and on texts written in languages of the Indian subcontinent. Hindi and Gujarati are two of the most spoken Indian languages. Hindi is the most spoken language in India; it is the official language in 9 states and 3 union territories and an additional official language in 3 other states across the country. Hindi is also one of the 22 scheduled languages of the Republic of India. It is spoken by approximately 615 million people worldwide and was recorded as the third most spoken language in the world as of 2019. Gujarati is an Indo-Aryan language spoken predominantly by the Gujarati people in the Indian state of Gujarat. It is the sixth most spoken language in India and is spoken by around 55 million people worldwide. Although Hindi and Gujarati are spoken by a considerable share of the world's population, NLP research in these languages has lagged behind that in high-resource languages.

Text summarization research stretches back to 1958, when the first paper on the subject was published [7]. Since then, various methodologies have been presented for both abstractive and extractive text summarization in English, including statistical, clustering-based, graph-based, semantic-based, machine learning, and deep learning-based approaches. Deep learning-based approaches, which focus on training neural networks, include work done by Mohsen et al. [8], Xu [9], Alami et al. [10], and Anand and Wagh [11]. In addition, encoder-decoder models have been proposed, with attention mechanisms incorporated in several of these methodologies.

In comparison to English, less research has been done on text summarization in Hindi and Gujarati. There is a significant shortage of dataset resources, preprocessing methodologies, and other research for many Indian languages, especially Gujarati, compared to English. This motivated us to develop system pipelines that perform efficient extractive summarization for articles written in Hindi and Gujarati and achieve decent accuracy for the generated summaries. Many organizations are extending their services to Indian-language speakers, and we aim to address a small part of this challenge by performing summarization research in two of the most widely spoken languages in India.

For this shared task [12], we implement pre-trained models [13] and tweak the conventional pipelines, along with fine-tuning on new data, to obtain better results than previously implemented systems. For English, we implement the PEGASUS [14], BRIO [15], and T5 [16] models and also leverage the SentenceBERT model for extractive summarization [17, 18].
For Hindi, we fine-tuned IndicBART [19] with a right-shift operation (augmenting the original dataset by shifting the last sentence of each article to the top), as well as the XL-Sum [20] and mBART [21] models. For Gujarati, we implemented extractive summarization by translating each sentence of a Gujarati article to English, creating a mapping between the Gujarati sentences and their English translations, and applying the PEGASUS model fine-tuned for English to the resultant English article to generate an English summary. The generated extractive summary in English is then translated back to Gujarati through a back-mapping mechanism to obtain the final Gujarati summary. We also fine-tuned XL-Sum and mBART models for Gujarati article summarization.

2. Related Work

Text summarization research dates back to 1958, when the first article on the topic [7] was published. Since then, numerous rule-based and deep learning-based techniques have been presented. Rule-based approaches include work done by Baxendale [22], which selects sentences for a summary based on word position and the heading of the article, and that by Oliveira in 2016 [23], which used scoring criteria such as lexical similarity, sentence centrality, text rank, and so on. Research on deep learning approaches for text summarization picked up pace when encoder-decoder [24] and attention-based architectures [25] were proposed. Yu [26] suggested methods for creating one-sentence summaries of news stories using recurrent neural network models such as LSTM [27] and GRU [28], with and without attention. In recent years, fine-tuning pre-trained models on domain-specific datasets has been the dominant paradigm in text summarization research. Pre-trained models implementing the BART [29], T5 [16], and similar architectures have been proposed and are available in the Hugging Face library. Recent research includes an importance-based ordering approach by Zhao et al., a cascade approach to abstractive summarization with content selection and fusion proposed by Lebanoff et al. [30], and the use of prompt-based models such as GPT-3 [31], PaLM [32], and T0 [33]. Articles considered for summarization can often be multi-document in nature: Wang et al. [34] suggested a task-specific architecture for multi-document summarization that combines numerous texts into a single graph, and Zhong et al. [35] implemented a semantic-based framework for the same task.

In the case of Hindi and Gujarati, there has been relatively little research on text summarization. K. Vimal Kumar et al. [36] suggested a graph-based method for summarising Hindi text. Gulati et al. [37] developed a novel fuzzy inference method for summarising multi-source Hindi literature. Gupta et al. [38] suggested a rule-based method for Hindi that included dead phrase and deadwood reduction strategies. Jain et al. [39] presented a real-coded genetic algorithm for Hindi text summarization. For Gujarati, Shah and Patel suggested Gujarati Text Summarizer, which uses Textblob1 and Gensim2 to construct summaries from Gujarati text. Patel examines the preprocessing phase for text summarization of Gujarati texts, emphasizing related issues and appropriate solutions [40].

1 https://textblob.readthedocs.io/en/dev/
2 https://radimrehurek.com/gensim/
3. Dataset Description

The datasets used in our research were provided by the ILSUM Shared Task organizers. The datasets were organized in CSV format, with multiple columns describing each record, and were built using article and headline pairs from several leading newspapers across India. The columns in the CSV files were "id", the unique identifier of the article; "Link", the hyperlink from which the article was extracted; "Heading", the heading/title of the article; "Article", the actual content of the article; and "Summary", the gold extractive summary of the article. Each article consisted, on average, of about 9 to 10 sentences, and the extractive summaries were, on average, a single sentence long. The validation and test CSV files contained only two columns, "id" and "Article", and the task was to generate the summary of the text in the "Article" column. The dataset content was raw, with unnecessary punctuation and delimiters hindering the proposed pipeline, which created a need for efficient data cleaning. Table 1 lists the contents of the training, validation, and test datasets in terms of the number of records in each set.

Table 1
Details of provided dataset for ILSUM Shared Task 2022

             English   Hindi   Gujarati
Train          12565    7958       8731
Validation       898     569        606
Test            4487    2842       3020

4. Data Preparation

The dataset provided by the organizers was fairly raw, with redundant punctuation and delimiters in the content. It was therefore necessary to remove these so that the cleaned data could be tokenized and passed to the model. In addition, we removed stopwords present in the text [41] to reduce model sensitivity to uninformative tokens, and we converted the text to lowercase to generalize the model's perception of the text. Of the five columns present in the CSV file, the "id", "Link", and "Heading" columns were redundant for our purposes, so we filtered them out and trained the models only on the articles and their corresponding extractive summaries.

We use the SentencePiece3 tokenizer for tokenizing the English, Hindi, and Gujarati article texts. SentencePiece is an unsupervised text tokenizer and detokenizer intended specifically for neural network-based text generation systems, with a vocabulary size fixed before neural model training. It extends direct training from raw sentences to incorporate subword units (e.g., byte-pair encoding (BPE) [42]) and the unigram language model. This tokenizer can be invoked implicitly through the Hugging Face API4 during model fine-tuning, so that both tokenization and detokenization are carried out without explicit code. First, a vocabulary of the common words across all articles is created and then used to convert the text to a vectorized format. Additionally, we pad to the maximum sequence length in the batch so that only sequences of uniform length are passed to the model [43].

For the translation+mapping-based approach that we implement as one of the approaches for Gujarati, we first split the article into sentences using the full stop as a delimiter. Each sentence is then translated to English using the Google Translate API5, and a mapping is created between the original Gujarati sentence and the translated English sentence. Finally, the English sentences of each article are concatenated to obtain the translated article text in English.

3 https://github.com/google/sentencepiece
4 https://huggingface.co/
5 https://cloud.google.com/translate/
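As an illustration (not the exact code used in our experiments), the following Python sketch shows the cleaning and translation+mapping steps described above. The function names, the stopword list, and the cleaning rule are illustrative choices; the GoogleTranslator wrapper comes from the deep-translator library cited in Section 5.3.1.

```python
# A minimal sketch of the cleaning and translation+mapping steps described above.
# Function names, the stopword list, and the cleaning regex are illustrative.
import re
from deep_translator import GoogleTranslator

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in"}  # placeholder stopword list

def clean_text(text: str) -> str:
    """Lowercase, strip redundant punctuation/delimiters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s.]", " ", text)  # keep word characters, whitespace, and full stops
    return " ".join(t for t in text.split() if t not in STOPWORDS)

def translate_with_mapping(gujarati_article: str):
    """Split on full stops, translate each sentence to English, and store the mapping."""
    sentences = [s.strip() for s in gujarati_article.split(".") if s.strip()]
    translator = GoogleTranslator(source="gu", target="en")
    mapping = {}              # English sentence -> original Gujarati sentence
    english_sentences = []
    for sentence in sentences:
        english = translator.translate(sentence)
        mapping[english] = sentence
        english_sentences.append(english)
    # Concatenating the translated sentences gives the English version of the article.
    return ". ".join(english_sentences) + ".", mapping
```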
5. Systems implemented

5.1. For English

5.1.1. Fine-tuning PEGASUS

PEGASUS stands for "Pre-training with Extracted Gap sentences for Abstractive Summarization", presented at the 2020 International Conference on Machine Learning by Zhang et al. [44]. By masking whole sentences from the text and then generating the gap sentences, the PEGASUS model yields a pseudo-summary of the input text. The model picks sentences that are important to the document and removes or masks them from the input. It is then tasked with recovering those vital sentences, which it does by generating them as a single output sequence from the document's remaining, non-essential parts. The advantage of this technique is its self-supervision: the model can generate as many training instances as there are documents, without the need for human annotation, which is often a bottleneck in fully supervised systems.

We fine-tuned the "pegasus-large" model6 available on Hugging Face with the training dataset for English. This model is pre-trained on 350 million web pages and 1.5 billion news articles, making its accuracy state-of-the-art in text summarization research. The Hugging Face transformers library was used for fine-tuning, which simplified the implementation. Since the training data was large enough, we fine-tuned the model for 1 epoch with a weight decay of 0.01, which took about 3.5 hours. The resulting inferences showed a significant increase in ROUGE scores compared to those obtained with the pre-trained-only version of the model.

To further increase the ROUGE scores, we experimented with the max-tokens parameter during inference generation, which controls the maximum length of the generated summary. The organizers had specified a standard value of 75 for this parameter. We experimented with a range of max-tokens values around this number and found max-tokens=65 to be the ideal value for the highest ROUGE scores. We also experimented with augmenting the dataset by adding noise to each record so that the model could predict better despite noisy text; however, the resulting ROUGE scores did not surpass the highest score we had already obtained.

6 https://huggingface.co/google/pegasus-large

5.1.2. Fine-tuning BRIO

BRIO stands for "Bringing Order to Abstractive Summarization", presented in 2022 by Liu et al. [15]. Maximum Likelihood Estimation (MLE) [45] is often used to train summarization models. MLE presupposes that an ideal model allocates the full probability mass to the reference summary, which may result in poor performance when a model must compare numerous candidates that differ from the reference. Instead of relying on MLE training alone, BRIO adds a contrastive learning component, enabling abstractive models to more precisely assess the likelihood of system-generated summaries.

We fine-tuned the "Yale-LILY/brio-cnndm-uncased" version7 of the BRIO model available on Hugging Face on the English dataset. Since BRIO is an extension of the BART model, we applied BART-based tokenization to the input text, which uses SentencePiece internally. We fine-tuned the model on the English dataset for 1 epoch with a weight decay of 0.01 and also experimented with adding noisy text to each training record. The model's performance, however, was not as good as that of the fine-tuned PEGASUS model described above.

7 https://huggingface.co/Yale-LILY/brio-cnndm-uncased
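As an illustration of the fine-tuning recipe described above for PEGASUS (the BRIO run follows the same pattern with its own checkpoint), the sketch below uses the Hugging Face Seq2SeqTrainer. The epoch count, weight decay, and the max-tokens value of 65 follow the text; the CSV path, batch size, maximum input length, and beam size are illustrative assumptions, and recent versions of the transformers and datasets libraries are assumed.

```python
# Sketch of the fine-tuning setup for "google/pegasus-large" on the English ILSUM data.
# The CSV file name below is a placeholder; the ILSUM files provide "Article" and "Summary" columns.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/pegasus-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

train_data = load_dataset("csv", data_files="ilsum2022_train_english.csv")["train"]

def preprocess(batch):
    model_inputs = tokenizer(batch["Article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["Summary"], max_length=75, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = train_data.map(preprocess, batched=True, remove_columns=train_data.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="pegasus-ilsum-english",
    num_train_epochs=1,             # as stated in Section 5.1.1
    weight_decay=0.01,              # as stated in Section 5.1.1
    per_device_train_batch_size=2,  # illustrative
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()

# Inference: a maximum generation length of about 65 tokens gave the best English ROUGE scores.
article = train_data[0]["Article"]
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=65, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```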
5.1.3. Leveraging SentenceBERT for extractive summarization

This approach is a tweaked implementation derived from the paper "Fine-tune BERT for Extractive Summarization" presented by Liu in 2019 [46]. Here, extractive summarization is approached as a classification problem: a score between 0 and 1 is predicted for each sentence in a text, indicating whether or not it belongs to the summary. The algorithm then creates a summary by picking the sentences with the highest scores.

We extract sentences using the spaCy8 library for each article in the training dataset. For every sentence in each training example, we assign a label of 1 if it belongs to the final extractive summary, and 0 otherwise. The original dataset was unbalanced, as most sentences are unlikely to be in the summary, so we augmented it with new examples that balanced the positive and negative classes. This annotated data, along with the labels, constitutes the input to our BERT model. We fine-tuned the "sentence-transformers/all-mpnet-base-v2" model9, since it proved to be the fastest among the models available in the sentence-transformers library. We set the training batch size to 4, the maximum sequence length to 512, and the learning rate to 0.00001. We fine-tuned the SentenceBERT model for 3 epochs, which took approximately 4.5 hours. The original pre-trained BERT model is modified by adding dropout and a dense layer on top to produce the final output label. Finally, we obtain the inferences by taking the two sentences with the highest scores from the BERT model, which gave an average summary length of around 70, and we add only those sentences whose length exceeds 25 characters.

8 https://spacy.io/
9 https://huggingface.co/sentence-transformers/all-mpnet-base-v2

5.1.4. Fine-tuning T5

The Text-to-Text Transfer Transformer (T5) paradigm recasts all NLP tasks into a single text-to-text format with text strings as input and output. During T5 pre-training, the original text of the input and output pairs is modified by introducing noise. We fine-tuned the "mrm8488/t5-base-finetuned-summarize-news" version10 of the T5 model, which had previously been trained on 4515 English news articles. We applied T5 tokenization to our dataset and fine-tuned the model for 20 epochs. The maximum length of the summary during the inference phase was set to 75.

10 https://huggingface.co/mrm8488/t5-base-finetuned-summarize-news

5.2. For Hindi

5.2.1. Fine-tuning IndicBART

We used IndicBART [19], a multilingual sequence-to-sequence pre-trained model. The model focuses primarily on Indic languages and also supports English. IndicBART is based on the mBART architecture, provides support for 11 Indian languages, and can be used to build natural language generation applications for tasks such as machine translation and summarization. We fine-tuned the "ai4bharat/IndicBART" version11 of IndicBART available on Hugging Face on the training dataset for Hindi. The training data was augmented by adding noise to each record of the dataset, and the model gave better results after training on the augmented data. The model was fine-tuned for 2 epochs. We also experimented with the maximum length parameter while generating the inferences; inferences obtained with "max length" set to 60 gave the best ROUGE scores.

11 https://huggingface.co/ai4bharat/IndicBART
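The augmentation referred to in the Introduction and in Section 7 as the right-shift operation moves the last sentence of an article to the top to create an additional training record. The sketch below illustrates this; since the exact noise-injection scheme used in the other augmented runs is not detailed in the text, the word-dropping function is only one plausible example, and the sentence delimiter is assumed to be the Devanagari danda for Hindi.

```python
# Illustrative sketch of the "right-shift" augmentation: the last sentence of an
# article is moved to the front to create an additional training record.
import random

SENT_DELIM = "।"   # assumed sentence delimiter for Hindi articles

def right_shift(article: str) -> str:
    """Return a copy of the article with its last sentence moved to the top."""
    sentences = [s.strip() for s in article.split(SENT_DELIM) if s.strip()]
    if len(sentences) < 2:
        return article
    shifted = [sentences[-1]] + sentences[:-1]
    return (SENT_DELIM + " ").join(shifted) + SENT_DELIM

def drop_random_word(article: str, seed: int = 0) -> str:
    """Simple noise: delete one randomly chosen word (illustrative only)."""
    rng = random.Random(seed)
    words = article.split()
    if len(words) > 1:
        words.pop(rng.randrange(len(words)))
    return " ".join(words)

def augment(records):
    """records: list of {"Article": ..., "Summary": ...} dicts; returns originals plus shifted copies."""
    augmented = []
    for rec in records:
        augmented.append(rec)
        augmented.append({"Article": right_shift(rec["Article"]), "Summary": rec["Summary"]})
    return augmented
```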
5.2.2. Fine-tuning XL-Sum

The paper "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" [20] presents a multilingual dataset along with an mT5 [47] checkpoint fine-tuned on it and reports experiments on multilingual and low-resource summarization tasks. The checkpoint was fine-tuned on the 45 languages of the XL-Sum dataset. We used the "csebuetnlp/mT5_multilingual_XLSum" checkpoint12 available on Hugging Face for our summarization task. To get the best results, we fine-tuned this checkpoint on the given Hindi training dataset for 2 epochs. This method gave ROUGE scores comparable to the IndicBART scores.

12 https://huggingface.co/csebuetnlp/mT5_multilingual_XLSum

5.2.3. Fine-tuning mBART

Pre-trained on multilingual corpora containing 25 languages, mBART (Multilingual Denoising Pre-training for Neural Machine Translation) [21] can be used for a wide range of tasks, including machine translation and summarization. We used the "facebook/mbart-large-cc25"13, "GiordanoB/mbart-large-50-finetuned-summarization-V2"14, and "ARTeLab/mbart-summarization-mlsum"15 pre-trained models on the dataset. The results obtained differed only slightly; however, the "facebook/mbart-large-cc25" model gave us the best ROUGE scores, so we fine-tuned that model on the dataset for 1 epoch.

13 https://huggingface.co/facebook/mbart-large-cc25
14 https://huggingface.co/GiordanoB/mbart-large-50-finetuned-summarization-V2
15 https://huggingface.co/ARTeLab/mbart-summarization-mlsum

5.3. For Gujarati

5.3.1. Translation+Mapping+PEGASUS

We implemented the PEGASUS model for Gujarati by fine-tuning the "pegasus-large" model available on Hugging Face. As this model was not originally trained for Gujarati, we added translation and mapping steps in order to generate inferences on our Gujarati dataset. First, we translated the Gujarati validation dataset to English and simultaneously stored, for each article, the mapping between each English-translated sentence and its Gujarati source sentence in a dictionary. For translation, we used the GoogleTranslator module provided by the deep-translator16 library. We then generated inferences on the English-translated validation dataset using the PEGASUS model fine-tuned for English, with the max-tokens parameter initially set to 75. Finally, the generated inferences were back-mapped to recover the original Gujarati sentences. As the provided dataset was extractive, we performed the mapping and back-mapping steps mainly to keep the summaries extractive in nature. Note that translation was applied only once, and the original Gujarati text was retrieved using the mapping built during the Gujarati-to-English translation step. To further increase the ROUGE scores, we experimented with the max-tokens parameter of the model. We observed that the English-translated sentences were longer than the original Gujarati sentences; we therefore tested larger values and found that max-tokens set to 85 provided the highest ROUGE scores.

16 https://github.com/nidhaloff/deep-translator
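The following sketch illustrates the generation and back-mapping steps of this pipeline. It reuses the illustrative translate_with_mapping helper from Section 4; since the procedure for matching generated English sentences back to their Gujarati sources is not detailed here, the nearest-match lookup via difflib is only one plausible implementation, and in practice the fine-tuned English PEGASUS checkpoint would be loaded in place of the base model.

```python
# Sketch of the back-mapping step in the Translation+Mapping+PEGASUS pipeline.
# translate_with_mapping is the illustrative helper defined in the Section 4 sketch.
import difflib
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")   # fine-tuned English checkpoint in practice
model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-large")

def summarize_gujarati(gujarati_article: str) -> str:
    english_article, mapping = translate_with_mapping(gujarati_article)

    # Generate the English summary; max_length=85 worked best for the longer translated sentences.
    inputs = tokenizer(english_article, truncation=True, return_tensors="pt")
    ids = model.generate(**inputs, max_length=85, num_beams=4)
    english_summary = tokenizer.decode(ids[0], skip_special_tokens=True)

    # Back-map each generated English sentence to its closest stored source sentence
    # and emit the corresponding original Gujarati sentence (one plausible matching scheme).
    source_sentences = list(mapping.keys())
    gujarati_summary = []
    for sent in english_summary.split("."):
        sent = sent.strip()
        if not sent:
            continue
        match = difflib.get_close_matches(sent, source_sentences, n=1, cutoff=0.0)
        if match:
            gujarati_summary.append(mapping[match[0]])
    return ". ".join(gujarati_summary) + "."
```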
5.3.2. Fine-tuning mBART

For this approach, we used the "facebook/mbart-large-cc25"17 model. After applying the mBART tokenizer to the given Gujarati dataset, we fine-tuned the model for one epoch. This methodology gave us competent ROUGE scores, and we further improved our results by augmenting the dataset, adding noise to each record to create a new record so that the model could predict better. The ROUGE scores obtained after fine-tuning the mBART model on this augmented dataset were comparable to those of the Translation+Mapping+PEGASUS model.

17 https://huggingface.co/facebook/mbart-large-cc25

5.3.3. Fine-tuning XL-Sum

We used the XL-Sum model, an mT5 model fine-tuned on the multilingual XL-Sum dataset, via the "csebuetnlp/mT5_multilingual_XLSum" checkpoint available on Hugging Face, to generate inferences on the Gujarati dataset. The model was trained for 5 epochs with the max-tokens parameter set to 75.

6. Evaluation Metrics

In our study, the ROUGE score, which stands for Recall-Oriented Understudy for Gisting Evaluation, was chosen as the evaluation metric [48]. We recorded ROUGE-1, ROUGE-2, and ROUGE-4 scores for our summaries. ROUGE-1 measures the unigram overlap between the candidate and reference summaries, whereas ROUGE-2 measures their bigram similarity. All ROUGE scores lie between 0 and 1, with scores closer to one indicating closer agreement with the gold summaries.

7. Results

Tables 2, 3, and 4 describe the results obtained by our approaches on the validation data, and Table 5 describes the results obtained by the models we submitted during the test phase. We evaluated the performance of the models using ROUGE scores as the evaluation metrics.

Table 2
Results obtained in validation phase (English)

Approach Implemented                         ROUGE-1   ROUGE-2   ROUGE-4
Fine-tuned PEGASUS                            0.5618    0.4509    0.4218
Fine-tuned BRIO                               0.4878    0.3723    0.3383
SentenceBERT leveraged for summarization      0.4639    0.3421    0.3156
Fine-tuned T5                                 0.4851    0.3588    0.3226

Table 3
Results obtained in validation phase (Hindi)

Approach Implemented     ROUGE-1   ROUGE-2   ROUGE-4
Fine-tuned IndicBART      0.5536    0.4572    0.4162
Fine-tuned XL-Sum         0.5281    0.4098    0.337
Fine-tuned mBART          0.5269    0.4271    0.3806

Table 4
Results obtained in validation phase (Gujarati)

Approach Implemented              ROUGE-1   ROUGE-2   ROUGE-4
Translation+Mapping+PEGASUS        0.2028    0.1155    0.0835
Fine-tuned mBART                   0.1924    0.1095    0.0723
Fine-tuned XL-Sum                  0.1718    0.0718    0.0361

Table 5
Test phase results for submitted models

Language   Model submitted for test phase    ROUGE-1   ROUGE-2   ROUGE-4
English    Fine-tuned PEGASUS                 0.5568    0.4430    0.4123
Hindi      Fine-tuned IndicBART               0.5559    0.4547    0.4136
Gujarati   Translation+Mapping+PEGASUS        0.2087    0.1192    0.0838

The best performing approaches were the fine-tuned PEGASUS model with max-tokens set to 65 for English, the IndicBART model with the right-shift operation for Hindi, and the Translation+Mapping+PEGASUS approach for Gujarati. We achieved our best accuracies with these approaches on both the validation and test datasets. In the validation round of the competition, we were ranked second in English, fourth in Hindi, and third in Gujarati. In the test phase, we were ranked third in English and fourth in Hindi and Gujarati.
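As an illustration, ROUGE-1, ROUGE-2, and ROUGE-4 scores of the kind reported in Tables 2 to 7 can be computed with the rouge-score package; the official ILSUM evaluation may differ in details such as tokenization and aggregation, so this is only a sketch.

```python
# Illustrative computation of ROUGE-1/2/4 for one candidate/reference pair (toy example).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rouge4"], use_stemmer=False)

reference = "india reported a sharp rise in monsoon rainfall this week ."   # gold summary (toy)
candidate = "monsoon rainfall rose sharply across india this week ."        # system summary (toy)

scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.4f} recall={score.recall:.4f} f1={score.fmeasure:.4f}")
```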
8. Competition Results

Tables 6 and 7 show the best scores obtained in the ILSUM shared task for each language and the teams that obtained them. Since our research was motivated by a shared task, other teams also competed to achieve the best results for English, Hindi, and Gujarati. This section is intended to give readers a comparison between the findings of our study and the top results of the shared task.

Table 6
Best scores obtained by teams in the validation phase

Language   Best performing team   ROUGE-1   ROUGE-2   ROUGE-4
English    MT-NLP IIIT-H           0.5685    0.4592    0.4335
Hindi      HakunaMatata            0.6104    0.5152    0.4755
Gujarati   MT-NLP IIIT-H           0.2620    0.1644    0.1216

Table 7
Best scores obtained by teams in the test phase

Language   Best performing team   ROUGE-1   ROUGE-2   ROUGE-4
English    MT-NLP IIIT-H           0.5583    0.4458    0.4180
Hindi      MT-NLP IIIT-H           0.6072    0.5102    0.4711
Gujarati   MT-NLP IIIT-H           0.2611    0.1651    0.1241

9. Conclusion and Future Work

We have presented the findings of our work for the ILSUM shared task co-located with FIRE 2022 [49]. We experimented with text summarization of news articles written in English, Hindi, and Gujarati, implementing pre-trained models and performing data manipulation operations in some of the approaches. Finally, we evaluated ROUGE scores on the inferences obtained from each system we trained. Our best-performing models achieved decent accuracy, with scores very close to SoTA results. We conclude from this analysis that there is considerable scope for improvement in research on low-resource Indian languages, such as Gujarati, compared to English. The research foundation for text summarization in English is robust, with many pre-trained models and attention-based mechanisms one can leverage; this foundation has to be scaled up drastically in the coming years for Hindi and Gujarati. In the future, we plan to extend our work to larger datasets, especially for Hindi and Gujarati, as we believe that clean and well-formatted datasets are one of the significant barriers causing the gap between text summarization research in English and in low-resource Indian languages. Furthermore, we plan to run our approaches on high-end GPUs and use better preprocessing and tokenization techniques to shorten this research gap.

10. Acknowledgements

This research was accomplished as part of the L3Cube Pune mentoring program. We convey our gratitude to our L3Cube mentors for their continuous assistance and encouragement.

References

[1] A. Vhatkar, P. Bhattacharyya, K. Arya, Survey on text summarization, 2020. [2] V. Dehru, P. Tiwari, G. Aggarwal, B. Joshi, P. Kartik, Text summarization techniques and applications, IOP Conference Series: Materials Science and Engineering 1099 (2021) 012042. doi:10.1088/1757-899X/1099/1/012042. [3] N. Moratanch, C. Gopalan, A survey on extractive text summarization, 2017, pp. 1–6. doi:10.1109/ICCCSP.2017.7944061. [4] N. Moratanch, C. Gopalan, A survey on abstractive text summarization, 2016, pp. 1–7. doi:10.1109/ICCPCT.2016.7530193. [5] M. Kirmani, N. Hakak, M. mohd, M. Mohd, Hybrid Text Summarization: A Survey: Proceedings of SoCTA 2017, 2019, pp. 63–73. doi:10.1007/978-981-13-0589-4_7. [6] D. Sahoo, A. Bhoi, R. C. Balabantaray, Hybrid approach to abstractive summarization, Procedia Computer Science 132 (2018) 1228–1237. URL: https://www.sciencedirect.com/science/article/pii/S1877050918307701. doi:https://doi.org/10.1016/j.procs.2018.05.038, international Conference on Computational Intelligence and Data Science. [7] H. P. Luhn, The automatic creation of literature abstracts, IBM Journal of Research and Development 2 (1958) 159–165. doi:10.1147/rd.22.0159. [8] F. Mohsen, J. Wang, K. Al-Sabahi, A hierarchical self-attentive neural extractive summarizer via reinforcement learning (hsasrl), Applied Intelligence 50 (2020) 2633–2646. [9] J. Xu, G.
Durrett, Neural extractive text summarization with syntactic compression, arXiv preprint arXiv:1902.00863 (2019). [10] N. Alami, M. Meknassi, N. En-nahnahi, Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning, Expert systems with applications 123 (2019) 195–211. [11] D. Anand, R. Wagh, Effective deep learning approaches for summarization of legal texts, Journal of King Saud University-Computer and Information Sciences (2019). [12] S. Satapara, B. Modha, S. Modha, P. Mehta, Fire 2022 ilsum track: Indian language summarization, in: Proceedings of the 14th Forum for Information Retrieval Evaluation, ACM, 2022. [13] X. Han, Z. Zhang, N. Ding, Y. Gu, X. Liu, Y. Huo, J. Qiu, Y. Yao, A. Zhang, L. Zhang, W. Han, M. Huang, Q. Jin, Y. Lan, Y. Liu, Z. Liu, Z. Lu, X. Qiu, R. Song, J. Tang, J.-R. Wen, J. Yuan, W. X. Zhao, J. Zhu, Pre-trained models: Past, present and future, AI Open 2 (2021) 225–250. URL: https://www.sciencedirect.com/science/article/pii/S2666651021000231. doi:https: //doi.org/10.1016/j.aiopen.2021.08.002. [14] J. Zhang, Y. Zhao, M. Saleh, P. J. Liu, Pegasus: Pre-training with extracted gap-sentences for abstractive summarization, in: Proceedings of the 37th International Conference on Machine Learning, ICML’20, JMLR.org, 2020. [15] Y. Liu, P. Liu, D. Radev, G. Neubig, BRIO: Bringing order to abstractive summarization, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 2890–2903. URL: https://aclanthology.org/2022.acl-long.207. doi:10.18653/v1/2022. acl-long.207. [16] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al., Exploring the limits of transfer learning with a unified text-to-text transformer., J. Mach. Learn. Res. 21 (2020) 1–67. [17] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, arXiv preprint arXiv:1908.10084 (2019). [18] Y. Liu, Fine-tune bert for extractive summarization, arXiv preprint arXiv:1903.10318 (2019). [19] R. Dabre, H. Shrotriya, A. Kunchukuttan, R. Puduppully, M. Khapra, P. Kumar, IndicBART: A pre-trained model for indic natural language generation, in: Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 1849–1863. URL: https://aclanthology.org/2022.findings-acl.145. doi:10.18653/v1/2022.findings-acl.145. [20] T. Hasan, A. Bhattacharjee, M. S. Islam, K. Mubasshir, Y.-F. Li, Y.-B. Kang, M. S. Rah- man, R. Shahriyar, XL-sum: Large-scale multilingual abstractive summarization for 44 languages, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, Online, 2021, pp. 4693–4703. URL: https:// aclanthology.org/2021.findings-acl.413. doi:10.18653/v1/2021.findings-acl.413. [21] Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, L. Zettlemoyer, Multilingual denoising pre-training for neural machine translation, Transactions of the Association for Computational Linguistics 8 (2020) 726–742. [22] P. B. Baxendale, Machine-made index for technical literature—an experiment, IBM Journal of Research and Development 2 (1958) 354–361. doi:10.1147/rd.24.0354. [23] H. Oliveira, R. Ferreira, R. Lima, R. D. Lins, F. Freitas, M. Riss, S. J. 
Simske, Assessing shallow sentence scoring techniques and combinations for single and multi-document summarization, Expert Syst. Appl. 65 (2016) 68–86. URL: https://doi.org/10.1016/j.eswa. 2016.08.030. doi:10.1016/j.eswa.2016.08.030. [24] K. Aitken, V. V. Ramasesh, Y. Cao, N. Maheswaranathan, Understanding how encoder- decoder architectures attend, 2021. URL: https://arxiv.org/abs/2110.15253. doi:10.48550/ ARXIV.2110.15253. [25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/ paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf. [26] H. Yu, Summarization with attention-based deep recurrent neural networks, 2017. [27] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (1997) 1735–80. doi:10.1162/neco.1997.9.8.1735. [28] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neu- ral networks on sequence modeling, 2014. URL: https://arxiv.org/abs/1412.3555. doi:10. 48550/ARXIV.1412.3555. [29] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettle- moyer, BART: Denoising sequence-to-sequence pre-training for natural language gen- eration, translation, and comprehension, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Lin- guistics, Online, 2020, pp. 7871–7880. URL: https://aclanthology.org/2020.acl-main.703. doi:10.18653/v1/2020.acl-main.703. [30] L. Lebanoff, F. Dernoncourt, D. S. Kim, W. Chang, F. Liu, A cascade approach to neural abstractive summarization with content selection and fusion, in: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Association for Computational Linguistics, Suzhou, China, 2020, pp. 529–535. URL: https://aclanthology. org/2020.aacl-main.52. [31] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 1877–1901. URL: https://proceedings.neurips.cc/paper/2020/file/ 1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf. [32] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. Garcia, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Diaz, O. Firat, M. 
Catasta, J. Wei, K. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, N. Fiedel, Palm: Scaling language modeling with pathways, 2022. URL: https://arxiv.org/abs/2204.02311. doi:10.48550/ARXIV.2204.02311. [33] V. Sanh, A. Webson, C. Raffel, S. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, A. Raja, M. Dey, M. S. Bari, C. Xu, U. Thakker, S. S. Sharma, E. Szczechla, T. Kim, G. Chh- ablani, N. Nayak, D. Datta, J. Chang, M. T.-J. Jiang, H. Wang, M. Manica, S. Shen, Z. X. Yong, H. Pandey, R. Bawden, T. Wang, T. Neeraj, J. Rozen, A. Sharma, A. Santilli, T. Fevry, J. A. Fries, R. Teehan, T. L. Scao, S. Biderman, L. Gao, T. Wolf, A. M. Rush, Multitask prompted training enables zero-shot task generalization, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=9Vrb9D0WI4. [34] D. Wang, P. Liu, Y. Zheng, X. Qiu, X. Huang, Heterogeneous graph neural networks for extractive document summarization, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 6209–6219. URL: https://aclanthology.org/2020.acl-main.553. doi:10. 18653/v1/2020.acl-main.553. [35] M. Zhong, P. Liu, Y. Chen, D. Wang, X. Qiu, X. Huang, Extractive summarization as text matching, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 6197–6208. URL: https://aclanthology.org/2020.acl-main.552. doi:10.18653/v1/2020.acl-main.552. [36] K. V. Kumar, D. Yadav, A. Sharma, Graph based technique for hindi text summarization, in: J. K. Mandal, S. C. Satapathy, M. Kumar Sanyal, P. P. Sarkar, A. Mukhopadhyay (Eds.), Information Systems Design and Intelligent Applications, Springer India, New Delhi, 2015, pp. 301–310. [37] A. N. Gulati, S. D. Sawarkar, A novel technique for multidocument hindi text summarization, 2017 International Conference on Nascent Technologies in Engineering (ICNTE) (2017) 1–6. [38] M. Gupta, N. K. Garg, Text summarization of hindi documents using rule based approach, 2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) (2016) 366–370. [39] A. Jain, A. Arora, J. Morato, D. Yadav, V. Kumar K, Automatic, J. Szymanski, H. Mora, D. Logofătu, A. Sobecki, D. Jain, Automatic text summarization for hindi using real coded genetic algorithm, Applied Sciences 12 (2022). doi:10.3390/app12136584. [40] P. Patel, Pre-processing phase of text summarization based on gujarati language, Inter- national Journal of Innovative Research in Computer Science Technology ISSN (2014) 2347–5552. [41] S. Sarica, J. Luo, Stopwords in technical language processing, PLOS ONE 16 (2021) e0254937. URL: https://doi.org/10.1371%2Fjournal.pone.0254937. doi:10.1371/journal. pone.0254937. [42] R. Sennrich, B. Haddow, A. Birch, Neural machine translation of rare words with subword units, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp. 1715–1725. URL: https://aclanthology.org/P16-1162. doi:10.18653/ v1/P16-1162. [43] M. Dwarampudi, N. V. S. Reddy, Effects of padding on lstms and cnns, 2019. URL: https: //arxiv.org/abs/1903.07288. doi:10.48550/ARXIV.1903.07288. [44] J. Zhang, Y. Zhao, M. Saleh, P. J. Liu, Pegasus: Pre-training with extracted gap-sentences for abstractive summarization, 2019. URL: https://arxiv.org/abs/1912.08777. 
doi:10.48550/ ARXIV.1912.08777. [45] X. Wang, W. Chen, M. Saxon, W. Y. Wang, Counterfactual maximum likelihood estimation for training deep networks, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 25072–25085. URL: https://proceedings.neurips.cc/paper/2021/ file/d30d0f522a86b3665d8e3a9a91472e28-Paper.pdf. [46] Y. Liu, Fine-tune bert for extractive summarization, ArXiv abs/1903.10318 (2019). [47] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, mT5: A massively multilingual pre-trained text-to-text transformer, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Com- putational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 483–498. URL: https://aclanthology.org/2021.naacl-main.41. doi:10.18653/v1/2021.naacl-main.41. [48] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summariza- tion Branches Out, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL: https://aclanthology.org/W04-1013. [49] S. Satapara, B. Modha, S. Modha, P. Mehta, Findings of the first shared task on indian language summarization (ilsum): Approaches, challenges and the path ahead, in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, Kolkata, India, December 9-13, 2022, CEUR Workshop Proceedings, CEUR-WS.org, 2022.