Advancing Human-Like Summarization: Approaches to Text Summarization

Saliq Gowhar, Bhavya Sharma, Ashutosh K Gupta and Anand Kumar Madasamy
Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India

Abstract
Text summarization, a well-explored domain within Natural Language Processing, has witnessed significant progress. The ILSUM shared task, encompassing various languages such as English, Hindi, Gujarati, and Bengali, concentrates on text summarization. The proposed research focuses on leveraging pretrained sequence-to-sequence models for abstractive summarization, specifically in the context of the English language. This paper provides an extensive exposition of our model and approach. Notably, we achieved the top ranking in the English language subtask. Furthermore, this paper dives into an analysis of various techniques for extractive summarization, presenting their outcomes and drawing comparisons with abstractive summarization.

Keywords
Text Summarization, Sequence-to-Sequence models, Abstractive and Extractive Summarization.

1. Introduction
In this ever-expanding digital age, textual information has grown exponentially, making effective information retrieval and comprehension a major challenge. Text summarization, a vital and heavily researched area of Natural Language Processing, has emerged as a crucial solution to this challenge. It aims to condense large texts into concise, human-like summaries, providing readers with key information while sparing them the effort of reading extensive documents. With the rapid advancements in NLP and Machine Learning, the area of text summarization has seen substantial growth despite the absence of large, high-quality datasets. Text summarization can be either abstractive or extractive.
Abstractive summarization, a more efficient form of summarization [1], is a technique where the system generates a summary by understanding the content of the document and then creating a summary using its own understanding of that content, making it a more effective technique capable of generating human-like summaries. It can generate a summary containing words that may or may not appear in the original document. Several pretrained sequence-to-sequence models exist which can be used for abstractive summarization, including T5 [2], BART [3], ProphetNet [4] and Pegasus [5].

Forum for Information Retrieval Evaluation, December 15-18 2023, India
saliqgowhar.211ee250@nitk.edu.in (S. Gowhar); shrmabhav.211ai011@nitk.edu.in (B. Sharma); ashutosh.211ai008@nitk.edu.in (A. K. Gupta); m_anandkumar@nitk.edu.in (A. K. Madasamy)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Extractive summarization, on the other hand, is a technique that maintains the original information content of the document [6]: it works by selecting and extracting, directly from the original text, the sentences or phrases that are considered the most important representatives of the document's content. The sentences are ranked as per their importance, which can be calculated using various algorithms such as TextRank [7], TF-IDF [7] and K-Means [8].

In this study, we have implemented abstractive and extractive summarization methods for the English language within the framework of the FIRE 2023 shared task ILSUM [16], making use of the dataset furnished by the event organizers. Key takeaways of the task can be found in [17]. We conducted summarization on both the raw and preprocessed data, utilizing the ILSUM 2022 dataset for evaluation.
Abstractive summarization was executed using Google's T5 transformer model, while extractive summarization was implemented with TF-IDF and term-frequency algorithms. In the context of the shared task, we exclusively submitted results derived from abstractive techniques, reserving the extractive summarization methods solely for comparative analysis. Evaluation has been done using ROUGE-N scores along with their respective precision, recall and F1 measures.

2. Related Work
Ranganathan et al. [9] used a fine-tuned T5 transformer model for abstractive summarization on the UCI drug review and BBC datasets. Lalitha et al. [10] fine-tuned T5, BART, and Pegasus for abstractive summarization of medical documents using the SUMPUBMED dataset. Jadeja et al. [11] performed a comparative analysis of state-of-the-art text summarizers, including T5 and Pegasus, on the WikiHow dataset; evaluations were made both manually by humans and using metrics such as ROUGE and BLEU. Ladhak et al. [12] introduced WikiLingua, a multilingual dataset containing article-summary pairs in 18 distinct languages; in their experiments, they fine-tuned the mBART model using this dataset. Aljević et al. [13] proposed a novel graph-based approach for extractive summarization that transforms a given text into a network of interconnected sentences and uses a computationally efficient selectivity measure to assess the significance of the graph nodes. Jewani et al. [14] performed a brief study and comparison of major extractive summarization techniques, including TF-IDF, clustering, fuzzy logic, neural network and graph-based approaches. Souza et al. [15] proposed a novel multi-view approach for extractive summarization that treats it as a binary classification problem.

3. Corpus Description
The dataset released for this task was created by extracting data from several leading Indian newspaper websites.
We utilised the English language dataset released under ILSUM 2.0 in 2023 for our experimentation, which consisted of train, test and validation splits. The train set consists of 28,347 news-article and summary pairs along with their ids and headings, whereas the test set contains 2,895 news articles along with their respective ids and headings. For the test set, the summaries were not provided and were kept hidden officially for evaluation purposes. Hence, for our own evaluation, we made use of the official test dataset released under ILSUM 1.0 in 2022 for the English language, which consists of 4,487 articles along with their ids, headings and human-reference summaries. Given this dataset, the task was to generate fixed-length summaries while overcoming the challenge of script mixing.

4. Model Description
T5 (Text-To-Text Transfer Transformer) [2] is a versatile and powerful transformer-based architecture for natural language processing. It is unique in that it treats all NLP tasks as text-to-text tasks, which allows it to perform exceptionally well on a wide range of language understanding and generation tasks, from translation and summarization to question answering and text classification. T5 has been pre-trained on the Colossal Clean Crawled Corpus (C4) using a mixture of unsupervised and supervised objectives, making it capable of handling various tasks with the same underlying architecture. For our experiments, we used the T5-Base variant, whose model checkpoint contains 220 million parameters.

5. Methodology
5.1. Preprocessing
We perform our experiments on both the original dataset as well as the preprocessed dataset. As part of preprocessing we performed multiple steps, which include:
• Lowercasing and conversion to string format.
• Removal of numerical digits and special characters from the text.
• Replacing newline characters with spaces.
• Replacing consecutive occurrences of special characters with single spaces.
• Removal of emoticons from the text.

Furthermore, we organized our data to match the format that T5 expects for summarization tasks. We focused solely on the "Article" and "Summary" columns and removed any other columns. We also renamed the "Article" column to 'ctext' and the "Summary" column to 'text'. After that, we added the prefix 'summarize: ' to the start of each article.

5.2. Creating a Custom Dataset Object
We create a custom Dataset object for our data that is used particularly for text summarization in transformer-based architectures.
• We initialise various attributes, which include the tokenizer used, the text to be summarized, the reference summaries, the maximum length of the source text, the maximum length of the target text and the dataframe.
• We retrieve individual data samples from the dataset given the index of the sample. Tokenization is performed for both the articles and summaries using the T5Tokenizer, and the tokenized input and target sequences are obtained with a maximum length and padding to ensure consistent shapes for model input.
• The resulting tokenized sequences include the input IDs and the attention masks. We return a dictionary for each sample containing the following keys:
– source_ids: the input IDs for the source text.
– source_mask: the attention mask for the source text.
– target_ids: the input IDs for the target text.
– target_mask: the attention mask for the target text.

5.3. Setting up the parameters
We initialize a Weights & Biases (wandb) project to keep track of our experiments and set up the wandb configuration for our experimentation. In this experimentation, we have trained the model on the entire dataset using the hyper-parameter settings given in Table 1.

Table 1
Parameter settings

Parameters           Values
Epochs               8
Max source length    512
Max target length    75
Batch Size           2
Learning Rate        5e-5
Beams                4
Length penalty       1
Repetition penalty   2.5
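The preprocessing steps of Section 5.1 can be sketched as a single cleaning function. This is a minimal illustrative sketch, not our exact code; the function name and the exact regular expressions are assumptions, but the steps mirror the list above (lowercasing, digit and special-character removal, newline replacement, whitespace collapsing, emoticon removal, and the 'summarize: ' task prefix):

```python
import re

def preprocess_article(text: str) -> str:
    """Clean one article roughly as described in Section 5.1."""
    text = str(text).lower()                    # conversion to string + lowercasing
    text = text.replace("\n", " ")              # newline characters -> spaces
    text = re.sub(r"[0-9]", "", text)           # remove numerical digits
    text = re.sub(r"[^\x00-\x7F]+", " ", text)  # remove emoticons / non-ASCII symbols
    text = re.sub(r"[^a-z\s]+", " ", text)      # remove remaining special characters
    text = re.sub(r"\s+", " ", text).strip()    # collapse consecutive whitespace
    return "summarize: " + text                 # prefix T5 expects for summarization
```

In a pandas pipeline this would be applied to the 'ctext' column after the renaming step described above.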
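The custom Dataset object of Section 5.2 can be sketched as below. This is a framework-agnostic simplification: in our actual setup the tokenizer is a T5Tokenizer returning PyTorch tensors, whereas this sketch accepts any callable `tokenizer(text, max_length)` so that only the structure is shown; the class and parameter names are illustrative:

```python
class SummarizationDataset:
    """Maps (article, summary) pairs to tokenized model inputs, as in Section 5.2."""

    def __init__(self, tokenizer, articles, summaries,
                 source_len=512, target_len=75):
        self.tokenizer = tokenizer    # e.g. a T5Tokenizer in our setup
        self.articles = articles      # 'ctext' column, prefixed with 'summarize: '
        self.summaries = summaries    # 'text' column (reference summaries)
        self.source_len = source_len  # maximum length of source text
        self.target_len = target_len  # maximum length of target text

    def __len__(self):
        return len(self.articles)

    def __getitem__(self, index):
        # Tokenize with a fixed maximum length and padding so every
        # sample has a consistent shape for model input.
        source = self.tokenizer(self.articles[index], self.source_len)
        target = self.tokenizer(self.summaries[index], self.target_len)
        return {
            "source_ids": source["input_ids"],
            "source_mask": source["attention_mask"],
            "target_ids": target["input_ids"],
            "target_mask": target["attention_mask"],
        }
```

With `__len__` and `__getitem__` defined this way, the object satisfies the map-style dataset contract expected by a PyTorch DataLoader.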
6. Results
As per the official results for the ILSUM 2023 task, our team NITK-AI (SCaLAR¹) was able to achieve notable scores. Specifically, our performance in terms of the ROUGE metrics was as follows: a ROUGE-1 score of 0.3321, a ROUGE-2 score of 0.1731, a ROUGE-4 score of 0.121, and a ROUGE-L score of 0.282. Additionally, when assessing our results using BERTScore, we obtained a precision of 0.8752, a recall of 0.8684, and an F1 measure of 0.8716. The official ROUGE scores are given in Table 2 and the official BERT scores in Table 3.

¹ https://scalar-nitk.github.io/website/

Table 2
Official ROUGE score results

Team Name             Rouge-1 F1  Rouge-2 F1  Rouge-4 F1  Rouge-L F1
NITK-AI (SCaLAR)      0.3321      0.1731      0.121       0.282
Eclipse               0.3022      0.1111      0.042       0.2504
BITS Pilani           0.2354      0.0604      0.0147      0.182
ASH                   0.137       0.017       0.0004      0.1181
ILSUM_2023_SANGITA    0           0           0           0

Table 3
Official BERT score results

Team Name             Bert_Score_P  Bert_Score_R  Bert_Score_F
NITK-AI (SCaLAR)      0.8752        0.8684        0.8716
Eclipse               0.8505        0.8733        0.8616
BITS Pilani           0.8724        0.8462        0.8589
ASH                   0.8277        0.8036        0.8153
ILSUM_2023_SANGITA    0             0             0

7. Comparative analysis
We evaluated the performance of the T5 model on the ILSUM 2022 test data using ROUGE-N metrics on both the original dataset as well as the preprocessed dataset. The results obtained are given in Table 4.

Table 4
ROUGE Metrics for T5 model on 2022 dataset

Dataset         Sub-Metric   ROUGE-1  ROUGE-2  ROUGE-L
Original        Recall       0.432    0.335    0.406
Dataset         Precision    0.488    0.376    0.457
                F1-Measure   0.451    0.350    0.424
Pre-processed   Recall       0.321    0.185    0.289
Dataset         Precision    0.313    0.175    0.282
                F1-Measure   0.310    0.176    0.280

Additionally, we conducted a comparative analysis that involved evaluating the performance of the T5 model for abstractive summarization and comparing it with several extractive summarization techniques, including TF-IDF and a frequency-based approach.
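The two extractive baselines just mentioned (term frequency and TF-IDF with cosine similarity) can be illustrated with a simplified pure-Python sketch. The helper names, the naive sentence splitter and the tokenization are assumptions for illustration only, not our exact pipeline:

```python
import math
import re
from collections import Counter

def split_sentences(text):
    # Naive sentence splitter; a real pipeline would use e.g. nltk.sent_tokenize.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def frequency_summary(text, n=2):
    """Rank sentences by the summed document frequency of their words."""
    sents = split_sentences(text)
    freq = Counter(w for s in sents for w in words(s))
    ranked = sorted(sents, key=lambda s: sum(freq[w] for w in words(s)), reverse=True)
    top = set(ranked[:n])
    return " ".join(s for s in sents if s in top)  # keep original sentence order

def tfidf_summary(text, n=2):
    """Rank sentences by cosine similarity of their TF-IDF vector to the document's."""
    sents = split_sentences(text)
    docs = [Counter(words(s)) for s in sents]               # per-sentence term counts
    vocab = {w for d in docs for w in d}
    idf = {w: math.log(len(sents) / sum(1 for d in docs if w in d)) + 1 for w in vocab}

    def vec(counts):
        return {w: c * idf[w] for w, c in counts.items()}   # TF-IDF weights

    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    doc_vec = vec(Counter(words(text)))                     # whole-document vector
    ranked = sorted(sents, key=lambda s: cosine(vec(Counter(words(s))), doc_vec),
                    reverse=True)
    top = set(ranked[:n])
    return " ".join(s for s in sents if s in top)
```

Both functions select the top-n ranked sentences and emit them in their original order, which is the standard presentation for extractive summaries.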
This analysis was done on the same ILSUM 2022 dataset using the same ROUGE metrics, but this time we only used the original dataset for the comparative analysis, as it gave us the best results with T5. In the frequency-based approach, we tokenize the sentences of an article and rank the sentences based on the frequency of their words, from highest to lowest. In the TF-IDF based approach, we tokenize the article into individual sentences and then create a TF-IDF matrix, which assigns weights to the words in each sentence. We then compute the cosine similarity between each sentence and the entire document, measuring how similar each sentence is to the overall content. Finally, we rank the sentences based on these scores and choose the top n sentences as the summary. The results obtained using these two methods are given in Table 5. It can clearly be deduced that abstractive summarization gives better results than the above-mentioned extractive summarization approaches.

Table 5
ROUGE Metrics using Extractive Summarization on 2022 dataset

Approach          Sub-Metric   ROUGE-1  ROUGE-2  ROUGE-L
Term Frequency    Recall       0.222    0.107    0.196
                  Precision    0.223    0.097    0.193
                  F1-Measure   0.214    0.098    0.187
TF-IDF            Recall       0.340    0.180    0.313
                  Precision    0.188    0.086    0.171
                  F1-Measure   0.218    0.101    0.199

8. Conclusion and Future works
In this paper, we present our work on summarization of English text as part of the Forum for Information Retrieval Evaluation 2023 shared task, ILSUM. We conducted experiments using the T5 transformer-based model for abstractive summarization, achieving significant results. Additionally, we explored extractive summarization techniques and conducted a comparative analysis between abstractive and extractive methods, demonstrating the superior effectiveness of abstractive approaches. Due to computational constraints, we submitted results only for the English language, securing first position in that subtask.
As part of our future research within this project, we plan to explore other transformer-based models for abstractive summarization, such as PEGASUS and BART. Furthermore, we aim to extend our work to cover other Indian languages, including Bengali, Hindi, and Gujarati, using multilingual transformer models like mT5 and IndicBART. We also intend to conduct comparative analyses involving large language models (LLMs) such as Llama 2 and perform a deeper error analysis. We anticipate that this work will provide valuable insights and directions for future research in this domain.

Acknowledgments
We would like to express our sincere gratitude to the organizers of the ILSUM Shared Task and the Forum for Information Retrieval Evaluation (FIRE) for curating a high-quality dataset of Indian language texts, paving the way for high-quality research in the field.

References
[1] N. Moratanch and S. Chitrakala, "A survey on abstractive text summarization," 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), Nagercoil, India, 2016, pp. 1-7, doi: 10.1109/ICCPCT.2016.7530193.
[2] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, J. Mach. Learn. Res. 21 (2020) 1-67.
[3] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, arXiv preprint arXiv:1910.13461 (2019).
[4] W. Qi, Y. Yan, Y. Gong, D. Liu, N. Duan, J. Chen, R. Zhang, M. Zhou, ProphetNet: Predicting future n-gram for sequence-to-sequence pre-training, arXiv preprint arXiv:2001.04063 (2020).
[5] J. Zhang, Y. Zhao, M. Saleh, P. Liu, PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization, in: International Conference on Machine Learning, PMLR, 2020, pp. 11328-11339.
[6] S. R. Rahimi, A.
T. Mozhdehi and M. Abdolahi, "An overview on extractive text summarization," 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), Tehran, Iran, 2017, pp. 0054-0062, doi: 10.1109/KBEI.2017.8324874.
[7] S. Zaware, D. Patadiya, A. Gaikwad, S. Gulhane and A. Thakare, "Text Summarization using TF-IDF and TextRank algorithm," 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2021, pp. 1399-1407, doi: 10.1109/ICOEI51242.2021.9453071.
[8] K. Shetty and J. S. Kallimani, "Automatic extractive text summarization using K-means clustering," 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), Mysuru, India, 2017, pp. 1-9, doi: 10.1109/ICEECCOT.2017.8284627.
[9] J. Ranganathan and G. Abuka, "Text Summarization using Transformer Model," 2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS), Milan, Italy, 2022, pp. 1-5, doi: 10.1109/SNAMS58071.2022.10062698.
[10] E. Lalitha, K. Ramani, D. Shahida, E. V. S. Deepak, M. H. Bindu and D. Shaikshavali, "Text Summarization of Medical Documents using Abstractive Techniques," 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 2023, pp. 939-943, doi: 10.1109/ICAAIC56838.2023.10140885.
[11] D. Jadeja, A. Khetri, A. Mittal and D. K. Vishwakarma, "Comparative Analysis of Transformer Models on WikiHow Dataset," 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 2022, pp. 655-658, doi: 10.1109/ICSCDS53736.2022.9761043.
[12] F. Ladhak, E. Durmus, C. Cardie, and K. McKeown, "WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization," in Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4034-4048, Online. Association for Computational Linguistics, 2020.
[13] D.
Aljević, L. Todorovski and S. Martinčić-Ipšić, "Extractive Text Summarization Based on Selectivity Ranking," 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Kocaeli, Turkey, 2021, pp. 1-6, doi: 10.1109/INISTA52262.2021.9548408.
[14] K. Jewani, O. Damankar, N. Janyani, D. Mhatre and S. Gangwani, "A Brief Study on Approaches for Extractive Summarization," 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 2021, pp. 601-608, doi: 10.1109/ICAIS50930.2021.9396031.
[15] C. M. Souza, M. R. G. Meireles and R. Vimieiro, "A multi-view extractive text summarization approach for long scientific articles," 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 01-08, doi: 10.1109/IJCNN55064.2022.9892526.
[16] S. Satapara, P. Mehta, S. Modha, and D. Ganguly, "Indian language summarization at FIRE 2023," in Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Goa, India, December 15-18, 2023, ACM, 2023.
[17] S. Satapara, P. Mehta, S. Modha, and D. Ganguly, "Key takeaways from the second shared task on Indian language summarization (ILSUM 2023)," in Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023 (K. Ghosh, T. Mandl, P. Majumder, and M. Mitra, eds.), CEUR Workshop Proceedings, CEUR-WS.org, 2023.