Vayam Solve Kurmaha at Touché: Power Identification in Parliamentary Speeches Using TF-IDF Vectorizer and SVM Classifier

Working Notes Paper, Touché Lab at CLEF 2024

Lakshmi Priya S, Dhannya S M, S. Shwetha, Surabhi Kamath, Shreedevi Seluka Balaji, Sai Nikitha N.S.R* and Srinidhi Lakshmi Narayanan
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, Tamil Nadu, 603110, India

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author.
Email: lakshmipriyas@ssn.edu.in (L. P. S); dhannyasm@ssn.edu.in (D. S. M); shwetha2210210@ssn.edu.in (S. Shwetha); surabhi2210196@ssn.edu.in (S. Kamath); shreedevi2210389@ssn.edu.in (S. S. Balaji); sainikitha2210401@ssn.edu.in (S. N. N.S.R); srinidhi2210142@ssn.edu.in (S. L. Narayanan)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Political parties' viewpoints, goals, and policy philosophies are often made clear through parliamentary debates, which have a significant impact on national decision-making processes. Public understanding of these discussions is essential for evaluating political efficacy. However, because political statements are inherently ambiguous and strategically indirect, algorithmic analysis of them is challenging. By taking part in the Touché 2024 task on Ideology and Power Identification in Parliamentary Debates, this study attempts to address these issues. In this paper we compare traditional classification models, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and an ensemble of the three, on features extracted using TF-IDF. We found that SVM outperformed the other models, achieving an F1 score of 0.68.

Keywords
power, parliament, speeches, SVM, TF-IDF, binary classification, political, machine learning

1. Introduction

Parliamentary debates are rich repositories for uncovering the outlook of political parties, their motives, and their approach towards the welfare and future of the country. Discussions in parliament have the potential to shape the entire trajectory of a nation, as most decisions of paramount importance originate here. Understanding these discussions is vital, as it allows the public to truly comprehend political parties and evaluate their efficiency in making decisions on their behalf.

Political speeches, however, are elusive to computational analysis. A paper published in 1997 [1] theorized that politicians are often strategically indirect to advance their careers and gain an edge over their opponents. In another study [2] on vagueness in political language, the author states that political language is kept vague to address different audiences simultaneously and to avoid facing threats. Vagueness and indirectness both make a text challenging to analyze.

The power identification task of Touché's Ideology and Power Identification in Parliamentary Debates 2024 [3] aims to identify whether the speaker of a given text in a parliamentary debate belongs to a coalition party or an opposition party.

2. Background

A paper [4] on political speech analysis, published in 2020, introduced the Graph Political Sentiment analyzer (GPolS), a neural model for speech-level stance analysis of members of parliament (MPs). The model utilizes a fine-tuned BERT for encoding the data and Graph Attention Networks (GAT) for modeling and aggregating contextual relations between transcripts, motions, and speakers.
GPolS significantly outperforms all baselines under the Wilcoxon signed-rank test, by a large margin of more than 6.5%.

A study [5] that aimed to predict party group from Lithuanian parliamentary speeches found that, at the dataset level, removing out-of-domain and irrelevant instances was the best preprocessing technique. At the document level, removing digits and using a bag-of-words representation with token bigrams were most effective. The highest accuracy achieved was 0.545, compared to 0.279 and 0.13 for the random and majority baselines.

A different study [6], published in 2004, on identifying agreement and disagreement in conversational speech proposed a statistical model that utilizes Bayesian networks to capture pragmatic dependencies and employs maximum entropy ranking to identify adjacency pairs based on lexical, durational, and structural features. The model was shown to achieve high accuracy.

In 2023, Christos-Sotirios Kavallos conducted research [7] assessing the feasibility of classifying Greek parliament proceedings by political party using Multinomial Naïve Bayes, Stochastic Gradient Descent, Random Forest, and Recurrent Neural Network classifiers. The Random Forest algorithm performed best, followed by the Recurrent Neural Network classifier.

A thesis [8] exploring sentiment analysis of political debates introduces the Debate Graph Extraction (DGE) framework. This framework represents debates as graphs with speakers as nodes and exchanges as links, labeled based on sentiment ("supporting" or "opposing"). It also discusses analyzing these graphs using network mathematics and community detection to understand debate patterns.

3. System Overview

For this task, we used the English translations of the speeches provided in the dataset. Our method involved augmenting the texts using synonym replacement, extracting features with a TF-IDF vectorizer, and applying them to classifier models, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and an ensemble of the three.

3.1. Data Augmentation

Data augmentation is the process of artificially increasing the size of a dataset by creating modified versions of existing data. It can also be used to balance datasets by increasing the size of the minority class, thereby addressing class imbalances. Imbalanced datasets can cause the classification model to be biased towards the majority class.

Figure 1: Dataset

During the data exploration phase, we found that most datasets were imbalanced, some by large margins. Figure 1 graphically represents the number of entries for the labels 'coalition' and 'opposition' and clearly depicts the data imbalance in many datasets. We therefore used data augmentation to balance the datasets. The particular technique that we employed was synonym replacement, which randomly selects words in a text and replaces them with their synonyms to generate new data.
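As an illustration, the following is a minimal sketch of this augmentation step, assuming WordNet (accessed through NLTK) as the synonym source; the function names, the replacement fraction, and the oversampling loop are illustrative choices rather than the exact configuration used in our experiments.

```python
import random

from nltk.corpus import wordnet  # assumes the NLTK WordNet corpus has been downloaded


def synonym_replace(text, replace_frac=0.1, seed=None):
    """Return a copy of `text` with a fraction of its words swapped for WordNet synonyms."""
    rng = random.Random(seed)
    words = text.split()
    n_replace = max(1, int(len(words) * replace_frac))
    for i in rng.sample(range(len(words)), min(n_replace, len(words))):
        # Collect candidate synonyms for the randomly selected word.
        synonyms = {
            lemma.name().replace("_", " ")
            for synset in wordnet.synsets(words[i])
            for lemma in synset.lemmas()
        }
        synonyms.discard(words[i])
        if synonyms:
            words[i] = rng.choice(sorted(synonyms))
    return " ".join(words)


def balance_with_augmentation(texts, labels, minority_label):
    """Oversample the minority class with synonym-replaced copies until the labels are balanced."""
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    deficit = len(texts) - 2 * len(minority)  # majority count minus minority count
    for i in range(max(0, deficit)):
        texts.append(synonym_replace(minority[i % len(minority)], seed=i))
        labels.append(minority_label)
    return texts, labels
```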
3.2. Feature Extraction

Feature extraction is the process of transforming raw data into numerical features that can be processed by machine learning models. The aim of the features is to represent the information contained in the original data in a format that can be efficiently utilised by the machine learning model.

We used the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer for feature extraction due to its efficiency when working with large corpora. TF-IDF is a statistical measure that indicates how important a word is to a document within a collection or corpus of unstructured text. It scores a word by multiplying the word's term frequency (TF) by its inverse document frequency (IDF). The higher the TF-IDF score of a term, the more important that term is to the document; this helps identify words that are informative within a document while not being overly common across all documents.

A comparative study [9] conducted in 2020 examined three vectorizers: Count Vectorizer, TF-IDF Vectorizer, and Hashing Vectorizer. The resulting features were applied to SVM and KNN classifiers for sentiment analysis of YouTube comments on Nokia products. The study found that the TF-IDF vectorizer performed best, with nearly no errors in predicting negative values and a higher number of positive predictive values compared to the other vectorizers. Figure 2, sourced from a blog post by DeepLearning.AI [10], illustrates this concept.

Figure 2: How the TF-IDF vectorizer works [10]

3.3. Classification models

We explored the use of Support Vector Machine, Random Forest, K-Nearest Neighbors, and an ensemble of the three for the classification task.

3.3.1. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification tasks. It works by finding the hyperplane in an N-dimensional space (where N is the number of features) that best separates the data points of different classes. SVM performs well with non-linear data by transforming it into a higher-dimensional space where it may be linearly separable. A study [11] on identifying EHR progress notes pertaining to diabetes employed an SVM classifier and achieved a high F1 score of 0.93. The kernel function determines how the data points are mapped to the higher-dimensional space, and the hyperparameter C controls the error margin of the classification. For our task, we achieved the best results with a polynomial kernel and a C value of 0.01.

3.3.2. Random Forest (RF)

Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode of the classes of the individual trees. Each tree is built using a random subset of features and training records, introducing diversity within the ensemble. This approach reduces the risk of over-fitting and improves the model's generalization ability, as it is less sensitive to the variability of a single tree. A paper published in 2019 [12] on sentiment analysis of Twitter data used a Random Forest classifier on TF-IDF vectors and reported an accuracy of 75%. Our best results were obtained using a Random Forest consisting of 350 decision trees with a maximum depth of 5.

3.3.3. K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm. For a given test sample, KNN identifies the K training samples closest in feature space and assigns the most common class among those neighbors to the test sample. The simplicity of KNN makes it easy to implement and interpret, though it can be computationally expensive, especially with large datasets. A study [13] on classifying Indonesian-language news topics used a KNN classifier with word2vec for feature extraction and yielded an accuracy of 89.2%. We attained optimal performance with a K value of 150.

3.3.4. Ensemble

Ensemble methods combine multiple machine learning models to improve overall performance. By leveraging the strengths of various models, ensemble methods can achieve higher accuracy and robustness compared to individual models. A paper [14] on the classification of spam product reviews using an ensemble that combines predictions from a Multi-layer Perceptron (MLP), K-Nearest Neighbours (KNN), and Random Forest (RF) demonstrated that the ensemble outperformed the individual classifiers, with an accuracy of 88.13%. We used a majority-voting ensemble that aggregates the classifications made by SVM, RF, and KNN and takes the majority vote as the final prediction.
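To make the end-to-end setup concrete, the following is a minimal scikit-learn sketch of the feature extraction and classification stages with the hyperparameter values reported above (polynomial-kernel SVM with C = 0.01, a 350-tree Random Forest with maximum depth 5, KNN with K = 150, and a majority-voting ensemble). The TfidfVectorizer settings, function names, and evaluation code shown here are illustrative assumptions, not necessarily the exact configuration we used.

```python
# Minimal sketch of the TF-IDF + classifier pipeline (scikit-learn);
# hyperparameters follow Sections 3.3.1-3.3.4, everything else is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score


def train_and_evaluate(train_texts, train_labels, test_texts, test_labels):
    # Turn the (augmented) speeches into TF-IDF feature vectors.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    # The three base classifiers with the reported hyperparameters.
    svm = SVC(kernel="poly", C=0.01)
    rf = RandomForestClassifier(n_estimators=350, max_depth=5)
    knn = KNeighborsClassifier(n_neighbors=150)

    # Hard (majority) voting over the three base classifiers.
    ensemble = VotingClassifier(
        estimators=[("svm", svm), ("rf", rf), ("knn", knn)], voting="hard"
    )

    # Fit each model and report the macro-averaged F1 score on the test split.
    scores = {}
    for name, model in [("svm", svm), ("rf", rf), ("knn", knn), ("ensemble", ensemble)]:
        model.fit(X_train, train_labels)
        scores[name] = f1_score(test_labels, model.predict(X_test), average="macro")
    return scores
```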
4. Results

Table 1 displays the macro-averaged F1 scores obtained with TF-IDF features and Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and an ensemble of these models on the datasets from Catalonia (es-ct), Hungary (hu), and Belgium (be). For the Catalonia dataset, RF and the Ensemble performed best with an F1 score of 0.78, closely followed by SVM at 0.77. KNN notably underperformed, trailing the next lowest score by a margin of 0.19. In the Hungary dataset, SVM achieved the highest score at 0.86, followed by the Ensemble at 0.78, RF at 0.74, and KNN at 0.72; SVM's margin of 0.08 over the second-best model, the Ensemble, was substantial. For the Belgium dataset, SVM and the Ensemble attained the highest scores, each achieving an F1 score of 0.66. RF followed at 0.63, while KNN once again had the lowest score at 0.56, a notable 0.07 below the next lowest score. On average, SVM outperformed the other models, followed by the Ensemble, Random Forest, and lastly KNN.

In our submission to the power identification subtask of Touché's Ideology and Power Identification in Parliamentary Debates 2024, our SVM model with TF-IDF features achieved an F1 score of 0.68, resulting in a 6th-place ranking.

Table 1
Macro-averaged F1 scores of the classifiers with TF-IDF vectors on selected example datasets

Dataset    SVM     RF      KNN     Ensemble
es-ct      0.77    0.78    0.58    0.78
hu         0.86    0.74    0.72    0.78
be         0.66    0.63    0.56    0.66

5. Conclusion

In our analysis of the classification models that we tested, i.e., Support Vector Machine, Random Forest, K-Nearest Neighbors, and an ensemble of the three, with vectors generated using TF-IDF vectorization, we conclude that the SVM model performs best on the given datasets. Such a model would prove useful to political analysts and researchers by giving them insights into political dynamics, the distribution of power within legislative bodies, and rhetorical strategies. Media and journalists could use this model to determine key power strategies employed by the government and the opposition. Our model would also help the education sector, as students would be able to use it to analyze speeches from both the government's and the opposition's perspectives, helping them learn about power strategies and understand language dynamics in politics.

References

[1] S. G. Obeng, Language and politics: Indirectness in political discourse, Discourse & Society 8 (1997) 49–83. doi:10.1177/0957926597008001004.
[2] H. Gruber, Political language and textual vagueness, Pragmatics. Quarterly Publication of the International Pragmatics Association (IPrA) 3 (1993) 1–28.
[3] J. Kiesel, Ç. Çöltekin, M. Heinrich, M. Fröbe, M. Alshomary, B. D. Longueville, T. Erjavec, N. Handke, M. Kopp, N. Ljubešić, K. Meden, N. Mirzakhmedova, V. Morkevičius, T. Reitis-Munstermann, M. Scharfbillig, N. Stefanovitch, H. Wachsmuth, M. Potthast, B. Stein, Overview of Touché 2024: Argumentation Systems, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024.
[4] R. Sawhney, A. Wadhwa, S. Agarwal, R. Shah, GPolS: A contextual graph-based language model for analyzing parliamentary debates and political cohesion, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 4847–4859.
[5] J. Kapočiūtė-Dzikienė, A. Krupavičius, Predicting party group from the Lithuanian parliamentary speeches, Information Technology and Control 43 (2014) 321–332.
[6] M. Galley, K. McKeown, J. B. Hirschberg, E. Shriberg, Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies, 2004.
[7] C.-S. Kavallos, Parliament proceeding classification via machine learning algorithms: A case of Greek parliament proceedings, 2023.
[8] Z. Salah, Machine learning and sentiment analysis approaches for the analysis of parliamentary debates, Ph.D. thesis, University of Liverpool, 2014.
[9] I. Irawaty, R. Andreswari, D. Pramesti, Vectorizer comparison for sentiment analysis on social media YouTube: A case study, in: 2020 3rd International Conference on Computer and Informatics Engineering (IC2IE), 2020, pp. 69–74. doi:10.1109/IC2IE50715.2020.9274650.
[10] DeepLearning.AI, Tokenizers and TF-IDF, https://www.deeplearning.ai/resources/natural-language-processing/, 2022. Accessed: 2024-05-31.
[11] A. Wright, A. B. McCoy, S. Henkin, A. Kale, D. F. Sittig, Use of a support vector machine for categorizing free-text notes: assessment of accuracy across two institutions, Journal of the American Medical Informatics Association 20 (2013) 887–890.
[12] N. Bahrawi, Sentiment analysis using random forest algorithm-online social media based, Journal of Information Technology and Its Utilization 2 (2019) 29–33.
[13] N. G. Ramadhan, et al., Indonesian online news topics classification using word2vec and k-nearest neighbor, Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi) 5 (2021) 1083–1089.
[14] M. Fayaz, A. Khan, J. U. Rahman, A. Alharbi, M. I. Uddin, B. Alouffi, Ensemble machine learning model for classification of spam product reviews, Complexity 2020 (2020) 8857570.