gerber at Touché: Ideology and Power Identification in Parliamentary Debates 2024
Notebook for the Touché Lab at CLEF 2024

Christian Gerber¹
¹ University of Tübingen, 72070 Tübingen, Germany


Abstract
In democratic countries, national parliaments shape laws and policies through deliberative processes that
reflect underlying political ideologies and power structures. This paper presents a system developed for the
Ideology and Power Identification shared task, which classifies parliamentary speeches into categories
indicative of ideology and power dynamics. Using a Convolutional Neural Network (CNN) architecture enhanced
with hyperparameter optimisation, the system processes multilingual data from the ParlaMint corpus. Key
preprocessing steps include cleaning, tokenisation and conversion of text into integer sequences. The CNN
model consists of embedding, convolutional, max-pooling and dense layers, with a sigmoid output for binary
classification. Evaluated by precision, recall and F1 score, the model achieves moderate performance, with an
average F1 score of 0.676 for power identification and 0.632 for political orientation. These results indicate
the potential of the model for analysing complex parliamentary discourse.

Keywords
CNN, Ideology and Power Identification, NLP, Touché


CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
christian.gerber@student.uni-tuebingen.de (C. Gerber)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
Since the dawn of civilisation, politics has been a fundamental part of human society. From early tribal
councils to the complex governmental structures of modern states, politics has shaped the way societies are
organised, governed and led. Throughout history, political systems have evolved to meet the changing needs of
society and to adapt to new challenges and opportunities. Without politics, it would be difficult to maintain
order in a large society; politics is therefore essential for the stability, security and development of a
country, providing a structure through which decisions are made and power is distributed [1]. It influences
every aspect of citizens' lives, from the laws that are passed and the resources that are distributed to
education, welfare and security. Understanding political communication and how political speakers present
themselves is therefore vital for a functioning society. Political speeches often rely on indirect language
and are highly complex. Nevertheless, analysing parliamentary debates is important, as it offers critical
insight into how political ideologies and power dynamics influence legislative outcomes and, more importantly,
the lives of citizens.
Parliamentary debates are an essential part of democratic processes. Politicians and their parties serve as
the voice of their constituents by expressing political ideologies, negotiating and making decisions [2].
Together, these debates characterise a nation's political landscape. They provide a rich corpus for analysis,
reflecting the political climate and the ideologies of individual speakers. However, the inherent complexity
and subtlety of political language, combined with the volume of text generated during parliamentary
proceedings, present significant challenges for computational analysis. Traditional or simpler approaches,
such as analysing short and direct posts on X (formerly Twitter), often fail to capture the nuanced expression
of political ideology and party affiliation in parliamentary discourse. The shared task [3] focuses on
identifying, with machine learning and natural language processing, two variables associated with speakers in
parliamentary debates: their political ideology, and whether they belong to a governing party or to a party in
opposition. The task provides a large corpus for training. Both tasks are addressed with the CNN model
described in Section 3.


2. Background
The analysis of political discourse has a rich history, with early studies focusing on the rhetorical
strategies used by political representatives. While early work [1][2] emphasised the persuasive elements that
politicians use to shape public opinion and policy, recent advances include the integration of machine
learning techniques into political discourse analysis. Abercrombie and Batista-Navarro [4] provide a large
systematic review of sentiment and ideology detection in parliamentary debates. They discuss approaches ranging
from sentiment analysis, classification and position scaling to the analysis of political speeches, and
highlight the strengths and limitations of these methods. They identify eight tasks within the overall area of
sentiment analysis of political debates, for example emotion analysis, agreement and alignment detection and,
most relevant for this paper, ideology and party affiliation detection. These tasks have been tackled using a
wide range of approaches, from supervised to unsupervised machine learning methods, including neural networks.
Convolutional neural networks (CNNs) have become increasingly popular due to their ability to capture complex
patterns in textual data. Several comprehensive reviews describe how deep learning models such as CNNs are
applied to text classification tasks, including sentiment analysis and political text analysis [5][6]. In
addition, the analysis of posts on X (formerly Twitter) has become increasingly popular in recent years. Aslan
et al. [7] analysed COVID-19 tweets using FastText Skipgram for information extraction, a CNN model for feature
extraction, and the arithmetic optimisation algorithm (AOA) for feature selection. Dehghani and Yazdanparast [8]
present several machine learning and deep learning models to analyse the sentiment of Persian political tweets.
They applied Gaussian Naive Bayes, Gradient Boosting, Logistic Regression, Decision Trees and Random Forests, as
well as a combination of CNN and LSTM, to classify the polarity of tweets. The CNN-LSTM model achieved the
highest classification accuracy, illustrating the effectiveness of CNN-based models in political text analysis.


3. System Overview
The following section describes the system developed to identify ideology and power in political speeches.
The aim of the two tasks is to classify parliamentary speeches into categories that reflect the ideology or the
power role of the speaker's party. First, the input data is pre-processed and converted into integer
sequences; an embedding layer maps these to numerical vectors, from which the CNN extracts local features. This
section outlines the components, resources and methods used to build and fine-tune the model. The system is
implemented in Python using TensorFlow, Keras, scikit-learn, KerasTuner and other libraries for machine
learning and data processing. It uses a Convolutional Neural Network (CNN) architecture and employs
hyperparameter optimisation to improve performance. Finally, the model is evaluated by precision, recall and
F1 score.

3.1. Dataset
The data for this task comes from ParlaMint [9], a set of comparable multilingual corpora of parliamentary
debates. A selection of speeches was collected and made available as the training set [3]. The training data
was sampled to reduce potential confounding variables (e.g. speaker identity) and provided as tab-separated
text files. The fields in the data include:

    • id: Unique ID for each text.
    • speaker: Unique ID for each speaker, allowing multiple speeches from the same speaker.
    • sex: Binary/biological sex of the speaker (Female, Male, Unspecified/Unknown).
    • text: Transcribed text of the parliamentary speech, potentially including line breaks and other
      special sequences.
    • text_en: Automatic translation of the text to English, which may be empty for speeches originally
      in English or missing for some non-English speeches.
    • label: Binary/numeric label indicating political orientation (0 for left, 1 for right) or power
      identification (1 for opposition, 0 for coalition/governing party).

For the system described in this paper, only the fields 'id', 'speaker', 'text' and 'label' were used. The
training data covers 29 parliaments, including those of Austria, Belgium and Denmark.
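As an illustration, the following Python sketch reads one training file and keeps only these four fields. It assumes that each file has a header row with the column names listed above and uses no quoting; the function name `load_split` and the sample row are hypothetical and not part of the shared-task tooling.

```python
import csv
import io

# Fields used by the system; 'sex' and 'text_en' are ignored.
USED_FIELDS = ("id", "speaker", "text", "label")

# Long speeches can exceed the csv module's default field size limit (131072 characters).
csv.field_size_limit(10_000_000)


def load_split(tsv_file):
    """Read a tab-separated file (assumed: header row, no quoting) and keep the used fields."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        record = {field: row[field] for field in USED_FIELDS}
        record["label"] = int(record["label"])  # 0/1 as defined for each task
        rows.append(record)
    return rows


# Hypothetical one-row example in the described column layout (empty text_en).
SAMPLE = "id\tspeaker\tsex\ttext\ttext_en\tlabel\nat-1\tat-spk7\tM\tSehr geehrte Damen und Herren\t\t1\n"
rows = load_split(io.StringIO(SAMPLE))
```

Because `DictReader` addresses columns by name, the sketch does not depend on the column order in the files.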


3.2. Data Preprocessing
Data preprocessing prepares raw data before it is used to build machine learning models, and involves several
steps. Because parliamentary speeches are published in different ways, the data contains inconsistencies and
redundancies, which makes data cleaning necessary. The following preprocessing steps were applied:

    • Remove line breaks and non-alphabetic characters
    • Convert text to lower case
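A minimal sketch of these two steps in Python is shown below. The paper does not specify whether removed characters are deleted or replaced; the sketch assumes replacement by a space (so that words separated only by punctuation are not merged) and collapses repeated whitespace. It relies on `str.isalpha`, which also accepts non-Latin letters, as needed for the multilingual ParlaMint data.

```python
import re


def clean_text(text):
    """Remove line breaks and non-alphabetic characters, then lower-case.

    Every non-letter character (digits, punctuation, '\n', ...) is replaced
    by a space (an assumption), and runs of whitespace are collapsed.
    """
    letters_only = "".join(ch if ch.isalpha() else " " for ch in text)
    return re.sub(r"\s+", " ", letters_only).strip().lower()


german = clean_text("Herr Präsident,\nich danke 2024!")
ukrainian = clean_text("Дякую, пане Голово!")
```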

The cleaned text is then converted into sequences of integers with the Tokenizer from TensorFlow/Keras [10],
with the vocabulary limited to the 10,000 most frequent words.
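To make the integer encoding concrete, the following stdlib-only sketch approximates what the tokenisation step does: words are ranked by frequency, id 0 is reserved for padding, only ids below the vocabulary limit are kept, and each sequence is truncated or padded at the start to a fixed length, as Keras' `pad_sequences` does by default. It is not the TensorFlow implementation, and the padding step with a length of 600 (the tuned `max_sequence_length`) is an assumption about how the sequences were fed to the model.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency: id 1 = most frequent word; id 0 stays reserved for padding."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    # most_common() keeps first-seen order for words with equal counts.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def to_padded_sequences(texts, word_index, num_words=10_000, maxlen=600):
    """Encode texts as integer sequences of exactly `maxlen` ids.

    Words that are unknown or whose id is >= num_words are dropped. Long
    sequences keep their last `maxlen` ids and short ones are left-padded
    with 0 (the 'pre' defaults of Keras' pad_sequences).
    """
    padded = []
    for text in texts:
        ids = [word_index[w] for w in text.split() if word_index.get(w, num_words) < num_words]
        ids = ids[-maxlen:]
        padded.append([0] * (maxlen - len(ids)) + ids)
    return padded


# Toy example with a vocabulary limit of 3 (ids 1 and 2 only).
word_index = fit_word_index(["a b a", "b c"])
example = to_padded_sequences(["a b c", "c", "a b a b a"], word_index, num_words=3, maxlen=4)
```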

3.3. Model Architecture
Convolutional Neural Networks (CNNs) are a type of artificial neural network that learns directly from data.
A CNN is a feed-forward network that extracts features from data with convolutional structures [5]. The
convolutional layer filters the input data and creates a feature map that captures particular attributes of the
data points [6]. CNNs can detect both local and higher-level features in text, as their stacked layers
automatically learn feature hierarchies. The CNN model in this paper classifies each input text into one of two
categories: for political orientation, 0 is left and 1 is right; for power identification, 0 indicates
coalition (or ruling party) and 1 indicates opposition. Figure 1 shows the proposed deep learning architecture.
First, the embedding layer maps the tokenised integer sequences to dense vectors of fixed size, stored in an
embedding matrix. One-dimensional convolution (Conv1D) is then used for feature extraction. The next layer is
max-pooling, which downsamples the feature maps and thereby reduces the number of parameters in subsequent
layers, resulting in faster training and making overfitting easier to handle [11]. As the final layer, a dense
layer with a sigmoid activation function generates the predictions. The model is compiled with the Adam
optimiser and the binary cross-entropy loss function. Precision, recall and F1 score are used as evaluation
metrics.
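To make the data flow concrete, the following pure-Python sketch runs a single forward pass through the layer types listed above with small random weights: embedding lookup, one Conv1D layer with 'valid' padding, max-pooling over time and a single sigmoid output unit. The ReLU activation in the convolution, the use of global (rather than windowed) max-pooling and all sizes are assumptions; the actual model would be built and trained with Keras.

```python
import math
import random


def embed(token_ids, matrix):
    """Embedding layer: one learned vector (row of `matrix`) per token id."""
    return [matrix[t] for t in token_ids]


def conv1d_relu(x, kernels, biases):
    """Conv1D with 'valid' padding and ReLU (assumed activation).

    x: sequence of L vectors of size d; kernels: F filters, each a k x d matrix.
    Returns a feature map of shape (L - k + 1) x F.
    """
    k = len(kernels[0])
    feature_map = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias + sum(w * v
                           for kernel_row, x_row in zip(kernel, window)
                           for w, v in zip(kernel_row, x_row))
            row.append(max(0.0, s))
        feature_map.append(row)
    return feature_map


def global_max_pool(feature_map):
    """Max-pooling over time: keep the strongest response of each filter."""
    return [max(column) for column in zip(*feature_map)]


def dense_sigmoid(features, weights, bias):
    """Single output unit; the sigmoid maps the score to a probability of class 1."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy sizes (assumptions); the tuned model uses e.g. embedding_dim=250, num_filters=192.
rng = random.Random(0)
VOCAB, EMB_DIM, NUM_FILTERS, KERNEL_SIZE = 20, 4, 3, 3
emb_matrix = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(VOCAB)]
kernels = [[[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(KERNEL_SIZE)]
           for _ in range(NUM_FILTERS)]
conv_biases = [0.0] * NUM_FILTERS
dense_weights = [rng.uniform(-1, 1) for _ in range(NUM_FILTERS)]

tokens = [0, 0, 5, 3, 17, 3, 9, 1]  # a pre-padded integer sequence of length 8
feature_map = conv1d_relu(embed(tokens, emb_matrix), kernels, conv_biases)
probability = dense_sigmoid(global_max_pool(feature_map), dense_weights, 0.0)
predicted_label = int(probability >= 0.5)  # 1 = right/opposition, 0 = left/governing
```

With kernel size 3, a sequence of length 8 yields 6 convolution positions; pooling then reduces the feature map to one value per filter, regardless of the sequence length.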

3.4. Training Process
The model is trained on the processed text data, which is split 80/20 into training and validation sets to
monitor performance during training. The split is made such that no speaker appears in both the training and
the validation set. Training was carried out in a GPU-enabled environment.
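The paper does not state how the speaker-disjoint 80/20 split was implemented. One possible approach, sketched below under that caveat, is to shuffle speakers rather than speeches and assign whole speakers to the validation set until it holds about 20% of the speeches. The function name, the fixed seed and the demo data are illustrative choices.

```python
import random


def speaker_disjoint_split(rows, val_fraction=0.2, seed=42):
    """Split speeches about 80/20 so that no speaker occurs in both parts.

    Whole speakers (not single speeches) are shuffled and moved to the
    validation set until it holds at least `val_fraction` of all speeches.
    """
    by_speaker = {}
    for row in rows:
        by_speaker.setdefault(row["speaker"], []).append(row)
    speakers = sorted(by_speaker)  # deterministic order before shuffling
    random.Random(seed).shuffle(speakers)
    target = val_fraction * len(rows)
    train, val = [], []
    for speaker in speakers:
        (val if len(val) < target else train).extend(by_speaker[speaker])
    return train, val


# Hypothetical data: 10 speakers with 10 speeches each.
demo_rows = [{"speaker": f"spk{s}", "text": f"speech {i}"} for s in range(10) for i in range(10)]
train_rows, val_rows = speaker_disjoint_split(demo_rows)
```

scikit-learn's `GroupShuffleSplit` offers a similar group-aware split; note that its `test_size` refers to the proportion of groups (speakers), not of speeches.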
Figure 1: Architecture of the deep learning model


3.5. Hyperparameter Tuning
Eight hyperparameters of the neural network were tuned, each over a set of candidate values. For the number of
epochs, the values 4 to 12 were considered, and for the batch size 2, 4, 8, 16, 32 and 64. To find the best
values for the remaining parameters and optimise the performance of the CNN, the Bayesian optimisation tuner
from KerasTuner [12] was used. Bayesian optimisation is a sequential design strategy for optimising expensive
black-box functions, such as the validation performance of a model as a function of its hyperparameters [13].
The objective was validation accuracy. The tuner tried different combinations of vocabulary size
(max_nb_words), embedding dimension (embedding_dim), sequence length (max_sequence_length), number of
convolutional layers (num_conv_layers), number of filters (num_filters) and kernel size (kernel_size). Each
combination was evaluated twice, with a maximum of 10 trials. The number of epochs and the batch size were
tuned manually. These values are summarised in Table 1.

                       Table 1: Hyperparameter search space (epochs and batch size tuned manually, all other parameters by Bayesian optimisation)
               Parameter                  Values
               epochs                     4, 5, 6, 7, 8, 9, 10, 11, 12
               batch size                 2, 4, 8, 16, 32, 64
               max_nb_words               5000, 10000, 15000, 20000
               embedding_dim              50, 100, 150, 200, 250, 300
               max_sequence_length        100, 200, 300, 400, 500, 600, 700, 800, 900, 1000
               num_conv_layers            1, 2, 3
               num_filters                32, 64, 96, 128, 160, 192, 224, 256
               kernel_size                3, 5, 7
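The size of this search space explains why an exhaustive grid search was not feasible. The short computation below, based only on the values in Table 1, counts the combinations: the six parameters searched by Bayesian optimisation span 17,280 combinations, so 10 trials cover less than 0.1% of them; including the manually tuned epochs and batch sizes would give 933,120 combinations.

```python
from math import prod

# Candidate values from Table 1.
MANUAL = {  # tuned by hand
    "epochs": list(range(4, 13)),
    "batch_size": [2, 4, 8, 16, 32, 64],
}
BAYESIAN = {  # searched with Bayesian optimisation
    "max_nb_words": [5000, 10000, 15000, 20000],
    "embedding_dim": list(range(50, 301, 50)),
    "max_sequence_length": list(range(100, 1001, 100)),
    "num_conv_layers": [1, 2, 3],
    "num_filters": list(range(32, 257, 32)),
    "kernel_size": [3, 5, 7],
}

bayes_size = prod(len(values) for values in BAYESIAN.values())
full_size = bayes_size * prod(len(values) for values in MANUAL.values())
MAX_TRIALS = 10
coverage = MAX_TRIALS / bayes_size  # share of the Bayesian search space actually evaluated
```

In KerasTuner, such candidate lists would typically be declared with calls like `hp.Int("embedding_dim", 50, 300, step=50)` or `hp.Choice("kernel_size", [3, 5, 7])`, and the trial settings described above would correspond to `max_trials=10` and `executions_per_trial=2` of `keras_tuner.BayesianOptimization`; this mapping is our reading of the description, not the author's code.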

The best-performing configuration was:

    • Epochs: 8
    • Batch size: 32
    • max_nb_words: 10,000
    • embedding_dim: 250
    • max_sequence_length: 600
    • num_conv_layers: 1
    • num_filters: 192
    • kernel_size: 3
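For orientation, the number of trainable parameters of the selected configuration can be estimated. The sketch below assumes a layer layout of Embedding, one Conv1D layer, global max-pooling and a single sigmoid unit; Figure 1 may contain further dense layers, in which case the totals change. Under this assumption, the embedding layer holds about 95% of the roughly 2.64 million parameters, and max_sequence_length does not affect the count.

```python
def count_parameters(vocab_size, embedding_dim, num_filters, kernel_size):
    """Trainable parameters of an assumed Embedding -> Conv1D -> global max-pooling -> Dense(1) stack."""
    embedding = vocab_size * embedding_dim                            # one vector per word, no bias
    conv1d = kernel_size * embedding_dim * num_filters + num_filters  # kernels + one bias per filter
    pooling = 0                                                       # pooling has no weights
    dense = num_filters + 1                                           # one sigmoid unit + bias
    return {
        "embedding": embedding,
        "conv1d": conv1d,
        "pooling": pooling,
        "dense": dense,
        "total": embedding + conv1d + pooling + dense,
    }


# Best configuration from above (max_sequence_length does not enter the count).
params = count_parameters(vocab_size=10_000, embedding_dim=250, num_filters=192, kernel_size=3)
embedding_share = params["embedding"] / params["total"]  # about 0.945
```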


4. Results
4.1. Evaluation Metrics
To evaluate the performance of the model, precision, recall and F1 score were used for both tasks. The first
task was to identify the ideology of the speaker's party, and the second was to identify its power role
(governing or opposition). For each task, these metrics are calculated from the confusion matrix, which has
four values:

    • True Positives (TP): the number of speeches correctly classified as "right" ideology/"opposition"
      party (1)
    • True Negatives (TN): the number of speeches correctly classified as "left" ideology/"governing"
      party (0)
    • False Positives (FP): the number of speeches classified as "right" ideology/"opposition" party (1)
      that are actually from the "left" ideology/"governing" party (0)
    • False Negatives (FN): the number of speeches classified as "left" ideology/"governing" party (0)
      that are actually from the "right" ideology/"opposition" party (1)

Precision indicates what proportion of predicted positives are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Recall measures the proportion of actual positives that are correctly classified:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

The F1 score is a number between zero and one that represents the harmonic mean of precision and recall:

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
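Equations (1) to (3) are defined for one positive class. The per-parliament scores in Tables 2 and 3 appear to be averaged over both classes (macro-averaged): for Bosnia and Herzegovina in Table 2, for example, the F1 score (0.5081) is lower than both precision (0.8150) and recall (0.5267), which a single harmonic mean of the two cannot produce. The sketch below computes both variants from gold and predicted labels; its toy labels are invented and reproduce this effect.

```python
def class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 (Equations 1-3) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Average each metric over the classes, treating each class in turn as positive."""
    per_class = [class_scores(y_true, y_pred, c) for c in labels]
    return tuple(sum(scores[i] for scores in per_class) / len(labels) for i in range(3))


# Invented toy labels: class 1 is rarely predicted.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0, 0, 0]
p1, r1, f1_1 = class_scores(gold, pred, positive=1)    # 1.0, 0.25, 0.4
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)  # ~0.786, 0.625, ~0.564
```

The macro F1 is the mean of the per-class F1 scores, not the harmonic mean of macro precision and macro recall, which is why it can fall below both. scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` computes the same macro averages.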

4.2. Evaluation
This section presents the results for each parliament in both tasks. The parameters used to train the deep
learning model were determined by Bayesian optimisation, as discussed in Section 3.5 on hyperparameter tuning.
The results were evaluated using the TIRA submission system [14] and published by Touché [3]. Table 2 shows
the results of the Power Identification task and Table 3 shows the results of the Political Orientation task.
The highest scores are highlighted in green and the lowest in red. The overall F1 scores are 0.6758 for
political power and 0.6322 for political orientation. These scores indicate moderate classification
performance in both tasks. For power identification, the model had the highest precision for Hungary (0.8776),
meaning that the proportion of correct predictions among all predictions for a class was highest there.
Conversely, Ukraine had the lowest precision (0.5135). Recall was also highest for Hungary (0.8694), meaning
that the model identified most of the true instances of each class in this parliament. Again, Ukraine had the
lowest score (0.5138). The highest F1 score is also found for Hungary (0.8727), indicating a good balance
between precision and recall, while the lowest F1 score, for Ukraine (0.4756), indicates substantial room for
improvement in both. Overall, the model identified the power role of a speaker's party well in parliaments such
as Hungary, Galicia and Turkey, but had difficulties with speakers in Ukraine, Bosnia and Herzegovina and Italy.
                                    Table 2: Political Power scores
                 Overall F1_Power                                 0.6758
                 Parliament                           Precision Recall       F1_score
                 Austria (at)                           0.6859     0.6842     0.6827
                 Bosnia and Herzegovina (ba)            0.8150     0.5267     0.5081
                 Belgium (be)                           0.6001     0.5989     0.5951
                 Bulgaria (bg)                          0.6615     0.6607     0.6557
                 Czechia (cz)                           0.6356     0.6368     0.6360
                 Denmark (dk)                           0.6523     0.6389     0.6350
                 Spain (es)                             0.7329     0.7117     0.7164
                 Catalonia (es-ct)                      0.7994     0.8112     0.8042
                 Galicia (es-ga)                        0.8696     0.8593     0.8630
                 Basque Country (es-pv)                 0.7446     0.7413     0.7422
                 Finland (fi)                           0.6161     0.6065     0.5975
                 France (fr)                            0.7149     0.7056     0.7095
                 Great Britain (gb)                     0.7268     0.7211     0.7237
                 Greece (gr)                            0.7783     0.6872     0.6836
                 Croatia (hr)                           0.6760     0.6334     0.6305
                 Hungary (hu)                           0.8776     0.8694     0.8727
                 Italy (it)                             0.6103     0.5556     0.5161
                 Latvia (lv)                            0.6750     0.6198     0.6286
                 The Netherlands (nl)                   0.6525     0.6391     0.6411
                 Poland (pl)                            0.7699     0.7690     0.7694
                 Portugal (pt)                          0.6676     0.6688     0.6582
                 Serbia (rs)                            0.7982     0.7137     0.7303
                 Slovenia (si)                          0.6168     0.5985     0.5814
                 Turkey (tr)                            0.8372     0.8411     0.8375
                 Ukraine (ua)                           0.5135     0.5138     0.4756

The results for the orientation task, shown in Table 3, indicate that the model achieved its highest scores on
the Turkish dataset in terms of precision (0.8404), recall (0.8423) and F1 score (0.8410). Conversely, the
model achieved its lowest precision (0.5301) and recall (0.5083) for Latvia. The lowest F1 score (0.4456) is for
speakers from Bosnia and Herzegovina. According to these results, the CNN model performed better for
parliaments such as Turkey, Spain and Galicia, and less accurately for Bosnia and Herzegovina, Latvia and
Croatia. This suggests a need to improve the model for these parliaments.


                                 Table 3: Political Orientation scores
                 Overall F1_Orientation                            0.6322
                 Parliament                          Precision Recall        F1_score
                 Austria (at)                           0.6785     0.6204     0.6034
                 Bosnia and Herzegovina (ba)            0.5795     0.5084     0.4456
                 Belgium (be)                           0.6163     0.5642     0.5442
                 Bulgaria (bg)                          0.6227     0.6168     0.6188
                 Czechia (cz)                           0.5333     0.5496     0.5223
                 Denmark (dk)                           0.5653     0.5632     0.5630
                 Spain (es)                             0.7811     0.7795     0.7738
                 Catalonia (es-ct)                      0.6619     0.6594     0.6551
                 Galicia (es-ga)                        0.7726     0.7605     0.7643
                 Finland (fi)                         0.5587     0.5496     0.5392
                 France (fr)                          0.6499     0.5799     0.5806
                 Great Britain (gb)                   0.7591     0.7620     0.7591
                 Greece (gr)                          0.7228     0.7267     0.7196
                 Croatia (hr)                         0.5749     0.5302     0.5123
                 Hungary (hu)                         0.7286     0.6717     0.6898
                 Italy (it)                           0.6447     0.6136     0.6020
                 Latvia (lv)                          0.5301     0.5083     0.4858
                 The Netherlands (nl)                 0.5940     0.5920     0.5928
                 Poland (pl)                          0.8376     0.6740     0.7159
                 Portugal (pt)                        0.6909     0.6873     0.6885
                 Serbia (rs)                          0.7470     0.6113     0.6408
                 Slovenia (si)                        0.7280     0.5997     0.5839
                 Turkey (tr)                          0.8404     0.8423     0.8410
                 Ukraine (ua)                         0.7994     0.6964     0.7315
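The overall F1 scores reported in Section 4.2 are consistent with an unweighted mean of the per-parliament F1 scores. The check below recomputes them from Tables 2 and 3 (Table 3 contains no row for the Basque Country, so it averages 24 instead of 25 parliaments) and also identifies the best and worst parliaments discussed above.

```python
from statistics import mean

# Per-parliament F1 scores copied from Table 2 (power) and Table 3 (orientation).
POWER_F1 = {
    "at": 0.6827, "ba": 0.5081, "be": 0.5951, "bg": 0.6557, "cz": 0.6360,
    "dk": 0.6350, "es": 0.7164, "es-ct": 0.8042, "es-ga": 0.8630, "es-pv": 0.7422,
    "fi": 0.5975, "fr": 0.7095, "gb": 0.7237, "gr": 0.6836, "hr": 0.6305,
    "hu": 0.8727, "it": 0.5161, "lv": 0.6286, "nl": 0.6411, "pl": 0.7694,
    "pt": 0.6582, "rs": 0.7303, "si": 0.5814, "tr": 0.8375, "ua": 0.4756,
}
ORIENTATION_F1 = {
    "at": 0.6034, "ba": 0.4456, "be": 0.5442, "bg": 0.6188, "cz": 0.5223,
    "dk": 0.5630, "es": 0.7738, "es-ct": 0.6551, "es-ga": 0.7643, "fi": 0.5392,
    "fr": 0.5806, "gb": 0.7591, "gr": 0.7196, "hr": 0.5123, "hu": 0.6898,
    "it": 0.6020, "lv": 0.4858, "nl": 0.5928, "pl": 0.7159, "pt": 0.6885,
    "rs": 0.6408, "si": 0.5839, "tr": 0.8410, "ua": 0.7315,
}

overall_power = round(mean(list(POWER_F1.values())), 4)              # 0.6758
overall_orientation = round(mean(list(ORIENTATION_F1.values())), 4)  # 0.6322
best_power = max(POWER_F1, key=POWER_F1.get)                         # 'hu'
worst_power = min(POWER_F1, key=POWER_F1.get)                        # 'ua'
best_orientation = max(ORIENTATION_F1, key=ORIENTATION_F1.get)       # 'tr'
worst_orientation = min(ORIENTATION_F1, key=ORIENTATION_F1.get)      # 'ba'
```

Because every parliament counts equally in this mean, small parliaments with few test speeches influence the overall score as much as large ones.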


5. Conclusion
This paper presents a CNN model, developed for the Touché Lab at CLEF 2024, that identifies the ideology and
power role of a speaker's party in parliamentary debates. Through data pre-processing and hyperparameter
optimisation, the model achieves moderate performance. It was evaluated on both tasks via TIRA, using datasets
from different parliaments, and reached an overall F1 score of 0.6758 for power identification and 0.6322 for
political orientation. For power identification, the CNN model performed best on the Hungarian dataset with an
F1 score of 0.8727, and for ideology identification on the Turkish dataset with an F1 score of 0.8410. However,
the performance of the model for parliaments such as Ukraine and Bosnia and Herzegovina shows that there is room
for improvement. This suggests a need for further refinement, possibly through more specific data
pre-processing or the integration of additional contextual data. Monolingual or multilingual pre-trained
language models could help to achieve these improvements. Google's BERT and its multilingual variant mBERT have
been trained on large corpora covering a variety of writing styles and topics (e.g. science, novels, news), and
can capture the semantics of sentences in a language. Such models could therefore be used during pre-processing
to produce word embeddings, i.e. text vectorisations that serve as input to a neural network. In addition,
other layers, such as LSTM layers, could be added to the CNN model [8].
Future work could also focus on expanding the dataset, improving the model architecture and exploring
additional features to further improve classification performance.
In conclusion, this research represents a step towards more automated analysis of parliamentary debates,
paving the way for deeper insights into the ideological and power dynamics that
shape legislative outcomes. The methods and results presented here contribute to the broader discourse
on political communication and its computational analysis, and highlight the potential for further
innovation in this important area.
Acknowledgments

References
 [1] P. Chilton, Analysing Political Discourse: Theory and Practice, Routledge, 2004.
     URL: https://books.google.de/books?id=un1buuNipQIC.
 [2] J. Charteris-Black, Analysing Political Speeches: Rhetoric, Discourse and Metaphor, Bloomsbury
     Publishing, 2018. URL: https://books.google.de/books?id=1fhGEAAAQBAJ.
 [3] J. Kiesel, Ç. Çöltekin, M. Heinrich, M. Fröbe, M. Alshomary, B. D. Longueville, T. Erjavec, N. Handke,
     M. Kopp, N. Ljubešić, K. Meden, N. Mirzakhmedova, V. Morkevičius, T. Reitis-Munstermann,
     M. Scharfbillig, N. Stefanovitch, H. Wachsmuth, M. Potthast, B. Stein, Overview of Touché
     2024: Argumentation Systems, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier,
     G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR
     Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International
     Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer,
     Berlin Heidelberg New York, 2024.
 [4] G. Abercrombie, R. Batista-Navarro, Sentiment and position-taking analysis of parliamentary
     debates: a systematic literature review, Journal of Computational Social Science 3 (2020) 245–270.
     URL: https://doi.org/10.1007/s42001-019-00060-w. doi:10.1007/s42001-019-00060-w.
 [5] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neural networks: Analysis,
     applications, and prospects, IEEE Transactions on Neural Networks and Learning Systems PP
     (2021) 1–21. doi:10.1109/TNNLS.2021.3084827.
 [6] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L. Shyu, S.-C. Chen, S. S. Iyengar, A
     survey on deep learning: Algorithms, techniques, and applications, ACM Comput. Surv. 51 (2018).
     URL: https://doi.org/10.1145/3234150. doi:10.1145/3234150.
 [7] S. Aslan, S. Kızıloluk, E. Sert, Tsa-cnn-aoa: Twitter sentiment analysis using cnn optimized via
     arithmetic optimization algorithm, Neural Computing and Applications 35 (2023) 10311–10328.
     URL: https://doi.org/10.1007/s00521-023-08236-2. doi:10.1007/s00521-023-08236-2.
 [8] M. Dehghani, Z. Yazdanparast, Political sentiment analysis of persian tweets using cnn-lstm model,
     2023. URL: https://arxiv.org/abs/2307.07740. arXiv:2307.07740.
 [9] T. Erjavec, M. Ogrodniczuk, P. Osenova, N. Ljubešić, K. Simov, A. Pančur, M. Rudolf, M. Kopp,
     S. Barkarson, S. Steingrímsson, Ç. Çöltekin, J. de Does, K. Depuydt, T. Agnoloni, G. Venturi,
     M. Calzada Pérez, L. D. de Macedo, C. Navarretta, G. Luxardo, M. Coole, P. Rayson, V. Morkevičius,
     T. Krilavičius, R. Darģis, O. Ring, R. van Heusden, M. Marx, D. Fišer, The ParlaMint corpora of
     parliamentary proceedings, Language Resources and Evaluation 57 (2022) 415–448.
     doi:10.1007/s10579-021-09574-0.
[10] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard,
     M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan,
     P. Warden, M. Wicke, Y. Yu, X. Zheng, Tensorflow: A system for large-scale machine learning,
     2016. arXiv:1605.08695.
[11] L. Alzubaidi, J. Zhang, A. J. Humaidi, A. Al-Dujaili, Y. Duan, O. Al-Shamma, J. Santamaría, M. A.
     Fadhel, M. Al-Amidie, L. Farhan, Review of deep learning: concepts, CNN architectures, challenges,
     applications, future directions, J Big Data 8 (2021) 53.
[12] D. Yogatama, N. A. Smith, Bayesian optimization of text representations, 2015.
     arXiv:1503.00693.
[13] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimization of machine learning algorithms,
     2012. arXiv:1206.2944.
[14] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
     Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
     F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
     in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
     in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.
     doi:10.1007/978-3-031-28241-6_20.