NICA at Quantum Computing CLEF Tasks 2024
                         Notebook for the NICA team at Quantum Computing Lab CLEF 2024

                         Aylin Naebzadeh1,* , Sauleh Eetemadi2
                         1
                          Student at School of Computer Engineering, Iran University of Science and Technology, Tehran, Islamic Republic Of Iran.
                         2
                          Assistant Professor of Computer Science, School of Computer Engineering, Iran University of Science and Technology, Tehran,
                         Islamic Republic Of Iran.


                                         Abstract
                                         We present the models implemented by the NICA group for the Quantum Computing (QuantumCLEF) Shared
                                         Task at CLEF 2024. Our participation focused on Task 1A: Feature Selection (Information Retrieval Task). We
                                         propose a feature selection algorithm based on a quadratic unconstrained binary optimization (QUBO) problem,
                                         which selects a specified number of features considering their importance and redundancy. This task was solved
                                         using a real quantum computer provided by D-Wave on the MQ2007 and ISTELLA datasets. Our approach
                                         utilized “Shannon Entropy” to target the mutual information between each feature and the target value. In
                                         QuantumCLEF Task 1A, the organizers suggested training a LambdaMART model on the selected features and
                                         evaluating performance using the nDCG@10 metric. Our team achieved nDCG@10 scores of 0.4506 and 0.6211
                                         for the MQ2007 and ISTELLA datasets, respectively.

                                         Keywords
                                         Quantum Computing, Feature Selection, Information Retrieval, QUBO, Quantum Annealer, D-Wave, Shannon
                                         Entropy, nDCG@10


                         1. Introduction
                         Machine learning (ML) models are highly effective in various data analytics tasks, including classification,
                         regression, and data generation. However, the performance and resource requirements of these models
                         often scale with the number of input features. Models with a larger number of features demand more
                         memory and computational power during training. In applications with strict resource limitations,
                         such as embedded systems, it is crucial to develop small and efficient models. Consequently, a common
                         objective in ML pipelines is to reduce the number of features while minimizing information loss through
                         a process known as dimensionality reduction [1, 2].
                            An important example for such a strategy is feature selection (FS), where the input dimension is
                         reduced by selecting only a subset of all available features without performing additional transformations
                         [3]. Despite its importance, feature selection poses significant challenges for classical computers. One
                         major limitation is that feature selection is an NP-hard problem, meaning that the computational effort
                         required to find the optimal subset of features grows exponentially with the number of features. As a
                         result, solving this problem can be extremely time-consuming, especially for large datasets with high-
                         dimensional feature spaces. Additionally, traditional algorithms often require substantial computational
                         resources and memory, making them impractical for real-time applications or systems with limited
                         resources. These challenges underscore the need for more efficient and scalable approaches to feature
                         selection.
                            To address this issue, a new challenge called QuantumCLEF 1 [4, 5, 6] has been established with the
                         objective of efficiently and effectively employ quantum annealers for problems like features selection in
                         Information Retrieval (IR) tasks and Recommender Systems (RS).


                          CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
                         *
                           Corresponding author.
                          $ a_naeb@comp.iust.ac.ir (A. Naebzadeh); sauleh@iust.ac.ir (S. Eetemadi)
                           https://github.com/AylinNaebzadeh (A. Naebzadeh); http://www.sauleh.ir/ (S. Eetemadi)
                                      © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         1
                             https://qclef.dei.unipd.it

CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
   Quantum Computing (QC) has garnered significant attention from researchers across various fields,
as technological advancements have made QC resources more accessible and applicable to practical
problems. In the current landscape, IR and RS require computationally intensive operations on massive
and heterogeneous datasets. Consequently, Quantum Computing, particularly Quantum Annealing (QA)
technologies, holds the potential to enhance these systems’ performance in terms of both efficiency and
effectiveness [4].
   This paper describes the NICA team’s contribution to the first shared task of QuantumCLEF 2024.
The objective of this task is formulating the well-known NP-Hard feature selection problem to solve
it with a quantum annealer and compare the results with a simulate annealing. Two datasets have
been defined for the first part of this task, MQ2007 [7] and ISTELLA Letor [8]. These datasets contain
pre-computed features and the objective is to select a subset of these features to train a learning model,
such as LambdaMART [9] or a content-based RS, and to achieve best performance according to metrics
such as nDCG@10. Under the given shared task, our system selects the most relevant features based on
their mutual information with the target value, and then submits those features to a D-Wave quantum
computer2 . Although D-Wave is not freely available, the organizers of this challenge made it accessible
to us for the purpose of this competition.
   This work is structured as follows: Section 2 briefly provides a description of several earlier studies.
Section 3 will then present an explanation of task description. Following that, Section 4 and 5 will
outline the experimental methodology and evaluation results respectively. Finally, in Section 6, we will
present the key findings and conclusions of our studies, as well as some potential directions for future
research.


2. Related Works
Quantum computing has been widely applied across various domains such as energy applications
[10] and finance [11] . In the field of computer science, quantum computing has been explored for
applications like Support Vector Machine (SVM) as machine learning algorithms [12, 13]. However,
there is a limited amount of research on the applications of QC and QA in the domains of IR and RS
[14, 15, 16, 17].
   In a study by Nembrini et al. [14], QA was applied to IR and RS tasks such as feature selection,
demonstrating the feasibility and promising improvements in efficiency and effectiveness. Additionally,
Mücke et al. [2] conducted a series of numerical experiments using classical computers, quantum gate
computers, and quantum annealers. Their results showed competitive performance when comparing
several standard methods on various benchmark datasets, highlighting the potential of quantum
technologies in enhancing IR and RS tasks.


3. Tasks Description
This is the first edition of QuantumCLEF. In this lab, there are two tasks involving computationally
intensive problems closely related to the Information Access field: Feature Selection and Clustering.
Each task presents a problem that can be solved using the QA paradigm. Participants are required to
submit their solutions using both QA and Simulated Annealing (SA) to compare the two methods in terms
of efficiency and effectiveness. Figure 1 provides an overview of the tasks and datasets in QuantumCLEF
2024. More detailed descriptions of the tasks are available on the organizers’ QuantumCLEF website3
and in this paper by Pasin et al. [4].


2
    https://www.dwavesys.com/solutions-and-products/cloud-platform
3
    https://qclef.dei.unipd.it/index.html/
Figure 1: An Overview of Tasks in QuantumCLEF 2024


Table 1
MQ2007 Dataset
                       class      1         2         3   4     ...         43             44         45       46
               0        0      0.000000     0.0    0.0    0.0   ...       0.055556    0.000000        0.00     0.0
               1        1      0.205882     0.0    0.0    0.0   ...       0.000000    0.333333        0.42     0.0
               2        0      0.000000     0.0    0.0    0.0   ...       0.092593    0.500000        0.68     0.0
               3        1      0.088235     0.0    0.0    0.0   ...       0.000000    0.666667        0.86     0.0
               4        0      0.558824     0.0    0.0    0.0   ...       0.000000    0.500000        0.46     0.0


Table 2
ISTELLA Dataset
               target             1               2        3          4       ...    217        218          219     220
           0       0        1.797693e+308       1008.0    2.0    288.0        ...    1.0    0.000000         0.0     0.0
           1       0        1.797693e+308        105.0    2.0     0.0         ...    1.0    0.333333         0.0     0.0
           2       0        1.797693e+308       1058.0    1.0   5248.0        ...    1.0    0.500000         0.0     2.0
           3       0        1.797693e+308        121.0    0.0     0.0         ...    1.0    0.666667         0.0     0.0
           4       0        1.797693e+308        127.0    0.0     0.0         ...    1.0    0.500000         0.0     0.0


4. Experimental Setup
4.1. Task 1.A. Dataset
4.1.1. MQ2007
The MQ2007 dataset comprises 46 independent features and a binary dependent feature labeled “class”.
Table 1 provides example values within this dataset.
Table 3
QUBO Representation for Feature Selection
                     Optimization    Linear Terms     Quadratic Terms                        Formula
                                                                        ∑︀     {︁             ∑︀                         }︁
 Feature Selection     Maximize        𝐼(𝑋𝑖 ; 𝑌 )      𝐼(𝑋𝑗 ; 𝑌 ‖𝑋𝑖 )      𝑖∈𝑆   𝐼(𝑋𝑖 ; 𝑌 ) +    𝑗∈𝑆,𝑗̸=𝑖 𝐼(𝑋 𝑗 ; 𝑌 ‖𝑋𝑖 )
                                                                                  ∑︀𝑁           ∑︀𝑁
 QUBO                  Minimize          𝑞𝑖 𝑥𝑖           𝑞𝑖,𝑗 𝑥𝑖 𝑥𝑗                 𝑖   𝑞 𝑥
                                                                                         𝑖 𝑖 +         𝑞   𝑥
                                                                                                  𝑖<𝑗 𝑖,𝑗 𝑖 𝑗𝑥


4.1.2. ISTELLA
The ISTELLA dataset consists of 220 independent features and a binary dependent feature labeled
“target”. Table 2 presents example values within this dataset. This dataset poses an additional challenge
due to the large number of features, which cannot be directly accommodated in the Quantum Processing
Unit (QPU). To mitigate this issue, we applied a filtering process to select only the most relevant features.
Specifically, We extracted 50 features based on their mutual information with the "target" feature,
ensuring a threshold of 0.031, and we used only the selected features for subsequent analysis. The
threshold of 0.031 was carefully chosen after testing various thresholds to find a balance between
retaining a sufficient number of informative features and managing the limitations of our QPU. By
setting the threshold to 0.031, we ensured that we retained enough features to maintain the integrity and
predictive power of our model. At the same time, we avoided the pitfall of including too many features,
which would exceed the capacity of the QPU and hinder our ability to perform efficient quantum
computations.

4.2. Workspace
In this challenge every team participating was provided with an identical workspace, ensuring equal
access to resources. Recognizing that hardware resource limitations are a common constraint in research,
organizers established a custom infrastructure to mitigate this issue. This infrastructure was necessary
because participants could not have direct access to quantum annealers, and organizers aimed to ensure
that measurements were fair and reproducible.
   Each team’s workspace, accessible via a browser with the correct credentials, included a preconfigured
Git repository essential for maintaining reproducibility. A centralized dispatcher managed and tracked
all team submissions, using a secret API key for submitting problems to the quantum annealer, ensuring
participants remained unaware of this key. Additionally, a web application served as the primary
information hub, allowing teams to view their quotas and various statistics through a dashboard.
Organizers had their own dashboard to manage teams and tasks effectively [18]. Figure 2 provides a
high-level representation of the infrastructure.


5. Methodology
5.1. Quantum Annealing
QA is a quantum computing paradigm that relies on specialized devices known as quantum annealers,
designed to address optimization problems. In QA, a problem is encoded as the energy landscape of a
physical system, and quantum-mechanical principles are employed to guide the system towards a state of
minimal energy, which corresponds to the solution of the original problem. To utilize quantum annealers
effectively, problems must be formulated as minimization tasks using the Quadratic Unconstrained
Binary Optimization (QUBO) formulation, which is defined as follows:

                                                 min 𝑦 = 𝑥𝑇 𝑄𝑥                                                (1)

where 𝑥 is a vector of binary decision variables and 𝑄 is a matrix of constant values representing the
problem to solve [4].
Figure 2: High-level representation of the infrastructure [18].


5.2. QUBO Representation for Feature Selection
D-Wave systems solve Binary Quadratic Models (BQM). Given 𝑁 variables 𝑥1 , ..., 𝑥𝑁 , where each
variable 𝑥𝑖 can have binary values 0 or 1, the system finds assignments of values that minimize,
                                                 𝑁
                                                ∑︁               𝑁
                                                                ∑︁
                                                     𝑞𝑖 𝑥 𝑖 +         𝑞𝑖,𝑗 𝑥𝑖 𝑥𝑗                     (2)
                                                 𝑖              𝑖<𝑗

where 𝑞𝑖 and 𝑞𝑖,𝑗 are configurable (linear and quadratic) coefficients. To formulate a problem for the
D-Wave system is to program 𝑞𝑖 and 𝑞𝑖,𝑗 so that assignments of 𝑥1 , ..., 𝑥𝑁 also represent solutions to
the problem. For feature selection, the Mutual Information QUBO (MIQUBO) method formulates a
QUBO based on the approximation above for 𝐼(𝑋𝑘 ; 𝑌 ), which can be submitted to the D-Wave quantum
computer for solution. The reduction of scope to permutations of three variables in this approximate
formulation for MI-based optimal feature selection makes it a natural fit for reformulation as a QUBO.
Table 3 shows a QUBO representation for feature selection task.

5.3. Computing Mutual Information
We establish and employ several functions to calculate the values of Mutual Information and Conditional
Mutual Information, which are integral to defining our problem, and have been given by the organizers
in the workspace. Specifically, we outline some helpful definitions below.

5.3.1. Quantifying Information: Shannon Entropy
Shannon entropy4 [19], denoted as 𝐻(𝑋), is a mathematical measure that quantifies the information
content of a signal:
                                                           ∑︁
                                            𝐻(𝑋) = −             𝑝(𝑥) log 𝑝(𝑥)                       (3)
                                                          𝑥∈𝑋

  Here, 𝑝(𝑥) represents the probability of an event’s occurrence. The Shannon Entropy formula can
be interpreted as assigning a value of log 𝑝(𝑥)
                                            1
                                                to each event based on its probability, weighted by that
4
    https://en.wiktionary.org/wiki/Shannon_entropy
probability. The reciprocal in the logarithm ensures that less likely events are attributed with more
information.

5.3.2. Conditional Shannon Entropy
Conditional Shannon Entropy (CSE) quantifies the information content of one signal, 𝑋, given the
value of another signal, 𝑌 :
                                                   ∑︁
               𝐻(𝑋|𝑌 ) = 𝐻(𝑋, 𝑌 ) − 𝐻(𝑌 ) = −          𝑝(𝑥, 𝑦) log 𝑝(𝑥, 𝑦) − 𝐻(𝑌 )            (4)
                                                                  𝑥∈𝑋

   Here, the joint Shannon Entropy, 𝐻(𝑋, 𝑌 ), represents the combined information content of both
signals, where 𝑝(𝑥, 𝑦) denotes their joint probability. For instance, knowing that it is winter reduces
the informational value of news regarding rainfall.

5.3.3. Mutual Information
Mutual information 5 [20] between variables 𝑋 and 𝑌 is defined as:
                                                       ∑︁ ∑︁                  𝑝(𝑥, 𝑦)
                                      𝐼(𝑋; 𝑌 ) =               𝑝(𝑥, 𝑦) log                          (5)
                                                                             𝑝(𝑥)𝑝(𝑦)
                                                   𝑦∈𝑌 𝑥∈𝑋

  Here, 𝑝(𝑥) and 𝑝(𝑦) represent the marginal probabilities of 𝑋 and 𝑌 , respectively, while 𝑝(𝑥, 𝑦)
denotes the joint probability.
  Alternatively, mutual information can be expressed as:

                                            𝐼(𝑋; 𝑌 ) = 𝐻(𝑌 ) − 𝐻(𝑌 |𝑋)                              (6)
   In this equation, 𝐻(𝑌 ) represents the Shannon Entropy of 𝑌 , and 𝐻(𝑌 |𝑋) denotes the Conditional
Shannon Entropy of 𝑌 given 𝑋.
   Mutual information quantifies the amount of information about one random variable that can be
inferred from observations of another. Intuitively, when the mutual information between two variables
is high, a model based on either variable alone can effectively capture their combined contribution.
High mutual information implies a strong dependence or correlation between the variables, indicating
that knowledge of one variable provides significant information about the other. Conversely, a low or
zero mutual information value suggests little to no association between the variables.

5.3.4. Conditional Mutual Information
Conditional Mutual Information (CMI) between a variable of interest, 𝑋, and a feature, 𝑌 , given the
selection of another feature, 𝑍, is calculated as:

                                        𝐼(𝑋; 𝑌 |𝑍) = 𝐻(𝑋|𝑍) − 𝐻(𝑋|𝑌, 𝑍)                             (7)
   Here, 𝐻(𝑋|𝑍) represents the Conditional Shannon Entropy (CSE) of 𝑋 given 𝑍, while 𝐻(𝑋|𝑌, 𝑍)
denotes the CSE of 𝑋 conditional on both 𝑌 and 𝑍.
   A high conditional mutual information value indicates strong dependence or correlation between
variables 𝑋 and 𝑌 given the value of variable 𝑍. It suggests that knowledge of variable 𝑍 provides
significant information about the relationship between variables 𝑋 and 𝑌 . Conversely, a low or zero
conditional mutual information value suggests little to no association between variables 𝑋 and 𝑌 given
the value of variable 𝑍.


5
    https://en.wikipedia.org/wiki/Mutual_information
Table 4
Results for Task 1. A. Information Retrieval
                                                                     Annealing
      Dataset                 Submission ID             nDCG@10                  Type   # Features
                                                                     Time (us)
     MQ2007      6f7d7d44-c559-4e36-9b10-b7e51e521036     0.4506      274119      QA        17
     MQ2007                       169                     0.4498      3509658     SA        15
     ISTELLA     c5888bf1-4549-418c-92b8-b7175c9185e4      0.596      427442      QA        15
     ISTELLA                      380                     0.6211      3998441     SA        15


6. Results
The overview of the final results of this study’s submissions can be found in the Table 4. The Annealing
time measures the execution time of the approach. In the case of QA this consists in the programming
time, sampling time and post-processing time. The results of our study indicate that QA outperforms SA
in terms of both effectiveness and efficiency. QA demonstrated superior capability in finding optimal
solutions more consistently and quickly, showcasing its potential for solving complex optimization
problems more effectively. The efficiency gains were evident through faster convergence times and
reduced computational resources required compared to SA. This highlights QA as a more powerful and
efficient approach for tackling intricate problems within the context of our experiments.


7. Conclusion and Future Work
Feature selection plays a pivotal role in machine learning, significantly impacting the performance of
models by reducing overfitting, reducing the effect of curse of dimensionality, and reducing computa-
tional complexity. The task, however, is NP-Hard, necessitating efficient and innovative solutions. The
emerging field of quantum computing offers promising avenues for tackling this challenge. In our study,
we leveraged Shannon entropy to construct our Binary Quadratic Model (BQM) object, a novel approach
in the realm of feature selection. We then employed two distinct methods to solve our problem: SA and
QA, provided by D-Wave. Our results underscore the potential of QA in providing superior solutions
for complex problems like feature selection. As we look to the future, we anticipate further exploration
and refinement of quantum methods for feature selection, contributing to the advancement of machine
learning and, by extension, numerous fields that depend on it.


Acknowledgments
We would like to express our deep appreciation to the organizers of QuantumCLEF 2024 for providing
us with the opportunity to participate in this esteemed event, and sharing informative tutorials, docu-
mentations, and codes with us. It was all a great and valuable experience to work with the provided
cloud based infrastructure and quantum computers, and gain new knowledge. We are truly grateful for
the chance to present our work and contribute to the progress of our field. We would like to once again
acknowledge and appreciate the support and encouragement received from all individuals involved, as
their contributions have been instrumental in our accomplishments.


References
 [1] L. Van Der Maaten, E. Postma, J. Van den Herik, et al., Dimensionality reduction: a comparative, J
     Mach Learn Res 10 (2009).
 [2] S. Mücke, R. Heese, S. Müller, M. Wolter, N. Piatkowski, Feature selection on quantum computers,
     Quantum Machine Intelligence 5 (2023) 11.
 [3] G. Chandrashekar, F. Sahin, A survey on feature selection methods, Computers & electrical
     engineering 40 (2014) 16–28.
 [4] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, qclef: A proposal to evaluate quantum
     annealing for information retrieval and recommender systems, in: International Conference of
     the Cross-Language Evaluation Forum for European Languages, Springer, 2023, pp. 97–108.
 [5] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, QuantumCLEF 2024: Overview of the
     Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF,
     in: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble,
     France, September 9th to 12th, 2024, 2024.
 [6] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, Overview of QuantumCLEF 2024: The
     Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF, in:
     Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Confer-
     ence of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings,
     2024.
 [7] T. Qin, T.-Y. Liu, Introducing letor 4.0 datasets, arXiv preprint arXiv:1306.2597 (2013).
 [8] D. Dato, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, R. Venturini, Fast ranking
     with additive ensembles of oblivious and non-oblivious regression trees, ACM Transactions on
     Information Systems (TOIS) 35 (2016) 1–31.
 [9] C. J. Burges, From ranknet to lambdarank to lambdamart: An overview, Learning 11 (2010) 81.
[10] H. P. Paudel, M. Syamlal, S. E. Crawford, Y.-L. Lee, R. A. Shugayev, P. Lu, P. R. Ohodnicki, D. Mollot,
     Y. Duan, Quantum computing and simulations for energy applications: Review and perspective,
     ACS Engineering Au 2 (2022) 151–196.
[11] R. Orús, S. Mugel, E. Lizaso, Quantum computing for finance: Overview and prospects, Reviews
     in Physics 4 (2019) 100028.
[12] D. Willsch, M. Willsch, H. De Raedt, K. Michielsen, Support vector machines on the d-wave
     quantum annealer, Computer physics communications 248 (2020) 107006.
[13] A. Delilbasic, B. Le Saux, M. Riedel, K. Michielsen, G. Cavallaro, A single-step multiclass svm
     based on quantum annealing for remote sensing data classification, IEEE journal of selected topics
     in applied earth observations and remote sensing (2023).
[14] R. Nembrini, M. Ferrari Dacrema, P. Cremonesi, Feature selection for recommender systems with
     quantum computing, Entropy 23 (2021) 970.
[15] M. F. Dacrema, A. Pasin, P. Cremonesi, N. Ferro, Using and evaluating quantum computing for
     information retrieval and recommender systems (2024).
[16] R. Pellini, M. F. Dacrema, P. Cremonesi, Towards improved qubo formulations of ir tasks for
     quantum annealers (2023).
[17] M. Melucci, et al., Introduction to information retrieval and quantum mechanics, Springer, 2015.
[18] A. Pasin, M. F. Dacrema, P. Cremonesi, N. Ferro, Quantumclef-quantum computing at clef, in:
     European Conference on Information Retrieval, Springer, 2024, pp. 482–489.
[19] J. Lin, Divergence measures based on the shannon entropy, IEEE Transactions on Information
     theory 37 (1991) 145–151.
[20] T. E. Duncan, On the calculation of mutual information, SIAM Journal on Applied Mathematics
     19 (1970) 215–220.
          (a) Matrix for MQ2007 with All Features.      (b) Matrix for MQ2007 to make our model choose 15 Fea-
                                                        tures.


     (c) Matrix for ISTELLA with All Selected 50 Features. (d) Matrix for ISTELLA to make our model choose 15
                                                           Features.

Figure 3: QUBO Matrices


    (a) Energy Plot for SA and QPU Samples Over MQ2007 (b) Energy Plot for SA and QPU Samples Over ISTELLA

Figure 4: Overview of Comparison between Computed Simulated Annealing (SA) and Quantum Annealing (QA)
Samples Using Histogram Energy Reporting Levels