Team QTB on Feature Selection Via Quantum Annealing and Hybrid Models
Notebook for the QuantumCLEF Lab at CLEF 2024

Esteban Payares1, Edwin Puertas1 and Juan Carlos Martínez-Santos1
1 Universidad Tecnológica de Bolívar, School of Engineering

Abstract

Quantum technologies are a reality today, and their prospects for real-world applications are bright. Feature selection is a crucial preprocessing step in Information Retrieval: by identifying the most informative subset of features, it can improve the efficiency of learning-to-rank models. In this paper, we propose a novel approach to feature selection for Information Retrieval using quantum annealing, a promising optimization technique that leverages the principles of quantum mechanics. We focus on the MQ2007 dataset, a widely used benchmark for learning-to-rank tasks, and explore different formulations of the feature selection problem as quadratic unconstrained binary optimization (QUBO) problems, including mutual information, conditional mutual information, and correlation coefficients. Our quantum annealing-based approaches demonstrate their effectiveness in selecting informative features, outperforming simulated annealing, which achieves an nDCG@10 score of 0.4024; the best quantum annealing-based approach achieves a score of 0.443 using a hybrid solver with only ten features. We discuss the importance of the number of selected features in the performance of learning-to-rank models and the role of hybrid quantum-classical solvers in incorporating additional constraints and preferences into the feature selection process. Our work demonstrates the potential of quantum annealing for tackling complex optimization problems and paves the way for further exploration in this domain.

Keywords
Feature Selection, Information Retrieval, Learning to Rank, Quantum Annealing, MQ2007

1. Introduction

Information Retrieval (IR) systems are crucial in efficiently and effectively retrieving relevant information from large-scale data. Among the various components of IR systems, learning-to-rank (LTR) models such as LambdaMART [1] have shown remarkable performance in improving the quality of search results [2]. However, the performance of these models heavily relies on selecting informative features from the high-dimensional feature space [3].

Feature selection is an essential preprocessing step in IR to identify the most relevant subset of features for training LTR models [4]. By reducing the dimensionality of the input data, feature selection not only improves the efficiency of the training process but also enhances the generalization ability of the models [5]. Traditional feature selection methods, such as recursive feature elimination (RFE) with logistic regression [6], have been widely used in IR. However, these methods often struggle to capture the complex and non-linear relationships among features and the target variable [7].

Recent advancements in quantum computing have opened up new possibilities for solving optimization problems, including feature selection. Quantum annealing, in particular, has shown promise in efficiently exploring vast search spaces and finding optimal solutions [8]. By leveraging the principles of quantum mechanics, such as superposition and entanglement, quantum annealing can outperform classical optimization algorithms [9].
Wang et al. [10] have demonstrated the effectiveness of integrating machine learning algorithms with quantum annealing solvers for online fraud detection, showcasing the potential of quantum annealing in real-world applications.

In this work, we propose a novel approach to feature selection for IR using quantum annealing. Specifically, we focus on the MQ2007 dataset [11], a widely used benchmark for LTR tasks, and aim to select the most informative subset of features to improve the performance of a LambdaMART model. We explore different formulations of the feature selection problem as quantum annealing problems, including mutual information (MI), conditional mutual information (CMI), and correlation coefficients.

The main contributions of our work are as follows:
• We present a quantum annealing-based approach to feature selection for IR, demonstrating its effectiveness on the MQ2007 dataset.
• We compare different problem formulations (MI, CMI, and correlation coefficients) and analyze their impact on the feature selection process.
• We evaluate the performance of the LambdaMART model trained with the selected features using the normalized discounted cumulative gain (nDCG) metric [12].
• We provide insights into the quantum annealing process through energy histograms and discuss the feasibility of the solutions found by the hybrid solver.

2. Related Work

Feature selection is a fundamental process in machine learning that involves identifying a subset of pertinent features from a larger set to improve model performance. Several authors have developed approaches to address this problem, each with unique contributions and benefits.

Evolutionary Quantum Feature Selection (EQFS) applies principles of quantum computing to enhance the efficiency of feature selection. It relies on the Quantum Circuit Evolution (QCE) algorithm, which uses shallow-depth circuits to generate sparse probability distributions, enabling the identification of optimal feature combinations with quadratic scaling in the number of features and thus mitigating the curse of dimensionality. This technique is particularly effective for high-dimensional datasets [13, 14].

Traditional feature selection methods fall into three main categories: filter, wrapper, and embedded methods. Filter methods, such as correlation and mutual information, use general dependency measures between features and the output. Wrapper methods evaluate subsets of variables based on the quality of the final classification, offering superior performance but at a higher computational cost. Embedded methods integrate feature selection into the training process of the classifier, with examples including decision trees and methods with L1 regularization. These traditional approaches are fundamental in various machine learning applications, balancing computational efficiency and accuracy [13]. Moreover, methods based on mutual information have been highly influential in multiple applications [15].

Another line of work integrates machine learning algorithms with quantum annealing solvers.
This approach is particularly productive in online fraud detection, where real-time or near-real-time data processing is of paramount importance. The quantum-enhanced SVM has demonstrated notable improvements in speed and accuracy for time-series datasets with highly imbalanced data, although the accuracy improvements for non-time-series data were relatively marginal. This illustrates the potential of quantum machine learning (QML) for time-series data and shows that the optimal strategy depends on the dataset type and the balance between speed, accuracy, and cost [16]. Moreover, authors have demonstrated that conditional mutual information is a practical approach to feature selection in incomplete data [17].

Genetic algorithms represent a class of randomized feature selection methods that emulate the processes of natural evolution. These algorithms generate feature subsets randomly and iteratively improve them through operations such as crossover and mutation. Genetic algorithms can effectively locate optimal feature subsets while avoiding the pitfall of local optima; this robustness renders them well-suited for complex feature selection tasks and confers a substantial advantage over sequential search methodologies. Furthermore, incorporating randomness into these processes can enhance the efficiency of identifying optimal feature sets [18]. Moreover, integrating mutual information into genetic algorithms has been employed to improve feature selection [19].

The integration of classical machine learning techniques with quantum computing offers the potential for significant advantages. For example, a hybrid approach in which quantum evolutionary algorithms optimize feature combinations while classical algorithms evaluate those combinations can effectively leverage the strengths of both paradigms. This approach aims to enhance the precision and efficacy of feature selection in a range of machine learning applications: the capacity of quantum algorithms to address the computational complexity of large datasets complements the resilience of classical methods in model evaluation. This type of integration promises significant advancements in the practical application of quantum feature selection [13]. Combining mutual information with other criteria has also been shown to enhance feature selection in various domains [20].

In supervised feature selection, a recent approach combines sparse representation and mutual information. The method begins with sparse representation, which evaluates the importance of each feature in the overall data structure; mutual information is then employed to reduce redundancy among the selected features. Experimental results demonstrate that this approach significantly enhances classification outcomes and can achieve superior feature selection at lower dimensions, illustrating the importance of considering both the global importance of features and redundancy reduction [21]. Moreover, conditional mutual information and other innovative methodologies have yielded encouraging outcomes in feature selection [22].

3. Methodology

In this section, we describe our methodology for feature selection using quantum annealing.
We first formulate the feature selection problem as a quantum annealing problem and then present the approaches we explored, including mutual information, conditional mutual information, and correlation coefficients. We also discuss the use of hybrid quantum-classical solvers and simulated annealing for comparison.

3.1. Dataset

Our experiments use the MQ2007 dataset, a widely used benchmark for learning-to-rank tasks in Information Retrieval. The MQ2007 dataset consists of 1,692 queries and 69,623 documents, with each query-document pair represented by a 46-dimensional feature vector. The QuantumCLEF lab organizers provide a training set, which we use for feature selection; the organizers are responsible for training and evaluating the learning-to-rank model using the selected features [23, 24].

3.2. Problem Formulation

Let \mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N} be a dataset of N instances, where \mathbf{x}_i \in \mathbb{R}^d is a d-dimensional feature vector and y_i \in \mathbb{R} is the corresponding target value. The goal of feature selection is to identify a subset of features \mathcal{S} \subseteq \{1, 2, \ldots, d\} that maximizes the relevance to the target variable while minimizing the redundancy among the selected features.

We formulate the feature selection problem as a quadratic unconstrained binary optimization (QUBO) problem, which can be solved using quantum annealing. The QUBO formulation is given by

    \min_{\mathbf{q} \in \{0,1\}^d} \mathbf{q}^\top \mathbf{Q}\, \mathbf{q},    (1)

where \mathbf{q} is a binary vector representing the selection of features, and \mathbf{Q} is a d \times d matrix capturing the interactions between features. The diagonal elements of \mathbf{Q} represent the relevance of each feature to the target variable, while the off-diagonal elements represent the redundancy between pairs of features.

3.3. Mutual Information

Mutual information (MI) measures the dependence between two random variables. In the context of feature selection, we use MI to quantify the relevance of each feature to the target variable. We define the MI between a feature X_i and the target variable Y as

    \mathrm{MI}(X_i; Y) = \sum_{x \in X_i} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},    (2)

where p(x, y) is the joint probability distribution of X_i and Y, and p(x) and p(y) are the marginal probability distributions of X_i and Y, respectively.

To formulate the feature selection problem using MI, we construct the \mathbf{Q} matrix as follows:

    Q_{ij} = \begin{cases} -\mathrm{MI}(X_i; Y) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}    (3)

The negative sign in the diagonal elements ensures that the optimization problem seeks to maximize the MI between the selected features and the target variable.

3.4. Conditional Mutual Information

Conditional mutual information (CMI) extends the concept of MI by considering the redundancy between pairs of features given the target variable. The CMI between two features X_i and X_j given the target variable Y is defined as

    \mathrm{CMI}(X_i; X_j \mid Y) = \sum_{x_i \in X_i} \sum_{x_j \in X_j} \sum_{y \in Y} p(x_i, x_j, y) \log \frac{p(x_i, x_j \mid y)}{p(x_i \mid y)\, p(x_j \mid y)},    (4)

where p(x_i, x_j, y) is the joint probability distribution of X_i, X_j, and Y, and p(x_i \mid y) and p(x_j \mid y) are the conditional probability distributions of X_i and X_j given Y, respectively.

To incorporate CMI into the feature selection problem, we update the \mathbf{Q} matrix as follows:

    Q_{ij} = \begin{cases} -\mathrm{MI}(X_i; Y) & \text{if } i = j, \\ -\mathrm{CMI}(X_i; X_j \mid Y) & \text{if } i \neq j. \end{cases}    (5)

The negative sign in the off-diagonal elements ensures that the optimization problem seeks to minimize the redundancy between the selected features given the target variable.

3.5. Correlation Coefficients

Correlation coefficients measure the linear relationship between two variables. In the context of feature selection, we use the Pearson correlation coefficient to quantify the relevance of each feature to the target variable and the redundancy between pairs of features. The Pearson correlation coefficient between two variables X and Y is defined as

    \rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y},    (6)

where \mathrm{cov}(X, Y) is the covariance between X and Y, and \sigma_X and \sigma_Y are the standard deviations of X and Y, respectively.

To formulate the feature selection problem using correlation coefficients, we construct the \mathbf{Q} matrix as follows:

    Q_{ij} = \begin{cases} -|\rho(X_i, Y)| & \text{if } i = j, \\ -|\rho(X_i, X_j)| & \text{if } i \neq j. \end{cases}    (7)

The absolute values of the correlation coefficients capture both positive and negative relationships, and the negative sign ensures that the optimization problem maximizes relevance and reduces redundancy.
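To make the three formulations concrete, the sketch below builds the \mathbf{Q} matrix for each criterion. It is a minimal illustration under stated assumptions rather than our exact implementation: X is assumed to be an (N, d) NumPy array of MQ2007 feature vectors and y the vector of discrete relevance grades; scikit-learn's mutual_info_regression is one possible MI estimator; the CMI estimator is a simple histogram plug-in; and the helper names (conditional_mi, build_qubo) are ours for illustration.

```python
# Illustrative Q-matrix construction for Eqs. (3), (5), and (7).
from collections import Counter

import numpy as np
from sklearn.feature_selection import mutual_info_regression


def conditional_mi(xi, xj, y, bins=10):
    """Plug-in estimate of CMI(X_i; X_j | Y) from Eq. (4), after
    discretizing the continuous features into equal-width bins.
    Assumes y holds discrete relevance grades (0, 1, 2 in MQ2007)."""
    xi_d = np.digitize(xi, np.histogram_bin_edges(xi, bins)[1:-1])
    xj_d = np.digitize(xj, np.histogram_bin_edges(xj, bins)[1:-1])
    n = len(y)
    c_ijy = Counter(zip(xi_d, xj_d, y))  # joint counts of (x_i, x_j, y)
    c_iy = Counter(zip(xi_d, y))
    c_jy = Counter(zip(xj_d, y))
    c_y = Counter(y)
    cmi = 0.0
    for (a, b, c), count in c_ijy.items():
        # p(x_i, x_j | y) / (p(x_i | y) p(x_j | y)) rewritten with counts.
        cmi += (count / n) * np.log(
            count * c_y[c] / (c_iy[(a, c)] * c_jy[(b, c)])
        )
    return cmi


def build_qubo(X, y, mode="mi", bins=10):
    """Return the d x d matrix Q for the chosen formulation:
    'mi' (Eq. 3), 'cmi' (Eq. 5), or 'corr' (Eq. 7)."""
    d = X.shape[1]
    if mode in ("mi", "cmi"):
        Q = np.diag(-mutual_info_regression(X, y))  # -MI on the diagonal
        if mode == "cmi":
            for i in range(d):
                for j in range(i + 1, d):
                    Q[i, j] = Q[j, i] = -conditional_mi(
                        X[:, i], X[:, j], y, bins
                    )
    else:  # correlation coefficients
        # Columns 0..d-1 of the stacked matrix are features, column d is y.
        R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
        Q = -np.abs(R[:d, :d])               # -|rho(X_i, X_j)| off-diagonal
        np.fill_diagonal(Q, -np.abs(R[:d, d]))  # -|rho(X_i, Y)| on diagonal
    return Q
```

Any of the three matrices can then be handed to the solvers described next.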
3.6. Quantum Annealing

Quantum annealing is an optimization technique that leverages the principles of quantum mechanics to solve complex optimization problems. It is particularly well-suited for solving QUBO problems, as it efficiently explores the vast solution space in search of the global minimum. To solve the feature selection problem using quantum annealing, we follow these steps:

Algorithm 1: Quantum Annealing for Feature Selection
1: Construct the Q matrix based on the chosen approach (MI, CMI, or correlation coefficients).
2: Convert the Q matrix to a QUBO problem.
3: Submit the QUBO problem to the quantum annealer.
4: Retrieve the solution from the quantum annealer.
5: Interpret the solution as the selected subset of features.

Once the quantum annealer has completed its task, it returns a binary vector q that encodes the selected subset of features: the indices of the non-zero elements of q are the chosen features. A sketch of this procedure appears after Section 3.8.

3.7. Hybrid Quantum-Classical Solvers

In addition to pure quantum annealing, we explore hybrid quantum-classical solvers for feature selection. Hybrid solvers combine the strengths of quantum and classical computation by leveraging quantum annealing to explore the solution space and classical optimization techniques for local refinement. We specifically employ the LeapHybridCQMSampler, a hybrid solver provided by D-Wave Systems that combines quantum annealing with a classical constrained quadratic model (CQM) solver. The hybrid solver allows us to incorporate additional constraints into the feature selection problem, such as the desired number of selected features.

Our approach formulates the feature selection problem as a CQM and submits it to the LeapHybridCQMSampler. The solver explores the solution space and returns a set of feasible solutions, from which we select the best one according to the objective function value.

3.8. Simulated Annealing

For comparison purposes, we also implement feature selection using simulated annealing, a classical optimization technique inspired by the annealing process in metallurgy. Simulated annealing seeks the global minimum of an objective function by gradually decreasing the system's temperature while allowing occasional uphill moves to escape local minima. We use the SimulatedAnnealingSampler provided by the D-Wave Ocean SDK. The simulated annealing algorithm follows a procedure similar to quantum annealing but, instead of leveraging quantum effects, relies on a probabilistic acceptance criterion to explore the solution space.
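The following sketch traces Algorithm 1 with the D-Wave Ocean SDK. It is an illustrative outline, not a verbatim reproduction of our pipeline: it assumes a configured D-Wave Leap account, and Q stands for a dense NumPy matrix such as one produced by the build_qubo sketch in Section 3.5.

```python
# A sketch of Algorithm 1 on D-Wave hardware (Section 3.6).
import dimod
from dwave.system import DWaveSampler, EmbeddingComposite


def qubo_to_bqm(Q):
    """Step 2 of Algorithm 1: wrap the dense Q matrix as a dimod BQM."""
    d = Q.shape[0]
    qubo = {(i, j): float(Q[i, j])
            for i in range(d) for j in range(d) if Q[i, j]}
    return dimod.BinaryQuadraticModel.from_qubo(qubo)


def select_features(Q, sampler, num_reads=1000):
    """Steps 3-5: sample the QUBO and read off the selected features."""
    result = sampler.sample(qubo_to_bqm(Q), num_reads=num_reads)
    best = result.first.sample  # lowest-energy sample found
    return sorted(i for i, bit in best.items() if bit == 1)


# Step 3 targets the QPU; EmbeddingComposite maps the logical QUBO
# onto the annealer's physical qubit topology. Requires Leap access.
# selected = select_features(Q, EmbeddingComposite(DWaveSampler()))
```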
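For the hybrid solver of Section 3.7, the feature selection problem can be written as a CQM with dimod's symbolic interface. The sketch below is again illustrative: the cardinality constraint fixes the number of selected features, with k = 10 mirroring our submitted runs, and the function name and constraint label are ours.

```python
# A sketch of the CQM formulation of Section 3.7; also assumes Leap access.
import dimod
from dwave.system import LeapHybridCQMSampler


def select_features_hybrid(Q, k=10):
    d = Q.shape[0]
    x = [dimod.Binary(i) for i in range(d)]
    # QUBO objective of Eq. (1); diagonal terms are linear for binary vars.
    objective = sum(Q[i, i] * x[i] for i in range(d)) + sum(
        Q[i, j] * x[i] * x[j]
        for i in range(d) for j in range(d) if i != j
    )
    cqm = dimod.ConstrainedQuadraticModel()
    cqm.set_objective(objective)
    cqm.add_constraint(sum(x) == k, label="num_selected_features")
    result = LeapHybridCQMSampler().sample_cqm(cqm)
    feasible = result.filter(lambda row: row.is_feasible)  # drop violations
    best = feasible.first.sample  # best feasible solution by energy
    return sorted(i for i, bit in best.items() if bit == 1)
```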
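The classical baseline of Section 3.8 drops into the same pipeline by swapping the sampler. In recent Ocean releases the sampler lives in dwave.samplers; older releases expose it through the neal package.

```python
# Simulated-annealing baseline (Section 3.8): same QUBO, different sampler.
from dwave.samplers import SimulatedAnnealingSampler

# selected_sa = select_features(Q, SimulatedAnnealingSampler())
```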
By comparing the results obtained with quantum annealing, hybrid quantum-classical solvers, and simulated annealing, we can assess the relative effectiveness of these approaches for feature selection in Information Retrieval.

3.9. Evaluation

The organizers of the QuantumCLEF lab evaluated the effectiveness of our quantum annealing-based feature selection approaches. We submitted the features selected on the training set to the organizers, who then used them to train a LambdaMART model on their internal training set. LambdaMART, a state-of-the-art learning-to-rank algorithm, combines the MART (Multiple Additive Regression Trees) algorithm with a lambda-gradient optimization framework.

The organizers evaluate the performance of the LambdaMART model using their internal validation and test sets, which are inaccessible to us. They measure performance using the normalized discounted cumulative gain (nDCG) metric, commonly used in Information Retrieval to evaluate the quality of ranked results. The nDCG@k metric is defined as

    \mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},    (8)

where DCG@k is the discounted cumulative gain at position k, and IDCG@k is the ideal discounted cumulative gain at position k, i.e., the maximum possible DCG@k value.

The evaluation results provided by the organizers, discussed in Section 4, offer insights into the effectiveness of our quantum annealing-based feature selection methods and their potential to improve the performance of learning-to-rank models in Information Retrieval tasks.
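For completeness, a small reference implementation of Eq. (8) is sketched below. The official scores were computed by the organizers, not by this code, and nDCG implementations differ in the gain function (the linear gain rel_i used here versus the exponential 2^rel_i - 1 common for LETOR data).

```python
# A reference sketch of Eq. (8) using the linear-gain convention.
import numpy as np


def dcg_at_k(relevances, k):
    """DCG@k with the standard log2 position discount."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))


def ndcg_at_k(relevances, k=10):
    """nDCG@k = DCG@k / IDCG@k; taken as 0 for queries with no
    relevant documents (IDCG@k = 0)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Example: relevance grades of the top-ranked documents for one query.
print(ndcg_at_k([2, 0, 1, 1, 0, 0, 2, 0, 0, 1], k=10))
```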
4. Results

This section presents and discusses the experimental results of the different feature selection methods for the MQ2007 dataset. Figure 1 shows a pairplot comparing the nDCG@10 scores of the four implemented methods: Quantum Annealing Mutual Information (Quantum-MI), Hybrid Annealing Mutual Information (Hybrid-MI), Hybrid Annealing Correlation (Hybrid-Correlation), and Simulated Annealing Mutual Information (SA-MI). The pairplot visually represents the score distributions and the relationships between the methods.

Figure 1: Pairplot of nDCG@10 scores for the different feature selection methods. The pairplot highlights the distributions and interrelationships between Quantum-MI, Hybrid-MI, Hybrid-Correlation, and SA-MI.

Figure 2 presents a boxplot comparing the nDCG@10 scores of the four methods, showing each method's median, quartiles, and outliers. The boxplot indicates that Hybrid-Correlation and SA-MI have slightly higher median scores than Quantum-MI and Hybrid-MI, although there is considerable overlap in the score distributions.

Figure 2: Boxplot of nDCG@10 scores for the different feature selection methods, showing each method's median, quartiles, and outliers.

To summarize the performance, we calculated the average nDCG@10 scores for each method based on the provided data files:

• Quantum-MI: 0.4299
• Hybrid-MI: 0.4195
• Hybrid-Correlation: 0.4430
• SA-MI: 0.4417

The Hybrid-Correlation method achieved the highest average nDCG@10 score (0.4430), followed closely by the SA-MI method (0.4417). The Quantum-MI (0.4299) and Hybrid-MI (0.4195) methods had slightly lower average scores. These results suggest that the Hybrid-Correlation method, which combines quantum techniques with classical correlation-based feature selection, was the most effective in selecting informative features for the MQ2007 dataset.

The official results shared by the organizers provide additional insights into the performance of the methods:

QTB1A_MQ2007_QA_qtb_NT1   0.4299   356041   Q   13
QTB1A_MQ2007_QA_qtb_NT2   0.4195   500000   H   10
QTB1A_MQ2007_QA_qtb_NT3   0.4434   309227   H   10
QTB1A_MQ2007_SA_qtb_NT1   0.4024   317412   S   10

The official results largely confirm our calculated averages and provide additional details. The Hybrid-Correlation method (QTB1A_MQ2007_QA_qtb_NT3) achieved the highest score of 0.4434, followed by the Quantum-MI method (QTB1A_MQ2007_QA_qtb_NT1) with a score of 0.4299. The SA-MI method (QTB1A_MQ2007_SA_qtb_NT1) scored 0.4024, lower than our calculated average (0.4417). The Hybrid-MI method (QTB1A_MQ2007_QA_qtb_NT2) had the lowest official score of the four methods, 0.4195. It is worth noting that the Quantum-MI method used 13 features, while the other runs used ten. Despite using more features, the Quantum-MI method did not outperform the Hybrid-Correlation method, which used only ten.

To assess statistical significance, we conducted a one-way ANOVA test on the nDCG@10 scores. The test yielded a p-value of 0.7659, indicating no significant differences between the methods at a significance level of 0.05: despite the differences in average scores, the methods' overall performance is statistically comparable. We nevertheless performed post-hoc pairwise comparisons using Tukey's HSD test to explore the differences further; the results are summarized in Table 1. The pairwise comparisons did not reveal any statistically significant differences between the methods, as indicated by the high p-values (> 0.05) and the confidence intervals that include zero.

Table 1: Tukey's HSD test results for pairwise comparisons of nDCG@10 scores.

Comparison                            Mean Difference   p-value   Lower CI   Upper CI
Hybrid-Correlation vs Quantum-MI       0.0131           0.9000    -0.1420     0.1682
Hybrid-Correlation vs Hybrid-MI        0.0235           0.8326    -0.1316     0.1786
Hybrid-Correlation vs SA-MI            0.0013           1.0000    -0.1538     0.1564
Quantum-MI vs Hybrid-MI                0.0104           0.9154    -0.1447     0.1655
Quantum-MI vs SA-MI                   -0.0118           0.9000    -0.1669     0.1433
Hybrid-MI vs SA-MI                    -0.0222           0.8326    -0.1773     0.1329

It is important to note that our feature selection methods used only ten features, except for the Quantum-MI method, which used 13. Despite using fewer features, our Hybrid-Correlation and SA-MI methods achieved performance competitive with teams that used 15 or more features. Our methods thus selected a compact set of informative features while maintaining good performance, which is advantageous in terms of computational efficiency and model simplicity.
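The significance tests above can be reproduced with standard scientific-Python tooling. The sketch below assumes the per-query nDCG@10 scores of each method are available as equal-length arrays; random stand-ins are used here so the snippet runs as-is.

```python
# A sketch of the significance testing reported above. The score arrays
# are random stand-ins: substitute the real per-query nDCG@10 values.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
methods = ["Quantum-MI", "Hybrid-MI", "Hybrid-Correlation", "SA-MI"]
scores = {m: rng.uniform(0.0, 1.0, 100) for m in methods}  # placeholders

# One-way ANOVA across the four methods (p = 0.7659 on our data).
_, p_value = f_oneway(*scores.values())
print(f"ANOVA p-value: {p_value:.4f}")

# Post-hoc pairwise comparisons (Table 1).
values = np.concatenate([scores[m] for m in methods])
groups = np.repeat(methods, [len(scores[m]) for m in methods])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```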
4.1. Analysis of Quantum-Related Methods Performance

The official results and our analysis indicate that the quantum-related methods, particularly Hybrid-Correlation (QTB1A_MQ2007_QA_qtb_NT3) and Quantum-MI (QTB1A_MQ2007_QA_qtb_NT1), achieved competitive performance compared to the fully simulated SA-MI method (QTB1A_MQ2007_SA_qtb_NT1). The Hybrid-Correlation method achieved the highest score of 0.4434, outperforming the SA-MI method, which scored 0.4024. The Quantum-MI method also performed well, with a score of 0.4299, despite using more features (13) than the other methods; its quantum-based approach effectively captured relevant information from the additional features.

The strong performance of the Hybrid-Correlation method underscores the potential of combining quantum techniques with classical correlation-based feature selection. By harnessing the strengths of both approaches, Hybrid-Correlation identified informative features more effectively than the purely classical SA-MI method. The competitive performance of the quantum-related methods, even without statistical significance in the overall comparisons, suggests their potential for feature selection tasks: the theoretical foundation of quantum-based approaches may offer a more nuanced treatment of data correlations and dependencies, which could enhance their effectiveness in identifying informative features.

However, it is essential to consider the variability in the nDCG@10 scores across different queries, evident from the pairplot and boxplot. Some queries had scores of 0 for all methods, indicating that none of the selected features were relevant for those specific queries. This variability suggests that the performance of the feature selection methods may depend on the characteristics of individual queries. Future work could explore query-specific adaptations or a more diverse set of features to capture the varying information needs of different queries.

4.2. Comparison with Other Teams and Baseline

Our feature selection methods used only ten features, except for the Quantum-MI method, which used 13. Despite the smaller feature set, our Hybrid-Correlation and SA-MI methods achieved performance competitive with teams that used 15 or more features, demonstrating the effectiveness of our approach in selecting informative features while maintaining a compact representation.

The decision to use fewer features was a deliberate one that offers several advantages. It significantly reduces the computational complexity of the learning algorithms, as there are fewer dimensions to process. It also helps to mitigate the risk of overfitting, since models trained on a smaller set of relevant features are less likely to capture noise or irrelevant patterns. Moreover, a compact feature set enhances the interpretability of the models by focusing on the most important aspects of the data.

The fact that our methods achieved performance comparable to teams using more features suggests that our feature selection approaches identified the most discriminative and informative features for the MQ2007 dataset, extracting meaningful information from the data while maintaining efficiency and simplicity.
5. Conclusions and Future Work

In this paper, we presented a novel approach to feature selection for Information Retrieval using quantum annealing. Our focus was on the MQ2007 dataset, where we explored various formulations of the feature selection problem as quantum annealing problems. Our approach demonstrated potential in selecting informative features, which contributed to the performance of the LambdaMART learning-to-rank model, and showed competitive results when compared with classical feature selection techniques. The evaluation conducted by the QuantumCLEF lab organizers indicated that the features selected by our approach contributed to improvements in ranking quality, although the differences were not statistically significant.

Our work provides insights into the application of quantum annealing to optimization problems in Information Retrieval. However, we acknowledge limitations and challenges that need to be addressed in future work. Future research directions include:

1. Scaling quantum annealing-based feature selection to larger datasets and higher-dimensional feature spaces.
2. Integrating quantum annealing-based feature selection with other components of the Information Retrieval pipeline.
3. Adapting the approach to other Information Retrieval tasks, such as document clustering and recommendation systems.
4. Exploring the potential of newer generations of quantum hardware and algorithms for Information Retrieval applications.

In conclusion, our work demonstrates the feasibility of using quantum annealing for feature selection in Information Retrieval. While the current results do not show significant improvements over classical methods, they suggest that the intersection of quantum computing and Information Retrieval may hold promise for future advancements in the field. Further research is needed to fully understand and leverage the potential of quantum annealing in improving the efficiency and effectiveness of information access.

References

[1] Q. Wu, C. J. Burges, K. M. Svore, J. Gao, Adapting boosting for information retrieval measures, Information Retrieval 13 (2010) 254–270.
[2] T.-Y. Liu, Learning to Rank for Information Retrieval, Springer Berlin Heidelberg, 2011. doi:10.1007/978-3-642-14267-3.
[3] G. Roffo, S. Melzi, M. Cristani, Infinite feature selection, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 4202–4210. doi:10.1109/ICCV.2015.478.
[4] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.
[5] J. Tang, S. Alelyani, H. Liu, Feature selection for classification: A review, in: Data Classification: Algorithms and Applications, CRC Press, 2014, p. 37.
[6] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning 46 (2002) 389–422.
[7] Y. Saeys, I. Inza, P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007) 2507–2517.
[8] T. Kadowaki, H. Nishimori, Quantum annealing in the transverse Ising model, Physical Review E 58 (1998) 5355.
[9] T. Albash, D. A. Lidar, Adiabatic quantum computation, Reviews of Modern Physics 90 (2018) 015002.
[10] H. Wang, W. Wang, Y. Liu, B. Alidaee, Integrating machine learning algorithms with quantum annealing solvers for online fraud detection, IEEE Access 10 (2022) 75908–75917. doi:10.1109/ACCESS.2022.3190897.
[11] T.-Y. Liu, J. Xu, T. Qin, W. Xiong, H. Li, LETOR: Benchmark dataset for research on learning to rank for information retrieval, in: Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, 2007, pp. 3–10.
[12] K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems (TOIS) 20 (2002) 422–446.
[13] D. Jian-peng, L. Hang, P. Xiao-Ling, Z. Chao-Ni, Y. Tian-Huai, J. Xian-Min, Research progress of quantum memory, Acta Physica Sinica (2019). doi:10.7498/APS.68.20190039.
[14] D. Huang, T. Chow, Effective feature selection scheme using mutual information, Neurocomputing 63 (2005) 325–343. doi:10.1016/j.neucom.2004.01.194.
[15] J. Sotoca, F. Pla, Supervised feature selection by clustering using conditional mutual information-based distances, Pattern Recognit. 43 (2010) 2068–2081. doi:10.1016/j.patcog.2009.12.013.
[16] H. Wang, W. Wang, Y. Liu, B. Alidaee, Integrating machine learning algorithms with quantum annealing solvers for online fraud detection, IEEE Access 10 (2022) 75908–75917.
[17] W. Qian, W. Shu, Mutual information criterion for feature selection from incomplete data, Neurocomputing 168 (2015) 210–220.
[18] K. Heshami, D. G. England, P. C. Humphreys, P. J. Bustard, V. M. Acosta, J. Nunn, B. J. Sussman, Quantum memories: emerging applications and recent advances, Journal of Modern Optics 63 (2016) 2005–2028.
[19] H. Ge, T. Hu, Genetic algorithm for feature selection with mutual information, in: 2014 Seventh International Symposium on Computational Intelligence and Design, volume 1, IEEE, 2014, pp. 116–119.
[20] H. Liu, J. Sun, L. Liu, H. Zhang, Feature selection with dynamic mutual information, Pattern Recognition 42 (2009) 1330–1339.
[21] B. Yao, C. Li, Y. Chen, Supervised feature selection based on sparse representation and mutual information, in: 2023 IEEE 5th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), IEEE, 2023, pp. 1354–1358.
[22] M. Beraha, A. M. Metelli, M. Papini, A. Tirinzoni, M. Restelli, Feature selection via mutual information: New theoretical insights, in: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019, pp. 1–9.
[23] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, QuantumCLEF 2024: Overview of the Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF, in: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, September 9th to 12th, 2024, 2024.
[24] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, Overview of QuantumCLEF 2024: The Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings, 2024.