=Paper=
{{Paper
|id=Vol-3389/Challenge01
|storemode=property
|title=Leveraging SHAP and CBR for Dimensionality Reduction on the Psychology Prediction Dataset
|pdfUrl=https://ceur-ws.org/Vol-3389/ICCBR_2022_XCBR_Challenge_Indiana.pdf
|volume=Vol-3389
|authors=Zachary Wilkerson,David Leake,David Crandall
|dblpUrl=https://dblp.org/rec/conf/iccbr/WilkersonL022
}}
==Leveraging SHAP and CBR for Dimensionality Reduction on the Psychology Prediction Dataset==
Zachary Wilkerson, David Leake and David Crandall

Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington IN 47408, USA

Abstract

Effective dimensionality reduction for feature spaces can benefit the accuracy, efficiency, and/or explainability of models using the features. One task of the Explainable AI Challenge of the 2022 International Conference on Case-Based Reasoning is to apply explanation methods for informed feature pruning to refine an artificial neural network model for depression screening. This paper explores how explanations provided by SHAP values can guide feature pruning. It presents and evaluates four approaches developed for this task: 1) iterative feature pruning based on SHAP values, 2) using case-based reasoning to guide search through potential feature prunings, 3) pruning based on using Hamming distance between a reference case and patient cases to partition the case base, and 4) evaluating feature prunings in light of semi-factual and counterfactual cases. Results show that both case-based reasoning and absolute SHAP values can guide feature pruning to improve model accuracy, with the best performance occurring when SHAP values inform feature pruning selection for case-based reasoning-based methods.

1. Introduction

This paper responds to the Explainable AI Challenge at the 2022 International Conference on Case-Based Reasoning, addressing the Psychology Prediction task to apply explanation methods to improve an artificial neural network's classification accuracy by pruning the feature space of its training dataset [1]. In addition to the accuracy benefits of removing "noisy" features, reducing the number of features could facilitate human assessment of case similarity, making the process potentially beneficial for the interpretability of a case-based classifier for this domain. Consequently, our evaluation considers both classification accuracy and the ability to minimize the feature set while maintaining accuracy.

We propose and evaluate four approaches for improving the neural network's classification accuracy by removing features, testing them both with and without feature importance information from an explanation component based on SHAP [2] and provided by the Challenge. The first approach (Importance-FP) directly applies SHAP values, iteratively pruning the features whose SHAP values suggest that they contribute least to classification. The second approach, CBR-based feature pruning (CBR-FP), uses CBR to focus search of the space of feature sets, favoring exploration of sets that are novel or similar to sets that have performed well in past iterations. This approach is based on Hoffmann and Bergmann's HypoCBR [3], which uses CBR to evaluate the viability of candidate parameterizations for a neural network. The third approach, Hamming Feature Pruning (Hamming-FP), uses Hamming distance to map each data point to its distance from a set of expected feature values provided by a domain expert. Based on these distance values, the approach favors feature prunings for which similar distances correlate with similar classifications. The fourth approach, Semi/Counterfactual Feature Pruning (SC-FP), is inspired by CBR research on semi-factual and counterfactual explanations [4]. It selects feature sets by directly testing whether changing the values of particular features generates a case of the same classification (semi-factual) or a different classification (counterfactual). This is based on the hypothesis that a good pruning strategy retains features that preserve and are significant to correct classifications (i.e., classifications are more likely to change after inverting feature values).

We first test baseline versions of the algorithms and then test their performance when biased using SHAP values. Results show that using SHAP values provides modest accuracy improvements, with CBR-FP yielding particularly promising results.

2. Experimental Setup and Methods

All experiments prune features to improve the performance of a provided multilayer perceptron classifier model with ten hidden layers and logistic activation functions, trained on the Psychology Prediction dataset [1]. Baselines are 1) no features removed (0-FP), and 2) feature prunings selected randomly, with the most accurate pruning returned after a set number of iterations (Random-FP). We compare the four methods described below against these baselines. We also explore using SHAP values as relative probabilities weighting the selection of features to prune at each iteration in each approach.

2.1. Importance-FP: Iteratively removing the least-contributing feature based on SHAP values

This method calculates the SHAP values for the model, prunes the least-contributing feature, calculates the model accuracy, and then repeats the process on the resulting feature subset until no more features can be removed. The pruning resulting in the highest accuracy is returned.
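As a rough illustration, the loop might look like the following sketch, where `shap_fn` and `acc_fn` are assumed wrappers around the Challenge's SHAP explainer and model-training pipeline (both names are ours, not part of the Challenge code):

```python
import numpy as np

def importance_fp(features, shap_fn, acc_fn):
    """Importance-FP sketch: repeatedly prune the feature with the smallest
    mean absolute SHAP value, tracking the subset with the best accuracy.

    features -- list of feature indices
    shap_fn  -- callable(feats) -> (n_samples, len(feats)) SHAP value array
    acc_fn   -- callable(feats) -> accuracy of the model trained on feats
    """
    active = list(features)
    best_feats, best_acc = list(active), acc_fn(active)
    while len(active) > 1:
        contrib = np.abs(shap_fn(active)).mean(axis=0)  # mean |SHAP| per feature
        del active[int(np.argmin(contrib))]             # prune least-contributing
        acc = acc_fn(active)
        if acc > best_acc:
            best_feats, best_acc = list(active), acc
    return best_feats, best_acc
```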
2.2. CBR-FP: Using CBR to evaluate potential prunings for uniqueness and/or projected accuracy

This approach begins by selecting a candidate pruning as in Random-FP, but it only applies the pruning if its features are sufficiently distinct from those of the most similar already-explored pruning, or if that retrieved pruning achieved a suitably high classification accuracy. The CBR system is retrieval-only, calculating similarity based on match distance (i.e., if two cases prune the same feature, the distance component is 0, else 1). The pruning resulting in the highest accuracy after a given number of iterations is returned.
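A compact sketch of this search follows; the novelty threshold, accuracy threshold, and random-pruning generator are illustrative assumptions, not the Challenge's actual parameters:

```python
import random

def match_distance(p1, p2):
    """Number of features on which two prunings (sets of pruned indices) disagree."""
    return len(p1 ^ p2)

def cbr_fp(n_features, acc_fn, iters=1000, min_dist=5, acc_threshold=0.55):
    """CBR-FP sketch: a random candidate pruning is evaluated only if it is
    sufficiently novel, or if the most similar already-explored pruning
    performed well; otherwise it is skipped.

    acc_fn -- callable(pruning) -> accuracy of the model with those features pruned
    """
    case_base = []                                   # retrieval-only: (pruning, accuracy)
    best_pruning, best_acc = set(), acc_fn(set())
    for _ in range(iters):
        size = random.randrange(1, n_features)
        candidate = set(random.sample(range(n_features), size))
        if case_base:
            nearest, nearest_acc = min(
                case_base, key=lambda case: match_distance(candidate, case[0]))
            if match_distance(candidate, nearest) < min_dist and nearest_acc < acc_threshold:
                continue                             # neither novel nor promising
        acc = acc_fn(candidate)
        case_base.append((candidate, acc))
        if acc > best_acc:
            best_pruning, best_acc = candidate, acc
    return best_pruning, best_acc
```

In the SHAP-biased variant described in Section 2, the uniform `random.sample` call would instead draw features with probabilities proportional to their absolute SHAP values.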
2.3. Hamming-FP: Estimating decision boundaries using distances to an extreme case

This method treats patients in the provided dataset as cases and generates a prototypical extreme case by aggregating the values from the Challenge-provided "expected features" document (representing a domain expert's "textbook example" of a patient at highest depression risk). It generates a random feature pruning for each experimental iteration. Using the features preserved in the current pruning, it calculates the Hamming distance from each patient case to the prototype, implicitly defining a spectrum, between the extreme case and the patient case farthest from it, on which all other cases sit. Finally, it partitions this spectrum into distance-based segments corresponding to output classes such that the number of correctly-classified cases is maximized. As prunings are randomly generated across iterations, the dimensions of the spectrum, cases' locations on it, and the locations of optimal partitions change accordingly; the algorithm returns the pruning that yields the highest number of correctly-classified cases.
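The per-pruning scoring step might be sketched as follows for a two-class case, assuming binary feature vectors; the single-threshold search stands in for the more general segment-based partitioning:

```python
import numpy as np

def hamming_fp_score(X, y, prototype, kept):
    """Hamming-FP scoring sketch for one candidate pruning.

    X         -- (n_samples, n_features) binary feature matrix
    y         -- 0/1 class labels
    prototype -- expert-derived "extreme case" feature vector
    kept      -- indices of features retained by the pruning
    Returns the number of correctly-classified cases under the best
    single partition of the distance spectrum (two-class simplification).
    """
    d = (X[:, kept] != prototype[kept]).sum(axis=1)   # Hamming distance to prototype
    best = 0
    for t in np.unique(d):                            # candidate partition points
        pred = (d > t).astype(int)
        # try both orientations of the class labels along the spectrum
        best = max(best, int((pred == y).sum()), int((pred != y).sum()))
    return best
```

The pruning with the highest score across iterations is returned; handling more than two output classes would require searching for multiple partition points.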
2.4. SC-FP: Using semi/counterfactual explanation behaviors as a proxy for model behavior under feature pruning

This approach is based on the hypothesis that observing the effect on classification of "flipping" the value of a binary feature can help predict the effect of removing that feature. The algorithm assesses the value of randomly-generated candidate feature sets by flipping the feature values of randomly-generated feature subsets and estimating the result using CBR retrieval. It favors prunings for which flipping feature values retains correctly-classified cases (semi-factual) while changing incorrectly-classified cases to the correct class (counterfactual). The pruning yielding the greatest increase in correctly-classified cases is selected for evaluation.
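Under the simplifying assumptions that the flipped subset coincides with the candidate feature set and that the CBR component is a leave-one-out 1-NN retriever over binary features, the flip test might look like the sketch below:

```python
import numpy as np

def retrieve_label(case, X, y, exclude):
    """1-NN retrieval stand-in for the CBR component (leave-one-out)."""
    d = (X != case).sum(axis=1).astype(float)
    d[exclude] = np.inf                     # keep the query case out of retrieval
    return y[int(np.argmin(d))]

def sc_fp_score(X, y, flip_set):
    """SC-FP scoring sketch: flip the given binary features in every case and
    measure the change in the number of correctly-classified cases. Retained
    correct cases (semi-factuals) avoid losses; corrected cases
    (counterfactuals) add gains."""
    idx = list(flip_set)
    correct_before = correct_after = 0
    for i in range(len(X)):
        if retrieve_label(X[i], X, y, exclude=i) == y[i]:
            correct_before += 1
        flipped = X[i].copy()
        flipped[idx] = 1 - flipped[idx]     # invert the selected binary features
        if retrieve_label(flipped, X, y, exclude=i) == y[i]:
            correct_after += 1
    return correct_after - correct_before
```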
3. Results and Discussion

Table 1 reports the average and maximum accuracy values and the corresponding number of features pruned for each approach, with and without sampling bias based on SHAP values. Average and maximum classification accuracy values are calculated using ten-fold cross-validation over 25-30 experimental trials. Broadly, accuracy-based methods (e.g., CBR-FP and Importance-FP) lead to higher accuracy values. Additional findings are discussed in detail below.

Table 1: Results for random sampling and sampling biased using SHAP values. Accuracy values are percentages; each accuracy column is followed by the corresponding number of features pruned. Errors are one standard deviation. In the original typesetting, the top three average and maximum accuracy values and corresponding pruning sizes are boldfaced (one tie has the top four boldfaced).

(a) Results for unbiased random sampling.

Approach        Iters.   Avg. Accuracy   Feats. Pruned   Max Accuracy   Feats. Pruned
0-FP            1        50.8            0               50.8           0
Random-FP       100      53.5 ± 1.7      10.5 ± 7.9      57.6           15
Random-FP       1000     56.4 ± 1.1      15.6 ± 5.8      58.5           12
Importance-FP   1        53.0 ± 1.1      11.5 ± 7.4      55.9           27
CBR-FP          100      52.4 ± 2.3      11.4 ± 10.2     55.2           9
CBR-FP          1000     55.9 ± 1.1      15.3 ± 5.4      58.5           27
Hamming-FP      100      49.2 ± 4.2      2.2 ± 3.2       53.4           1
Hamming-FP      1000     47.0 ± 5.3      3.7 ± 2.8       54.2           7
SC-FP           100      32.4 ± 7.8      37.6 ± 15.4     51.4           17
SC-FP           1000     32.2 ± 6.8      35.3 ± 12.2     47.3           22

(b) Results using SHAP values to bias sampling.

Approach        Iters.   Avg. Accuracy   Feats. Pruned   Max Accuracy   Feats. Pruned
0-FP            1        50.8            0               50.8           0
Random-FP       100      53.7 ± 1.7      12.7 ± 9.8      57.6           22
Random-FP       1000     56.4 ± 1.0      14.7 ± 5.9      58.5           18
Importance-FP   1        52.9 ± 1.1      12.6 ± 7.3      56.7           23
CBR-FP          100      54.1 ± 1.5      12.0 ± 8.1      57.6           15
CBR-FP          1000     56.9 ± 1.6      14.2 ± 7.2      61.8           9
Hamming-FP      100      47.8 ± 5.5      2.2 ± 2.3       53.5           1
Hamming-FP      1000     45.2 ± 6.2      3.6 ± 2.2       54.2           2
SC-FP           100      32.9 ± 8.6      47.3 ± 17.4     51.5           18
SC-FP           1000     33.7 ± 9.2      40.9 ± 16.4     57.5           27

3.1. The number of features removed for best accuracy depends on the method used

One surprising result is the wide range of final feature set sizes. Hamming-FP removes few features, suggesting that its simple distance calculation is stable and performant only for smaller prunings. By contrast, SC-FP removes a significant number of features; because feature values are flipped rather than removed, this may introduce information that is used by SC-FP's CBR component but not by the neural network model. CBR-FP usually has only slightly better accuracy than Random-FP, but has the potential benefit of tending toward larger pruning sizes (as does Importance-FP). This suggests that if small feature sets facilitate explanation, both feature set size and accuracy should be considered in choosing a pruning method.

3.2. SHAP biases enable subtle accuracy improvements

Using SHAP values with CBR is challenging, because SHAP values imply ordered pruning, while CBR favors comparing whole prunings. However, using SHAP values to assign relative probabilities to features for pruning selection appears to be effective, acting as a stabilizing force that increases overall and/or maximum accuracy values for several methods (especially CBR-FP).

3.3. Hamming-FP and SC-FP can offer developers interpretability into the pruning process, though they underperform in terms of accuracy

Compared with Random-FP, CBR-FP and Importance-FP lead to similar or slightly better accuracy improvements. By contrast, Hamming-FP and SC-FP appear much less effective. Both methods estimate model performance using proxy classification accuracy values, making them potentially valuable to developers, but in these experiments they are less successful on average than accuracy-based approaches. Further research is needed to clarify the settings in which Hamming-FP and SC-FP might be more useful and applicable.

4. Conclusions and Future Directions

We have presented and evaluated multiple methods for integrating explanatory information and CBR for feature pruning to improve the performance of an artificial neural network, testing those methods on the provided Psychology Prediction dataset. Of these, using CBR to mediate random feature pruning selection weighted by absolute SHAP values appears to yield the highest model accuracy, with Random-FP providing strong performance as well. This provides some support for the generality of the HypoCBR [3] approach on which our approach is based. Both methods offer the potential benefit of removing the greatest number of features, which could facilitate explaining similarity between cases. Future work could investigate using weighted feature values for CBR and/or weighted Hamming distance calculations, along with more detailed analysis of the experimental algorithms (especially SC-FP).

Acknowledgments

This work was funded by the US Department of Defense (Contract W52P1J2093009) and by the Department of the Navy, Office of Naval Research (Award N00014-19-1-2655). We thank Karan Acharya and Lawrence Gates for very helpful discussions.

References

[1] M. G. Orozco-del Castillo, E. C. Orozco-del Castillo, E. Brito-Borges, C. Bermejo-Sabbagh, N. Cuevas-Cuevas, An artificial neural network for depression screening and questionnaire refinement in undergraduate students, in: M. F. Mata-Rivera, R. Zagal-Flores (Eds.), Telematics and Computing, Springer, Cham, 2021, pp. 1-13.
[2] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran, 2017, pp. 4765-4774.
[3] M. Hoffmann, R. Bergmann, Improving automated hyperparameter optimization with case-based reasoning, in: Case-Based Reasoning Research and Development, ICCBR 2022, Springer, 2022, pp. 273-288.
[4] E. M. Kenny, M. T. Keane, On generating plausible counterfactual and semi-factual explanations for deep learning, in: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI, 2021, pp. 11575-11585.