Explainable and Interpretable Dry Beans Classification using Soft Voting Classifier

Belayneh Dejene*1, Gizachew Setegn2, Selamawit Belay1
1 University of Gondar, Department of Information Science, Gondar, 196, Ethiopia
2 Debark University, Department of Computer Science, Debark, 90, Ethiopia
Proceedings of the DAAfrica'2024 Workshop, November 23, 2024, Bejaia, Algeria
belzman2011@gmail.com (B. Dejene); gizachewmulucs@gmail.com (G. Setegn); selamsun7@gmail.com (S. Belay); ORCID 0000-0002-5978-7691 (B. Dejene)

Abstract
Dry beans, members of the Fabaceae family, are of global significance, with a diverse genetic heritage tracing back to their dissemination from the Americas centuries ago. This study develops an explainable dry bean classification model using a soft voting classifier and compares its performance against classic and ensemble machine learning algorithms. Data preprocessing ensured suitability for the classification algorithms, with feature selection employing information gain and variance inflation factors. The class imbalance was addressed via the SMOTE + Tomek method. Evaluation metrics encompassed accuracy, precision, recall, and F1-score. XGBoost led with 92.5065% accuracy, while the soft voting classifier (LGBM, XGB, CatBoost, RF, and DT) closely followed at 92.691%. The soft voting classifier proved optimal for dry bean classification, aiding in model interpretation and decision-making processes.

Keywords
Classification, Dry bean, Explainable, Machine learning, Voting classifier

1. Introduction
Dry beans belong to the diverse Fabaceae family, sometimes referred to as Leguminosae, and they are the most important and most widely produced pulse in the world [1]. They originated in the Americas, and their wide genetic diversity arose after they were transported to Europe and Africa in the 15th and 16th centuries and quickly spread to the rest of the globe [1]. The selection of dry beans plays an important role during the winter season in the economies of agriculture-based countries such as Bangladesh, India, and Pakistan. Currently, the dry bean is a staple food in many regions of the world, and processing enables the consumption and incorporation of this nutrient-dense food into daily diets. Dry beans are a well-known source of protein; in addition, they are low in fat and rich in fiber and other important nutrients [2][3]. Dry beans also provide environmental and human health benefits, such as improved soil fertility, reduced risk of chronic disease, and improved glycemic control [1]. Dry beans exhibit considerable genetic diversity and are the most produced of the edible legume crops in the world. According to the Turkish Standards Institution, dry beans are classified as Barbunya, Battal, Bombay, Cali, Dermason, Horoz, Tombul, Selanik, and Seker based on their botanical characteristics [4][5][6]. Plants are sensitive to the effects of climatic change and vary in their resistance. Finding high-quality seed is the primary challenge facing dry bean producers, distributors, and marketers. Using lower-quality seed in production leads to lower yields even when all other cultivation conditions are met. A wide range of computational tools is available for regulating the quality of food and agricultural products, but most quality assessment is still performed with conventional techniques by trained professionals. For example, seed categorization is typically conducted based on human judgment, and determining the type of dry beans manually requires a skilled person, takes a great deal of time, and is a challenging process [6].
In particular, the color of the various dry bean species varies, and geometrical data alone does not reveal this color variation. For this reason, it is economically and technically important to build automated techniques that can detect and categorize seed features rapidly and repeatably. It is also difficult for a human operator to assess and handle the seeds without specific tools or automated software procedures. The main problem dry bean producers and marketers face is ascertaining good seed quality. Lower-quality seeds lead to lower-quality produce, and seed quality is key to bean cultivation in terms of both yield and disease. Today, the inspection of the quality of seeds, fruits, and vegetables, along with the examination and categorization of seeds and grains, is performed worldwide with the help of machine learning and computer vision to meet these demands [4]. This is why we use a soft voting classifier and compare it with individual algorithms to classify dry beans. In recent years, machine learning algorithms have been used in the inspection, classification, prediction, and segmentation of food product quality. Classification techniques are becoming more popular as machine learning applications in fields such as medicine, biostatistics, bioinformatics, agriculture, and business [7]. Machine learning is a subfield of artificial intelligence that enables computers to learn from existing data and estimate unknown targets. Seed quality is influential in crop production, and seed classification is important for both producers and marketers in supporting sustainable agricultural systems. By applying predictive analysis to agricultural data, significant decisions and classifications can be made. Beyond the classification model itself, explainability and interpretability provide professionals with insights into how the classifications are made, fostering trust in the model's decisions [8]. An explainable machine learning model makes professionals more likely to trust, adopt, understand, and interpret the reasoning behind the model's recommendations by addressing the black-box nature of the algorithms. To handle this problem, several studies have been conducted to assess the quality of dry beans using various machine learning techniques; for example, [4][5][6][7] address dry bean classification. Previous research on dry bean classification has largely neglected the crucial aspect of explainability and interpretability. Instead, researchers predominantly focused on employing various algorithms without addressing the black-box nature inherent in these methods. Classic machine learning approaches were commonly utilized, often with default parameter settings, despite evidence suggesting that optimizing these parameters could enhance classification performance [9].
Additionally, while some studies attempted to tackle class imbalance issues, they typically employed simplistic oversampling methods, which can lead to the generation of redundant data; advanced techniques for addressing class imbalance were rarely explored. Furthermore, previous research overlooked feature selection methods, which could potentially improve model efficiency and interpretability. The absence of studies utilizing explainable techniques to handle black-box models, as well as the scarcity of research employing soft voting classifiers and tuned parameters, underscored the need for this study. Motivated by these gaps, this study endeavors to develop an explainable and interpretable classification model for dry beans. It utilizes soft voting classifiers, a technique not extensively explored in previous research, and compares their performance with individual machine learning algorithms. By incorporating explainable and interpretable methods, this study aims to classify dry beans accurately while providing insights into the decision-making process, thus facilitating evidence-based policies and interventions in the selection of appropriate dry bean classes.

2. Related works
Several studies, such as [4][5][6] and [7], investigated dry bean classification using machine learning algorithms. However, most of the previous researchers did not consider the explainability and interpretability of the dry bean classification model. Most of these studies handled the class imbalance problem on the whole dataset and developed the classification model without tuning relevant parameters. They also did not apply any feature selection methods and built the classification model using all the features in the dataset. M. Koklu and I. A. Ozkan [4] developed multi-class dry bean classifiers using MLP, SVM, kNN, and DT classification models. The overall correct classification rates were 91.73%, 93.13%, 87.92%, and 92.52% for MLP, SVM, kNN, and DT, respectively. The SVM model achieved the highest performance, with accuracies of 92.36%, 100.00%, 95.03%, 94.36%, 94.92%, 94.67%, and 86.84% for the Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira varieties, respectively. However, the authors did not consider the explainability and interpretability of the classification model. G. Słowiński [5] classified dry beans using several machine learning techniques: Multinomial Bayes, Support Vector Machines, Decision Trees, Random Forests, and a Voting Classifier. The overall accuracies obtained were in the range of 88.35% to 93.61%. However, this work also did not consider the explainability and interpretability of the classification model. M. Salauddin Khan et al. [7] constructed a multiclass dry bean classification model using eight popular classifiers and compared their performance. The algorithms used were LR, NB, KNN, DT, RF, XGB, SVM, and MLP, with both balanced and imbalanced classes. The XGB classifier performed better than the other classifiers on both the balanced and imbalanced dry bean datasets, achieving accuracies of 93.0% and 95.4% on the imbalanced and balanced classes, respectively. The overall performance is better than that of the previous studies; however, the researchers did not consider the explainability and interpretability of the classification model.
In addition, the authors developed the model without tuning the parameters, which can lead to overfitting, handled the class imbalance problem on the whole dataset before splitting it, and evaluated the model on that synthetically augmented data.

3. Materials and Methods
3.1. Data collection methods
To conduct this study, we used the publicly available dry bean dataset from the Kaggle repository. The dataset consists of a total of 13,611 grains of 7 different registered dry bean varieties, with 17 features including the class label (see Table 1 below for the dataset description).

Table 1. Dataset descriptions
No  Feature              Type     Description
1   Area                 Integer  The area of a bean zone; the number of pixels within its boundaries
2   Perimeter            float    Bean circumference, defined as the length of its border
3   Major axis length    float    The distance between the ends of the longest line that can be drawn from a bean
4   Minor axis length    float    The longest line that can be drawn from the bean while standing perpendicular to the main axis
5   Aspect ratio         float    Defines the relationship between L and l
6   Eccentricity         Real     The eccentricity of the ellipse having the same moments as the region
7   Convex area          Integer  Number of pixels in the smallest convex polygon that can contain the area of a bean seed
8   Equivalent diameter  float    The diameter of a circle having the same area as the bean seed area
9   Extent               float    The ratio of the pixels in the bounding box to the bean area
10  Solidity             float    The ratio of the pixels in the convex shell to those found in the bean
11  Roundness            float    Calculated with the roundness formula
12  Compactness          float    Measures the roundness of an object
13  ShapeFactor1         float    Shape factor 1
14  ShapeFactor2         float    Shape factor 2
15  ShapeFactor3         float    Shape factor 3
16  ShapeFactor4         float    Shape factor 4
17  Class                Nominal  Target class of the dry bean

3.2. Data preprocessing methods
Data preparation involves data selection, data cleaning, data integration, feature selection, handling of imbalances, and data transformation to make the data suitable for extracting value [10][11]. In this subsection, we detected missing values, removed redundancies, detected outliers, and handled the class imbalance problem using statistical methods.

3.2.1. Data cleaning
Data cleaning removes noise, inconsistencies, redundancy, and missing values so that the model can be developed reliably; without cleaning the collected data, an accurate result cannot be obtained [12][13]. The dataset contains no missing values, so no missing-value handling methods were applied. We removed 68 redundant (duplicate) records. Several variables have a high proportion of outliers, including Area, Perimeter, MinorAxisLength, Eccentricity, ConvexArea, EquivDiameter, and ShapeFactor4; these outliers were handled using interquartile-range and boxplot methods.

3.2.2. Data transformation
In data transformation, data are transformed and consolidated into forms appropriate for mining, for example by performing summary or aggregation operations [14][15]. In this dataset, only the class label needs to be transformed; all the remaining features were used as they are. To transform the class label, we used label encoding and encoded the classes into numeric values as 'DERMASON' = 0, 'SIRA' = 1, 'SEKER' = 2, 'HOROZ' = 3, 'CALI' = 4, 'BARBUNYA' = 5, and 'BOMBAY' = 6.
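The following is a minimal sketch of the cleaning and transformation steps described above. The CSV file name is an assumption, and capping values to the interquartile-range fences is one possible reading of the "interquartile range and boxplot" outlier handling mentioned in the text, not necessarily the authors' exact procedure.

```python
import pandas as pd

# Load the dry bean dataset (file name is an assumption).
df = pd.read_csv("Dry_Bean_Dataset.csv")

# Remove redundant (duplicate) records.
df = df.drop_duplicates()

# Handle outliers in the numeric features with interquartile-range fences
# (capping to the fences is one possible choice; the paper only states that
# IQR and boxplot methods were used).
numeric_cols = df.columns.drop("Class")
q1 = df[numeric_cols].quantile(0.25)
q3 = df[numeric_cols].quantile(0.75)
iqr = q3 - q1
df[numeric_cols] = df[numeric_cols].clip(lower=q1 - 1.5 * iqr,
                                         upper=q3 + 1.5 * iqr, axis=1)

# Encode the class label into numeric values, matching the mapping in the text.
class_map = {"DERMASON": 0, "SIRA": 1, "SEKER": 2, "HOROZ": 3,
             "CALI": 4, "BARBUNYA": 5, "BOMBAY": 6}
df["Class"] = df["Class"].map(class_map)
```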
3.2.3. Feature selection
We checked the importance of all the features using information gain (see Fig. 1 below). Of the 16 features, the last three (ShapeFactor4, Solidity, and Extent) were the least important, but this does not mean that they are not valuable for the model. We also checked the multicollinearity of the features using the variance inflation factor, which indicated that all of the features were significant for the model. For this reason, we did not drop any of them and used all 16 features to develop the classification model.

Fig. 1. Feature importance

3.2.4. Handling class imbalance
By nature, the class label of the collected data is imbalanced (see Figure 2 below). To overcome the imbalanced class distribution problem, samples can be added to or removed from the dataset [16]. Sampling can be achieved by under-sampling (randomly removing majority-class samples), by oversampling the minority class, or by combining over- and under-sampling techniques [16][17]. The class label of the extracted dataset has 7 values, some of which have very few instances; in particular, the 'BOMBAY' class has the fewest instances compared with the other classes (see Figure 2 below). To conduct this research, we used the synthetic minority over-sampling technique combined with Tomek links (SMOTE + Tomek) to handle the class imbalance. The main reason for using SMOTE + Tomek is that it avoids the loss of valuable information [16][17]: it combines SMOTE's ability to generate synthetic data for the minority classes with the Tomek-link rule's ability to remove majority-class samples identified as Tomek links [18][19].

Fig. 2. Class imbalance

3.3. Train test split
In model building, separate datasets for training and testing are needed to train and evaluate the model appropriately [20][21]. In this study, we used stratified splitting to divide the whole dataset into training and test sets with an 80:20 ratio.
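A compact sketch of the feature screening, stratified split, and class-imbalance handling of Sections 3.2.3 to 3.3 is shown below, assuming the preprocessed frame `df` from the previous sketch. Mutual information is used here as the information-gain score, and SMOTE + Tomek is applied to the training split only, as in Experiment 2; both are illustrative choices consistent with, but not stated verbatim in, the text.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from statsmodels.stats.outliers_influence import variance_inflation_factor
from imblearn.combine import SMOTETomek

X, y = df.drop(columns="Class"), df["Class"]

# Information gain (mutual information) of each feature with the class label.
info_gain = pd.Series(mutual_info_classif(X, y, random_state=0),
                      index=X.columns).sort_values(ascending=False)

# Variance inflation factor of each feature, to check multicollinearity.
vif = pd.Series([variance_inflation_factor(X.values, i)
                 for i in range(X.shape[1])], index=X.columns)

# Stratified 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balance the training split only with SMOTE + Tomek links.
X_train_bal, y_train_bal = SMOTETomek(random_state=42).fit_resample(X_train, y_train)
```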
3.4. Parameter tuning
In machine learning and deep learning, the performance of an algorithm depends heavily on the selection of hyperparameters, which has always been a crucial step in the modeling process [22][23][24]. To improve the performance of each algorithm, a collection of hyperparameters was tuned using grid search. Grid search is a commonly used approach to hyperparameter tuning that methodically builds and evaluates a model for each combination of algorithm parameters specified in a grid [24]. Here, we used grid search with GridSearchCV to select the tuning parameters for the ensemble machine learning algorithms.

Table 2. Tuned parameters
No  Algorithm               Tuned parameters
1   Soft voting classifier  Default parameters
2   LGBM classifier         Default parameters
3   Random forest           criterion='entropy', max_features='sqrt', min_samples_split=3, n_estimators=500, random_state=0, max_depth=20, max_leaf_nodes=400, n_jobs=-1
4   CatBoost                random_state=42, learning_rate=0.1, l2_leaf_reg=4, iterations=600, depth=6
5   XGBoost                 random_state=42, verbosity=0, min_child_weight=2, max_depth=4, learning_rate=0.15, gamma=0.22, colsample_bytree=0.5
6   Decision tree           max_depth=20, criterion='gini', max_features='sqrt', splitter='best', max_leaf_nodes=100, min_samples_split=3

3.5. Classification model
In this study, we constructed the dry bean classification model using a soft voting classifier on both the balanced and the unbalanced dataset. To verify whether the soft voting classifier performs better than other machine learning algorithms, additional models were developed using the decision tree algorithm and other ensemble learning classifiers, namely random forest, CatBoost, XGBoost, and LGBM. To improve each algorithm's performance, a collection of hyperparameters was tuned using grid search. The performance of each classification model was evaluated using accuracy, precision, recall, and F1-score.

3.6. Model explainability
To enhance the explainability of the classification model, we employed feature relevance explanation techniques such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), which highlight the most influential features and regions in the input data and explain the inner functioning of the models and their decisions by calculating the influence of each input variable and producing relevance scores. Global interpretability techniques, such as feature importance analysis or rule extraction, are employed to reveal the underlying patterns and decision rules learned by the model [8].

4. Result and discussion
Experiments were carried out to develop a dry bean classification model using a soft voting classifier and to compare it with other classic and ensemble machine learning algorithms. To construct the classification model, we conducted two experiments, one on the imbalanced data and one on the balanced data, using a soft voting classifier, RF, CatBoost, XGB, LGBM, and DT. Each experiment used the 16 features and the parameters tuned with grid search (see Table 2). This is a multiclass classification task because the dataset has seven class labels. In these experiments, all the classification models were evaluated using the accuracy, precision, recall, and F1-score metrics. Finally, we explained the model using the LIME and SHAP feature relevance explanation techniques.
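To illustrate how the tuned base learners and the soft voting classifier can be assembled and evaluated, the sketch below uses the parameters from Table 2 with standard scikit-learn, XGBoost, LightGBM, and CatBoost APIs and the training split from the previous sketch (imbalanced or SMOTE + Tomek balanced). The grid shown for XGBoost is a hypothetical example of the kind of grid searched, and macro averaging of the metrics is an assumption, since the paper does not state the averaging scheme.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Example grid search for one base learner (grid values are illustrative).
xgb_grid = GridSearchCV(
    XGBClassifier(random_state=42, verbosity=0),
    param_grid={"max_depth": [3, 4, 6], "learning_rate": [0.1, 0.15],
                "gamma": [0.0, 0.22], "colsample_bytree": [0.5, 1.0],
                "min_child_weight": [1, 2]},
    scoring="accuracy", cv=5, n_jobs=-1)
# xgb_grid.fit(X_train, y_train) would expose the best combination
# via xgb_grid.best_params_.

# Base learners configured with the tuned parameters of Table 2.
estimators = [
    ("lgbm", LGBMClassifier()),
    ("xgb", XGBClassifier(random_state=42, verbosity=0, min_child_weight=2,
                          max_depth=4, learning_rate=0.15, gamma=0.22,
                          colsample_bytree=0.5)),
    ("cat", CatBoostClassifier(random_state=42, learning_rate=0.1,
                               l2_leaf_reg=4, iterations=600, depth=6,
                               verbose=0)),
    ("rf", RandomForestClassifier(criterion="entropy", max_features="sqrt",
                                  min_samples_split=3, n_estimators=500,
                                  random_state=0, max_depth=20,
                                  max_leaf_nodes=400, n_jobs=-1)),
    ("dt", DecisionTreeClassifier(max_depth=20, criterion="gini",
                                  max_features="sqrt", splitter="best",
                                  max_leaf_nodes=100, min_samples_split=3)),
]

# Soft voting averages the predicted class probabilities of the base learners.
voting = VotingClassifier(estimators=estimators, voting="soft")
voting.fit(X_train, y_train)

# Evaluate on the held-out test set (macro averaging is an assumption).
y_pred = voting.predict(X_test)
print(accuracy_score(y_test, y_pred),
      precision_score(y_test, y_pred, average="macro"),
      recall_score(y_test, y_pred, average="macro"),
      f1_score(y_test, y_pred, average="macro"))
```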
Experiment 1: Imbalanced dataset
This experiment was conducted on the imbalanced dataset, without applying any imbalance handling methods. We developed models using DT, RF, CatBoost, XGB, LGBM, and the soft voting classifier, and evaluated them using accuracy, precision, recall, and F1-score (see Table 3 below).

Table 3. Model performance using the imbalanced dataset
Algorithm                                    Accuracy   Precision  Recall     F1-score
Decision tree                                0.910668   0.920002   0.920301   0.920069
Random forest                                0.923588   0.936018   0.933484   0.934621
CatBoost                                     0.927649   0.939536   0.939268   0.938353
XGBoost                                      0.928756   0.940888   0.938008   0.939343
LGBM classifier                              0.922850   0.936619   0.935014   0.935765
Soft voting (LGBM, CatBoost, XGB, RF, DT)    0.926910   0.940701   0.939130   0.939856
Soft voting (CatBoost, XGB)                  0.927649   0.939835   0.938740   0.939220
Soft voting (LGBM, CatBoost)                 0.925065   0.939020   0.936754   0.937815
Soft voting (RF, DT)                         0.909930   0.920326   0.919321   0.919589

As Table 3 shows, the XGBoost algorithm achieves the best accuracy, precision, and F1-score (0.928756, 0.940888, and 0.939343, respectively), while the CatBoost algorithm achieves the best recall (0.939268). Among the soft voting configurations, the combination of LGBM, CatBoost, XGB, RF, and DT obtains the highest F1-score, while the CatBoost + XGBoost combination attains slightly higher accuracy than the other voting configurations.

Experiment 2: Balanced dataset
This experiment was conducted by balancing the dataset with the SMOTE + Tomek method on the training set only and developing the models using DT, RF, CatBoost, XGB, LGBM, and the soft voting classifiers. The models were again evaluated using accuracy, precision, recall, and F1-score (see Table 4 below).

Table 4. Model performance using the balanced dataset
Algorithm                                    Accuracy   Precision  Recall     F1-score
Decision tree                                0.921004   0.932629   0.933324   0.932835
Random forest                                0.921004   0.932629   0.933324   0.932835
CatBoost                                     0.925434   0.936700   0.938996   0.937794
XGBoost                                      0.925065   0.936134   0.938090   0.937017
LGBM classifier                              0.924695   0.937783   0.937861   0.937776
Soft voting (LGBM, CatBoost, XGB, RF, DT)    0.926541   0.936982   0.938738   0.937805
Soft voting (CatBoost, XGB)                  0.925065   0.936067   0.938030   0.936950
Soft voting (LGBM, CatBoost)                 0.926541   0.938991   0.940395   0.939642
Soft voting (RF, DT)                         0.906238   0.916778   0.919568   0.917991

This experiment shows that handling the class imbalance problem does not always yield better performance.

4.1. Model comparison
We compared the performance of the soft voting classifier with that of other classic and ensemble machine learning algorithms for classifying the dry beans, using both the imbalanced and the balanced dataset. The dataset has seven classes, and overall accuracy, precision, recall, and F1-score were used to compare the classification models; the algorithm with the highest overall performance is selected as the best classifier for the dry bean. As indicated in Tables 3 and 4 above, the XGBoost algorithm registered the highest accuracy, 92.8756%, on the imbalanced dataset, and the soft voting classifier of LGBM, CatBoost, XGB, RF, and DT achieved an accuracy of 92.6541% on the balanced dataset. On the imbalanced dataset, the soft voting classifier of LGBM, XGB, CatBoost, RF, and DT performs best after the XGBoost algorithm, with overall accuracy, precision, recall, and F1-score of 92.691%, 94.0701%, 93.913%, and 93.986%, respectively.
The decision tree algorithm registered the lowest performance among the individual algorithms on both the imbalanced and the balanced datasets (see Tables 3 and 4). Therefore, the XGBoost algorithm is selected as the best classifier compared with the other classic and ensemble machine learning algorithms, and the soft voting classifier of LGBM, XGB, CatBoost, RF, and DT is selected as the best of the voting classifiers.

4.2. Model explainability
To enhance the explainability of the classification model, we employed several techniques. We explained and interpreted the classification model to build trust in how it achieves its results. The explainable AI approach with the LIME and SHAP frameworks was implemented to understand how the model predicts the final results. To explain the model, we randomly selected rows 100, 150, 200, 250, and 300 of the dataset; these rows were chosen at random, and any other rows could have been selected.

Fig. 3. Model explanation with LIME for row 100
Fig. 4. Model explanation with LIME for row 150
Fig. 5. Model explanation with LIME for row 200
Fig. 6. Model explanation with LIME for row 250
Fig. 7. Model explanation with LIME for row 300

Figures 3, 4, 5, 6, and 7 above depict interpretations of the XGBoost model using the LIME explainable AI method for specific dry bean instances. In each case, the model classifies the bean into its respective class with 100% confidence. The key findings from each interpretation are as follows:
Class 'BOMBAY' (Figure 3): The model identifies dry beans as 'BOMBAY' based on specific features such as perimeter, shape factors, minor axis length, convex area, and area. For instance, the beans are classified as 'BOMBAY' when Perimeter > 0.83, ShapeFactor1 <= 0.78, MinorAxisLength > 0.91, ConvexArea > 0.94, and Area > 0.94.
Class 'SEKER' (Figure 4): The model correctly classifies dry beans as 'SEKER' by considering features such as shape factors, minor axis length, and compactness. For instance, beans are categorized as 'SEKER' when ShapeFactor4 > 0.33, ShapeFactor1 < -0.15, MinorAxisLength < -0.24, ShapeFactor3 > 0.45, and Compactness > 0.44.
Class 'HOROZ' (Figure 5): Dry beans are accurately classified as 'HOROZ' based on features such as roundness, perimeter, convex area, equivalent diameter, and area. For example, beans are classified as 'HOROZ' when roundness <= -0.82, Perimeter > 0.24 & <= 0.83, ConvexArea > -0.21 & <= 0.19, EquivDiameter > -0.23 & <= 0.19, and Area > -0.21 & <= 0.20.
Class 'SIRA' (Figure 6): The model identifies dry beans as 'SIRA' considering attributes such as perimeter, roundness, minor axis length, and shape factors. For instance, beans are classified as 'SIRA' when Perimeter > 0.24 & <= 0.83, roundness > -0.22 & <= -0.12, MinorAxisLength > -0.24 & <= 0.09, ShapeFactor1 > -0.15 & <= -0.08, and ShapeFactor3 > -0.65 & <= -0.64.
Class 'BARBUNYA' (Figure 7): Dry beans are correctly classified as 'BARBUNYA' based on features such as roundness, perimeter, minor axis length, shape factor 1, and convex area. For example, beans are categorized as 'BARBUNYA' when roundness <= -0.82, Perimeter > 0.83, MinorAxisLength > 0.91, ShapeFactor1 <= -0.78, and ConvexArea > 0.94.
These interpretations provide insights into how the model makes its predictions, highlighting the specific features that are influential in classifying the different types of dry beans.
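Explanations of this kind can be reproduced, in outline, with the lime and shap packages. The sketch below is a minimal example assuming a fitted XGBoost model `xgb_model` and the training and test frames from the earlier sketches; the name `xgb_model`, the use of row 100 of the test split (the paper does not state which split the row indices refer to), and the SHAP summary call are illustrative assumptions.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
import shap

# Class ordering matches the label encoding of Section 3.2.2.
class_names = ["DERMASON", "SIRA", "SEKER", "HOROZ", "CALI", "BARBUNYA", "BOMBAY"]

# Local explanation of a single instance with LIME.
explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(X_train.columns),
    class_names=class_names,
    mode="classification")

row = np.asarray(X_test)[100]          # row 100 used here as an example
explanation = explainer.explain_instance(
    row, xgb_model.predict_proba, num_features=5, top_labels=1)
print(explanation.as_list(label=explanation.top_labels[0]))

# Global feature relevance per class with SHAP
# (the shape of the returned values can vary across shap versions).
shap_values = shap.TreeExplainer(xgb_model).shap_values(X_test)
shap.summary_plot(shap_values, X_test, class_names=class_names)
```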
Figures 8, 9, 10, 11, and 12 below show the decisions generated by the XGBoost model for the randomly selected rows 100, 150, 200, 250, and 300, respectively. Based on these decisions, the class value for rows 100, 150, 200, 250, and 300 is 6, 2, 3, 1, and 5, respectively; for the corresponding class names, see Section 3.2.2.

Fig. 8. Decisions for row 100
Fig. 9. Decisions for row 150
Fig. 10. Decisions for row 200
Fig. 11. Decisions for row 250
Fig. 12. Decisions for row 300

Figure 13 below shows the importance of each feature for each class in the classification model. Based on the results above, XGBoost was selected as the best classification model for the dry beans, so we explained the XGBoost model using the SHAP explainable AI method, which explains the model through feature relevance. The figure shows the importance of each feature for each class.

Fig. 13. Explainable AI with SHAP

5. Conclusion and Recommendation
Dry beans belong to the diverse Fabaceae family, sometimes referred to as Leguminosae, and they are the most important and most widely produced pulse in the world. They originated in the Americas, and their wide genetic diversity arose after they were transported to Europe and Africa in the 15th and 16th centuries and quickly spread to the rest of the globe. Dry beans exhibit considerable genetic diversity and are the most produced of the edible legume crops in the world. According to the Turkish Standards Institution, dry beans are classified as Barbunya, Battal, Bombay, Cali, Dermason, Horoz, Tombul, Selanik, and Seker based on their botanical characteristics. This study aimed to develop an explainable and interpretable classification model for dry beans using a soft voting classifier and to compare its performance with other classic and ensemble machine learning algorithms. The data source for this research is a publicly available dataset on Kaggle. After data preprocessing, 13,543 of the original 13,611 instances, each with 16 features and one class label, were used for developing the classification model, and after handling the class imbalance with SMOTE + Tomek, 7,655 instances were used for the model. We checked the multicollinearity of the features using variance inflation factors and concluded that all the features were significant. The proposed model was constructed using soft voting classifiers, decision tree, random forest, extreme gradient boosting, CatBoost, and LGBM algorithms on both the balanced and the unbalanced dataset; in total, twelve experiments were conducted. The performance of the models was evaluated using the accuracy, precision, recall, and F1-score metrics. We also explained the classification model using the LIME and SHAP feature relevance explanation techniques to enhance its explainability and interpretability and to address the black-box nature of the algorithms. The best classification model was identified based on accuracy, and XGBoost was selected as the best algorithm for classifying the dry beans, with 92.5065% accuracy on the balanced dataset.
Finally, we recommend that future researchers build a dry bean classification model that includes additional features of the dry bean, such as 3D features or the suture axis of the bean. Future researchers can also build a dry bean classification model using other advanced algorithms to improve performance, and develop a mobile application.

Declaration on Generative AI
The author(s) have not employed any Generative AI tools.

References
[1] A. Desole, "Dry Bean Dataset Analysis," Math. Mach. Learn., 2022.
[2] S. K. Sathe, "Dry bean protein functionality," Crit. Rev. Biotechnol., vol. 22, no. 2, pp. 175–223, 2002, doi: 10.1080/07388550290789487.
[3] Y. Long, A. Bassett, K. Cichy, A. Thompson, and D. Morris, "Bean split ratio for dry bean canning quality and variety analysis," IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, vol. 2019-June, pp. 2665–2668, 2019, doi: 10.1109/CVPRW.2019.00323.
[4] M. Koklu and I. A. Ozkan, "Multiclass classification of dry beans using computer vision and machine learning techniques," Comput. Electron. Agric., vol. 174, p. 105507, 2020, doi: 10.1016/j.compag.2020.105507.
[5] G. Słowiński, "Dry beans classification using machine learning," CEUR Workshop Proc., vol. 2951, pp. 166–173, 2021.
[6] M. Moshinsky, "Dry Bean Classification," Nucl. Phys., vol. 13, no. 1, pp. 104–116, 1959.
[7] M. Salauddin Khan et al., "Comparison of multiclass classification techniques using dry bean dataset," Int. J. Cogn. Comput. Eng., vol. 4, pp. 6–20, 2023, doi: 10.1016/j.ijcce.2023.01.002.
[8] P. E. D. Love, W. Fang, J. Matthews, S. Porter, H. Luo, and L. Ding, "Explainable Artificial Intelligence (XAI): Precepts, Methods, and Opportunities for Research in Construction," pp. 1–58, 2022.
[9] B. E. Dejene, T. M. Abuhay, and D. S. Bogale, "Predicting the level of anemia among Ethiopian pregnant women using homogeneous ensemble machine learning algorithm," BMC Med. Inform. Decis. Mak., vol. 22, no. 1, pp. 1–11, 2022, doi: 10.1186/s12911-022-01992-6.
[10] Anonymous, "Data Preprocessing Techniques for Data Mining," 2011.
[11] A. M. Dymond, R. W. Coger, and E. A. Serafetinides, "Data preprocessing applied to human average visual evoked potential P100-N140 amplitude, latency, and slope," Psychiatry Res., vol. 3, no. 3, pp. 315–322, 1980, doi: 10.1016/0165-1781(80)90061-X.
[12] N. H. Son, "Data cleaning and Data preprocessing," 2011. [Online]. Available: http://www.mimuw.edu.pl/~son/datamining/DM/4-preprocess.pdf
[13] S. B. Kotsiantis and D. Kanellopoulos, "Data preprocessing for supervised leaning," Int. J. …, vol. 1, no. 2, pp. 1–7, 2006, doi: 10.1080/02331931003692557.
[14] S. Manikandan, "Data transformation," J. Pharmacol. Pharmacother., vol. 1, no. 2, p. 126, 2010, doi: 10.4103/0976-500x.72373.
[15] J. W. Osborne, "Notes on the use of data transformations," Pract. Assessment, Res. Eval., vol. 8, no. 6, 2003.
[16] "Class Imbalance Problem in Data Mining: Review," Int. J. Comput. Sci., vol. 2, no. 1, 2013.
[17] R. P. Ribeiro, "SMOTE for Regression," 2013, doi: 10.1007/978-3-642-40669-0.
[18] E. F. Swana, W. Doorsamy, and P. Bokoro, "Tomek Link and SMOTE Approaches for Machine Fault Classification with an Imbalanced Dataset," Sensors, vol. 22, no. 9, 2022, doi: 10.3390/s22093246.
[19] "Imbalanced Classification in Python: SMOTE-Tomek Links Method | by Raden Aurelius Andhika Viadinugroho | Towards Data Science." Accessed: Mar. 30, 2023. [Online]. Available: https://towardsdatascience.com/imbalanced-classification-in-python-smote-tomek-links-method-6e48dfe69bbc
[20] "Training, Validation and Testing Data Explained | Applause." Accessed: Aug. 16, 2021. [Online]. Available: https://www.applause.com/blog/training-data-validation-data-vs-test-data
[21] M. K. Uçar, M. Nour, H. Sindi, and K. Polat, "The Effect of Training and Testing Process on Machine Learning in Biomedical Datasets," Math. Probl. Eng., vol. 2020, 2020, doi: 10.1155/2020/2836236.
[22] M. J. Healy, "Statistics from the inside. 15. Multiple regression (1)," Arch. Dis. Child., vol. 73, no. 2, pp. 177–181, 1995, doi: 10.1136/adc.73.2.177.
[23] R. G. Mantovani, A. L. D. Rossi, E. Alcobaça, J. C. Gertrudes, S. B. Junior, and A. C. P. de L. F. de Carvalho, "Rethinking Defaults Values: a Low Cost and Efficient Strategy to Define Hyperparameters," 2020. [Online]. Available: http://arxiv.org/abs/2008.00025
[24] B. H. Shekar and G. Dagnew, "Grid search-based hyperparameter tuning and classification of microarray cancer data," 2019 2nd Int. Conf. Adv. Comput. Commun. Paradig. (ICACCP), pp. 1–8, 2019, doi: 10.1109/ICACCP.2019.8882943.