Evidential Nearest Neighbours in Active Learning

Daniel Zhu(1), Arnaud Martin(1), Yolande Le Gall(1), Jean-Christophe Dubois(1), and Vincent Lemaire(2)

(1) Univ Rennes, CNRS, IRISA, DRUID, rue E. Branly, Lannion, firstname.lastname@irisa.fr
(2) Orange Labs, France, vincent.lemaire@orange.com

Abstract. Active learning is a subfield of machine learning which makes it possible to reduce the amount of data necessary to train a classifier. The training set is built in an iterative way such that only the most significant and informative data are used and labeled by an external person called the oracle. It is furthermore possible to use active learning with the theory of belief functions in order to take erroneous labels due to the oracle's uncertainty and imprecision into account and to limit their influence on the classifier's performance. In this article, we compare the k nearest neighbours classifier (kNN) to an evidential variant based on belief functions (EkNN), in a situation where some labels have been noised in order to model uncertain labels. We show that although the superiority of EkNN over kNN is not systematic, there are some interesting and modest results supporting the relevance of belief functions in active learning.

Keywords: Active Learning · Belief Functions · Theory of Dempster-Shafer · Nearest Neighbours

1 Introduction

In supervised machine learning, the size of the training set, i.e. the number of labeled examples, is often correlated to the performance of the learned model. Although having access to large databases is no longer difficult nowadays, labeling the data remains an expensive task, especially when the application domain requires some expertise. Active learning offers a solution to this issue by reducing the number of labeled examples and ensuring that the data to be labeled are selected by the model or by a strategy [2,12]. Some classifiers can be combined with belief functions from the theory of Dempster-Shafer [5] in order to take the uncertainty and the imprecision in the labels into account, when the oracle – the person in charge of the labeling task – is not necessarily proficient in the domain. In this paper, our contribution consists in the use of an evidential variant of the k nearest neighbours classifier which involves belief functions in active learning. More precisely, we compare this evidential classifier to the common k nearest neighbours in a context where the labels provided by the oracle are uncertain.

© 2021 for this paper by its authors. Use permitted under CC BY 4.0.

This article is organised as follows. First of all, we introduce active learning and belief functions in section 2. Our contribution is then described in section 3 and our experiments and results are presented in section 4. Finally, section 5 concludes this paper.

2 State of the Art

In this section, we first introduce some notions about active learning (section 2.1) before dealing with the theory of belief functions (section 2.2).

2.1 Active Learning

Active learning (AL) is a subfield of machine learning which makes it possible to limit the amount of training data needed to train a classifier. Its specificity lies in the step-by-step construction of a reduced training set, with a limited amount of information [10], by choosing only the most relevant and informative samples, which provide an increase in performance [2,12].
For a given classification problem, let us consider X ⊂ R^d, the set of samples described by d ∈ N* features, and Y, the set of the different classes. The labeled and unlabeled samples are respectively gathered in L and U such that X = L ∪ U and L ∩ U = ∅. The aim is to label the minimum amount of data required by the model to reach a given performance, or the best performance given a budget. The classifier first selects the sample x ∈ U whose contribution to the model is supposed to be the most significant before asking the oracle for its label; this step is called a query. Once the label y ∈ Y of x is provided, the classifier learns it and updates its knowledge, moving x from U to L. Queries are formulated repeatedly in the same way until a certain stopping criterion is satisfied. At the end of the complete learning process, it is highly probable that U still contains a lot of samples, but this is of little importance as the classifier does not need to learn the whole dataset to perform efficiently.

This strategy is known as pool-based sampling [8] and will be used hereafter. There are several ways to select and to evaluate the relevance or the "informativeness" of a sample. The utility measures defined by the active learning strategies in the literature [12] differ in their positioning according to a dilemma between the exploitation of the current classifier and the exploration of the training data. Among these strategies, uncertainty sampling, more oriented towards exploitation, is one of the most popular and can be divided into three main types:

– the least confident prediction: the sample x*_LC which minimises the probability of classification of its most probable class is queried:

x^*_{LC} = \arg\max_{x \in U} \big(1 - P(y_x \mid x)\big)    (1)

with y_x being the most probable class for x according to the classifier. However, this method depends only on the most probable class and does not take the other classes into consideration;

– the margin sampling: the sample x*_M which minimises the difference between the probabilities of classification of the two most probable classes is considered as the most uncertain and will be sent to the oracle:

x^*_{M} = \arg\min_{x \in U} \big(P(y_x^{(1)} \mid x) - P(y_x^{(2)} \mid x)\big)    (2)

with y_x^{(1)} and y_x^{(2)} being respectively the most and the second most probable classes for x ∈ U according to the classifier. The idea is quite natural: if the difference between the probabilities – the margin – is small, then the classification of x is considered as ambiguous;

– the entropy sampling: the sample x*_H which maximises the Shannon entropy is considered as the most uncertain and will be sent to the oracle:

x^*_{H} = \arg\max_{x \in U} \; - \sum_{y \in Y} P(y \mid x) \log P(y \mid x)    (3)

Unlike the previous methods, this approach considers every class and not only the most probable ones.

One can also notice that the uncertainty could be "divided" or viewed in two distinct parts [7]: aleatoric uncertainty, which does not depend on the oracle's knowledge but rather is inherent to random phenomena – such as tossing a coin; and epistemic uncertainty, which depends on the oracle's knowledge and ignorance – for example, distinguishing two different species of bird. The second type of uncertainty is more appropriate in our context of active learning as the labeling task strongly relies on the oracle's proficiency in the domain; in particular, it is possible to adapt uncertainty sampling to epistemic uncertainty [9].
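To make these three utility measures concrete, the following is a minimal sketch, assuming a classifier exposing a scikit-learn-style predict_proba method; the function names (least_confident, margin, entropy, query) are ours and purely illustrative.

```python
import numpy as np

def least_confident(probas):
    """Utility of eq. (1): one minus the probability of the most probable class."""
    return 1.0 - probas.max(axis=1)

def margin(probas):
    """Eq. (2): difference between the two largest class probabilities,
    negated so that a single argmax-of-utility convention can be used."""
    top2 = np.sort(probas, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])

def entropy(probas, eps=1e-12):
    """Eq. (3): Shannon entropy computed over every class."""
    return -np.sum(probas * np.log(probas + eps), axis=1)

def query(classifier, pool, utility):
    """Select the index of the pool sample maximising the chosen utility."""
    probas = classifier.predict_proba(pool)  # shape (n_samples, n_classes)
    return int(np.argmax(utility(probas)))
```

In all three cases, the sample with the highest utility is the one sent to the oracle; only the scoring function changes.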
It is however important to notice that the uncertainty described in this section is related to the classifier and not to the oracle. The latter's lack of knowledge and epistemic uncertainty might be modeled with the theory of belief functions introduced in the next section.

2.2 Theory of belief functions

The theory of belief functions, also known as Dempster-Shafer theory, is a well-suited tool for modeling uncertainty and imprecision [4,13]. Both of these imperfections might be common in labels when the oracle is not an expert in the domain. It is therefore necessary to model these phenomena.

Let us consider a set Θ called the frame of discernment, containing the elementary and exclusive hypotheses of a given problem. In the context of classification, it is the set of classes, therefore Θ = Y.

The theory is based on belief functions defined from 2^Θ to [0, 1], with 2^Θ the power set of Θ. The basic belief assignment (BBA) m : 2^Θ → [0, 1] is a belief function which satisfies the normalisation condition:

\sum_{X \in 2^\Theta} m(X) = 1    (4)

A BBA makes it possible to assign elementary belief to different combinations of hypotheses. For example, if we consider X = {θ1, θ2, θ3} ∈ 2^Θ, then m(X) is the confidence assigned to θ1, θ2 and θ3 altogether and cannot be subdivided among the different sub-hypotheses {θ1}, {θ2} or {θ3}; thus, m(X) supports the veracity of X as a whole.

In particular, if m(∅) = 0, then the closed-world assumption holds, meaning that the frame of discernment is exhaustive. On the contrary, the open-world assumption prevails if m(∅) ≠ 0, which implies that unknown hypotheses may exist outside of Θ. In this article, the world is assumed to be closed.

The uncertainty about a hypothesis θ ∈ Θ is modeled by the value of the BBA: the higher the BBA (close to 1), the more confidence there is in θ. Conversely, a low BBA value (close to 0) means that little evidence supports θ. Imprecision occurs when the oracle hesitates between several hypotheses. This phenomenon is modeled by non-empty, non-singleton elements of 2^Θ and can be extended to the situation of total ignorance, in which case the entire belief is assigned to Θ in the following way: m(Θ) = 1 and m(X) = 0 for all X ∈ 2^Θ \ {Θ}. In such a context, every hypothesis is possible.

For example, let us consider a classification problem with 10 classes, so that Θ = Y = {θi | i ∈ {1, ..., 10}}, and let us note A = {θ1, θ5, θ8} ∈ 2^Θ. If, for a given sample x ∈ X, the oracle believes that the class of x might be either θ1, θ5 or θ8, without being able to determine which one is the most likely and without any evidence supporting other classes, then a BBA m_x : 2^Θ → [0, 1] can be defined for x in the following way: m_x(A) = s and m_x(Θ) = 1 − s, with s ∈ ]0, 1[. Assigning a certain amount of belief s ≠ 1 to A represents the uncertainty. As A is the union of three different classes, the oracle is furthermore imprecise: they do not particularly favour one answer among the three. Finally, the remaining 1 − s of the belief is assigned to Θ, which models the ignorance of the oracle.

The core feature of Dempster-Shafer theory is Dempster's conjunctive rule of combination. It allows the combination of several BBAs defined on the same frame of discernment. As a result, the hypotheses on which the BBAs agree are reinforced.
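As an aside, the oracle BBA of the example above can be encoded quite directly as a mapping from subsets of Θ to masses; the sketch below is only one possible representation (the names THETA, make_bba and is_normalised are ours).

```python
# Frame of discernment of the 10-class example: Theta = {theta_1, ..., theta_10}.
THETA = frozenset(f"theta_{i}" for i in range(1, 11))

def make_bba(s):
    """BBA of the uncertain and imprecise oracle: mass s on A = {theta_1, theta_5, theta_8},
    the remaining 1 - s on Theta (ignorance)."""
    assert 0.0 < s < 1.0
    A = frozenset({"theta_1", "theta_5", "theta_8"})
    return {A: s, THETA: 1.0 - s}

def is_normalised(bba, tol=1e-9):
    """Normalisation condition of eq. (4): the masses must sum to 1."""
    return abs(sum(bba.values()) - 1.0) < tol

print(is_normalised(make_bba(0.7)))  # True
```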
Let us consider l ∈ N*; the BBAs (m_i)_{i ∈ {1, ..., l}} defined on the same frame of discernment Θ are combined into a single BBA m_⊕:

\forall X \in 2^\Theta, \quad m_\oplus(X) = \sum_{X_1 \cap \ldots \cap X_l = X} \; \prod_{k=1}^{l} m_k(X_k)    (5)

Even though each BBA m_i respects the closed-world assumption, m_⊕ might not. It is possible to normalise m_⊕ in order to restore the closed-world assumption:

\forall X \in 2^\Theta, \quad m_{Norm}(X) = \begin{cases} \dfrac{m_\oplus(X)}{1 - m_\oplus(\emptyset)} & \text{if } X \neq \emptyset, \\ 0 & \text{otherwise.} \end{cases}    (6)

This combination rule can then be used in decision rules. It is therefore possible to define classifiers based on belief functions, as described in the next section.

3 An evidential classifier in active learning

Active learning is a paradigm of machine learning which reduces the amount of training data necessary to train the classifier. As an uncertain oracle leads to errors in the labeling task, coupling AL with the theory of belief functions [15] can be expected to provide a certain robustness towards incorrect labels. In this section, we present the evidential k nearest neighbours introduced for the first time in [5] (section 3.1) before explaining our approach and its interest in active learning (section 3.2).

3.1 Evidential nearest neighbours

The evidential k nearest neighbours classifier (EkNN) is a variant of the classical k nearest neighbours (kNN) based on belief functions [5]. Let us consider x ∈ U and x̃ ∈ L, one of the k nearest neighbours of x according to the Euclidean distance. It is possible to define a BBA m_{x,x̃} which supports the sole hypothesis that x and x̃ belong to the same class θ ∈ Θ. Such a BBA must also take the distance between x and x̃ into account: the closer they are, the stronger the belief. Therefore, the BBA can be defined as follows:

\forall X \in 2^\Theta, \quad m_{x,\tilde{x}}(X) = \begin{cases} \alpha \exp(-\gamma_\theta \, d(x, \tilde{x})) & \text{if } X = \{\theta\}, \\ 1 - \alpha \exp(-\gamma_\theta \, d(x, \tilde{x})) & \text{if } X = \Theta, \\ 0 & \text{otherwise,} \end{cases}    (7)

with α and γ_θ being parameters and d(x, x̃) the Euclidean distance between x and x̃. In the original paper [5], it is recommended to set α to 0.95 and γ_θ to the inverse of the mean distance between training samples of the same class θ.

Dempster's combination rule is then applied among the BBAs which support the same class. For each class, we get the BBA (m_{x,(θ)})_{θ ∈ Θ}. Finally, the rule of combination is applied one more time among every BBA (m_{x,(θ)})_{θ ∈ Θ} and we get m_x, which aggregates every original BBA supporting different classes. The decision rule is then the following:

\forall x \in U, \quad y = \arg\max_{\theta \in \Theta} m_x(\{\theta\})    (8)

In the context of active learning, it is necessary to take into account the time complexity of the training phase, as the classifier is updated after each query. In the implementation of EkNN used in this article, the training phase consists in storing the training data and then computing the γ_θ parameters, which are set to the inverse of the mean distance between every pair of samples of the same class θ ∈ Θ. If the Euclidean distance is used, then the number of features d necessarily influences the time complexity. Let us consider n_θ ∈ N*, the number of samples whose class is θ. There are N_θ = n_θ(n_θ − 1)/2 unique pairs of samples, which is also the number of distances to be computed. So the time complexity to compute the mean distance of the class θ is in O((d + 1) N_θ). The global time complexity of the training phase (including every class of Θ) is then in

O\big((d+1) \sum_{\theta \in \Theta} N_\theta\big) = O\big(\tfrac{d+1}{2} \sum_{\theta \in \Theta} (n_\theta^2 - n_\theta)\big) = O\big(d \sum_{\theta \in \Theta} n_\theta^2\big).
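The prediction step described above can be summarised in a short sketch, assuming the simple BBAs of eq. (7) and the unnormalised conjunctive rule of eq. (5); the helper names, data layout and default parameter values (other than α = 0.95) are ours and not the implementation of [5].

```python
import numpy as np

def neighbour_bba(distance, theta, frame, alpha=0.95, gamma=1.0):
    """BBA of eq. (7): a neighbour of class theta supports {theta},
    the remaining mass goes to the whole frame Theta."""
    s = alpha * np.exp(-gamma * distance)
    return {frozenset({theta}): s, frozenset(frame): 1.0 - s}

def combine(m1, m2):
    """Unnormalised conjunctive rule of eq. (5) for two BBAs on the same frame."""
    out = {}
    for X1, v1 in m1.items():
        for X2, v2 in m2.items():
            X = X1 & X2
            out[X] = out.get(X, 0.0) + v1 * v2
    return out

def eknn_predict(distances, labels, frame, alpha=0.95, gammas=None):
    """Decision rule of eq. (8): combine the k neighbour BBAs and return the
    class whose singleton receives the highest mass."""
    gammas = gammas or {theta: 1.0 for theta in frame}
    m = {frozenset(frame): 1.0}  # vacuous BBA: total ignorance before combination
    for d, theta in zip(distances, labels):
        m = combine(m, neighbour_bba(d, theta, frame, alpha, gammas[theta]))
    return max(frame, key=lambda theta: m.get(frozenset({theta}), 0.0))

# Toy usage: two far neighbours of class 1 against one close neighbour of class 2.
print(eknn_predict([0.9, 0.8, 0.2], [1, 1, 2], frame={1, 2}))  # prints 2
```

Because the conjunctive rule is associative, combining the neighbour BBAs class by class and then across classes, as described above, gives the same m_x as combining them in one pass; the normalisation of eq. (6) does not change the argmax of eq. (8).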
As the training set grows by adding samples, it is not necessary to recalculate the distances that have already been computed; in order to update the mean distance, adding the distances of the new training data and weighting them accordingly is sufficient.

In this context, the fact that the class of an unknown sample x might be any of its neighbours' classes is a form of imprecision when the neighbours belong to several different classes. Moreover, the uncertainty can be deduced from the distance: the closer the neighbour is, the more believable it is that x belongs to the same class as its neighbour. However, uncertainty and imprecision do not directly depend on the oracle in this situation but only on the training set used by EkNN.

3.2 Use of a belief functions-based classifier in active learning

The interest in using the theory of belief functions is to model uncertainty and imprecision in the data used in active learning. In particular, it becomes possible to model ignorance, which would be difficult in the classical theory of probability [13]. In the context of crowdsourcing, [1] and [14] applied the theory of belief functions to model the contributors' uncertainty.

The approach described in this article consists in using EkNN in the context of active learning where some labels provided by the oracle are false. Parameters such as a level of confidence or an expertise estimation cannot be taken into account in such a configuration as far as we know. Therefore, the oracle's uncertainty is not modeled in this article. The approach rather tends to limit the influence of erroneous labels given by the oracle. Once the learning phase is over, with some samples being mislabeled, the EkNN classifier uses the density of the distribution and its distances to modulate the influence of each neighbour of a sample to be labeled. It is finally the use of the combination rule (eq. (5)) that contributes to reinforcing the hypotheses on which the BBAs agree, and therefore to limiting the indirect effect of the oracle's uncertainty.

Let us consider a dataset with two classes Θ = {θ1, θ2} and the classifier EkNN with k = 5. Let x be a sample to be labeled and its 5 nearest neighbours (x_i)_{i ∈ {1, ..., 5}} represented in figure 1. Whether the oracle mislabeled some samples (case B) or not (case A), the goal for the classifier after the learning phase is to find the actual class of x. In case B, two neighbours have been mislabeled; therefore, class 1 is in the majority, but the two remaining class 2 samples (correctly labeled) are closer to x. The classifier takes the distance into account and attributes a greater belief to θ2 even though the actual class is in the minority in the neighbourhood (see table 2). The coordinates of the samples and their distances to x are given in table 1.

Fig. 1. A sample x to be labeled and its 5 nearest neighbours (within the circle). In case A, every sample has its actual label but in case B, two neighbours have been mislabeled by the oracle.

Table 1. Coordinates and distances to x of its 5 nearest neighbours of figure 1.

Sample         x            x1           x2           x3            x4            x5
Coordinate     (1.5, 0.23)  (1.7, 0.15)  (1.3, 0.41)  (1.5, −0.39)  (1.3, −0.51)  (0.82, 0.35)
Distance to x  0            0.22         0.27         0.70          0.62          0.77

Table 2. Values of m_x of figure 1.
          m_x(∅)   m_x({θ1})      m_x({θ2})   m_x(Θ)
Case A    0        5.97 · 10^-3   0.919       7.49 · 10^-2
Case B    0        5.05 · 10^-2   0.808       0.142

4 Experiments

In this section, we show that EkNN is a viable classifier in active learning and that its robustness to uncertain labels is of interest. We first present the methodology and protocol used in our experiments (section 4.1) and then discuss and interpret the results (section 4.2).

4.1 Methodology

The EkNN classifier is compared to kNN in order to highlight the performance of the former in active learning. The number of nearest neighbours k is difficult to choose: a very small value makes the classifier very sensitive to noise, while a larger value leads to heavy computations. In this article, the value of k is arbitrarily set to 5 for each experiment and each dataset. These classifiers are used in association with uncertainty sampling (least confident prediction) and random sampling to select the data to label. The latter consists in selecting the data to be labeled at random, so that their potential contribution or relevance to the model is ignored. Random sampling is often used as a baseline to highlight the efficiency of active learning sampling methods.

Several datasets are used to evaluate the classifiers' behaviour. Synthetic random data have been generated with the scikit-learn library [11], while some real datasets have been extracted from the UCI repository [6]. The different datasets are presented in tables 3 and 4.

Table 3. Synthetic datasets used in the experiments.

Name         #samples  #classes  Class distribution       #features
Synthetic A  1,000     2         90 %-10 %                10
Synthetic B  1,000     5         50 %-20 %-15 %-10 %-5 %  10
Synthetic C  1,000     5         75 %-10 %-5 %-5 %-5 %    10

Table 4. Real datasets used in the experiments. In the legend, LRC and MRC stand for "least represented class" and "most represented class".

Name                                                         #samples  #classes  LRC     MRC   #features
Speaker Accent Recognition                                   329       5         8.8 %   50 %  12
HCV                                                          615       5         1.1 %   87 %  14
Letter Recognition (V vs. Y)                                 1,550     2         49 %    51 %  15
Wine Quality (Red Wine)                                      1,599     6         0.63 %  43 %  11
Pen-Based Recognition of Handwritten Digits (3 vs. 6 vs. 8)  3,166     3         33 %    33 %  16

To evaluate the performance of a classifier coupled with a given sampling method, the accuracy criterion is often used, but this metric is not always relevant, especially when the class distribution is unbalanced [3]. An alternative is to use the balanced accuracy, denoted by a_b and defined in the following equation:

a_b = \frac{1}{2} \left( \frac{TP}{P} + \frac{TN}{N} \right)    (9)

with TP, TN, P and N being respectively the number of true positives, true negatives, samples from the positive class and samples from the negative class. The results for a combination of a classifier and a sampling method are plotted as a learning curve of the balanced accuracy against the number of queries.

First, a simple comparison between EkNN and kNN is made according to the following protocol. The initial dataset X is split into a test set T consisting of 25 % of the samples, while a "training" set contains the remaining samples. From the latter, 5 samples of each class are drawn, forming the bootstrap B, which is used to pre-train the classifier. The remainder of the "training" set becomes the pool P, so that X = T ∪ B ∪ P.
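A minimal sketch of this split is given below, assuming the features X and labels y are NumPy arrays; the function name, the stratified use of scikit-learn's train_test_split and the random seed handling are our own choices, not necessarily the exact code used for the experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(X, y, test_size=0.25, bootstrap_per_class=5, seed=None):
    """Split the dataset into a test set T (25 %), a bootstrap B (5 samples per
    class, used to pre-train the classifier) and a pool P used for the queries."""
    rng = np.random.default_rng(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Draw the bootstrap indices, a few samples per class, from the "training" part.
    boot_idx = []
    for c in np.unique(y_train):
        candidates = np.flatnonzero(y_train == c)
        boot_idx.extend(rng.choice(candidates, size=bootstrap_per_class, replace=False))
    boot_idx = np.array(boot_idx)
    pool_mask = np.ones(len(y_train), dtype=bool)
    pool_mask[boot_idx] = False
    T = (X_test, y_test)
    B = (X_train[boot_idx], y_train[boot_idx])
    P = (X_train[pool_mask], y_train[pool_mask])
    return T, B, P
```

The active learning loop described next then repeatedly moves one queried sample from P to the labeled set after each answer from the oracle.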
Then, the active learning phase begins: the classifier forges queries from P by selecting samples according to a given sampling method (uncertainty or random sampling) until 150 queries have been made. After each query, the oracle gives the label of the requested sample and the classifier updates its knowledge. Finally, the classifier computes the balanced accuracy on T and adds it to the learning curve before making another query. The whole process is repeated 20 times, so that an averaged learning curve is computed over the 20 learning curves for a given classifier-sampling method combination.

Second, in order to add some uncertainty to the answers provided by the oracle, noise is added to the labels, meaning that some of them are replaced by false values. Noise might not always be caused by uncertainty (the oracle can still provide a false label while being confident and certain) but it is sufficient in the current configuration, as the uncertainty is studied through its consequences. The protocol is the same as in the previous experiment, except that a copy of the labels is generated with t % of them being noised. To answer queries, the oracle uses this noised copy instead of the original labels. Finally, the noised curves are compared to the curves corresponding to the data that have not been noised.

4.2 Results

The comparison between EkNN and kNN in figure 2, whether with uncertainty or random sampling, suggests that the contribution of belief functions is most often interesting in AL. It is important to highlight that EkNN is superior or equivalent to kNN on every dataset except Speaker Accent Recognition, where EkNN with random sampling performs significantly worse than the other curves. However, the difference is not always significant as the confidence intervals often overlap. Due to the definition of the balanced accuracy, the datasets with balanced class distributions present higher scores, while the highly unbalanced ones generate more learning difficulties. As a side note, the fact that uncertainty sampling is superior to random sampling is an expected behaviour; otherwise, AL would not be an interesting learning paradigm.

Fig. 2. Comparison of the balanced accuracy (with confidence intervals) between EkNN and kNN through uncertainty sampling (UNC) and random sampling (RD).

When noise is added to the labels, it is worth mentioning that, on datasets that are easy to classify, the confidence intervals of the noised data are wider than those of the non-noised data (Hard). This might be explained by the fact that noised data lead to a bigger variance among results, and thus to a less precise balanced accuracy, as can be seen in figure 4. However, this is not always the case, as shown in figures 3 and 5. Predictably in such cases, when the noise rate is high, the Hard and Noised curves of the same classifier move away from each other.

It is more interesting to compare EkNN's and kNN's Noised curves. Again, the former's curve is often above or at the same level as the latter's curve. The gap between the confidence intervals, however, does not appear to be large as they often overlap each other; the gap is wider in figure 5. Therefore, it appears that EkNN is slightly more robust to noise than kNN.

Fig. 3.
Comparison of the balanced accuracy (with confidence intervals) between EkNN and kNN on the HCV dataset with noised and non-noised (Hard) labels.

Fig. 4. Comparison of the balanced accuracy (with confidence intervals) between EkNN and kNN on the Letter Recognition dataset with noised and non-noised (Hard) labels.

Fig. 5. Comparison of the balanced accuracy (with confidence intervals) between EkNN and kNN on a synthetic dataset with noised and non-noised (Hard) labels.

5 Conclusion

Active learning is a subfield of machine learning that aims to reduce the size of the training set and the number of labels required. This paradigm can be coupled with the theory of belief functions in order to model uncertainty and imprecision in the data. In this article, we attempt to show the efficiency of a classifier based on belief functions, EkNN, compared to a more common classifier, kNN. Our results suggest that in the majority of the experiments there is a real contribution of the EkNN classifier.

The imperfect labels discussed in this article mostly covered the uncertainty aspect, through the use of noised labels. Further experiments on imprecision could be carried out in future work to complete this paper. This could be achieved by allowing the oracle to propose several classes instead of one after each query. However, a more flexible classifier would be required to treat imprecision, as EkNN is not particularly adapted to dealing with several classes per sample. Besides, the EkNN classifier requires heavy computation in AL as the distances between samples are re-computed after each query. Thus, it might be interesting to design a more efficient and better adapted belief functions-based classifier for AL in order to treat both uncertainty and imprecision.

References

1. Abassi, L., Boukhris, I.: A worker clustering-based approach of label aggregation under the belief function theory. Applied Intelligence 49(1), 53–62 (2019). https://doi.org/10.1007/s10489-018-1209-z
2. Bondu, A., Lemaire, V.: État de l'art sur les méthodes statistiques d'apprentissage actif. In: Apprentissage Artificiel et Fouille de Données, AAFD (2006)
3. Brodersen, K.H., Ong, C.S., Stephan, K.E., Buhmann, J.M.: The balanced accuracy and its posterior distribution. In: 2010 20th International Conference on Pattern Recognition. pp. 3121–3124 (2010). https://doi.org/10.1109/ICPR.2010.764
4. Dempster, A.P.: Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38(2), 325–339 (1967). https://doi.org/10.1214/aoms/1177698950
5. Denoeux, T.: A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics 25(5), 804–813 (1995). https://doi.org/10.1109/21.376493
6. Dua, D., Graff, C.: UCI machine learning repository (2017), http://archive.ics.uci.edu/ml
7. Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Machine Learning 110(3), 457–506 (2021). https://doi.org/10.1007/s10994-021-05946-3
8. Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Croft, B.W., van Rijsbergen, C.J. (eds.) SIGIR '94. pp. 3–12.
Springer London, London (1994)
9. Nguyen, V.L., Destercke, S., Hüllermeier, E.: Epistemic uncertainty sampling. In: Kralj Novak, P., Šmuc, T., Džeroski, S. (eds.) Discovery Science. pp. 72–86. Springer International Publishing, Cham (2019)
10. Nodet, P., Lemaire, V., Bondu, A., Cornuéjols, A., Ouorou, A.: From Weakly Supervised Learning to Biquality Learning: an Introduction. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN) (2021)
11. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
12. Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison (2009)
13. Shafer, G.: A mathematical theory of evidence, vol. 42. Princeton University Press (1976)
14. Thierry, C., Casiez, G., Dubois, J.C., Le Gall, Y., Malacria, S., Martin, A., Pietrzak, T., Uro, P.: Interface de Recueil de Données Imparfaites pour le CrowdSourcing. EGC 2020 - Humains et IA, travailler en intelligence, atelier de la conférence (Jan 2020), https://hal.inria.fr/hal-02465761
15. Zhu, D., Martin, A., Dubois, J.C., Le Gall, Y., Lemaire, V.: Modèle crédibiliste pour l'échantillonnage en apprentissage actif. In: Rencontres francophones sur la logique floue et ses applications (LFA) (2021)