=Paper=
{{Paper
|id=Vol-2212/paper10
|storemode=property
|title=The ensemble of algorithms for coronary heart disease detection based on electrocardiogram
|pdfUrl=https://ceur-ws.org/Vol-2212/paper10.pdf
|volume=Vol-2212
|authors=Valeriia Guryanova
}}
==The ensemble of algorithms for coronary heart disease detection based on electrocardiogram==
The ensemble of algorithms for coronary heart disease detection based on electrocardiogram

V N Guryanova

Lomonosov Moscow State University, Leninskie Gory 1, Moscow, Russia, 119991

Abstract. Coronary heart disease (CHD) is the leading cause of death in the world. The disease can remain asymptomatic for a long time, yet progress over time and result in death. Today an electrocardiogram (ECG) can be recorded at home with the help of special equipment from CardioQvark. In this paper the possibility of CHD detection based on such ECGs was explored. Different approaches to the classification of such electrocardiograms were surveyed, and new algorithms and modifications to existing algorithms were proposed. A new method, an ensemble of different algorithms, has shown the best performance.

1. Introduction

Coronary heart disease (CHD) [1] is a group of diseases defined by an insufficient oxygen supply to the heart muscle through the coronary arteries. According to the World Health Organization, this disease is the leading cause of death in the world. At the initial stages of the disease, most people show no symptoms, so it is very important to identify CHD in time to slow the course of the disease and prevent the patient's death. Traditionally, CHD is detected by specialists with the help of a number of tests. These tests take a significant amount of the patient's time and require a highly qualified specialist to conduct them. Since there are very few such specialists and the number of potential patients grows every year, the task of automatic CHD detection is extremely urgent. A device that can estimate the disease or its probability at home would allow a person to be referred to a doctor in case of a high probability of having CHD.

An electrocardiogram (ECG) is a signal that reflects the electrical activity of the heart. Owing to its non-invasiveness and low cost, the ECG is one of the most affordable ways of diagnosing heart disease, and many studies show that the ECG can be used to detect CHD [2], [3], [4], [5]. CardioQvark (project site: www.cardioqvark.ru) has created a device in the form of a smartphone case (iPhone 5 / 5s / SE / 6 / 6s) that allows ECG measurements to be made at home. This portable electrocardiograph registers the bio-electrical activity of the heart from the first ECG lead and, using the patient's cable, from the leads aVR, aVL, aVF and Vi (i = 1 ... 6). In this work, the possibility of CHD detection based on such first-lead ECGs was explored.

There are different approaches to the ECG classification problem, some of which are surveyed in this work. New algorithms and modifications to existing algorithms are proposed. In order to improve classification performance, an ensemble of 5 different methods was built. Each of these methods is described below.

2. Data Description

All research was conducted on the basis of the following medical centers: NGHCI "Semashko Central Clinical Hospital 2", Federal State Scientific Institution "Petrovsky Russian Scientific Center of Surgery", Federal State-Funded Health Care Institution "City Clinical Hospital 4 Health Care Moscow Department", Federal State-Funded Health Care Institution "Moscow Clinical Scientific Center of Moscow Department Health Care", and State Autonomous Health Care Institution of the Moscow Region "Clinical Center for Restorative Medicine and Rehabilitation". The voluntary anonymous study included patients over 18 years of age. Annotated impersonal electrocardiograms (ECG) were recorded from the first ECG lead using a CardioQVARK cardio monitor. The duration of each recording was 5 minutes. The measurement was taken in the sitting position, with back support and hands on the knees or on the table. Data was collected longitudinally, with 3-10 observations per patient and an interval of at least 12 hours between measurements.

The sample used for this task consists of 1798 cardiograms: 1055 cardiograms of healthy patients and 743 cardiograms of patients with CHD. The sampling frequency was 1000 Hz. Signals were preprocessed before applying machine learning algorithms. For preprocessing, second-order low-pass and high-pass Butterworth filters [6] were used, with a cutoff frequency of 0.3 Hz for the low-pass filter and 15 Hz for the high-pass filter. The signal trend was then extracted using a median filter [6] and subtracted from the preprocessed signal, as sketched below.
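To make the preprocessing chain concrete, here is a minimal SciPy sketch. The filter order and the two cutoff frequencies are the ones stated above; however, taken literally, a 0.3 Hz low-pass and a 15 Hz high-pass have non-overlapping passbands, so this sketch assumes the intent is to retain the 0.3-15 Hz band (high-pass at 0.3 Hz, low-pass at 15 Hz). The median-filter kernel size and the use of zero-phase filtering are also assumptions, as the paper does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

FS = 1000  # sampling frequency from Section 2, Hz

def preprocess_ecg(signal, fs=FS, trend_kernel=201):
    """Sketch of the Section 2 preprocessing; names and kernel size are illustrative."""
    # second-order Butterworth filters; this reading keeps the 0.3-15 Hz band
    b_hp, a_hp = butter(2, 0.3, btype="highpass", fs=fs)
    b_lp, a_lp = butter(2, 15.0, btype="lowpass", fs=fs)
    filtered = filtfilt(b_lp, a_lp, filtfilt(b_hp, a_hp, signal))
    # trend extraction with a median filter, then subtraction
    trend = medfilt(filtered, kernel_size=trend_kernel)  # kernel size assumed
    return filtered - trend
```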
3. Algorithms Description

Below are descriptions of the algorithms that were used to build the ensemble.

3.1. The algorithm based on the HRV signal

The idea used in this algorithm was described in [2]. In the ECG signal, R-peaks can be distinguished, which correspond to the person's pulse [6]. The ECG signal is used to create the heart rate variability signal (HRV signal), which is calculated as follows:
• R-peaks are computed.
• The intervals between two consecutive R-peaks (RR-intervals) are measured.
• Each value of the RR-interval is converted to 60/RR.

The main idea of this method is the construction of various groups of features from the HRV signal. The first group of features includes various entropic features, which indicate a measure of unpredictability in the signal. The following types of entropy are used: approximate entropy, sample entropy, and Shannon entropy. Each type is described in detail below.

Approximate entropy is calculated as follows. Here $x = (x_0, x_1, \ldots, x_{N-1})$ is the HRV signal of length $N$.
• The integer $m$ and the real number $r$ are fixed.
• A set of vectors of the form $x_i^m = (x_i, x_{i+1}, \ldots, x_{i+m-1})$, where $i \in [0, N-m]$, is composed.
• The values $C_i^m(r)$ are calculated as follows:

$$C_i^m(r) = \frac{\left|\{x_k^m : d(x_i^m, x_k^m) \le r,\; k \in [0, N-m]\}\right|}{N-m+1},$$

where

$$d(x_i^m, x_k^m) = \max_{a \in [0, m-1]} \left|x_i^m(a) - x_k^m(a)\right|$$

and $x_i^m(a)$ is component $a$ of the vector $x_i^m$.
• The values $\Phi^m(r)$ are defined as

$$\Phi^m(r) = (N-m+1)^{-1} \sum_{i=0}^{N-m} \log C_i^m(r).$$

• Approximate entropy (ApEn) is defined as $\mathrm{ApEn}(r) = \Phi^m(r) - \Phi^{m+1}(r)$.

In this paper, the approximate entropy was computed for $m = 10$ and $r = 0.2\,\mathrm{std}(x)$, where $\mathrm{std}(x)$ is the standard deviation of the signal $x$, as sketched below.
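A minimal NumPy sketch of the HRV construction and of approximate entropy as defined above; the function names are illustrative and the defaults follow the paper's parameters.

```python
import numpy as np

def hrv_from_rpeaks(r_peaks, fs=1000.0):
    """HRV signal of Section 3.1: 60/RR over consecutive RR-intervals."""
    rr = np.diff(np.asarray(r_peaks)) / fs   # RR-intervals in seconds
    return 60.0 / rr

def approximate_entropy(x, m=10, r_factor=0.2):
    """Approximate entropy as defined above (m = 10, r = 0.2 * std(x))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * x.std()

    def phi(m):
        # embedding vectors x_i^m = (x_i, ..., x_{i+m-1})
        vecs = np.array([x[i:i + m] for i in range(N - m + 1)])
        # Chebyshev distance d between every pair of vectors
        d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
        C = (d <= r).sum(axis=1) / (N - m + 1)   # C_i^m(r)
        return np.log(C).mean()                  # Phi^m(r)

    return phi(m) - phi(m + 1)
```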
The sample entropy is calculated as follows.
• Vectors $x_i^m$ of length $m$ and vectors $x_i^{m+1}$ of length $m+1$ are formed in the same way as in the approximate entropy.
• The values $A$ and $B$ are calculated as follows:

$$A(r) = \left|\{(x_k^{m+1}, x_l^{m+1}) : d(x_k^{m+1}, x_l^{m+1}) \le r,\; 0 \le k \le l \le N-m-1\}\right|,$$

$$B(r) = \left|\{(x_k^m, x_l^m) : d(x_k^m, x_l^m) \le r,\; 0 \le k \le l \le N-m\}\right|,$$

where $d$ is determined as in the approximate entropy.
• Sample entropy (SampEn) is defined as

$$\mathrm{SampEn}(r) = -\log\frac{A(r)}{B(r)}.$$

In this paper, the sample entropy was determined for $m = 10$, $r = 0.2\,\mathrm{std}(x)$.

The Shannon entropy is calculated as

$$\mathrm{ShanEn} = -\sum_{f=1}^{k} p_f \log p_f,$$

where $k$ is the number of distinct elements in the signal $x$ and $p_f$ is the frequency of the element $f$ in the signal $x$.

The following group of features is based on a recurrence plot, which shows the frequency and duration of repetitions in the signal. The element $(i, j)$ of this plot is defined as $R(i, j) = 1$ if $\|x_i - x_j\| < \varepsilon$ and 0 otherwise, where $x$ is the HRV signal. Based on this plot, the following features are calculated; here $N$ is the number of elements in the HRV signal, $l_{d\min}$ ($l_{d\max}$) is the minimum (maximum) length of a diagonal line, and $l_{v\min}$ ($l_{v\max}$) is the minimum (maximum) length of a vertical line:
• Density of points (REC):

$$\mathrm{REC} = \frac{1}{N^2}\sum_{i,j=0}^{N-1} R(i, j).$$

• The percentage of points that form the diagonal lines (DET):

$$\mathrm{DET} = \frac{\sum_{l=l_{d\min}}^{l_{d\max}} l\,P(l)}{\sum_{i,j} R(i, j)},$$

where $P(l)$ is the number of diagonals of length $l$.
• The average length of the diagonals ($L_{mean}$):

$$L_{mean} = \frac{\sum_{l=l_{d\min}}^{l_{d\max}} l\,P(l)}{\sum_{l=l_{d\min}}^{l_{d\max}} P(l)}.$$

• Entropy of diagonal lines ($EN_d$):

$$EN_d = -\sum_{l=l_{d\min}}^{l_{d\max}} p_l \log p_l,$$

where $p_l$ is the frequency of diagonal lines of length $l$.
• Entropy of vertical lines ($EN_v$):

$$EN_v = -\sum_{l=l_{v\min}}^{l_{v\max}} p_l^v \log p_l^v,$$

where $p_l^v$ is the frequency of vertical lines of length $l$.

For the calculation of these features, the PyRQA library [7] was used.

Another group of features used in this approach is based on the Poincare plot. The Poincare plot is constructed as follows: for the signal $x = (x_0, x_1, \ldots, x_{N-1})$, the plot consists of the points $(x_0, x_1), (x_1, x_2), \ldots, (x_i, x_{i+1})$ and so on. In this case, RR-intervals were used as the signal $x$. The following features are constructed (a sketch is given below):
• The standard deviation of the distances from the points of the plot to the line $y = x$. This feature describes the local variability of RR-intervals.
• The standard deviation of the distances from the points of the plot to the line $y = -x + 2RR_{mean}$, where $RR_{mean}$ is the average value of the RR-intervals. This feature describes the long-term variability of RR-intervals.
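The two Poincare descriptors can be computed directly from the RR-intervals, as in the following sketch; the names are illustrative, and the second line is taken through the point $(RR_{mean}, RR_{mean})$ perpendicular to the identity line.

```python
import numpy as np

def poincare_features(rr):
    """Poincare descriptors of Section 3.1 for an array of RR-intervals."""
    rr = np.asarray(rr, dtype=float)
    x, y = rr[:-1], rr[1:]                               # points (x_i, x_{i+1})
    sd1 = np.std((y - x) / np.sqrt(2))                   # distance to y = x
    sd2 = np.std((x + y - 2 * rr.mean()) / np.sqrt(2))   # to y = -x + 2*RR_mean
    return sd1, sd2   # local and long-term RR variability
```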
Another feature is based on detrended fluctuation analysis, a method that allows determining the self-dependence of the signal. The following cumulative sum is defined:

$$x_{cumsum}(t) = \sum_{i=1}^{t}\left(x(i) - \mu\right),$$

where $x$ is the signal consisting of RR-intervals and $\mu$ is the mean of the signal $x$. The data is segmented with a window of size $\Delta n$. On each segment a polynomial (usually linear) that most accurately represents the data is fitted. The union of all such polynomials forms a function $x_{\Delta n}(t)$, which is an approximation of the original function $x_{cumsum}(t)$. Then the following function is considered:

$$F(\Delta n) = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left[x_{cumsum}(t) - x_{\Delta n}(t)\right]^2},$$

where $N$ is the length of the signal consisting of RR-intervals. The feature is the slope of the line of $\log F(\Delta n)$ against $\log(\Delta n)$. More information about this approach can be found in [8]. In this paper, this feature was computed using the publicly available software Nonlinear measures for dynamical systems (nolds), version 0.3.2, which can be downloaded from https://pypi.python.org/pypi/nolds.

The next feature used in this approach is the correlation dimension, a quantitative characteristic of the signal trajectory, defined as follows.
• Vectors $x_i^m$ of length $m$, similar to those formed in the approximate entropy, are constructed.
• The value $g$ is defined as

$$g(r) = \left|\{(x_k^m, x_l^m) : d(x_l^m, x_k^m) \le r,\; 0 \le k \le l \le N-m-1\}\right|,$$

i.e. $g$ is the number of pairs of vectors whose distance is less than or equal to $r$.
• The value $C(r)$ is defined as

$$C(r) = \frac{g(r)}{N^2},$$

where $N$ is the length of the signal $x$.
• The correlation dimension (D2) is defined as

$$D2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}.$$

This feature was also computed with nolds, version 0.3.2. The gradient boosting from the xgboost package [9] was used as the classifier in this approach.
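Since the paper computes both the DFA slope and the correlation dimension with the nolds package, a minimal usage sketch follows; the embedding dimension passed to corr_dim is an assumption (chosen to match the m = 10 used for the entropies), as the paper does not state the exact arguments.

```python
import nolds

def nonlinear_hrv_features(rr):
    """Sketch of the nolds-based features from Section 3.1 (arguments assumed)."""
    dfa_slope = nolds.dfa(rr)            # slope of log F(dn) versus log(dn)
    d2 = nolds.corr_dim(rr, emb_dim=10)  # correlation dimension D2
    return {"dfa": dfa_slope, "corr_dim": d2}
```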
3.2. The algorithm based on 3 different feature spaces

This algorithm is a mixture of 3 different feature spaces that were previously used in the classification of biomedical signals. The first group of features consists of Hjorth's parameters: activity, mobility and complexity [10]. These parameters were originally used as features for electroencephalograms and were later used in many works, including the classification of ECG signals [11].

The second group of features consists of statistical signal features: mean, standard deviation, signal minimum, signal maximum, skewness, kurtosis, sample quantiles of order 0.1, 0.25, 0.5, 0.75, 0.9, and the sums of signal values and of their squares above/below these quantile values.

The next group of features was suggested by Uspenskiy for disease detection from a patient's ECG [12]. To calculate these features, it is necessary to compute the amplitudes of the R-peaks $A(n)$, the distances between the R-peaks $T(n)$, and the arctangent of their ratio

$$\alpha(n) = \mathrm{arctg}\,\frac{A(n)}{T(n)}.$$

Table 1. Signal encoding for Uspenskiy features

                    A   B   C   D   E   F
A(n+1) − A(n)       +   −   +   −   +   −
T(n+1) − T(n)       +   −   −   +   +   −
α(n+1) − α(n)       +   +   +   −   −   −

It is assumed that the values of $A(n)$ and $T(n)$ themselves are not important, but the signs of their increments are. The method of signal encoding based on all possible signs of increments of these quantities is presented in Table 1. After the code representation of the signal is obtained, three-grams are extracted: the feature space is the number of occurrences of each possible three-gram in the code sequence derived from the signal. A sketch of this encoding is given below.

The logistic regression from the scikit-learn package [13] was used as the classifier in this approach.
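A sketch of the Table 1 encoding and the three-gram counting; the function name, the handling of zero increments and the fixed feature ordering are illustrative choices, not taken from the paper's code.

```python
import numpy as np
from collections import Counter
from itertools import product

# Code table from Table 1: (sign dA, sign dT, sign d-alpha) -> letter
CODES = {(1, 1, 1): "A", (-1, -1, 1): "B", (1, -1, 1): "C",
         (-1, 1, -1): "D", (1, 1, -1): "E", (-1, -1, -1): "F"}

def uspenskiy_trigram_features(A, T):
    """Three-gram counts over the Table 1 code sequence (names illustrative)."""
    A, T = np.asarray(A, float), np.asarray(T, float)
    alpha = np.arctan(A / T)
    sgn = lambda v: np.sign(np.diff(v)).astype(int)
    keys = zip(sgn(A), sgn(T), sgn(alpha))
    # zero increments (ties) fall outside Table 1 and are skipped here
    letters = [CODES[k] for k in keys if k in CODES]
    counts = Counter(zip(letters, letters[1:], letters[2:]))
    vocab = list(product("ABCDEF", repeat=3))   # all 216 possible three-grams
    return np.array([counts[t] for t in vocab])
```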
3.3. The algorithm based on R-peak neighborhoods

The idea used in this algorithm was described in [14] for determining the state of a patient's heart in which he should be referred to a cardiac service. The feature space for this approach is constructed as follows:
• The neighborhoods of the signal's R-peaks are extracted: 200 points before each R-peak and 500 points after.
• The averaged neighborhood is used as the feature space.

A neural network with the architecture described in Table 2 was used as the classification model for this algorithm.

Table 2. Neural network structure for the algorithm based on R-peak neighborhoods

Input Layer: shape = (700)
Dense Layer: units = 90, activation = sigmoid
Dense Layer: units = 1, activation = sigmoid

In this work, the neural network was implemented using the libraries Theano [15] and Lasagne [16].

3.4. The algorithm based on wavelet transformation

The idea for this algorithm was described in [17] for epilepsy detection and for identification of a person based on the ECG signal. The wavelet transformation of a signal is a convolution of the signal with functions $\Psi(t)$, called wavelets. Such wavelet functions must possess specific properties:

$$\int_{-\infty}^{+\infty} \Psi(t)\,dt = 0, \qquad \int_{-\infty}^{+\infty} |\Psi(t)|^2\,dt < \infty.$$

The wavelet transformation achieves signal compression while reproducing the original signal well [6]. All wavelet functions used in a wavelet transformation can be represented through a prototype function $\psi(t)$ using scaling and shifting. In the case of the discrete wavelet transformation, every wavelet function can be written as

$$\psi_{m,n}(t) = \frac{1}{\sqrt{2^m}}\,\psi\!\left(2^{-m}t - n\right).$$

In the discrete wavelet transformation the functions $\psi_{m,n}^{*}$ can be separated into two parts, corresponding to approximation coefficients and detail coefficients. In the present work the approximation coefficients are used for the new representation of the signal, and Daubechies wavelets [18] are used as the wavelet function.

After obtaining the wavelet-transform coefficients, local segments are extracted from the signal. The segments are extracted by moving a window of a certain length $w$ with step $s$, with all elements within one window going into a separate segment. After this procedure every signal is represented as a set of local segments. All local segments in the training set are separated into $k$ clusters using the k-means algorithm, and every segment is then replaced with the number of the cluster it belongs to. This way every signal is represented as a text of codewords, with each word representing a certain cluster. In the implementation of the described algorithm the parameter values $w = 100$, $s = 30$, $k = 200$ were used, with the k-means implementation from the scikit-learn package.

Every local segment is replaced with the cluster it is closest to: for every local segment $s_i$ the cluster $c$ it belongs to is determined as

$$c = \operatorname*{argmin}_{j} d(b_j, s_i), \qquad d(b_j, s_i) = \sqrt{\sum_{k=1}^{w}\left(s_i^k - b_j^k\right)^2},$$

where $b_j$ is the center of cluster $j$ and $s_i^k$ ($b_j^k$) is the $k$-th element of local segment $i$ (cluster center $j$).

By transforming the input signal into text in this way, it is possible to use natural language processing algorithms. The authors who suggested this encoding used a bag of words as the feature space: the feature description is the number of occurrences of each code word in a specific signal. Features based on the word2vec technology were also used in order to extract dependencies between local segments of the signal. This approach was suggested by Google and allows context-aware text processing while reducing the dimensionality of the data [19]. The Word2Vec model was trained with an embedding vector length of 80, and the mean of all word vectors in the signal was used as the feature vector. The model was trained using the gensim package [20]. The logistic regression from the scikit-learn package [13] was used as the classification algorithm. Sketches of the neighborhood features of Section 3.3 and of this codeword construction are given below.
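For the feature space of Section 3.3, the averaged R-peak neighborhood can be computed as follows; the window sizes are from the text, while the function name and the boundary handling are illustrative.

```python
import numpy as np

def averaged_rpeak_neighborhood(signal, r_peaks, before=200, after=500):
    """Section 3.3 feature vector: the mean of all R-peak neighborhoods."""
    windows = [signal[p - before:p + after]
               for p in r_peaks
               if p - before >= 0 and p + after <= len(signal)]
    return np.mean(windows, axis=0)   # length-700 averaged neighborhood
```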
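For Section 3.4, here is a sketch of the codeword construction using pywt and scikit-learn. The values w = 100, s = 30, k = 200 are the paper's; the specific Daubechies order (db4), the single-level decomposition and the clustering settings are assumptions.

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

def codeword_texts(signals, w=100, s=30, k=200, wavelet="db4"):
    """Sketch of the Section 3.4 codeword representation (details assumed)."""
    # approximation coefficients as the new signal representation
    approx = [pywt.dwt(sig, wavelet)[0] for sig in signals]
    # sliding-window local segments of length w with step s
    segments = [np.array([a[i:i + w] for i in range(0, len(a) - w + 1, s)])
                for a in approx]
    km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(segments))
    # each signal becomes a "text" of cluster indices (codewords)
    return [km.predict(seg) for seg in segments]
```

The resulting codeword sequences can then be fed either to a bag-of-words count or to a gensim Word2Vec model, as described above.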
3.5. The algorithm based on bispectrum

The bispectrum is a function of two variables $f_1$ and $f_2$ that specify the frequencies, expressed by the following formula [21]:

$$B(f_1, f_2) = X(f_1)\,X(f_2)\,X^{*}(f_1 + f_2),$$

where $X(f)$ is the Fourier transform of the signal and $X^{*}(f)$ is its complex conjugate. The signal bispectrum is usually calculated using a fast Fourier transform; a detailed description of the algorithm for the bispectrum computation can be found in [22].

Computing the bispectrum of a signal yields a two-dimensional matrix whose elements $a(i, j)$ are complex numbers, and each signal can be associated with a certain image based on this matrix. A new matrix $B = \|b_{i,j}\|$ is calculated, with elements

$$b(i, j) = \sqrt{\mathrm{Re}^2\,a(i, j) + \mathrm{Im}^2\,a(i, j)},$$

where Re denotes the real part of a complex number and Im its imaginary part. The contour plot of the matrix $B$ is used as the image.

The authors of [23] have shown that coronary heart disease can be detected by analyzing images obtained from a signal bispectrum: their method measured the area of the region within the level lines to conclude whether the patient had CHD. Their results allow concluding that bispectrum images can be used to detect CHD. Here it was proposed to use a neural network for the classification of such images, with the architecture described in Table 3.

Table 3. Neural network structure for the algorithm based on bispectrum

Input Layer: shape = (3, 80, 80)
Convolution Layer: filter size = (32, 5, 5), stride = (2, 2)
Dense Layer: units = 30, activation = LeakyReLU
Dense Layer: units = 1, activation = sigmoid

In this work, the neural network was implemented using the libraries Theano [15] and Lasagne [16].

4. Methods of constructing ensembles of algorithms

In order to increase classification performance, it was suggested to use ensembles of the algorithms. Several methods of ensembling are described in this section.

4.1. Majority voting

Given a set of algorithms $A = (A_1, A_2, \ldots, A_n)$ that output a vector of predictions $a = (a_1, a_2, \ldots, a_n)$, the resulting answer of the ensemble is $\mathrm{mode}(a_1, a_2, \ldots, a_n)$, where the mode is the element that is encountered most often among the predictions. If there are several such elements, one of them is chosen at random.

4.2. EM-algorithm

The main idea of the EM-algorithm [24] is the aggregation of data from different people about the same event in order to get a correct evaluation. Since the goal of ensembling is the aggregation of several different algorithms, the EM-algorithm is applicable to ensemble creation.

Algorithm description: $N$ is the size of the available data; $n_{il}^k$ indicates whether the $k$-th algorithm gave the answer $l$ ($l \in \{1, 2\}$) on the data item $i$ ($i = 1 \ldots N$); $\pi_{jl}^k$ ($j \in \{1, 2\}$) is the probability that the $k$-th algorithm returns the answer $l$ when the true answer is $j$; $T_{ij} = 1$ if the true answer for the data item $i$ is $j$, and 0 otherwise; $p_j$ is the probability of class $j$ in the sample.
• Step 1. Initialize the matrices $\pi$ to the ideal case and initialize $T$ with the majority vote.
• Step 2. Recalculate the values of the matrices $\pi$ and $p_j$:

$$\pi_{jl}^k = \frac{\sum_i T_{ij}\, n_{il}^k}{\sum_l \sum_i T_{ij}\, n_{il}^k}, \qquad p_j = \frac{\sum_i T_{ij}}{N}.$$

• Step 3. Recalculate $T_{ij}$:

$$T_{ij} = \frac{p_j \prod_{k=1}^{K} \prod_{l=1}^{2} \left(\pi_{jl}^k\right)^{n_{il}^k}}{\sum_{q=1}^{2} p_q \prod_{k=1}^{K} \prod_{l=1}^{2} \left(\pi_{ql}^k\right)^{n_{il}^k}}.$$

Steps 2 and 3 are repeated until the matrices $\pi$ stop changing. At the end of this algorithm, the matrix $T$ contains the probability of each data item belonging to each class, and the class with the greatest probability is taken as the answer.

5. Evaluation of Algorithms

Cross-validation was used to evaluate the performance of the algorithms. To avoid overfitting, the ECGs of one patient did not fall simultaneously into the training and the test set. The following performance criterion was introduced:

$$\frac{1}{N}\sum_{i=1}^{N} \frac{\sum_{j=1}^{n_i} I_{t_{ij} = p_{ij}}}{n_i},$$

where $t_{ij}$ is the true value of the target variable for cardiogram $j$ of patient $i$, $p_{ij}$ is the predicted value of the target variable for cardiogram $j$ of patient $i$, $n_i$ is the number of cardiograms of patient $i$, $N$ is the number of patients, and $I_{t_{ij} = p_{ij}}$ is the indicator function, which equals 1 if $t_{ij} = p_{ij}$ and 0 otherwise. This criterion is called patient performance. It allows evaluating how well the algorithm determines a person's disease from any of his cardiograms, and it does not depend on the number of cardiograms per patient. In addition, the ROC-AUC score [25] and the F-score [25] were used for model evaluation. Short sketches of the bispectrum computation, the EM aggregation and this criterion are given below.
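First, a direct NumPy sketch of the bispectrum image of Section 3.5, following the formula $B(f_1, f_2) = X(f_1)X(f_2)X^{*}(f_1+f_2)$; the FFT length and the absence of segment averaging are simplifications of the full estimation procedure described in [22].

```python
import numpy as np

def bispectrum_magnitude(signal, n=160):
    """Magnitude of the bispectrum, i.e. the matrix b(i, j) of Section 3.5."""
    X = np.fft.fft(signal, n)
    f = np.arange(n // 2)                 # frequency indices f1, f2
    B = (X[f][:, None] * X[f][None, :]
         * np.conj(X[(f[:, None] + f[None, :]) % n]))
    return np.abs(B)   # b(i, j) = sqrt(Re^2 + Im^2), the plotted matrix
```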
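Second, a sketch of the Dawid-Skene EM aggregation of Section 4.2 for two classes. It starts directly from the majority vote (the paper's ideal-case initialization of the matrices pi is replaced by the first recalculation step), classes are coded {0, 1} instead of {1, 2}, and the smoothing constant is my addition for numerical safety.

```python
import numpy as np

def dawid_skene(votes, n_iter=100, tol=1e-6):
    """EM aggregation of K algorithms' labels; votes is an (N, K) 0/1 array."""
    N, K = votes.shape
    # n[i, k, l] = 1 if algorithm k answered l on item i
    n = np.zeros((N, K, 2))
    n[np.arange(N)[:, None], np.arange(K)[None, :], votes] = 1.0
    # Step 1: initialize T with the (soft) majority vote
    T = np.column_stack([(votes == 0).mean(1), (votes == 1).mean(1)])
    for _ in range(n_iter):
        # Step 2: error matrices pi[k, j, l] and class priors p[j]
        pi = np.einsum("ij,ikl->kjl", T, n) + 1e-9
        pi /= pi.sum(axis=2, keepdims=True)
        p = T.mean(axis=0)
        # Step 3: posterior class probabilities T[i, j]
        logT = np.log(p) + np.einsum("ikl,kjl->ij", n, np.log(pi))
        new_T = np.exp(logT - logT.max(axis=1, keepdims=True))
        new_T /= new_T.sum(axis=1, keepdims=True)
        if np.abs(new_T - T).max() < tol:
            T = new_T
            break
        T = new_T
    return T.argmax(axis=1)   # the most probable class per item
```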
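Finally, the patient performance criterion of Section 5 reduces to a per-patient accuracy averaged over patients:

```python
import numpy as np
from collections import defaultdict

def patient_performance(patient_ids, y_true, y_pred):
    """Per-patient accuracy, averaged over patients (Section 5 criterion)."""
    per_patient = defaultdict(list)
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        per_patient[pid].append(t == p)
    return np.mean([np.mean(v) for v in per_patient.values()])
```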
6. Results

The results of the evaluations are shown in Table 4, where the first column gives the algorithm or ensemble type. The algorithm based on the wavelet transformation is included in two variants, with word2vec and without.

Table 4. CHD detection results

Algorithm                           Patient Performance   ROC-AUC   F-score
Bispectrum                          0.7207                0.7418    0.7244
Wavelet transformation              0.741                 0.8       0.6967
Wavelet transformation + word2vec   0.7501                0.8       0.6990
R-peak's neighborhood               0.7602                0.7988    0.703
The HRV signal                      0.763                 0.744     0.662
3 different feature spaces          0.7632                0.8042    0.70256
Majority voting                     0.806                           0.77
EM                                  0.8108                0.8738    0.7784

7. Conclusion

In the course of this paper, the following results were obtained. When the CardioQvark equipment is used, it is possible to determine CHD with a patient performance above 0.81, an F-score above 0.77 and a ROC-AUC score above 0.87. Word2vec can increase the performance of the classification method based on the wavelet transformation. The bispectrum can be used to classify CHD. The EM-algorithm is applicable to ensembling and in this case shows the best classification performance on all selected performance criteria.

8. References

[1] Gorbachev V V 2008 Ischemic heart disease (Minsk: Vysh. shk.) p 479 (in Russian)
[2] Dua S 2012 Novel classification of coronary artery disease using heart rate variability analysis Journal of Mechanics in Medicine and Biology 12(4) 1240017-1240019
[3] Giri D 2013 Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform Knowledge-Based Systems 37 274-282
[4] Acharya U R et al 2017 Application of higher-order spectra for the characterization of coronary artery disease using electrocardiogram signals Biomedical Signal Processing and Control 31 31-43
[5] Kumar M, Pachori R B and Acharya U R 2017 Characterization of coronary artery disease using flexible analytic wavelet transform applied on ECG signals Biomedical Signal Processing and Control 31 301-308
[6] Rangayyan R M 2015 Biomedical Signal Analysis (John Wiley & Sons)
[7] Rawald T, Sips M and Marwan N 2017 PyRQA – Conducting recurrence quantification analysis on very long time series efficiently Computers & Geosciences 104 101-108
[8] Kantelhardt J W 2001 Detecting long-range correlations with detrended fluctuation analysis Physica A: Statistical Mechanics and its Applications 295(3-4) 441-454
[9] Chen T and Guestrin C 2016 XGBoost: a scalable tree boosting system Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM)
[10] Hjorth B 1970 EEG analysis based on time domain properties Electroencephalography and Clinical Neurophysiology 29(3) 306-310
[11] De Cooman T, Carrette E, Boon P, Meurs A and Van Huffel S 2014 Online seizure detection in adults with temporal lobe epilepsy using single-lead ECG Proc. of the 22nd European Signal Processing Conference 1532-1536
[12] Uspensky V 2008 Theory and practice of diagnosis of diseases of internal organs by the method of information analysis of electrocardio signals (Moscow: Economics and Informatics) p 116 (in Russian)
[13] Pedregosa F 2011 Scikit-learn: machine learning in Python Journal of Machine Learning Research 12 2825-2830
[14] Ripoll V J R 2016 ECG assessment based on neural networks with pretraining Applied Soft Computing 49 399-406
[15] Al-Rfou R, Alain G, Almahairi A, Angermueller C, Bahdanau D and Ballas N 2016 Theano: a Python framework for fast computation of mathematical expressions Preprint arXiv:1605.02688
[16] Dieleman S 2015 Lasagne: First release (Geneva, Switzerland: Zenodo)
[17] Wang J, Liu P, She M F, Nahavandi S and Kouzani A 2013 Bag-of-words representation for biomedical time series classification Biomedical Signal Processing and Control 8(6) 634-644
[18] Liu C L 2010 A tutorial of the wavelet transform (Taiwan: NTUEE)
[19] Mikolov T 2013 Efficient estimation of word representations in vector space (Scottsdale, Arizona: ICLR Workshop)
[20] Rehurek R and Sojka P 2010 Software framework for topic modelling with large corpora Proc. of the LREC Workshop on New Challenges for NLP Frameworks
[21] Civera M, Zanotti Fragonara L and Surace C 2016 Using bispectral analysis and neural networks to localise cracks in beam-like structures Proc. of the 8th European Workshop on Structural Health Monitoring 1542-1551
[22] Nikias C L and Raghuveer M R 1987 Bispectrum estimation: a digital signal processing framework Proc. of the IEEE 75(7) 869-891
[23] Al-Fahoum A, Al-Fraihat A and Al-Araida A 2014 Detection of cardiac ischaemia using bispectral analysis approach Journal of Medical Engineering & Technology 38(6) 311-316
[24] Dawid A and Skene A 1979 Maximum likelihood estimation of observer error-rates using the EM algorithm Applied Statistics 28(1) 20-28
[25] Sokolova M, Japkowicz N and Szpakowicz S 2006 Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation Australian Conference on Artificial Intelligence (LNAI 4304) 1015-1021