=Paper= {{Paper |id=Vol-3039/short6 |storemode=property |title=Improved robust handling of electromyograms with mining of new diagnostic signs |pdfUrl=https://ceur-ws.org/Vol-3039/short6.pdf |volume=Vol-3039 |authors=Gennady Chuiko,Yevhen Darnapuk,Olga Dvornik,Yaroslav Krainyk |dblpUrl=https://dblp.org/rec/conf/ittap/ChuikoDDK21 }} ==Improved robust handling of electromyograms with mining of new diagnostic signs== https://ceur-ws.org/Vol-3039/short6.pdf
Improved robust handling of electromyograms with mining of
new diagnostic signs
Gennady Chuikoa , Yevhen. Darnapuka, Olga Dvornika, and Yaroslav Krainyka
a
    Petro Mohyla Black Sea National University, 68 Desantnikov St.,10, Mykolayiv, 54003, Ukraine


                Abstract
                The research proposes an improved and more reliable way of computer processing raw
                electromyograms (EMGs). The main steps of this method are pre-filtering with Haar wavelets,
                full-wave rectification, followed by post-filtering with a digital filter with a moving median
                (MM). MM digital filtering is probably the first time applied to EMG.
                The proposed method is more reliable than the commonly used Root Mean Square (RMS)
                method. RMS windowing turns out to be invalid by the effect of massive outliers within the
                EMG. As shown in the article, this typical deviation from the Gaussian distribution occurs for
                many untreated EMGs. The paper also suggests a few new diagnostic hallmarks. These are the
                median and peak values of the smoothed (by MM-filter) EMG and their ratio.
                Keywords 1
                electromyograms, massive outliers, robust computer processing, moving median filter, new
                diagnostic signs

1. Introduction
    Electromyograms (EMGs) reflect weak electrical signals from muscle and nerve cell arrays [1, 2,
3]. Most medical cues, as well as EMGs, are variable. Such variability is the effect of a human body's
dynamical balance with a fickle medium. That is quite natural in the framework of the homeostasis
concept of living organisms [3]. So, the study of the variabilities of biomedical signals gives us insight
into the adaptive abilities of specific body systems within the healthy state or at pathologies.
    Besides, another inborn trait of many biomedical signals [4], EMGs too [3, 5], is the marked non-
Gaussianity of their probability distributions. That can manifest itself in various ways: heavy-tailed
(high-skewed [4]) distributions, weighty kurtosis, multi-modality, arrays of outliers, etcetera. [6].
Clinicians use the raw EMGs relatively rarely. As a rule, EMGs are processed according to some
algorithm, more or less sensitive to non-Gaussinity [7]. Only then are they analyzed, and the clinical
decision is made.
    The steady request for robust gauges of variability that are low-sensable to the non-Gaussinity exists
in clinical practice. An interquartile range and a median are, for instance, the most robust measures for
the variability and central tendency among the statistics indexes in this sense [6]. The new and reliable
diagnostic signs are welcome, too, allowing the binary sorting of the "norm-pathology" type.

2. Related Works and problem statement
   Authors [5] have characterized the non-Gaussianity of stimulated surface EMGs by abnormal high
kurtosis of the probabilities distributions. The level of deviations from the Gauss distribution depended
on the shape of used electrodes and filters. Other visual and convincing ways are offered in [6]. Those
are statistical box plots and so-called "probability-probability" (P-P) plots.


ITTAP’2021: 1nd International Workshop on Information Technologies: Theoretical and Applied Problems, November 16–18, 2021,
Ternopil, Ukraine
EMAIL: genchuiko@gmail.com (A. 1); yevhen.darnapuk@chmnu.edu.ua (A. 2); olga.v.dvornik@gmail.com (A. 3),
yaroslav.krainyk@chmnu.edu.ua (A. 4)
ORCID: 0000-0001-5590-9404 (A. 1); 0000-0002-7099-5344 (A. 2); 0000-0002-4545-1599 (A. 3), 0000-0002-7924-3878 (A. 4)
             © 2021 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
    Author [7] has been considered two popular algorithms of EMGs processing. Both include an
alternative stage: either full-wave rectification with digital filtration or root mean square (RMS) filter.
The author has warned that popular RMS smoothing of EMGs works correctly only for signals with
distributions close to normal Gauss one. Under the condition, if outliers occur more often than predicted
by the normal distribution, full-wave rectification with the digital filtration by moving average is more
robust. The moving median filter is even preferable, but it is practically unused [7].
    The handbook [8] consults the range (50-500) ms for the smoothing windows. Authors [9] admit the
RMS window lengths from 10 ms up to1000 ms. Neither statistical significance nor clinical relevance
of the smoothed EMG mean was affected by such manipulation of window length (variation did not
exceed 3%). The smoothed EMG's peak value turned out a little more sensitive (variation less than 9%).
However, the variability indexes were more sensitive to a window length.
    It is worth noting that clinicians wide use the mean and peak values of rectified and smoothed EMGs
as diagnostic signs [9]. It would be wise to exploit medians of smoothed EMGs instead of habitual mean
keeping the robustness in mind.
    The papers [3,10] considered the denoising or separation of a signal on the low-frequency part (the
signal itself) and the high-frequency one (the noise). It needs the additive "signal+noise" model for
EMGs, of course. Then such separation is easily accessed with Haar's wavelets [3,10]. The exponential
smoothing model of a time series [11], like a raw EMG, for example, permits to get the "Errors (noise)-
Trend-Seasonality" (ETS) pattern for this series. Starting from the Maple 18 version, this model is well-
programmed within the program package "Time Series Analysis." [12].
     Good is namely that way of medical signal processing, accompanied by the recommendation about
a single one or even a short vector of diagnostics features [13]. If one shares a practiced clinician point
of view, of course. Thus, our approach intends to ensure reliable methods for processing medical signals
with direct indications of diagnostic signs.
    This research aims to work out improved robust processing of EMGs with the account of their non-
Gaussianity. The following stages have to be included in that work:
    1. Testing and estimation of non-Gaussinity of probability distributions of raw EMGs with
    estimations of outliers fractions as a possible diagnostic sign.
    2. Definition of the ETS model for raw EMFs.
    3. Separation of signal and noise and estimation of Signal-to-Nose ratio (SNR) as a possible
    diagnostic sign.
    4. Full-wave rectification and first-time robust moving median smoothing of EMGs in comparison
    with the RMS method.
    5. Estimating and comparing medians and peak-values, peak-to-medians ratios, and variabilities
    as clinical diagnostic signs for RMS and moving medians (MM) windowing methods.


3. Methodology side

   3.1 Initial data provenance and description
    The database [14] was the source of three samples of raw EMGs. Data were collected with a
Medelec Synergy N2 EMG Monitoring System (Oxford Instruments Medical, Old Woking, United
Kingdom).
    A healthy patient (age 44-), a patient with verified myopathy (age 57), and a patient with verified
neuropathy (age 62), all-male, were investigated. The EMG records have different lengths: 50860,
110336, and 147858 samples, respectively [3,14]. The pointers "Healthy," "Myopathy," and
"Neuropathy" will be in use further in the text to denote these patients and their indicators
   A 25 mm concentric needle electrode was placed into the anterior tibialis muscle of each subject.
The patient was then asked to dorsiflex the foot gently against resistance. The needle electrode was
repositioned until motor unit potentials with a rapid rise time were identified by the device. Data were
then gathered for some short time, at which point the patient was asked to relax, and the needle removed.
   The data were recorded at 50 kHz and then downsampled to 4 kHz as the final sample rate of raw
EMGs. Two analog filters were used during the recording process: a 20 Hz high-pass filter and a 5KHz
low-pass filter. ADC (analog to digital converter) resolution: was 16 bits. The gain: 10000 ADC
units/mV [14].

   3.2 Computer processing toolbox
   Authors have performed this research in the general framework of Maple 2020, one of the last
versions of a well-known computer Math system [12]. According to the list of project tasks from section
2, the following program packages were in use: "Statistics," "TimeSeriesAnalysis," and
"SignalProcessing."
   Sometimes, authors needed to write short programs (procedures) using a high-level handy Maple
programming language. For instance, it was so at estimations of outliers weights or the separation of
raw EMGs on inner data and outliers. That was necessary for the framework of the accepted approach
(see section 2): robust estimations and search for diagnostic hallmarks.

4. Results and discussions

4.1     Normality testing of raw EMGs and outliers weight's estimation
    Let start the normality testing with kurtosis of probabilities distributions following [ 5]. There are
different definitions of kurtosis. Let use here the quantity excess kurtosis, which is zero equal for the
normal Gauss distribution. The excess kurtosis can be easily obtained by subtracting three from ordinary
kurtosis. The ratio of the fourth cumulant of distribution to its variance defines this last.
    The computing gave such a series: 11.1, 6.4, and 22.0 in the order "Healthy," "Myopathy,"
"Neuropathy." One can see the positive excess kurtosis that is inherent in leptokurtic distribution.
Leptokurtic distributions predict heavy tails on either side. That indicates outliers' great arrays.
    The statistical box plot can visualize arrays of outliers. In the box plot method, a data set is split into
quartiles, and the difference between the first and the third quartiles defines the height of the box [6,15].
Box plots (also called box-and-whisker plots or box-whisker plots) give an excellent graphical image
of the concentration of the data. They also show how far the extreme values (outliers) are from most of
the data.
    A box plot is constructed from five values: the minimum value, the first quartile (Q1), the median,
the third quartile (Q3), and the maximum value. We use these values to compare how close other data
values are to them. The two vertical lines (called whiskers) outside the box extend to the smallest and
largest observations within the range Q1(Q3) plus/or minus 1.5 x IQR (interquartile range).
    If there are no outliers, then the whiskers extend to the min and max values. Observations 1.5 x IQR
greater than Q3 or less than Q1 should be considered outliers (these are so-called inner fences [6]).
Outliers can point us to exciting distribution patterns and should not be removed as "bad" data from the
start. Fig 1 shows the box plots for three raw EMGs.




Figure 1: Box-and-whisker plots for three raw EMGs showing outliers as bold vertical lines from
separate points. The medians are practically unseen as horizontal lines near zero, but one can insight
narrow interquartile ranges (variabilities) and relatively tight ranges between maximal and minimal
values {whiskers).
    The interquartile ranges (IQRs), as the robust measure of variability, forms such a series: 70, 65, and
105 (in μV) in the order "Healthy," "Myopathy," "Neuropathy." Fig, 1 shows many outliers; they are
the values out of setting ranges. The lot of outliers is the first hallmark of a non-Gauss distribution of
probability.
    The minor program (procedure) scripted om Maple language has allowed us to count the percentages
of outliers for each of the EMGs. At a glance, they are not extremely high: 7.45, 12.05, and 21.45 in
the previous order. Another procedure has allowed us to separate EMGs on the inner data and outliers.
Unexpectedly, it turned out that outliers define the form and area of Poincare plots. Figure 2 illustrates
this statement.




Figure 2: Poincare plots for the healthy patient with a minimal fraction of outliers. All axis are
calibrated in mV. The left-hand graph shows the plot with outliers; the middle graph is without
outliers, and the right-hand graph only shows outliers without the central part. The variability
descriptors linked with Poincare plots have to be quite sensitive to the presence of the outliers.

   Let estimate the above-said deviations of normality with the normal plots. Such a graph is a
probability-probability plot for two distributions: the normal (the horizontal axis) against the factual
one (the vertical axis) [6]. Both samples of random values, the normal and the factual, have the same
length, and their cumulative probabilities functions are compared. The thin straight line matches the
normal distribution, the points – factual one.




Figure 3: Probability-probability (P-P, or normal) plots for raw EMGs. The horizontal axes show the
expected from the normality hypothesis values, the vertical axes – the factual ones. The thin straight
line meets the Gauss distribution, the bold line from points meets the factual one.

   Figure 3 shows that the factual distribution is far from the Gaus pattern, especially for larger absolute
values, for outliers. Thus, the results of this section are well agreed among themselves and prove
significant non-Gaussinity of probability distributions of raw EMGs.

4.2.    Exponential smoothing model of raw EMGs and their denoising
  The authors got the same ETS models for all three raw EMGs. Those were the simplest possible
models with additive errors (noises) without any trends or seasonalities. We had used the computer
math system Maple 2020. Precisely, it was the algorithms of its "Time Series Analysis" program
package.
    The simplicity and uniformity of ETS models for raw EMGs allow the pre-filtration onto low-
frequency part (or a signal by itself) and high-frequency one (a noise) with Haar wavelets, as it has been
proposed in [3, 10]. Fig.4shows the periodograms of pre-filtered signals and noises for three EMGs.
    Periodograms of Fig;4 can be a diagnostic tool by themselves, but one should admit them as
excessive complex for clinic practice. Let introduce more digital and straightforward indexes, such as
the known signal-to-noise ratio (SNR). One can define it by the logarithm of the RMS amplitudes ratio
(in decibels, dB). These values of SNR are the following: 12.75 dB, 6.96 dB, and 6.25 dB in the same
order as in Fig.1. Note that all raw EMGs are pointedly noisy. Still, EMGs of pathological states are
about twice noisier than the healthy ones. Thus, the pre-filtering sounds reasonable.
    It should be marked that pre-filtered EMGs comprise twice fewer counts (samples), shorter than raw
EMGs. It is equal to a twofold decrease of sample rate to 2 kHz.




Figure 4: Periodograms of pre-filtered signals (graphs on the left) and noises (on the right). The order
of the periodograms - "Healthy," "Myopathy," "Neuropathy" from the top pair to the bottom. Note
that the EMG of a healthy patient is the least noisy. The power scale is absolute (in arbitrary units).

   Curiously, the pre-filtering changes the percentage of outliers barely (shifts are within a few tenths
of a single percent). Hence, this parameter, least for the healthy patient, refers to the signal itself, no
way to the additive noise. Then this index most likely is diagnosis-caused. If so, then the Poincare plot
shape also is diagnosis-dependent.

4.3     Full-wave rectification and smoothing of pre-filtered EMGs
    An all-over accepted means to capture the envelope of raw EMGs is the root mean square (RMS)
values within a window that "slides across" the signal. The RMS value is plotted at the center of the
window to avoid time shifts in the envelope relative to the signal.
    However, this way well works only if the number of outliers is minor [7]. That assumption looks
quite doubted for raw and pre-filtered EMGs (see sections 4.1, 4.2). The more robust tool can be the
moving median (MM) digital filter applied to the rectified signal. Although this filter (MM) never was
used in EMG smoothing afore [7].
    This section would like to show the full-wave rectified pre-filtered FMGs and the results of two
alternative smoothing methods: RMS and MM. The smoothing windows lengths were roughly
proportional to the lengths of the signals: 300, 600, and 900 ms, respectively. Figure 5 reflects this
computing. Let recall the fact that RMS-windowing need not preliminary rectification of a signal,
opposite to MM one.
   Pay attention that RMS and MM smoothing results are similar but not identical. In particular, the
amplitudes of MM-envelopes are less than for RMS ones and materially shifted down. That is logical
because the median, as an indicator of the central tendency, is least sensitive to the effect of the outliers.
Opposite, massive outliers rise the envelopes' amplitudes markedly within RMS-windowing as
predicted in [7].




Figure 5: Full-wave rectified pre-filtered EMGs (the left column), their RMS-windowing (the middle
column), and smoothing by MM (the right column). All amplitudes are given in mV. The order of
patients (from upper to down) is the same as in Fig.4.

   Mean, and peak values of smoothed EMGs are now in use among clinicians [5]. However, median
and peak values should be a more robust set considering all the above argumentation. That is why Table
1 collects these and a few extra values, suggested as robust diagnostics signs.

   Table 1
   The comparison of results between two alternative smoothing methods: RMS and MM.
        Methods ↓       Patients→            Healthy         Myopathy       Neuropathy
                        Medians                 69               89             213
                        Peak values            169              106             879
           RMS
                        Peak/median            2.4              1.2             4.1
                        Variabilities           29                5             220
                        Medians                 33               32              26
                        Peak values            101               42             337
           MM
                        Peak/median            3.0              1.3            13.0
                        Variabilities           21                4             125

   Table 1 presents the values usable in diagnostics for two different smoothing methods. Almost all
values are given in μV (microvolts), except for peak-to-median ratios. Interquartile ranges defined
variabilities.
   Consider the peak value to median ratio within the MM method. This ratio strongly correlates with
the kurtosises mentioned afore. The correlation coefficient equals 0.9869 and is statistically significant
on the confidence level equal to 0.99 despite small numbers (three) of patients. The correlation between
variabilities and the peak-median ratio is even better (0.9997).
   Thus, the variability of the correctly smoothed EMG might be a new diagnosis sign. It is a pair of
dozen microvolts for healthy patients, while only a few microvolts for myopathy and about a hundred
microvolts for neuropathy. The variability correlates strongly with the kurtosises of probabilities
distribution and with peak-to-median ratios. Both correlations are significant statistically on the
confidence level of 0.99.
   The handling way with full-wave rectification and following filtering (smoothing) by moving
median (MM) is more robust than RMS-windowing under the condition of massive outliers. Such
conditions seem to be the case in many laboratory measurements of EMGs [7]. Based on this
conclusion, the authors recommend only the bottom half of Table 1 as the diagnostic information.

5. Conclusions
    Let make conclusions pushing off from the list of tasks from section 2.
    1. The non-Gaussianity of raw EMGs was proved. Positive kurtosises and massive outliers (with
the relative weight from a few up to a few dozen percentages) cause pointed deviations from the Gaus
distribution.
    2. All raw RMGsd have a simple ETS model: additive noises, no trends, and no seasonalities.
    3. Thus, the noises can be easily removed by pre-filtering with Haar wavelets. Pathological EMGs
are twice as noisy as for healthy patients. The pre-filtering (denoising) has to be a mandatory procedure
at raw EMGs handling.
    4. Full-wave rectification, and first-time robust moving median smoothing of EMGs were
performed compared to commonly used RMS-windowing.
    5. We have suggested a few new robust diagnostic signs based on this research's results (see Table
1).
    So, the list of tasks was carried out; hence the aim of the study was achieved. We have offered the
new robust processing method for raw EMGs with two attainable and straightforward diagnostic
hallmarks.

6. Acknowledgments
    The authors are grateful to Maplesoft (a division of Waterloo Maple Inc.) for granting access to the
licensed copy of Maple 2020.
    The results of this paper are a part of studies on the scientific and research project "Development of
a hardware and software complex for non-invasive monitoring of blood pressure and heart rate for dual
purpose" (State registration number No. 0120U101266, 2020–2021), financed from the state budget.

7. References
[1] K. R. Mills, "The basics of electromyography." Neurol. Pract., 76, 2, (2005): 32–35, doi:
10.1136/jnnp.2005.069211
[2] I. Saad, N. H. Bais, C. Bun Seng, H. Mohd Zuhir, and N. Bolong, Electromyogram (EMG) signal
processing analysis for clinical rehabilitation application, Proc. - AIMS 2015, 3rd Int. Conf. Artif. Intell.
Model. Simul., pp. 105–110, 2016. doi: 10.1109/AIMS.2015.76
[3] G. Chuiko, O. Dvornik, Y. Darnapuk, and Y. Baganov, "Devising a new filtration method and proof
of self-similarity of electromyograms," Eastern-European J. Enterp. Technol., 4, 9(112), (2021):. 15–
22, doi:10.15587/1729-4061. 239165
[4] D. Bratashov, O. Grishin, A. Kozlova, A. Abdurashitov, R. Verkhovskii, E. V. Shashkov, O.
Inozemtseva, V.P. Zharov, Statistics peculiarities of photoacoustic signals in flow cytometry of B16F10
cells, 20th Int. Conf. Photoacoustic. Photothermal Phenom., no. July, p. 45, 2019. Moscow, URL:
https://www.researchgate.net/publication/334721619_Statistics_peculiarities_of_photoacoustic_signal
s_in_flow_cytometry_of_B16F10_cells
[5] Noureddine Messaoudi, Raïs El'hadi Bekka, Philippe Ravier, Rachid Harba, "Assessment of the
non-Gaussianity and non-linearity levels of simulated sEMG signals on stationary segments." Journal
of Electromyography and Kinesiology,32, (2017):70-82, doi:10.1016/j.jelekin.2016.12.006.
[6] A Field, Discovering Statistics Using SPSS (3rd Ed.). Los Angeles-London-New Delhi-Singapore-
Washington DC: SAGE; 2009, URL:
https://www.academia.edu/24632540/Discovering_Statistics_Using_SPSS_Introducing_Statistical_M
ethod_3rd_edition
[7] W. Rose, KAAP686 Mathematics and Signal Processing for Biomechanics. Electromyogram
analysis, 2019, URL: http://www1.udel.edu/biology/rosewc/kaap686/notes/EMG%20analysis.pdf
[8] P. Konrad. ABC of EMG – A Practical Introduction to Kinesiological Electromyography. © 2005
by Noraxon U.S.A., Inc. URL: https://www.noraxon.com/wp-content/uploads/2014/12/ABC-EMG-
ISBN.pdf
[9] A. M. Burden, S. E. Lewis, E. Willcox, "The effect of manipulating root mean square window length
and overlap on reliability, inter-individual variability, statistical significance and clinical relevance of
electromyogram." Manual Therapy, 19, 6,(2014): 595-601, doi: 10.1016/j.math.2014.06.003
[10] G. Chuiko, O. Dvornik, Y. Darnapuk, Y. Krainyk , Principal Component Analysis, Quantifying,
and Filtering of Poincaré Plots for time series typal for E-health, in R.Patgiri, A. Biswas, P, Roy. (eds)
Health Informatics: A Computational Perspective in Healthcare. Studies in Computational Intelligence,
Springer, Singapore. 2021, v.932, pp.61-76, doi: 10.1007/978-981-15-9735-0_4
[11] J. Brownlee, A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in
Python, 2020. URL: https://machinelearningmastery.com/exponential-smoothing-for-time-series-
forecasting-in-python/
[12] Time Series Analysis, © Maplesoft, a division of Waterloo Maple Inc. 2019, URL:
https://www.maplesoft.com/products/maple/new_features/maple18/TimeSeriesAnalysis.aspx
[13] V. Martsenyuk, A. Sverstiuk, A. Kłos-Witkowska, A. Horkunenko and S. Rajba, "Vector of
Diagnostic Features in the Form of Decomposition Coefficients of Statistical Estimates Using a Cyclic
Random Process Model of Cardiosignal," 2019 10th IEEE International Conference on Intelligent Data
Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), 2019, pp.
298-303, doi: 10.1109/IDAACS.2019.8924398.
[14] A. L. Goldberger, , L. A. N. Amaral, L. Glass, J. M. Hausdorff,. P. C. Ivanov, R. G. Mark, et al.
PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex
physiologic signals. Circulation, 101 (23), 2000: e215–e220. doi: 10.1161/01.cir.101.23.e215
[15] C. Thirumalai, M. Vignesh, and R. Balaji, Data analysis using box and whisker plot for lung cancer,
2017 Innovations in Power and Advanced Computing Technologies (i-PACT), 2017, pp. 1-6, doi:
10.1109/IPACT.2017.8245071.