Adaptive Multistage Method of Anomalies Detection in
                      ECG Time Series

Igor Baklan1[0000-0002-5274-5261] and Yurii Oliinyk1[0000-0002-7408-4927] and Iryna Mukha1[0000-
 0002-4423-5106]
                 and Kateryna Lishchuk1[0000-0002-9902-0065] and Olena Gavrilenko1[0000-0003-
 0413-6274]
            and Oleksandr Ocheretianyi1[0000-0001-9455-4781] and Anna Tsytsyliuk1[0000-0001-
                                            5527-4154]


                1Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, 03056, Ukraine

                                        iaa@ukr.net


        Abstract. This work proposes a new approach for identifying heart anomalies on
        electrocardiograms data using adaptive multistage method of anomalies
        detection. The method includes: search of exact match pattern with fragments in
        linguistic chain, analysis by average distance, analysis by fuzzy distance, reduc-
        tion to a common time grid. Used various matrix of linguistic distances.
        Described sourсe data and database process filling.

        Keywords: ECG, Linguistic modeling, Linguistic model, Linguistic chain.


1       Introduction

Cardiovascular disease is one of the leading causes of disability and mortality world-
wide. However, due to proper diagnostics, it is possible to detect possible abnormalities
of the heart activity in a timely manner and take all necessary steps to restore its normal
functioning as soon as possible. One of the main instrumental methods for diagnosing
cardiovascular diseases is electrocardiography - a graphical recording of fluctuations in
the potential difference that occurs during cardiac function in the form of an electrocar-
diogram (ECG). An ECG analysis for abnormalities - areas on the curve that do not fit
well-defined concepts of normal behavior - reveals pathological changes in the func-
tioning of the heart muscle.
    The modern approach to ECG analysis is to automate the process of identifying pos-
sible anomalies, which improves the accuracy and reliability of diagnostic results.
Anomaly detection is one of the most important tasks of data mining technology. Com-
puterized ECG processing and analysis involves the use of mathematical methods, most
of which are based on the interpretation of ECG time series data. Statistical models
(AR, ARIMA, Kalman filters), fuzzy models (FRCM, SSOD, FUCOT algorithms), hid-
den Markov processes, clustering and classification (c-mean, k-mean clustering algo-
rithm, sliding method) are used to detect compression ratio (CR) anomalies. [22]
    Anomalies Detection Approach in Electrocardiogram Analysis Using Linguistic
Modeling [13] proposes a new approach - linguistic modeling of ECG data. This ap-
    Copyright © 2020 for this paper by its authors.
    Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
proach involves constructing a linguistic ECG model by converting ECG data into sym-
bolic (linguistic) sequences, based on which a formal language can be constructed to
solve the applied problems of analysis. This article proposes an algorithm for finding
and detecting anomalies in the ECG based on the analysis of the linguistic ECG model,
built on the selected parameters of linguistics. This algorithm involves the analysis of
the received language circuits for the detection of atypical patterns (values) and fuzzy
local trends.
   The structure of the article is as follows. The first section discusses the problem of
anomalies search and detection in CR, as well as the approaches used. The second sec-
tion describes the process of constructing a linguistic ECG model. The third section
presents an algorithm for finding and detecting anomalies in an ECG based on linguistic
modeling, taking into account anomalies in fuzzy local trends. The fourth section pre-
sents the results of evaluating the effectiveness of the proposed solutions.


1.1    Related Works

ECG analysis is of great importance in medicine and is beginning to attract more atten-
tion and become involved in various fields. For example, there are studies [1] on the
identification of a person in a security system through analysis of his ECG. There, a
method was proposed for cropping PP intervals in which ML training was carried out
using each sample of slice data as an input parameter. Also, work was done [5] to pro-
duce a web-based tool for distributed ECG annotations. This cross-platform system can
be used to jointly view and annotate ECG, which will simplify communication between
specialists. Another real-time approach for public health is the real-time approach based
on multi-layer perceptron classifier [7]. Based on it, the distribution of incoming ECG
cardiac contractions into one of 23 classes occurs using ECG sparse distributed signa-
tures.
   In addition, ECG analysts have found application in the study of sleep and its stages.
Thus, the use of ECG and respiratory signals for determining the stage of sleep using
deep neural networks was described in [11]. Five ultra-precise neural networks were
created, each for different types of respiration. This study can help in general when
studying sleep and processes in the body at its different stages. The main problem sci-
entists faced in this study was a decrease in accuracy due to age and / or more severe
sleep apnea.
   Nowadays, a lot of research is underway to improve the analysis and processing of
ECG data. In general, neural networks are mostly used for analytics because of its abil-
ity to process complex and fuzzy data.
   There are several problems when studying ECG results - extraneous noise, truncat-
ing a data set without losing information for training networks, and analyzing the in-
formation itself.
   Firstly, Complex Deep Learning Models can be useful not only for data analysis but
also for removing various types of noise (i.e. baseline wander, muscle artefact, elec-
trode motion artefact) [10]. Based on the results of noise reduction tests between two
DL models (CNN, LSTM). CNN had the best indicator both in synthetic and real data.
Therefore, the study of the best and most productive CNN model in the same study
becomes relevant.
   Secondly, for the analysis of data in most cases records of the same length are nec-
essary - accordingly there are several methods for bringing records to the same size.
   • Filling all records to the longest length.
   • Shortening all records to the shortest.
   • Grouping records by length.
   • Trimming or padding data to a specific length [8].
   • Heuristic-based crop [6].
   For the analysis itself, a large number of new methods are described. Thus, in [2] a
time series clustering algorithm is described that is suitable for multidimensional inputs
and outputs of variable lengths and time offsets. This clustering is based on the distance
between hidden linear dynamic systems (LDS). An example of the usage of the critical
oscillation method for ECG analysis was carried out in [4], where the choice of ECG
segments was made with the stationarity criterion. The heart was examined on the phys-
ical side and critical dynamics in frog's ECG has been detected in the high frequency.
   In [12], a new tool was invented for obtaining new information, which is based on a
new method of data analysis - Diverse learning. This technique extracts a structure from
data and belongs to uncontrolled machine learning methods. So from the raw data, you
can outline the dynamics of the process inside and the possibility of 3D visualization
of DM appears. (Where each pulse is a point)
   In another research [3], it was proved, using recurrence networks that the RNs from
ECG f is bimodal and thus forms a separate class. In addition, this research shows that
RNs from ECG have signiﬁcantly higher value for the clustering coeﬃcient and lower
value for the average path length.
   In the end, we can say that for ECG analytics it is important to preserve data when
changing the recording time, eliminate unnecessary noise for corrective analysis, and
optimize training algorithms and improve the analysis itself. the research results re-
vealed that modern approaches for ECG analysis are convolutional neural networks,
recurrence networks, time series, the critical oscillation method and new methods for
training neural networks (for example, diverse training). The greatest advantage of neu-
ral networks is the ability to analyze new data based on a generalization of previous
cases, but at the same time it is also a drawback because a lot of data and time are
needed for training.


1.2    Researches Tasks

Main goal: to enhance the precision and confidence of diagnostic results about the level
of the cardiovascular system as a result of the progress of computation approaches of
ECG feature analysis.
   The research purposes are:

─ development of the anomaly detection algorithm;
─ construction of anomaly patterns database.
2      Anomaly Detection Algorithm

We represent search algorithm having into linguistic row fragment linguistic pattern. It
was important definition proximity criteria of linguistic rows and proximity level by
expert judgment methods.
   We consider next algorithm of search elements from pattern based in linguistic chain
being investigated:

1. Search of exact match pattern with fragments in linguistic chain. If some pattern is
   found in linguistic chain, then this found substring marked as anomaly.
2. Analysis by average distance. If some linguistic chain substring has defined aver-
   age distance level with pattern, then this found substring marked as anomaly.
3. Analysis using Analytic Hierarchy Process (AHP) method T.Saaty.
4. Reduction to a common time grid.
5. Pattern matching with input signal Availability, when it is shift operator. Shift op-
   erator used for shift linguistic chain or substring by input signal. For example, chain
   “CDE” transforms in “BCD” in case shift down on 1 step, and in “DEF” in case shift
   up on 1 step. If some pattern is found in linguistic chain with using shift operator,
   then this found substring marked as anomaly.
6. Pattern matching with input signal scaling operator. This operator scale linguistic
   chain by the time (x axis) or signal level (y axis).

    Based on the obtained coincidence, an expert makes conclusion about the revealed
abnormality in the heart.
    The task of fuzzy string matching will be listed next: considering a (extensive) text
T of range n, and a (small) sample P of length m. They both are series of symbols from
an alphabet Σ of power 𝜎, and a supreme amount of variances allowed k, discover all
the subsequences of T which edit size to P is no more than k. [24,25] Those fragments
are named “instances" and it is usual to account only their begin or finish marks. The
change interval between two strings x and y is the least amount of variances that would
modify x into y or in reverse. The allowed variances are deletion, insertion and substi-
tution of symbols. The task is important for 0 < 𝑘 < 𝑚. The variance rate is determined
         𝑘
as 𝛼 = . Finally, if “change interval” among current strings (ed()) implies the edit
         𝑚
interval, we can to account begin points (i.e. {|𝑥|; T=xP’y; ed(P; P’)≤k}) or finish
points (i.e. {|𝑥𝑃′|; T=xP’y; ed(P; P’)≤k}) of instances. [20]
    For the average distance, we used some classic estimates of the closeness of linguis-
tic chains. Which are:

1. Hamming distance is the amount of points where the equivalent numbers of two
   binary words of the identical length are dissimilar. In general, Hamming distance is
   used for lines of the same length in an alphabet consisting of q characters and serves
   as a difference metric (a function that determines the distance in metric space) from
   objects of the same dimension. In other words, Hamming distance measures the min-
   imum number of replacements required to change one line to another or the mini-
   mum number of errors that could convert one tape to another. In a more general
   context, Hamming distance is one of the chain metrics for measuring the machining
   distance between two sequences. [14].
2. Levenstein distance is the minimum number of character deletions, inserts, and re-
   placements required to convert one line to another. [15].
3. Another way to formalize the difference between words is using the Jarro-Winkler
   distance. Each line character is compared to all other corresponding line characters.
   The number of matching but different atomic numbers is divided by 2 and deter-
   mines the number of transpositions. [16].
4. Damerau-Levenstein distance is a chain metric for measuring the processing dis-
   tance between two sequences. In short, the Damerau-Levenstein distance between
   two words is the minimum number of operations required to change one word to
   another. [17][18][19].
5. Mahalonobis distance is a measure of the distance between vectors of random vari-
   ables. This measure is often used to determine the similarities between an unknown
   and a known sample.

Average distance calculated by normalized this distance:

                                 𝐴𝑑𝑖𝑠𝑡 = ∑5𝑖=1 𝐷𝑖𝑠𝑡𝑖 ,

where 𝐷𝑖𝑠𝑡𝑖 are distances of Hamming, Levenshtein, Damerau–Levenshtein, Jaro–
Winkler, Mahalonobis.
    As a result, we find the patterns that exceed the expert level determined using ap-
propriate methods.
    In the process of teaching the system to recognize the types of anomalies, the data
of various datasets were used. It turned out that the same linguistic chain in different
datasets can correspond to different types of anomalies. We propose to include as one
of the step of our algorithm the usage of method of multycriteria choice.
    A large number of mathematical methods are proposed to solve the tasks of mul-
tycriteria choice. Applied decision theory methods differ in the way they present and
process expertise, but they are all designed to help make an effective decision. Among
the existing methods, the analytical hierarchy process (AHP) proposed by T. Saati [22]
is quite universal and theoretically grounded. This method is simple, the method of
solving the problem corresponds to the intuitive presentation of the problem being
solved. The advantages of this method can also be attributed to the ease of its imple-
mentation at the software level. But this method has several major disadvantages: first,
the limitation on the number of alternatives that are compared at the same time, and
secondly, there is the problem of hierarchy coherence, and third, a large amount of
expert information. In addition, when applying this method, there is a problem that the
real matrices of pairwise comparisons are usually not completely consistent. Therefore,
all these facts give rise to the problem of the limitations of the application of this
method, but in [23] methods for solving these problems and removing restrictions on
its application are proposed.
    Therefore, in order to determine which type of anomaly this or that linguistic chain
most closely corresponds, it was decided to use the AHP by T. Saati.
  In our case, various linguistic chains act as alternatives, types of anomalies (for ex-
ample, tachycardia, arrhythmia, etc.) act as criteria, datasets as experts.


3      Data Sources for Filling Anomalies Linguistic Chain
       Database

In the present study, the ECG signals are obtained from MIT-BIH Arrhythmia dataset
[21]. The MIT-BIH Dataset was primary set of traditional experimental samples to es-
timate EKG deviations. Since 1980, this set of data was used for the initial study for
heart motion all over 500 researches globally [20]. This dataset contains 48 half-hour
extracts of dual path, 24-hour ECG records obtained from 47 patients observed at the
BIH Arrhythmia Laboratory. Our study is based on six samples from the arrhythmia
dataset containing files 101, 106, 112, 138, 200 and 222 that includes sufficient ordi-
nary beats, PVC and APC arrhythmia for the research. It is discovered 3 sorts of the
arrhythmias included here containing normal beat (NORMAL), premature ventricular
contraction (PVC) and arterial premature contraction (APC). The given ECG signal is
measured with 360Hz. The categorization of foregoing sorts is examined considering
that they are more potential to be misunderstood by calculation device against other
signal profiles. ECG signals were outlined by binary annotation file (.atr), a binary file
(.dat) and a text header file (.hea),. Most of samples contain a binary file, which has
scanned examples of at least one signal saved in 212 templates, and most samples con-
tain at least one annotation file. Annotation files hold group of marks. Each of them
depict a component of at least one signal at a particular time in the sample. In our re-
search, each model, ECG beats are chosen by chance out of six files so that database is
collected     of   1,000     beats     of   every     sort.  On      Fig.     1    shows
“MNPQSTUWXXVRNLLMNNNN” Arrhythmia anomalies linguistic chain.
Fig. 1. Arrhythmia anomalies linguistic chain example

   On Fig. 1 shows 2 data series: series1 with anomaly linguistic chain “ggijlnopponjh-
hiiii”, series 2 –contains unmarked linguistic chain “ggijlmopqpnjhhiiii”. These 2 lin-
guistic chains have big closeness, therefore series2 has anomaly with big probability.
   The linguistic Alphabet1 [a-z] contains 26 symbols and doesn’t allow create ade-
quate linguistic chain. For example, many data series elements have only one value ‘iii’
(Fig. 1). For avoid this problem need to increase alphabet capacity. Linguistic Alpha-
bet2 contains 100 unicode symbols (from 192 to 292).
   On      Fig. 1 shows 2 data series: series1 with anomaly linguistic chain
“èéîóûĀąċĐĕęęđĂôííïïïð”, series 2 –contains unmarked linguistic chain
“éêîõüāćčĔěĠğĕăôîîðïïð”. These 2 linguistic chains have big closeness too, therefore
series2 has anomaly with big probability.
Fig. 2. Linguistic chains comparison for Alphabet1


Fig. 3. Linguistic chains comparison for Alphabet2
Table 1. Translation ECG signals to linguistic chain
 ECG signal   961   965 986 1007   1036   1059   1077   1100   1121   1142   1157   1157   1124   1065   1009
 level
 Alphabet1     g     g   i    l    n      o      p      p      o      n      j      h      h      i      i
 Alphabet2     è é î          ó û         Ā      ą      ċ      Đ      ĕ      ę      ę      đ      Ă      ô
   Table 1 contains translation of ECG signals with arrhythmia anomalies to linguistic
chains. Linguistic chain based on Alphabet2 more complicated then based on Alpha-
bet1.


4       Conclusion

We proposed adaptive multistage method of anomalies detection in ECG time series
based on using linguistic modeling. For finding of the closeness of linguistic chains we
used combination of classic estimates such as Hamming, Levenshtein, Damerau–Le-
venshtein, Jaro–Winkler, Mahalonobis with AHP. Our approach allows to find various
anomalies related to many heart diseases.


References

1. Yoo, P. D. An Enhanced Machine Learning Based Biometric Authentication System Using
    RR-Interval Framed Electrocardiograms. arXiv preprint arXiv:1907.13517. (2019).
 2. Hsu, C. C. Y., Hardt, M., & Hardt, M. Linear Dynamics: Clustering without identifica-
    tion. arXiv preprint arXiv:1908.01039. (2019).
 3. Kachhara, S., & Ambika, G. Bimodality and Scaling in Recurrence Networks from ECG
    data. arXiv preprint arXiv:1908.01286. (2019).
 4. Contoyiannis, Y., Diakonos, F., & Kampitakis, M. Applying the Method of Critical Fluctu-
    ations on Human Electrocardiograms. arXiv preprint arXiv:1908.06408. (2019).
 5. Ding, Z., Qiu, S., Guo, Y., Lin, J., Sun L., Fu D., & Lv T. LabelECG: A Web-based Tool
    for Distributed Electrocardiogram Annotation. arXiv preprint arXiv:1908.06553. (2019).
 6. Yuan, B., & Xing, W. Diagnosing Cardiac Abnormalities from 12-Lead Electrocardiograms
    Using Enhanced Deep Convolutional Neural Networks. arXiv preprint arXiv:1908.06802.
    (2019).
 7. Das, A. K., Catthoor, F., & Schaafsma, S. Heartbeat Classification in Wearables Using
    Multi-layer Perceptron and Time-Frequency Joint Distribution of ECG. In 2018 IEEE/ACM
    International Conference on Connected Health: Applications, Systems and Engineering
    Technologies (CHASE) (pp. 69-74). IEEE. (2018).
 8. Liu, Y., He, R., Wang, K., Li, Q., Sun, Q., Zhao, N., & Zhang, H. Automatic Detection of
    ECG Abnormalities by using an Ensemble of Deep Residual Networks with Attention. arXiv
    preprint arXiv:1908.10088. (2019).
 9. Ding, X., Yan, B. P., Zhang, Y. T., Liu, J., Su, P., & Zhao, N. Feature Exploration for
    Knowledge-guided and Data-driven Approach Based Cuffless Blood Pressure Measure-
    ment. arXiv preprint arXiv:1908.10245. (2019).
10. Arsene, C. Complex Deep Learning Models for Denoising of Human Heart ECG sig-
    nals. arXiv preprint (2019).
11. Sun H, Ganglberger W, Panneerselvam E, Leone MJ, Quadri SA, Goparaju B, Tesh RA,
    Akeju O, Thomas RJ, Westover MB Sleep Staging from Electrocardiography and Respira-
    tion with Deep Learning. arXiv preprint arXiv:1908.11463. (2019).
12. Wang, S. C., Wu, H. T., Huang, P. H., Chang, C. H., Ting, C. K., & Lin, Y. T. Novel imaging
    revealing inner dynamics for cardiovascular waveform analysis via unsupervised manifold
    learning. arXiv preprint arXiv:1909.04206. (2019).
13. Baklan, I., Mukha, I., Oliinyk, Y., Lishchuk, K., Nedashkivsky, E., & Gavrilenko, O. Anom-
    alies Detection Approach in Electrocardiogram Analysis Using Linguistic Modeling. In In-
    ternational Conference on Computer Science, Engineering and Education Applications (pp.
    513-522). Springer, Cham. (2019).
14. Hamming, R. W. "Error detecting and error correcting codes" (PDF). The Bell System Tech-
    nical Journal. 29 (2): 147–160. doi:10.1002/j.1538-7305.1950.tb00463.x. ISSN 0005-8580.
    (1950).
15. Levenshtein, VI. "Binary Codes Capable of Correcting Deletions, Insertions and Reversals."
    Soviet Physics Doklady 10, p.707 (1966).
16. Jaro, M. A. "Advances in record linkage methodology as applied to the 1985 census of
    Tampa Florida". Journal of the American Statistical Association. 84 (406): 414–20.
    doi:10.1080/01621459.1989.10478785.(1989).
17. Levenshtein, Vladimir I., "Binary codes capable of correcting deletions, insertions, and re-
    versals", Soviet Physics Doklady, 10 (8): 707–710 (1966).
18. Damerau, Fred J. "A technique for computer detection and correction of spelling errors",
    Communications of the ACM, 7 (3): 171–176, doi:10.1145/363958.363994 (1964).
19. De Maesschalck, R.; D. Jouan-Rimbaud, D.L. Massart The Mahalanobis distance. Chemo-
    metrics and Intelligent Laboratory Systems 50:1–18 (2000).
20. Majorek, Karolina A.; Dunin-Horkawicz, Stanisław; et al., "The RNase H-like superfamily:
    new members, comparative structural analysis and evolutionary classification", Nucleic Ac-
    ids Research, 42 (7): 4160–4179, doi:10.1093/nar/gkt1414, PMC 3985635, PMID
    24464998. (2013).
21. P. Sellers, The theory and computation of evolutionary distances: Pattern recognition, J. Al-
    gorithms 1, 359–373, (1980).
22. W. Chang and J. Lampe. Theoretical and empirical comparisons of approximate string
    matching algorithms. In Proc. 3rd Combinatorial Pattern Matching (CPM'92), LNCS 644,
    pages 172-181, (1992).
23. Saaty, T.L. The Analytic Hierarchy Process, New York: McGraw Hill. International, Trans-
    lated to Russian, Portuguese, and Chinese, Revised editions, Paperback (1996, 2000), Pitts-
    burgh: RWS Publications. (1980).
24. O.A.Pavlov, K.I.Lishchuk, V.I.Kut Mathematical optimization models for substantiating
    and finding the weights of objects in the method of pairwise comparisons. System Research
    and Information Technologies, No.2, pp,13-21 (2017).
25. A. Apostolico, Z. Galil, Pattern Matching Algorithms, Oxford University Press, Oxford,
    (1997).
26. M. Crochemore, W. Rytter, Text Algorithms, Oxford University Press, Oxford, (1994).