=Paper= {{Paper |id=Vol-2667/paper41 |storemode=property |title=The using of machine learning and neural networks in the processing of computer simulation results for medical diagnostics |pdfUrl=https://ceur-ws.org/Vol-2667/paper41.pdf |volume=Vol-2667 |authors=Maxim Polyakov,Alexander Khoperskov,Egor Borisovskii,Egor Emelyanov }} ==The using of machine learning and neural networks in the processing of computer simulation results for medical diagnostics == https://ceur-ws.org/Vol-2667/paper41.pdf
The using of machine learning and neural networks
in the processing of computer simulation results for
                medical diagnostics
                           Maxim Polyakov                                                          Alexander Khoperskov
       Department of information systems and computer modeling                    Department of information systems and computer modeling
                      Volgograd State University                                                 Volgograd State University
                          Volgograd, Russia                                                          Volgograd, Russia
                        m.v.polyakov@volsu.ru                                                       khoperskov@volsu.ru


                         Egor Borisovskii                                                          Egor Emelyanov
     Department of information systems and computer modeling                   Department of information systems and computer modeling
                    Volgograd State University                                                Volgograd State University
                        Volgograd, Russia                                                         Volgograd, Russia
                        infomod@volsu.ru                                                          infomod@volsu.ru

   Abstract: We use machine learning technologies and neural
networks to improve the efficiency of medical diagnostics based
on microwave radiometry. The originality of our approach is that
we use the results of computer modeling of temperature fields in
multicomponent biological tissues to build the training and test
data sets. We investigate the limits of applicability of the method
for diagnosing breast cancer using microwave radiometry data. The
task of determining diagnostic quality for different tumor sizes
seems promising to us.
   Keywords: computer modeling, machine learning, neural
networks, diagnosis of cancer, microwave radiometry

                         I. INTRODUCTION
   Early diagnosis of breast cancer is an important issue in
modern medicine. The incidence of breast cancer is increasing
worldwide, and it is one of the most common types of cancer.
At the moment, there is no effective means of preventing breast
cancer. The probability of successful treatment and full recovery
of the patient depends entirely on early detection and diagnosis.
With early detection, breast cancer is curable with a probability
of 97% [4]. Microwave radiometry is a non-invasive method for
measuring internal temperature in biological tissues. The method
is based on measuring the intrinsic radiation of biological tissues
in the radio to microwave range. The possibility of using microwave
radiometry to diagnose breast cancer was first shown in [3]. To
some extent, the theoretical basis of microwave radiometry in
mammology rests on the research of the French scientist
M. Gautherie [6]. He convincingly showed that the heat release of
a tumor is directly proportional to its growth rate. His research
was based on clinical data from more than 85,000 patients.
Microwave radiometry has a unique ability to detect fast-growing
tumors in the first place. Microwave radiometry is also used in
other fields of medicine [1], [7]. The paper [5] states that the
minimum size of a cancerous tumor detected by mammography is
1.68 cm in diameter. The task of microwave radiometry is to detect
smaller tumors. Microwave radiometry can also detect cancerous
tumors or early structural changes that are not detected, and can
be missed, when using mammography.

                       II. PROBLEM STATEMENT
   The mathematical models and numerical methods that we use to
construct a sample of temperature data are described in [9]. We
have verified the models, and the verification has shown their
effectiveness for modeling the mammary glands of healthy patients
(without cancer pathologies) [10]. The main task of this study is
to determine the threshold tumor size that can be detected by
microwave radiometry.

Fig. 1. Distribution of thermodynamic temperature at a depth of 4 cm
obtained by computer simulation. On the left is a model with a tumor of
radius R = 0.5 cm.

   To do this, we need to build data samples from computer
modeling of breast temperatures whose volume is sufficient for
machine learning, as well as for binary classification of test data
(healthy-sick). When building a large number of models, it is
necessary to take into account that malignant tumors most often
appear in the upper outer quadrant. We present results of numerical
simulation of thermal dynamics in the biological tissues of the
mammary gland (Fig. 1).

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

              III. MACHINE LEARNING METHODS AND TOOLS
   The following methods are used for binary classification
of computer modeling data: support vector machines (SVM),
k-nearest neighbors (KNN), and the naive Bayesian classifier
(NBC). These methods are implemented in the Scikit-learn
Python library. In addition to classification, this library
also supports regression, clustering, and so on.

Fig. 3. Frequency distribution of internal temperature data for a) points “0”
of the model without a tumor, b) points “0” of the model with a tumor of
R = 0.75 cm, c) points “3” of the model without a tumor, d) points “3” of
the model with a tumor of R = 0.75 cm.
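
As a rough illustration of how the three classifiers named above can be compared in Scikit-learn, the following minimal sketch trains NBC, KNN and SVM on synthetic data. Everything specific here is an assumption made for the example and is not taken from the paper: the helper names `make_synthetic_temperatures` and `compare_classifiers`, the Gaussian stand-in for simulated temperatures, the count of 18 features, the hold-out of 40 samples, and all hyperparameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_synthetic_temperatures(n_per_class=80, n_features=18, shift=0.3, seed=0):
    """Hypothetical stand-in for simulated temperatures (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    healthy = rng.normal(loc=34.0, scale=0.5, size=(n_per_class, n_features))
    sick = rng.normal(loc=34.0 + shift, scale=0.5, size=(n_per_class, n_features))
    X = np.vstack([healthy, sick])
    # Label convention: 0 = healthy, 1 = sick.
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return X, y


def compare_classifiers(X, y, test_size=40, seed=0):
    """Fit NBC, KNN and SVM and return the fraction of correct test predictions."""
    # stratify=y keeps the healthy/sick ratio equal in the training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    models = {
        "NBC": GaussianNB(),
        "KNN": KNeighborsClassifier(n_neighbors=5),  # assumed k
        "SVM": SVC(kernel="rbf"),                    # assumed kernel
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


X, y = make_synthetic_temperatures()
scores = compare_classifiers(X, y)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The scores printed by this sketch describe only the synthetic data and say nothing about the simulated temperature fields used in the study.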


Fig. 2. Structure of source data set.

   The training set is a temperature data set at points of the
breast (according to the survey method [2]) in the microwave
range (depth temperature) and in the infrared range (skin
temperature). Each model has a flag: “1” for sick or “0” for
healthy. Equal data samples containing 80 models were constructed:
without a tumor, and with a tumor of radius R=0.5 cm, R=0.75 cm
and R=1 cm (Fig. 2). A cross section was made with the skin and
depth temperatures of points 0, ..., 8, and we have a combined
data set

\[
X = \begin{pmatrix}
T_0^{1} & T_1^{1} & \dots & T_{18}^{1} \\
T_0^{2} & T_1^{2} & \dots & T_{18}^{2} \\
\dots & \dots & \dots & \dots \\
T_0^{240} & T_1^{240} & \dots & T_{18}^{240}
\end{pmatrix},
\quad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \dots \\ y_{240} \end{pmatrix},
\tag{1}
\]

   where $T_0^i, \dots, T_8^i$ are the internal temperatures at the
points 0, ..., 8, $T_9^i, \dots, T_{18}^i$ are the skin temperatures
at the points 0, ..., 8, and $y_i \in \{\text{Health}, R=0.5, R=0.75,
R=1.0\}$ is the label of the i-th model.
   The ratio of cancer and healthy models in the training and
test sets is assumed to be equal, in order to preserve data
uniformity. We conducted a statistical analysis of the data at
the first stage. It showed general differences between the data
sets.
   Qualitative analysis of large amounts of data is particularly
difficult. Statistical analysis confirms reliable differences in
temperature data at certain points of the breast (Fig. 3).

                 IV. THE USE OF NEURAL NETWORKS
   Neural networks are another way to perform binary
classification. For successful training of a neural network, the
input data must lie in a single range. We used the classic
standardization method to bring the array of input data to a
uniform form. After applying it, each attribute has a mean of 0
and a variance of 1. Data normalization was performed using the
formula

\[
\hat{x}_{ij} = \frac{x_{ij} - M_j}{S_j},
\tag{2}
\]

   where $M_j$ is the average value of the samples, and $S_j$ is
the standard deviation of the samples for attribute j.

[Figure 4: panel (a) loss and panel (b) accuracy plotted against epoch (0 to 500).]
Fig. 4. Dependence of loss (a) and accuracy (b) on the number of training
epochs of the neural network.

   We solve a binary classification problem, so the number of
neurons in the output layer is two. The softmax function is
selected as the activation function for the last layer. This
function converts the network output vector into another vector
whose coordinates each lie in the range [0; 1] and sum to 1. We
define the patient class by the largest coordinate of the output
vector. The parameters of the layers and their number were
selected empirically, based on the results of testing.
   To train a neural network, the original sample consisting
of 160 numerical simulation results was randomly mixed and



VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                                190


divided into two parts, the first part containing 120 survey
results, the second part containing 40. Gradient optimization
methods were used to train the neural networks.
   To keep the model from overfitting, the number of epochs was
set to 500. Fig. 4 shows the accuracy and the loss of the model
during training.

                    V. DISCUSSION OF RESULTS
   Experiments have shown a significant dependence of the
effectiveness of diagnostics based on microwave radiometry data
on the size of the tumor.

       TABLE I. EFFECTIVENESS OF DATA CLASSIFICATION METHODS

                               NBC          KNN     SVM
                 R=0.5 cm      0.475        0.525   0.575
                 R=0.75 cm     0.7          0.675   0.725
                 R=1 cm        0.74         0.75    0.79

   The support vector method gave the best result among the
machine learning methods considered (Table 1), which indicates
that this method is better suited to this type of task and to
this structure of the training data set. The gain of the SVM
method over the NBC is 10 percentage points (0.575 versus 0.475)
for the data set with a tumor of R=0.5 cm, which is significant
for the task of medical diagnostics. The ratio of correctly
classified models to the volume of the test sample was used as a
measure of efficiency for comparing methods: $E = \omega / F$,
where $\omega$ is the number of correctly recognized models and
$F$ is the volume of the test data set.

[Figure 5: E (0.45 to 0.8) plotted against R, cm (0.5 to 1).]
Fig. 5. Dependence of the effectiveness of the diagnostic method based on
microwave radiometry on the size of the tumor for various machine learning
methods (SVM: circles, KNN: triangles, NBC: squares).

   To determine the dependence of the effectiveness of diagnostics
on the size of the tumor, we used the binary classification
“Healthy” and “Cancer”. This is because the most important thing,
in our opinion, is the correct detection of malignant neoplasms.
   The measure of the effectiveness of medical diagnostics
is taken to be the geometric mean of sensitivity and
specificity

\[
G = \sqrt{L \cdot S},
\tag{3}
\]

   where $L = \frac{TP}{TP + FN}$, $S = \frac{TN}{TN + FP}$, $TP$ is
the proportion of correctly classified glands of class “Cancer”,
$FN$ is the proportion of misclassified glands of class “Cancer”,
$TN$ is the proportion of correctly classified glands of class
“Healthy”, and $FP$ is the proportion of misclassified glands of
class “Healthy”.

   TABLE II. SENSITIVITY, SPECIFICITY, AND EFFECTIVENESS OF THE SVM
                 METHOD FOR VARIOUS DATA SAMPLES

                                L       S       G
                 R=0.5 cm      0.68    0.49    0.577
                 R=0.75 cm     0.78    0.725   0.75
                 R=1 cm        0.82    0.76    0.79

   The calculated sensitivity and specificity indicators (Table 2)
are quite high. At the same time, we should note that the feature
space is limited to temperatures only and that the training data
set is rather small.

     TABLE III. RESULTS OF NEURAL NETWORK TESTING FOR VARIOUS
                           PARAMETERS

                                    Model 1    Model 2    Model 3    Model 4
   Number of layers                    6          5          4          5
   Number of neurons in layer 1       18         18         18         18
   Number of neurons in layer 2       18         18         18          9
   Number of neurons in layer 3       18          9          3          3
   Number of neurons in layer 4       18          3          2          3
   Number of neurons in layer 5       18          2          -          2
   Number of neurons in layer 6        2          -          -          -
   L                                  0.72       0.62       0.59       0.81
   S                                  0.66       0.59       0.55       0.67
   E                                  0.7        0.67       0.57       0.75

   Neural network testing shows results comparable with those of
the machine learning methods (Table 3). The structure of the
neural network significantly affects the accuracy of diagnostics.
The effectiveness of diagnostics increases with the radius of the
tumor (Fig. 5). Even for small tumors of radius R=0.5 cm, the
class is determined correctly with a probability of 57.5% (SVM,
Table 1). We can expect successful application of the microwave
radiometry method to smaller tumors with an expansion of the
feature space, an increase in sample size, and the use of
heuristics.

                         ACKNOWLEDGMENT
   MP thanks RFBR (project number 19-37-90142) for the
financial support. AK acknowledges the Ministry of Science
and Higher Education of the Russian Federation (government
task, project No. 0633-2020-0003) for the financial support of
the development of the software. EE is grateful to RFBR and
the government of Volgograd region according to the research
project No. 19-47-343008 for the financial support.
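
The geometric-mean measure of Eq. (3) is easy to check numerically. The sketch below recomputes G from the L and S values reported for the SVM method in Table II. The function names are illustrative choices, and the confusion counts in the last lines are hypothetical numbers used only to show how L and S are formed.

```python
import math


def sensitivity(tp, fn):
    """L = TP / (TP + FN): share of "Cancer" models that are recognized."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """S = TN / (TN + FP): share of "Healthy" models that are recognized."""
    return tn / (tn + fp)


def geometric_mean(l, s):
    """G = sqrt(L * S), as in Eq. (3)."""
    return math.sqrt(l * s)


# (L, S, reported G) for the SVM method, copied from Table II.
table_2 = {
    "R=0.5 cm": (0.68, 0.49, 0.577),
    "R=0.75 cm": (0.78, 0.725, 0.75),
    "R=1 cm": (0.82, 0.76, 0.79),
}

for label, (l, s, g_reported) in table_2.items():
    g = geometric_mean(l, s)
    print(f"{label}: recomputed G = {g:.3f}, reported G = {g_reported}")

# Hypothetical confusion counts, for illustration only (not from the paper).
l_demo = sensitivity(tp=15, fn=5)
s_demo = specificity(tn=12, fp=8)
g_demo = geometric_mean(l_demo, s_demo)
```

The recomputed values agree with the reported ones to within rounding, which supports reading Eq. (3) as the square root of the product of L and S.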



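
Two numerical steps from Section IV, the standardization of Eq. (2) and the softmax output with class selection by the largest coordinate, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the population standard deviation is used for S_j (the paper does not say which estimator was applied), every attribute is assumed to vary so that S_j is nonzero, and the function names and sample numbers are hypothetical.

```python
import math
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Apply x_hat_ij = (x_ij - M_j) / S_j to every attribute (column) j."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Assumption: population standard deviation; each column must vary.
    stds = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def softmax(z):
    """Map an output vector to coordinates in [0, 1] that sum to 1."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(z):
    """Return the index of the largest softmax coordinate."""
    p = softmax(z)
    return max(range(len(p)), key=p.__getitem__)


# Hypothetical temperatures: four models, two attributes (not the paper's data).
rows = [[34.1, 32.0], [34.6, 32.4], [33.9, 31.8], [35.0, 32.9]]
standardized = standardize_columns(rows)

# Hypothetical two-neuron network output; index 1 is the larger coordinate.
probabilities = softmax([0.2, 1.5])
predicted = predict_class([0.2, 1.5])
```

After `standardize_columns`, each column has mean 0 and variance 1, matching the property stated for Eq. (2), and `predict_class` picks the class with the largest softmax coordinate as described for the output layer.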


                            REFERENCES
[1] A.G. Gudkov, V.Yu. Leushin, I.A. Sidorov, S.G. Vesnin, I.O. Porokhov,
     M.K. Sedankin, S.V. Agasieva, S.V. Chizhikov, E.N. Gorlacheva, M.I.
     Lazarenko and V.D. Shashurin, "The use of multichannel microwave
     radiometry for the functional diagnosis of the brain," Medical
     Technology, vol. 2, no. 314, pp. 22-25, 2019.
[2] A.G. Losev, E.A. Mazepa and T.V. Zamechnik, "About several typical
     traits in the diagnosis of mammary glands pathology according to the
     data of microwave radiometry," Modern problems of science and
     education, vol. 6, pp. 254-261, 2014.
[3] A.H. Barrett and P.C. Myers, "Subcutaneous Temperature: A
     method of Noninvasive Sensing," Science, vol. 190, pp. 669-671, 1975.
[4] E.Y.K. Ng, "A review of thermography as promising non-invasive
     detection modality for breast tumor," Int J Therm Sci, vol. 48, no. 5,
     pp. 849-859, 2009.
[5] J.R. Keyserlingk, P.D. Ahlgren, E. Yu, N. Belliveau and M. Yassa,
     "Functional infrared imaging of the breast," IEEE Eng Med Biol
     Mag, vol. 19, no. 3, pp. 30-41, 2000.
[6] M. Gautherie, "Temperature and Blood Flow Patterns in Breast
     Cancer During Natural Evolution and Following Radiotherapy,"
     Biomedical Thermology, pp. 21-64, 1982.
[7] M.K. Sedankin, V.Yu. Leushin, A.G. Gudkov, S.G. Vesnin, D.A.
     Khromov, I.O. Porokhov, I.A. Sidorov, S.V. Agasieva and E.N.
     Gorlacheva, "Modeling the intrinsic thermal radiation of the kidney in
     the microwave range," Medical Technology, vol. 1, no. 313, pp. 44-47,
     2019.
[8] M.K. Sedankin, V.Yu. Leushin, A.G. Gudkov, S.G. Vesnin, I.A. Sidorov,
     S.V. Agasieva, L.M. Ovchinnikov and N.A. Vetrova, "Applicator
     antennas for medical microwave radiothermographs," Medical
     Technology, vol. 4, no. 310, pp. 13-15, 2018.
[9] M.V. Polyakov, A.V. Khoperskov and T.V. Zamechnic, "Numerical
     Modeling of the Internal Temperature in the Mammary Gland,"
     Lecture Notes in Computer Science, vol. 10594, pp. 128-135, 2017.
[10] V. Levshinskii, M. Polyakov, A. Losev and A. Khoperskov,
     "Verification and Validation of Computer Models for Diagnosing
     Breast Cancer Based on Machine Learning for Medical Data
     Analysis," Communications in Computer and Information Science, vol.
     1084, pp. 447-460, 2019.



