=Paper= {{Paper |id=Vol-2485/paper28 |storemode=property |title=Applying Visual Analysis Procedures to Multidimensional Medical Data |pdfUrl=https://ceur-ws.org/Vol-2485/paper28.pdf |volume=Vol-2485 |authors=Alexander Bondarev,Vladimir Galaktionov }} ==Applying Visual Analysis Procedures to Multidimensional Medical Data== https://ceur-ws.org/Vol-2485/paper28.pdf
  Applying Visual Analysis Procedures to Multidimensional Medical Data
                                              A.E. Bondarev1, V.A. Galaktionov1
                                            bond@keldysh.ru | vlgal@gin.keldysh.ru
                 1 Keldysh Institute of Applied Mathematics RAS, 125047 Miusskaya sq. 4, Moscow, Russia
    The paper considers the tasks of visual analysis of multidimensional data sets of medical origin. For visual analysis, the approach
of building elastic maps is used. Elastic maps serve as a method of mapping the original data points onto embedded manifolds of
lower dimension. By reducing the elasticity parameters, one can obtain a map surface that approximates the multidimensional data set
in question much more closely. To improve the results, a number of previously developed procedures are used: preliminary data filtering
and removal of separated clusters (flotation). To solve the scalability problem, which arises when the elastic map is adjusted both to
the region of condensation of data points and to separately located points of the data cloud, the quasi-Zoom approach is applied.
Illustrations of applying elastic maps to various sets of medical data are presented.
    Keywords: multidimensional data, visual analysis, elastic maps, quasi-Zoom.

1. Introduction

    In the analysis of multidimensional data, the task of classification occupies a special place. When solving
classification problems, the approaches of visual analytics are very useful. They combine several algorithms for
reducing the dimension and for visually presenting multidimensional data on manifolds of lower dimension embedded
in the original volume. These algorithms include mapping the original multidimensional volume onto elastic maps
[8, 9, 18] with different elasticity properties. These methods give insight into the cluster structure contained in
the multidimensional data volume under study.
    Our team became interested in elastic maps while implementing a project to develop computational technologies
for building, processing, analyzing and visualizing multidimensional parametric solutions of CFD problems. The
computational technology is implemented as a single pipeline of algorithms for the production, processing,
visualization and analysis of multidimensional data. Such a pipeline can be considered a prototype of a generalized
computational experiment for non-stationary problems of computational gas dynamics. A generalized computational
experiment of this kind makes it possible to obtain a solution not for a single individual problem, but for a whole
class of problems defined by ranges of variation of the determining parameters. The approach is also universal: it
can be applied to a wide range of problems of mathematical modeling of non-stationary processes. The elements of the
implemented computing technology are described in [5, 6].
    In practice, elastic maps turned out to be a useful and quite versatile tool, which made it possible to apply
them to multidimensional data volumes of various types. This approach was applied to the tasks of analyzing textual
information, where word-usage frequencies [1] served as numerical characteristics, as well as to the tasks of
analyzing mineral samples [11]. In the course of this work, a number of procedures for processing the studied data
were developed and tested, which improved the results of visual analysis. These procedures include preliminary
filtering of the data, which weeds out points with indistinctly defined values, removal of separated clusters
(flotation), and quasi-Zoom. The latter procedure is designed to solve the problem of scalability, which arises when
the elastic map adapts both to the area of data point concentration and to separately located points of the data
cloud, which complicates visual analysis. The essence of this approach is that, for finer adjustment, large clusters
are selected in the studied volume of multidimensional data and elastic maps are built for the selected clusters
separately, producing an effect similar to the zoom function in modern photographic equipment. The results of
applying these procedures to multidimensional volumes of data of various origins are presented in [1-4].
    This approach is generally universal, since it does not depend on the nature of the studied multidimensional
data. This makes it possible to apply the approach and the developed procedures to the study of multidimensional
medical data. This paper presents the results of applying the construction of elastic maps, together with the
procedures developed earlier, to the visual analysis of multidimensional data volumes of medical origin.
    In most of the previous cases, we considered data sets that were specially prepared in advance. Here, for the
first time, we took several publicly available medical data sets [16]. Some results were previously presented in [3].

2. Elastic maps approach

    The ideology and algorithms for the construction of elastic maps are described in detail in [8, 9, 18]. An
elastic map is a system of elastic springs embedded in a multidimensional data space. This approach is based on an
analogy with problems of mechanics: the principal manifold passing through the "middle" of the data can be
represented as an elastic membrane or plate. The method of elastic maps is formulated as an optimization problem,
in which a given functional of the relative location of the map and the data is minimized.
    According to [18], the basis for constructing an elastic map is a two-dimensional rectangular grid G embedded in
a multidimensional space that approximates the data and has adjustable elastic properties with respect to stretching
and bending. The location of the grid nodes is sought by solving the optimization problem of minimizing the
functional

    $$D = \frac{D_1}{|X|} + \lambda \frac{D_2}{m} + \mu \frac{D_3}{m} \to \min,$$

where $|X|$ is the number of points in the multidimensional data volume $X$; $m$ is the number of grid nodes; $\lambda$
and $\mu$ are the elastic coefficients responsible for the stretching and curvature of the grid. Here $D_1$, $D_2$, $D_3$
are the terms responsible for the properties of the grid. The term $D_1$ is a measure of the proximity of the grid nodes
to the data. The term $D_2$ is a measure of the stretching of the grid. The term $D_3$ is a measure of the curvature of
the grid.
    The author of the approach [18] has developed the software package [17], which allows the construction and visual
presentation of elastic maps. The main functional features of this software are described in detail in [18]. The
figures below in this article were created with this software package.
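    The structure of the functional D from Section 2 can be illustrated with a minimal NumPy sketch. The paper only
names the three terms, so the concrete forms used here are assumptions borrowed from the usual elastic-map
formulation: D1 as the sum of squared distances from each data point to its nearest node, D2 as the sum of squared
edge lengths, and D3 as the sum of squared second differences along grid rows and columns. The function and helper
names are hypothetical, and this is not the ViDaExpert implementation.

```python
import numpy as np

def elastic_map_functional(X, grid, lam, mu):
    """Evaluate D = D1/|X| + lam*D2/m + mu*D3/m for a rectangular grid.

    X    : (n, d) array of data points
    grid : (p, q, d) array of node positions in data space
    """
    p, q, d = grid.shape
    nodes = grid.reshape(p * q, d)
    m = p * q
    # D1 (assumed form): squared distance from each data point to its nearest node
    dist2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    D1 = dist2.min(axis=1).sum()
    # D2 (assumed form): squared lengths of edges between neighbouring nodes
    D2 = (np.diff(grid, axis=0) ** 2).sum() + (np.diff(grid, axis=1) ** 2).sum()
    # D3 (assumed form): squared second differences along grid rows and columns
    D3 = (np.diff(grid, n=2, axis=0) ** 2).sum() + (np.diff(grid, n=2, axis=1) ** 2).sum()
    return D1 / len(X) + lam * D2 / m + mu * D3 / m

def flat_grid(p, q, d):
    """Hypothetical helper: unit-spaced flat grid lying in the first two coordinates."""
    grid = np.zeros((p, q, d))
    grid[:, :, 0] = np.arange(p)[:, None]
    grid[:, :, 1] = np.arange(q)[None, :]
    return grid

grid = flat_grid(3, 3, 3)
X = grid.reshape(-1, 3)            # data placed exactly on the nodes, so D1 = 0
D_flat = elastic_map_functional(X, grid, lam=0.9, mu=0.5)

bent = grid.copy()
bent[1, 1, 2] = 1.0                # lift the central node out of the plane
D_bent = elastic_map_functional(X, bent, lam=0.9, mu=0.5)
print(D_flat, D_bent)
```

    In this toy case the flat grid has no bending term, while lifting one node increases all three terms. Lowering
lam and mu reduces the penalty for stretching and bending, which is what lets a softer map follow the data more closely.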



Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
3. Procedures for visual analysis

    Previously, a number of procedures for processing the studied multidimensional data were developed, which
improved the results of visual analysis. These procedures include preliminary filtering of the data, which weeds out
points with indistinctly defined values, removal of separated clusters (flotation), and quasi-Zoom. Below we briefly
give examples of applying these procedures to multidimensional volumes of data of different origin.
    An example of constructing elastic maps for a volume of multidimensional data representing the characteristics
of mineral resources, namely three types of coal from Polish deposits [11], is given in [4]. The data are points in a
multidimensional feature space (characteristics of coal samples), and the data set covers three grades of coal. The
task was to classify coal by grade. By combining the construction of elastic maps with the removal of fuzzy points
and separated classes (filtering and flotation of data), it is possible to completely separate the samples in the
initial volume into three classes corresponding to the three types of coal.
    Examples of the use of quasi-Zoom for analyzing the thematic proximity of words of the Russian language are given
in [1, 2, 4]. The method is based on analyzing the environment (context) of words. The main hypothesis is that
similar words should occur in approximately the same context. Consequently, in the attribute space they will be
located relatively close to each other, while different words will be located farther apart. Texts from news sources
(news feeds for a certain period) were used as test data. For the primary tests, about 100 verbs with 353 nouns
associated with them were selected. The data thus obtained were treated as a multidimensional data volume
representing 100 points in 353-dimensional space. The numerical values of the resulting matrix are the frequencies of
joint use. The data volume under study contained a region of high data density and points far from this region. In
the study of the frequency of the joint use of verbs and nouns, the practical task was to separate the points that
were "stuck together". Filtering followed by two consecutive quasi-Zoom procedures solved this problem completely
(Fig. 1).

    Fig. 1. Extension of the elastic map after two consecutive quasi-Zoom applications.

    Applying a similar approach to the transposed data file allowed us to identify a number of semantic clusters
among the set of nouns (Fig. 2). This opens up additional opportunities for the analysis and interpretation of
semantic groups by specialists in this field.

    Fig. 2. Extension of the elastic map for the transposed data set after applying quasi-Zoom.

    The construction of elastic maps was also applied to the study of multidimensional arrays of errors of different
solvers compared to a reference solution [4]. We considered the numerical results of comparing the accuracy of
various solvers of the OpenFOAM software package, using the example of the well-known problem of inviscid flow around
a cone at zero angle of attack. The results obtained with the OpenFOAM solvers were compared with the well-known
numerical solution of this problem while varying the free-stream Mach number and the cone angle. Four solvers of the
OpenFOAM software package took part in the comparison: rhoCentralFoam, pisoCentralFoam, sonicFoam and rhoPimpleFoam.
All these solvers have different approximation and computational properties. Figure 3 shows the elastic map for
pressure, obtained as a result of parametric calculations, in the space of the first principal components. The
yellow circles show the results for the rhoCentralFoam solver, the red ones for pisoCentralFoam, the green ones for
sonicFoam and the blue ones for rhoPimpleFoam.

    Fig. 3. Elastic map for the array of errors for different OpenFOAM solvers.

    The results of the visual analysis showed that the errors for rhoCentralFoam and for pisoCentralFoam can be
roughly approximated by a plane reflecting the dependence of the error on the Mach number and the cone angle.

4. Processing of medical datasets

    An attempt to apply elastic maps to medical data was made in [2]. For this purpose the data from [13] were used.
This data set contains values of six biomechanical features used to classify orthopaedic patients into 2 classes
(normal or abnormal). Each patient is represented in the data set by six biomechanical attributes derived from the
shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis
angle, sacral slope, pelvic radius and grade of spondylolisthesis. The data set contains 310 points in 6-dimensional
space. Unfortunately, elastic maps did not give good results from the point of view of classification.
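    The quasi-Zoom procedure described in Section 3 (select a large cluster, then rebuild the map for that cluster
alone) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how
the cluster is selected, so the rule used here (keep the points whose distance from the coordinate-wise median falls
below a chosen quantile) is hypothetical, and a plain PCA projection is swapped in for the elastic map. The names
pca_project, quasi_zoom and keep are not from the paper, and the data are synthetic.

```python
import numpy as np

def pca_project(X, k=2):
    """Project centered data onto its first k principal components (stand-in for a map)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def quasi_zoom(X, keep=0.8):
    """Select the dense part of the cloud and rebuild the projection for it alone.

    keep : assumed selection rule, the fraction of points closest to the median that is retained
    """
    center = np.median(X, axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    mask = dist <= np.quantile(dist, keep)
    return mask, pca_project(X[mask])

# Synthetic cloud: a dense region plus a few points located far away from it
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 1.0, size=(100, 10))
far = rng.normal(50.0, 1.0, size=(5, 10))
X = np.vstack([dense, far])

mask, Z = quasi_zoom(X, keep=0.8)
print(mask.sum(), Z.shape)
```

    Because the projection is recomputed on the selected points only, its axes are no longer dominated by the distant
points. Applying quasi_zoom again to X[mask] corresponds to two consecutive quasi-Zoom applications.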
    Below are the results for three other volumes of multidimensional data that involve the solution of a
classification problem. All data sets were taken from the UCI Machine Learning Repository [16].
    The first data set concerns the variability of impedivity in normal and pathological breast tissue [10] and the
task of classifying various types of diseases [14]. This data set contains 106 points in a 9-dimensional attribute
space. Each point also has a class attribute corresponding to the type of tissue or disease: carcinoma,
fibro-adenoma, mastopathy, glandular, connective, adipose. According to [14], the data set can be used for predicting
the classification of either the original 6 classes or of 4 classes obtained by merging the fibro-adenoma, mastopathy
and glandular classes, whose discrimination is not important (they cannot be accurately discriminated anyway).
    Further, we use the following notation and color scheme for the classes studied: car (carcinoma) - red, adi
(adipose) - yellow, con (connective) - green, fad+ (fibro-adenoma + mastopathy + glandular) - blue. We use the
combined fad+ class because of the above remark by the authors of the data set that these classes cannot be separated
exactly.
    The construction of elastic maps for the studied data volume is illustrated below. Figure 4 shows the source data
in the space of the first three principal components. Figures 5 and 6 show the elastic map and its extension
(unfolding) for this data volume.

    Fig. 4. Source data in the space of the first principal components.

    Fig. 5. Elastic map for source data.

    Fig. 6. Extension of elastic map for source data.

    The figures show that the pairs of classes (car, fad+) and (con, adi) are well separated from each other.
However, within each pair, data from different classes are mixed. To improve the separation, we apply flotation and
remove fad+. The results of building an elastic map for this case are shown in Figure 7. In this case, the car class
was fully distinguished.

    Fig. 7. Extension of elastic map for classes car, con, adi.

    Now we remove the car class and consider the remaining pair of classes, con and adi, separately. After
constructing the elastic map and its extension, we obtain the picture presented in Figure 8. In this case, a
satisfactory separation of classes was achieved.

    Fig. 8. Extension of elastic map for classes con, adi.
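    The label handling used for the breast tissue data (merging fad, mas and gla into the combined fad+ class, then
removing whole classes by flotation before rebuilding the map) can be sketched in a few lines. This is a minimal
illustration on a toy array, not the authors' tooling; the function names merge_classes and flotation are
hypothetical, and the short class codes follow the notation used in the text.

```python
import numpy as np

# Classes that the data set authors recommend merging into one
MERGED = {"fad": "fad+", "mas": "fad+", "gla": "fad+"}

def merge_classes(labels):
    """Replace fad, mas and gla by the combined fad+ label; keep other labels unchanged."""
    return np.array([MERGED.get(c, c) for c in labels])

def flotation(X, labels, removed):
    """Drop every point whose class is in `removed` and return the remaining subset."""
    keep = ~np.isin(labels, list(removed))
    return X[keep], labels[keep]

# Toy stand-in for the data: one point per original class, 9 attributes each
X = np.arange(6 * 9, dtype=float).reshape(6, 9)
labels = merge_classes(["car", "fad", "mas", "gla", "con", "adi"])

X1, y1 = flotation(X, labels, {"fad+"})   # leaves car, con, adi
X2, y2 = flotation(X1, y1, {"car"})       # leaves con, adi
print(y1, y2)
```

    Each call returns a smaller data volume for which a new map is then constructed, which mirrors the sequence of
steps in the text: first all four classes, then car, con and adi, then con and adi only.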
    Next, we consider the pair of classes car and fad+ together. Figure 9 presents the extension of the elastic map
for these classes. Here the separation is also satisfactory. The use of quasi-Zoom to improve the separation in the
center of the picture did not lead to success. The attempt to divide the mixed fad+ class into the fad, mas and gla
classes was not successful either. The comment in [14] about the inseparability of these classes turned out to be
true.

    Fig. 9. Extension of elastic map for classes car, fad+.

    The next data set is also devoted to the problem of predicting breast diseases [7, 12]. The data set contains
116 points in a 10-dimensional attribute space. Each point also contains a binary variable indicating the presence or
absence of the disease. The attribute space contains ten predictors. According to [12], the predictors are
anthropometric data and parameters that can be gathered in routine blood analysis.
    Prediction models based on these predictors, if accurate, can potentially be used as a biomarker of breast
cancer.
    For this data volume, an elastic map was constructed. Points with the disease absent are shown in green, and
points with the disease present are shown in red.
    Figures 10 and 11 show the constructed elastic map and its extension. As one can see, the green and red dots are
strongly mixed. This caused some confusion, since by construction this picture represents points that have to be
close to each other in the multidimensional attribute space.

    Fig. 10. Elastic map for 10-dimensional attribute space.

    Fig. 11. Extension of elastic map for 10-dimensional attribute space.

    However, the original article [12] contains a figure from which one can conclude that only for 4 parameters
(glucose, insulin, resistin and HOMA, the homeostasis model assessment) is there a significant difference between
patients and healthy people. Only these 4 dimensions were kept in the data space, and the elastic map was
reconstructed. The results are shown in Fig. 12. The separation between the green and red dots has improved
significantly; however, in the center of the picture there is an area where the dots are mixed.

    Fig. 12. Extension of elastic map for 4-dimensional attribute space.

    The next data set is intended for the early diagnosis of Autistic Spectrum Disorder (ASD) [15]. The data set
consists of 692 points originally defined in a 21-dimensional attribute space. The diagnostic approach is based on
the analysis of questionnaire data consisting of 10 questions. About half of the attributes are patient data.
Therefore, it was decided to keep 12 attributes: the 10 answers to the questionnaire, the age of the patient and the
total questionnaire score. The results are presented in Figures 13 and 14 in the form of an elastic map and its
extension.
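    Restricting the attribute space, as was done when 4 of the 10 attributes were kept for the breast cancer data
and 12 of the 21 for the ASD data, amounts to selecting columns of the data matrix before the map is rebuilt. The
sketch below shows this on synthetic data. The column order, the placeholder names other_0 ... other_5 and the helper
select_attributes are assumptions made for illustration; only the four attribute names come from the text.

```python
import numpy as np

def select_attributes(X, names, wanted):
    """Return the columns of X whose names are listed in `wanted`, in that order."""
    idx = [names.index(w) for w in wanted]
    return X[:, idx]

# Hypothetical column layout: the four attributes named in the text plus six placeholders
names = ["Glucose", "Insulin", "Resistin", "HOMA"] + [f"other_{i}" for i in range(6)]

# Synthetic stand-in for the 116 x 10 data matrix (not the real measurements)
rng = np.random.default_rng(1)
X = rng.normal(size=(116, 10))

X4 = select_attributes(X, names, ["Glucose", "Insulin", "Resistin", "HOMA"])
print(X4.shape)
```

    The reduced matrix X4 would then be passed to the map construction in place of X.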
    Fig. 13. Elastic map for 12-dimensional attribute space when diagnosing ASD.

    Fig. 14. Extension of elastic map for 12-dimensional attribute space when diagnosing ASD.

    These results show that the separation between the diagnoses of presence and absence of ASD is quite
satisfactory on the studied data set.

5. Conclusions

    For the analysis of structures in multidimensional data volumes, technologies for constructing elastic maps are
used, which are methods for mapping points of the original multidimensional space onto embedded manifolds of lower
dimension. A number of data processing techniques that can improve the results are considered: pre-filtering of data,
removal of separated clusters (flotation), and quasi-Zoom. Examples of the construction of elastic maps and of the
use of these procedures for multidimensional data of medical origin are given. The results showed that the
construction of elastic maps together with the accompanying data processing procedures can serve as a useful tool for
visual data analysis and can complement other methods for studying multidimensional data volumes.
    However, the results also show that when processing medical data from open sources we face a new problem. The
data considered are clearly overloaded with unnecessary measurements and unnecessary information. This makes the data
"noisy" and prevents class separation. To overcome this problem, we plan to implement an additional procedure for
analyzing the contribution of each measurement to the total variance, followed by the removal of unnecessary
criteria.

6. References

[1] Bondarev, A.E. et al., 2016. Visual analysis of clusters for a multidimensional textual dataset. Scientific
     Visualization, 8(3), 1-24.
[2] Bondarev, A.E., 2017. Visual analysis and processing of clusters structures in multidimensional datasets. ISPRS
     Archives, XLII-2/W4, 151-154.
[3] Bondarev, A.E., 2019. The procedures of visual analysis for multidimensional data volumes. Int. Arch.
     Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W12, 17-21,
     doi.org/10.5194/isprs-archives-XLII-2-W12-17-2019.
[4] Bondarev, A.E., Bondarenko, A.V., Galaktionov, V.A., 2018. Visual analysis procedures for multidimensional data.
     Scientific Visualization, 10(4), 109-122, doi.org/10.26583/sv.10.4.09.
[5] Bondarev, A.E., Galaktionov, V.A., 2015a. Analysis of Space-Time Structures Appearance for Non-Stationary CFD
     Problems. Procedia Computer Science, 51, 1801-1810.
[6] Bondarev, A.E., Galaktionov, V.A., 2015b. Multidimensional data analysis and visualization for time-dependent
     CFD problems. Programming and Computer Software, 41(5), 247-252, doi.org/10.1134/S0361768815050023.
[7] Crisóstomo, J. et al., 2016. Hyperresistinemia and metabolic dysregulation: a risky crosstalk in obese breast
     cancer. Endocrine, 53(2), 433-442, doi.org/10.1007/s12020-016-0893-x.
[8] Gorban, A. et al., 2007. Principal Manifolds for Data Visualisation and Dimension Reduction. Springer,
     Berlin - Heidelberg - New York.
[9] Gorban, A., Zinovyev, A., 2010. Principal manifolds and graphs in practice: from molecular biology to dynamical
     systems. International Journal of Neural Systems, 20(3), 219-232.
[10] Jossinet, J., 1996. Variability of impedivity in normal and pathological breast tissue. Med. & Biol. Eng. &
     Comput., 34, 346-350.
[11] Niedoba, T., 2014. Multi-parameter data visualization by means of principal component analysis (PCA) in
     qualitative evaluation of various coal types. Physicochemical Problems of Mineral Processing, 50(2), 575-589.
[12] Patrício, M. et al., 2018. Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC
     Cancer, 18(1), doi.org/10.1186/s12885-017-3877-1.
[13] Rocha Neto, A., Barreto, G., 2009. On the Application of Ensembles of Classifiers to the Diagnosis of
     Pathologies of the Vertebral Column: A Comparative Analysis. IEEE Latin America Transactions, 7(4), 487-496.
[14] Silva, J.E., Marques de Sá, J.P., Jossinet, J., 2000. Classification of Breast Tissue by Electrical Impedance
     Spectroscopy. Med. & Biol. Eng. & Comput., 38, 26-30.
[15] Thabtah, F., 2017. Machine learning in autistic spectrum disorder behavioral research: A review and ways
     forward. Informatics for Health and Social Care, doi.org/10.1080/17538157.2017.1399132.
[16] UCI Machine Learning Repository, 2019. archive.ics.uci.edu/ml/ (01 March 2019).
[17] ViDaExpert, 2019. bioinfo.curie.fr/projects/vidaexpert (01 March 2019).
[18] Zinovyev, A., 2000. Vizualizacija mnogomernyh dannyh [Visualization of multidimensional data]. Krasnoyarsk,
     publ. NGTU. 180 p. [In Russian].