Biological Data Mining and Its Applications in Pulmonology
Oleksandr Komlevoi (a), Nataliia Komleva (b), Vira Liubchenko (b) and Svitlana Zinovatna (b)
(a) Odesa National Medical University, Valikhovsky Lane 2, Odesa, 65082, Ukraine
(b) Odesa Polytechnic State University, Shevchenko Ave., 1, Odesa, 65044, Ukraine


                Abstract
                The processing of diagnostic data in pulmonology is complicated for a doctor due to the need
                to analyze many indicators, the relationships between which can be complex, and the degree
                of influence on the diagnostic result can be different. Traditionally, general clinical,
                biochemical, and questionnaire methods are used to support making a diagnosis. They allow
                describing the state of the bronchopulmonary system by a variety of indicators. The modern
                practice of laser correlation spectroscopy makes it possible to expand the set of indicators. Still,
                their values are represented by sets of one-dimensional distribution diagrams and are not
                convenient for analysis. Therefore, in this work we investigated the feasibility of classifying
                the values of 32 biophysical indicators obtained by laser correlation spectroscopy. We first
                performed data visualization and found that the classes of diagnosed diseases did not have a
                clear separation but were separated from the normal state. We then examined the results of
                classifying the data using three algorithms – naive Bayes, logistic regression, and random
                forest. We conclude that the most appropriate algorithm is logistic regression. The value of
                the work lies in expanding the set of diagnostic indicators through the high-precision
                classification of biophysical indicators, which increases the objectivity of the diagnosis of
                pulmonological diseases.

                Keywords
                Bioinformatics, pulmonological diagnostics, data analysis, biophysical indicators, classification

1. Introduction
    “Bioinformatics is the combination of health information, data, and knowledge. It uses
computational techniques and tools to analyze the enormous biological databases”[1]. Health care
analysts aim to provide the best medical care at an affordable cost, using a modern material and
technical base.
    In [2], three goals of bioinformatics are listed: 1) organize data in such a way that “allows researchers
to access existing information and to submit new entries as they are produced”; 2) “to develop tools and
resources that aid in the analysis of data”; 3) “to use these tools to analyze the data and interpret the
results in a biologically meaningful manner.”
    According to [3], “biological research involves the application of some type of mathematical,
statistical, or computational tools to help synthesize recorded data and integrate various types of
information in the process of answering a particular biological question.”
    Bioinformatics is used in various fields of biology. Its application in genomics is described in [4],
in the field of plant breeding – in [5], in the development of drugs – in [6]. In [7], an overview of many
areas of human activity in which bioinformatics can be applied is given. Biologists are stepping up their
efforts in understanding the biological processes that underlie disease pathways in clinical contexts [8].


IDDM-2021: 4th International Conference on Informatics & Data-Driven Medicine, November 19–21, 2021, Valencia, Spain
EMAIL: shurik73.jan@gmail.com (A. 1); nkomlevaya@gmail.com (A. 2); lvv@op.edu.ua (A. 3); zinovatnaya.svetlana@op.edu.ua (A. 4)
ORCID: 0000-0002-8297-089X (A. 1); 0000-0001-9627-8530 (A. 2); 0000-0002-4611-7832 (A. 3); 0000-0002-9190-6486 (A. 4)
             ©️ 2021 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
    Recent advances in bioinformatics and computational biology have accelerated the development of
complex biomedical systems (for example, knowledge-based decision systems and artificial
intelligence in medical informatics). Artificial Intelligence (AI) has a significant impact on biological
and medical research, and AI-based methods have been increasingly implemented in real-life
medical/healthcare applications. [9] presents five innovative scientific papers related to artificial
intelligence methodologies, with potential applications in biological and medical fields.
    [10] also shows bioinformatics approaches for analyzing measurable indicators using mathematical,
network solutions, and machine learning theories.
    The state of the respiratory system is assessed by a set of biophysical indicators: the percentage of
particles of various natures present in the exhaled air. They are obtained using the method of laser
correlation spectroscopy. For a differential assessment of the physiological and pathological states of
the respiratory system, the change in the biophysical parameters of the exhaled air moisture is
assessed.
    The work aims to improve the results of the analysis of the biophysical parameters of exhaled air
condensate obtained by laser correlation spectroscopy in pulmonological diagnostics through intelligent
data processing methods.
    In general, pulmonological diagnostics is an extensive area. Several works consider the principles
of constructing machine learning algorithms to diagnose and predict asthma [11] and bronchitis [12].
In this article, a diagnostic analysis of the biophysical parameters of exhaled air condensate is carried
out in the presence of these diseases.

2. State of the Art in Pulmonology
    Currently, asthma and bronchitis research uses protocols that include the following methods [13,
14]:
         general clinical methods: analysis of data obtained from case histories on complaints,
    anamnesis of life and disease, objective examination, X-ray examinations;
         questionnaire methods: surveys conducted with universal questionnaires developed using
    socio-demographic methods;
         biochemical methods: the study of indicators of the lipid peroxidation system (the content of
    primary products of lipid peroxidation – diene conjugates, and secondary products – malonic
    dialdehyde) and the activity of the antioxidant protection enzyme catalase.
    For a wide range of respiratory diseases, including reversible and irreversible obstructive diseases,
there are many studies on the pathogenesis, disorders of respiratory mechanics, ventilation-perfusion
relations, respiratory function, and ventilation regulation.
    Diagnosis of diseases and assessment of their severity is based on various clinical data and additional
studies: a functional examination of the lungs, chest radiography, blood culture, computed tomography,
serological and cultural studies, analysis of sputum by Gram, acid-base composition, and saturation
level, the study of arterial blood gas composition, bronchoscopy, invasive procedures to obtain
diagnostic material, routine laboratory tests, and others [15, 16].
    Traditional methods for assessing the activity of inflammation in the airways include analysis of
normal and induced sputum, bronchoalveolar lavage, bacterioscopy, and bronchobiopsy [17]. Some
physicochemical methods make it possible to detect traces of gaseous substances in human exhaled air and
determine its microcomposition: mass spectrometry combined with gas chromatographic separation,
gas chromatography, electrochemical sensors, UV chemiluminescence, and IR spectroscopy [18]. The
latter includes Fourier spectroscopy, optoacoustic spectroscopy, and laser correlation spectroscopy, one
of the most sensitive methods to study the composition of polydisperse liquids.
    Until now, laser correlation spectroscopy has been used to study the molecular composition and
functional characteristics of various biological fluids (blood plasma, oropharyngeal washes, etc.) [19].
However, to assess the state of the human pulmonological system, it would be advantageous to study
the moisture condensate of the air exhaled by a person using laser correlation spectroscopy.
    In earlier works, attempts were made to study the state of the pulmonological system based on the
analysis of exhaled breath condensate subfractions. However, the conclusions were based on a rather
small number of indicators (from 3 to 6), which led to about 15% of errors of the second kind and a
classification precision of about 85% [20, 21].

3. Research methods and models
    The exhaled air moisture condensate is formed by condensation of water vapor from the
environment and aerosol impurities, which appear when the liquid is adjacent to the epithelial layer of
the respiratory tract. The primary source of this aerosol impurity is the distal part of the respiratory
system. Therefore, changes in the condensate components reflect changes in the level of small bronchi
and alveoli.
    The general principles of analyzing the condition of the respiratory system of an examined person
are as follows. First, a study of a representative sample of persons without comorbidities
(the conditional norm) establishes the most common relationship between biological ingredients of
different sizes, which is taken as the norm. All other relationships between the
ingredients reflect a variety of changes.
    Biophysical indicators are the percentage contributions of microparticles entering the moisture
condensate of the exhaled air. Their redistribution means the presence or absence of various
macromolecular fractions, which corresponds to different states of the bronchopulmonary system,
including the presence of pathologies.
    To obtain numerical values of biophysical parameters, microparticles of exhaled air were examined
by laser correlation spectroscopy. The technique of laser correlation spectroscopy is based on the
measurement of the spectral characteristics of monochromatic coherent radiation due to the scattering
of light as it passes through a dispersed nanoparticle system suspended in a liquid.
    By measuring the fluctuation spectrum of the photocurrent and determining its half-width, it is
possible to obtain the particle size in the system under study. As a rule, the investigated samples
(especially biological fluids) are polydisperse, i.e., particles of different sizes are present in the
solution at the same time. The spectrum of light scattered by monodisperse particles is a Lorentz curve.
For a polydisperse system, the spectrum is a sum, and for continuous distributions an integral, of
Lorentzians with different half-widths \(\Gamma\). In this case, the spectrum has the form:
\[ I(\omega) = \int \frac{A(\Gamma)\,\Gamma}{\Gamma^{2} + \omega^{2}}\, d\Gamma, \tag{1} \]
where \(A(\Gamma)\) is the particle distribution function by diffusion coefficients and, consequently, by size.
Determining the size distribution of particles thus amounts to solving this integral equation with the
Lorentzian kernel.
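
    The discrete form of Eq. (1) can be illustrated with a short sketch. It assumes the standard unnormalised Lorentzian line shape, half-width divided by the sum of the squared half-width and squared frequency; the function names and the two-fraction sample are hypothetical and are not measured data.

```python
def lorentzian(omega, gamma):
    """Unnormalised Lorentzian line with half-width gamma (assumed standard form)."""
    return gamma / (gamma ** 2 + omega ** 2)

def polydisperse_spectrum(omega, gammas, weights):
    """Discrete analogue of Eq. (1): a weighted sum of Lorentzians."""
    return sum(a * lorentzian(omega, g) for g, a in zip(gammas, weights))

# Hypothetical sample with two particle fractions (illustrative values only).
gammas = [10.0, 100.0]    # half-widths of the two fractions
weights = [0.7, 0.3]      # relative contributions A of the two fractions
spectrum_at_zero = polydisperse_spectrum(0.0, gammas, weights)
```

    Recovering the weights from a measured spectrum is the inverse of this computation, which is why the text describes it as solving an integral equation.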
   A self-developed device is used to collect condensate of exhaled air moisture, which allows
obtaining the same volumes of condensate in a short time from each of the subjects in the amount
required for the laser correlation spectroscopy method [22].
    The moisture condensate of the exhaled air, like other biological fluids, is polydisperse. The size
distribution of particles in the exhaled air moisture condensate is a histogram defined on
a grid of 32 points. The high accuracy of the method makes it possible to show the concentration of
particles with a radius of 1 to 10000 nm.
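
    As a rough illustration of how such a histogram could be turned into 32 percentage indicators, the sketch below normalises per-bin values so that they sum to 100%. The logarithmic spacing of the radius grid and the function names are assumptions made for illustration only; the source does not state how the grid is laid out.

```python
def size_grid(n_points=32, r_min=1.0, r_max=10000.0):
    """Hypothetical log-spaced radius grid in nm (the spacing is an assumption)."""
    ratio = (r_max / r_min) ** (1.0 / (n_points - 1))
    return [r_min * ratio ** i for i in range(n_points)]

def to_percentages(histogram):
    """Convert raw per-bin values into percentage contributions."""
    total = sum(histogram)
    if total <= 0:
        raise ValueError("histogram must have a positive total")
    return [100.0 * value / total for value in histogram]

grid = size_grid()
raw = [1.0] * 32                  # placeholder flat histogram, not measured data
indicators = to_percentages(raw)  # 32 values summing to 100
```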
    The obtained biophysical parameters formed the basis for diagnosing the condition of the
pulmonological system of the subjects. Diagnosis is performed by classifying the conditions of the
subjects, using three classes: “norm,” “asthma,” and “bronchitis.”
    Traditionally, much biological and medical research is based on Naive Bayes Classifier [23]. It
allows predicting posterior probabilities of specific class membership [24, 25].
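
    A minimal sketch of how a naive Bayes classifier arrives at posterior class probabilities is given below. The priors and per-indicator likelihoods are made-up placeholder numbers, and the function name is hypothetical; the sketch only shows Bayes' rule applied under the assumption that indicators are independent within a class.

```python
import math

def naive_bayes_posterior(priors, feature_likelihoods):
    """Posterior P(class | features), treating features as independent given the class."""
    joint = {c: priors[c] * math.prod(feature_likelihoods[c]) for c in priors}
    evidence = sum(joint.values())
    return {c: value / evidence for c, value in joint.items()}

# Placeholder numbers for illustration only (not estimated from the study data).
priors = {"norm": 0.5, "asthma": 0.25, "bronchitis": 0.25}
likelihoods = {
    "norm": [0.2, 0.1],        # P(indicator value | class) for two indicators
    "asthma": [0.6, 0.5],
    "bronchitis": [0.4, 0.3],
}
posterior = naive_bayes_posterior(priors, likelihoods)
predicted_class = max(posterior, key=posterior.get)
```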
    No less common is the use of logistic regression in diagnostics [26, 27]. “A major advantage of
logistic regression compared to other similar approaches like probit regression—and therefore, a reason
for its popularity among medical researchers – is that the exponentiated logistic regression slope
coefficient can be conveniently interpreted as an odds ratio. The odds ratio indicates how much the odds
of a particular outcome change for a 1-unit increase in the independent variable” [28]. In [29], the
need for preliminary data analysis is emphasized, and it is shown that excessive complexity of the
dataset leads to low classification accuracy.
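
    The odds-ratio interpretation quoted from [28] can be checked numerically. In the sketch below the intercept, slope, and input value are hypothetical; it only demonstrates that raising the independent variable by one unit multiplies the odds by the exponentiated slope.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

intercept, slope = -1.0, 0.4   # hypothetical coefficients
x = 2.0                        # hypothetical value of the independent variable

odds_before = odds(sigmoid(intercept + slope * x))
odds_after = odds(sigmoid(intercept + slope * (x + 1.0)))
odds_ratio = odds_after / odds_before   # equals exp(slope) up to rounding
```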
   Using a classifier based on a random forest is successful, as it demonstrates its effectiveness when
processing data with a large number of features [30]. In addition, in the future, random forest methods can
make it possible to assess the significance of individual features and reduce their number while
ensuring the required classification accuracy [31]. As shown in [32], the use of random forest can
improve the explainability of the results for users who are not experts in the studied subject area.

4. Experiment and results
    In studies of asthma and bronchitis diseases using laser correlation spectroscopy, 32 biophysical
parameters can be monitored. However, the approach currently used only makes it possible to analyze
the situation for each indicator separately, which can lead to a distortion of the diagnosis due to ignoring the influence
of other indicators. Accordingly, there is a need to automate the classification procedure to ensure the
possibility of simultaneous accounting of all available indicators.
    The experiment consisted of two stages. In the first stage, we visualized the available data
to determine how much the classes under study differ. In the second stage, we classified the available
data using three algorithms to determine which gives the best classification results.

4.1.    Preliminary analysis of pulmonary data
    In line with the study’s objectives, the entire contingent of the surveyed (children aged 6 to 10
years whose parents agreed to the processing of pulmonary data) was divided into three non-overlapping
groups. The first group included 275 children who were not burdened with known diseases based on a
pediatric examination. The second group consisted of 115 children who had bronchial asthma. The third
group included 131 children with bronchitis.
    The quality of the samples obtained was assessed by two indicators: representativeness and
reliability. A serial sampling methodology was followed to ensure representativeness, which allows
reducing the determinism in sampling and approaching random sampling. At the same time, objectively
existing groups of healthy children and children with asthma and bronchitis are randomly selected. All
children were examined within the groups. To ensure reliability:
         all elements of the population were included in the sampling frame;
         checks were carried out for the absence of duplication of the surveyed;
         surveyed groups that did not meet the requirements (children with unconfirmed asthma and
    bronchitis or unconfirmed healthy conditions) were excluded from the sampling frame;
         electronic lists with unambiguous identification of the surveyed data were created and
    centrally stored.
    Let us investigate the ability of biophysical traits to separate classes of diseases. To visually
represent the values of 32 biophysical traits for different classes of diseases, we will reduce the
dimension of the initial data while minimizing the loss of information. For this, we use the principal
component analysis (PCA) method. Figure 1 shows the results of class division using two components
based on a representative sample of initial data. The values of biophysical parameters clearly identify
the class “norm”; for the classes “asthma” and “bronchitis,” there is some overlap. This suggests that the
values of biophysical parameters can be used to diagnose asthma and bronchitis.
Figure 1: Principal component distribution of classes data
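
    The projection onto two principal components described above can be sketched without external libraries. The version below finds the components by power iteration with deflation, which is a substitute chosen to keep the example self-contained and is not necessarily the routine used in the study. The data are synthetic stand-ins, and all function names are hypothetical.

```python
import random

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def leading_eigenpair(m, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def pca_2d(rows):
    """Return (scores, components) for the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(2):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(a * b for a, b in zip(r, v)) for v in components]
              for r in centered]
    return scores, components

# Synthetic stand-in data (not patient data): 60 rows, 4 features of decreasing spread.
rng = random.Random(1)
data = [[rng.gauss(0.0, s) for s in (5.0, 2.0, 0.5, 0.1)] for _ in range(60)]
scores, components = pca_2d(data)
```

    Each row of `scores` is one point of a two-dimensional scatter plot such as Figure 1; with the real data there would be 32 features per row instead of 4.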

4.2.    Pulmonary data classification
   Cross-validation is used to evaluate the model’s training. The original data is divided into subsets
Train, Valid, and Test (Figure 2).


Figure 2: Distribution of data in constructing classifiers
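
    One possible way to form the Train, Valid, and Test subsets is a stratified split that preserves class proportions, sketched below. The 60/20/20 fractions, the seed, and the function name are assumptions; the source does not state the proportions used. Only the class sizes (275, 115, and 131) are taken from the text.

```python
import random

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/valid/test, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_valid = int(len(indices) * fractions[1])
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test

# Class sizes reported for the three groups of surveyed children.
labels = ["norm"] * 275 + ["asthma"] * 115 + ["bronchitis"] * 131
train, valid, test = stratified_split(labels)
```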

   Table 1 shows the characteristics of the constructed classifiers: values of precision, recall,
and F1-measure as the harmonic mean of the two.

Table 1
Primary characteristics of classifiers
        Classifier                 Precision                 Recall                F1-measure
      Naive Bayes                    0.783                   0.761                    0.772
   Logistic Regression               0.870                   0.823                    0.846
     Random Forest                   0.859                   0.844                    0.851
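
   The F1-measure column can be reproduced from the precision and recall columns. The short check below uses the values from Table 1; the function and variable names are illustrative.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported in Table 1.
table_1 = {
    "Naive Bayes": (0.783, 0.761),
    "Logistic Regression": (0.870, 0.823),
    "Random Forest": (0.859, 0.844),
}
f1_scores = {name: round(f1_measure(p, r), 3) for name, (p, r) in table_1.items()}
```

   The rounded values agree with the F1-measure column of Table 1.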
   As expected, the Naive Bayes Classifier performed the worst because of a natural interdependence
between the values of various biophysical signs. However, these dependencies have not yet been
formalized.
   In order to reduce errors of classifiers, two approaches were used:
        putting forward more stringent requirements for experimental conditions;
        balancing the studied classes.
   Since the experimental study of the moisture condensation of exhaled air was carried out in various
medical institutions and using different (albeit of the same type) equipment, it was decided to use only
those data that were obtained on equipment with completely identical characteristics. In addition, for
the “asthma” and “bronchitis” classes, patients who at the time of the examination had just been
admitted to the hospital and had not yet begun treatment were considered.
   To take into account all these factors, a database was developed, the diagram of which is shown in
Figure 3.


Figure 3: The patient survey database schema

    The database includes reference tables. For example, the table «surveymoment» contains
characteristics of the survey time in relation to the patient’s state: upon admission to the hospital,
immediately after treatment, after two weeks for adaptation after the treatment end, etc. The
characteristics of the survey conditions include temperature, relative humidity, time of patient
adaptation before the examination, etc. The values of the characteristics of the survey conditions have
different measurement units and can have the specified limits of the value range.
    “Sifting” the initial experimental data made it possible to improve the quality of the classification;
the obtained modified characteristics of the classifiers are shown in Table 2.

Table 2
Modified characteristics of classifiers
        Classifier                 Precision                  Recall                  F1-measure
       Naive Bayes                   0.814                    0.796                      0.805
   Logistic Regression               0.933                    0.918                      0.925
     Random Forest                   0.874                    0.862                      0.868

   The studied classes are not balanced, affecting the quality of the solution to the classification
problem. There are several approaches to overcome the imbalance problem, e.g., undersampling
(reduction of large classes) and oversampling (duplication of objects of small classes) to equalize the
proportions of the studied classes.
    The use of the undersampling approach leads to a decrease in the number of samples for the “norm”
class by more than 2 times, and for the class “bronchitis” by approximately 1.2 times, to match the
size of the minority class “asthma.” The oversampling approach can cause an overfitting problem
if duplicates of minority class objects fall into the Train, Valid, and Test sets. Therefore, the Synthetic
Minority Oversampling Technique (SMOTE) is an effective method: it interpolates between existing
objects of the minority class and expands the class with the resulting synthetic objects. The results of
applying the undersampling and oversampling approaches are shown in Figure 4.


Figure 4: Classification precision values for classifiers of different architectures

    The precision on all classifiers increased; the most significant increase in precision was obtained
after applying the SMOTE method.
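
    The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration rather than the implementation used in the study: the neighbour count, the seed, the function name, and the toy minority-class points are all assumptions.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()   # position on the segment between base and its neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy minority-class points (random placeholders, not patient data).
rng = random.Random(2)
minority = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
synthetic = smote(minority, n_new=15)
```

    Because every synthetic object lies on a segment between two real minority objects, the method avoids exact duplicates, which is the overfitting risk noted for plain oversampling.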
    The received results were validated with the multi-class cross-entropy loss calculated as:
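\[ L(X_i, Y_i) = -\sum_{j=1}^{c} y_{ij} \log(p_{ij}), \tag{2} \]
where \(Y_i = (y_{i1}, y_{i2}, \dots, y_{ic})\) is a one-hot encoded vector,
\[ y_{ij} = \begin{cases} 1 & \text{if the } i\text{th element is in class } j, \\ 0 & \text{otherwise,} \end{cases} \]
and \(p_{ij}\) is the probability that the \(i\)th element is in class \(j\).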
     The validation confirmed the results described above.
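
     A minimal sketch of the loss in Eq. (2) for a single element is shown below. The small constant that guards against taking the logarithm of zero, the function name, and the example probabilities are assumptions added for illustration.

```python
import math

def cross_entropy(y_onehot, probabilities, eps=1e-12):
    """Multi-class cross-entropy for one element, following Eq. (2)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probabilities))

# Hypothetical element belonging to the second of three classes.
y = [0, 1, 0]
p = [0.1, 0.8, 0.1]
loss = cross_entropy(y, p)   # only the true-class term contributes
```

     Because the target vector is one-hot, the loss reduces to the negative logarithm of the probability assigned to the true class, so confident correct predictions give values near zero.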

5. Discussion
   Making a diagnosis is extremely important and risky since the patient’s health, and sometimes life,
depends on its correctness. Therefore, the informational support of the doctor in the process
of making a diagnosis is valuable.
   The improvement of pulmonological diagnostic tools and the transition to laser correlation
spectroscopy in studies of asthma and bronchitis added 32 biophysical indicators to those available
for analysis.
   To improve the classification accuracy, the following procedures were carried out:
       selection of the classifier providing the best values of Precision, Recall and F1-measure;
       “sifting” the obtained experimental data;
       balancing classes.
    When choosing a classifier, logistic regression showed the highest precision on the initial data
(precision=0.870, recall=0.823, and F1-measure=0.846).
    When “sifting” the data, the results of the following surveys were excluded:
        obtained on equipment with characteristics that differ from standard ones (laser power,
    centrifuge speed, etc.);
        obtained under incomplete observance of the experimental conditions (temperature, relative
    humidity of the room, etc.);
        obtained during the examination of patients who have already started treatment or have
    undergone a course of treatment (for class “asthma”).
    When balancing classes, the best results were shown by the oversampling approach using the
SMOTE method. This made it possible to improve the classification precision up to 96%.
    Let us analyze possible errors in the operation of the obtained classifiers. Regardless of the chosen
method (Naive Bayes, Logistic Regression, or Random Forest), errors of the first and second kind are
present during the classification.
    Errors of the first kind correspond to situations in which the examination showed the presence of a
disease, although the person is healthy (regarding diseases such as asthma and bronchitis). Since the
proposed classifiers do not replace the doctor’s decision but provide him with additional helpful
information to help make a diagnosis, additional examinations and analyses can reveal errors of the first
kind (“false alarm”).
    Errors of the second kind correspond to “missing the target” and are traditionally much more severe.
In this case, the sick patient is diagnosed as healthy, and with the screening form of examination, he
may not be provided with primary treatment.
    The estimation of errors of the second kind when using the classifier based on logistic regression
showed 7% errors before the procedures to improve the accuracy and 4% after these procedures, which
is a good result.
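
    The two kinds of error discussed above can be counted from true and predicted labels as sketched below. The label lists are hypothetical, and normalising by the number of healthy and of sick subjects is an assumption, since the source does not state how its percentages were computed.

```python
def error_rates(true_labels, predicted, healthy="norm"):
    """Return (first-kind rate, second-kind rate) relative to the healthy class."""
    pairs = list(zip(true_labels, predicted))
    n_healthy = sum(1 for t, _ in pairs if t == healthy)
    n_sick = len(pairs) - n_healthy
    # First kind ("false alarm"): a healthy subject is classified as sick.
    false_alarms = sum(1 for t, p in pairs if t == healthy and p != healthy)
    # Second kind ("missing the target"): a sick subject is classified as healthy.
    missed = sum(1 for t, p in pairs if t != healthy and p == healthy)
    return false_alarms / n_healthy, missed / n_sick

# Hypothetical labels for ten subjects (illustration only).
true_labels = ["norm"] * 4 + ["asthma"] * 3 + ["bronchitis"] * 3
predicted = ["norm", "norm", "norm", "asthma",
             "asthma", "asthma", "norm",
             "bronchitis", "asthma", "bronchitis"]
first_kind, second_kind = error_rates(true_labels, predicted)
```

    Confusing asthma with bronchitis is a misclassification but counts as neither kind of error here, because in both cases the subject is still flagged as sick.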

6. Conclusion
   The article is the first to apply the method of laser correlation spectroscopy, which allows studying
the moisture condensate of the exhaled air by the values of 32 biophysical indicators. Based on the
values of these indicators, asthma and bronchitis were diagnosed in children aged 6 to 10
years.
   The traditional approach to analyzing indicators is based on determining the indicators with the
greatest separating ability and comparing the patient’s data for these indicators with the “reference”
ones. However, this approach has many disadvantages. Firstly, with the traditional approach, the
comparison is carried out for 3-6 separate indicators. Secondly, whether the separating ability of
individual indicators depends on the age of the patients and the region in which they live has not been
investigated today. Therefore, most of the indicators remain unused in the diagnostic process. At the
same time, it is known that the quality of a solution is determined mainly by the amount of information
used to obtain it.
   The article examined the methods of Naive Bayes, Logistic Regression, and Random Forest as
applied to the diagnosis of asthma and bronchitis. The classifier based on logistic regression
showed the highest classification precision, 96%, compared with up to 85% in previous studies.
At the same time, the share of errors of the second kind was reduced to 4%, i.e., by more than
three times. The use of classification methods for solving pulmonological
diagnostics made it possible to consider all the available data obtained by laser correlation spectroscopy.
   Studies have shown that using classification methods for biophysical indicators for diagnosing
pulmonary diseases allows obtaining results with high accuracy, which improves the quality of
diagnosis. Automation of the classification process provides fast, objective results.
   Data visualization showed that the classes of diseases do not have a pronounced boundary. Still,
they are clearly separated from the normal state. Therefore, a further direction of this work is to study
the applicability of fuzzy classification methods for solving pulmonological diagnostics.
7. References
[1] V. Majhi, S. Paul, R. Jain, Bioinformatics for Healthcare Applications, in: Proceedings of the 2019
    Amity International Conference on Artificial Intelligence (AICAI), IEEE 2019, pp. 204–207.
    doi:10.1109/AICAI.2019.8701277.
[2] N. M. Luscombe, D. Greenbaum, M. Gerstein, What is bioinformatics? An introduction and
     overview, Yearbook of Medical Informatics 10.01 (2001) 83–100. doi:10.1055/s-0038-1638103.
[3] J. Xiong, Essential Bioinformatics, Cambridge University Press, 2006.
[4] K. Gobalan, J. Ahamed, Applications of Bioinformatics in Genomics and Proteomics, Journal of
     Advanced Applied Scientific Research 1.3 (2016) 29–42.
[5] D. F. Gomez-Casati, M. V. Busi, J. Barchiesi, D. A. Peralta, N. Hedin, V. Bhadauria, Applications
     of Bioinformatics to Plant Biotechnology, Current Issues in Molecular Biology 27 (2018) 89–104.
     doi:10.21775/cimb.027.089.
[6] S. K. Gill, A. F. Christopher, V. Gupta, P. Bansal, Emerging role of bioinformatics tools and
     software in evolution of clinical research, Perspectives in Clinical Research 7.3 (2016) 115–122.
     doi:10.4103/2229-3485.184782.
[7] M. Younus Wani, N. A. Ganie, S. Rani, S. Mehraj, M. R. Mir, M. F. Baqual, K. A. Sahaf, F. A.
     Malik, K. A. Dar, Advances and applications of Bioinformatics in various fields of life,
     International Journal of Fauna and Biological Studies 5.2 (2018) 03–10.
[8] F. Wang, X. Li, J. T. L. Wang, S. Ng, Guest Editorial: Special Section on Biological Data
     Mining and Its Applications in Healthcare, IEEE/ACM Transactions on Computational Biology
     and Bioinformatics 14.3 (2017) 501–502. doi:10.1109/TCBB.2016.2612558.
[9] H. Alinejad-Rokny, E. Sadroddiny, V. Scaria, Machine learning and data mining techniques for
     medical complex data analysis, Neurocomputing 276 (2018). doi:10.1016/j.neucom.2017.09.027.
[10] Y. Lin, F. Qian, L. Shen, F. Chen, J. Chen, B. Shen, Computer-aided biomarker discovery for
     precision medicine: data resources, models and applications, Briefings in Bioinformatics 5.3
     (2019) 952–975. doi:10.1093/bib/bbx158.
[11] J. Finkelstein, I. C. Jeong, Machine learning approaches to personalize early prediction of asthma
     exacerbations, Annals of the New York Academy of Sciences 1387.1 (2017) 153–165.
     doi:10.1111/nyas.13218.
[12] H. S. Porieva, K. O. Ivanko, C. I. Semkiv, V. I. Vaityshyn, Investigation of Lung Sounds Features
     for Detection of Bronchitis and COPD Using Machine Learning Methods, Visnyk NTUU KPI
     Seriia-Radiotekhnika Radioaparatobuduvannia 84 (2021) 78–87.
[13] E. C. Oelsner, L. R. Loehr, A. G. Henderson, K. M. Donohue, P. L. Enright, R. Kalhan, C. M. Lo
     Cascio, A. Ries, N. Shah, B. M. Smith, W. D. Rosamond, Classifying Chronic Lower Respiratory
     Disease Events in Epidemiologic Cohort Studies, Annals of the American Thoracic Society 13.7
     (2016) 1057–1066. doi:10.1513/AnnalsATS.201601-063OC.
[14] A. Kantar, Phenotypic presentation of chronic cough in children, Journal of Thoracic Disease 9.4
     (2017) 907–913. doi:10.21037/jtd.2017.03.53.
[15] M. Pokorski, Chest Radiography in Children Hospitalized with Bronchiolitis, Pulmonology 1222
     (2019) 55–62. doi:10.1007/5584_2019_435.
[16] M. Shafiq, H. Lee, L. Yarmus, D. Feller-Kopman, Recent Advances in Interventional
     Pulmonology, Annals of the American Thoracic Society 16.7 (2019) 786–796.
     doi:10.1513/AnnalsATS.201901-044CME.
[17] M. Heching, D. Rosengarten, D. Shitenberg, O. Shtraichman, N. Abdel-Rahman, A. Unterman, M.
     R. Kramer, Bronchoscopy for Chronic Unexplained Cough Use of Biopsies and Cultures Increase
     Diagnostic Yield, Journal of Bronchology & Interventional Pulmonology 27.1 (2020) 30–35.
     doi:10.1097/LBR.0000000000000629.
[18] P. Shende, J. Vaidya, Y. A. Kulkarni, R. S. Gaud, Systematic approaches for biodiagnostics using
     exhaled air, Journal of Controlled Release 268 (2017) 282–295.
     doi:10.1016/j.jconrel.2017.10.035.
[19] E. Nepomnyashchaya, E. Velichko, E. Aksenov, T. Bogomaz, Laser Correlation Spectroscopy as
     a Powerful Tool to Study Immune Responses, in: Proceedings Volume 10685, Biophotonics:
     Photonic Solutions for Better Health Care VI, SPIE Photonics Europe, 2018, Strasbourg, France,
     2018. doi:10.1117/12.2307241.
[20] N. Komlevaya, A. Komlevoy, K. Chernega, Designing of the specialized computer system for
     making pulmonology diagnosis, in: Proceedings of the 8th International Conference of
     Programming UkrPROG’2014, Kyiv, Ukraine, 2014, pp. 253–263.
[21] N. O. Komleva, K. S. Cherneha, B. I. Tymchenko, O. M. Komlevoy, Intellectual Approach
     Application for Pulmonary Diagnosis, IEEE First International Conference «Data Stream Mining
     & Processing», Lviv Ukraine, 2016, pp. 48–52.
[22] O. M. Komlevoi, Yu. I. Bazhora, inventors; Odessa State Medical University, patent owner.
     Device for collecting moisture condensate from exhaled air. Ukraine patent UA № 47117. Filed
     November 6, 2009, issued January 1, 2010.
[23] N. Boyko, K. Boksho, Application of the Naive Bayesian Classifier in Work on Sentimental
     Analysis of Medical Data, in: Proceedings of the 3rd International Conference on Informatics &
     Data-Driven Medicine, Växjö, Sweden, 2020, pp. 230–239.
[24] S. R. B. Shree, H. S. Sheshadri, Diagnosis of Alzheimer’s disease using Naive Bayesian Classifier,
     Neural Computing & Applications 29.1 (2018) 123–132. doi:10.1007/s00521-016-2416-3.
[25] B. Bratic, V. Kurbalija, M. Ivanovic, I. Oder, Z. Bosnic, Machine Learning for Predicting
     Cognitive Diseases: Methods, Data Sources and Risk Factors, Journal of Medical Systems 42.12
     (2018). doi:10.1007/s10916-018-1071-x.
[26] E. Y. Boateng, D. A. Abaye, A Review of the Logistic Regression Model with Emphasis on
     Medical Research, Journal of Data Analysis and Information Processing 7 (2019) 190–207.
     doi:10.4236/jdaip.2019.74012.
[27] N. Esener, A. M. Guerra, K. Giebel, D. Lea, M. J. Green, A. J. Bradley, T. Dottorini, Mass
     spectrometry and machine learning for the accurate diagnosis of benzylpenicillin and multidrug
     resistance of Staphylococcus aureus in bovine mastitis, PLOS Computational Biology 17.6 (2021).
     doi:10.1371/journal.pcbi.1009108.
[28] P. Schober, T. R. Vetter, Logistic Regression in Medical Research, Anesthesia & Analgesia
     132.2 (2021) 365–366. doi:10.1213/ANE.0000000000005247.
[29] S. Kaur, A. Abdullah, N. N. Hairi, S. K. Sivanesan, Logistic Regression Modeling to Predict
     Sarcopenia Frailty among Aging Adults, International Journal of Advanced Computer Science and
     Applications 12.8 (2021) 497–504.
[30] Z. Khan, A. Gul, A. Perperoglou, M. Miftahuddin, O. Mahmoud, W. Adler, B. Lausen, Ensemble
     of optimal trees, random forest and random projection ensemble classification, Advances in Data
     Analysis and Classification 14.1 (2020) 97–116. doi:10.1007/s11634-019-00364-9.
[31] Y. Tian, J. Yang, M. Lan, T. Zou, Construction and analysis of a joint diagnosis model of random
     forest and artificial neural network for heart failure, AGING-US 12.24 (2020) 26221–26235.
     doi:10.18632/aging.202405.
[32] D. Petkovic, R. Altman, M. Wong, A. Vigil, Improving the explainability of Random Forest
     classifier – user centered approach, Pacific Symposium on Biocomputing 23 (2018) 204–215.