<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">0933-3657</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26583/sv.9.4.08</article-id>
      <title-group>
        <article-title>Analysis of Biomedical Data in Order to Construct Intelligent Analytical Models for Assessing the Risk of Cancer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry Lagerev</string-name>
          <email>lagerevdg@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anton Korsakov</string-name>
          <email>korsakov_anton@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alena Zakharova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bryansk State Technical University</institution>
          ,
          <addr-line>50 let Oktyabrya Ave., 7, 241035 Bryansk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Control Science of Russian Academy of Sciences</institution>
          ,
          <addr-line>65, Profsoyuznaya st., Moscow, 117997</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>104</volume>
      <fpage>78</fpage>
      <lpage>88</lpage>
      <abstract>
        <p>This article substantiates the need to use data from a patient's integrated electronic medical record to assess the risk of cancer. An exploratory analysis is carried out on the integrated electronic medical records of patients in the Bryansk region who were diagnosed with a malignant neoplasm. The influence of the patient's age on the risk of oncological diseases is evaluated using the nosologies C50 and C61 as examples. An overview of the capabilities and limitations of AutoML libraries is provided. The article describes the results of constructing models for assessing the risk of oncological diseases with the ML.NET and Auto-WEKA libraries. It is concluded that such models cannot be constructed from integrated electronic medical record data using AutoML libraries without preliminary preparation and preprocessing of the data. Since a separate model must be constructed for each nosology and these models must be retrained regularly, it is advisable to develop an add-on over the AutoML libraries that extracts the data of the integrated electronic medical record and converts it into a form suitable for analysis. In addition, to improve model quality, it is advisable to use patient history data, data obtained by vectorizing laboratory tests, aggregated data on visits to medical specialists and related diagnoses, data from online patient questionnaires filled out during medical examination, and data on environmental pollution.</p>
      </abstract>
      <kwd-group>
        <kwd>Data mining</kwd>
        <kwd>constructing data mining models</kwd>
        <kwd>exploratory analysis</kwd>
        <kwd>auto ML</kwd>
        <kwd>biomedical data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Data mining (DM) models are increasingly used in various fields of activity.
Interest is growing in their use in medicine and in healthcare management
problems [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. A prerequisite for this was the informatization of regional healthcare,
which began in earnest in the Russian Federation in 2012 and was largely completed by 2018 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Today, medical information systems operate in all constituent entities of the Russian Federation;
they automate the work of medical organizations and the primary
input of data. Regional information systems integrate these medical
information systems and consolidate data from the entire region in a single
data warehouse. At the federal level, the regional information systems are united by the Unified State
Information System in the field of health care, which currently supports 13 services. One of the major
services is the Federal Integrated Electronic Medical Record, a subsystem of the Unified
System designed to collect, systematize and process structured depersonalized information about persons
who receive medical care.
      </p>
      <p>2021 Copyright for this paper by its authors.</p>
      <p>To date, the regional information system "Electronic medicine" of the Bryansk region has
accumulated enough biomedical data to begin constructing DM models
that, given an appropriate level of accuracy, can support management
decisions in health care at the regional level and at the level of individual medical
organizations. However, DM models are not yet used in regional
health care management in the Russian Federation, for the following reasons:</p>
      <list list-type="bullet">
        <list-item><p>The complexity of constructing DM models.</p></list-item>
        <list-item><p>The high cost of data scientists and the small number of such specialists in the regions.</p></list-item>
        <list-item><p>The inability to guarantee the required level of accuracy and quality control for all models.</p></list-item>
        <list-item><p>The need to construct an extremely large number of DM models for comprehensive automation
of healthcare management, owing to the number of different nosologies and the impossibility
of constructing a single model for all of them. The International Classification of Diseases and
Related Health Problems, Tenth Revision (ICD-10) describes about 14,400
different nosologies, more than a thousand of which are malignant or benign neoplasms.</p></list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>2. Relevance of using DM methods for assessing the degree of cancer risk in patients</title>
      <p>
        According to the latest GLOBOCAN 2020 estimates of the International Agency for Research on Cancer (IARC) of the
World Health Organization (WHO) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the worldwide burden of malignant neoplasms
(MN) rose to 19.3 million new cases and 10.0 million deaths in 2020.
One in five people in the world will develop cancer in their lifetime, and one in eight men and one in
11 women will die from the disease. The 10 most common types of cancer account for more than 60%
of newly diagnosed cases and more than 70% of deaths. Breast cancer is the most commonly diagnosed cancer
worldwide (11.7% of all new cases), followed by lung cancer (11.4%), colorectal
cancer (10.0%), prostate cancer (7.3%) and stomach cancer (5.6%) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Lung cancer is the leading
cause of cancer death (18.0% of total cancer deaths), followed by colorectal cancer (9.4%), liver cancer
(8.3%), stomach cancer (7.7%) and breast cancer in women (6.9%).
      </p>
      <p>
        According to forecasts [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], about 28.4 million new cancer cases will be registered worldwide in
2040, which is 47% more than in 2020. Countries with economies in transition and countries
with a low or medium human development index will see the largest relative
increases in cancer incidence by 2040 (95% and 64% compared to 2020, respectively).
      </p>
      <p>
        Some scientists attribute the increase in cancer incidence primarily to improved
diagnostics, as happened in the United States in the 1990s with the introduction
of the prostate-specific antigen (PSA) test for prostate cancer [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Others note that coverage by high-quality cancer registries is far from universal, at only about 15% [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which increases the relevance
of using DM models to identify risk groups. According to the P.A. Herzen Scientific Research Institute of Oncology
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the prevalence of cancer in Russia in 2019 was 2675 per 100,000 population, which is 41% higher
than the level of 2009 (1897 per 100,000 population). The growth of this indicator is due to both an
increase in the incidence and detection rate and an increase in the survival rate of cancer patients.
      </p>
      <p>
        One of the main indicators that determine the prognosis of an oncological disease is the
extent of the tumor process at the time of detection. In Russia in 2019, 57.5% of malignant
neoplasms were diagnosed in stages I and II (32.3% in stage I and 25.2% in stage II)
and 42.5% in stages III and IV (17.6% in stage III and 24.9% in stage IV) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], while in 2009 more
than 50% of newly diagnosed cancers were diagnosed in stages III-IV [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Early detection of malignant neoplasms (stages I-II) is therefore an extremely important measure
of how well the health care system functions, both for forming risk
groups during population screening and for preserving the health and quality of life of the
population. An important factor hindering the timely detection of oncological diseases is the lack of
clear criteria that would allow doctors to identify a risk group during clinical examination and
to order a set of additional examinations for patients in this group. It seems promising to use
a set of data mining models to assess each patient's degree of risk during
clinical examination and to inform the doctor of it.
This would, on the one hand, increase detection rates and, on the
other hand, avoid overloading the system of functional and laboratory diagnostics.</p>
      <p>
        The risk of malignant neoplasms depends on many exogenous and
endogenous factors, which a doctor can hardly take into account in full. According to the
literature [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], IARC and WHO [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], the main risk factors for cancer include tobacco use,
alcohol, unhealthy diet, lack of physical activity, overweight, hereditary predisposition, and chemical
(polycyclic aromatic hydrocarbons, dioxins, pesticides, aflatoxins, arsenic, formaldehyde, nickel,
asbestos, cadmium, and many others), physical (ionizing and ultraviolet radiation) and biological
(infections caused by viruses, bacteria or parasites) environmental carcinogens. It should also be noted
that the upward trend in cancer incidence worldwide may reflect a general
increase in the genetic load in human populations due to growing chemical and radiation
contamination of the biosphere by “global” and “eternal” pollutants [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Moreover, the significance of these factors differs between
cancer types, which makes it expedient to construct a
separate DM model for each type of neoplasm.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Exploratory analysis results</title>
      <p>
        The use of technologies for the analysis of biomedical data opens up new opportunities in the
development of medical information systems and improving the quality of healthcare management [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
For the study, we used depersonalized biomedical data contained in the regional information system
"Electronic medicine" of the Bryansk region. The data cover the period from 2009 to March 2021 and
contain fragments of patients' integrated electronic medical records.
      </p>
      <p>The data volume is about 100 gigabytes. The database contains 304 tables with a total of 2062 fields,
which, in the absence of documentation, makes reverse engineering the schema and extracting a coherent
data set for analysis a non-trivial task. A total of 53,182,278 records of doctors' appointments and
hospitalizations of patients were extracted from the database; of these, 1,773,148 records
containing diagnoses with International Classification of Diseases codes
for oncological diseases (ICD C00-C97) were selected for exploratory analysis.</p>
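      <p>The selection step described above can be illustrated with a minimal sketch. The record layout (patient identifier, visit date, ICD code) and the function names are assumptions made for illustration only; the actual schema of the "Electronic medicine" database is not documented in this paper.</p>
      <preformat preformat-type="code">
```python
import re

# Hypothetical record layout: (patient_id, visit_date, icd_code).
# The leading letter C followed by two digits identifies the ICD-10 category.
ICD_PATTERN = re.compile(r"^C(\d{2})")


def is_malignant(icd_code):
    """Return True when the code falls in the ICD-10 block C00-C97."""
    match = ICD_PATTERN.match(icd_code.strip().upper())
    return match is not None and int(match.group(1)) in range(98)


def select_oncology(records):
    """Keep only visits whose diagnosis code lies in C00-C97."""
    return [record for record in records if is_malignant(record[2])]


# Invented sample visits, not taken from the study data.
visits = [
    (1, "2015-03-02", "C50.4"),
    (1, "2015-04-11", "C50.4"),
    (2, "2017-09-20", "J06.9"),
    (3, "2019-01-15", "C61"),
    (4, "2020-06-30", "D05.1"),
]
oncology_visits = select_oncology(visits)
```
      </preformat>
      <p>In this sketch the filter works on the code prefix only, so subcategories such as C50.4 are kept together with their parent category.</p>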
      <p>
        To perform exploratory data analysis, a set of dashboards was developed on the Microsoft Power BI
platform (Fig. 1-11) using various visualization metaphors [
        <xref ref-type="bibr" rid="ref15">15, 16</xref>
        ]. The analysis used only the
cases in which a patient was diagnosed for the first time, ignoring repeated visits for
the same reason. There were 195,634 such records. Fig. 1 shows a matrix in which the rows are
nosologies and the columns are the age at which the patient was diagnosed with cancer. Each cell
gives the number of patients of that age with that diagnosis.
The closer the background color is to red, the closer the count is to the maximum; the closer the
background color is to blue, the closer the count is to the minimum. Fig. 2 shows a map of the
number of patients by the stage at which neoplasms were diagnosed,
for each district of the Bryansk region. These data are incomplete, since the locality of residence,
encoded in the database in the FIAS
format (KLADR), could not be determined correctly for all patients. This issue requires further study. Fig. 3 plots the number of
patients (195,634) diagnosed with cancer (Y axis) against the age (X axis) at which
the diagnosis was made, for all types of oncology (ICD C00-C97).
      </p>
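      <p>A minimal sketch of the two operations behind Fig. 1, keeping only first-time diagnoses and counting patients per nosology and age, is given below. The row layout and the sample values are invented for illustration; the dashboards themselves were built in Microsoft Power BI, not with this code.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical rows: (patient_id, visit_date in ISO format, ICD category, age at visit).
visits = [
    (1, "2015-04-11", "C50", 52),
    (1, "2015-03-02", "C50", 52),
    (2, "2018-07-19", "C61", 67),
    (2, "2019-02-01", "C61", 68),
    (3, "2020-10-05", "C50", 52),
]


def first_diagnoses(rows):
    """Keep the earliest visit for every (patient, nosology) pair."""
    earliest = {}
    for row in rows:
        key = (row[0], row[2])
        # ISO-formatted dates compare correctly as plain strings.
        if key not in earliest or earliest[key][1] > row[1]:
            earliest[key] = row
    return list(earliest.values())


def nosology_age_matrix(rows):
    """Count patients in every (nosology, age) cell of the matrix."""
    return Counter((row[2], row[3]) for row in rows)


first = first_diagnoses(visits)
matrix = nosology_age_matrix(first)
```
      </preformat>
      <p>Each key of the resulting counter corresponds to one cell of the matrix; a heat-map coloring such as the one described for Fig. 1 can then be derived from the minimum and maximum counts.</p>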
      <p>In Fig. 4, the X axis shows the patient's age at diagnosis quantized into 10-year intervals,
and the color indicates the stage at which the cancer was detected. For ICD C56
(malignant neoplasm of the ovary), almost half of the cases are detected only at the third stage, which is
quite late, since the prognosis for treatment at stages 3 and 4 is much worse than at stages 1 or 2.</p>
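      <p>The 10-year quantization and the per-stage counts behind a chart such as Fig. 4 can be sketched as follows. The sample cases are invented and the function names are illustrative assumptions, not part of the authors' tooling.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical first-diagnosis rows: (age at diagnosis, stage 1-4).
cases = [(34, 1), (47, 3), (52, 3), (58, 2), (61, 3), (66, 4), (73, 1)]


def decade(age):
    """Quantize an age into the lower bound of its 10-year interval."""
    return (age // 10) * 10


def stage_counts_by_decade(rows):
    """Return {decade: {stage: count}}, the data for a stacked bar chart."""
    table = defaultdict(lambda: defaultdict(int))
    for age, stage in rows:
        table[decade(age)][stage] += 1
    return {d: dict(stages) for d, stages in sorted(table.items())}


def stage_share(rows, stage):
    """Fraction of all cases detected at the given stage."""
    return sum(1 for _, s in rows if s == stage) / len(rows)


by_decade = stage_counts_by_decade(cases)
late_share = stage_share(cases, 3)
```
      </preformat>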
      <p>Exploratory data analysis revealed an unusual dependence of the number
of diagnosed cancers on the patient's age and the year of diagnosis. Fig. 5 shows the
number of diagnosed cases as a function of the patient's age, over all years, for ICD C50 "Malignant
neoplasm of breast tissue" (25,195 patients) and ICD C61 "Malignant neoplasm of the prostate gland"
(6,884 patients). As mentioned above, these are among the most common diagnoses.
Fig. 5 shows that the age distribution is fairly close to normal for both
nosologies, with a small additional peak on the right side.</p>
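      <p>One way to examine how the age distribution depends on the year of diagnosis is to summarize it separately for each year. The sketch below does this with invented data; the choice of summary statistics (modal age, mean and standard deviation) is an assumption made for illustration and is not the authors' stated procedure.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical first-diagnosis rows for one nosology: (year of diagnosis, age).
cases = [
    (2010, 58), (2010, 60), (2010, 60), (2010, 63),
    (2020, 64), (2020, 66), (2020, 66), (2020, 70),
]


def age_profile_by_year(rows):
    """Summarize the age distribution separately for each diagnosis year."""
    ages = defaultdict(list)
    for year, age in rows:
        ages[year].append(age)
    profile = {}
    for year, values in sorted(ages.items()):
        # The most frequent age serves as a simple estimate of the peak.
        peak_age, _ = Counter(values).most_common(1)[0]
        profile[year] = {
            "peak_age": peak_age,
            "mean_age": mean(values),
            "std_age": pstdev(values),
        }
    return profile


profile = age_profile_by_year(cases)
peak_shift = profile[2020]["peak_age"] - profile[2010]["peak_age"]
```
      </preformat>
      <p>A non-zero difference between the peak ages of two years would indicate that the distribution moves over time, which is the kind of effect a risk model would have to account for when it is retrained.</p>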
    </sec>
    <sec id="sec-4">
      <title>4. Using AutoML for the Analysis of Biomedical Data</title>
      <p>Automated machine learning (AutoML) methods, which construct data mining models suitable for both
exploratory analysis and decision making in an automated mode, are gaining popularity in the analysis
(both exploratory and conventional) of biomedical data [17, 18].</p>
      <p>In [17], the authors showed that AutoML methods can be used to analyze biomedical data
and improve the productivity of specialists in solving a number of machine learning problems. They also
note that the main limitation of AutoML at present is inefficient operation on large datasets.</p>
      <p>Fig. 13 shows a typical workflow for using AutoML to construct data mining models. The
scheme is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology
[19]. We reviewed the most popular tools that implement AutoML methods: the
cloud services Google AutoML, Azure AutoML and Amazon Web Services
AutoML, and the libraries Auto Keras, Auto-PyTorch, Auto-sklearn, Scikit-learn, Auto-WEKA, H2O
AutoML, TPOT, MLBox and ML.NET. All of these tools can construct a DM
model on a ready dataset in both automatic and automated modes. Many can select the
optimal method from a predetermined set, select the optimal parameters of the chosen method,
apply limited data transformations (normalization, quantization, etc.) and optimize the input data set
(remove insignificant fields). However, none of these libraries contains methods and tools that
select, in an automated mode, the data that are optimal for constructing a DM model and perform
their full preprocessing and transformation. Preparing training samples requires highly
qualified specialists and a lot of time when processing large databases and data warehouses, which
include sources of biomedical data.
</p>
      <p>[Figure 13. Typical workflow for using AutoML to construct data mining models. Steps shown: shaping a dataset for exploratory data analysis; data preparation and cleaning; shaping of the dataset for analysis; selection of data mining methods; hyperparameter optimisation; model validation; model deployment. Branch conditions: "the quality of the model is sufficient" and "the quality of the model is not sufficient".]</p>
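      <p>The core of what the reviewed tools automate, choosing a method from a predetermined set together with its parameters, can be illustrated with a toy search. The two "methods", the parameter grid and the data below are invented for illustration; real AutoML libraries use far richer search spaces and validation schemes, and none of the names here belong to their APIs.</p>
      <preformat preformat-type="code">
```python
from itertools import product

# Toy data: (age, label), where label 1 means the diagnosis is present. Values are invented.
train = [(35, 0), (42, 0), (48, 0), (55, 1), (61, 1), (67, 1), (72, 1), (39, 0)]
valid = [(33, 0), (45, 0), (58, 1), (64, 1), (50, 1), (41, 0)]


def majority_model(data, **_):
    """Baseline method: always predict the most frequent training label."""
    ones = sum(label for _, label in data)
    guess = 1 if 2 * ones >= len(data) else 0
    return lambda age: guess


def threshold_model(data, threshold):
    """Rule method: predict 1 when the age reaches a fixed threshold (no fitting)."""
    return lambda age: 1 if age >= threshold else 0


def accuracy(model, data):
    """Share of validation rows that the model labels correctly."""
    hits = sum(1 for age, label in data if model(age) == label)
    return hits / len(data)


# Predetermined set of methods, each with its own parameter grid.
SEARCH_SPACE = {
    "majority": (majority_model, {}),
    "threshold": (threshold_model, {"threshold": [40, 50, 60]}),
}


def auto_select(search_space, train_set, valid_set):
    """Try every method and parameter combination; keep the best on validation."""
    best = None
    for name, (builder, grid) in search_space.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = accuracy(builder(train_set, **params), valid_set)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


best_score, best_name, best_params = auto_select(SEARCH_SPACE, train, valid)
```
      </preformat>
      <p>The sketch starts from a ready dataset, which mirrors the limitation noted above: the search covers methods and parameters, not the extraction and preparation of the data.</p>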
      <p>Using the ML.NET [20] and Auto-WEKA [21] libraries to construct a model for assessing
the degree of risk of oncological diseases from the integrated electronic medical records of patients in
the Bryansk region did not produce a model of any value. The preprocessing methods
available in these libraries are intended for normalized data, so the
information available in the database could not be used in full. Most AutoML libraries focus on
neural network models, a significant drawback of which is the inability to explain and
substantiate the results obtained. To assess the risk of oncological diseases, it is advisable to use scoring
models based on logistic regression, which allow an expert to assess the contribution of each
factor to the result and to prescribe additional laboratory tests to the patient on firmer grounds.</p>
      <p>Since AutoML methods applied directly to the available dataset did not lead to significant results and
are unsuitable for replicating DM models, we have developed a technique for replicating DM models
based on AutoML methods. Fig. 14 shows the process of replicating the DM models for a number of
nosologies from a reference model. Before the process starts, a data scientist constructs a
reference model for one nosology according to the scheme shown in Fig. 11. The data scientist then sets
the parameters for replicating models: a list of ICD codes for which models must be constructed, the
acceptable data mining methods, a data set, a list of data preprocessing methods,
model optimality criteria and accuracy requirements. The Meta ML library generates parameters for
constructing a basic model for the i-th nosology from the parameters of the reference model.</p>
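      <p>The replication procedure outlined here, and detailed in the next paragraph and in Fig. 14, can be sketched as a pair of nested loops. Everything in the sketch is a hypothetical stand-in: the class and function names, the parameter dictionaries and the toy scoring function are assumptions for illustration and do not reproduce the interface of the Meta ML library or of any AutoML library.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class ReplicationSettings:
    """Parameters set by the data scientist before replication (illustrative names)."""
    icd_codes: list
    methods: list
    preprocessing: list      # carried along but not used by this toy example
    min_accuracy: float      # carried along but not used by this toy example
    k_variants: int = 3


@dataclass
class Model:
    icd: str
    params: dict
    score: float


def auto_ml_fit(icd, params):
    """Stand-in for an AutoML run; returns a deterministic toy score."""
    base = 0.70 + 0.01 * len(icd)
    bonus = 0.02 * params.get("variant", 0) if params["method"] == "logistic" else 0.0
    return Model(icd, params, round(base + bonus, 4))


def base_params(reference, icd):
    """Derive base-model parameters for one nosology from the reference model."""
    return {**reference, "icd": icd, "variant": 0}


def variant_params(base, settings, j):
    """Derive the j-th candidate parameter set from the base parameters."""
    method = settings.methods[j % len(settings.methods)]
    return {**base, "method": method, "variant": j}


def replicate(reference, settings):
    """Build and publish one model per nosology, starting from the reference model."""
    repository, published = [], {}
    for icd in settings.icd_codes:                      # i = 1..N
        base = auto_ml_fit(icd, base_params(reference, icd))
        repository.append(base)
        candidates = []
        for j in range(1, settings.k_variants + 1):     # j = 1..K
            model = auto_ml_fit(icd, variant_params(base.params, settings, j))
            repository.append(model)
            candidates.append(model)
        optimal = max(candidates, key=lambda m: m.score)
        # Publish whichever of the optimal and the base model scores higher.
        published[icd] = optimal if optimal.score > base.score else base
    return published, repository


reference_model = {"method": "logistic", "features": ["age", "sex"]}
settings = ReplicationSettings(["C50", "C61"], ["logistic", "tree"], ["normalize"], 0.7)
published, repository = replicate(reference_model, settings)
```
      </preformat>
      <p>Keeping every built model in the repository, rather than only the published one, makes it possible to review the candidates later without rerunning the search.</p>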
      <p>After that, the Auto ML library is used to construct the basic DM model.
Next, Meta ML generates K sets of parameters for constructing a set of DM models and prepares the data
sets for these models. The models are constructed using Auto ML, and all built models are
saved to the repository. When models have been built for all combinations of parameters and data sets, the
optimal model for the i-th nosology is selected from the repository and compared
with the base model. The better of the two is published. Model generation is
repeated for N nosologies. As a result, models are built from the reference model for all the
indicated nosologies.</p>
    </sec>
    <sec id="sec-conclusions">
      <title>5. Conclusions</title>
      <p>1. To fight socially significant diseases effectively, a personalized preventive approach to risk
assessment, early diagnosis of malignant neoplasms during screening, and targeted correction
of negative changes at the level of the individual, cohort and population are required.</p>
      <p>2. It seems promising to use data mining models to increase the likelihood of detecting malignant
neoplasms in the early stages during medical examination. Using biomedical data to
assess the risk of malignant neoplasms requires significant preprocessing and data
preparation, which cannot be performed with the existing AutoML libraries.</p>
      <p>3. Using various visual metaphors in exploratory analysis of biomedical data
helps to better understand the specifics of the data and to discover patterns, dependencies
and phenomena that must be taken into account when constructing data mining models.</p>
      <p>4. As the exploratory analysis shows, risk assessment with an acceptable level of accuracy requires
a separate predictive model for each nosology and regular adjustment (retraining) of these models,
at least once a year and preferably more often, to take the
patient's age into account correctly.</p>
      <p>5. Although the overall distribution of patients by age is close to normal, the
peak incidence shifts over time, which will have a significant impact on the estimated level of
risk. In all likelihood, this process obeys certain patterns, and it is extremely important to find them.
This will make it possible to predict the peak incidence for a specific year or period and to identify the degree
of risk of malignant neoplasms for different age groups.</p>
      <p>6. Since the total number of nosologies is large, a single model will have very mediocre
quality indicators, and constructing (and keeping up to date) a large number of models is very costly,
it is advisable to use the proposed method of replicating models and to develop a special library for
sampling and preprocessing data from integrated electronic medical records and for subsequently launching
AutoML methods and tools.</p>
      <p>[Figure 14. The process of building multiple models from a reference model. Steps shown: setting the parameters for constructing models (a list of ICD codes, admissible data mining methods, data set, a list of data processing methods, criteria for optimality of a model, requirements for accuracy); for i = 1, ..., N: generating the parameters for building a basic model for the i-th ICD code based on the reference model and saving the base model to the repository; for j = 1, ..., K: preparing a dataset for building the j-th model, generating the parameters for building the j-th model and saving the j-th model to the repository; extracting the optimal model for the i-th ICD code; publishing the optimal model if it is better, otherwise publishing the base model; viewing the report.]</p>
      <p>7. When constructing models for assessing the risk of malignant neoplasms, in addition to data
from the integrated electronic medical record, it is advisable to use patient history data, data obtained
by vectorizing laboratory tests, aggregated data on visits to medical specialists and related
diagnoses, data from online patient questionnaires filled in during medical
examination, as well as data on environmental pollution (chemical, physical and biological
contamination) and on employment in hazardous industries.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Acknowledgements</title>
      <p>The reported study was funded by RFBR, project number 19-07-00844.</p>
    </sec>
    <sec id="sec-6">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.M.</given-names>
            <surname>Gerget</surname>
          </string-name>
          ,
          <article-title>Bionic models for identification of biological systems</article-title>
          ,
          <source>Journal of Physics: Conference Series</source>
          <volume>803</volume>
          (
          <year>2017</year>
          )
          <elocation-id>012046</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1088/1742-6596/803/1/012046</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Danilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.P.</given-names>
            <surname>Skirnevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.M.</given-names>
            <surname>Gerget</surname>
          </string-name>
          ,
          <article-title>Segmentation of anatomical structures of the heart based on echocardiography</article-title>
          ,
          <source>Journal of Physics: Conference Series</source>
          <volume>803</volume>
          (
          <year>2017</year>
          )
          <elocation-id>012031</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1088/1742-6596/803/1/012031</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>Information technologies in healthcare of the Russian Federation</article-title>
          ,
          <source>Zdrav Expert</source>
          , 01.07.
          <year>2021</year>
          (in Russian), URL: https://zdrav.expert/index.php/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] World Health Organization.
          <source>International agency for research on cancer: Press release No. 292</source>
          ,
          15 December
          <year>2020</year>
          . URL: https://www.iarc.who.int/wp-content/uploads/2020/12/pr292_E.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.F.</given-names>
            <surname>Hankey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.J.</given-names>
            <surname>Feuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.X.</given-names>
            <surname>Clegg</surname>
          </string-name>
          et al.,
          <article-title>Cancer surveillance series: interpreting trends in prostate cancer - Part I: evidence of the effects of screening in recent prostate cancer incidence, mortality, and survival rates</article-title>
          .
          <source>J. Natl. Cancer Inst</source>
          .
          <year>1999</year>
          . vol.
          <volume>91</volume>
          . P.
          <fpage>1017</fpage>
          -
          <lpage>1024</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1093/jnci/91.12.1017</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Colombet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mery</surname>
          </string-name>
          ,
          <source>Cancer Incidence in Five Continents, Vol. XI (electronic version)</source>
          ,
          <publisher-loc>Lyon</publisher-loc>
          :
          <publisher-name>International Agency for Research on Cancer</publisher-name>
          ,
          <year>2018</year>
          . P.
          <fpage>67</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Kaprin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Starinsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.O.</given-names>
            <surname>Shahzadova</surname>
          </string-name>
          ,
          <article-title>The state of cancer care for the population of Russia in 2019</article-title>
          , Moscow, Scientific Research Institute of Oncology named after P.A. Herzen,
          <year>2020</year>
          (in Russian). - 239 p. URL: https://glavonco.ru/cancer_register/%D0%9F%D0%BE%D0%BC%D0%BE%D1%89%D1%8C%202019.pdf
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.I.</given-names>
            <surname>Chissov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Starinsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.V.</given-names>
            <surname>Petrova</surname>
          </string-name>
          ,
          <article-title>Malignant neoplasms in Russia in 2009: morbidity and mortality</article-title>
          , Moscow, Scientific Research Institute of Oncology named after P.A. Herzen,
          <year>2011</year>
          . - 259 p. (in Russian)
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Colditz</surname>
          </string-name>
          ,
          <article-title>Modifiable risk factors for cancer</article-title>
          ,
          <source>British Journal of Cancer</source>
          .
          <year>2004</year>
          .
          <volume>90</volume>
          (
          <issue>2</issue>
          ),
          <fpage>299</fpage>
          -
          <lpage>303</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1038/sj.bjc.6601509</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] Carcinogenesis: a guide, RAMS, Russian Cancer Research Center, Research Institute of Carcinogenesis, Moscow, Medicine,
          <year>2004</year>
          , 574 p.
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferlay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ervik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Colombet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piñeros</surname>
          </string-name>
          , et al.,
          <source>Global Cancer Observatory: Cancer Today</source>
          . Lyon: International Agency for Research on Cancer
          ,
          <year>2020</year>
          , URL: https://gco.iarc.fr/today
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <source>WHO cancer information</source>
          . 3 March
          <year>2021</year>
          , URL: https://www.who.int/news-room/fact-sheets/detail/cancer
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Yablokov</surname>
          </string-name>
          ,
          <article-title>On the concept of population load (review)</article-title>
          ,
          <source>Hygiene and Sanitation</source>
          ,
          <year>2015</year>
          , No.
          <issue>6</issue>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>14</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Zakharova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.G.</given-names>
            <surname>Lagerev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Podvesovskii</surname>
          </string-name>
          ,
          <article-title>Multi-level Model for Structuring Heterogeneous Biomedical Data in the Tasks of Socially Significant Diseases Risk Evaluation</article-title>
          , in: A.G. Kravets et al. (Eds.),
          <source>CIT&amp;DS 2019</source>
          , Communications in Computer and Information Science, Vol.
          <volume>1084</volume>
          , Springer Nature Switzerland AG,
          <year>2019</year>
          , pp.
          <fpage>461</fpage>
          -
          <lpage>473</lpage>
          . doi: 10.1007/978-3-030-29750-3_36
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Zakharova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Podvesovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Shklyar</surname>
          </string-name>
          ,
          <article-title>Visual and Cognitive Interpretation of Heterogeneous Data</article-title>
          ,
          <source>Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.</source>
          ,
          <volume>XLII-2/W12</volume>
          (
          <year>2019</year>
          )
          <fpage>243</fpage>
          -
          <lpage>247</lpage>
          . doi: 10.5194/isprs-archives-XLII-2-W12-243-2019
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>