Initial Data-Driven Model for Estimating
    Impact of Antihypertensive Drug Amount on
               Blood Pressure Lowering?

               Anna Semakova1[0000−0002−5858−5959] and Nadezhda
                       Zvartau1,2[0000−0001−6533−5950]
                  1
                  ITMO University, St. Petersburg 197101, Russia
                            a.a.semakova@gmail.com
     2
       Almazov National Medical Research Centre, St. Petersburg 197341, Russia
                          zvartau@almazovcentre.ru


        Abstract. Due to the increasing popularity of clinical decision support
        systems, the problem of personalized drug dose identification becomes
        more relevant and substantial. In this paper, the authors introduce a
        data-driven model designed to operate in this case. Current work com-
        prises general problem formulation, description of data and its prepro-
        cessing steps, model design overview, its first stage model tuning and
        training, evaluation metrics used to estimate the quality, achieved val-
        ues.

        Keywords: Digital healthcare · Personalized dose identification · Clas-
        sification algorithms · Electronic health records · Decision support sys-
        tems.


1     Introduction

Currently, decision support systems that consider patient characteristics are
gaining more popularity and impact on the process of treatment [1]. Individ-
ual treatment rule (ITR) that assigns an appropriate treatment to the specific
patient based on his/her characteristics is one of decision support systems im-
portant elements [2]. An individual dosage rule (IDR) can be considered as a
part of ITR. It maximizes the expected treatment outcome for each patient by
defining individual drug dosages.
    ITR was studied for various diseases, such as oncology [3] or genome-guided
therapy [4]. In this research, the ITR is constructed for patients with arterial
hypertension. A similar problem was considered in the case of antihypertensive
monotherapy, where the patients get treatment with a single drug class [5].

    Copyright c 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
?
    The reported study was funded by RFBR according to the research project
    #18-37-00441.
2       A. Semakova et al.

   This task of obtaining ITR was being solved in specific areas with such ap-
proaches as the Q-learning [6] and O-learning (Outcome Weighted Learning) [7]
algorithms, and statistical random-effects linear models [8].
   In this paper, the authors consider a supervised learning approach to obtain
ITR for personalized combined drug therapy.


2   Problem statement
Given the vector Xj (ti ) of patient profile j in the moment of time ti before
treatment as:                           n        o
                                           (h)
                              Xj (ti ) = xj (ti )                          (1)
Obtain the set of antihypertensive therapy:
                                       n              o
                                          (k,1) (k,2)
                        Yj (Xj (ti )) = (yj , yj ) ,                             (2)

where j = 1, m; i = 1, n; h = 1, p; k = 1, q and each drug yjk is a vector containing
a drug International Nonproprietary Name (INN) and optimal daily dosage. The
model, which authors propose in current paper, is designed to consist of three
data-driven submodels: the first model receives a vector of patient features as an
input and predicts an optimaln drugs o count (nopt ), the second model extends the
                                        nopt
                                (k,1)
result specifying drug INNs yj        and the third model defines the desired
                              n k=1onopt               n               onopt
                                (k,2)                      (k,1) (k,2)
daily dosages of each drug INN yj        resulting with (yj , yj )           .
                                               k=1                              k=1


3   Data description
The data used in this study were collected from 2010 to 2015, depersonalized
and provided by Almazov National Medical Research Centre.
    In the work, 16 features are grouped into a vector describing the patient
profile. They are the following: age, sex, body mass index (BMI), systolic and
diastolic blood pressure before treatment, smoking status, impaired glucose tol-
erance (IGT), left ventricular hypertrophy (LVH), chronic heart failure (CHF),
ischemic heart disease (IHD), dyslipidemia, diabetes, microalbuminuria, cardio-
vascular diseases (CVD) among relatives, chronic kidney disease stage (CKD),
and combination of concomitant drugs.
    In the authors’ previous study, eight patient clusters were been identification
based on the feature tuples. The patient groups accept the clinical interpreta-
tion. As time passes, a patient profile will be changing due to disease develop-
ment and ageing patient. It means that the same patient can belong to various
groups depending on the disease dynamics at different points in time. To prob-
abilistic model the arterial hypertension development in a patient, we use a
M arkov chain of transition from one cluster to another cluster. Therefore, the
dynamic model of the hypertensive patient transitions process from cluster i
(i = 1, 8) to cluster j (j = 1, 8) is presented as a graph, where the nodes are
      Impact of Antihypertensive Drug Amount on Blood Pressure Lowering                                                                               3

clusters and the edges are transition probabilities Pij (Fig. 1). It’s noted, the
probabilities Pij are determined in a way that the transition probabilities sum
is equal to one. Also, the patient condition will be able to remain the same then
the patient will transition to the same cluster. These transitions are presented
as a loop on between groups graph in Fig. 1. However, the certain clusters are
incompatible for relative transitions, in particular, due to gender characteristics
and/or the chronic concomitant diseases.


                                                                          0.05

                                                                          0.03

                                                       0.02                        0.07

                                                                                   0.10


                                                                          0.10

                                                                0.08

                                                                0.04

                                                                                             0.01
                                                                                             0.08
                                                                                                                                            0.15
                                                                          0.09                        0.55               0.09
                                                                                                                         0.28             Cluster 3
                                                                          0.03
                                                                                                    Cluster 5
                                                                                   0.08
                                                                                                                0.03
                                                                                                                                   0.07
                          0.15               0.14                                                               0.08
                                                                                   0.11
                                                                                                                                   0.19
                                    0.10   Cluster 8                                                                     0.19
                          0.20                                                     0.10
                                    0.03                        0.16
                                                                                                                       Cluster 4
                        Cluster 2            0.05                                  0.11      0.09
                                                       0.11     0.06
                 0.15     0.08                                                               0.04
                                             0.18
                                                       0.32                                           0.03
                                                                                             0.17
                 0.12                                           0.35
       0.22                                                                                  0.19
                                    0.30               0.23   Cluster 7   0.16     0.31               0.10
     Cluster 1                                                            0.21                                  0.25
                                                       0.07                      Cluster 6
                                                                                                                0.10

                                    0.10                                                              0.08
                                             0.05

                                             0.08                         0.10

                                                                0.03                                  0.03


                                                                0.13


                                     Fig. 1. Hypertensive patient state space.


    Additionally, training dataset contains an outcome field as a result of the
one-month treatment process. This value accepts various ways to be set. In this
particular research, it is based on clinical guidelines and points out if systolic and
diastolic blood pressure levels have reached the target values less than 140/90
mm Hg [5].
    Since the task is to predict the optimal therapy for each patient based on
his/her features vector, such fields as drug names and daily dosages have to be
in the training data. So that, data was merged (on patient ID and outpatient
visit date) with records contained medical prescriptions for these patients. As
long as medical texts are written in natural language, they require additional
processing to distinguish desired information and fill these fields.
4       A. Semakova et al.

4   Data preprocessing
Due to the lack of training dataset, the processing task was resolved in this
research with a sequence of regular expressions and extracting rules implement-
ing so-called rule-based natural language processing (NLP). Below is the overall
pipeline that was applied to each outpatient visit:
 1. Split the medical prescriptions field into substrings with ‘\n’ (newline) de-
    limiter. In the provided data most of such substrings contain, if they do,
    only one prescription of the drug.
 2. Find all entries of drugs in each substring using the dictionary prepared by
    authors. This dictionary includes drug brand-names, their different writing
    options that may show up in a natural language text, INNs, and pharmaco-
    logical classes.
 3. The substring may have no medical prescriptions because it has general
    guidance, referral to laboratory testing, etc. Such substring is not involved
    in further processing.
 4. If substring contains several drug brand-names, then distinguish them as an
    alternative or as a combination. Patterns, which are used in this step, include
    checking: their INNs – the same indicates the alternative, their location in
    string boundaries, and the presence of ‘and’ symbols, commas between them,
    conjunctions.
 5. Check the presence of words that mean cancellation, dosages and frequency
    indicators. Substrings that don’t have dosages and frequency can be involved
    in the dataset with filling missing values using appropriate machine learning
    algorithms in further preprocessing.
 6. Extract dosages using regular expressions with measurement units, frequen-
    cies – using regular expressions with parts of the day patterns.
 7. Aggregate INNs of all extracted drugs in the INN field, calculate their daily
    dosages and write them in the Dosage field using specified delimiter (in this
    research ‘|’).


5   Classifier implementation
This paper is aimed to describe the first submodel in detail, which is proposed to
be an extension of a treatment outcome classifier. In the cycle, it concatenates the
vector of the patient features with every possible drug count g = 1, r and utilizes
the classifier to predict the probabilities of treatment ineffectiveness (negative
class, 0) and effectiveness (positive class, 1) denoted as {(p(0), p(1))g }.
    The combination with the maximum outcome probability is assumed to con-
tain an optimal number of drugs.
                                           
                          nopt = arg max (p(0), p(1))g                           (3)
                                     p(1)

   In the current Python implementation, several scikit-learn classifiers (library
version 0.21.3) were trained, tuned and evaluated, including C-Support Vector
      Impact of Antihypertensive Drug Amount on Blood Pressure Lowering          5

Classifier (SVC), random forest classifier (RF), Multi-layer Perceptron (MLP),
classifier as well as LightGBM (library version 2.3.0).
    Hyper-parameters estimation using cross-validation led to the following val-
ues (parameters not mentioned below are expected to have the default values
for the specified library version):

 – SVC: radial basis function (RBF) kernel, balanced class weights, enabled
   probability estimation, kernel coefficient γ = 0.04, penalty parameter of the
   error term C = 8.0.
 – Random forest classifier: entropy criterion, balanced class weights, with 150
   trees in the forest of maximum depth 2 and 10 minimum samples to split.
 – MLP: stochastic gradient descent (sgd) solver, invscaling learning rate, max-
   imum number of iterations is 20, one hidden layer with 76 neurons, regu-
   larization term is 0.2, using Nesterov’s momentum, shuffle samples in each
   iteration.
 – LightGBM: random forest boosting type, balanced class weights, bagging
   frequency is 1, bagging fraction is 0.9, learning rate is 0.01, number of esti-
   mators is 110.


6   Assessment

After the preprocessing step the dataset containing 4521 records were divided
into three datasets: training, validation, and test in a ratio of 0.63:0.27:0.1 re-
spectively. It was decided to consider both Sensitivity (Recall) and Specificity
while estimating the treatment outcome classifiers parameters. This decision is
based on the requirement to efficiently identify both ineffective and effective
treatment and use it in revealing the optimal drug amount.
     The first two datasets were used in 5 splits with 7 repeats cross-validation,
results of which are presented in Table 1. Table 2 gives the results of classifiers
quality evaluation on the third (test) dataset. Although Sensitivity and Speci-
ficity were considered as target metrics, the tables additionally include values of
such metrics as Accuracy, Precision, F1 score and ROC AUC (Receiver Operat-
ing Characteristic Area Under ROC Curve).


                         Table 1. Cross validation scores.

Classifier Accuracy Precision Recall Specificity F1 score ROC AUC
SVC        0.56133 0.44986 0.73122 0.45855       0.55668 0.62650
RF         0.57787 0.45993 0.68487 0.51284       0.54999 0.62827
MLP        0.51973 0.39233 0.49659 0.53435       0.43776 0.53079
LGPM 0.57695 0.45488 0.61014 0.55715             0.52049 0.61416


    As can be seen from the tables that all classifiers avoided overfitting. MLP
classifier performed worse than the others, which showed comparatively close
6                       A. Semakova et al.

                                                Table 2. Test scores.

 Classifier Accuracy Precision Recall Specificity F1 score ROC AUC
 SVC        0.56076 0.45691 0.70763 0.46783       0.55528 0.58773
 RF         0.56979 0.46409 0.71186 0.47989       0.56187 0.59588
 MLP        0.50082 0.38925 0.50636 0.49732       0.44015 0.50184
 LGPM 0.56897 0.45293 0.54025 0.58713             0.49275 0.56369


results. The random forest classifier is assumed to perform the most optimal
way.
   The most important patient features and their importances with more than
1% impact returned by random forest classifier are presented in Fig. 2. They
are calculated as the impurity decrease from each feature: the reduce in node
impurity weighted by the probability of reaching the node. It can be seen that
the drug count provides around 4.0% of the total decision.
    However, authors assume that using a bigger training dataset will improve
the results and the impact of the drug count feature, which is now considered
to be not sufficient enough to reliably separate the effective therapy from the
ineffective.


                     0.40
                             0.36
                     0.35

                     0.30
Feature importance


                     0.25
                                    0.208
                     0.20

                     0.15                    0.132
                                                     0.119
                     0.10
                                                             0.061
                     0.05                                            0.04    0.03
                                                                                     0.015      0.01
                     0.00
                                           I

                                                              e


                                                                              F

                                                                                       x

                                                                                                ry
                                lic

                               lic


                                                             nt


                                                              t
                                         BM


                                                            un

                                                                            CH


                                                                                    Se
                                                           Ag


                                                                                             ita
                                                          ita
                           sto


                             to


                                                         co
                          as


                                                                                           ed
                                                       om
                         Sy


                                                      ug
                        Di


                                                                                         r
                                                    nc


                                                                                      He
                                                   Dr
                                                  Co


                                                     Feature of patient

                                             Fig. 2. Feature importances.
      Impact of Antihypertensive Drug Amount on Blood Pressure Lowering                7

7    Conclusion and Future works
As a result of this work, the model that predicts the optimal antihypertensive
drug count was presented, implemented, trained, and evaluated. This model is
a part of the proposed general model predicting the optimal antihypertensive
drug dosages based on the patient features.
   Future works of this research include:
 – preparation and preprocessing of new data collected from 2016 to 2019 that
   will be provided by Almazov National Medical Research Centre;
 – further training and parameters tuning of additional classifiers predicting the
   optimal amount of prescription drugs for a patient with arterial hypertension;
 – development of the data-driven model predicting the most effective individ-
   ual antihypertensive therapy including drug INNs and daily dosages.


Acknowledgements
The reported study was funded by RFBR according to the research project
#18-37-00441.


References
1. Somogyi, R., McMichael, J.P., Baranzini, S.E., Mousavi, P., Greller, L.D.: 10 Ad-
   vanced data mining and predictive modelling at the core of personalised medicine.
   Studies in Multidisciplinarity 3, 165–192 (2005)
2. Darwich, A.S., Ogungbenro, K., Vinks, A.A., Powell, J.R., Reny, J.L., Marsousi,
   N., Daali, Y., Fairman, D., Cook, J., Lesko, L.J., McCune, J.S., Knibbe, C.A.J., de
   Wildt, S.N., Leeder, J.S., Neely, M., Zuppa, A.F., Vicini, P., Aarons, L., Johnson,
   T.N., Boiani, J., Rostami-Hodjegan, A.: Why has model-informed precision dosing
   not yet become common clinical reality? lessons from the past and a roadmap for
   the future. Clin. Pharmacol. Ther. 101(5), 646–656 (2017)
3. Barbolosi, D., Ciccolini, J., Lacarelle, B., Barlési, F., André, N.: Computational
   oncology-mathematical modelling of drug regimens for precision medicine. Nat Rev
   Clin Oncol 13(4), 242–254 (2016)
4. Bielinski, S., Olson, J., Pathak, J.: Preemptive genotyping for personalized medicine:
   Design of the right drug, right dose, right timedusing genomic data to individualize
   treatment protocol. Mayo Clinic Proceedings 89(1), 25–33 (2014)
5. Semakova, A., Zvartau, N., Bochenina, K., Konradi, A.: Towards Identifying of
   Effective Personalized Antihypertensive Treatment Rules from Electronic Health
   Records Data Using Classification Methods: Initial Model. In: Procedia Computer
   Science, pp. 852–858. Elsevier B.V. (2017)
6. Moodie, E.E., Chakraborty, B., Kramer, M.S.: Q-learning for estimating optimal
   dynamic treatment rules from observational data. Can J Stat 40(4), 629–645 (2012)
7. Chen, G., Zeng, D., Kosorok, M.R.: Personalized Dose Finding Using Outcome
   Weighted Learning. J Am Stat Assoc 111(516), 1509–1521 (2016)
8. Diaz, F.J., Yeh, H.W., de Leon, J.: Role of Statistical Random-Effects Linear Models
   in Personalized Medicine. Curr Pharmacogenomics Person Med 10(1), 22–32 (2012)