Arteriovenous fistula flow level prediction through ordinal
classifier methods
Mario Garbelli, Luca Neri, and Francesco Bellocchio
    Fresenius Medical Care AG & Co. KGaA, Bad Homburg, Germany


                Abstract
                An arteriovenous fistula (AVF) is a surgically created connection between an artery and a
                vein that serves as the point of access to the bloodstream in dialysis patients. The blood
                flow in the AVF (Qa) is an important indicator of how well the AVF is functioning and,
                consequently, of its adequacy for effective dialysis treatment. Measuring Qa requires
                time-consuming and costly procedures, which are generally performed only for a subset of
                patients considered at risk of problems. Our hypothesis is that a set of information
                collected during dialysis treatment can be exploited to estimate Qa. The objective of this
                study is twofold: to analyze the accuracy of Qa estimation from routinely collected clinical
                and dialysis machine variables, and to compare different approaches to this problem. We
                tackled the estimation as an ordinal classification problem. Since machine learning offers
                several ways to include order information in multiclass classification, we explored the pros
                and cons of these methods.

                Keywords
                Ordinal Multiclass Classification, Hemodialysis, Arteriovenous fistula.

1. Introduction
    Chronic kidney disease is an increasingly common long-term condition characterized by a deterioration
of kidney function. Beyond a certain level of renal damage, the patient needs renal replacement therapy
(RRT) to survive. There are two kinds of RRT: hemodialysis and peritoneal dialysis. The former (and
most common) consists of removing fluid and waste products from the blood using a special filter
called a dialyzer. To draw blood from the patient, filter it through the dialyzer, and return it to the
patient, the doctor needs to create an access to the patient’s blood vessels, called a vascular access.
There are several types of vascular access; the arteriovenous fistula (AVF) is the one associated with a
lower incidence of complications and longer patient survival. It is a surgically created connection
between an artery and a vein that exploits the arterial blood pressure to guarantee an adequate blood
flow through the dialyzer.
    Continuous monitoring of AVF functioning is an important aspect of dialysis patient management
[1][2]: if the AVF does not work properly, the alternative vascular access (typically a central catheter)
would increase the risk of complications and decrease the effectiveness of dialysis treatment.
    The assessment of AVF blood flow (Qa) is an important prognostic test for evaluating the status of
the vascular access [1][2], and it is used when considering clinical procedures to reestablish the patency
of the AVF. This test can be performed in several ways (Doppler ultrasound, angiography, Body Thermal
Monitor, …) with different levels of accuracy. These tests are time-consuming for clinical staff,
operator-dependent, and costly for healthcare providers, so they are limited to patients with some signs
of AVF malfunction.
    On the other hand, most dialysis machines automatically collect, at each dialysis treatment, a
set of data related to treatment blood flow and needle pressures, which can be exploited to estimate the
AVF blood flow. Furthermore, the information present in the patient health records, such as age,
laboratory tests, comorbidities, and others, might influence the AVF blood flow. In this study we
focused on predicting the AVF flow from all these routinely collected data.

HC@AIxIA 2022: 1st AIxIA Workshop on Artificial Intelligence For Healthcare
EMAIL: mario.garbelli@fmc-ag.com; luca.neri@fmc-ag.com; francesco.bellocchio@fmc-ag.com

             ©️ 2020 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)

    The value of AVF blood flow is generally categorized into clinically meaningful levels: < 525 ml/h
(very low level), 525 ml/h - 925 ml/h (low level), > 925 ml/h (normal level).
    Considering this categorization, AVF flow estimation can be seen as an ordinal classification problem,
as there is a clear inherent order among the classes but no well-defined numerical distance between them.
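The categorization above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the handling of values exactly equal to a threshold is an assumption, since the text does not say whether 525 and 925 belong to the lower or the upper class.

```python
from bisect import bisect_right

# Thresholds and class names as quoted in the text.
THRESHOLDS = (525, 925)
LABELS = ("very low", "low", "normal")


def qa_to_rank(qa):
    """Return the ordinal class index (0, 1 or 2) for a Qa value.

    Assumption: a value equal to a threshold is assigned to the upper class.
    """
    return bisect_right(THRESHOLDS, qa)


def qa_to_label(qa):
    """Return the class name for a Qa value."""
    return LABELS[qa_to_rank(qa)]
```

For instance, `qa_to_label(700)` returns `"low"`. Only the order of the indices 0 < 1 < 2 is meaningful here; the gap between consecutive indices carries no clinical distance.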
    This kind of multiclass classification can be tackled with different machine learning approaches. We
selected two kinds of ordinal classification methods: the first transforms the ordinal target into binary
classification subtasks [3][4], and the second is based on a single deep neural network combined with a
set of binary learning tasks [5]. The main difference between these two approaches is that the first
is generally less computationally costly but can suffer from inconsistencies among the binary
classifiers' rankings [5].
    In this study we also compare the ordinal classification methods with a model based on a standard
regression method.
    We already explored the application of machine learning approaches to the monitoring of AVF
status in [6]. In that study we focused on assessing the risk of any AVF problem (any kind of event
requiring physician intervention). Here, we focus on predicting a measurable variable that is
well known to and used by physicians, which simplifies the introduction of the model into clinical practice.


2. Method
2.1. Ordinal classification
   An ordinal multiclass classification problem is the task of predicting labels that lie on an
ordinal scale. An ordinal classifier $C: X \to Y$ maps each object $x^{[i]} \in X$ to a label in an ordered
set $Y = \{r_1 \prec \dots \prec r_K\}$. The labels are characterized by the fact that it is possible to
define an order among them but not a measure of distance.
   Although machine learning offers many algorithms for general multiclass classification problems,
far fewer are able to take the order information into account.
    There are mainly two classes of algorithms for this kind of problem. The first [3] uses a set of
$K - 1$ binary classifiers. In our experiments, we used an XGBoost model to build each
classifier. We call this model BINCM.
   The estimates for the first and the last class each depend on a single classifier: $P(Qa < 525)$ is
given by the first classifier and $P(Qa \geq 925)$ by the last classifier. The probability of the middle
class is computed as $P(Qa \geq 525) \cdot (1 - P(Qa \geq 925))$.
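As a minimal sketch of the combination rule just described, the two binary outputs can be turned into three class scores as follows. The function names are hypothetical. Two assumptions are made that the text does not state: the first classifier is taken to output P(Qa ≥ 525), so that P(Qa < 525) is its complement, and the predicted class is taken to be the one with the highest score.

```python
def bincm_class_probs(p_ge_525, p_ge_925):
    """Combine two binary-classifier outputs into three class scores."""
    p_very_low = 1.0 - p_ge_525            # P(Qa < 525), from the first classifier
    p_low = p_ge_525 * (1.0 - p_ge_925)    # middle class, product rule from the text
    p_normal = p_ge_925                    # P(Qa >= 925), from the last classifier
    return p_very_low, p_low, p_normal


def predict_class(p_ge_525, p_ge_925):
    """Return the index (0, 1, 2) of the highest class score (assumed decision rule)."""
    probs = bincm_class_probs(p_ge_525, p_ge_925)
    return max(range(3), key=lambda k: probs[k])
```

With `p_ge_525 = 0.9` and `p_ge_925 = 0.3` the scores are approximately 0.10, 0.63 and 0.30, so the middle class is selected. Under these assumptions the three scores sum to 1 + P(Qa ≥ 925) · (1 − P(Qa ≥ 525)), so they do not generally form a normalized distribution.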
   This approach suffers from possible inconsistency among the predicted probabilities [5]. Because
the predictions of the individual binary tasks may disagree, they can lead to a contradictory
result in which
              $P(Qa \geq 925) > P(525 < Qa < 925)$ and $P(Qa \geq 925) < P(Qa \geq 525)$.
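Rank consistency in the sense of [5] can be checked directly on the binary outputs: the estimated probability of exceeding a higher threshold should never be larger than that of exceeding a lower one. The following is a minimal sketch of such a check; the function name and the tolerance parameter are hypothetical and not part of the study.

```python
def is_rank_consistent(threshold_probs, tol=0.0):
    """Check that P(Qa >= t_1) >= P(Qa >= t_2) >= ... for increasing thresholds.

    threshold_probs: outputs of the binary classifiers, ordered by threshold.
    tol: optional slack for numerical noise (hypothetical parameter).
    """
    return all(a + tol >= b for a, b in zip(threshold_probs, threshold_probs[1:]))
```

For example, the pair (0.9, 0.3) for the thresholds 525 and 925 passes the check, whereas (0.4, 0.6) does not, because it would assign a higher probability to Qa ≥ 925 than to Qa ≥ 525.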
   The second approach, called the rank-consistent ordinal regression framework (CORN), is based on a
neural network whose output layer implements a set of $K - 1$ binary learning tasks associated
with the ranks [5][7].
   In this study we applied the method described in [5], which we summarize below.
   Given a dataset $D = \{x^{[i]}, y^{[i]}\}_{i=1}^{N}$, a rank $y^{[i]}$ is transformed into $K - 1$ binary
labels $y_1^{[i]}, \dots, y_{K-1}^{[i]}$ such that $y_k^{[i]} \in \{0, 1\}$ indicates whether $y^{[i]}$ is
greater than $r_k$. To ensure consistency among the classifiers, the model estimates a series of
conditional probabilities such that the output of the $k$-th binary task $f_k(x^{[i]})$ represents the
conditional probability

$$f_k(x^{[i]}) = P(y^{[i]} > r_k \mid y^{[i]} > r_{k-1}), \quad \text{where } \{y^{[i]} > r_k\} \subseteq \{y^{[i]} > r_{k-1}\}$$
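The label transformation can be illustrated with a short sketch. The function names are hypothetical and ranks are represented by 0-based indices (index 0 for the lowest rank). The second function reflects the conditional training subsets used in [5], where the k-th task only sees examples whose rank exceeds the previous one; it is included as an assumption about how the conditional probabilities are learned, not as a description of the authors' code.

```python
def extend_labels(rank_index, num_classes):
    """Binary labels y_1..y_{K-1}: the k-th label is 1 if the rank exceeds r_k.

    rank_index is 0-based, so index 0 corresponds to the lowest rank r_1.
    """
    return [1 if rank_index > k else 0 for k in range(num_classes - 1)]


def conditional_subset(rank_indices, task):
    """Indices of the examples used by binary task `task` (0-based).

    Task 0 uses every example; task k > 0 uses only examples whose rank
    exceeds the previous rank, i.e. whose 0-based rank index is at least k.
    """
    return [i for i, r in enumerate(rank_indices) if r >= task]
```

With three classes, the ranks "very low", "low" and "normal" map to the label vectors `[0, 0]`, `[1, 0]` and `[1, 1]`.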

   The unconditional probabilities can be derived by applying the chain rule for probabilities to the
model outputs:

$$P(y^{[i]} > r_k) = \prod_{j=1}^{k} f_j(x^{[i]})$$
Since $0 \leq f_j(x^{[i]}) \leq 1$ for all $j$, we have

$$P(y^{[i]} > r_1) \geq P(y^{[i]} > r_2) \geq \dots \geq P(y^{[i]} > r_{K-1})$$
   This guarantees rank consistency among the $K - 1$ binary tasks.
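The chain rule can be sketched as a cumulative product over the conditional outputs. The function names are hypothetical, and the decision rule (counting how many unconditional probabilities exceed 0.5) is an assumption taken from the usual description of this family of methods rather than from the text above.

```python
from itertools import accumulate
from operator import mul


def unconditional_probs(conditional_probs):
    """Turn f_1, ..., f_{K-1} into P(y > r_1), ..., P(y > r_{K-1}) by cumulative product."""
    return list(accumulate(conditional_probs, mul))


def predict_rank(conditional_probs, cutoff=0.5):
    """Predicted 0-based rank index: number of thresholds exceeded (assumed rule)."""
    return sum(p > cutoff for p in unconditional_probs(conditional_probs))
```

Because every factor lies between 0 and 1, the cumulative product can only decrease or stay equal, which is exactly the monotonicity stated above. For the conditional outputs 0.9 and 0.5, the unconditional probabilities are 0.9 and 0.45, and the predicted rank index is 1.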

   To evaluate the benefits of ordinal classification approaches over a classical regression approach,
we used as a benchmark an XGBoost regressor that estimates the exact value of Qa; the estimates are
then mapped to the three classes (very low flow, low flow, normal flow) using the thresholds 525 ml/h
and 925 ml/h. We call this model REGM.
   The accuracy of the three methods was compared by computing the mean absolute error
(MAE), precision, recall, F1-score, and the confusion matrix.
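As an illustration of these metrics on ordinal labels, the sketch below computes the MAE as the mean absolute difference between predicted and true class indices, together with per-class precision and recall. Measuring the MAE on class indices is an assumption (the text does not spell it out), and the function names are hypothetical.

```python
def ordinal_mae(true_ranks, pred_ranks):
    """Mean absolute difference between true and predicted class indices."""
    return sum(abs(t - p) for t, p in zip(true_ranks, pred_ranks)) / len(true_ranks)


def precision_recall(true_ranks, pred_ranks, cls):
    """Precision and recall for one class, treated one-vs-rest."""
    pairs = list(zip(true_ranks, pred_ranks))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    predicted = sum(1 for _, p in pairs if p == cls)
    actual = sum(1 for t, _ in pairs if t == cls)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

Unlike precision and recall, this MAE penalizes a "very low" fistula predicted as "normal" twice as much as one predicted as "low", which is why it is informative for ordinal problems.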


2.2.    Dataset
   Our dataset was composed of 46,292 Qa measurements from 5,940 hemodialysis patients in
four European countries (Czech Republic, Portugal, Slovakia, Spain), collected between
2015 and 2022. The Qa distribution is shown in Figure 1.


Figure 1: Qa measures distribution

   Written informed consent for statistical analysis was obtained from all patients. Qa
measurements were mainly obtained through dilution techniques (thermal 93.34%, other 6.27%), and
only a few (0.39%) were obtained through Doppler ultrasound.
   The input dataset was composed of 49 parameters characterizing the patient, the AVF, and the
treatments, routinely recorded in the Fresenius Medical Care clinical database (EuClID®). The input
variables were computed from the information collected in the 90 days before the Qa measurement.
   70% of the dataset was randomly selected to train each model, 10% was used as a validation set (for
CORN), and the remaining 20% was used as the test set.
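A random 70/10/20 split of this kind can be sketched as follows. The function name, the seed, and the choice to split at the level of individual measurements are assumptions made for illustration; the text does not specify how the random selection was implemented.

```python
import random


def split_indices(n, train=0.70, val=0.10, seed=0):
    """Shuffle the indices 0..n-1 and split them into train, validation and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # seeded for reproducibility (hypothetical seed)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With `n = 46292`, this yields 32,404 training, 4,629 validation and 9,259 test measurements; the last figure equals the total support reported in Tables 1 to 3.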

3. Results
   Figures 2, 3, and 4 report the confusion matrices, with the percentage of labels correctly and
incorrectly classified for the three classes, for REGM, BINCM, and CORN, respectively. The overall
MAE was 0.37, 0.27, and 0.36, respectively.
   The results show that both ordinal classification models performed better than the benchmark in terms
of overall MAE, although the margin for CORN is small (0.36 vs. 0.37), and BINCM outperformed CORN
(0.27 vs. 0.36). Interestingly, when we looked at the recall for each single class, the best model for
“very low” and “low” flow was still BINCM, but for “normal” flow it was REGM. On the other hand, the
best model in terms of precision was REGM for “very low” flow and BINCM for “low” and “normal” flow.


Figure 2: Confusion matrix (CM) for REGM. The diagonal elements of the CM represent correct
matches between predicted and actual (true) labels, while off-diagonal elements are
mislabeled cases. The CM values are normalized over the actual label (by row)
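The row normalization mentioned in the caption can be sketched as follows. The function names are hypothetical, and class indices 0, 1, 2 stand for very low, low and normal flow.

```python
def confusion_matrix(true_ranks, pred_ranks, num_classes=3):
    """Count matrix with actual classes on the rows and predicted classes on the columns."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_ranks, pred_ranks):
        cm[t][p] += 1
    return cm


def normalize_rows(cm):
    """Divide each row by its total, so that every non-empty row sums to 1."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out
```

After normalizing by row, each diagonal entry is the recall of the corresponding class, since it is the fraction of actual members of that class that were predicted correctly.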

Table 1
Accuracy performance metric for REGM

                    precision         recall        F1-score        MAE                support
     very low           0.93            0.58           0.72         0.50                3578
       low              0.45            0.55           0.49         0.45                2470
      normal            0.68            0.84           0.75         0.16                3211
      overall           0.71            0.66           0.67         0.37                9259

Figure 3: Confusion matrix for BINCM

Table 2
Accuracy performance metric for BINCM
               precision    recall       F1-score   MAE    support
 very low        0.90         0.80         0.84     0.25    3578
   low           0.57         0.70         0.63     0.30    2470
  normal         0.79         0.75         0.77     0.27    3211
  overall        0.77         0.76         0.76     0.27    9259


Figure 4: Confusion matrix for CORN

Table 3
Accuracy performance metric for CORN
               precision      recall    F1-score    MAE    support
  very low       0.86          0.75       0.80      0.32    3578
    low          0.48          0.64       0.55      0.36    2470
   normal        0.71          0.63       0.67      0.41    3211
   overall       0.71          0.68       0.69      0.36    9259
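The overall MAE values in Tables 1 to 3 are consistent with a support-weighted average of the per-class MAE values. The short check below reproduces them from the table entries; treating the "overall" row as a support-weighted average is an assumption, and the names used are hypothetical.

```python
# Per-class values copied from Tables 1 to 3 (order: very low, low, normal).
SUPPORT = (3578, 2470, 3211)
PER_CLASS_MAE = {
    "REGM": (0.50, 0.45, 0.16),
    "BINCM": (0.25, 0.30, 0.27),
    "CORN": (0.32, 0.36, 0.41),
}


def weighted_mae(per_class, support):
    """Support-weighted average of per-class MAE values."""
    return sum(m * s for m, s in zip(per_class, support)) / sum(support)


overall = {name: round(weighted_mae(v, SUPPORT), 2) for name, v in PER_CLASS_MAE.items()}
```

The resulting dictionary holds 0.37 for REGM, 0.27 for BINCM and 0.36 for CORN, matching the "overall" rows of the three tables.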

4. Discussion
    In this exploratory paper we studied the estimation of AVF blood flow from routinely collected
clinical data extracted from a clinical database. AVF blood flow assessment is critically important for
assessing the status of the fistula and evaluating preemptive interventions to avoid AVF failure, which
may lead to complications.
    As AVF blood flow is generally categorized into three clinically meaningful classes (very low when
less than 525 ml/h, low when between 525 and 925 ml/h, and normal when greater than 925 ml/h), we
tackled the problem as an ordinal classification task.
    There are several approaches to ordinal classification in the machine learning domain; we focused on
two techniques: ordinal classification with K-1 binary classifiers and CORN. We used a popular
regression model (XGBoost) with subsequent classification as a benchmark.
    The objective of this study can be summarized in two points: first, to evaluate the accuracy of AVF
blood flow estimation in order to assess its possible medical use, and second, to compare different
machine learning approaches to this problem.
    Regarding the first point, the preliminary results are very promising. The ability to detect fistulas
with very low flow with a precision of 0.9 (with BINCM) is a remarkable result per se. It should also be
noted that 7.3% of the missed “very low flow” AVFs are predicted as “low flow” and only 2.7% are
predicted as normal flow. This suggests that the model could be used to build an alert system to warn
physicians about AVFs in critical situations. It might substantially improve the AVF surveillance
process by referring the patient for further investigation when it is really needed, and potentially
decrease the risk of AVF failure.
    Regarding the comparison among different approaches, the results are interesting as well. If we focus
on the overall MAE and F1-score, the possible inconsistency of the class probabilities appears to
be an acceptable tradeoff for our problem, as those metrics are considerably better for BINCM than for
CORN (0.27 and 0.76 vs. 0.36 and 0.69). Although CORN is less accurate than BINCM, it can be a useful
tool in applications where it is important to show the physician the individual probabilities of the
three classes. In that case, inconsistent probabilities would be difficult to interpret.
    It is also important to consider that, if we focus on the performance of the models on the single
classes, the best model is not always the same. This suggests that the approaches have different
strengths that can be exploited to create a more robust and effective medical decision support system.
    Considering the promising preliminary results, we see several next steps for this study, for
example: an in-depth feature analysis to explore the contribution of each variable to the prediction;
the evaluation of further ordinal classification methods; and a discussion with clinicians about a
possible use case for the model in a real clinical setting and how to integrate it into the clinical
system and clinical practice.

5. References

[1] C.E. Lok, T.S. Huber, T. Lee, S. Shenoy, A.S. Yevzlin, K. Abreo, M. Allon, A. Asif, B.C. Astor,
    M.H. Glickman, et al., KDOQI Clinical Practice Guideline for Vascular Access: 2019 Update.
    Am. J. Kidney Dis., 75, S1–S164, (2020).
[2] M. Gallieni, M. Hollenbeck, N. Inston, M. Kumwenda, S. Powell, J. Tordoir, J. Al Shakarchi, P.
    Berger, D. Bolignano, D. Cassidy, et al., Clinical practice guideline on peri- and postoperative care
    of arteriovenous fistulas and grafts for haemodialysis in adults. Nephrol. Dial. Transplant, 34
    (Suppl. 2), II1–II42, (2019).
[3] E. Frank, M. Hall, A simple approach to ordinal classification. In Proc. 12th European
    Conference on Machine Learning, Freiburg, Germany, pp. 145-156. Springer, (2001).
[4] L. Li, H.T. Lin, Ordinal regression by extended binary classification, Advances in Neural
    Information Processing Systems, pp. 865-872, (2007).
[5] X. Shi, W. Cao, S. Raschka, Deep Neural Networks for Rank-Consistent Ordinal Regression Based
    On Conditional Probabilities, CoRR, https://arxiv.org/abs/2111.08851, (2021).
[6] R. Peralta, M. Garbelli, F. Bellocchio, P. Ponce, S. Stuard, M. Lodigiani, J. Fazendeiro Matos,
    Ribeiro, M. Nikam, M. Botler, E. Schumacher, D. Brancaccio, L. Neri, Development and
    Validation of a Machine Learning Model Predicting Arteriovenous Fistula Failure in a Large
    Network of Dialysis Clinics. Int J Environ Res Public Health. 2021 Nov 24;18(23):12355. doi:
    10.3390/ijerph182312355. PMID: 34886080; PMCID: PMC8656573.
[7] W. Cao, V. Mirjalili, S. Raschka, Rank Consistent Ordinal Regression for Neural Networks with
    Application to Age Estimation. Pattern Recognition Letters.
    https://doi.org/10.1016/j.patrec.2020.11.008, (2020).