<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Post-Processing Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Archetti</string-name>
          <email>alberto.archetti@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Stranieri</string-name>
          <email>francesco.stranieri@polito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Matteucci</string-name>
          <email>matteo.matteucci@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano</institution>
          ,
          <addr-line>Via Giuseppe Ponzio, 34, 20133 Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Torino</institution>
          ,
          <addr-line>Corso Duca degli Abruzzi, 24, 10138 Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università degli Studi di Milano-Bicocca</institution>
          ,
          <addr-line>Viale Sarca, 336, 20126 Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Survival analysis is a crucial tool in healthcare, allowing us to understand and predict time-to-event occurrences using statistical and machine-learning techniques. As deep learning gains traction in this domain, a specific challenge emerges: neural network-based survival models often produce discrete-time outputs, with the number of discretization points being much fewer than the unique time points in the dataset, leading to potentially inaccurate survival functions. To this end, our study explores post-processing techniques for survival functions. Specifically, interpolation and smoothing can act as effective regularization, enhancing performance metrics integrated over time, such as the Integrated Brier Score and the Cumulative Area-Under-the-Curve. We employed various regularization techniques on diverse real-world healthcare datasets to validate this claim. Empirical results suggest a significant performance improvement when using these post-processing techniques, underscoring their potential as a robust enhancement for neural network-based survival models. These findings suggest that integrating the strengths of neural networks with the non-discrete nature of survival tasks can yield more accurate and reliable survival predictions in clinical scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>survival analysis</kwd>
        <kwd>neural networks</kwd>
        <kwd>regularization techniques</kwd>
        <kwd>healthcare</kwd>
      </kwd-group>
    </article-meta>
    <notes>
      <p>ORCID: 0000-0003-3826-4645 (A. Archetti); 0000-0002-5366-8499 (F. Stranieri); 0000-0002-8306-6739 (M. Matteucci). http://ceur-ws.org, CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
    </notes>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Survival analysis [1] is a field of statistics concerned with modeling time-to-event data. Its primary objective is to construct a survival function S(t), depending on time t and tailored to a particular subject, representing the probability of not experiencing a particular event of interest up to t, such as disease onset, death, or hospital discharge. Thus, a survival function is formally S(t) = P(T &gt; t). The analysis of time-to-event data is of paramount importance in healthcare, facilitating the identification of patient risk factors over time.</p>
      <p>Distinctively, survival analysis differs from conventional machine learning tasks such as classification and regression due to its ability to handle censored data points, i.e., instances where the event of interest has not yet occurred for a particular subject. This characteristic is common in clinical data, given the
prolonged, complex, and privacy-constrained nature of data collection, which challenges the
applicability of data-intensive machine learning models.</p>
      <p>
        Recent advancements in survival applications exploit neural network-based deep learning
techniques, emphasizing their ability to model the non-linear relationships between patient
features and time-to-event records. Their utility has been demonstrated in various studies [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2,
3, 4, 5, 6</xref>
        ], showing a generalization advantage over traditional statistical approaches
and matching the expressive power of ensemble methods [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. However, most common
neural network architectures involve a set of discrete outputs, necessitating specific processing
to adapt to the continuous nature of survival analysis. To this end, numerous strategies for
bridging discrete-output neural networks and survival analysis have been introduced. Most
techniques focus on time discretization [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], enabling neural networks to encapsulate time-event
associations for a limited set of time points. In contrast, few methodologies directly tackle
time-continuous survival functions; these are based on proportional hazard [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or piece-wise constant
hazard [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ].
      </p>
      <p>In our research, we conduct a thorough examination of multiple interpolation methods to
determine if post-processing interpolation can augment the efficacy of discrete-output neural
networks. Specifically, we delve into three interpolation techniques: linear, piece-wise
exponential, and spline-based, applying them to state-of-the-art neural survival models. We
investigate whether relevant performance gaps exist between interpolated and non-interpolated
versions of the same survival model. Our investigation employs time-dependent survival metrics
to gauge the efficacy of neural-based models, namely the Integrated Brier Score (IBS) and the
Cumulative Area-Under-the-Curve (Cumulative AUC). Our empirical analysis, validated across
several real-world healthcare datasets, indicates that interpolation supports the generalization
capability of neural-based survival models. This improvement is particularly relevant when the
number of discretization bins and, consequently, neural network outputs is substantially smaller
than the dataset’s sample count. This scenario commonly arises in practical applications where
the dataset size considerably outweighs the number of the neural network’s output neurons.</p>
      <p>In summary, our research offers a comprehensive empirical analysis of interpolation methods
tailored for neural-based survival models. We explore the potential advantages of incorporating
a post-processing interpolation phase based on simple operations with negligible computational
overhead. These insights bear significant implications for the clinical applicability of survival
models, suggesting that a simple interpolation step can markedly boost the generalizability of a
neural-based survival model.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <p>This section provides the necessary background on survival analysis as a machine-learning
problem, alongside the description of the survival metrics to assess model performance that
will be investigated in subsequent experimental evaluation.</p>
      <sec id="sec-3-1">
        <title>2.1. Survival Analysis</title>
        <p>Survival analysis tackles time-to-event modeling, leveraging both statistical and
machine-learning methodologies. It plays a pivotal role in interpreting clinical data, forecasting
occurrences such as the onset of a disease, relapses, mortality, and hospital discharges. By harnessing
patient information, the aim is to formulate a time-dependent parametric function, S(t), that
denotes the probability of a subject not encountering a specified event up to a given time t,
expressed as</p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>S(t) = P(T &gt; t).</tex-math>
        </disp-formula>
        <p>This non-increasing function starts with a value of 1 at t = 0, approaching 0 as t tends to infinity.</p>
        <p>Instead of S(t), several survival methods estimate the instantaneous hazard rate for each
individual, called hazard function:</p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T &lt; t + \Delta t \mid T \ge t)}{\Delta t}.</tex-math>
        </disp-formula>
        <p>From the hazard function, the survival function can be derived as</p>
        <disp-formula id="eq-3">
          <label>(3)</label>
          <tex-math>S(t) = \exp(-H(t)),</tex-math>
        </disp-formula>
        <p>where H(t) represents the integral of h over the interval from 0 to t.</p>
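        <p>As a minimal numerical sketch of the relation S(t) = exp(−H(t)), the snippet below integrates a hazard function with the trapezoidal rule and exponentiates the result. The function names, the number of integration steps, and the constant hazard of 0.5 are illustrative assumptions, not part of the original study.</p>
        <preformat><![CDATA[
```python
import math


def cumulative_hazard(hazard, t, n_steps=1000):
    """Approximate H(t), the integral of hazard(u) over [0, t], with the trapezoidal rule."""
    if t <= 0:
        return 0.0
    step = t / n_steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, n_steps):
        total += hazard(k * step)
    return total * step


def survival_from_hazard(hazard, t, n_steps=1000):
    """Return S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t, n_steps))


# Hypothetical constant hazard (0.5 events per unit of time): S(t) = exp(-0.5 * t).
constant_hazard = lambda u: 0.5
s_two = survival_from_hazard(constant_hazard, 2.0)
```
]]></preformat>
        <p>With a constant hazard, the result matches the closed-form exponential survival curve, which is a convenient sanity check for the integration step.</p>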
        <p>What sets survival analysis apart from conventional machine learning tasks, like classification
or regression, is its ability to analyze censored data points. Such data represent subjects who
have not encountered the specified event during the data collection period. Hence, survival
datasets comprise triplets (x<sub>i</sub>, δ<sub>i</sub>, t<sub>i</sub>), where (i) x<sub>i</sub> indicates the feature vector for subject i; (ii) δ<sub>i</sub> is
a binary flag, which is set to 0 if the sample is censored and to 1 otherwise; and (iii) t<sub>i</sub> designates either the event’s
time or the censoring time, depending on the value of δ<sub>i</sub>. This is the most common scenario in
survival problems, referred to as right censoring. Throughout this paper, our discussions will
refer to the right censoring context.</p>
        <p>
          The most prevalent models used for deriving survival functions include the non-parametric
Kaplan-Meier model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and the linear Cox model [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Machine learning-enhanced non-linear
extensions typically employ ensemble strategies [
          <xref ref-type="bibr" rid="ref13 ref7">7, 13</xref>
          ] and neural networks, which will be
analyzed in Section 3.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Metrics in Survival Analysis</title>
        <p>
          The most common metrics used to evaluate the predictive power of survival models are the
Concordance Index (C-Index), the IBS, and the Cumulative AUC. The C-Index [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] measures
the agreement between the predicted survival outcomes from a model and the actual observed
outcomes for pairs of samples. Specifically, for each time point, the predicted outcome is
determined by the model’s survival probability or risk score, while the true outcome reflects
the event status: 1 for non-censored and 0 for censored samples. Only pairs with times
t<sub>1</sub> &lt; t<sub>2</sub> and events δ<sub>1</sub>, δ<sub>2</sub> where δ<sub>1</sub> = 1, i.e., the earlier sample is non-censored, are considered comparable. The C-Index
measures the proportion of comparable pairs that are concordant, meaning the sample with
the higher predicted survival probability outlives the other. This measure can be interpreted as
the probability that, for two randomly chosen individuals, the one with the higher risk score
will experience the event first. A C-Index value of 0.5 signifies random predictions, whereas 1
indicates perfect concordance. While easy to interpret, the C-Index does not provide information
about model calibration.
        </p>
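        <p>The pair-counting logic behind the C-Index can be sketched as follows. This is a plain, unweighted version (no IPCW correction), and the function name, argument layout, and the half-credit rule for tied risk scores are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def concordance_index(times, events, risk_scores):
    """Unweighted C-Index: fraction of comparable pairs ranked correctly.

    times: observed event or censoring times.
    events: 1 for an observed event, 0 for a censored sample.
    risk_scores: higher score means the event is expected sooner.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half (assumed convention)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```
]]></preformat>
        <p>The quadratic loop is adequate for small illustrative inputs; larger datasets would typically call for a sorted or tree-based implementation.</p>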
        <p>
          Alongside the C-Index, another common metric for assessing survival models is the Brier
Score (BS) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which quantifies both precision and calibration of predicted survival outcomes.
The BS computes the squared difference between the observed survival status at a specific time instant (1 if the subject
is still event-free and 0 otherwise) and the predicted survival probability at that instant. Ideally, a BS
value should be close to 0, indicating perfect prediction. The IBS integrates the Brier scores
over various times, giving an overall temporal performance evaluation of the model. The IBS
summarizes the model’s ability to capture accurate event probabilities. However, its evaluation
can be affected by the integration range and the time density of available samples.
        </p>
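        <p>A simplified sketch of the BS and IBS computations is given below. It assumes that every sample is fully observed (no censoring), so it omits the IPCW weights used in practice; the function names and the trapezoidal averaging over a user-supplied time grid are illustrative choices.</p>
        <preformat><![CDATA[
```python
def brier_score(event_times, predicted_survival, t):
    """Brier score at time t for uncensored samples.

    predicted_survival[i] is the model's survival probability S(t) for sample i.
    """
    total = 0.0
    for time_i, surv_i in zip(event_times, predicted_survival):
        still_event_free = 1.0 if time_i > t else 0.0
        total += (still_event_free - surv_i) ** 2
    return total / len(event_times)


def integrated_brier_score(event_times, survival_fns, grid):
    """Trapezoidal average of the Brier score over an increasing time grid.

    survival_fns[i] is a callable returning the predicted S(t) for sample i.
    """
    scores = [
        brier_score(event_times, [fn(t) for fn in survival_fns], t) for t in grid
    ]
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (scores[k - 1] + scores[k]) * (grid[k] - grid[k - 1])
    return area / (grid[-1] - grid[0])
```
]]></preformat>
        <p>Because the IBS evaluates the predicted survival function on a dense time grid, the way a model fills the gaps between its discrete outputs directly enters this metric.</p>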
        <p>
          The third most common metric for survival models is the Cumulative AUC. While the AUC
is traditionally a classification metric, its application extends to survival studies with
time-dependent outcomes [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In this context, the AUC examines the predicted survival probabilities
against observed event statuses over several time instants. Samples that are censored before
or during this period are treated as negative events. The Cumulative AUC integrates these
time-dependent AUC values, with 1 indicating perfection in prediction.
        </p>
        <p>
          To adjust for censoring biases, the Inverse Probability of Censoring Weighting (IPCW)
method [
          <xref ref-type="bibr" rid="ref14">14, 17</xref>
          ] is employed. Here, each sample is assigned a weight based on its inverse
censoring probability at a given time. Observations with high censoring likelihoods get more
weight, and vice versa for low-censoring observations. This weighting helps to counteract
potential biases due to the event censoring distribution. Also, each of the metrics described
focuses on a specific aspect of survival models. Therefore, for a comprehensive evaluation of
the overall quality of a survival model, multiple metrics must be taken into account.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Related Work</title>
      <p>
        In recent years, deep learning has increased the expressive capability of traditional survival models.
The first works were devoted to the extension of one of the most prominent survival models:
the Cox model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The Cox model defines a hazard function based on the assumption that the
relative risk between subjects remains unchanged over time (proportional hazard assumption):
        <disp-formula id="eq-4">
          <label>(4)</label>
          <tex-math>h(t \mid \mathbf{x}) = h_0(t) \exp(\mathbf{w}^\top \mathbf{x}),</tex-math>
        </disp-formula>
where h<sub>0</sub>(t) is the baseline hazard common across all subjects, and exp(w<sup>⊤</sup>x) is a subject-specific
factor that modifies the baseline hazard based on an individual’s risk profile. The classic Cox
model assumes the existence of a linear relationship between features and the logarithm of the subject hazard, with
the risk multiplier being the exponential of the dot product of features x and weights w.
      </p>
      <p>
        A substantial extension of the Cox model is DeepSurv [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Here, the linear relationship
between features and risks is replaced with a deep neural network, capturing non-linear
interactions between features and the hazard function. It leverages the same differentiable loss
function as the original Cox model for training, called partial log-likelihood. This loss function
is tailored to train models based on the proportional hazard assumption.
      </p>
      <p>
        However, the proportional hazards assumption, though rendering models straightforward
and interpretable, can sometimes hamper their generalization. In fact, many real-world datasets
do not respect this assumption, rendering such models less effective. A paradigm shift in neural
survival models emerged with time discretization techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These techniques allowed
neural networks to directly approximate discretized hazard and survival functions. Among the
models following this approach, DeepHit [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] employs a softmax output layer to estimate discrete
probabilities for designated event times. DeepHit is specifically tailored to compute probabilities
for multiple competing events, predicting which event occurs first. In fact, its loss function
combines a likelihood term, which improves the model’s accuracy, with a ranking term, which
encourages the predicted order of event occurrences to match the observed one.
      </p>
      <p>Drawing inspiration from the Multi-Task Logistic Regression (MTLR) approach [18], Neural
Multi-Task Logistic Regression (N-MTLR) [19] employs multiple neural-based logistic regression
heads to predict event occurrence probability for each time step. These outputs are subsequently
normalized using a softmax function to yield event probabilities.</p>
      <p>
        Finally, the Logistic Hazard model [
        <xref ref-type="bibr" rid="ref5">20, 5</xref>
        ] frames the survival problem discretely, transforming
it into a sequence of binary classification tasks. Each task predicts the risk for an event
occurrence at a given time interval. The model captures time-dependent effects through a
multi-output neural network employing sigmoid activations, making it a robust choice for
handling time-varying effects in survival analysis.
      </p>
      <p>
        An alternative approach from [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ], instead of discretizing the survival function, assumes the
hazard function to be piece-wise constant. This method, called PC-Hazard, produces continuous
survival functions framed as piece-wise exponentials. Thus, PC-Hazard adapts any regression
model to a survival model, trainable with the Poisson regression technique.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4. Interpolation Methods</title>
      <p>In survival models based on neural networks, the discrete outputs, or anchor points, define
the value of the survival function for a set of specific time instants. This section provides a
description of several interpolation techniques designed to bridge the gap between discrete
survival functions and continuous metric evaluation. Consider a set of n time instants, each
corresponding to the limit of a discretization bin, {t<sub>1</sub>, t<sub>2</sub>, …, t<sub>n</sub>}, such that 0 &lt; t<sub>1</sub> &lt; t<sub>2</sub> &lt; ⋯ &lt; t<sub>n</sub>.
Then, a survival model based on neural networks produces a set of outputs {s<sub>1</sub>, s<sub>2</sub>, …, s<sub>n</sub>}, such
that 1 ≥ s<sub>1</sub> ≥ s<sub>2</sub> ≥ ⋯ ≥ s<sub>n</sub> ≥ 0. The set of pairs (t<sub>i</sub>, s<sub>i</sub>) corresponds to the anchor points leveraged
by the interpolation methods to obtain a continuous survival function. In order to allow the
interpolation to attain the properties of survival functions, we consider the pairs (0, 1) and (∞, 0)
to always be part of the set of anchor points. Figure 1 illustrates the considered interpolation
techniques evaluated on a set of fixed anchor points.</p>
      <sec id="sec-5-1">
        <title>4.1. Step-wise Interpolation</title>
        <p>
          Most works apply step-wise interpolation to produce continuous outputs from the set of anchor
points produced by a survival model. In particular, given a time instant t ∈ [t<sub>i</sub>, t<sub>i+1</sub>), this simple
type of interpolation defines the value of a survival function as
        </p>
        <disp-formula>
          <tex-math>S(t) = s_i,</tex-math>
        </disp-formula>
        <p>which corresponds to the value of the closest anchor point with a lower corresponding time.
We call this interpolation method Step FWD, indicating that the anchor point is propagated
forward in the survival function. In the following analyses, we also employ an alternative
approach, called Step BWD, which propagates the next closest anchor point backward in the
survival function as</p>
        <disp-formula>
          <tex-math>S(t) = s_{i+1}.</tex-math>
        </disp-formula>
        <p>The idea of Step BWD is to focus on future event instances rather than the immediate past.
It might be relevant in situations where interventions or treatments are planned, and the
anticipation of the next event risk is more clinically significant than the immediate past.</p>
        <p>
          Figure 1: Interpolation techniques evaluated on a fixed set of anchor points
for t = 2, 3, 6, 7, and 9. The first two plots from the left refer to step-wise interpolations considering the
following or previous anchor points, respectively. The third plot illustrates a linear interpolation. The
fourth is a piece-wise exponential, inspired by the PC-Hazard model [
          <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
          ]. The final plot interpolates
the anchor points with a monotonic cubic spline.
        </p>
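        <p>The two step-wise variants can be sketched in a few lines. The function name, the <monospace>mode</monospace> argument, and the anchor values used in the usage note are hypothetical; the anchors (0, 1) and (∞, 0) are handled implicitly, as assumed in the introduction of this section.</p>
        <preformat><![CDATA[
```python
import bisect


def step_interpolate(anchor_times, anchor_values, t, mode="fwd"):
    """Evaluate a step-wise survival function at time t >= 0.

    anchor_times must be positive and strictly increasing. The anchors (0, 1)
    and (infinity, 0) are added implicitly.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    times = [0.0] + list(anchor_times)
    values = [1.0] + list(anchor_values)
    i = bisect.bisect_right(times, t) - 1  # last anchor whose time is <= t
    if mode == "fwd":
        return values[i]  # Step FWD: propagate the previous anchor forward
    if mode == "bwd":
        # Step BWD: propagate the next anchor backward; past the last finite
        # anchor, the next anchor is (infinity, 0).
        return values[i + 1] if i + 1 < len(values) else 0.0
    raise ValueError("mode must be 'fwd' or 'bwd'")
```
]]></preformat>
        <p>For example, with anchor times 2, 3, 6, 7, 9 and illustrative anchor values 0.9, 0.8, 0.5, 0.3, 0.1, a query at t = 4 falls in the interval [3, 6), so Step FWD yields 0.8 while Step BWD yields 0.5.</p>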
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Linear Interpolation</title>
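        <p>The most straightforward extension to the step-wise interpolation techniques is to define the
interpolation point on the line connecting the considered anchor points. For t ∈ [t<sub>i</sub>, t<sub>i+1</sub>), a linearly-interpolated
survival function is defined as</p>
        <disp-formula>
          <tex-math>S(t) = s_i + (s_{i+1} - s_i) \, \frac{t - t_i}{t_{i+1} - t_i}.</tex-math>
        </disp-formula>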
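        <p>A minimal sketch of linear interpolation between anchor points follows. The function name is hypothetical, and holding the last anchor value for times beyond the last finite anchor is an assumption made here, since a straight line toward the anchor (∞, 0) is not well defined.</p>
        <preformat><![CDATA[
```python
import bisect


def linear_interpolate(anchor_times, anchor_values, t):
    """Evaluate a linearly interpolated survival function at time t >= 0.

    The anchor (0, 1) is added implicitly. Beyond the last finite anchor the
    last value is held constant (an assumption of this sketch).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    times = [0.0] + list(anchor_times)
    values = [1.0] + list(anchor_values)
    if t >= times[-1]:
        return values[-1]
    i = bisect.bisect_right(times, t) - 1  # last anchor whose time is <= t
    weight = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + (values[i + 1] - values[i]) * weight
```
]]></preformat>
        <p>With anchor times 2, 3, 6, 7, 9 and illustrative values 0.9, 0.8, 0.5, 0.3, 0.1, a query at t = 4 lies one third of the way between the anchors (3, 0.8) and (6, 0.5), giving a survival probability of 0.7.</p>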
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Piece-wise Exponential Interpolation</title>
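        <p>
          This interpolation method is inspired by the piece-wise constant hazard model [
          <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
          ]. This
method, referred to as PWE, assumes the hazard function to be constant within each time
interval. Then, according to Eq. (3), the survival function results in a piece-wise exponential
function. For t ∈ [t<sub>i</sub>, t<sub>i+1</sub>), the interpolation is computed as
          <disp-formula>
            <tex-math>S(t) = s_i \exp\left(-\lambda_i \cdot (t - t_i)\right),</tex-math>
          </disp-formula>
where the constant hazard λ<sub>i</sub> of the interval is
          <disp-formula>
            <tex-math>\lambda_i = \frac{\log s_i - \log s_{i+1}}{t_{i+1} - t_i}.</tex-math>
          </disp-formula>
        </p>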
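        <p>The piece-wise exponential rule can be sketched as below, taking the constant hazard of each interval as (log s<sub>i</sub> − log s<sub>i+1</sub>) / (t<sub>i+1</sub> − t<sub>i</sub>) so that the curve passes through both neighboring anchors. The function name is hypothetical; requiring strictly positive anchor values and holding the last value beyond the last finite anchor are assumptions of this sketch.</p>
        <preformat><![CDATA[
```python
import bisect
import math


def pwe_interpolate(anchor_times, anchor_values, t):
    """Evaluate a piece-wise exponential survival function at time t >= 0.

    Within each interval the hazard is constant, so the survival curve decays
    exponentially from one anchor value to the next.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    times = [0.0] + list(anchor_times)
    values = [1.0] + list(anchor_values)
    if t >= times[-1]:
        return values[-1]  # assumption: hold the last finite anchor value
    i = bisect.bisect_right(times, t) - 1  # last anchor whose time is <= t
    if values[i + 1] <= 0.0:
        raise ValueError("anchor values must be positive to take logarithms")
    rate = (math.log(values[i]) - math.log(values[i + 1])) / (times[i + 1] - times[i])
    return values[i] * math.exp(-rate * (t - times[i]))
```
]]></preformat>
        <p>At the midpoint of an interval, this rule returns the geometric mean of the two neighboring anchor values, whereas linear interpolation returns their arithmetic mean.</p>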
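      </sec>
      <sec id="sec-5-4">
        <title>4.4. Monotonic Cubic Spline Interpolation</title>
        <p>The Hermite spline with monotonicity constraints [21] is a spline-based interpolation method
to fit a set of anchor points with a non-increasing smooth function maintaining a
continuous derivative. The Fritsch–Carlson method enables the construction of survival functions
with a smooth transition between anchor points. In the subsequent sections, we refer to this
interpolation technique as Spline.</p>
        <p>The idea is to constrain the tangents of the Hermite spline in such a way that the resulting
piece-wise function is monotonic. To this end, the Fritsch–Carlson method starts from the
secant lines between successive anchor points
          <disp-formula>
            <tex-math>\Delta_i = \frac{s_{i+1} - s_i}{t_{i+1} - t_i}</tex-math>
          </disp-formula>
and initializes each tangent as the average of the adjacent secants as
          <disp-formula>
            <tex-math>m_i = \frac{\Delta_{i-1} + \Delta_i}{2}.</tex-math>
          </disp-formula>
Let α<sub>i</sub> = m<sub>i</sub> / Δ<sub>i</sub> and β<sub>i</sub> = m<sub>i+1</sub> / Δ<sub>i</sub>; when Δ<sub>i</sub> = 0, both m<sub>i</sub> and m<sub>i+1</sub> are set to zero.
A sufficient condition to ensure monotonicity is to set m<sub>i</sub> = 3Δ<sub>i</sub> for α<sub>i</sub> &gt; 3 and m<sub>i+1</sub> = 3Δ<sub>i</sub> for β<sub>i</sub> &gt; 3.
At this point, the values of tangents m<sub>i</sub> guarantee that a Hermite spline passing through the
anchor points is non-increasing. For t ∈ [t<sub>i</sub>, t<sub>i+1</sub>), the survival function is computed as
          <disp-formula>
            <tex-math>S(t) = (2\tau^3 - 3\tau^2 + 1)\, s_i + (\tau^3 - 2\tau^2 + \tau)(t_{i+1} - t_i)\, m_i + (-2\tau^3 + 3\tau^2)\, s_{i+1} + (\tau^3 - \tau^2)(t_{i+1} - t_i)\, m_{i+1},</tex-math>
          </disp-formula>
where
          <disp-formula>
            <tex-math>\tau = \frac{t - t_i}{t_{i+1} - t_i}.</tex-math>
          </disp-formula>
        </p>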
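        <p>The sketch below illustrates a monotone cubic Hermite interpolation in the spirit of the Fritsch–Carlson method: tangents start as averages of adjacent secants and are then clamped so that their ratio to the local secant never exceeds 3. The function names are hypothetical; the one-sided tangents at the first and last anchors, the restriction to finite anchors, and holding the last value beyond the last finite anchor are assumptions of this sketch rather than details confirmed by the study.</p>
        <preformat><![CDATA[
```python
import bisect


def monotone_tangents(times, values):
    """Tangents of a monotone cubic Hermite spline through (times, values)."""
    n = len(times)
    secants = [
        (values[k + 1] - values[k]) / (times[k + 1] - times[k]) for k in range(n - 1)
    ]
    # Interior tangents: average of adjacent secants. Endpoints: one-sided
    # secants (an assumption of this sketch).
    tangents = [secants[0]]
    tangents += [0.5 * (secants[k - 1] + secants[k]) for k in range(1, n - 1)]
    tangents.append(secants[-1])
    for k in range(n - 1):
        if secants[k] == 0.0:
            tangents[k] = 0.0  # flat segment: both tangents vanish
            tangents[k + 1] = 0.0
            continue
        if tangents[k] / secants[k] > 3.0:
            tangents[k] = 3.0 * secants[k]
        if tangents[k + 1] / secants[k] > 3.0:
            tangents[k + 1] = 3.0 * secants[k]
    return tangents


def spline_interpolate(anchor_times, anchor_values, t):
    """Evaluate the monotone spline survival function at time t >= 0."""
    if t < 0:
        raise ValueError("t must be non-negative")
    times = [0.0] + list(anchor_times)  # implicit anchor (0, 1)
    values = [1.0] + list(anchor_values)
    if t >= times[-1]:
        return values[-1]  # assumption: hold the last finite anchor value
    tangents = monotone_tangents(times, values)
    i = bisect.bisect_right(times, t) - 1  # last anchor whose time is <= t
    width = times[i + 1] - times[i]
    tau = (t - times[i]) / width
    h00 = 2 * tau**3 - 3 * tau**2 + 1
    h10 = tau**3 - 2 * tau**2 + tau
    h01 = -2 * tau**3 + 3 * tau**2
    h11 = tau**3 - tau**2
    return (
        h00 * values[i]
        + h10 * width * tangents[i]
        + h01 * values[i + 1]
        + h11 * width * tangents[i + 1]
    )
```
]]></preformat>
        <p>The tangents are multiplied by the interval width because the Hermite basis polynomials are expressed in the normalized coordinate of each interval, while the tangents are slopes with respect to time.</p>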
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Experiments</title>
      <p>
        This section collects the experimental methodology and results to validate our claims about
interpolation techniques for neural-based survival models. In particular, we describe the datasets
involved in the experiments, the training and inference procedure, and the final results obtained.
We highlight that each of the datasets involved is publicly accessible and used in several survival
studies for benchmarking purposes [
        <xref ref-type="bibr" rid="ref8 ref9">22, 8, 9</xref>
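        ]. To allow for reproducibility, we made the source
code of the experiments publicly available at
        <ext-link ext-link-type="uri" xlink:href="https://github.com/archettialberto/interpolation_for_deep_survival_analysis">https://github.com/archettialberto/interpolation_for_deep_survival_analysis</ext-link>.
      </p>
      <sec id="sec-6-1">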
        <sec id="sec-6-1-1">
          <title>5.1. Datasets</title>
          <p>
            This section describes the datasets processed throughout the experiments:
            <list list-type="bullet">
              <list-item>
                <p>Worcester Heart Attack Study (WHAS500) [23]: This dataset focuses on cardiovascular
health, specifically patients who have experienced myocardial infarction. Given that
heart diseases are one of the leading causes of mortality worldwide, models built on this
dataset can help in risk prediction, better understanding of prognostic factors, and overall
improved patient management strategies.</p>
              </list-item>
              <list-item>
                <p>German Breast Cancer Study Group (GBSG2) [24]: Cancer recurrence is a significant
concern for patients who have undergone treatment. The GBSG2 dataset provides insights
into factors that may affect recurrence, especially in the context of hormone treatments.
The dataset’s focus on covariates like age, menopausal status, and tumor-specific details
makes it a rich source for modeling and predictions, which can directly influence treatment
decisions.</p>
              </list-item>
              <list-item>
                <p>Molecular Taxonomy of Breast Cancer International Consortium
(METABRIC) [
            <xref ref-type="bibr" rid="ref2">25, 2</xref>
            ]: This dataset offers clinical attributes related to patients
experiencing breast cancer. It is part of a larger project offering genomic data, paving the
way for personalized treatment plans by taking into account the genetic variations that
might influence survival rates.</p>
              </list-item>
              <list-item>
                <p>The Cancer Genome Atlas Program - Breast Cancer Study (TCGA-BRCA) [26]:
The TCGA provides a comprehensive view of the genomic changes across various cancer
types. Among the data collection projects revolving around TCGA, BRCA focuses on
breast-invasive carcinoma, offering insights into the variations in survival outcomes
based on geographic regions and their associated clinical practices. This dataset comes
from a dataset suite for medical federated learning, called FLamby [26]. In this study, we
do not consider the federated aspect, aggregating the regional clients into a single cluster
of individuals.</p>
              </list-item>
            </list>
          </p>
        </sec>
        <sec id="sec-6-1-2">
          <title>5.2. Experimental Setup</title>
          <p>This section delineates the methodological approach utilized to assess the efficacy of
interpolation as a post-processing measure in survival models based on neural networks. The datasets
employed for our evaluation, specifically WHAS500, GBSG2, METABRIC, and TCGA-BRCA,
are detailed in Section 5.1. Data from these datasets were uniformly sampled to formulate
both training and test splits, comprising 80% and 20% of the overall samples, respectively.
Subsequently, the training subset underwent an additional 80-20% split to generate a validation
subset.</p>
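          <p>The nested 80-20% splitting scheme can be sketched as follows, resulting in roughly 64% training, 16% validation, and 20% test samples. The function name, the fixed seed, and the rounding of split sizes are assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into train, validation, and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # uniform sampling without replacement
    n_test = round(0.2 * n_samples)  # 20% of all samples for testing
    test, remaining = indices[:n_test], indices[n_test:]
    n_val = round(0.2 * len(remaining))  # 20% of the training split for validation
    val, train = remaining[:n_val], remaining[n_val:]
    return train, val, test
```
]]></preformat>
          <p>For a dataset of 500 samples, this scheme yields 320 training, 80 validation, and 100 test samples.</p>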
          <p>The experiments involved four state-of-the-art neural network-based models from survival
analysis: DeepSurv, DeepHit, Logistic Hazard, and N-MTLR, each thoroughly described in
Section 3. Notably, DeepSurv is the only model based on the proportional hazard assumption,
whereas the others rely on an explicit definition of discrete time bins. Concerning these
discretization points, we adopted a uniform splitting approach, increasing the anchor count
with every experiment. The tested numbers of anchors are 5, 10, 50, 100, 500, and 1000. These
numbers hold for non-proportional models only, as DeepSurv has a fixed number of anchors,
corresponding to the points of the baseline function, shared across all subjects.</p>
          <p>Each model comprises a two-layer fully connected neural network with a number of inputs
equal to the dataset features and a hidden layer size of 32. Each layer is followed by a ReLU
activation function and a dropout regularization layer with 0.1 probability. The number of
outputs is 1 for DeepSurv and equal to the number of anchor points for all the other models.
In the experiments, models are trained using the Adam optimizer with a learning rate of 0.01.
Training ran until convergence for a maximum of 300 epochs, adopting an early stopping
strategy on the validation set with a 10-epoch patience threshold. The batch size was
fixed at 128.</p>
          <p>In the subsequent inference phase, survival functions were derived from the anchor points
of each model, after an interpolation step leveraging the methods outlined in Section 4 – Step
BWD, Step FWD, Linear, PWE, and Spline. For each trained model paired with an interpolation
strategy, the C-Index, the IBS, and the Cumulative AUC with IPCW weighting were evaluated,
as described in Section 2.2. The IBS and the Cumulative AUC were integrated between the 25th
and 75th percentiles of the test times, to limit the noise that could be introduced by the lower
sample density at the endpoints of the time spectrum. Finally, to limit the effects of randomness,
each experiment was repeated 30 times, averaging the final results.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>5.3. Results</title>
          <p>In this section, we present and discuss the empirical results derived from our experiments with
various interpolation techniques. For brevity, we enumerate the IBS (Table 2a), Cumulative
AUC (Table 2b), and C-Index (Table 2c) values achieved on the METABRIC dataset, which is the
largest dataset among the ones analyzed, for 10, 100, and 1000 anchor points. Detailed numerical
values on the WHAS500, GBSG2, and TCGA-BRCA datasets are reported in Appendix A. On
top of that, the time-dependent metrics for all datasets, namely IBS and Cumulative AUC, are
plotted for 5, 50, and 500 anchor counts in Figure 2a and Figure 2b.</p>
          <p><italic>Does interpolation serve as an effective post-processing step when evaluated using the IBS metric?</italic>
As illustrated in Table 2a and Figure 2a, implementing any form of interpolation generally
proves beneficial over the Step BWD or Step FWD techniques. Specifically, for a limited number
of anchor points, i.e., 5 and 10, neural models leveraging Linear and PWE interpolations
demonstrate a better IBS compared to their counterparts. Although Spline interpolation surpasses
step-wise methods, it falls behind Linear and PWE. As the number of anchor points increases,
the distinction among interpolation methods diminishes. This is expected, as a larger anchor
count offers a finer discretization grid, enabling the neural network to precisely adjust the
survival function and thereby mitigating the necessity for interpolation. Notably, while minor
differences can still be observed at 50 and 100 anchors, increasing to 500 or 1000 effectively
equalizes the results of all methods. This convergence can be attributed to the anchor count
approaching the dataset size, compelling the model to capture the behavior of individual time
instances.</p>
          <p><italic>Does interpolation serve as an effective post-processing step when evaluated using the Cumulative
AUC metric?</italic> The Cumulative AUC metric outcomes, reported in Table 2b and Figure 2b, largely
follow the trend of the previous observations. Non-step-based interpolation methods tend
to augment the Cumulative AUC for neural models, especially when the number of anchor
points is low. An exception to this trend is observed with DeepHit using 10 anchor points on the
METABRIC dataset, where Step FWD emerges as the best technique. However, this remains the
only exception with respect to the general trend. Remarkably, while Step FWD often serves as a
default choice for state-of-the-art survival models, it is consistently outperformed by Step BWD.
Similar to the IBS trend, the performance difference among interpolation techniques diminishes
with an increased anchor count.</p>
          <p><italic>How do interpolation techniques affect the C-Index metric?</italic> As highlighted in Table 2c, step-based
interpolation methods marginally outperform other techniques regarding the C-Index on the
METABRIC dataset. Hence, for specific applications where concordance is the sole metric of
interest, step-based interpolation stands as a reliable choice. On the other hand, for
any other situation, smoother interpolation techniques present better time-dependent metrics
with only a negligible degradation of concordance.</p>
          <p>Is there a correlation between the proportional hazard assumption and interpolation’s efficacy? The
proportional hazard assumption significantly constrains the model’s outputs, imposing constant
hazard ratios between subjects over time. As a result, the risk ranking of subjects is the same at
every time point, so the chosen interpolation method should not affect the
C-Index, as confirmed by DeepSurv’s performance in Table 2c. Interestingly, for the other
metrics, IBS and Cumulative AUC, deviations are not noticeable up to the fourth decimal place.
Thus, for models based on the proportional hazard assumption, the influence of interpolation
on performance is negligible. In contrast, as analyzed earlier, interpolation has a clear effect on
non-proportional models based on time discretization.</p>
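          <p>The ranking argument for proportional hazard models can be checked numerically. The sketch below assumes the standard proportional hazard form S(t | x) = S0(t) ** exp(h(x)); the baseline values, log-risk scores, and helper names are hypothetical and chosen only for illustration.</p>
          <preformat>
```python
import math


def ph_survival(baseline_surv, log_risk):
    """Survival curve S0(t) ** exp(h(x)) of one subject under proportional hazards."""
    return [s0 ** math.exp(log_risk) for s0 in baseline_surv]


def ranking(values):
    """Subject indices sorted from highest to lowest survival probability."""
    return sorted(range(len(values)), key=lambda i: -values[i])


# Hypothetical baseline survival at four time points and three log-risk scores.
baseline = [0.95, 0.80, 0.55, 0.30]
log_risks = [-0.5, 0.0, 0.7]

curves = [ph_survival(baseline, h) for h in log_risks]
# One ranking per time point: the subject order never changes.
rankings = [ranking([c[k] for c in curves]) for k in range(len(baseline))]
```
          </preformat>
          <p>Because every interpolated baseline value also lies strictly between 0 and 1, the same ordering holds between anchors whichever interpolation scheme is applied to the baseline. This is why a ranking-based metric such as the C-Index is insensitive to the choice.</p>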
          <p>How does censoring impact results? As previously discussed, interpolation techniques generally
improve survival metrics. This improvement is particularly evident on the METABRIC dataset,
which has the largest proportion of censored samples among the datasets we examined.
On the other datasets, which have fewer censored samples, the positive effect of
interpolation, although still present, is less marked. While we cannot conclude that there is a
direct correlation between the benefit of interpolation and the percentage of censoring, we can affirm that a
high rate of censoring does not hinder the benefits of interpolation techniques.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>In this study, we investigated the influence of interpolation techniques on the performance
metrics of neural survival models. Due to their expressive power, these models often achieve a high
degree of generalization. However, their inherent discretization of time can compromise
their precision. To address this, we focused on the post-processing of survival functions through
interpolation between anchor points, aiming to improve time-dependent metrics such as the IBS
and Cumulative AUC. The empirical analyses conducted across various real-world healthcare
datasets and model configurations revealed a consistent pattern: even simple interpolation
methods, like linear interpolation, offer tangible improvements in these metrics. This trend is
especially noticeable when the number of anchor points is orders of magnitude smaller than
the dataset cardinality, which corresponds to most real-world use cases. In summary, this study
underscores the potential of combining the expressiveness of neural networks with interpolation
techniques to improve the accuracy of survival predictions in clinical contexts.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Ethical Discussion</title>
      <p>While our study focuses on a specific mathematical question concerning the post-processing of
existing, well-studied survival models, the delicate nature of risk assessment in the healthcare
domain raises several ethical considerations. First, at its core, survival analysis
studies the probability of events occurring over time. In the medical field, the results of SA
models may influence decision-making and treatment priorities. Prioritizing
patients based solely on statistical outcomes may lead to short-sighted decisions. Therefore, the
outcomes of survival models should be used as suggestions for domain experts, who must act
on the basis of several real-world factors that statistical
models inevitably fail to capture.</p>
      <p>Second, the use of patient data must be governed by consent and transparency. Especially in the
healthcare domain, where data are sensitive and privacy-protected, it is of utmost importance
to ensure that the rights of individuals and data owners are respected. In this study, we utilized
publicly available survival datasets that are commonly used to benchmark survival techniques.</p>
      <p>In conclusion, while our focus specifically addresses a technical aspect of survival models, we
recognize the broader impact of survival analysis. Our hope is that by enhancing the reliability
of these models, we contribute to a more ethical and fair healthcare landscape where statistical
predictions serve as one tool among many to aid the judgment of medical professionals.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This project has been supported by AI-SPRINT: AI in Secure Privacy-pReserving computINg
conTinuum (European Union H2020 grant agreement No. 101016577) and FAIR: Future Artificial
Intelligence Research (NextGenerationEU, PNRR-PE-AI scheme, M4C2, investment 1.3, line on
Artificial Intelligence).
</p>
      <p>[Figure 2, panel (a): IBS results. Panel titles: DeepSurv, DeepHit, Logistic Hazard, N-MTLR. Axis labels: Anchors, Baseline Hazard Anchors (5, 50, 500); IBS. Legend: Step BWD, Step FWD, Linear, PWE, Spline.]</p>
      <p>For the IBS, the lower the better; for the Cumulative AUC, the higher the better. Columns correspond to survival models,
while rows correspond to survival datasets. Results are averaged over 30 runs.</p>
      <p>[16] S. Pölsterl, scikit-survival: A library for time-to-event analysis built on top of scikit-learn, Journal of Machine Learning Research 21 (2020) 1–6. URL: http://jmlr.org/papers/v21/20-729.html.
[17] J. M. Robins, A. Rotnitzky, Recovery of information and adjustment for dependent
censoring using surrogate markers, in: AIDS epidemiology, Springer, 1992, pp. 297–331.
[18] C.-N. Yu, R. Greiner, H.-C. Lin, V. Baracos, Learning patient-specific cancer survival
distributions as a sequence of dependent regressors, Advances in neural information
processing systems 24 (2011).
[19] S. Fotso, Deep neural networks for survival analysis based on a multi-task framework,
arXiv preprint arXiv:1801.05512 (2018).
[20] M. F. Gensheimer, B. Narasimhan, A scalable discrete-time survival model for neural
networks, PeerJ 7 (2019) e6257.
[21] F. N. Fritsch, R. E. Carlson, Monotone piecewise cubic interpolation, SIAM Journal on Numerical Analysis 17 (1980) 238–246. URL: http://www.jstor.org/stable/2156610.</p>
      <p>[22] A. Archetti, E. Lomurno, F. Lattari, A. Martin, M. Matteucci, Heterogeneous datasets for
federated survival analysis simulation, in: Companion of the 2023 ACM/SPEC International
Conference on Performance Engineering, ICPE ’23 Companion, Association for Computing
Machinery, New York, NY, USA, 2023, pp. 173–180. URL: https://doi.org/10.1145/3578245.3584935. doi:10.1145/3578245.3584935.
[23] D. W. Hosmer, S. Lemeshow, S. May, Applied Survival Analysis: Regression Modeling of
Time-to-Event Data, Wiley Series in Probability and Statistics, John Wiley &amp; Sons, Inc.,
Hoboken, NJ, USA, 2008. URL: http://doi.wiley.com/10.1002/9780470258019. doi:10.1002/9780470258019.
[24] M. Schumacher, G. Bastert, H. Bojar, K. Hübner, M. Olschewski, W. Sauerbrei, C. Schmoor,
C. Beyerle, R. Neumann, H. Rauschecker, Randomized 2 x 2 trial evaluating hormonal
treatment and the duration of chemotherapy in node-positive breast cancer patients.
German Breast Cancer Study Group, Journal of Clinical Oncology 12 (1994) 2086–2093.
[25] B. Pereira, S.-F. Chin, O. M. Rueda, H.-K. M. Vollan, E. Provenzano, H. A. Bardwell, M. Pugh,
L. Jones, R. Russell, S.-J. Sammut, D. W. Y. Tsui, B. Liu, S.-J. Dawson, J. Abraham, H. Northen,
J. F. Peden, A. Mukherjee, G. Turashvili, A. R. Green, S. McKinney, A. Oloumi, S. Shah,
N. Rosenfeld, L. Murphy, D. R. Bentley, I. O. Ellis, A. Purushotham, S. E. Pinder,
A.-L. Børresen-Dale, H. M. Earl, P. D. Pharoah, M. T. Ross, S. Aparicio, C. Caldas, The
somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic
landscapes, Nature Communications 7 (2016) 11479. URL: https://www.nature.com/articles/ncomms11479. doi:10.1038/ncomms11479.
[26] J. Ogier du Terrail, S.-S. Ayed, E. Cyfers, F. Grimberg, C. He, R. Loeb, P.
Mangold, T. Marchand, O. Marfoq, E. Mushtaq, B. Muzellec, C. Philippenko, S. Silva,
M. Teleńczuk, S. Albarqouni, S. Avestimehr, A. Bellet, A. Dieuleveut, M. Jaggi, S. P.
Karimireddy, M. Lorenzi, G. Neglia, M. Tommasi, M. Andreux, Flamby: Datasets
and benchmarks for cross-silo federated learning in realistic healthcare settings,
in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.),
Advances in Neural Information Processing Systems, volume 35, Curran Associates, Inc.,
2022, pp. 5315–5334. URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/232eee8ef411a0a316efa298d7be3c2b-Paper-Datasets_and_Benchmarks.pdf.</p>
    </sec>
    <sec id="sec-11">
      <title>A. Detailed Numerical Results</title>
      <p>This section presents all the numerical results obtained throughout our experiments. For each
dataset, we list three tables, corresponding to the IBS, Cumulative AUC, and C-Index
metrics, respectively. The table-dataset correspondence is as follows:
• WHAS500 dataset: Table 3 (IBS), Table 4 (Cumulative AUC), and Table 5 (C-Index).
• GBSG2 dataset: Table 6 (IBS), Table 7 (Cumulative AUC), and Table 8 (C-Index).
• METABRIC dataset: Table 9 (IBS), Table 10 (Cumulative AUC), and Table 11 (C-Index).
• TCGA-BRCA dataset: Table 12 (IBS), Table 13 (Cumulative AUC), and Table 14 (C-Index).</p>
      <sec id="sec-11-1">
        <sec id="sec-11-1-1">
          <title>A.1. WHAS500 dataset</title>
          <p>Table 3: IBS results on the WHAS500 dataset. Values are averaged over 30 runs and scaled up by a factor of 100
for better readability.</p>
          <p>[Table 3; recoverable column headers: Anchors, Step BWD, Step FWD.]</p>
        </sec>
        <sec id="sec-11-1-2">
          <title>A.2. GBSG2 dataset</title>
        </sec>
        <sec id="sec-11-1-3">
          <title>A.3. METABRIC dataset</title>
        </sec>
        <sec id="sec-11-1-4">
          <title>A.4. TCGA-BRCA dataset</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <article-title>Machine learning for survival analysis: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Katzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Shaham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cloninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kluger</surname>
          </string-name>
          ,
          <article-title>Deepsurv: personalized treatment recommender system using a cox proportional hazards deep neural network</article-title>
          ,
          <source>BMC medical research methodology</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Der Schaar</surname>
          </string-name>
          ,
          <article-title>Deephit: A deep learning approach to survival analysis with competing risks</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kvamme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ø.</given-names>
            <surname>Borgan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Scheel</surname>
          </string-name>
          ,
          <article-title>Time-to-event prediction with neural networks and cox regression</article-title>
          ,
          <source>arXiv preprint arXiv:1907.00825</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kvamme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ø.</given-names>
            <surname>Borgan</surname>
          </string-name>
          ,
          <article-title>Continuous and discrete-time survival prediction with neural networks</article-title>
          ,
          <source>Lifetime Data Analysis</source>
          <volume>27</volume>
          (
          <year>2021</year>
          )
          <fpage>710</fpage>
          -
          <lpage>736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wiegrebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kopper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sonabend</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <article-title>Deep learning for survival analysis: A review</article-title>
          ,
          <source>arXiv preprint arXiv:2305.14961</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ishwaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. B.</given-names>
            <surname>Kogalur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Blackstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lauer</surname>
          </string-name>
          ,
          <article-title>Random survival forests</article-title>
          ,
          <source>The annals of applied statistics</source>
          <volume>2</volume>
          (
          <year>2008</year>
          )
          <fpage>841</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Archetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matteucci</surname>
          </string-name>
          ,
          <article-title>Federated Survival Forests</article-title>
          ,
          <source>in: 2023 International Joint Conference on Neural Networks (IJCNN2023)</source>
          , IEEE (in press),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Archetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matteucci</surname>
          </string-name>
          ,
          <article-title>Scaling survival analysis in healthcare with federated survival forests: A comparative study on heart failure and breast cancer genomics</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>149</volume>
          (
          <year>2023</year>
          )
          <fpage>343</fpage>
          -
          <lpage>358</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0167739X23002935. doi:10.1016/j.future.2023.07.036.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rügamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scheipl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bischl</surname>
          </string-name>
          ,
          <article-title>A general machine learning framework for survival analysis</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>158</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Meier</surname>
          </string-name>
          ,
          <article-title>Nonparametric estimation from incomplete observations</article-title>
          ,
          <source>Journal of the American statistical association</source>
          <volume>53</volume>
          (
          <year>1958</year>
          )
          <fpage>457</fpage>
          -
          <lpage>481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>Regression models and life-tables</article-title>
          ,
          <source>Journal of the Royal Statistical Society. Series B (Methodological)</source>
          <volume>34</volume>
          (
          <year>1972</year>
          )
          <fpage>187</fpage>
          -
          <lpage>220</lpage>
          . URL: http://www.jstor.org/stable/2985181.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hothorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bühlmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dudoit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Molinaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Van Der Laan</surname>
          </string-name>
          ,
          <article-title>Survival ensembles</article-title>
          ,
          <source>Biostatistics</source>
          <volume>7</volume>
          (
          <year>2005</year>
          )
          <fpage>355</fpage>
          -
          <lpage>373</lpage>
          . URL: https://doi.org/10.1093/biostatistics/kxj011. doi:10.1093/biostatistics/kxj011. arXiv:https://academic.oup.com/biostatistics/articlepdf/7/3/355/690076/kxj011.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Uno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pencina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>D'Agostino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data</article-title>
          ,
          <source>Statistics in medicine</source>
          <volume>30</volume>
          (
          <year>2011</year>
          )
          <fpage>1105</fpage>
          -
          <lpage>1117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Graf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sauerbrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schumacher</surname>
          </string-name>
          ,
          <article-title>Assessment and comparison of prognostic classification schemes for survival data</article-title>
          ,
          <source>Statistics in medicine</source>
          <volume>18</volume>
          (
          <year>1999</year>
          )
          <fpage>2529</fpage>
          -
          <lpage>2545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pölsterl</surname>
          </string-name>
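          ,
          <article-title>scikit-survival: A library for time-to-event analysis built on top of scikit-learn</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . URL: http://jmlr.org/papers/v21/20-729.html.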
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>