<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Active Learning for Survival Analysis with Incrementally Disclosed Label Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Klest Dedja</string-name>
          <email>klest.dedja@kuleuven.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felipe Kenji Nakano</string-name>
          <email>felipekenji.nakano@kuleuven.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Celine Vens</string-name>
          <email>celine.vens@kuleuven.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KU Leuven - Department of Public Health and Primary Care</institution>
          ,
          <addr-line>Etienne Sabbelaan 53, 8500 Kortrijk</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>imec - ITEC research group</institution>
          ,
          <addr-line>Etienne Sabbelaan 51, 8500 Kortrijk</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <fpage>46</fpage>
      <lpage>64</lpage>
      <abstract>
        <p>Our study introduces a novel, generalised active learning framework for survival analysis, where we challenge the assumption that the oracle possesses full information about the time-to-event at any stage. Given this generalisation, we allow for querying the same instance multiple times, leading to an approach never employed in survival analysis to our knowledge. A central component of our contribution lies in the modification of the underlying time-to-event prediction model, the Random Survival Forest in our case, enabling it to accommodate partial information on 'unseen' data during the active learning phase and thus effectively estimate the conditional risk of a given instance. This adaptation is relevant in scenarios where instances are equipped with partial information. Furthermore, we adapt and introduce new sampling strategies that align with the novel survival analysis framework, thereby ensuring their compatibility with the conditional predictions output by the model. This research offers a comprehensive, novel approach to active learning for survival analysis, setting a solid foundation for further developments in this under-explored intersection of the fields of active learning and survival analysis. Finally, our work not only expands the capabilities of both fields, but also sets the stage for their collaborative use in real-world longitudinal time-to-event studies, where new information typically emerges over time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Active Learning (AL) is a field of machine learning that focuses on algorithms that can
interactively query an oracle (usually a human expert) to acquire the labels of the most informative
unlabelled training examples, with the aim of improving predictive models’ performance in
the presence of a pool of unlabelled examples [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and maximising model performance with
the least amount of training labels. These algorithms bypass the need for exhaustive labelling
and therefore provide a cost-effective solution to complex problems. AL is especially beneficial
when unlabelled data are abundant, but manual labelling is costly or time-consuming. Initially,
a base model is trained on the set of labelled data; the model is then used to predict unlabelled
data instances and select which ones should be labelled based on an informativeness criterion.
These newly labelled instances are then used to update the model, thus iteratively improving
its performance.
      </p>
      <p>
        AL has been broadly employed in regression and in classification tasks, such as image
recognition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or text classification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and these frameworks have been covered by a rich
literature summarised in reviews such as [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Other research aspects of AL such as budget
constraints [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] and estimation of learning curves [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have also been analysed extensively. On
the other hand, little research has been done on applying AL techniques to Survival Analysis
tasks.
      </p>
      <p>Survival Analysis is a branch of statistics whose goal is to predict the time until an event
occurs. A key challenge tackled within this framework is the so-called censoring effect, which arises
when the exact time of the event is not observed, leaving the researcher with partial, or weakly
supervised, information. Censoring is a common issue in healthcare applications such as
clinical studies, where patients may drop out of the study before their event occurs (e.g., the
occurrence of a disease complication, hospital discharge, or death), or where patients may not
have reached the event by the termination of the study.</p>
      <p>
        Survival Analysis (SA) has only recently been explored in the machine learning community [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
and little work has been done in the intersection between SA and AL. The most notable examples
of this are [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The former provides an AL-based survival model which
uses a (regularised) Cox proportional hazards model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] in combination with a novel
model-discriminative, gradient-based sampling scheme. Meanwhile, the latter adopts a deep learning
approach to SA and proposes a novel AL sampling technique for that framework. However,
both methods operate under the assumption that the time-to-event labels, once accessed during
the query stage, are immutable and cannot be updated in subsequent queries.
      </p>
      <p>
        This assumption, however, does not always hold in reality as noted in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], as it does not take
into consideration label delay, a common factor in longitudinal
studies. We therefore challenge this assumption: rather than assuming that full information is
revealed by the oracle, we assume that an unknown, random amount of information is
provided, and we open the possibility to query the same instance multiple times to
obtain more up-to-date information. The relaxation of such assumptions is particularly relevant
in real-world applications such as clinical trials, where patients or volunteers can be called
multiple times to record the occurrence of a certain event, but for which budget constraints
limit the possibility of follow-up.
      </p>
      <p>
        To align with the assumption, existing SA models need to be adjusted to consider partial
information during the querying phase. To this end, we have modified the Random Survival Forest
(RSF) algorithm [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to handle this type of information. Moreover, we have revised two existing
AL sampling strategies to fit the SA approach, while also introducing two novel strategies; we
ensured the compatibility of these adjustments with the broader proposed framework.
      </p>
      <p>We tested the validity of the proposed framework on several datasets, both real-world and
synthetically generated, and showed that relatively simple sampling strategies can outperform
random sampling.</p>
      <sec id="sec-1-1">
        <title>1.1. Contributions</title>
        <p>To summarise, the main contributions of our paper are the following:
• We propose a generalised framework for AL applied to SA, where we drop the common,
underlying assumption that the oracle has access to full information about the
time-to-event at any given stage. To our knowledge, this is the first study that includes the
possibility of querying the same instance multiple times.
• We adapt the widely used time-to-event RSF model to handle partial information during
the querying phase so that it can effectively predict the conditional risk of a given
instance. This adaptation becomes particularly relevant in the light of the previously listed
contribution, where instances with partial information are created.
• We adapt and develop new sampling strategies to be integrated with the novel SA
framework, and we also ensure that these strategies are compatible with the conditional
predictions made by the framework.
• We evaluate the efficacy of the adapted conditional model in conjunction with various
sampling strategies, including both revised existing methods and novel ones, across
multiple datasets under our newly proposed framework.</p>
        <p>The code and datasets used in this paper can be accessed at: https://github.com/Klest94/AL-SA-paper-material/.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Survival Analysis</title>
        <p>This section outlines two key areas that underpin our work: SA and AL. Following these, we
will discuss related work in a separate subsection.</p>
        <p>A SA set-up makes it possible to obtain unbiased time-to-event estimates in the presence of
censoring. In most cases, censoring provides only a lower bound on the true time-to-event;
this is called right-censoring. In a right-censoring scenario, there are two possible outcomes:
either the true time-to-event is observed, or a censoring event happens before the real event, in which case the
instance’s event falls outside of the observation window and the observer has no information
on the event time from that moment onward.</p>
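        <p>Mathematically speaking, the information on the event of interest for an instance <inline-formula><tex-math>x_i</tex-math></inline-formula> can be
represented as a tuple <inline-formula><tex-math>(t_i, \delta_i)</tex-math></inline-formula>, where <inline-formula><tex-math>t_i</tex-math></inline-formula> corresponds to the time of the event, and <inline-formula><tex-math>\delta_i</tex-math></inline-formula> corresponds
to the observation’s status: <inline-formula><tex-math>\delta_i = 1</tex-math></inline-formula> stands for the event of interest being actually observed at
time <inline-formula><tex-math>t_i</tex-math></inline-formula>, while <inline-formula><tex-math>\delta_i = 0</tex-math></inline-formula> stands for the event being censored as the instance reaches the end of the observation
window at time <inline-formula><tex-math>t_i</tex-math></inline-formula>.</p>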
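        <p>The goal of a SA study is to estimate such time-to-event at a population or at an individual
level given a set of individual information. The typical outcome is a function over time called the
survival function <inline-formula><tex-math>S(t \mid x_i)</tex-math></inline-formula>, which indicates the probability of not experiencing the event (in other
words, ‘surviving’, hence the name) until at least time <inline-formula><tex-math>t</tex-math></inline-formula> for individual <inline-formula><tex-math>x_i</tex-math></inline-formula>. Formally:
<disp-formula><label>(1)</label><tex-math>S(t \mid x_i) = \mathbb{P}(T &gt; t \text{ for instance } x_i)</tex-math></disp-formula>
The conditioning on the instance <inline-formula><tex-math>x_i</tex-math></inline-formula> is omitted from now on, as the same notation can be used
to describe the survival distribution both at an individual and at a population level.</p>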
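        <p>Another useful concept in SA is the hazard function, defined as the conditional probability
of an event happening within an infinitesimal time interval <inline-formula><tex-math>\Delta t</tex-math></inline-formula>, divided by the width of that
interval:
<disp-formula><label>(2)</label><tex-math>h(t) = \lim_{\Delta t \to 0} \frac{\mathbb{P}(t \leq T &lt; t + \Delta t \mid T \geq t)}{\Delta t} = -\frac{S'(t)}{S(t)}.</tex-math></disp-formula>
Its integral form, <inline-formula><tex-math>\Lambda(t) = \int_0^t h(s)\,ds</tex-math></inline-formula>, is called the cumulative hazard function (CHF) and is also widely
used.</p>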
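        <p>As a minimal numerical sketch of how the hazard, the cumulative hazard and the survival function relate, the snippet below integrates a hazard numerically and recovers the survival function through the standard identity S(t) = exp(-Λ(t)), which follows from h(t) = -S'(t)/S(t). The constant hazard of 0.1 and the helper names are illustrative assumptions, not quantities taken from the paper.</p>
        <preformat><![CDATA[
```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate Lambda(t) = integral of h(s) over [0, t] with the midpoint rule."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width


def survival_from_hazard(hazard, t):
    """S(t) = exp(-Lambda(t)), obtained by integrating h(t) = -S'(t) / S(t)."""
    return math.exp(-cumulative_hazard(hazard, t))


def constant_hazard(s):
    """Illustrative assumption: 0.1 events per unit of time, at every time s."""
    return 0.1


chf_at_5 = cumulative_hazard(constant_hazard, 5.0)
surv_at_5 = survival_from_hazard(constant_hazard, 5.0)
```
]]></preformat>
        <p>With a constant hazard the cumulative hazard grows linearly in time, so the survival curve decays exponentially; any other non-negative hazard function can be passed in the same way.</p>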
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Active Learning</title>
        <p>The AL framework relies on an iterated cycle of querying, labelling, and updating. Initially, there
are two pools of data: the labelled set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> and the unlabelled set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>, and the learning model ℳ is
trained on the labelled pool. Following the initialisation step, the AL framework moves into the
querying phase. Here, the learner selects specific instances from the unlabelled pool, based on
a pre-defined querying strategy. These instances are labelled by an oracle, usually a human
expert, and added to the labelled dataset. The model is then updated or re-trained using the
expanded labelled dataset, improving its predictive capabilities, and the cycle repeats, enhancing
the model’s performance with each iteration. The query strategy plays a key role in the AL
framework, as a good sampling strategy can increase the efficiency of the model by reducing
the amount of necessary data while simultaneously improving the performance of the model.</p>
        <p>
          Various sampling strategies have been developed to select the most informative samples from
the unlabelled pool. The simplest strategy is uncertainty sampling, which involves querying the
instances about which the current model is most uncertain. This strategy is intuitively appealing
because it directly reduces the model’s uncertainty [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. However, it can lead to excessive
querying near the decision boundaries, while ignoring potentially informative instances farther
away [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Another common strategy is query-by-committee (QBC), in which several models (the
‘committee’) are trained on the current labelled set. Each committee member votes on the labels
of unlabelled instances, and the instances with the most disagreement are queried [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. QBC
leverages the wisdom of the crowd to mitigate the risk of querying uninformative instances.
        </p>
        <p>
          Moreover, several hybrid strategies have been proposed, and common approaches consist of
strategies that combine the aforementioned ones with a diversity-driven sampling strategy, or
that take into account the estimated informativeness of the candidate sample. Finally, several
sampling strategies have been designed to consider the potential influence of instances on
the model’s future performance, referred to as expected model change [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], expected error
reduction [18], and variance reduction [19].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Related work</title>
        <p>
          As mentioned before, the intersection of AL and SA is a relatively unexplored field, although
some studies have emerged recently. The first example is represented by the work of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
where the authors introduce a novel sampling strategy in AL to iteratively improve a deep
learning model. More specifically, the authors propose a Convolutional Neural Network to learn
a meaningful feature representation and train a Cox regression model on top of it. As a next
step, a sampling strategy based on the expected performance change of the model is proposed
for the AL framework. The authors use an external data source as the oracle for labelling the
instances during the querying phase. More specifically, the expected time-to-event from life
tables is used; as a result, fully observed event tuples <inline-formula><tex-math>(t_i, 1)</tex-math></inline-formula> are provided to the queried instances during the
labelling phase.
        </p>
        <p>
          The second example is represented by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], where an Active Regularised Cox regression
framework is introduced that combines AL with Cox regression models. The authors, starting
from the log-likelihood of regularised Cox regressions, derive a framework that makes use
of a gradient-based sampling strategy to select instances for labelling during the AL process.
Unlike [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], there is no assumption that the queried labels include full information on the
time-to-event, and tuples of the format <inline-formula><tex-math>(t_i, \delta_i)</tex-math></inline-formula> are provided by the oracle upon querying request.
        </p>
        <p>However, neither of these studies takes into consideration the longitudinal nature of
time-to-event data. More specifically, these studies do not take into account the fact that in many
real-world situations the information of an instance’s label also depends on the time of querying,
and that more information can be available at later time points.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Framework</title>
      <p>To counter the aforementioned limitations, we propose a novel AL framework that includes the
possibility to query the same instance multiple times for time-to-event data. In other words, a
given instance in the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> can be queried multiple times until all available information is extracted.
That is, instances are queried until the real time of the event is observed, with <inline-formula><tex-math>(t_i, 1)</tex-math></inline-formula>, or until a
terminal censoring factor is encountered, with <inline-formula><tex-math>(t_i, 0)</tex-math></inline-formula>.</p>
      <p>Our proposed framework represents a more general set-up, expanding in two main directions:
• On the information-revealing side: both partial and full information
can be revealed by the oracle. In the first setting, the oracle knows that the event has not
been reached yet and can provide updated information about the censoring time. In the
second setting, the oracle confirms that the event has taken place and provides the event
time, or confirms that the instance has been lost to follow-up or has reached the study
end, and provides the final censoring time.
• On the inclusion criteria for instances to be part of the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>: both instances with no
label information and instances with partial label information are included<sup>1</sup>. The former
category includes instances for which no information regarding event time is available.
The latter includes instances for which the oracle has already revealed partial information,
e.g. that the event was not reached yet at a particular point in time.</p>
      <p>A visual representation is given in Table 1, where we compare the scope of our framework
against other AL + SA set-ups. We refer to our framework as a multi-query AL.</p>
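      <p>Certain modifications are required to implement our suggested framework. Firstly, the label
set now comprises triples <inline-formula><tex-math>(t_i, \delta_i, e_i)</tex-math></inline-formula>, where <inline-formula><tex-math>t_i</tex-math></inline-formula> and <inline-formula><tex-math>\delta_i</tex-math></inline-formula> represent the previously mentioned
time-to-event data, and where the variable <inline-formula><tex-math>e_i</tex-math></inline-formula> shows whether there is additional label information
available to be queried from the instance (<inline-formula><tex-math>e_i = 0</tex-math></inline-formula>) or whether the instance’s data has been completely
‘exhausted’ (<inline-formula><tex-math>e_i = 1</tex-math></inline-formula>). This additional information is crucial because an instance can now be
queried multiple times, allowing it to simultaneously be part of the training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> and the pool
set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>. Consequently, the sampling strategies must consider this new feature, linking each
instance with a status <inline-formula><tex-math>e_i</tex-math></inline-formula> which marks whether additional information can be queried or whether
the instance has reached its exhaustion point.<fn id="fn1"><label>1</label><p>Partial label information in the sense that more information can be queried from the oracle in the future. Not to be
confused with the partial information in the sense of censoring in a SA setting.</p></fn></p>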
      <p>Upon their initial query, queried instances are included in the training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>, and they are only
removed from the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> once exhausted. It is important to note that although the model
ℳ is continuously aware (through the oracle) of whether an instance has been exhausted, it
remains uninformed about the amount of information that can be revealed by the oracle in each
step.</p>
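      <p>From the perspective of an instance <inline-formula><tex-math>x_i</tex-math></inline-formula> in the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>, the procedure looks as follows:
• The instance starts with no available information; this is equivalent to being censored at time 0,
and the instance label corresponds to the tuple <inline-formula><tex-math>y_i = (0, 0, 0)</tex-math></inline-formula>.
• When queried for the first time, the instance enters the training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> with the
corresponding revealed label <inline-formula><tex-math>(t_{i,1}, \delta_{i,1}, e_{i,1})</tex-math></inline-formula>; if <inline-formula><tex-math>e_{i,1} = 0</tex-math></inline-formula> then <inline-formula><tex-math>\delta_{i,1} = 0</tex-math></inline-formula> automatically.
• In general, when queried for the <inline-formula><tex-math>j</tex-math></inline-formula>-th time, the label <inline-formula><tex-math>(t_{i,j-1}, 0, 0)</tex-math></inline-formula> is updated to <inline-formula><tex-math>(t_{i,j}, \delta_{i,j}, e_{i,j})</tex-math></inline-formula>,
and if <inline-formula><tex-math>e_{i,j} = 1</tex-math></inline-formula> the instance is dropped from <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>.</p>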
      <p>
        To handle partial information, our proposed framework necessitates adaptations of the
underlying model ℳ. We therefore introduce a conditional adaptation of the Random Survival
Forest (RSF) model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The choice of RSF, a time-to-event adaptation of the established Random
Forest algorithm by [20], is based on the proven efficacy of RF classifiers in AL scenarios,
particularly in handling unbalanced classification tasks [21], coupled with the scarcity of
literature concerning applications of RSF in AL scenarios.
      </p>
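      <p>Our RSF adaptation updates the predictions initially made by a standard RSF model when
faced with partial label information. These updated predictions are then used in the querying phase
of the AL framework. Further, our framework unlocks the ability of the model ℳ and the
sampling strategy <inline-formula><tex-math>\phi</tex-math></inline-formula> to leverage the partial information inherent in the samples that belong to
both <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> and <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>. Specifically, when forming the query set <inline-formula><tex-math>\mathcal{Q}</tex-math></inline-formula> to be submitted to the oracle, the
model ℳ must be capable of estimating survival curves <inline-formula><tex-math>\hat{S}(t)</tex-math></inline-formula> for previously unseen samples in
<inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>, and concurrently of estimating curves conditionally, based on the status of a given sample up
to time <inline-formula><tex-math>t_i</tex-math></inline-formula> and its corresponding label <inline-formula><tex-math>y_i = (t_i, 0, 0)</tex-math></inline-formula> in <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>. This requirement can be fulfilled by
updating the prediction <inline-formula><tex-math>\hat{S}(t)</tex-math></inline-formula> of a standard survival model ℳ as follows:
<disp-formula><label>(3)</label><tex-math>\tilde{S}(t) = \hat{S}(t \mid T \geq t_i) = \begin{cases} \hat{S}(t) / \hat{S}(t_i), &amp; \text{for } t \geq t_i \\ 1, &amp; \text{for } t &lt; t_i \end{cases}</tex-math></disp-formula></p>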
      <p>This method corresponds closely with the approach delineated in [22], where predictions are
updated based on the conditional survival function and the definition of conditional probabilities.</p>
      <p>In the context of a multi-query framework where it is possible to access partial label
information, this conditional update proves particularly useful as it allows us to refine our estimate of the time
to the event. Specifically, if we know that the event has not occurred by time <inline-formula><tex-math>t_i</tex-math></inline-formula>, we can update
our estimate based on this knowledge. Similarly, when considering the conditional cumulative
hazard function, we obtain the updated estimate as:
<disp-formula><label>(4)</label><tex-math>\tilde{\Lambda}(t) = \hat{\Lambda}(t \mid T \geq t_i) = \begin{cases} \hat{\Lambda}(t) - \hat{\Lambda}(t_i), &amp; \text{for } t \geq t_i \\ 0, &amp; \text{for } t &lt; t_i \end{cases}</tex-math></disp-formula>
It is worth noting that such updates are performed only on queried labels of the form <inline-formula><tex-math>(t_i, 0, 0)</tex-math></inline-formula>,
as instances that are fully exhausted are dropped from <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> and do not need to be updated. In this
paper, unless stated otherwise, we will refer to our novel, conditional RSF model as ℳ.</p>
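      <p>The two conditional updates described above can be sketched on a predicted step curve as follows. This is a simplified illustration rather than the authors' implementation: the time grid, the predicted values and the helper names are hypothetical, the argument c stands for the last time at which the instance was known to be event-free, and the sketch assumes that the predicted survival at c is strictly positive.</p>
      <preformat><![CDATA[
```python
import bisect
import math


def step_value(times, values, t, before):
    """Evaluate a right-continuous step function at t; `before` applies ahead of times[0]."""
    idx = bisect.bisect_right(times, t) - 1
    return values[idx] if idx >= 0 else before


def conditional_survival(times, surv, t, c):
    """Return S(t) / S(c) for t >= c and 1 for t < c (assumes S(c) > 0)."""
    if t < c:
        return 1.0
    return step_value(times, surv, t, 1.0) / step_value(times, surv, c, 1.0)


def conditional_chf(times, chf, t, c):
    """Return Lambda(t) - Lambda(c) for t >= c and 0 for t < c."""
    if t < c:
        return 0.0
    return step_value(times, chf, t, 0.0) - step_value(times, chf, c, 0.0)


# Hypothetical predicted curve on a small time grid (not taken from the paper).
times = [1.0, 2.0, 3.0, 4.0]
surv = [0.9, 0.7, 0.5, 0.2]
chf = [-math.log(s) for s in surv]

s_cond = conditional_survival(times, surv, 3.0, 2.0)  # 0.5 / 0.7
l_cond = conditional_chf(times, chf, 3.0, 2.0)
```
]]></preformat>
      <p>When the cumulative hazard is taken as the negative logarithm of the survival curve, as assumed here, the two updates agree: exponentiating the negated conditional cumulative hazard gives back the conditional survival probability.</p>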
      <p>At each iteration step, a sampling strategy identifies the top <inline-formula><tex-math>k</tex-math></inline-formula> instances from <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>, predicted
to be the most informative based on a specific heuristic <inline-formula><tex-math>\phi</tex-math></inline-formula>. A detailed discussion on this
implementation is provided in Section 3.1. The instances that are deemed the most informative
are then submitted to the oracle. During the reveal procedure, the oracle discloses some amount of
information on the corresponding label. If the instance was previously unseen, it is incorporated
into the training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> with the corresponding label, following a typical AL framework. However,
an instance is only removed from the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> once all the available information has been
fully disclosed.</p>
      <p>The pseudo-code of our multi-query, AL framework for SA is presented in Algorithm 1.</p>
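      <p>To make the control flow concrete, the loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the released implementation: labels are triples (t, delta, e) as defined earlier in this section, while the model fit, the random strategy, the oracle and the simulated final labels are hypothetical stand-ins chosen only so that the example runs on its own.</p>
      <preformat><![CDATA[
```python
import random


def multi_query_al(train, pool, fit, strategy, oracle, rounds, k):
    """Multi-query AL loop. `train` maps id -> (t, delta, e); `pool` holds queryable ids."""
    train, pool = dict(train), set(pool)
    model = fit(train)                                  # fit initial model
    for _ in range(rounds):
        if not pool:
            break                                       # nothing left to query
        query = strategy(model, sorted(pool), min(k, len(pool)))
        for i in query:
            previous = train.get(i, (0.0, 0, 0))        # unseen: censored at time 0
            train[i] = oracle(i, previous)              # add new label or update old one
        model = fit(train)                              # retrain on updated labels
        pool -= {i for i in query if train[i][2] == 1}  # drop exhausted instances
    return model, train, pool


rng = random.Random(0)
# Hypothetical final labels (time, status) that the toy oracle reveals step by step.
final = {i: (rng.uniform(1, 10), rng.randint(0, 1)) for i in range(20)}


def toy_oracle(i, previous):
    """Reveal a random share of the remaining time; a share of 1 exhausts the label."""
    alpha = rng.choice([0.2, 0.4, 0.6, 0.8, 1.0])
    t_final, delta_final = final[i]
    if alpha == 1.0:
        return (t_final, delta_final, 1)
    return (previous[0] + alpha * (t_final - previous[0]), 0, 0)


def random_strategy(model, candidates, k):
    """Baseline strategy: ignore the model and sample k candidates uniformly."""
    return rng.sample(candidates, k)


def fit_stub(train):
    """Placeholder for fitting a survival model; returns the training-set size."""
    return len(train)


model, labels, remaining = multi_query_al(
    {}, set(final), fit_stub, random_strategy, toy_oracle, rounds=200, k=2
)
```
]]></preformat>
      <p>An instance stays in the pool after it has entered the training set, which is what distinguishes this loop from a standard pool-based AL cycle; it only leaves the pool once its exhaustion flag is set.</p>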
      <sec id="sec-3-1">
        <title>3.1. Sampling strategies</title>
        <p>We select a range of sampling strategies: some were previously suggested for AL in binary
classification tasks and are adapted here to align with a SA setting, while others are proposed in our work for
the first time. In choosing the sampling strategies, given the exploratory nature of our paper,
we favoured strategies that are well established, simple to implement, and easy to replicate.
We examine their efficacy within the multi-query framework and compare them against their
density-weighted counterparts. The specific sampling strategies under comparison in our study
include:</p>
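        <p>Random. This strategy consists of randomly sampling <inline-formula><tex-math>k</tex-math></inline-formula> instances from the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> at each iteration (also
known as ‘passive learning’). It serves as a baseline for the other sampling strategies,
and can perform reasonably well, as noted by [23, 24].</p>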
        <boxed-text>
          <label>Algorithm 1</label>
          <caption><title>Our proposed multi-query AL framework for SA</title></caption>
          <preformat>Require: ℒ, 𝒰                                   ▷ Initialise training set and pool set
Require: Sampling strategy φ                    ▷ Define a sampling strategy
Require: R rounds, batch size k                 ▷ Usually constrained by the budget
ℳ = ℳ(ℒ)                                        ▷ Fit initial model
for r = 1 to R do
    𝒬 ← φ(ℳ, 𝒰), with 𝒬 ⊆ 𝒰 and |𝒬| = k          ▷ Select top k instances according to φ
    Update labels in 𝒬 through the oracle
    𝒬_new = {(x_i, y_i) : x_i ∈ 𝒬 ∧ x_i ∉ ℒ}     ▷ Identify unseen (new) samples
    𝒬_old = {(x_i, y_i) : x_i ∈ 𝒬 ∧ x_i ∈ ℒ}     ▷ Identify samples used in previous iterations
    Update ℒ                                    ▷ Update existing labels with the new labels in 𝒬_old
    Update ℒ = ℒ ∪ 𝒬_new                        ▷ Add the new, previously unseen labels to ℒ
    ℳ = ℳ(ℒ)                                    ▷ Retrain model ℳ
    𝒬_exh = {(x_i, y_i) : (x_i ∈ 𝒬) ∧ (e_i = 1)}  ▷ Identify exhausted labels and relative instances
    𝒰 = 𝒰 ∖ 𝒬_exh                               ▷ Drop exhausted instances from pool set
end for</preformat>
        </boxed-text>
        <p>Uncertainty-based. This strategy involves querying instances whose predictions are closest
to the population average in the training set, <inline-formula><tex-math>\bar{y}</tex-math></inline-formula>. This method is analogous to margin-based
sampling in classification contexts [
            <xref ref-type="bibr" rid="ref1">1, 25</xref>
            ], where instances nearest to the decision boundary are
prioritised for querying:
<disp-formula><label>(5)</label><tex-math>x^{*} = \operatorname*{argmax}_{x \in \mathcal{U}} \; -\left| \mathcal{M}(x) - \bar{y} \right| := \operatorname*{argmax}_{x \in \mathcal{U}} \; u(x)</tex-math></disp-formula>
Given that there is no decision boundary in SA tasks, the first instances to be sampled are those
predicted to have a survival rate closest to the population average.</p>
        <p>Variance-based. This strategy is a novel approach where instances with the highest
standard deviation (or equivalently, variance) in their predicted rank are prioritised for sampling.
Uniquely, the standard deviation is calculated among the predicted ranks of individual decision
tree learners that make up ℳ.</p>
        <p>
            Belonging to the Query by Committee (QBC) family, this strategy benefits from directly
leveraging the ensemble nature of the Random Survival Forest (RSF) model. The sampling
criterion can be expressed as follows:
<disp-formula><label>(6)</label><tex-math>x^{*} = \operatorname*{argmax}_{x \in \mathcal{U}} \sqrt{\operatorname{Var}\left( R(x) \right)} := \operatorname*{argmax}_{x \in \mathcal{U}} \; v(x), \quad \text{where } R(x) = \{ \operatorname{rank}(x, m) \text{ for } m \in \mathcal{M} \}</tex-math></disp-formula>
Here, <inline-formula><tex-math>\operatorname{rank}(x, m)</tex-math></inline-formula> is a function that outputs the rank of the prediction <inline-formula><tex-math>\hat{y} = m(x)</tex-math></inline-formula> of a single
learner <inline-formula><tex-math>m \in \mathcal{M}</tex-math></inline-formula>, compared to the predictions made by the ensemble ℳ on the instances
in the training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>. Note that we are focusing on the standard deviation of the predicted
ranks, rather than the predicted risk itself. This focus arises from the consideration that in
time-to-event scenarios, the relative ranking typically bears more meaningful insights than the
absolute predicted risk [
            <xref ref-type="bibr" rid="ref12">12, 26</xref>
            ].
          </p>
        <p>
            Uncertainty+density. This is a hybrid approach that combines the uncertainty-based approach
with a density-based score. We compute the uncertainty measure as defined in Equation (5)
and min-max normalise it across the candidates of the pool set to get <inline-formula><tex-math>\bar{u}(x)</tex-math></inline-formula>. Next, we add the
min-max normalised abnormality score <inline-formula><tex-math>\bar{a}(x)</tex-math></inline-formula> of each candidate as computed by an Isolation
Forest [27] algorithm, weighted by a factor <inline-formula><tex-math>\lambda</tex-math></inline-formula>. In formulas, the sampling strategy follows the
criterion:
<disp-formula><label>(7)</label><tex-math>x^{*} = \operatorname*{argmax}_{x \in \mathcal{U}} \left( \bar{u}(x) + \lambda \, \bar{a}(x) \right)</tex-math></disp-formula>
In our experiments we set <inline-formula><tex-math>\lambda = 1</tex-math></inline-formula>, similarly to other approaches in the literature [
            <xref ref-type="bibr" rid="ref4">4, 28</xref>
            ].
          </p>
        <p>Variance+density. This is another novel, hybrid approach that combines the variance-based
approach introduced above with a density-based score. The standard deviation
of the predicted ranks is computed as in Equation (6), and the normalisation and the correction by the
abnormality score are carried out in the same fashion as in the previous approach, leading to <inline-formula><tex-math>\bar{v}(x)</tex-math></inline-formula>:
<disp-formula><label>(8)</label><tex-math>x^{*} = \operatorname*{argmax}_{x \in \mathcal{U}} \left( \bar{v}(x) + \lambda \, \bar{a}(x) \right),</tex-math></disp-formula>
where we again set <inline-formula><tex-math>\lambda = 1</tex-math></inline-formula>.</p>
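        <p>The scoring rules behind these strategies can be sketched with the standard library alone. The snippet is a simplified illustration under several assumptions: predictions are plain floats, the per-tree predictions and the reference predictions on the training set are given as lists, the abnormality scores are supplied as precomputed numbers (standing in for the output of an Isolation Forest), and all function names and example values are hypothetical.</p>
        <preformat><![CDATA[
```python
import statistics


def uncertainty_scores(predictions, train_mean):
    """Higher score = prediction closer to the training-set average."""
    return [-abs(p - train_mean) for p in predictions]


def rank_among(value, reference):
    """Rank of a prediction: number of reference predictions strictly below it."""
    return sum(r < value for r in reference)


def variance_scores(per_tree_predictions, reference):
    """Standard deviation, across trees, of each candidate's predicted rank."""
    return [
        statistics.pstdev([rank_among(p, reference) for p in tree_preds])
        for tree_preds in per_tree_predictions
    ]


def min_max(scores):
    """Rescale scores to [0, 1]; constant inputs map to 0."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def top_k(scores, k):
    """Indices of the k highest-scoring candidates."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical ensemble predictions for four pool candidates.
predictions = [0.2, 0.5, 0.6, 0.9]
u = uncertainty_scores(predictions, train_mean=0.55)

# Hypothetical abnormality scores, e.g. precomputed by an outlier detector.
abnormality = [0.1, 0.1, 0.9, 0.1]
lam = 1.0
combined = [a + lam * b for a, b in zip(min_max(u), min_max(abnormality))]
best_combined = top_k(combined, 1)

# Hypothetical per-tree predictions for two candidates, ranked against the
# ensemble predictions on the training set.
per_tree = [[0.1, 0.1, 0.1], [0.1, 0.5, 0.9]]
reference = [0.2, 0.4, 0.6, 0.8]
v = variance_scores(per_tree, reference)
```
]]></preformat>
        <p>In this toy example the two middle candidates are almost equally close to the training average, and the added abnormality term decides between them; the second candidate in the variance example is preferred because its trees disagree on its rank.</p>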
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation set-up</title>
      <p>In order to test the proposed framework, we make use of 69 publicly available time-to-event
datasets available in the SurvSet Python library, the NHANES [29] dataset, and two synthetically
generated datasets with a partial dependence between the real time-to-event and censoring, and
with differing degrees of noise (more details in Appendix B). Out of this initial set, we select for
further analysis only the datasets that meet the following criteria:
• The number of samples <inline-formula><tex-math>n</tex-math></inline-formula> is at least 500: there is little labelling cost to be saved with AL
on smaller datasets.
• The 5-fold cross-validated performance <inline-formula><tex-math>C_0</tex-math></inline-formula> of the model trained on the initial training data, measured by the concordance
index, is at least 0.65. Having models with good initial performance <inline-formula><tex-math>C_0</tex-math></inline-formula> is an implicit assumption
for most AL frameworks [30].
• The 5-fold cross-validated performance obtained by fully labelling the pool set and adding
it to the training set is at least <inline-formula><tex-math>C_0 + 0.02</tex-math></inline-formula>. Smaller gains in performance make the AL
framework less relevant.</p>
      <p>Despite the relatively mild selection criteria (particularly the second), only a few datasets
meet these requirements. The datasets that do meet these standards are: rott2, vlbw, grace,
NHANES, UnempDur, Framingham, and the two synthetic datasets. More details regarding
these datasets are provided in Table 2. Information related to the creation of the synthetic
datasets is available in the Appendix.</p>
      <p>
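        For the experimental setup, we configure the initial training set <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> to be 5% of the total
training size. The remaining labels are completely masked to form the pool set <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula>. We employ a
batch-mode AL approach as in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: we define a fixed batch size <inline-formula><tex-math>k</tex-math></inline-formula> and select the best <inline-formula><tex-math>k</tex-math></inline-formula> queries
according to the current querying strategy. Here, <inline-formula><tex-math>k</tex-math></inline-formula> is set to be 1% of the initial pool set size <inline-formula><tex-math>|\mathcal{U}|</tex-math></inline-formula>.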
      </p>
      <p>The decision to use a constant relative batch size stems from our aim to make cross-dataset
comparisons more uniform. By maintaining a proportionate batch size across different datasets
we do not account for diversity within the batch, but we ensure that the AL process is not biased
by the absolute size of the dataset. This uniformity allows for more consistent evaluation across
datasets of varying sizes, thus providing more meaningful insights when comparing results
across datasets in Section 5. The number of iterations is restricted to <inline-formula><tex-math>R = 200</tex-math></inline-formula>, in line with [31].</p>
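      <p>In order to simulate a longitudinal AL scenario, we mimic conditions where the final label
<inline-formula><tex-math>(t_i, \delta_i, 1)</tex-math></inline-formula> of an instance in <inline-formula><tex-math>\mathcal{U}</tex-math></inline-formula> is not instantly available to the oracle when queried. Instead, a
random portion of the remaining information is provided, ranging from 20% to 100%. Specifically,
for an instance currently labelled as <inline-formula><tex-math>(t_{i,j-1}, 0, 0)</tex-math></inline-formula> and with underlying final label <inline-formula><tex-math>(t_i, \delta_i, 1)</tex-math></inline-formula>,
the label is updated upon querying to <inline-formula><tex-math>(t_{i,j}, \delta_{i,j}, e_{i,j})</tex-math></inline-formula>, where
<disp-formula><label>(9)</label><tex-math>\begin{cases} t_{i,j} = t_{i,j-1} + \alpha \cdot (t_i - t_{i,j-1}) \\ \delta_{i,j} = \mathbf{1}(\alpha = 1) \cdot \delta_i \\ e_{i,j} = \mathbf{1}(\alpha = 1) \end{cases}</tex-math></disp-formula></p>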
      <p>Here, $\rho$ is a random variable with equally likely outcomes in $\{0.2, 0.4, 0.6, 0.8, 1\}$. After a
query, the labels are updated in line with the above equation<sup>2</sup>. It is possible, alternatively, to
sample the amount of revealed time $(t_{i,k} - t_{i,k-1})$ from a distribution bounded between 0 and $t_i$.</p>
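      <p>The label update of Equation (9) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the label is represented as a (time, event indicator, fully-revealed flag) triple, and a masked instance is assumed to start at time 0.</p>

```python
import random

# Equally likely fractions of the remaining time revealed by one query.
REVEAL_FRACTIONS = (0.2, 0.4, 0.6, 0.8, 1.0)


def update_label(t_prev, t_final, event_final, rho):
    """Apply one query: reveal a fraction rho of the remaining follow-up time."""
    t_new = t_prev + rho * (t_final - t_prev)
    fully_revealed = rho == 1.0
    # The event indicator is only disclosed once the label is fully revealed.
    event_new = event_final if fully_revealed else 0
    return t_new, event_new, int(fully_revealed)


def query_until_exhausted(t_final, event_final, rng):
    """Query one instance repeatedly until its final label is disclosed."""
    label = (0.0, 0, 0)  # assumption: a masked instance starts at time 0
    history = []
    while label[2] == 0:
        rho = rng.choice(REVEAL_FRACTIONS)
        label = update_label(label[0], t_final, event_final, rho)
        history.append(label)
    return history


rng = random.Random(0)
history = query_until_exhausted(t_final=10.0, event_final=1, rng=rng)
```

      <p>Each entry of the history is a partially censored label, except the last one, which coincides with the final label of the instance.</p>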
      <p>It is important to underline that the model ℳ has access neither to the outcomes of the
random variable $\rho$ nor to any expectation about the quantity of information the oracle will disclose
upon being queried. The model only has access to the latest value of the labels and to whether
more information can be queried (no terminal event or terminal censoring). This represents a
scenario where information such as the time since last sampling is not available, but where it
is possible to inquire whether the event of interest has (already) happened at query time. It is
worth noting, however, that in longitudinal studies such as clinical studies [32], the time since
the last sampling might be available.</p>
      <p>Taking into account the random variable $\rho$, let $p$ denote the probability of a specific
outcome $\rho = r$. The number of repeated observations until the event $\rho = r$ is observed is
denoted by an integer-valued random variable $N$, which follows a geometric distribution. The
expected value of $N$ is given by:
$$
E(N) = \sum_{n=0}^{\infty} n (1-p)^{n-1} p = \frac{p}{1-p} \sum_{n=0}^{\infty} n (1-p)^{n} = \frac{1}{p}
\qquad (10)
$$</p>
      <p>Given that an instance can be queried until the event $\rho = 1$ occurs, and given that this happens
with probability $p = 0.2$, we conclude that an instance is queried 5 times on average before
being dropped from $\mathcal{U}$. In practice, given the stopping criterion, the average number of queries
per instance is lower, but should still follow an exponential behaviour (as confirmed by the
results in Appendix A).</p>
      <p><sup>2</sup> The amount of information revealed at the $k$-th query for instance $x_i \in \mathcal{U}$ is generated in advance and is equal
across sampling strategies.</p>
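      <p>The expected number of queries can be checked numerically. The sketch below, with hypothetical function names, compares a truncated version of the series with a small simulation for a success probability of 0.2, and derives the number of rounds needed to exhaust the pool when 1% of the initial pool is queried per round.</p>

```python
import random


def expected_queries(p, n_terms=10_000):
    """Truncated series sum_{n>=1} n (1-p)^(n-1) p, which converges to 1/p."""
    return sum(n * (1.0 - p) ** (n - 1) * p for n in range(1, n_terms + 1))


def simulate_queries(p, n_instances=100_000, seed=0):
    """Average number of draws until the first success of probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_instances):
        n = 1
        while rng.random() >= p:
            n += 1
        total += n
    return total / n_instances


analytic = expected_queries(0.2)    # close to 1 / 0.2 = 5
empirical = simulate_queries(0.2)   # Monte Carlo estimate of the same quantity
# With a batch of 1% of the initial pool per round and E(N) queries per instance,
# about E(N) / 0.01 rounds are needed to exhaust the whole pool.
rounds_to_exhaust = analytic / 0.01
```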
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We compare the proposed sampling strategies on the selected datasets with a standard 5-fold
cross-validation set-up, where 80% of the data is placed in $\mathcal{L}$ and $\mathcal{U}$, and 20% serves as a test
set to evaluate performance. We record the learning curve across the first 200 rounds and we
report the 5-fold average test performance for every learning step. More specifically, we report
the average concordance index across the 200 iterations. We also plot the performance obtained
by the RSF model when all training labels are provided with full information; this serves
as an upper benchmark for the framework.</p>
      <p>[Figure 1, caption partially recovered: … noise; to the right, dataset with higher noise. The learning curves are also compared against the scenario
where all training instances are fully labelled, representing the ‘upper benchmark’. The fluctuations
within the curves can be attributed to the inherent randomness of the RSF algorithm.]</p>
      <p>[…] sampling strategies generally outperform random sampling in the proposed multi-query AL
framework. In particular, the newly proposed variance+density strategy stands out, as does
the (also newly proposed) variance-based strategy. These strategies, along with the
conventional uncertainty-based sampling strategy, require around 75 iterations to reach maximum
performance, as opposed to the more than 200 iterations required by random sampling and the
uncertainty+density sampling strategy.</p>
      <p>
        Additionally, it is noteworthy that, for the first simulated dataset (as shown in Figure 1, left
plot), the performance of the uncertainty-based sampling strategy surpasses that of the upper
benchmark. We believe this can be attributed to the introduction of sampling bias, as discussed
by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As all instances are progressively exhausted, we anticipate the performance to align
with the upper benchmark, mirroring the trends observed with the variance-based strategies
between iterations 100 and 200.
      </p>
      <p>Figure 2 shows the results for the real-world datasets that meet the criteria outlined in Section 4.
In these datasets, a similar trend emerges, with the density-corrected
variance-based strategy performing best overall. Moreover, random sampling outperforms the
density+uncertainty sampling strategy in these instances.</p>
      <p>Similarly to the simulated datasets, the density correction leads to more efficient querying
for the variance-based strategy, but does not offer any improvement for uncertainty sampling.
This could potentially be attributed to instances near the population average not necessarily
embodying uncertainty, but rather being samples whose risk accurately matches the population
average. It is also worth noting that uncertainty sampling has been reported to perform poorly
in tasks other than binary classification [33], a trend which we can somewhat corroborate for
SA tasks, especially when it comes to the density-corrected uncertainty sampling.</p>
      <p>To further investigate these outcomes, we also take a quantitative approach. This
involves calculating the area under the curve for each sampling strategy’s performance over
time until the last iteration round is reached. The underlying rationale is that more effective
sampling strategies will show faster learning and therefore a larger area under the curve. In
practice, this translates to computing the average concordance index for each strategy across
all rounds, providing an insightful comparison of their relative performances. The results are
given in Table 3.</p>
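      <p>As a small illustration of this summary measure, the sketch below compares two made-up learning curves (the numbers are not results from the paper). It computes both the average concordance index across rounds and a normalised trapezoidal area; the function names are hypothetical.</p>

```python
from statistics import mean


def mean_c_index(curve):
    """Average concordance index across all AL rounds."""
    return mean(curve)


def normalised_trapezoid_area(curve):
    """Trapezoidal area under the learning curve, divided by its length."""
    area = sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))
    return area / (len(curve) - 1)


# Toy curves: both end at the same performance, but one learns faster.
fast_learner = [0.60, 0.70, 0.72, 0.72]
slow_learner = [0.60, 0.62, 0.66, 0.72]

fast_score = mean_c_index(fast_learner)
slow_score = mean_c_index(slow_learner)
```

      <p>Both summaries rank the faster learner higher, even though the two curves reach the same final performance.</p>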
      <p>Our original findings are confirmed, with the newly proposed variance-based strategies
performing well. Among these, the one that incorporates density correction exhibits the best
overall performance. The improvements observed in the density-corrected method suggest
that even in straightforward cases where  = 1 , incorporating density correction can increase
the sampling performance. Interestingly, in the context of uncertainty sampling, the density
correction seems to negatively impact performance. This could be attributed to the fact that
this correction does not rectify the fundamental issue of the strategy, which has the tendency
to sample not only instances that are difficult to predict, but also instances with a risk profile
truly close to the population average.</p>
      <p>Additionally, a relatively large standard deviation is observed across the folds of each dataset,
indicating that the prediction algorithm’s performance is sensitive to the specific data fold in
use. This variability in performance is amplified by the relatively small size of the datasets, and
we observe that larger datasets exhibit smaller standard deviations.</p>
      <p>
        Finally, we test the statistical significance of the observed average performances across
the different sampling strategies, and we test them against the performance obtained by fully
labelling the training set (indicated as upper benchmark in Figures 1-2). We do so by conducting
a Friedman test followed by a post-hoc Nemenyi test, with the significance level set to 0.05, as recommended
in [
        <xref ref-type="bibr" rid="ref18">34</xref>
        ]. The Friedman test rejects the hypothesis that all methods perform equally well, with a p-value
$p \approx 2.5 \cdot 10^{-4}$. Results are visualised with a critical difference diagram shown in Figure 3, which
connects sampling strategies that are not statistically significantly different by a horizontal line
segment.
      </p>
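      <p>The mechanics of the Friedman test and the Nemenyi critical difference can be sketched in plain Python. The scores below are made up for illustration and are not the results of this study. The critical values are those tabulated in [34] for a significance level of 0.05; the function names are hypothetical, and the Friedman statistic is computed without a correction for ties.</p>

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, indexed by the number of methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def ranks_descending(scores):
    """Rank 1 = best (highest) score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    start = 0
    while start != len(order):
        end = start
        while end + 1 != len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def friedman_nemenyi(table):
    """table[d][j] = score of method j on dataset d; returns (avg_ranks, chi2, cd)."""
    n, k = len(table), len(table[0])
    per_dataset = [ranks_descending(row) for row in table]
    avg_ranks = [sum(r[j] for r in per_dataset) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    cd = Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n))
    return avg_ranks, chi2, cd


# Toy scores for three methods (columns) on four datasets (rows).
toy_scores = [
    [0.70, 0.72, 0.75],
    [0.65, 0.66, 0.69],
    [0.71, 0.70, 0.74],
    [0.68, 0.69, 0.73],
]
avg_ranks, chi2, cd = friedman_nemenyi(toy_scores)
# Two methods differ significantly when their average ranks differ by more than cd.
first_vs_third_differ = abs(avg_ranks[0] - avg_ranks[2]) > cd
```

      <p>In a critical difference diagram, methods whose average ranks differ by less than the critical difference are joined by a horizontal segment.</p>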
      <p>Despite the limited sample size available for the test, the post-hoc test confirms the
aforementioned trends: the newly proposed variance-based methods perform reasonably well, and
the ‘density+variance’ strategy significantly outperforms random sampling, whereas the
‘density+uncertainty’ approach is relegated to last place.</p>
      <p>To conclude, Figures 1-2 and Table 3 suggest that a performance comparable to that of the fully
labelled dataset is reached within 100 iterations, whereas for the current values of $b = 0.01$
and $E(N)$, around $E(N)/b = 500$ iterations would be needed for a standard learning procedure.
This validates the idea of performing AL in an SA framework with label information evolving
over time in a longitudinal fashion; this is especially relevant given that not all studies have
been able to achieve a significant gain over random sampling, as in [23, 24].</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future work</title>
      <p>With this work, we lay the groundwork for a new framework within the field of AL for SA.
More specifically, we show that simple AL strategies can be effective in querying labels in a
multi-query set-up, where the oracle gradually reveals more information about a time to event.
Additionally, our novel sampling strategies based on the variance of the rankings perform
better than uncertainty-based strategies adapted from binary classification set-ups. In particular,
our density-corrected variance-based approach ‘density+variance’ performs best overall.</p>
      <p>We also made modifications to the existing Random Survival Forests (RSF) model architecture.
These allow RSF to utilise partial information about labels during the prediction phase. It is
noteworthy that we assumed the model does not have prior knowledge about the amount of
information (partial or full) revealed during the querying phase. We have tested these scenarios
on several datasets, showing that simple AL strategies are efficient for this task and outperform
random sampling.</p>
      <p>
        Moving forward, our future work will include testing more state-of-the-art sampling strategies
based on expected error minimisation [18], hierarchical sampling [
        <xref ref-type="bibr" rid="ref19">35</xref>
        ], as well as taking into
account batch diversity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and we will eventually evaluate such strategies on a larger number
of datasets. Another direction of interest lies in the analysis of the added value of the conditional
approach within RSF models. Furthermore, we plan to relax the assumptions imposed on the
oracle in Section 4. More specifically, we will examine the scenarios where the oracle is aware
of the elapsed rounds since the last instance query, or can estimate the amount of updated label
information $(t_{i,k} - t_{i,k-1})$ for instances in the pool set.
      </p>
    </sec>
    <sec id="sec-7">
      <title>A. Querying analysis</title>
      <p>Here we provide more details on the distribution of the querying across instances. More
specifically, we examine two distinct datasets as a form of sanity check. The aim is to spot any
peculiarities in the querying pattern and ensure that our process is consistent with theoretical
expectations.</p>
      <p>We first consider vlbw (Figure 4), a relatively small dataset (617 instances); interestingly,
all its instances are fully exhausted within 200 iterations. This is fewer than expected, as (at most)
500 iterations should be needed instead; the deviation can be explained by the small number of
instances under consideration. Finally, the distribution of the sampling across instances shows
an exponential decay, which is in line with Equation (10).</p>
      <p>We report the same analysis on the substantially larger (4699 instances) Framingham dataset
in Figure 5. Here, not all instances are sampled until exhaustion, which is to be expected, and
the truncation effect generated by the stopping criterion can be observed. In particular, we
observe that a substantial number of instances have never been sampled, whereas the instances
that have been sampled at least once follow an exponential decay, as expected.</p>
      <p>Furthermore, once the last iteration is reached, it is worth noting that all considered strategies
show higher rates of fully exhausted instances compared to the random strategy (bottom
right plot), likely due to a tendency of the sampling strategies to repeatedly sample the most
informative instances until all information is revealed.</p>
      <p>The above patterns are representative of all considered datasets, with the remaining plots
being available in the public repository.</p>
    </sec>
    <sec id="sec-8">
      <title>B. Synthetic data generation</title>
      <p>In this section, we provide more details on the generative process for the synthetic datasets. The
full procedure is available in the shared repository. The generative process consists of two parts:
the first generates the real times to event $t_e$ (without censoring) for each instance;
the second generates the censoring event $t_c$, using some of the previous information to
simulate partial dependence. Once both events are generated, we build $t = \min(t_e, t_c)$ and
$\delta = \mathbb{1}(t_c \geq t_e)$ as usual.</p>
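      <p>The final step can be sketched in a few lines. The function name is hypothetical, and the convention that the indicator equals 1 when the event is observed (the censoring time is not smaller than the event time) is an assumption about the intended notation.</p>

```python
def observe(event_time, censoring_time):
    """Combine a true event time and a censoring time into an observed label."""
    observed_time = min(event_time, censoring_time)
    # Assumption: indicator is 1 when the event occurs before (or at) censoring.
    event_observed = int(censoring_time >= event_time)
    return observed_time, event_observed


uncensored = observe(3.0, 5.0)  # the event happens first and is observed
censored = observe(7.0, 5.0)    # censoring happens first and hides the event
```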
      <p>
        We start by sampling the covariates $x = (x_1, \dots, x_{10})$ from a standard normal distribution,
and the coefficients $\beta = (\beta_1, \dots, \beta_{10})$ from a uniform distribution in $[-2, 2]$. Next, an interaction
effect is added between the pairs $(x_1, x_2)$ and $(x_3, x_4)$ by mapping $x_1 \to x_1 \cdot x_2$ and $x_3 \to x_3 \cdot x_4$.
The dot product $r = x \cdot \beta$ is computed and scaled within the interval $[-5, 5]$ by means of a
sigmoid function. Noise $\nu \in \{0.01, 0.3\}$ is added at this point<sup>3</sup>. More specifically, Gaussian noise
of scale $\nu$ times the standard deviation of the sigmoid-transformed $r$ is added, so as to control
the proportion of signal to noise. Given the (noisy) risk $\tilde{r}$, we compute the hazard rate as
$h = 0.1 \exp(\tilde{r})$ and sample the times to event from a Weibull distribution with shape $k = 1.2$
and scale $\lambda = 1/h$, where times to event are expected to be larger for smaller values of $\tilde{r}$.
      </p>
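      <p>A rough sketch of the event-time generation described above is given below, using only the Python standard library. It is not the code from the shared repository: the function name, the exact form of the sigmoid rescaling to [-5, 5] and the use of the population standard deviation are assumptions.</p>

```python
import math
import random
import statistics


def simulate_event_times(n, noise, seed=0):
    """Sample n uncensored event times following the described recipe."""
    rng = random.Random(seed)
    beta = [rng.uniform(-2.0, 2.0) for _ in range(10)]
    risks = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(10)]
        # Interaction effects: the first and third covariates become products.
        x[0] = x[0] * x[1]
        x[2] = x[2] * x[3]
        r = sum(xj * bj for xj, bj in zip(x, beta))
        # Assumed rescaling: a sigmoid mapped onto the interval [-5, 5].
        risks.append(10.0 / (1.0 + math.exp(-r)) - 5.0)
    sd = statistics.pstdev(risks)
    times = []
    for r in risks:
        noisy_risk = r + rng.gauss(0.0, noise * sd)
        hazard = 0.1 * math.exp(noisy_risk)
        # random.weibullvariate takes (scale, shape) in that order.
        times.append(rng.weibullvariate(1.0 / hazard, 1.2))
    return times


times_low_noise = simulate_event_times(200, noise=0.01)
times_high_noise = simulate_event_times(200, noise=0.3)
```

      <p>The two calls mirror the two noise levels used for the synthetic datasets; censoring times would be generated analogously and combined with these event times.</p>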
      <p>To simulate partially conditional censoring, we sample 4 random covariates $x_j$ and their respective
coefficients $\beta_j$ from the previous process, and add 6 new covariates and coefficients as independent, external features for
censoring. The dot product $r_c$ of these covariates and coefficients, the added noise and the hazard rate $h_c$ are computed as
before, and the censoring event times are sampled from a Weibull distribution with parameters $k = 1.1$
and $\lambda = 1/h_c$, respectively.</p>
      <p><sup>3</sup> simul data 1 has $\nu = 0.01$, whereas simul data 2 has $\nu = 0.3$.</p>
      <p>
[18] N. Roy, A. McCallum, Toward optimal active learning through monte carlo estimation of
error reduction, Proceedings of the 18th international conference on Machine learning 2
(2001) 441–448.
[19] A. Kapoor, E. Horvitz, S. Basu, Selective supervision: Guiding supervised learning with
decision-theoretic active learning., in: Proceedings of the 20th International Joint
Conference on Artificial Intelligence, volume 7, 2007, pp. 877–882.
[20] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32. URL: https://link.springer.com/article/10.1023/A:1010933404324. doi:10.1023/A:1010933404324.</p>
      <p>
[21] N. Bhosle, M. Kokare, Random forest-based active learning for content-based image
retrieval, Int. J. Intelligent Information and Database Systems 13 (2020) 72–88.
[22] S.-H. Jung, H. Y. Lee, S.-C. Chow, Statistical methods for conditional survival analysis,
Journal of Biopharmaceutical Statistics 28 (2018) 927–938. doi:10.1080/10543406.2017.1405012.
[23] K. Tomanek, F. Laws, U. Hahn, H. Schütze, On proper unit selection in active learning:
co-selection effects for named entity recognition, in: Proceedings of the NAACL HLT 2009
Workshop on Active Learning for Natural Language Processing, 2009, pp. 9–17.
[24] B. C. Wallace, K. Small, C. E. Brodley, T. A. Trikalinos, Active learning for biomedical
citation screening, in: Proceedings of the 16th ACM SIGKDD international conference on
Knowledge discovery and data mining, 2010, pp. 173–182.
[25] P. Jain, S. Vijayanarasimhan, K. Grauman, Hashing hyperplane queries to near points with
applications to large-scale active learning, Advances in Neural Information Processing
Systems 23 (2010).
[26] F. E. Harrell, R. M. Califf, D. B. Pryor, K. L. Lee, R. A. Rosati, Evaluating the yield of medical
tests, Journal of the American Medical Association 247 (1982) 2543–2546.
[27] F. T. Liu, K. M. Ting, Z.-H. Zhou, Isolation forest, in: 2008 eighth ieee international
conference on data mining, IEEE, 2008, pp. 413–422.
[28] A. McCallum, K. Nigam, et al., Employing em and pool-based active learning for text
classification., in: Proceedings of the 15th international conference on Machine learning,
volume 98, Citeseer, 1998, pp. 350–358.
[29] Centers for Disease Control and Prevention (CDC), National Center for Health Statistics
(NCHS), National Health and Nutrition Examination Survey Data, NHANES III plan and
operations procedures manuals, 1997.
[30] P. Felt, E. K. Ringger, K. D. Seppi, K. Heal, R. Haertel, D. Lonsdale, First results in a
study evaluating pre-annotation and correction propagation for machine-assisted syriac
morphological analysis., in: 8th international conference on Language Resources and
Evaluation, 2012, pp. 878–885.
[31] K. Konyushkova, R. Sznitman, P. Fua, Learning active learning from data, Advances in
neural information processing systems 30 (2017).
[32] J. Gabrielsson, D. Weiner, Pharmacokinetic and pharmacodynamic data analysis: concepts
and applications, CRC press, 2001.
[33] A. Holub, P. Perona, M. C. Burl, Entropy-based active learning for object recognition, in:
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, IEEE, 2008, pp. 1–8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <article-title>Active learning</article-title>
          ,
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          , Springer Cham,
          <year>2012</year>
          .
          doi:10.1007/978-3-031-01560-1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Budd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kainz</surname>
          </string-name>
          ,
          <article-title>A survey on active learning and human-in-the-loop deep learning for medical image analysis</article-title>
          ,
          <source>Medical Image Analysis</source>
          <volume>71</volume>
          (
          <year>2021</year>
          ).
          doi:10.1016/j.media.2021.102062.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , H.-T. Lin,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          ,
          <article-title>Cold-start active learning through self-supervised language modeling</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2020</year>
          ). URL: http://arxiv.org/abs/2010.09535.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <article-title>Active learning literature survey</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2009</year>
          . URL: https://minds.wisconsin.edu/handle/1793/60660.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of deep active learning</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3472291.
          doi:10.1145/3472291.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hacohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dekel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weinshall</surname>
          </string-name>
          ,
          <article-title>Active learning on a budget: Opposite strategies suit high and low budgets</article-title>
          ,
          <source>Proceedings of the 39th International Conference on Machine Learning</source>
          (
          <year>2022</year>
          ). URL: https://github.com/avihu111/TypiClust.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Devaguptapu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          ,
          <article-title>On initial pools for deep active learning</article-title>
          ,
          <source>in: NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Viering</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Loog</surname>
          </string-name>
          ,
          <article-title>The shape of learning curves: A review</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>45</volume>
          (
          <year>2023</year>
          )
          <fpage>7799</fpage>
          -
          <lpage>7819</lpage>
          .
          doi:10.1109/TPAMI.2022.3220744.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <article-title>Machine learning for survival analysis: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
          doi:10.1145/3214306.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vinzamuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <article-title>Active learning based survival regression for censored data</article-title>
          ,
          <source>Proceedings of the 2014 ACM International Conference on Information and Knowledge Management</source>
          (
          <year>2014</year>
          )
          <fpage>241</fpage>
          -
          <lpage>250</lpage>
          . URL: http://dx.doi.org/10.1145/2661829.2662065. doi:10.1145/2661829.2662065.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Nezhad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sadati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>A deep active survival analysis approach for precision treatment recommendations: Application of prostate cancer</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>115</volume>
          (
          <year>2019</year>
          )
          <fpage>16</fpage>
          -
          <lpage>26</lpage>
          .
          doi:10.1016/J.ESWA.2018.07.070.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>Regression models and life-tables</article-title>
          ,
          <source>Journal of the Royal Statistical Society: Series B (Methodological)</source>
          <volume>34</volume>
          (
          <year>1972</year>
          )
          <fpage>187</fpage>
          -
          <lpage>202</lpage>
          .
          doi:10.1111/j.2517-6161.1972.tb00899.x.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Barddal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>Machine learning for streaming data: state of the art, challenges, and opportunities</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          ,
          <year>2019</year>
          . Available at https://doi.org/10.1145/3373464.3373470.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ishwaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. B.</given-names>
            <surname>Kogalur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Blackstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lauer</surname>
          </string-name>
          ,
          <article-title>Random survival forests</article-title>
          ,
          <source>Annals of Applied Statistics</source>
          <volume>2</volume>
          (
          <year>2008</year>
          )
          <fpage>841</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Catlett</surname>
          </string-name>
          ,
          <article-title>Heterogeneous uncertainty sampling for supervised learning</article-title>
          ,
          <source>Machine learning proceedings (</source>
          <year>1994</year>
          )
          <fpage>148</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Seung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tishby</surname>
          </string-name>
          ,
          <article-title>Selective sampling using the query by committee algorithm</article-title>
          ,
          <source>Machine learning</source>
          <volume>28</volume>
          (
          <year>1997</year>
          )
          <fpage>133</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Maximizing expected model change for active learning in regression</article-title>
          ,
          <source>in: 2013 IEEE 13th international conference on data mining, IEEE</source>
          ,
          <year>2013</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          ,
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          ,
          <source>The Journal of Machine learning research</source>
          <volume>7</volume>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <article-title>Hierarchical sampling for active learning</article-title>
          ,
          <source>in: Proceedings of the 25th international conference on Machine learning</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>208</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>