<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Adaptive Version of the Metropolis Adjusted Langevin Algorithm for Survival Prediction in a High-Dimensional Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriele Tinè</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rosalba Miceli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Biostatistics for Clinical Research Unit, Fondazione IRCCS Istituto Nazionale dei Tumori</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The objective is to construct a prognostic index that incorporates radiomic information with the validated prognostic index (Sarculator) provided by the Fondazione IRCCS Istituto Nazionale dei Tumori di Milano. A Bayesian approach was employed, utilising a Weibull model. Vague prior distributions were elicited for the shape parameter, the intercept, and the Sarculator coefficient. A multivariate Gaussian prior was elicited for the 2,144 radiomic parameters, incorporating a penalty factor λ. A total of 100 penalty values were considered. A new, ad hoc adaptive version of the pre-conditioned Metropolis adjusted Langevin algorithm (A-MALA) was proposed for sampling. Bayesian Model Averaging (BMA) was employed to yield a composite of the 100 models. A Bayesian hypothesis test was constructed to evaluate the superiority of the BMA prognostic index relative to the Sarculator. The five-year AUC posterior mean was 0.809, with a 95% credible interval (CI) of (0.768, 0.851). The posterior mean of the C-index was 0.804 (95% CI: 0.764, 0.845) for the BMA, 0.743 (95% CI: 0.713, 0.771) for the best model (log λ = 10.39), and 0.735 (95% CI: 0.674, 0.761) for the Sarculator. The results suggest that radiomic variables should be included in the model.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian computation</kwd>
        <kwd>Survival Analysis</kwd>
        <kwd>Shrinkage Prior in Survival Analysis</kwd>
        <kwd>Metropolis Adjusted Langevin Algorithm</kwd>
        <kwd>Adaptive Metropolis</kwd>
        <kwd>Bayesian Model Averaging</kwd>
        <kwd>Hypothesis test construction</kwd>
        <kwd>Omics data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Background</title>
      <p>The combination of radiomic variables with clinical variables has been extensively utilized in
the literature for the differentiation of benign and malignant lesions and for tumor grading
differentiation [1], [2]. In contrast, the application of radiomic variables in constructing
prognostic indices remains a relatively unexplored area of research. Among the few studies
that address the topic of soft tissue sarcomas (STS), prognosis, and radiomics, Spraker et al.
conducted an analysis of T1-weighted MRI sequences [3]. A notable limitation of the study
is the relatively small number of radiomic variables considered, with only 30 included. This
limitation is not merely a matter of quantity; it also concerns the selection process itself. The
rationale behind the selection of this specific subset of variables is not clearly articulated. The
exclusion of a more extensive range of radiomic attributes inherent to MRI images increases
the likelihood that an inadequate variable set has been selected, which may result in the
omission of valuable predictive factors. Furthermore, the method used to select the variables
is a cause for concern. Spraker et al. employed a Cox proportional hazards model with
LASSO penalization for variable selection. However, LASSO regression does not provide
inferential guarantees and can suffer from selection bias due to the uncertainty associated
with the selection process itself. The reliability of the selected predictors may be limited
in the absence of control for the false discovery rate (FDR). Other studies [4] put forth the
proposition of constructing a prognostic index that incorporates radiomic variables through
a joint analysis of T1- and T2-weighted MRI sequences. This resulted in the extraction of a
total of 1,394 radiomic variables, which appears to be an adequate number for MRI images.
Nevertheless, analogous constraints are evident in their study. Moreover, the authors utilized
Cox-LASSO regression for variable selection but did not provide inferential guarantees on the
selected predictors, as there was no control for the FDR. In a previous study [5], the authors
employed machine learning algorithms on computed tomography (CT) images. However, a
comprehensive analysis revealed that the FDR control measure was not employed, which may
have resulted in an inadequate level of inferential assurance regarding the selected predictors.
Additionally, no statistically significant difference was found between the prognostic accuracy
of the models that were based exclusively on clinical variables and those that incorporated
both clinical and CT-derived radiomic variables. This result could suggest that the CT-derived
radiomic variables do not contribute additional prognostic value beyond that of the clinical
factors. However, the absence of inferential guarantees concerning the selection process could
lead to misleading conclusions. Similarly, [6] extracted 103 radiomic variables from diffusion-weighted imaging (DWI) MRI sequences but faced comparable limitations. They applied
Cox-LASSO regression without inferential guarantees on the predictors, and the number of
radiomic variables was relatively small. As with the approach taken by [3], no explanation was
provided for the choice of variables extracted. It is notable that the aforementioned studies not
only exhibit a similar range in sample sizes but also appear to utilise Cox-LASSO regression in
a manner that is somewhat unconventional from a statistical perspective. Indeed, the analyses
were conducted on an identical dataset in two stages. Initially, variable selection was conducted
using Cox-LASSO regression. Subsequently, an unpenalized Cox model was constructed with
the selected predictors, enabling the extraction of p-values for each variable. This methodology
introduces several biases that must be considered. Firstly, hypothesis tests on the coefficients
and their associated p-values are unreliable because of the bias that LASSO introduces into
the coefficient estimates. Consequently, tests on the coefficients of the newly constructed
model inherit this bias, invalidating the tests performed on the resulting model
coefficients [7]. Secondly, the selection process is biased
due to the utilization of the same data set for both variable selection and model estimation.
This recycling of data can result in overfitting and an underestimation of true variability,
which may compromise the generalizability of the results. Furthermore, the absence of FDR
control means that the selection lacks inferential guarantees, leaving uncertainty about the
reliability of the selected predictors [8]. It seems that the aforementioned studies, which aimed
to construct a valid and generalizable prognostic indicator, may have faced some challenges in
terms of generalizability and potential biases that could have arisen from the use of the same
data for variable selection and model estimation. Although the relatively small sample sizes
are to be expected given the rarity of the disease, they nevertheless limit the robustness of
the findings. Furthermore, to date, no study has attempted to integrate radiomic variables
into a prognostic index constructed from a significantly larger dataset and validated across
four distinct patient cohorts, such as the Sarculator. In view of these shortcomings, the aim
of the present study is to employ methodologies that overcome the generalizability issues
observed in previous studies and to enhance the prognostic accuracy of the Sarculator by
integrating radiomic variables. We propose a formal Bayesian test with inferential guarantees
to determine whether radiomic features provide meaningful prognostic information beyond
that captured by clinical variables alone. This test is constructed with an innovative and
unbiased method to ensure rigorous statistical inference and control for the false discovery
rate. Furthermore, we introduce an innovative Bayesian algorithm to improve the estimation
process. Specifically, we have developed an adaptive version of the Metropolis adjusted
Langevin algorithm (MALA) to accelerate the convergence of our Bayesian sampling [9, 10].
This adaptation enhances computational efficiency and allows for more effective exploration of
the parameter space, thereby facilitating the generation of more reliable and robust estimates.
By focusing exclusively on radiomic data, our approach does not sacrifice degrees of freedom
for additional clinical variables; rather, these are encapsulated within the Sarculator, ensuring
inferential reliability through the application of proper statistical controls. Furthermore, our
methodology avoids the biases associated with unsuitable variable selection techniques.</p>
      <sec id="sec-1-1">
        <title>Dealing with clinical data</title>
        <p>The clinical and omic data present several challenges due to the limited sample size (91 patients) and the high dimensionality of the feature space (2,145 variables). Consequently, it may prove challenging to conduct an objective inference that incorporates the inherent uncertainty associated with the estimated parameters. Furthermore, the outcome to be predicted is a survival outcome. The Fondazione IRCCS Istituto Nazionale dei Tumori di Milano developed a Cox model on 1,452 patients affected by soft tissue sarcomas, which was validated on three independent cohorts [11], [12].</p>
        <p>Given the accuracy and calibration of the clinical prognostic index, it was adopted for
the prediction of survival probability in patients with sarcomas. However, the prognostic
index (referred to as "Sarculator") considers only clinical variables [13]. The incorporation of
radiomic data may enhance the prognostic performance of the index. However, the challenge
lies in verifying whether the radiomic information can effectively augment the efficacy of
the Sarculator and in developing a novel prognostic index that incorporates both the existing
index and the novel radiomic variables. To address these challenges, a Bayesian approach has
been selected for both the construction of the new prognostic index and the assessment of its
potential improvement.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>The Weibull distribution was chosen to model the time-to-event data. Let X be the standardized matrix of the predictors of dimension n × p, where n = 91 and p = 2145. Let t = (t1, . . . , tn) be the vector of the observed event (or censoring) times and δ = (δ1, . . . , δn) the vector of event indicators, which equal 1 if the patient developed the event and 0 otherwise.</p>
      <p>
        Let Ti ∼ Weibull(α, ε) be the time-to-event random variable, where α &gt; 0 is the shape parameter and γi = ε^(−α) = exp(xi′β). The likelihood model can be written as:
L(X, t, δ; β, α) = ∏_(i=1)^n ( γi α ti^(α−1) )^δi exp( −γi ti^α ),
(<xref ref-type="bibr" rid="ref1">1</xref>)
where β is a 2146-dimensional vector: 2144 components are the radiomic coefficients, β0 is the intercept, and β1 is the coefficient of the linear predictor derived from the Sarculator. For further details on the parametrization adopted, see [14].
      </p>
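      <p>As a concrete illustration of the likelihood in (1), the corresponding log-likelihood can be sketched as below. This is our own minimal sketch (function and variable names are illustrative, not from an existing codebase), assuming a predictor matrix X, observed times t, and event indicators delta:</p>
      <p>
```python
import numpy as np

def weibull_loglik(beta, xi, X, t, delta):
    """Log of the likelihood in (1), with gamma_i = exp(x_i' beta), alpha = exp(xi)."""
    alpha = np.exp(xi)                   # shape parameter, always positive
    log_gamma = X @ beta                 # log of the per-subject scale gamma_i
    # events contribute log(gamma_i * alpha * t_i^(alpha - 1)); all subjects -gamma_i * t_i^alpha
    return np.sum(delta * (log_gamma + np.log(alpha) + (alpha - 1.0) * np.log(t))
                  - np.exp(log_gamma) * t ** alpha)
```
      </p>
      <p>With α = exp(ξ) fixed at 1, the expression reduces to an exponential log-likelihood, which offers a quick sanity check.</p>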
      <p>In order to make regression over the 2145 variables feasible, it is necessary to elicit a prior on β that shrinks the coefficients. However, it is not appropriate to shrink all the coefficients, since we know that the coefficient β1 associated with the Sarculator is surely relevant, as demonstrated by validation studies. There is no reason to penalize the intercept either. Hence, β0 and β1 should not be penalized. For the other coefficients, a prior that shrinks them toward zero is needed. Considering that radiomic variables are highly correlated and usually characterized by low signal and poor informativeness, the data do not match the sparsity hypothesis. Therefore, a normal prior equivalent to a Ridge penalty seems the better choice, as discussed in [15].</p>
      <p>
        The parameterization of the Weibull model presented here differs from the more traditional Accelerated Failure Time (AFT) models. In standard AFT models, the logarithm of the time-to-event is typically modeled as a linear function of the covariates, expressed as:
log(Ti) = xi′β + σWi,
(<xref ref-type="bibr" rid="ref2">2</xref>)
where Wi follows a specific distribution (e.g., standard extreme value, normal). This formulation emphasizes the multiplicative effect of covariates on the survival time, effectively accelerating or decelerating the event process.
      </p>
      <p>
        In contrast, our parameterization directly models the scale parameter γi = exp(xi′β) within the Weibull distribution framework. This approach aligns more closely with the proportional hazards paradigm by specifying the hazard function as:
λ(t | xi) = α γi t^(α−1),
(<xref ref-type="bibr" rid="ref3">3</xref>)
allowing covariates to have a multiplicative effect on the hazard rate. This distinction is crucial, especially given the high dimensionality and correlation structure of the radiomic predictors in our study. By employing this parameterization, we facilitate the application of shrinkage priors, such as the Ridge-type normal prior, which effectively manages multicollinearity and enhances model interpretability. Additionally, this approach respects the known importance of specific coefficients, such as β1 associated with the Sarculator, without imposing undue penalization.
      </p>
      <p>The choice between these parameterizations hinges on the underlying assumptions about how covariates influence the survival process. While AFT models are advantageous when the primary interest lies in understanding the acceleration or deceleration of survival times, the proportional hazards-based Weibull model offers flexibility in modeling hazard functions directly, which is beneficial in high-dimensional settings with correlated predictors.</p>
      <p>For a more in-depth comparison and methodological details, readers may refer to [14] and to standard texts on survival analysis that discuss the nuances between different Weibull parameterizations and their relationship to AFT models.</p>
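      <p>Under the hazard-based parameterization, the survivor function is S(t | xi) = exp(−γi t^α), so event times can be simulated by inverting the survivor function. A minimal sketch (the helper below is ours, for illustration only):</p>
      <p>
```python
import numpy as np

def simulate_weibull(gamma, alpha, rng):
    """Draw event times with survivor function S(t) = exp(-gamma * t^alpha)."""
    u = rng.uniform(size=np.shape(gamma))          # S(T) = U with U uniform on (0, 1)
    return (-np.log(u) / gamma) ** (1.0 / alpha)   # inversion of the survivor function
```
      </p>
      <p>With α = 1 this reduces to sampling from an exponential distribution with rate γi.</p>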
      <sec id="sec-2-1">
        <title>Prior and Hyperprior elicitation</title>
        <p>Regarding α, recall that α &gt; 0; consequently, a prior on the positive reals has to be specified. A lognormal prior seems the better choice, since its reparametrization ξ = log(α) is normally distributed and no Jacobian correction is needed in the calculation of the posterior. The hyperparameters of the prior distribution for α were elicited such that the prior mean of α was set to one, reflecting a neutral position regarding the hazard function specification. By setting a high variance σα², the lognormal prior becomes effectively non-informative, allowing the parameter to update primarily based on the likelihood. To ensure E(α) = 1, recall that if α ∼ lognorm(µ, σ²), then:
E(α) = exp(µ + σ²/2) = 1,
Var(α) = exp(2µ + σ²)(exp(σ²) − 1) = σα².</p>
        <p>Therefore µ = −log(σα² + 1)/2 and σ² = log(σα² + 1). Since the density function of the lognormal distribution is given by
f(α) = (1/(α √(2πσ²))) exp( −(log(α) − µ)²/(2σ²) ),
and the logarithm of a lognormal random variable is normally distributed with the same parameters, it is possible to reparameterize by setting ξ = log(α) and considering the prior distribution for ξ:
ξ ∼ N( −log(σα² + 1)/2, log(σα² + 1) ).
(<xref ref-type="bibr" rid="ref4">4</xref>)</p>
        <p>
Regarding β, let σβ² be a high variance used to impose a vague normal prior on β0 and β1, centered at zero, and let λ denote the precision parameter. The prior distribution for β is:
π(β) ∝ exp( −λ ∑_(j=2)^p βj² − (β0² + β1²)/(2σβ²) ),
(<xref ref-type="bibr" rid="ref5">5</xref>)
where λ takes 100 values over the interval [e^5, e^(10.5)]. The interval was established so that the lower value generated an identifiable model and the upper one corresponded to a high shrinkage. Assuming independence between ξ and β, and combining the relations (<xref ref-type="bibr" rid="ref1">1</xref>), (<xref ref-type="bibr" rid="ref4">4</xref>) and (<xref ref-type="bibr" rid="ref5">5</xref>), it is possible to define the kernel of the λ-th posterior distribution as:
π(β, ξ | X, t, δ) ∝ L(X, t, δ | β, ξ) π(ξ) π(β).
(<xref ref-type="bibr" rid="ref6">6</xref>)
The posterior distribution lacks a known, tractable representation that would allow us to leverage (marginal) conjugacy. This renders Gibbs sampling infeasible. It is necessary to adopt a Metropolis algorithm, but in a high-dimensional framework the standard Metropolis–Hastings is typically ineffective at exploring the posterior distribution efficiently [17]. Therefore we need to move towards a more complex algorithm such as the Metropolis Adjusted Langevin Algorithm (MALA). The mixing of such an algorithm is contingent upon the covariance matrix. It is therefore crucial to precondition the algorithm with an appropriate covariance matrix [18].
        </p>
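        <p>Putting the likelihood and the two priors together, the unnormalized log posterior can be sketched as follows; this is our own illustration, with variable names of our choosing (sigma2_a standing for σα² and sigma2_b for σβ²):</p>
        <p>
```python
import numpy as np

def log_posterior(theta, X, t, delta, lam, sigma2_a, sigma2_b):
    """Unnormalized log posterior: Weibull log-likelihood plus the lognormal
    prior on alpha (through xi = log alpha) and the ridge-type prior on beta."""
    xi, beta = theta[0], theta[1:]
    alpha = np.exp(xi)
    eta = X @ beta                                   # linear predictor x_i' beta
    loglik = np.sum(delta * (eta + xi + (alpha - 1.0) * np.log(t))
                    - np.exp(eta) * t ** alpha)
    s2 = np.log(sigma2_a + 1.0)                      # sigma^2 = log(sigma_alpha^2 + 1)
    logprior_xi = -xi ** 2 / (2.0 * s2) - xi / 2.0   # N(-s2/2, s2), up to a constant
    logprior_beta = (-lam * np.sum(beta[2:] ** 2)    # penalized radiomic coefficients
                     - (beta[0] ** 2 + beta[1] ** 2) / (2.0 * sigma2_b))
    return loglik + logprior_xi + logprior_beta
```
        </p>
        <p>Increasing λ only changes the penalty on the radiomic coefficients, leaving the intercept and the Sarculator coefficient untouched.</p>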
        <p>In order to construct an appropriate covariance matrix for each of the 100 proposal distributions, a reasonable strategy is to derive the covariance from the observed information matrix, which is calculated on the log posterior distribution:
log π(β, ξ | X, t, δ) = ∑_(i=1)^n [ δi ξ + δi xi′β − exp(xi′β) ti^(exp(ξ)) + δi (exp(ξ) − 1) log ti ] − ξ²/(2 log(σα² + 1)) − ξ/2 − λ ∑_(j=2)^p βj² − (β0² + β1²)/(2σβ²) + c,
(<xref ref-type="bibr" rid="ref7">7</xref>)
where c is a constant term collecting all additive terms that do not depend on the parameters. We can now derive the pre-conditioning matrix. Formally, denoting by I(β̃) the observed information matrix at β̃, where β̃ is the initialization vector for the generic proposal distribution, the observed covariance matrix Σ for the pre-conditioning can be computed as follows [19]:
Σ ≈ I⁻¹(β̃) = [−H(β̃)]⁻¹,
(<xref ref-type="bibr" rid="ref8">8</xref>)
where H(β̃) is the Hessian of the log posterior evaluated at β̃. Hence, for a sequence of optimal initialization vectors (β̃0, . . . , β̃λ, . . . , β̃100) it is possible to derive the 100 pre-conditioning matrices. In order to establish the initialization vectors for each proposal distribution, it is possible to maximize the log posterior so that, for the generic β̃λ, we have:
β̃λ = arg max_β log π(βλ | X, t, δ).
(<xref ref-type="bibr" rid="ref9">9</xref>)</p>
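        <p>As an illustration of the initialization step (9) and of the relation Σ ≈ [−H]⁻¹, the sketch below locates the mode of a toy negative log posterior numerically and builds the pre-conditioning covariance from a finite-difference Hessian. The quadratic objective and the use of scipy are our assumptions for illustration; the paper solves the actual score system with an adaptive nonlinear least squares routine:</p>
        <p>
```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(theta):
    """Toy stand-in for -log pi(theta | data): a strictly convex quadratic."""
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    return 0.5 * theta @ A @ theta

def hessian_fd(f, x, h=1e-5):
    """Central finite-difference Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

theta_tilde = minimize(neg_log_post, np.ones(2)).x      # posterior mode, as in eq. (9)
H_log_post = -hessian_fd(neg_log_post, theta_tilde)     # Hessian of the LOG posterior
Sigma = np.linalg.inv(-H_log_post)                      # Sigma = [-H]^(-1), as in eq. (8)
```
        </p>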
        <p>From equation (<xref ref-type="bibr" rid="ref9">9</xref>), it is possible to derive the generic initialization vector β̃λ for the λ-th pre-conditioning matrix I⁻¹(β̃λ) of the λ-th proposal distribution by solving the system of first-order conditions:
∂ log π(· | X, t, δ)/∂ξ = 0 ⇒ ∑_(i=1)^n [ δi + log ti ( exp(ξ) δi − exp(xi′β + ξ) ti^(exp(ξ)) ) ] − ξ/log(σα² + 1) − 1/2 = 0
∂ log π(· | X, t, δ)/∂β0 = 0 ⇒ ∑_(i=1)^n ( δi xi0 − xi0 exp(xi′β) ti^(exp(ξ)) ) − β0/σβ² = 0
∂ log π(· | X, t, δ)/∂β1 = 0 ⇒ ∑_(i=1)^n ( δi xi1 − xi1 exp(xi′β) ti^(exp(ξ)) ) − β1/σβ² = 0
∂ log π(· | X, t, δ)/∂βj = 0 ⇒ ∑_(i=1)^n ( δi xij − xij exp(xi′β) ti^(exp(ξ)) ) − 2λβj = 0, for j = 2, . . . , p.
The system cannot be solved analytically, so it is necessary to use a numerical approximation, for which an adaptive nonlinear least squares algorithm was chosen [20]. The procedure was repeated over a sequence of 100 distinct values of λ, and the vectors β̂ minimizing −log π(β | X, t, δ) were selected as initialization values for the sampling algorithm.</p>
      </sec>
      <sec id="sec-2-2">
        <title>A-MALA pre-conditioned</title>
        <p>Given that we decided to use the MALA algorithm, the structure of the proposal distribution for the generic λ can be written as follows:
βλ* | βλ ∼ N_(p+2)( βλ + ε²/(2(p + 2)^(1/3)) I⁻¹(β̃λ) ∇ log π(βλ | X, t, δ)|_(βλ = β̃λ), ε²/((p + 2)^(1/3)) I⁻¹(β̃λ) ),
(<xref ref-type="bibr" rid="ref10">10</xref>)
where ∇ log π(βλ | X, t, δ) is the gradient of the log posterior, which incorporates information on the structure of the posterior distribution; the vector βλ = β̃λ at the first iteration; and ε is the step-size parameter, which regulates the size of the jumps. Incorporating the structure of the posterior at each iteration through the gradient of the log posterior helps the proposal generate candidates from regions with higher density and, consequently, with a higher probability of being accepted [21]. To make generation from the proposal more computationally efficient, without the need to invert the covariance matrix at each iteration, we used the spectral decomposition
I⁻¹(β̃λ) = Vλ Dλ^(−1/2) Dλ^(−1/2) Vλ′,
where Vλ is the matrix of standardized eigenvectors and Dλ the diagonal matrix of the eigenvalues of Σλ. Hence, setting Aλ = Vλ Dλ^(−1/2), so that I⁻¹(β̃λ) = Aλ Aλ′, we can generate the new candidate from the following equation:
βλ*(j) = βλ(j−1) + ε²/(2(p + 2)^(1/3)) I⁻¹(β̃λ) ∇ log π(βλ | X, t, δ)|_(βλ = βλ(j−1)) + ε/((p + 2)^(1/6)) Aλ Z,
(<xref ref-type="bibr" rid="ref11">11</xref>)
where Z ∼ N_(p+2)(0, I_(p+2)). Note that using the spectral decomposition we only need to deal with Dλ at each iteration, instead of Σ, which is computationally more efficient to manipulate. Moreover, for computational efficiency we implemented the Singular Value Decomposition (SVD); however, since the pre-conditioning matrix is symmetric and square, SVD and spectral decomposition are equivalent.</p>
        <p>Algorithm 1 reports the pseudocode used to analytically compute the Hessian at the optimum point, together with the entire process for the construction of the matrices used to efficiently generate from the proposal distribution.</p>
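        <p>The factorization and proposal step can be sketched as follows. Here we factor the pre-conditioning covariance M = I⁻¹(β̃) directly as M = AA′ via its eigendecomposition, which is equivalent to the V D^(−1/2) construction applied to the information matrix; names are ours:</p>
        <p>
```python
import numpy as np

def factor_preconditioner(M):
    """Factor a symmetric positive-definite pre-conditioner M as A A'."""
    d, V = np.linalg.eigh(M)           # spectral decomposition: M = V diag(d) V'
    return V @ np.diag(np.sqrt(d))     # A = V diag(d)^(1/2), hence A A' = M

def mala_proposal(beta, grad_log_post, M, A, eps, rng):
    """One draw from the pre-conditioned MALA proposal, eq. (11)-style."""
    p2 = len(beta)                     # plays the role of p + 2
    drift = (eps ** 2 / (2.0 * p2 ** (1.0 / 3.0))) * (M @ grad_log_post(beta))
    noise = (eps / p2 ** (1.0 / 6.0)) * (A @ rng.standard_normal(p2))
    return beta + drift + noise
```
        </p>
        <p>Factoring once, outside the sampling loop, is what avoids a matrix inversion at every iteration.</p>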
        <sec id="sec-2-2-2">
          <title>Algorithm 1: Proposal distributions initialization</title>
          <p>Analytical derivation of the Hessian matrix and evaluation at the optimum point, −H|_(β = β̃) = I(β̃):
Function Amatrix(β̃, σα², σβ², λ, X, t, δ):
H ← empty (p + 2) × (p + 2) matrix
for j ← 1 to p + 2 do
 diag(H)[j] ← ∑_(i=1)^n xij² exp(xi′β̃) ti^(exp(ξ̃)) + (prior curvature: 2λ for the penalized βj, 1/σβ² for β0 and β1)
end
for j ← 2 to p + 2 do
 for k ← 1 to j − 1 do
  H[j, k] ← ∑_(i=1)^n xij xik exp(xi′β̃) ti^(exp(ξ̃))
 end
end
H is obtained by symmetry completion of the upper triangular matrix
V ← SVD(H) normalized eigenvectors matrix; d ← SVD(H) singular value vector
D ← diag(1/√d)
Ã ← ε²/(2(p + 2)^(1/3)) V D² V′ — matrix used to efficiently generate from the proposal distribution</p>
          <p>The step-size ε is scaled so that it matches the optimal step maximizing the speed of diffusion (which is related to the asymptotic variance) and has to be tuned [22]. The strategy implemented was to adapt the step-size every 50 iterations so as to reach the optimal acceptance rate of the MALA, which is 57.4% [22]. Moreover, in order not to compromise convergence to the stationary distribution, which is guaranteed by the ergodic theorem, we adapted the step-size according to the diminishing adaptation condition [23], [24]. Finally, we adapted the step-size parameter only within the burn-in period, making the number of burn-in iterations variable between different λ according to the stabilization of the step-size parameter. In particular, the adaptive strategy we adopted was the following: let ι = 0.054 be the tolerance parameter; the stopping rule for the burn-in period requires the satisfaction of the following conditions:
1. the burn-in iterations are higher than 40,000;
2. ε is constant for 500 iterations;
3. if conditions 1 and 2 are not satisfied, the burn-in is stopped at 150,000 iterations.
The criterion applied every 50 iterations to update the step size was the following: let r be the r-th burn-in iteration, let ε(j−1) be the value of ε from the previous updating step, and let a(j) be the current empirical acceptance rate; then
ε(j) = ε(j−1) + min( 0.01, 1/√rj ) if a(j) &gt; 0.574 + ι,
ε(j) = ε(j−1) − min( 0.01, 1/√rj ) if a(j) &lt; 0.574 − ι,
ε(j) = ε(j−1) if a(j) ∈ [0.574 − ι, 0.574 + ι].
(<xref ref-type="bibr" rid="ref12">12</xref>)</p>
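          <p>The adaptation rule above can be sketched as a small helper; the function name is ours, and the update mirrors the batch-wise acceptance-rate monitoring described in the text:</p>
          <p>
```python
import numpy as np

def adapt_step_size(eps, acc_rate, r, target=0.574, tol=0.054):
    """Diminishing-adaptation update of the MALA step size, rule (12) style."""
    delta = min(0.01, 1.0 / np.sqrt(r))   # adaptation magnitude shrinks with r
    if acc_rate > target + tol:
        return eps + delta                # accepting too often: lengthen the jumps
    if target - tol > acc_rate:
        return eps - delta                # accepting too rarely: shorten the jumps
    return eps                            # within tolerance: leave eps unchanged
```
          </p>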
          <p>We refer to the resulting sampler as the pre-conditioned Adaptive-MALA (A-MALA). The detailed construction of the A-MALA algorithm we proposed is reported as pseudocode in Algorithm 2.</p>
          <p>Algorithm 2: Pre-conditioned A-MALA with tuned step-size.
Function MALA(burnin, R, τ, β, σα², σβ², ε, X, t, δ, bat, target = 0.574, tolerance = 0.054):
out ← array of 100 matrices: rows = R/τ, columns = p + 2 (p: number of features; τ: thinning period)
for i ← 1 to length(λ) do
 β ← betamat[i, ] (initialization vector for the i-th penalty value)
 A ← Amatrix(β, X, t, δ, σα², σβ², l = λi); S ← AA′
 logp ← logposterior(β, X, t, δ, σα² = 16, σβ² = 600, l = λi)
 lgrad ← S · loggradient(β, X, t, δ, σα² = 16, σβ² = 600, l = λi)
 r ← 1; batch ← 0; accepted ← 0; index ← 1
 while r ⩽ burnin + R do
  if batch = bat and r &lt; burnin + 1 then
   if accepted/bat &gt; target + tolerance then ε ← ε + min( 0.01, 1/√r )
   if accepted/bat &lt; target − tolerance then ε ← ε − min( 0.01, 1/√r )
   batch ← 0; accepted ← 0
   (the burn-in ends early when ε has remained constant for 500 iterations and r &gt; 40,000, per the stopping rule above)
  end
  β* ← β + ε²/(2(p + 2)^(1/3)) lgrad + ε/((p + 2)^(1/6)) AZ, with Z ∼ N_(p+2)(0, I_(p+2))
  logpnew ← logposterior(β*, X, t, δ, σα² = 16, σβ² = 600, l = λi)
  lgradnew ← S · loggradient(β*, X, t, δ, σα² = 16, σβ² = 600, l = λi)
  diffold ← β − β* − ε²/(2(p + 2)^(1/3)) lgradnew
  diffnew ← β* − β − ε²/(2(p + 2)^(1/3)) lgrad
  qold ← −(p + 2)^(1/3)/(2ε²) diffold′ S⁻¹ diffold
  qnew ← −(p + 2)^(1/3)/(2ε²) diffnew′ S⁻¹ diffnew
  α ← min{1, exp(logpnew − logp + qold − qnew)}; u ∼ U(0, 1)
  if u &lt; α then
   β ← β*; logp ← logpnew; lgrad ← lgradnew; accepted ← accepted + 1
  end
  if r &gt; burnin and r mod τ = 0 then
   out[index, , i] ← β; index ← index + 1
  end
  r ← r + 1; batch ← batch + 1
 end
end
return out
End Function</p>
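          <p>To make the scheme concrete, below is a minimal, self-contained sketch of a pre-conditioned MALA sampler on a generic target, with the acceptance ratio assembled as in Algorithm 2 (logpnew − logp + qold − qnew). It is a simplified illustration, not the full A-MALA: the step size is fixed and there is no burn-in adaptation:</p>
          <p>
```python
import numpy as np

def mala(log_post, grad_log_post, theta0, M, eps, n_iter, rng):
    """Pre-conditioned MALA: drift eps^2/(2 p2^(1/3)) M grad, noise via A with M = A A'."""
    p2 = len(theta0)
    c_drift = eps ** 2 / (2.0 * p2 ** (1.0 / 3.0))
    c_noise = eps / p2 ** (1.0 / 6.0)
    d, V = np.linalg.eigh(M)                      # spectral decomposition of M
    A = V @ np.diag(np.sqrt(d))                   # M = A A'
    M_inv = np.linalg.inv(M)
    scale = p2 ** (1.0 / 3.0) / (2.0 * eps ** 2)  # prefactor of the proposal log-density
    theta, lp, g = theta0, log_post(theta0), M @ grad_log_post(theta0)
    draws = []
    for _ in range(n_iter):
        prop = theta + c_drift * g + c_noise * (A @ rng.standard_normal(p2))
        lp_new, g_new = log_post(prop), M @ grad_log_post(prop)
        diff_old = theta - prop - c_drift * g_new     # residual of the reverse move
        diff_new = prop - theta - c_drift * g         # residual of the forward move
        q_old = -scale * (diff_old @ M_inv @ diff_old)
        q_new = -scale * (diff_new @ M_inv @ diff_new)
        # accept with probability min(1, exp(lp_new - lp + q_old - q_new))
        if lp_new - lp + q_old - q_new > np.log(rng.uniform()):
            theta, lp, g = prop, lp_new, g_new
        draws.append(theta.copy())
    return np.array(draws)
```
          </p>
          <p>On a standard bivariate Gaussian target this mixes well with ε around 1; in the paper's setting M would be the λ-specific pre-conditioning matrix I⁻¹(β̃λ).</p>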
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Bayesian Model Averaging</title>
        <p>
          When the sampling stage was successfully completed for every λ using the A-MALA, with 75,000 sampling iterations and a thinning period of 15, we used the Watanabe–Akaike Information Criterion (WAIC) [25] to select the best model and to build a new model by applying Bayesian Model Averaging (BMA) [26]. We considered BMA more appropriate than simply selecting the best model, since selecting a single model with a low posterior probability (5%) introduces a selection bias, because the uncertainty of the selection process is not taken into account [27], [28]. To account for the selection uncertainty, it seems better to use a mixture of all the models, where each of them is weighted by its posterior probability. Formally, let ∆ be a quantity of inferential interest. Since we do not have any prior information about the probability of each model Mλ, we elicit a uniform prior distribution over the models. Then we can calculate the BMA estimate of ∆ as:
∆̂_BMA = ∑_λ E(∆ | X, t, δ, Mλ) π(Mλ | X, t, δ),
(<xref ref-type="bibr" rid="ref13">13</xref>)
where π(Mλ | X, t, δ) represents the posterior probability of the λ-th model.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Approximation of the posterior probability model</title>
        <p>
          In equation (<xref ref-type="bibr" rid="ref13">13</xref>), the term π(Mλ | X, t, δ) has to be calculated as the ratio of the marginal likelihood of the λ-th model to the sum of all the marginal likelihoods:
π(Mλ | X, t, δ) = L(X, t, δ | Mλ) / ∑_λ L(X, t, δ | Mλ).
(<xref ref-type="bibr" rid="ref14">14</xref>)
However, these quantities are unknown and have to be estimated. We could approximate them using the relationship between the marginal likelihood and the Bayesian Information Criterion (BIC), as in [29]:
π(Mλ | X, t, δ) = L(X, t, δ | Mλ) / ∑_λ L(X, t, δ | Mλ) ≈ exp(−BICλ/2) / ∑_(λ=1)^Λ exp(−BICλ/2).
(<xref ref-type="bibr" rid="ref15">15</xref>)
Nevertheless, we prefer to use the relation (<xref ref-type="bibr" rid="ref15">15</xref>) replacing the BIC with the WAIC. WAIC is indeed a fully Bayesian criterion that measures how well the model will perform on new data; indeed, WAIC approximates Leave-One-Out Cross-Validation (LOO-CV) [30], [27], [31], [32]. In this way, the posterior probability is related to the model's ability to fit new data. So we approximated (<xref ref-type="bibr" rid="ref14">14</xref>) with:
π(Mλ | X, t, δ) = L(X, t, δ | Mλ) / ∑_λ L(X, t, δ | Mλ) ≈ exp(−WAICλ/2) / ∑_(λ=1)^Λ exp(−WAICλ/2).
(<xref ref-type="bibr" rid="ref16">16</xref>)
Note that the approximation (<xref ref-type="bibr" rid="ref14">14</xref>) has already been introduced also for other information criteria, such as the AIC [33].
        </p>
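        <p>The WAIC-based weights in (16) can be computed stably by subtracting the minimum WAIC before exponentiating, which leaves the ratios unchanged; a minimal sketch (function name ours):</p>
        <p>
```python
import numpy as np

def model_weights(waic):
    """Posterior model probabilities from WAIC values:
    w_lambda proportional to exp(-WAIC_lambda / 2), normalized to sum to one."""
    waic = np.asarray(waic, dtype=float)
    w = np.exp(-0.5 * (waic - waic.min()))   # shift by the minimum for stability
    return w / w.sum()
```
        </p>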
      </sec>
      <sec id="sec-2-5">
        <title>Hypothesis test</title>
        <p>We compared the best model with the BMA, and we also compared the BMA with the
Sarculator. To do so, we used exactly the same algorithm to sample the values of the Sarculator,
excluding the radiomic variables. Calling M0 the Sarculator model, we built a Bayesian test in
order to evaluate whether the radiomic variables added prognostic power. The test can
be written as follows:</p>
        <p>H0 : M0 is sufficient vs</p>
        <p>
          H1 : Mλ is better than M0,
(
          <xref ref-type="bibr" rid="ref17">17</xref>
          )
where π (M0) = 0.5 and π (Mλ) = 0.005. If π (M0 | X, t, δ) &lt; 0.5, then H0 is rejected. Note that
we set the prior probability of the Sarculator much higher than the prior probability of the
other models because the Sarculator is a validated prognostic index. Formally, the
test we built can be specified as:
        </p>
        </p>
        <p>H0 : π (M0 | X, t, δ) ⩾ 0.5 | π (M0) = 0.5 vs</p>
        <p>
          H1 : π (M0 | X, t, δ) &lt; 0.5 | π (M0) = 0.5. (
          <xref ref-type="bibr" rid="ref18">18</xref>
          )
Combining the equation (
          <xref ref-type="bibr" rid="ref14">14</xref>
          ) with the approximation (
          <xref ref-type="bibr" rid="ref16">16</xref>
          ), the posterior model probability of
M0 can be computed as:
        </p>
        <p>π (M0 | X, t, δ) = L (X, t, δ | M0) π (M0) / [ L (X, t, δ | M0) π (M0) + ∑Λλ=1 L (X, t, δ | Mλ) π (Mλ) ]</p>
        <p>
          ≈ exp (−½ WAIC0) × ½ / [ exp (−½ WAIC0) × ½ + ∑Λλ=1 exp (−½ WAICλ) × 1/(2Λ) ].
(
          <xref ref-type="bibr" rid="ref19">19</xref>
          )
        </p>
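The computation in equation (19) can be sketched as follows; the WAIC inputs are hypothetical (the fitted values are not reproduced here), while the prior masses follow the values stated above, π (M0) = 1/2 and π (Mλ) = 1/(2Λ):

```python
import numpy as np

def posterior_prob_m0(waic0, waic_lambda):
    """Approximate posterior probability of the baseline model M0 via
    WAIC-based weights, with prior pi(M0) = 1/2 and pi(M_lambda) = 1/(2*Lambda)."""
    waic_lambda = np.asarray(waic_lambda, dtype=float)
    n_models = waic_lambda.size
    # Shift all WAICs by the smallest one; the ratio is unchanged and the
    # exponentials stay in a numerically safe range.
    ref = min(waic0, waic_lambda.min())
    w0 = np.exp(-0.5 * (waic0 - ref)) * 0.5
    w_lam = np.exp(-0.5 * (waic_lambda - ref)) * (0.5 / n_models)
    return w0 / (w0 + w_lam.sum())

# Hypothetical WAIC values in which the radiomic models fit clearly better
# than M0, so the posterior probability of M0 comes out small.
p0 = posterior_prob_m0(waic0=230.0, waic_lambda=[212.0, 214.5, 213.1])
print(round(p0, 6))
```

When the baseline model fits at least as well as every radiomic model, the same function returns a value above 0.5 and H0 is retained.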
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>A total of 91 eSTS patients were included in the study. Demographic and tumor characteristics
are detailed in Table 1. The median age of the cohort was 59 years (interquartile range [IQR]:
46, 71), and the median tumor diameter was 8.0 mm (IQR: 5.0, 10.8). The median follow-up
period was approximately 55.8 months (IQR: 44.8, 72.9), during which 21 patients (23.1%)
succumbed to tumor-related causes.</p>
      <p>The results of the Bayesian analysis indicate that the incorporation of radiomic variables into
the prognostic model is justified. The Bayesian Model Averaging (BMA) yielded a posterior
mean Area Under the Curve (AUC) at five years of 0.809, with a 95% Credible Interval (CI) of
(0.768, 0.851). This indicates that the model demonstrates a robust capacity to discriminate
over a five-year period. The posterior mean Brier score at five years was 0.277, with a 95%
CI of (0.257, 0.304), indicating that the model predictions exhibited acceptable calibration.
When evaluated over the entire study period, the posterior mean Brier Score was 0.316, with
a 95% CI of (0.291, 0.346). Furthermore, the posterior mean Concordance Index (C-Index)
was 0.804, with a 95% CI of (0.764, 0.845), which provides additional evidence in support
of the model’s predictive accuracy. The posterior mean of the coefficient associated with
the previously validated prognostic index (Sarculator) was 1.008, with a 95% CI of (0.989,
1.036). This corresponds to a hazard ratio (HR) of 2.739, indicating that higher scores on the
prognostic index are associated with a significantly increased risk of adverse outcomes. The
posterior mean of the shape parameter of the Weibull distribution was estimated at 0.963,
which suggests a near-constant hazard over time. It is notable that all posterior estimates of the
radiomic parameters were close to zero due to the penalty imposed by the prior distribution.
This is to be expected due to the model’s ability to shrink insignificant coefficients towards
zero, mitigating the risk of overfitting without introducing a selection bias. The BMA shows
wider distributions and wider credible sets than the best model, as it takes into account the
uncertainty associated with the selection process. The posterior means of the AUC at 5 years
and of the C-index are better for the BMA. With regard to calibration, the Brier scores of the BMA and
the best model are similar (see Table 2). The results show that the BMA performs better than
the best model.</p>
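The pooling that produces these wider BMA intervals can be illustrated with a small sketch; the per-model posterior draws and weights below are invented for illustration and are not the study's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of a performance metric (say, the C-index)
# from three candidate models, with assumed posterior model probabilities.
draws = np.stack([
    rng.normal(0.74, 0.010, 4000),   # model 1
    rng.normal(0.76, 0.012, 4000),   # model 2
    rng.normal(0.72, 0.011, 4000),   # model 3
])
weights = np.array([0.5, 0.3, 0.2])

# BMA pooling: for each draw, pick a model in proportion to its posterior
# probability and keep that model's value.  The resulting mixture spreads
# mass across models, which is why BMA credible intervals tend to be wider
# than those of any single model.
idx = rng.choice(3, size=4000, p=weights)
bma = draws[idx, np.arange(4000)]

print(bma.mean().round(3), np.percentile(bma, [2.5, 97.5]).round(3))
```

The mixture standard deviation exceeds that of any component unless all component means coincide, which is the mechanism behind the wider BMA credible sets.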
      <p>Table 2 reports each metric with its 95% CI for the BMA, the best model (log λ = 10.39), and the Sarculator.</p>
      <p>The Bayesian test implemented gave the following result:</p>
      <p>π (M0 | X, t, δ) ≈ 0.0049.</p>
      <p>Since π (M0 | X, ,  ) &lt; 0.5, the evidence in favour of H0 is very low, consequently we reject H0,
propending for H1 : ∃λ | Mλ is better than M0. Consequently, radiomic variables should
be included in a prognostic index to increase the prognostic accuracy.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>This study explores statistical aspects and computationally efficient sampling algorithms that are
rarely fully developed within a single framework. Specifically, high variable dimensionality
(omics data) and low sample size are common in today’s clinical studies.</p>
      <p>Typically, high-dimensional data problems assume a large number of observations, even
when p ≫ n. While this increases computational demands, the ample sample size allows for
standard approaches such as splitting the data into training and test sets. The training set is used
for variable selection—potentially using cross-validation—and the test set for inference. This
works because the sample adequately represents the population, ensuring the subsample used
for selection contains all the necessary information.</p>
      <p>However, with small sample sizes, dividing the data into subsets can result in samples that
no longer represent the population, leading to biased and suboptimal variable selection; in
our case, for example, only one patient is affected by a vascular sarcoma. In such
cases, the classical data-splitting paradigm fails to provide inferential guarantees. This issue
is often overlooked, with methods like LASSO applied despite lacking inferential assurances.
Moreover, using LASSO on the same data for both selection and inference introduces bias due
to selection and the absence of inferential guarantees. In this study, we developed a method
that, considering the data type, sample size, and number of variables, predicts the OS without
performing variable selection, thus avoiding associated bias. An ad hoc model was constructed
by eliciting prior distributions, which did not penalize the Sarculator. We incorporated prior
knowledge of relevant variables, applying penalization only to radiomic variables. A Bayesian
approach is inherently suitable for ensuring inferential guarantees, which are assurances
that the conclusions drawn from a statistical model are both reliable and valid, particularly
concerning parameter estimation, prediction, and inference about data relationships. Bayesian
inference provides a robust framework for accounting for uncertainty through probability
distributions, incorporating prior knowledge, and updating it with observed data. This allows
for the generation of credible intervals for parameter estimates and naturally balances model
complexity with data support, thereby maintaining inferential guarantees even in complex,
high-dimensional feature spaces or when sample sizes are limited. Moreover, Bayesian credible
intervals offer direct probabilistic interpretations of parameter uncertainty, which is especially
advantageous in sparse data contexts compared to the frequentist framework. The inherent
nature of the Bayesian approach, incorporating uncertainty through prior and posterior
distributions, mitigates the risk of overestimation when compared to frequentist methods. The
application of Bayesian Model Averaging (BMA) further addresses uncertainty by considering
less probable models, thereby reducing overestimation risk.</p>
      <p>To specify suitable prior distributions and initialize proposal distributions based on solid
theory, we built our own A-MALA within a MCMC framework, implemented ad hoc for
optimal performance. While this introduced theoretical and practical complexities, it allowed
us to thoroughly explore each relevant step within the Bayesian framework without relying on
inflexible pre-existing algorithms. The limited literature on Bayesian methods for survival data
with high dimensionality and low sample sizes necessitated a detailed analysis of theoretical
options, involving significant effort to construct a robust approach.</p>
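For reference, a plain, non-adaptive MALA step with identity preconditioning can be sketched as follows; this is a textbook baseline for orientation, not the ad hoc adaptive, preconditioned A-MALA developed in the paper, and the toy target is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def mala_step(theta, log_post, grad_log_post, eps):
    """One Metropolis-adjusted Langevin step with identity preconditioning.
    A textbook sketch, not the paper's adaptive A-MALA."""
    # Langevin proposal: gradient drift plus Gaussian noise.
    mean_fwd = theta + 0.5 * eps**2 * grad_log_post(theta)
    prop = mean_fwd + eps * rng.standard_normal(theta.size)
    # The proposal is asymmetric, so the reverse-move density enters the ratio.
    mean_rev = prop + 0.5 * eps**2 * grad_log_post(prop)
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * eps**2)
    log_q_rev = -np.sum((theta - mean_rev) ** 2) / (2 * eps**2)
    log_alpha = log_post(prop) - log_post(theta) + log_q_rev - log_q_fwd
    accept = rng.binomial(1, min(1.0, np.exp(min(0.0, log_alpha))))
    return (prop, True) if accept else (theta, False)

# Toy target: standard bivariate Gaussian.
log_post = lambda t: -0.5 * np.sum(t**2)
grad_log_post = lambda t: -t

theta, accepts = np.zeros(2), 0
for _ in range(2000):
    theta, ok = mala_step(theta, log_post, grad_log_post, eps=0.9)
    accepts += ok
print(accepts / 2000)  # empirical acceptance rate
```

An adaptive version would additionally tune the step size eps and the preconditioning matrix from the chain history, which is the direction the paper takes.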
      <p>Given the high dimensionality of the data, it is reasonable to question why the Ridge
prior was chosen among available shrinkage priors. Addressing this requires understanding
the application domain, data characteristics, and Bayesian shrinkage mechanisms. Moreover,
using a Bayesian Ridge penalty was a choice that we made after comprehensive theoretical
consideration. In fact, several Bayesian variable selection approaches could be adopted;
however, the alternatives do not adapt properly to the nature of the data. The Laplace
distribution, a Bayesian analogue of LASSO regression, could effectively set coefficients to
zero in the frequentist context. However, this property does not hold in Bayesian settings
when considering the posterior mean. In high-dimensional Bayesian contexts, using a Laplace
or Normal prior yields similar results. Although the Laplace distribution concentrates more
probability mass at zero, it cannot set the posterior mean of coefficients to exactly zero.
Achieving this requires using the maximum a posteriori (MAP) estimator, which introduces
selection bias by ignoring uncertainty around the mode. Moreover, the penalization induced
by the Laplace distribution also results in the loss of the oracle property [34]. The Horseshoe
prior, introduced by [35], and its variation, the Regularized Horseshoe [36], could be useful for
understanding different types of penalization. The Horseshoe prior uses a hierarchical model
with hyperpriors on variance, allowing minimal penalization on certain coefficients. It has
heavier tails than Laplace or Normal priors, and the combination of local and global shrinkage
parameters helps balance penalization. However, in small samples, global shrinkage can
dominate local effects, limiting the prior’s effectiveness in high-dimensional settings, so it did
not appear to be the appropriate choice for our data structure [37]. In contexts like radiomics, where
many variables are correlated, it is not reasonable to assume sparsity. Radiomic variables often
capture similar effects, leading to redundancy rather than sparsity. The Horseshoe prior may
still require arbitrary criteria for variable selection, which introduces bias. The Hyperlasso
prior, though addressing Laplace’s limitations, also assumes sparsity [38].</p>
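The claim above, that a Laplace prior can set the MAP estimate exactly to zero while the posterior mean remains nonzero, is easy to verify in a toy one-dimensional model (the model and its numbers are assumed here purely for illustration):

```python
import numpy as np

# Toy check: one observation y ~ N(theta, 1), prior theta ~ Laplace(0, b).
# Grid integration of the 1-D posterior is sufficient.
y, b = 0.5, 0.5
theta = np.linspace(-6.0, 6.0, 20001)
dtheta = theta[1] - theta[0]

log_post = -0.5 * (y - theta) ** 2 - np.abs(theta) / b
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta          # normalise on the grid

map_est = theta[np.argmax(post)]     # soft-thresholded to (numerically) zero
post_mean = (theta * post).sum() * dtheta

print(map_est, round(post_mean, 3))
```

With these values the penalty 1/b = 2 exceeds |y|, so the mode sits exactly at zero, yet the posterior mean stays strictly positive: averaging over the posterior never discards the uncertainty around the mode.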
      <p>Spike-and-slab priors could be another potential alternative: they can set coefficients exactly to
zero but face practical challenges in scenarios with low sample sizes and many variables. In
such cases, the distribution may not properly characterize the signal [39, 40]. To avoid bias,
it is more appropriate to consider all penalized coefficients sampled from the posterior and
account for uncertainty across models.</p>
      <p>Considering all these factors, the Ridge prior was selected because it penalizes all coefficients
without assuming sparsity, making it suitable for a small sample with many variables.
Performing variable selection with non-sparse priors inevitably introduces bias. Using the same
data for model estimation and variable selection can lead to errors due to multiple uses of the
data. From a Bayesian perspective, selecting variables arbitrarily disregards the uncertainty of
excluded variables.</p>
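Under the Ridge choice, the prior contribution to the log-posterior and its gradient, as used by Langevin-type proposals, take a simple quadratic and linear form. The sketch below assumes (for illustration) that the prior on the radiomic coefficients beta is the zero-mean Gaussian N(0, (1/λ) I), with the best-model penalty from the text as an example value:

```python
import numpy as np

# Assumed penalty: the best-model value log(lambda) = 10.39 reported above.
lam = np.exp(10.39)

def ridge_log_prior(beta, lam):
    # log-density up to an additive constant: a quadratic penalty on beta
    return -0.5 * lam * np.sum(beta ** 2)

def ridge_grad(beta, lam):
    # gradient used by Langevin-type proposals: linear shrinkage toward zero
    return -lam * beta

beta = np.array([1e-3, -2e-3, 5e-4])
print(ridge_log_prior(beta, lam), ridge_grad(beta, lam))
```

Every coefficient is shrunk toward zero in proportion to its size, and none is forced exactly to zero, which matches the non-sparse behaviour the text argues for.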
      <p>This study presents several notable strengths that address the limitations of prior research in
radiomics and prognostic modeling for limb soft tissue sarcomas. Through the implementation
of an innovative and unbiased formal test with inferential guarantees, we have rigorously
assessed the additional prognostic value that radiomic features contribute beyond established
clinical predictors, such as those encapsulated by the Sarculator. The formal Bayesian
hypothesis test we built allows us to assess whether the Sarculator model alone is sufficient or
whether adding radiomic features provides additional prognostic power. To do so, we assigned
prior probabilities to reflect the Sarculator’s status as a validated model, ensuring a
conservative stance on adding complexity. The posterior probability of the Sarculator model was
then computed. This approach provides a clear, probabilistic criterion for model sufficiency,
favoring interpretability by focusing on posterior probabilities rather than relative evidence
measures like the Bayes Factor. In a clinical context, providing posterior probabilities rather than
Bayes Factors is particularly useful, as it allows for a cautious, directly interpretable evaluation
of whether adding radiomic features meaningfully improves the Sarculator model’s utility.</p>
      <p>Furthermore, in our Bayesian Model Averaging (BMA) framework, overfitting is unlikely due
to several key methodological safeguards. First, the Bayesian approach inherently incorporates
uncertainty, utilizing prior and posterior distributions that allow for regularization, particularly
essential in high-dimensional, small-sample contexts. The use of WAIC (Widely Applicable
Information Criterion) to evaluate posterior probabilities further ensures that model complexity
is balanced with data fit, prioritizing models that generalize well rather than simply fitting
the estimation data closely. Note that the pWAIC of the best model (which can be seen as
the number of effective parameters [21]) was lower than 1 (0.88). This result further
limits the risk of overfitting. It can be attributed to the influence of the prior distributions,
where, given the limited sample size, the prior information on the 2,144 radiomic parameters
dominates the effect detected by the pWAIC. This indicates that the model is not excessively
complex relative to the available data. By imposing a regularizing structure, the priors mitigate
the risk of fitting noise rather than true signal, promoting a more parsimonious model that
enhances generalizability. Moreover, BMA mitigates overfitting risk by averaging across
multiple models rather than selecting a single, potentially overfitted model. By including
less probable models in the averaging process, BMA reduces sensitivity to any one model’s
idiosyncratic fit, providing a robust estimate that accounts for model uncertainty and minimizes
reliance on any single set of coefficients. Consequently, BMA offers a stable, interpretable
approach that enhances model generalizability, providing a more reliable prognostic tool and
addressing overfitting concerns effectively within the Bayesian framework.</p>
      <p>Nevertheless, this methodology has certain limitations that require careful consideration.
First, the high computational demands associated with processing an extensive feature space
using A-MALA necessitate substantial resources, which may pose constraints in some research
settings. Furthermore, given the relatively small sample size compared to the large number of
features, external validation is essential to rigorously assess and quantify the improvement
in model performance, ensuring reliable generalizability and mitigating potential biases
introduced by the high-dimensional feature space in a limited sample context.</p>
      <p>To conclude, this study demonstrates the potential of integrating radiomic features with
established clinical predictors to improve prognostic modeling for limb soft tissue sarcomas.
The use of an adaptive Bayesian approach, coupled with rigorous inferential guarantees, offers
a promising framework for enhancing patient stratification and individualized treatment
planning. Despite the challenges posed by computational demands and the need for external
validation, our findings lay the groundwork for future research aimed at validating and
refining these methods to ultimately improve patient outcomes and quality of care.</p>
      <p>Inference, volume 2B of Kendall's Library of Statistics, 2nd ed., Arnold Publishers, London, 2004. Chapters 4: "Asymptotic Approximations" and 5: "The Posterior Distribution and the Information Matrix".
[20] J. E. Dennis Jr, D. M. Gay, R. E. Walsh, An adaptive nonlinear least-squares algorithm, ACM Transactions on Mathematical Software (TOMS) 7 (1981) 348–368. doi:10.1145/355958.355965.
[21] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC, 1995.
[22] G. O. Roberts, J. S. Rosenthal, Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science 16 (2001) 351–367.
[23] G. O. Roberts, J. S. Rosenthal, Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms, Journal of Applied Probability 44 (2007) 458–475.
[24] H. Haario, E. Saksman, J. Tamminen, An adaptive Metropolis algorithm, Bernoulli (2001) 223–242.
[25] S. Watanabe, A widely applicable Bayesian information criterion, The Journal of Machine Learning Research 14 (2013) 867–897.
[26] A. E. Raftery, D. Madigan, J. A. Hoeting, Bayesian model averaging for linear regression models, Journal of the American Statistical Association 92 (1997) 179–191.
[27] L. Wasserman, Bayesian model selection and model averaging, Journal of Mathematical Psychology 44 (2000) 92–107.
[28] T. M. Fragoso, W. Bertoli, F. Louzada, Bayesian model averaging: A systematic review and conceptual classification, International Statistical Review 86 (2018) 1–28.
[29] T. Ando, Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models, Biometrika 94 (2007) 443–458.
[30] S. Watanabe, M. Opper, Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory, Journal of Machine Learning Research 11 (2010).
[31] A. Vehtari, J. Lampinen, Bayesian model assessment and comparison using cross-validation predictive densities, Neural Computation 14 (2002) 2439–2468.
[32] A. Vehtari, A. Gelman, J. Gabry, Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, Statistics and Computing 27 (2017) 1413–1432.
[33] K. P. Burnham, D. R. Anderson, Multimodel inference: Understanding AIC and BIC in model selection, Sociological Methods &amp; Research 33 (2004) 261–304. doi:10.1177/0049124104265511.
[34] T. Park, G. Casella, The Bayesian lasso, Journal of the American Statistical Association 103 (2008) 681–686.
[35] C. M. Carvalho, N. G. Polson, J. G. Scott, Handling sparsity via the horseshoe, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 5 (2009) 73–80.
[36] J. Piironen, A. Vehtari, Sparsity information and regularization in the horseshoe and other shrinkage priors, Electronic Journal of Statistics 11 (2017) 5018–5051. doi:10.1214/17-EJS1337SI.
[37] A. Bhadra, J. Datta, N. G. Polson, B. Willard, Lasso meets horseshoe, Statistical Science 34 (2019) 405–427.
[38] J. E. Griffin, P. J. Brown, Inference with normal-gamma prior distributions in regression problems, Bayesian Analysis 5 (2010) 171–188. doi:10.1214/10-BA507.
[39] E. I. George, R. E. McCulloch, Approaches for Bayesian variable selection, Statistica Sinica 7 (1997) 339–373.
[40] H. Ishwaran, J. S. Rao, Spike and slab variable selection: frequentist and Bayesian strategies, The Annals of Statistics 33 (2005) 730–773. doi:10.1214/009053604000001147.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Corino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Casali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gronchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchianò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Mainardi</surname>
          </string-name>
          ,
          <article-title>Radiomic analysis of soft tissues sarcomas can distinguish intermediate from high-grade lesions</article-title>
          ,
          <source>Journal of Magnetic Resonance Imaging</source>
          <volume>47</volume>
          (
          <year>2018</year>
          )
          <fpage>829</fpage>
          -
          <lpage>840</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fanciullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gitto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Carlicchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Albano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Sconfienza</surname>
          </string-name>
          ,
          <article-title>Radiomics of musculoskeletal sarcomas: a narrative review</article-title>
          ,
          <source>Journal of Imaging</source>
          <volume>8</volume>
          (
          <year>2022</year>
          )
          <fpage>45</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Spraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Wootton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Hippe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Ball</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Peeken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Macomber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Hoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Pollack</surname>
          </string-name>
          , et al.,
          <article-title>Mri radiomic features are independently associated with overall survival in soft tissue sarcoma</article-title>
          ,
          <source>Advances in radiation oncology 4</source>
          (
          <year>2019</year>
          )
          <fpage>413</fpage>
          -
          <lpage>421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Peeken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Spraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Knebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dapper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Devecka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Shouman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ott</surname>
          </string-name>
          , R. von
          <string-name>
            <surname>Eisenhart-Rothe</surname>
          </string-name>
          , et al.,
          <article-title>Tumor grading of soft tissue sarcomas using mri-based radiomics</article-title>
          ,
          <source>EBioMedicine</source>
          <volume>48</volume>
          (
          <year>2019</year>
          )
          <fpage>332</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Peeken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Spraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Devecka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Shouman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nüsslin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Mayr</surname>
          </string-name>
          , et al.,
          <article-title>Ct-based radiomic features predict tumor grading and have prognostic value in patients with soft tissue sarcomas treated with neoadjuvant radiation therapy</article-title>
          ,
          <source>Radiotherapy and Oncology</source>
          <volume>135</volume>
          (
          <year>2019</year>
          )
          <fpage>187</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Radiomics signature extracted from diffusion-weighted magnetic resonance imaging predicts outcomes in osteosarcoma</article-title>
          ,
          <source>Journal of Bone Oncology</source>
          <volume>19</volume>
          (
          <year>2019</year>
          )
          <fpage>100263</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>Regression shrinkage and selection via the lasso</article-title>
          ,
          <source>Journal of the Royal Statistical Society Series B: Statistical Methodology</source>
          <volume>58</volume>
          (
          <year>1996</year>
          )
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Lukacs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Burnham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <article-title>Model selection bias and freedman's paradox</article-title>
          ,
          <source>Annals of the Institute of Statistical Mathematics</source>
          <volume>62</volume>
          (
          <year>2010</year>
          )
          <fpage>117</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Langevin</surname>
          </string-name>
          ,
          <article-title>Sur la théorie du mouvement brownien</article-title>
          ,
          <source>Compt. Rendus</source>
          <volume>146</volume>
          (
          <year>1908</year>
          )
          <fpage>530</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Girolami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Calderhead</surname>
          </string-name>
          ,
          <article-title>Riemann manifold langevin and hamiltonian monte carlo methods</article-title>
          ,
          <source>Journal of the Royal Statistical Society Series B: Statistical Methodology</source>
          <volume>73</volume>
          (
          <year>2011</year>
          )
          <fpage>123</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Callegaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miceli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bonvalot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferguson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Strauss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stacchiotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Pechoux</surname>
          </string-name>
          , et al.,
          <article-title>Development and external validation of two nomograms to predict overall survival and occurrence of distant metastases in adults after surgical resection of localised soft-tissue sarcomas of the extremities: a retrospective analysis</article-title>
          ,
          <source>The Lancet Oncology</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>671</fpage>
          -
          <lpage>680</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Callegaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miceli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bonvalot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Ferguson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Strauss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>van Praag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stacchiotti</surname>
          </string-name>
          , et al.,
          <article-title>Development and external validation of a dynamic prognostic nomogram for primary extremity soft tissue sarcoma survivors</article-title>
          ,
          <source>EClinicalMedicine</source>
          <volume>17</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Miceli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Callegaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barretta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gronchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vergani</surname>
          </string-name>
          ,
          <source>Sarculator 2.1.2</source>
          , https://apps.apple.com/na/app/sarculator/id1052119173, https://play.google.com/store/apps/details?id=it.digitalforest.sarculator&amp;hl=it&amp;gl=US,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Hallinan Jr.</surname>
          </string-name>
          ,
          <article-title>A review of the Weibull distribution</article-title>
          ,
          <source>Journal of Quality Technology</source>
          <volume>25</volume>
          (
          <year>1993</year>
          )
          <fpage>85</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hsiang</surname>
          </string-name>
          ,
          <article-title>A Bayesian view on ridge regression</article-title>
          ,
          <source>Journal of the Royal Statistical Society Series D: The Statistician</source>
          <volume>24</volume>
          (
          <year>1975</year>
          )
          <fpage>267</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>van Erp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Oberski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mulder</surname>
          </string-name>
          ,
          <article-title>Shrinkage priors for Bayesian penalized regression</article-title>
          ,
          <source>Journal of Mathematical Psychology</source>
          <volume>89</volume>
          (
          <year>2019</year>
          )
          <fpage>31</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Beskos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stuart</surname>
          </string-name>
          ,
          <article-title>Computational complexity of Metropolis-Hastings methods in high dimensions</article-title>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G. O.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <article-title>Examples of adaptive MCMC</article-title>
          ,
          <source>Journal of Computational and Graphical Statistics</source>
          <volume>18</volume>
          (
          <year>2009</year>
          )
          <fpage>349</fpage>
          -
          <lpage>367</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>O'Hagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Forster</surname>
          </string-name>
          ,
          <source>Kendall's Advanced Theory of Statistics</source>
          , Volume
          <volume>2B</volume>
          : Bayesian Inference
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>