<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Samuel G. Armato III, Geoffrey McLennan, Michael F. McNitt-Gray, Charles R. Meyer,
David Yankelevitz, Denise R. Aberle, Claudia I. Henschke, Eric A. Hoffman, Ella A. Kazerooni,
Heber MacMahon, Anthony P. Reeves, Barbara Y. Croft, Laurence P. Clarke. Lung image
database consortium: developing a resource for the medical imaging research community.
Radiology</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/ISCID.2016.1111</article-id>
      <title-group>
        <article-title>Revitalize the Potential of Radiomics: Interpretation and Feature Stability in Medical Imaging Analyses through Groupwise Feature Importance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Theresa Stüber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Coors</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Ingrisch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Radiology, University Hospital, LMU Munich</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Statistics, LMU Munich</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Munich Center for Machine Learning (MCML), LMU Munich</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>3</volume>
      <issue>2004</issue>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Radiomics, involving analysis of calculated, quantitative features from medical images with machine learning tools, shares the instability challenge with other high-dimensional data analyses due to variations in the training set. This instability affects model interpretation and feature importance assessment. To enhance stability and interpretability, we introduce grouped feature importance, shedding light on tool limitations and advocating for more reliable radiomics-based analysis methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Analyses</kwd>
        <kwd>radiology</kwd>
        <kwd>radiomics</kwd>
        <kwd>feature (importance) instability</kwd>
        <kwd>grouped feature importance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Radiomics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a field of study that aims to extract quantitative features from medical images
using machine learning (ML) and statistical analysis. These features can be used to identify
patterns and associations that may not be apparent from visual inspection alone. Radiomics
has become increasingly popular in medical imaging as it provides a non-invasive and
efficient way to extract biomarkers from medical images. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
      </p>
      <p>
        Radiomics analyses typically involve three main steps: image acquisition and segmentation,
feature extraction, and statistical analysis (see Fig. 1). In the first step, medical images are
acquired and segmented to isolate the region of interest. In the second step, quantitative
features are extracted from the segmented region using mathematical algorithms and statistical
methods. These features can include shape, texture, and intensity-based metrics, among others.
In the final step, statistical or ML-based analyses are performed to identify
patterns and associations between the extracted features and clinical outcomes, such as disease
diagnosis, prognosis, and treatment response. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
      </p>
      <p>
        Radiomics relies on feature importance measures to understand each feature’s impact on
predictions. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] ML models like Random Forests [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] generate scores indicating a feature’s
contribution to prediction accuracy. Ensuring model robustness across datasets requires understanding
the stability of these measures. However, as in other high-dimensional data analyses,
feature selection is sensitive to training set variations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which limits the stability of the selected features. Hence, methods
to restore feature stability (FS) in radiomics-based analyses are indispensable. This entails
assessing the coherence of feature importance scores across datasets or using stability selection
to identify features that are consistently selected across numerous model-building iterations.
      </p>
      <p>Late-breaking work, Demos and Doctoral Consortium, colocated with The 1st World Conference on eXplainable Artificial</p>
      <p>
        Hypothesizing low feature stability (FS) in radiomics-based prediction models, similar to
other high-dimensional data analyses, we propose grouped feature importance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for
assessing radiomics-based ML models, aiming to enhance stability and simplify interpretation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Material and methods</title>
      <sec id="sec-2-1">
        <title>2.1. Assessing feature stability of un-grouped radiomics features</title>
        <p>To investigate the instability of radiomics-based analyses, we constructed a working example
utilizing 136 pre-calculated radiomics features [14] from the Lung Image Database Consortium
image collection (LIDC-IDRI) [15]. This database contains thoracic computed tomography (CT)
scans with annotated lesions, classifying 616 benign and 281 malignant nodules. We established
a standard machine learning (ML) classification pipeline using the R package mlr3 [16]. The
pipeline encompassed a random forest (rf) with feature preprocessing (imputation, factor
encoding, and correlation-based feature selection). To refine the pipeline, we utilized nested
resampling with 5-fold cross-validation for both outer and inner loops. Hyperparameters,
including the fraction filtered in preprocessing, the count of features randomly sampled per
decision tree split, and the number of trees within the rf model, were fine-tuned. The
correlation-based filter was tuned over a correlation cutoff range from 0.1 to 0.9. AUC served as
the basis for optimization and performance assessment.</p>
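        <p>As a minimal illustration of the correlation-based filtering step, the following Python sketch greedily drops features whose absolute Pearson correlation with an already retained feature exceeds a cutoff. It is not the mlr3 implementation used in this work; the function names, the greedy order, and the toy columns are assumptions made for illustration only, and the sketch assumes that no column is constant.</p>
        <preformat>
```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_filter(columns, cutoff):
    """Keep a feature only if its |r| with every kept feature is at most cutoff.

    columns: dict mapping a (hypothetical) feature name to its list of values.
    Features are visited in insertion order, so earlier features win ties.
    """
    kept = []
    for name, values in columns.items():
        too_similar = any(
            abs(pearson(values, columns[other])) > cutoff for other in kept
        )
        if not too_similar:
            kept.append(name)
    return kept


# Toy data (illustrative only): "b" is almost a rescaled copy of "a".
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 1, 4, 2, 3],
}
kept_lenient = correlation_filter(columns, 0.9)  # drops only the near-duplicate
kept_strict = correlation_filter(columns, 0.1)   # also drops weakly correlated "c"
```
        </preformat>
        <p>Lower cutoffs remove more features, which is one reason the cutoff can be treated as a tunable hyperparameter.</p>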
        <p>Using a methodology akin to test-retest, we trained our ML pipeline over 1000 bootstrap
iterations, varying solely the underlying training set (seed) [17]. For model-agnostic feature
assessment, we employed minimal depth (MD) and permutation feature importance (PFI). The
threshold for deeming a variable ‘important’ was the mean PFI or MD within one
iteration; we then counted how often each variable was deemed important across the 1000 bootstraps.</p>
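        <p>The counting step can be sketched in a few lines of Python. This is a hedged illustration, not the code used in the study: the importance scores are synthetic, the function name is hypothetical, and the sketch assumes a score for which larger values mean greater importance (as with PFI).</p>
        <preformat>
```python
import random
from statistics import mean


def selection_frequency(importance_runs):
    """Relative frequency with which each feature exceeds the per-run mean score.

    importance_runs: list of runs, each a list with one score per feature.
    """
    n_features = len(importance_runs[0])
    counts = [0] * n_features
    for scores in importance_runs:
        threshold = mean(scores)  # threshold is recomputed in every iteration
        for j, score in enumerate(scores):
            if score > threshold:
                counts[j] += 1
    return [count / len(importance_runs) for count in counts]


# Synthetic stand-in for per-bootstrap importance scores (not study data):
# two informative features, one weak feature, two pure-noise features.
rng = random.Random(0)
true_signal = [0.30, 0.25, 0.05, 0.0, 0.0]
runs = [
    [max(0.0, s + rng.gauss(0, 0.1)) for s in true_signal]
    for _ in range(1000)
]
freq = selection_frequency(runs)
```
        </preformat>
        <p>Each entry of freq is the share of bootstrap iterations in which the corresponding feature was marked important; values well below 1 for a truly informative feature indicate unstable importance assessments.</p>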
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Feature stability of grouped radiomics features</title>
        <p>To test our hypothesis that grouping radiomics features improves model stability, we will form
these groups and assess their stability using Group Feature Importance (GFI) methods.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Grouping radiomics features</title>
          <p>We will consider various approaches to organizing radiomics features into groups:</p>
          <list list-type="order">
            <list-item><p>Grouping based on Semantic Meaning / Clinical Relevance: Categories according to the
anatomical or physiological aspects (shape, intensity, texture).</p></list-item>
            <list-item><p>Feature Type Grouping: Groups based on calculation nature, e.g., original vs. processed
(wavelet, log-filter) image features.</p></list-item>
            <list-item><p>Statistical Grouping: Use statistical techniques like clustering or intercorrelation analysis
to group features based on their statistical properties.</p></list-item>
            <list-item><p>Task-Specific Grouping: Adapt feature grouping to the research question; e.g., for
predicting treatment response, cluster treatment-related features.</p></list-item>
            <list-item><p>Expert Knowledge-based Grouping: Categories guided by physicians or domain experts,
based on clinical significance or feature relevance.</p></list-item>
          </list>
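          <p>As one possible illustration of feature type grouping (the second approach above), the following Python sketch groups features by the image type encoded in their names. The naming scheme (image type before the first underscore) and all feature names are assumptions chosen for illustration; they are not the feature names of the dataset used here.</p>
          <preformat>
```python
from collections import defaultdict


def group_by_image_type(feature_names):
    """Group feature names by the image type encoded before the first underscore."""
    groups = defaultdict(list)
    for name in feature_names:
        image_type = name.split("_", 1)[0]
        # Collapse variants such as 'wavelet-LLH' and 'wavelet-HHL' into 'wavelet'.
        key = image_type.split("-", 1)[0]
        groups[key].append(name)
    return dict(groups)


# Hypothetical feature names following an 'imagetype_class_feature' pattern.
names = [
    "original_shape_Sphericity",
    "original_firstorder_Skewness",
    "wavelet-LLH_glcm_Contrast",
    "wavelet-HHL_firstorder_Mean",
    "log-sigma-3_glcm_Contrast",
]
groups = group_by_image_type(names)
```
          </preformat>
          <p>The resulting dictionary maps each group label to its member features and could serve as input to a grouped importance method.</p>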
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Grouped feature importance</title>
          <p>To assess grouped feature importance [18] in radiomics analyses, we will use
permutation-based [19], refitting [20], and Shapley-based [21] methods.</p>
          <list list-type="order">
            <list-item><p>Permutation-based method: Randomly permuting grouped features measures their impact
on the model’s predictive accuracy.</p></list-item>
            <list-item><p>Refitting method: Fit the model multiple times, excluding specific feature groups, to
assess the change in performance.</p></list-item>
            <list-item><p>Shapley-based method: Assign values to grouped features based on their contribution to
predictions using cooperative game theory.</p></list-item>
          </list>
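          <p>The permutation-based variant can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the cited methods: the toy model, the data, the group definitions, and all function names are hypothetical. The key point is that all columns of a group are permuted jointly, using the same row order, so that the within-group dependence is preserved.</p>
          <preformat>
```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model reproduces the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def grouped_permutation_importance(model, X, y, groups, n_repeats=20, seed=0):
    """Mean drop in accuracy when all columns of a group are permuted jointly."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)  # one shared row order for the whole group
            X_perm = [list(row) for row in X]
            for i, src in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[src][c]
            drops.append(baseline - accuracy(model, X_perm, y))
        importance[name] = sum(drops) / n_repeats
    return importance


def toy_model(row):
    """Hypothetical classifier that only uses the first two columns."""
    return int(row[0] + row[1] > 1.0)


rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [toy_model(row) for row in X]  # labels the toy model predicts perfectly
groups = {"signal": [0, 1], "noise": [2, 3]}
gfi = grouped_permutation_importance(toy_model, X, y, groups)
```
          </preformat>
          <p>In this toy setting the noise group receives an importance of zero because the model ignores its columns, whereas permuting the signal group reduces accuracy substantially.</p>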
          <p>
            Furthermore, we will employ the combined features effect plot (CFEP) [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] to visualize grouped
feature impact. CFEP presents a sparse and interpretable linear combination, offering insights
into the collective effect of grouped features and their combined influence on predictions.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>In our classification demonstration, employing 136 pre-calculated radiomics features from the
LIDC-IDRI dataset, we trained an ML pipeline over 1000 bootstrap rounds. The models achieved
an average AUC of 0.880 (interquartile range: [0.871; 0.891]). In each bootstrap iteration, we
calculated MD and PFI values, marking variables exceeding the mean MD/PFI as ‘important’.</p>
      <p>Figure 2 depicts the relative frequency with which each variable was deemed important. Among the
136 radiomics features, 46 were chosen at least once by PFI, and 38 by MD. The four most important
features were selected between 54.7% and 75.6% of the time for PFI, and 61.0% to 78.3% for MD.
These four features (promenance0_N, promenance0_P, sphericity, and uniformity_N) appeared
as the most important under both measures, although in different orders. The fifth feature (PFI: diameter_mm, MD: skewness)
was chosen less than 40% of the time in both cases.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and outlook</title>
      <p>
        In conclusion, our study highlights challenges in achieving feature stability and interpretability
in radiomics-based analyses, notably during feature importance interpretation. Using a
test-retest approach training one ML model (pipeline) 1000 times on varied training set combinations,
we found that the most crucial feature was selected in only about 75% of cases, clearly revealing
variability and uncertainty. This inconsistency persisted despite incorporating a correlation
filter into our ML pipeline, signifying that the issue extends beyond feature correlations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        To tackle this challenge and augment the interpretability of radiomics-based ML models,
we advocate for the adoption of grouped feature importance techniques. Among these, the
combined features effect plot (CFEP) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] shows promise, visually representing feature group
influence through a sparse and understandable linear combination.
      </p>
      <p>Instability in radiomics feature calculations arises from various factors like acquisition modes,
reconstruction parameters, and segmentation thresholds. [22] [23] This instability extends to
radiomics-based ML model performance due to high-dimensional test set variations. Hence,
future research should focus on refining grouped feature importance methods, including CFEP,
to enhance feature stability and interpretability.</p>
      <p>Ultimately, by fortifying the stability and interpretability of radiomics-based ML models, we
can revitalize the potential of radiomics in medical imaging, enabling more precise diagnoses,
prognoses, and informed treatment choices for various medical conditions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>McCague</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramlee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reinius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Selby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hulse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Piyatissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bura</surname>
          </string-name>
          ,
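          <string-name>
            <given-names>M.</given-names>
            <surname>Crispin-Ortuzar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Woitek</surname>
          </string-name>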
          .
          <article-title>Introduction to radiomics for a clinical audience</article-title>
          .
          <source>Clin Radiol</source>
          .
          <year>2023</year>
          Feb;
          <volume>78</volume>
          (
          <issue>2</issue>
          ):
          <fpage>83</fpage>
          -
          <lpage>98</lpage>
          . doi: 10.1016/j.crad.2022.08.149. PMID: 36639175.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Joon Young</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <article-title>Radiomics and Deep Learning in Clinical Imaging: What Should We Do?</article-title>
          <source>Nuclear medicine and molecular imaging</source>
          vol.
          <volume>52</volume>
          ,
          <issue>2</issue>
          (
          <year>2018</year>
          ):
          <fpage>89</fpage>
          -
          <lpage>90</lpage>
          . doi: 10.1007/s13139-018-0514-0
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
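          <string-name>
            <given-names>Janita E.</given-names>
            <surname>van Timmeren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Davide</given-names>
            <surname>Cester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephanie</given-names>
            <surname>Tanadini-Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hatem</given-names>
            <surname>Alkadhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bettina</given-names>
            <surname>Baessler</surname>
          </string-name>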
          .
          <article-title>Radiomics in medical imaging-“how-to” guide and critical reflection</article-title>
          .
          <source>Insights Imaging</source>
          <volume>11</volume>
          ,
          <issue>91</issue>
          (
          <year>2020</year>
          ). https://doi.org/10.1186/s13244-020-00887-2
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Mouraviev</surname>
          </string-name>
          , Jay Detsky, Arjun Sahgal, Mark Ruschin, Young K Lee, Irene Karam, Chris Heyn, Greg J Stanisz, Anne L Martel.
          <article-title>Use of radiomics for the prediction of local control of brain metastases after stereotactic radiosurgery</article-title>
          ,
          <source>Neuro-Oncology</source>
          , Volume
          <volume>22</volume>
          , Issue
          <issue>6</issue>
          , June
          <year>2020</year>
          , Pages
          <fpage>797</fpage>
          -
          <lpage>805</lpage>
          , https://doi.org/10.1093/neuonc/noaa007
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Sohi</given-names>
            <surname>Bae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chansik</given-names>
            <surname>An</surname>
          </string-name>
          , Sung Soo Ahn, Hwiyoung Kim, Kyunghwa Han, Sang Wook Kim, Ji Eun Park, Ho Sung Kim,
          <string-name>
            <surname>Seung-Koo Lee</surname>
          </string-name>
          .
          <article-title>Robust performance of deep learning for distinguishing glioblastoma from single brain metastasis using radiomic features: model development and validation</article-title>
          .
          <source>Sci Rep</source>
          <volume>10</volume>
          ,
          <issue>12110</issue>
          (
          <year>2020</year>
          ). https://doi.org/10.1038/s41598-020-68980-6
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Johanna S.</given-names>
            <surname>Enke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan H.</given-names>
            <surname>Moltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Melvin</given-names>
            <surname>D'Anastasi</surname>
          </string-name>
          , Wolfgang G. Kunz, Christian Schmidt, Stefan Maurus, Alexander Mühlberg, Alexander Katzmann, Michael Sühling, Horst Hahn, Dominik Nörenberg, Thomas Huber.
          <article-title>Radiomics features of the spleen as surrogates for CT-based lymphoma diagnosis and subtype differentiation</article-title>
          .
          <source>Cancers</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ),
          <volume>713</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alexandros</given-names>
            <surname>Kalousis</surname>
          </string-name>
          , Julien Prados,
          <string-name>
            <given-names>Melanie</given-names>
            <surname>Hilario</surname>
          </string-name>
          .
          <article-title>Stability of feature selection algorithms: a study on high-dimensional spaces</article-title>
          .
          <source>Knowl Inf Syst</source>
          <volume>12</volume>
          ,
          <fpage>95</fpage>
          -
          <lpage>116</lpage>
          (
          <year>2007</year>
          ). https://doi.org/10.1007/s10115-006-0040-8
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Pes</surname>
          </string-name>
          .
          <article-title>Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains</article-title>
          .
          <source>Neural Comput &amp; Applic</source>
          <volume>32</volume>
          ,
          <fpage>5951</fpage>
          -
          <lpage>5973</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1007/s00521-019-04082-3
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Salem</given-names>
            <surname>Alelyani</surname>
          </string-name>
          , Zheng Zhao,
          <string-name>
            <given-names>Huan</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>A Dilemma in Assessing Stability of Feature Selection Algorithms</article-title>
          .
          <source>IEEE International Conference on High Performance Computing and Communications</source>
          , Banff, AB, Canada,
          <year>2011</year>
          , pp.
          <fpage>701</fpage>
          -
          <lpage>707</lpage>
          , doi: 10.1109/HPCC.2011.99.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Utkarsh Mahadeo</given-names>
            <surname>Khaire</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Dhanalakshmi</surname>
          </string-name>
          .
          <year>2022</year>
          .
          <article-title>Stability of feature selection algorithm: A review</article-title>
          .
          <source>J. King Saud Univ. Comput. Inf. Sci.</source>
          <volume>34</volume>
          ,
          <issue>4</issue>
          (
          <year>2022</year>
          ),
          <fpage>1060</fpage>
          -
          <lpage>1073</lpage>
          . https://doi.org/10.1016/j.jksuci.2019.06.012
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Quay</given-names>
            <surname>Au</surname>
          </string-name>
          , Julia Herbinger, Clemens Stachl, Bernd Bischl, Giuseppe Casalicchio:
          <article-title>Grouped feature importance and combined features effect plot</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2022</year>
          ).
          <volume>36</volume>
          .
          <fpage>1</fpage>
          -
          <lpage>50</lpage>
          .
          doi: 10.1007/s10618-022-00840-5.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Cheng</given-names>
            <surname>Zhu</surname>
          </string-name>
          , Huili Gong,
          <string-name>
            <given-names>Zhongren</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chunxia</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Application of High Dimensional Feature Grouping Method in Near-Infrared Spectra of Identification of Tobacco Growing Areas</article-title>
          .
          <source>3rd International Conference on Information Science and Control Engineering (ICISCE)</source>
          , Beijing, China,
          <year>2016</year>
          , pp.
          <fpage>230</fpage>
          -
          <lpage>234</lpage>
          , doi: 10.1109/ICISCE.2016.58.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Zhigang</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mengmeng</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <source>Feature Selection Based on Grouped Sorting. 9th Interna-</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>