<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1186/s40644-023-00594-3</article-id>
      <title-group>
        <article-title>Assessing the Interpretability of the Statistical Radiomic Features via Image Saliency Maps in Medical Image Classification Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Davydko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technological University Dublin</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The presented research aims to improve the interpretability of medical image classification models trained with statistical radiomic features. While showing classification results comparable with state-of-the-art convolutional neural network models, the interpretability of statistical radiomic features is still understudied. Neural network models use saliency map approaches to provide a human operator with an intuitive visualisation of the model's attention, but statistical-radiomic-based models still have no such tools. This research aims to eliminate this gap and allow saliency map generation for models trained with statistical radiomic features. Preliminary results show that the proposed approach may generate faithful saliency maps for a ResNet-50 classification model trained with first-order statistical radiomic features.</p>
      </abstract>
      <kwd-group>
        <kwd>Medical image classification</kwd>
        <kwd>Texture analysis</kwd>
        <kwd>Statistical radiomic features</kwd>
        <kwd>Saliency maps</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Context and motivation</title>
      <p>Saliency maps provide more intuitive
visual explanations than standard numerical feature importance.</p>
      <p>This research aims to introduce a method for generating a saliency map when a classification
model is trained with statistical radiomic features, subsequently improving the explainability of
statistical-radiomic-based classification models for medical images.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Statistical radiomic features have been applied to solve many different medical image
classification tasks, showing near state-of-the-art classification performance. The authors of
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] attempted to fuse grey-level co-occurrence matrix (GLCM), grey-level run
length matrix (GLRLM), and segmentation-based fractal texture analysis (SFTA) features to
detect the presence of COVID-19 lesions on chest X-ray images. The fusion of features,
combined with feature selection by principal component analysis, reached an F1-score of 0.94
while distinguishing between healthy and COVID pneumonia lung images. Similar
results were observed in the same task when using first-order statistics (FOS), GLCM, GLRLM,
and grey-level size zone matrix (GLSZM) feature extraction methods in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; there, the
authors report an F1-score of 0.98. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] the authors report 0.975 accuracy while
classifying brain tumors on magnetic resonance images (MRI).
      </p>
      <p>
        Current advances in the interpretability of radiomic-based models mostly concern interpreting
the importance of the high-order radiomic features used. The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use SHAP [5],
which allowed them to identify the most influential features. In another work [6], the
authors interpret the importance of radiomic feature groups by analysing logistic regression
coefficients. A study [7] uses SHAP to reveal the most influential features for diagnosing
schizophrenia from brain magnetic resonance images (MRI). The authors of [8] use the
same technique to find the connection between particular features and panic disorder signs.
      </p>
      <p>At the same time, researchers utilise saliency map methods such as Integrated Gradients [9],
layer-wise relevance propagation [10], DeepLIFT [11], and GradCAM [12] for interpreting convolutional
neural network predictions. A saliency map is much easier to understand from the point of
view of human perception. For radiomic-based models, only a little work discusses analogs of
saliency maps. In [13], the authors discuss the interpretability of tumor tissue signature
identification when local statistical radiomic features are used; the problem of interpretability
was tackled by visualizing feature activation maps for a single high-order feature.</p>
      <p>It can be stated that the problem of statistical radiomic feature interpretability is definitely
in the focus of researchers but requires further investigation. The main problem to investigate is
the generation of understandable image saliency maps for radiomic-based models.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Design and methodology</title>
      <p>This research describes methods for saliency map generation for models trained with first- and
high-order statistical radiomic features.</p>
      <sec id="sec-3-1">
        <title>3.1. Mapping first-order radiomic features’ attributions</title>
        <p>First-order statistical radiomic features are formed by computing the frequency of some texture
substructure appearing in the image. Some examples of such substructures are:
1. A pair of pixels with intensities <italic>i</italic>, <italic>j</italic>;
2. A run of pixels with the same intensity <italic>i</italic>;
3. A cluster of connected pixels with the same intensity <italic>i</italic>.
The proposed method of saliency map generation includes the computation of feature attributions
and subsequently adding those attribution values to the pixels involved in forming the particular
feature values (figure 1). Given the size of the statistical radiomic feature matrices (256×256 at
least), it is proposed to use convolutional neural networks to build the classifiers and
gradient-based methods (such as Integrated Gradients) to obtain attributions.</p>
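        <p>The mapping step described above can be sketched for GLCM-based features. This is a minimal
illustrative sketch, not the paper's implementation: the function name
<monospace>glcm_attributions_to_saliency</monospace>, the single-offset GLCM, and the rule of adding
a bin's attribution to both pixels of each co-occurrence pair are assumptions made here for
illustration only.</p>

```python
import numpy as np

def glcm_attributions_to_saliency(image, attributions, offset=(0, 1)):
    """Distribute per-bin GLCM attributions back onto the pixels that
    formed each co-occurrence pair (one possible reading of the mapping
    step; the exact accumulation rule in the paper may differ)."""
    h, w = image.shape
    dy, dx = offset
    saliency = np.zeros((h, w), dtype=float)
    # visit every pixel pair that contributed a count to some GLCM bin (i, j)
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            i, j = image[y, x], image[y + dy, x + dx]
            a = attributions[i, j]
            # both pixels of the pair took part in forming bin (i, j)
            saliency[y, x] += a
            saliency[y + dy, x + dx] += a
    return saliency
```

        <p>Each pixel's saliency is thus the sum of the attributions of every feature bin it helped to form.</p>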
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Mapping high-order radiomic features’ attributions</title>
        <p>A mapping of high-order statistical radiomic features can be defined as an extension of
the first-order feature mapping procedure. Notably, all high-order feature
formulas are represented by differentiable functions. That allows gradient-based
methods (such as Integrated Gradients, GradCAM, and DeepLIFT) to be used directly to obtain
first-order feature attributions, which are subsequently mapped to medical image pixel attributions
by applying the procedure from section 3.1. The graphical representation of this process is displayed
in figure 2. This approach is feasible if the classification model is of a fully-connected neural
network type. Known model-agnostic methods such as SHAP cannot be used to attribute the massive
number of first-order features (at least 65536 for GLCM and more for other methods) due to the
slowness of the calculation process, which leaves the usage of other classification models as an
open question outside this research's scope.</p>
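        <p>The differentiability claim can be illustrated with the Contrast feature: its partial
derivative with respect to each GLCM bin <italic>p</italic>(<italic>i</italic>, <italic>j</italic>)
is (<italic>i</italic> - <italic>j</italic>)<sup>2</sup> in closed form. A minimal numpy sketch
(function names are illustrative) verifying the closed-form gradient against a finite difference:</p>

```python
import numpy as np

def contrast(p):
    """GLCM Contrast: sum over bins of (i - j)^2 * p(i, j)."""
    n = p.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return float(np.sum((i - j) ** 2 * p))

def contrast_grad(p):
    """Closed-form gradient: d Contrast / d p(i, j) = (i - j)^2,
    since Contrast is linear in each bin."""
    n = p.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return ((i - j) ** 2).astype(float)

p = np.random.default_rng(0).random((4, 4))
g = contrast_grad(p)

# finite-difference check of a single bin
eps = 1e-6
p2 = p.copy()
p2[1, 3] += eps
fd = (contrast(p2) - contrast(p)) / eps
```

        <p>The agreement between the analytic and numerical gradients is what makes the direct
gradient-based attribution of first-order feature matrices possible.</p>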
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experiment</title>
        <p>The experiment is designed to test the faithfulness of the saliency maps generated by the
methods described in 3.1 and 3.2 and compare them with those generated by existing methods in
different classification tasks on X-ray and MRI images. The experiment’s scheme is presented
in figure 3. The described experiment is run separately with different datasets to test the
generalisability of the proposed approach.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Data Preparation</title>
          <p>Each image in the dataset is converted into greyscale if needed, as statistical radiomic features
are defined only for greyscale textures in the current literature. Images are left intact for the
baseline pipeline (steps G and H in figure 3).</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Feature extraction</title>
          <p>First-order statistical radiomic features are extracted with GLCM, GLRLM, GLSZM, grey-level
dependency matrix (GLDM), and neighboring grey-tone difference matrix (NGTDM) methods
into matrices, as described in works [14, 15, 16, 17, 18]. While calculating statistical radiomic
features, background pixels are not taken into account. The parameters for the mentioned
methods are to be found with a hyperparameter search procedure. High-order statistical
radiomic features are extracted from the matrices containing first-order radiomic features with
the formulas defined in [14, 15, 16, 17, 18] and concatenated into a single feature vector. The list
of features calculated is provided in table 1.</p>
          <table-wrap id="tab1">
            <label>Table 1</label>
            <caption>
              <p>High-order statistical radiomic features calculated for each first-order feature matrix.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Method</th><th>Features</th><th>Total</th><th>Reference</th></tr>
              </thead>
              <tbody>
                <tr><td>GLCM</td><td>Angular Second Moment, Contrast, Correlation, Sum of Squares, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Entropy, Difference Variance, Difference Entropy, Information Measures of Correlation (2x)</td><td>13</td><td>[14]</td></tr>
                <tr><td>GLRLM</td><td>Short Run Emphasis, Long Run Emphasis, Gray Level Non-Uniformity, Gray Level Non-Uniformity Normalized, Run Length Non-Uniformity, Run Length Non-Uniformity Normalized, Run Percentage, Gray Level Variance, Run Variance, Run Entropy, Low Gray Level Run Emphasis, High Gray Level Run Emphasis, Short Run Low Gray Level Emphasis, Short Run High Gray Level Emphasis, Long Run Low Gray Level Emphasis, Long Run High Gray Level Emphasis</td><td>16</td><td>[15]</td></tr>
                <tr><td>GLSZM</td><td>Small Area Emphasis, Large Area Emphasis, Gray Level Non-Uniformity, Gray Level Non-Uniformity Normalized, Size-Zone Non-Uniformity, Size-Zone Non-Uniformity Normalized, Zone Percentage, Gray Level Variance, Zone Variance, Zone Entropy, Low Gray Level Zone Emphasis, High Gray Level Zone Emphasis, Small Area Low Gray Level Emphasis, Small Area High Gray Level Emphasis, Large Area Low Gray Level Emphasis, Large Area High Gray Level Emphasis</td><td>16</td><td>[16]</td></tr>
                <tr><td>GLDM</td><td>Small Dependence Emphasis, Large Dependence Emphasis, Gray Level Non-Uniformity, Dependence Non-Uniformity, Dependence Non-Uniformity Normalized, Gray Level Variance, Dependence Variance, Dependence Entropy, Low Gray Level Emphasis, High Gray Level Emphasis, Small Dependence Low Gray Level Emphasis, Small Dependence High Gray Level Emphasis, Large Dependence Low Gray Level Emphasis, Large Dependence High Gray Level Emphasis</td><td>14</td><td>[17]</td></tr>
                <tr><td>NGTDM</td><td>Coarseness, Contrast, Busyness, Complexity, Strength</td><td>5</td><td>[18]</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Each of the described features has its unique formula for calculation. For example, the formula
for the Contrast feature of the GLCM matrix is defined as:</p>
          <disp-formula>
            <tex-math><![CDATA[\mathrm{Contrast} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - j)^2 \, p(i, j)]]></tex-math>
          </disp-formula>
          <p>Here <italic>N<sub>g</sub></italic> is the number of grey shades in the image (usually 256), and
<italic>p</italic> is the matrix containing the grey-level co-occurrence first-order radiomic features.</p>
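          <p>The Contrast computation can be checked on a toy image. A minimal sketch assuming an
unnormalised single-offset GLCM; the helper names <monospace>glcm</monospace> and
<monospace>glcm_contrast</monospace> are illustrative, and the sketch normalises the matrix inside
the contrast computation:</p>

```python
import numpy as np

def glcm(image, n_levels, offset=(0, 1)):
    """Grey-level co-occurrence matrix for a single offset (unnormalised)."""
    h, w = image.shape
    dy, dx = offset
    m = np.zeros((n_levels, n_levels), dtype=float)
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m

def glcm_contrast(m):
    """Contrast = sum over bins of (i - j)^2 * p(i, j), with p the
    normalised co-occurrence matrix."""
    n = m.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return float(np.sum((i - j) ** 2 * m / m.sum()))
```

          <p>For a 2×2 image of alternating columns of intensities 0 and 1, every horizontal pair lands
in bin (0, 1), so the normalised matrix puts all mass at a squared intensity difference of 1 and
the Contrast equals 1.</p>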
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Classification models and their training</title>
          <p>Three types of classification models are trained. A first type receives first-order statistical
radiomic feature matrices. Models of the first type are represented with convolutional neural
networks; in this particular experiment, the ResNet-50, VGG-16, and EfficientNet architectures are
used. Models of the second type are represented with a special architecture that implements the
high-order statistical formulas from [14, 15, 16, 17, 18]; a multi-layer perceptron with a sigmoid
or softmax output layer follows the block implementing the formulas. The baseline
models receive plain images as input and are represented by the ResNet-50, VGG-16, and EfficientNet
architectures. Each classification model of every type is trained from scratch. Training continues
until the F1-score on the development set stops improving for ten subsequent epochs. The
Adam algorithm is used as the optimizer with a learning rate of 1e-4.</p>
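          <p>The stopping rule can be sketched as a patience counter on the development-set F1-score.
This is an illustrative sketch, not the actual training code: <monospace>train_epoch</monospace>
and <monospace>eval_f1</monospace> stand in for framework-specific callables.</p>

```python
def train_with_early_stopping(train_epoch, eval_f1, patience=10, max_epochs=1000):
    """Train until the development-set F1-score has not improved for
    `patience` consecutive epochs; returns the best F1 observed."""
    best_f1, epochs_since_best = -1.0, 0
    for epoch in range(max_epochs):
        train_epoch(epoch)          # one pass over the training set
        f1 = eval_f1()              # F1-score on the development set
        if f1 > best_f1:
            best_f1, epochs_since_best = f1, 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            break                   # patience exhausted, stop training
    return best_f1
```
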
        </sec>
        <sec id="sec-3-3-4">
          <title>3.3.4. Statistical radiomic feature attribution and saliency maps</title>
          <p>As all classification models described in 3.3.3 are represented by neural networks, it is possible
to use gradient-based methods for feature attribution. In this research, it is proposed to use
Integrated Gradients, GradCAM, and DeepLIFT without any modifications. Subsequently, for
radiomic-based models, the additional mapping described in section 3.1 is conducted
to obtain the saliency map. For plain images, the attributions may be used as a ready saliency map
without additional transformations.</p>
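          <p>For reference, the Riemann-sum form of Integrated Gradients can be sketched as follows.
The toy model F(x) = x<sub>0</sub><sup>2</sup> + 2x<sub>1</sub> and the helper names are
illustrative assumptions; in the actual experiment the gradient function would come from the
trained neural network.</p>

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Midpoint Riemann-sum approximation of Integrated Gradients:
    (x - baseline) * average gradient along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# toy differentiable "model": F(x) = x0^2 + 2*x1, with analytic gradient
f = lambda x: x[0] ** 2 + 2 * x[1]
grad = lambda x: np.array([2 * x[0], 2.0])

x, base = np.array([1.0, 1.0]), np.zeros(2)
ig = integrated_gradients(grad, x, base)
```

          <p>A useful sanity check is the completeness axiom: the attributions sum to
F(x) - F(baseline).</p>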
        </sec>
        <sec id="sec-3-3-5">
          <title>3.3.5. Evaluation</title>
          <p>The classification models’ performance is measured with accuracy and F1-score metrics.
The faithfulness of the saliency maps is measured numerically by the Increase-In-Confidence and
Average Drop metrics [19] and compared to the same metrics for plain statistical radiomic
feature attributions. Additionally, the same evaluation is conducted with the Insertion Correlation
(IC) and Deletion Correlation (DC) metrics [20], as they also take into account the magnitudes of
saliency values. However, for IC and DC some iterations of the computation procedure will be
merged to drastically reduce the number of iterations, as a 256×256 saliency map requires
more than 3000 predictions to compute these metrics.</p>
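          <p>The two primary faithfulness metrics can be sketched directly from their usual definitions:
Average Drop is the mean relative confidence drop when the model is shown only the salient
regions, and Increase-In-Confidence is the share of images whose confidence rises. This is a
minimal sketch; the function names are illustrative, and the exact formulation of [19] should be
followed in the actual evaluation.</p>

```python
import numpy as np

def average_drop(full_scores, masked_scores):
    """Average Drop (%): mean of max(0, Y - O) / Y over images, where Y is
    the class score on the full image and O on the saliency-masked image.
    Lower is better."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return float(np.mean(np.maximum(full - masked, 0.0) / full) * 100.0)

def increase_in_confidence(full_scores, masked_scores):
    """Increase-In-Confidence (%): share of images whose score rises when
    only the salient regions are kept. Higher is better."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return float(np.mean(masked > full) * 100.0)
```
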
        </sec>
        <sec id="sec-3-3-6">
          <title>3.3.6. Datasets for experiment</title>
          <p>The Shenzhen tuberculosis open-access dataset [21] contains 662 X-ray scans. The dataset is
balanced: there are 326 images of healthy lungs and 336 images of lungs with signs of tuberculosis.
No additional transformations are applied to this dataset, except those described in 3.3.1. During
the experiment, the task of distinguishing between healthy and tuberculosis lungs was assessed
with this dataset. The COVIDx CXR-4 dataset [22] contains 84,818 chest X-ray scans. There are
65,681 scans containing COVID-19 lesions, and 19,137 are of healthy lungs. The test and validation
sets for this dataset were formed balanced while leaving the training set unbalanced to ensure
faithful classification metrics. No additional transformations are applied to this dataset, except
those described in 3.3.1. During the experiment, the task of distinguishing between healthy and
COVID-19-damaged lungs was solved with this dataset. The Cancer-Net BCa dataset contains 253 breast
MRI scans with evidence of breast cancer. During the experiment, the task of full remission
prediction is considered.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Research question and hypothesis</title>
      <p>This research aims to answer the following question: how can a medical image saliency map
be generated to explain a classification result when the classification model is trained with
statistical radiomic features? Accordingly, the research hypothesis may be defined as
follows:
Research hypothesis IF a neural network is trained with first- or high-order statistical radiomic
features to perform medical image classification AND the aforementioned features are attributed
with Integrated Gradients, GradCAM, or DeepLIFT AND an image saliency map is generated with the
proposed mapping method THEN the Increase-In-Confidence, Average Drop, Insertion Correlation,
and Deletion Correlation metrics for the generated image saliency maps will be at least the same
or statistically significantly higher than for direct feature attributions.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Preliminary results</title>
      <p>The preliminary results, described in the author’s previous work [23], indicate that the saliency
maps generated by the method described in 3.1 can be considered faithful in terms of numerical
evaluation, maintaining the Increase-In-Confidence metric at a 50–80% level and the Average Drop
at 10–38%. Also, the results indicate that a ResNet-50 classification model trained with only
first-order statistical radiomic features yields the same classification quality as a ResNet-50
model with raw image input, indicating that the results are eligible for practical usage.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Expected final contribution to knowledge</title>
      <p>The final contribution of the described research is expected to be a method for visually explaining
a classification result via saliency maps when the classification model is trained with first- or
high-order statistical radiomic features. The newly proposed method should allow for the
explanation and validation of the results of previous works that use statistical radiomic
features to solve medical image classification problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Koyuncu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barstuğan</surname>
          </string-name>
          ,
          <article-title>Covid-19 discrimination framework for x-ray images by considering radiomics, selective information, feature ranking, and a novel hybrid classifier</article-title>
          ,
          <source>Signal Processing: Image Communication</source>
          <volume>97</volume>
          (
          <year>2021</year>
          )
          <fpage>116359</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S092359652100165X. doi:10.1016/j.image.2021.116359.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ş.</given-names>
            <surname>Öztürk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Özkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barstuğan</surname>
          </string-name>
          ,
          <article-title>Classification of coronavirus (covid-19) from x-ray and ct images using shrunken features</article-title>
          ,
          <source>International Journal of Imaging Systems and Technology</source>
          <volume>31</volume>
          (
          <year>2020</year>
          )
          <fpage>5</fpage>
          -
          <lpage>15</lpage>
          . doi:10.1002/ima.22469.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Zulpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pawar</surname>
          </string-name>
          ,
          <article-title>Glcm textural features for brain tumor classification</article-title>
          ,
          <source>IJCSI 9</source>
          (
          <year>2012</year>
          )
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-P.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>A radiomics-based interpretable model to predict the pathological grade of pancreatic neuroendocrine tumors</article-title>
          ,
          <source>European Radiology</source>
          <volume>34</volume>
          (
          <year>2023</year>
          )
          <fpage>1994</fpage>
          -
          <lpage>2005</lpage>
          . doi:10.1007/s00330-023-10186-1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>