=Paper=
{{Paper
|id=Vol-2744/paper34
|storemode=property
|title=Attention-based Convolutional Neural Network for MRI Gibbs-ringing Artifact Suppression
|pdfUrl=https://ceur-ws.org/Vol-2744/paper34.pdf
|volume=Vol-2744
|authors=Maksim Penkin,Andrey Krylov,Alexander Khvostikov
}}
==Attention-based Convolutional Neural Network for MRI Gibbs-ringing Artifact Suppression==
<pdf width="1500px">https://ceur-ws.org/Vol-2744/paper34.pdf</pdf>
<pre>
Attention-based Convolutional Neural Network for MRI
          Gibbs-ringing Artifact Suppression*

    Maksim Penkin[0000−0002−8027−9333] , Andrey Krylov[0000−0001−9910−4501] , and
                    Alexander Khvostikov[0000−0002−4217−7141]

                     Lomonosov Moscow State University, Moscow, Russia
                                penkin97@gmail.com
                                    kryl@cs.msu.ru
                               khvostikov@cs.msu.ru
                                https://imaging.cs.msu.ru/ru


        Abstract. The Gibbs-ringing artifact is a common artifact in MRI image processing.
        Because raw MRI data is acquired in the frequency domain, a 2D inverse discrete
        Fourier transform is applied to visualize it. Since the inverse Fourier transform
        cannot be taken over the full spectrum (full k-space), the high-frequency data is
        insufficiently sampled, which results in the well-known Gibbs phenomenon. It is
        worth noting that truncation of high-frequency information also produces
        significant blur, so techniques from other image restoration problems (for example,
        image deblurring) can be applied successfully. We propose an attention-based
        convolutional neural network for Gibbs-ringing reduction that extends the recently
        proposed GAS-CNN (Gibbs-ringing Artifact Suppression Convolutional Neural Network).
        The proposed method includes a simplified non-linear mapping, followed by an LRNN
        (Layer Recurrent Neural Network) refinement block with a feature attention module
        that controls the correlation between the input and output tensors of the
        refinement unit. The research shows that the proposed post-processing refinement
        construction allows the non-linear mapping to be considerably simplified.

        Keywords: Gibbs-ringing artifacts, Magnetic resonance imaging, Attention CNN,
        Image deringing.


1     Introduction
Gibbs-ringing artifact reduction is an image restoration problem that can be solved by
mathematical methods of image processing.
    Gibbs oscillations (the Gibbs phenomenon) often occur near high-frequency image
features such as edges. The artifacts can be observed when an image is mapped onto a finer
grid, during contrast enhancement, in video compression and in MRI data visualization.
Slight image distortions may remain invisible, while severe Gibbs artifacts caused by
k-space (Fourier space) truncation of the MRI frequency domain can even hinder patient
diagnosis (see Fig. 1).
    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0).
  * The reported study was funded by RFBR, CNPq and MOST according to the research project
    19-57-80014 (BRICS2019-394).


Fig. 1. Examples of Gibbs-ringing artifacts on MRI images. Arrows point out areas of distortion.


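    A simple real-valued periodic function is enough to expose the mathematical origin of
the Gibbs phenomenon:

    \xi(t) = \begin{cases}
        a, & t \in [-\tau/2, \tau/2], \\
        0, & t \in [-T/2, T/2] \setminus [-\tau/2, \tau/2],
    \end{cases}
    \qquad \xi(t) = \xi(t + T),                                                        (1)

where ξ(t) is a basic model of a contrast edge, a is the amplitude of the edge and T is the
period of the model function.
    Writing the Fourier series in complex form, (1) can be represented as

    \xi(t) = \sum_{k=-\infty}^{+\infty} d_k e^{i \omega_k t},                          (2)

where d_k = \frac{1}{T} \int_{-T/2}^{T/2} \xi(t) e^{-i \omega_k t} dt, \omega_k = \Omega k, \Omega = 2\pi / T.
For T = 2τ the coefficients are d_0 = a/2 and d_k = \frac{a}{\pi k} \sin(\pi k / 2) for k ≠ 0, so only
the constant term and the odd harmonics remain:

    \xi(t) = \frac{a}{2} + 2a \sum_{k=0}^{+\infty} (-1)^k \cdot \frac{1}{(2k+1)\pi} \cdot \cos\left((2k+1) \cdot \frac{2\pi}{T} \cdot t\right)
           = \frac{a}{2} + \frac{2a}{\pi} \left(\cos \Omega t - \frac{1}{3} \cos 3\Omega t + \ldots\right),      (3)

where Ω = 2π/T.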
    In practice it is impossible to include all terms of the Fourier series (3), so Gibbs
oscillations occur (see Fig. 2). The amplitude of the Gibbs oscillations is constant for a
given signal and does not depend on the chosen cut-off frequency.
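
    The independence of the overshoot from the cut-off can be checked numerically. The
sketch below is only an illustration: it assumes the edge model (1) with T = 2τ, whose
series is a/2 + (2a/π)(cos Ωt − (1/3) cos 3Ωt + ...), and the names partial_sum and
peak_overshoot are hypothetical helpers, not part of the paper's code.

```python
import math

def partial_sum(t, n_terms, a=1.0, period=2.0):
    """Truncated Fourier series of the edge model (plateau a, T = 2*tau)."""
    omega = 2.0 * math.pi / period
    total = a / 2.0  # constant term d_0
    for k in range(n_terms):
        harmonic = 2 * k + 1  # only odd harmonics are non-zero
        total += (2.0 * a / math.pi) * (-1) ** k / harmonic * math.cos(harmonic * omega * t)
    return total

def peak_overshoot(n_terms, a=1.0, period=2.0, samples=4001):
    """Largest value of the truncated series over one period, minus the plateau a."""
    ts = [-period / 2 + period * i / (samples - 1) for i in range(samples)]
    return max(partial_sum(t, n_terms, a, period) for t in ts) - a

# The overshoot stays close to 9% of the jump for every cut-off.
for n in (10, 40, 80):
    print(n, round(peak_overshoot(n), 3))
```

Adding more terms moves the ripples closer to the edge but does not reduce their height,
which is why simply acquiring a slightly larger part of k-space does not remove ringing.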
    In this paper we propose a new CNN architecture for MRI Gibbs-ringing suppression.
It differs from the recently introduced GAS-CNN [1] model in two ways: the non-linear
mapping is simplified, and it is followed by trainable LRNN [2] post-processing with an
attention block that controls the correlation between the input and output tensors of the
post-processing unit. The proposed architecture outperforms GAS-CNN on the generated
synthetic testing dataset in terms of PSNR [3].
    The remainder of this paper is organized as follows. In Section 2 we review some
known methods for MRI Gibbs-ringing suppression. In Section 3 we describe the MRI
dataset generation, give a detailed overview of the proposed architecture and show the
benefit of our modifications to the architecture. In Section 4 the results and
comparisons are presented. The work is concluded in Section 5.

Fig. 2. Gibbs oscillations on a model edge function ξ(t). (a) is artifact free signal, (b) is signal
with Gibbs-ringing.


2    Related Work
Many methods have been proposed for Gibbs-ringing reduction. For example, the problem
can be posed variationally: the solution is sought as a function that minimizes a
functional over some function space (L2 or L1, for example):

    J(u) = \frac{1}{2} \| u - u_0 \|^2 + \lambda \int_{\Omega} | \nabla u(x) | \, dx \to \min_{u \in U},      (4)

where u0 is the input Gibbs-corrupted image, u is the sought Gibbs-free image from the
chosen function space U, Ω is the image domain and λ is the regularization parameter.
The parameter can depend on the distance from the nearest image edge [4]. Joint ringing
estimation and suppression can be performed using sparse representations [5].
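
    To make the functional (4) concrete, the following minimal sketch evaluates a
discrete version of it with NumPy. The forward-difference discretization of the gradient
and the function name tv_functional are assumptions made for illustration; the cited
methods may discretize the functional differently.

```python
import numpy as np

def tv_functional(u, u0, lam):
    """Discrete J(u) = 0.5 * ||u - u0||^2 + lam * sum |grad u| (forward differences)."""
    u = np.asarray(u, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    fidelity = 0.5 * np.sum((u - u0) ** 2)
    # Horizontal and vertical differences, cropped to a common (H-1, W-1) grid.
    dx = np.diff(u, axis=1)[:-1, :]
    dy = np.diff(u, axis=0)[:, :-1]
    total_variation = np.sum(np.sqrt(dx ** 2 + dy ** 2))
    return fidelity + lam * total_variation

# A vertical step edge: the data term is zero, only the edge contributes to TV.
step = np.zeros((4, 4))
step[:, 2:] = 1.0
print(tv_functional(step, step, lam=0.5))
```

Ringing adds many small oscillations next to an edge, which raises the total variation
term, so a minimizer of (4) trades fidelity to u0 against removing those oscillations.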
     Another recently introduced method is based on a search for optimal subpixel shifts [6].
The approach finds, for each pixel, the single best shift that minimizes the total variation
in a predefined neighbourhood of that pixel. Its authors found the neighbourhood
K = [1, 3] to be sufficient for most Gibbs-ringing cases. They compared the approach
visually with median filtering and Lanczos filtering and reported that it surpassed both.
     Deep learning methods have become very popular in computer vision and image
processing. Convolutional neural networks map input images into a high-dimensional
feature space, filter them with a set of convolutions and produce output images with a
final image reconstruction net.
     GAS-CNN [1] is an example of a very deep architecture used by its authors to suppress
the Gibbs-ringing artifact on MRI images. The authors proposed it as an extension of the
super-resolution model EDSR [7].

      The authors presented the following distinctive features of the model:

    – external U-Net-like [8] skip connections;
    – a reduced model size (a lower feature space dimension);
    – a flat architecture, since Gibbs oscillations are an almost local phenomenon (no
      spatial reduction layers such as max pooling or convolution with stride 2).

GAS-CNN maps the input tensor into a high-dimensional feature space of depth 64 and then
performs non-linear residual filtering with 32 ResBlocks [9]. The architecture ends with
a simple reconstruction net composed of one projection convolutional layer.
    We chose this recently proposed model as a baseline and studied ways to simplify the
non-linear mapping while maintaining generalization ability. The authors of GAS-CNN
analysed the model quite extensively, for example showing the advantages of external skip
connections and residual learning and comparing it with other methods (sinc filtering,
bilateral filtering [10], NLM [11], GARCNN [12]), but they paid little attention to the
possible redundancy of the model. It is worth mentioning that a trend of recent years is
to propose hybrid refinement modules that make it possible to reduce the number of
convolutions in the ensemble [2, 13, 14], because straightforward excessive stacking of
convolutional layers leads to learning degradation, vanishing gradients and so on.
    In this article we therefore demonstrate a way to roughly halve the number of
convolutions in the non-linear mapping while preserving (and even improving) the
generalization ability of the model, using the proposed attention LRNN refinement module.


3     Proposed Architecture

The proposed architecture is shown in Fig. 3. We call it GAS14-ACNN (Gibbs-ringing
Artifact Suppression Attention-based Convolutional Neural Network). It consists of the
following structural blocks: a first convolution that maps the corrupted input samples
into a high-dimensional feature space of depth 64; a non-linear mapping with 14 RCAN
blocks [15] (fewer than half of the 32 blocks in GAS-CNN); trainable post-processing with
the proposed attention LRNN refinement module; and a final projection convolutional layer
that reconstructs the output image.


3.1    Dataset generation

Synthetic training, validation and testing sets were generated from the ground truth MRI
dataset IXI (see footnote 1) using the following pipeline:

    – apply Fourier transform to ground truth image 256 × 256 from IXI dataset;
    – crop frequency spectrum: central 1/9 part of frequency domain is saved;
    – implement zero-padding, so that Gibbs-corrupted image fits the shape of ground
      truth image;
    – apply inverse Fourier transform to get Gibbs-corrupted image.
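
The steps above can be sketched with NumPy as follows. This is a minimal illustration, not
the authors' generation script: the function name make_gibbs_image and the parameter
keep_fraction (the share of each frequency axis that is kept) are hypothetical, and the
default value of keep_fraction is an assumption rather than a value taken from the text.

```python
import numpy as np

def make_gibbs_image(gt, keep_fraction=1.0 / 3.0):
    """FFT -> keep a central block of the spectrum -> zero-pad -> inverse FFT."""
    gt = np.asarray(gt, dtype=float)
    h, w = gt.shape
    # Shift the spectrum so that the zero frequency sits at index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(gt))
    ch, cw = h // 2, w // 2
    rh = int(h * keep_fraction) // 2  # half-height of the kept block
    rw = int(w * keep_fraction) // 2  # half-width of the kept block
    mask = np.zeros((h, w), dtype=bool)
    mask[max(ch - rh, 0):ch + rh + 1, max(cw - rw, 0):cw + rw + 1] = True
    # Cropping and zero-padding back to (h, w) in one step: zero everything outside.
    truncated = np.where(mask, spectrum, 0.0)
    # The kept block is symmetric around the zero frequency, so the result is real
    # up to rounding errors.
    return np.real(np.fft.ifft2(np.fft.ifftshift(truncated)))

# Toy ground truth: a bright square on a dark background.
gt = np.zeros((64, 64))
gt[16:48, 16:48] = 1.0
gibbs = make_gibbs_image(gt)
print(gibbs.shape, gibbs.max() > gt.max())
```

The corrupted image has the same shape as the ground truth, which is what makes the pair
directly usable for supervised training.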


         Fig. 3. Proposed GAS14-ACNN architecture for MRI Gibbs-ringing suppression.


Fig. 4. Dataset generation pipeline. (a) is ground truth image from IXI dataset, (b) is Fourier
spectrum of image (a), (c) is cropped and zero-padded Fourier spectrum, (d) is Gibbs-corrupted
image.


Dataset generation process is visualized in Fig. 4.
    Zero-padding is not a necessary step in Gibbs data generation: Gibbs-ringing can be
synthesized just by cropping frequencies. In this work we include zero-padding to create
image pairs {(I_gt, I_Gibbs)_i}, i = 1, ..., N, of the same spatial size. Zero-padding is
often used before the iFFT to project the image onto a finer grid, and it is often
skipped to project the image onto a coarser grid by the inverse Fourier transform.
 1   http://brain-development.org/ixi-dataset/

    The IXI dataset contains 581 T1, 578 T2 and 578 PD volumes. First, the intersection of
these sets was taken, producing 577 volumes that have all three modalities: T1, T2 and PD.
Then the first 400 volumes were used to synthesize the training set, the next 100 volumes
to create the testing set, and the rest of the data to generate the validation set. In
each volume, 25 slices at both ends were discarded and every tenth slice was taken to
produce a pair (I_gt, I_Gibbs)_i [1]. As a result, the training, validation and testing
sets consist of 10427, 2016 and 1617 image pairs respectively. T1, T2 and PD have
different data ranges, so min-max normalization was used to map the input features to a
single range:

    I_{min} = \min(I_{GT}), \quad I_{max} = \max(I_{GT}),                              (5)

    I_{Gibbs\,normed} = \frac{I_{Gibbs} - I_{min}}{I_{max} - I_{min}},                 (6)

    I_{GT\,normed} = \frac{I_{GT} - I_{min}}{I_{max} - I_{min}},                       (7)

where I_{Gibbs} is the Gibbs-corrupted image, I_{GT} is the ground truth image,
I_{Gibbs normed} is the normed Gibbs-corrupted image and I_{GT normed} is the normed
ground truth image.
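
    A minimal NumPy sketch of the normalization (5)-(7) is given below. The function name
normalize_pair is hypothetical. Note that both images are scaled with the minimum and
maximum of the ground truth, so the normalized Gibbs-corrupted image may fall slightly
outside [0, 1] where ringing overshoots.

```python
import numpy as np

def normalize_pair(i_gt, i_gibbs):
    """Scale both images with the min and max of the ground truth, as in (5)-(7)."""
    i_gt = np.asarray(i_gt, dtype=float)
    i_gibbs = np.asarray(i_gibbs, dtype=float)
    i_min, i_max = i_gt.min(), i_gt.max()
    scale = i_max - i_min
    if scale == 0.0:
        raise ValueError("ground truth image is constant; cannot normalize")
    return (i_gt - i_min) / scale, (i_gibbs - i_min) / scale

gt = np.array([[0.0, 50.0], [100.0, 200.0]])
gibbs = np.array([[-10.0, 60.0], [90.0, 220.0]])
gt_normed, gibbs_normed = normalize_pair(gt, gibbs)
print(gt_normed.min(), gt_normed.max(), gibbs_normed.min(), gibbs_normed.max())
```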

3.2   Non-Linear Mapping
Images with the Gibbs-ringing artifact are also significantly blurred, because
Gibbs-corrupted images are produced by truncating high frequencies. Since the RCAN
structural module was used successfully in a recently published deep CNN architecture for
image deblurring [15], one of the proposed modifications to GAS-CNN is to replace the
ResBlock [9] with RCAN (see Fig. 5) in the non-linear mapping. The key difference of the
RCAN module is the presence of trainable weights for each slice (channel) of the
convolution output. These weights are generated by applying a global pooling operation
(averaging over each feature slice) and then fusing the pooled features with a 1 × 1
ResBlock that ends with a sigmoid activation.

      Fig. 5. RCAN module, used in non-linear mapping instead of GAS-CNN’s ResBlock.
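
     The channel weighting described for the RCAN module can be sketched in NumPy as
follows. This is an illustration under assumptions: the internal layout of the 1 × 1
ResBlock is not specified in the text, so the sketch uses two 1 × 1 convolutions with a
ReLU in between and a skip connection; the names channel_attention, w1 and w2 are
hypothetical.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Rescale each channel of `features` (H, W, C) by a learned weight in (0, 1).

    w1 has shape (C, C_mid) and w2 has shape (C_mid, C). On a 1 x 1 spatial map
    a 1 x 1 convolution is just a matrix product over channels.
    """
    pooled = features.mean(axis=(0, 1))              # global average pooling -> (C,)
    hidden = np.maximum(pooled @ w1, 0.0)            # first 1x1 conv + ReLU (assumed)
    fused = pooled + hidden @ w2                     # second 1x1 conv + skip (assumed)
    weights = 1.0 / (1.0 + np.exp(-fused))           # closing sigmoid
    return features * weights                        # broadcast over H and W

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 4))
out = channel_attention(feats, rng.normal(size=(4, 2)), rng.normal(size=(2, 4)))
print(out.shape)
```

Because the weights depend only on per-channel statistics, the block can emphasize or
suppress whole feature maps at a very small extra cost.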
     To the best of our knowledge, the authors of GAS-CNN did not release code or weights,
so for a fair comparison we trained all models presented here ourselves, using the same
training procedure (see Section 4 for details) and the same generated synthetic dataset.
     GAS-CNN and RCAN-GAS-CNN were trained to evaluate the effect of RCAN.
RCAN-GAS-CNN matches the GAS-CNN architecture exactly, except that the RCAN module is
used in the non-linear mapping instead of an ordinary ResBlock. The performance gain is
shown in Table 1 and in Fig. 6.

Table 1. Average PSNR values on the generated testing set: effect of including the RCAN
block in the non-linear mapping.

                       Model                            Average PSNR, dB
                       Initial Gibbs-corrupted images        29.79
                       GAS-CNN                               32.04
                       RCAN-GAS-CNN                          32.19


Fig. 6. RCAN influence on GAS-CNN architecture. (a) is Gibbs-corrupted image, (b) is ground
truth image, (c) is GAS-CNN result, (d) is RCAN-GAS-CNN result.


3.3   Attention LRNN Refinement
The LRNN refinement block was introduced in [2], and its ideas have been incorporated
effectively into deblurring [2] and depth generation pipelines [13].
    We applied the LRNN approach to the Gibbs-ringing problem and, moreover, extended it
with an attention mechanism that controls the correlation between the LRNN input and
output features. This attention module is meant to force the LRNN to act as a refinement
block. The authors of [14] used a similar attention unit to improve biomedical image
segmentation; our approach differs in the way the attention mechanism is employed. Here it
serves as an additional constraint on the refinement operation, whereas in [14] it is
applied to group features based on their correlation within a single feature tensor.
    The LRNN has two input tensors: the feature tensor (the result of the non-linear
mapping), which is processed recursively, and the weights tensor. The weights are
produced by the auxiliary RNN weights generation net (see Fig. 3), which is trained
end-to-end with the whole neural network. The LRNN performs 4 recursive updates
(left-to-right, right-to-left, top-to-bottom and bottom-to-top) by applying the rule

    H_{t+1} := (1 - \omega) \cdot H_{t+1} + \omega \cdot H_t,                          (8)

where H_t is the row currently being processed in the case of the top-to-bottom and
bottom-to-top passes, and ω is the corresponding weight from the weights tensor. A
subsequent concatenation and convolution fuse the recursively processed tensors and
conclude the LRNN operation.
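
    The four directional passes of rule (8) can be sketched in NumPy for a single feature
map. This is an illustration only: which element of the weights tensor is paired with
each update is an assumption (the weight at the position being updated is used here), the
final concatenation and convolution are replaced by a plain stack, and the function names
are hypothetical.

```python
import numpy as np

def scan_top_to_bottom(h, w):
    """Apply H[t+1] := (1 - w) * H[t+1] + w * H[t] row by row, top to bottom."""
    out = np.array(h, dtype=float)  # copy so the input is left untouched
    for t in range(out.shape[0] - 1):
        out[t + 1] = (1.0 - w[t + 1]) * out[t + 1] + w[t + 1] * out[t]
    return out

def lrnn_four_directions(h, w):
    """Run the recurrence in four directions by flipping and transposing the input."""
    down = scan_top_to_bottom(h, w)
    up = scan_top_to_bottom(h[::-1], w[::-1])[::-1]
    right = scan_top_to_bottom(h.T, w.T).T
    left = scan_top_to_bottom(h.T[::-1], w.T[::-1])[::-1].T
    # The paper fuses the four results with concatenation and a convolution;
    # here they are only stacked along a new leading axis.
    return np.stack([right, left, down, up], axis=0)

features = np.arange(12, dtype=float).reshape(3, 4)
weights = np.full((3, 4), 0.5)
print(lrnn_four_directions(features, weights).shape)
```

With ω = 0 every pass returns the input unchanged, and with ω = 1 each pass copies the
first row or column along its direction, which shows how ω controls how far information
propagates across the layer.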
    LRNN can be viewed as an alternative (hybrid) way to enlarge the receptive field and
to accumulate global spatial information within the layer. Despite the locality of Gibbs
oscillations mentioned above, accounting for global information within the layer turned
out to be very helpful and noticeably improved the generalization ability of the
architecture.

   We trained two extra CNNs to reveal the advantages of LRNN:
 – RCAN8-GAS-CNN – the model matches the original GAS-CNN, but the number of blocks in
   the non-linear mapping is reduced by a factor of 4;
 – RCAN8-GAS-CNN+LRNN×2 – the previous model, extended by the proposed LRNN refinement.
    The positive impact of LRNN can be observed in Table 2 and in Fig. 7.
RCAN8-GAS-CNN+LRNN×2 performs comparably to the original, 4 times deeper GAS-CNN,
whereas removing the LRNN post-processing degrades the result.


Table 2. Average PSNR values on the generated testing set: effect of adding the LRNN
block after the non-linear mapping.

                       Model                            Average PSNR, dB
                       Initial Gibbs-corrupted images        29.79
                       GAS-CNN                               32.04
                       RCAN8-GAS-CNN                         31.67
                       RCAN8-GAS-CNN+LRNN×2                  32.19


                Fig. 7. PSNR on the validation set over epoch while training.


    The attention module is shown in Fig. 8. It takes three tensors as inputs:
x1 ∈ R^{C×H·W}, the non-linear mapping output (LRNN input); x2 ∈ R^{H·W×C}, the LRNN
output; and x3 ∈ R^{C×H·W}, a transposed copy of x2. The attention block performs a
weighted regrouping of the x3 features so as to promote, at position i, the feature that
is most correlated with the i-th feature of the LRNN input tensor x1.


                                   Fig. 8. Attention unit.

    Let f1, ..., fC be the features of x3, and introduce latent variables
A = {a1, ..., aC}. The value ai ∈ {1, ..., C} identifies the x3 feature that is most
correlated with the i-th feature of the LRNN input tensor x1. The weighted regrouping is
carried out with a computed correlation tensor C ∈ R^{C×C} (a probability distribution
map over the values of the latent variables).
    It is worth mentioning that there is an analogy with the word alignment task in
classical machine learning (for example, IBM Model 1). The proposed attention LRNN unit
acts as a feature aligner between the LRNN input and output tensors, forcing the LRNN to
be a refinement block.
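
    One plausible reading of this regrouping is sketched below in NumPy. It is an
assumption, not the exact unit of Fig. 8: the correlation tensor is taken as a row-wise
softmax of the product x1 · x2, and the output is that tensor applied to x3. The function
name feature_attention is hypothetical.

```python
import numpy as np

def feature_attention(lrnn_in, lrnn_out):
    """Regroup LRNN output features by their correlation with the LRNN input.

    Both arguments have shape (C, H, W). Returns the regrouped tensor of the same
    shape and the (C, C) correlation map whose rows sum to one.
    """
    lrnn_in = np.asarray(lrnn_in, dtype=float)
    lrnn_out = np.asarray(lrnn_out, dtype=float)
    c, h, w = lrnn_in.shape
    x1 = lrnn_in.reshape(c, h * w)    # LRNN input,  C x HW
    x3 = lrnn_out.reshape(c, h * w)   # LRNN output, C x HW
    x2 = x3.T                         # LRNN output, HW x C
    scores = x1 @ x2                  # raw correlations, C x C
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    corr = np.exp(scores)
    corr = corr / corr.sum(axis=1, keepdims=True)        # row i: distribution over a_i
    return (corr @ x3).reshape(c, h, w), corr

x = np.array([[[10.0, 0.0]], [[0.0, 10.0]]])  # C = 2, H = 1, W = 2
regrouped, corr = feature_attention(x, x)
print(np.round(corr, 3))
```

When the LRNN output features are already aligned with the input features, the
correlation map is close to the identity and the regrouping leaves the output almost
unchanged, which matches the role of the unit as a constraint on refinement.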
    Finally, we get the overall proposed architecture: RCAN14-GAS-CNN+Attention
LRNN×2 (see Fig. 3). We call it GAS14-ACNN. Table 3 shows the increase of perfor-
mance, caused by the attention unit.


Table 3. Average PSNR values on the generated testing set: performance gain due to the
attention block.

                      Model                            Average PSNR, dB
                      Initial Gibbs-corrupted images        29.79
                      GAS-CNN                               32.04
                      RCAN14-GAS-CNN+LRNN×2                 32.57
                      GAS14-ACNN (proposed)                 32.65


4     Experiments

The proposed attention-based convolutional neural network for Gibbs-ringing reduction was
implemented in Python 3 with the deep learning framework TensorFlow 1.14. Our
implementation is available at https://github.com/MaksimPenkin/GAS14-ACNN.

  Models were trained with the Adam optimizer [16] (β1 = 0.9, β2 = 0.999, ε = 1e-08) on an
NVIDIA GeForce RTX 2080 Ti GPU. The learning rate followed a polynomial decay:

    lr(x) = (lr_0 - lr_1) \cdot \left(1 - \frac{x}{M}\right)^p + lr_1,                 (9)

where x is the training step, lr_0 is the initial learning rate, lr_1 is the final learning
rate, M = Ne · Nd / bs is the number of training steps, Ne is the number of epochs, Nd is
the number of pairs in the training set, bs is the number of pairs fed to the algorithm
at each training step (batch size) and p is the polynomial power.
    We used the following values of these parameters: lr_0 = 10^{-4}, lr_1 = 0, Ne = 1000,
bs = 20, p = 0.3.
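
    The schedule (9) is easy to check in plain Python. The sketch below uses the
parameter values listed above; the function name polynomial_decay is hypothetical, and
the value of Nd used to compute M is only an example (the number of training pairs
reported in Section 3.1), since the text does not state which count enters M.

```python
def polynomial_decay(step, total_steps, lr0=1e-4, lr1=0.0, power=0.3):
    """Learning rate lr(x) = (lr0 - lr1) * (1 - x / M) ** p + lr1."""
    step = min(max(step, 0), total_steps)  # clamp so the base stays in [0, 1]
    return (lr0 - lr1) * (1.0 - step / total_steps) ** power + lr1

n_epochs, n_pairs, batch_size = 1000, 10427, 20  # n_pairs is an example value
total_steps = n_epochs * n_pairs // batch_size   # M in (9)

for fraction in (0.0, 0.5, 0.9, 1.0):
    step = int(fraction * total_steps)
    print(fraction, polynomial_decay(step, total_steps))
```

With p = 0.3 the rate decays slowly for most of the training and drops sharply only near
the end: halfway through, it is still above 80% of its initial value.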
    We applied the L1 loss function with l2 weight regularization (γ = 10^{-4}) to prevent
the models from overfitting.
    During training we used augmentation by rotations and flips on patches of shape
(48 × 48). 10 random patches were cropped from each training image before augmentation.
Validation and testing were performed on full-size images.
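
    A minimal NumPy sketch of this patch sampling is shown below. It assumes rotations by
multiples of 90 degrees and horizontal flips, which the text does not specify, and the
function name random_patch_pairs is hypothetical. The same crop and the same transform
are applied to both images of a pair so that they stay aligned.

```python
import numpy as np

def random_patch_pairs(gt, gibbs, n_patches=10, size=48, rng=None):
    """Crop aligned random patches from an image pair and augment them identically."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = gt.shape
    pairs = []
    for _ in range(n_patches):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        a = gt[top:top + size, left:left + size]
        b = gibbs[top:top + size, left:left + size]
        k = int(rng.integers(0, 4))          # number of 90-degree rotations (assumed)
        a, b = np.rot90(a, k), np.rot90(b, k)
        if rng.integers(0, 2):               # random horizontal flip (assumed)
            a, b = np.fliplr(a), np.fliplr(b)
        pairs.append((a.copy(), b.copy()))
    return pairs

image = np.arange(256 * 256, dtype=float).reshape(256, 256)
patches = random_patch_pairs(image, image + 1.0, rng=np.random.default_rng(0))
print(len(patches), patches[0][0].shape)
```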
    All convolution kernels have spatial size 3 × 3, except for one projection convolution
just before the LRNN refinement unit: it has a kernel of spatial size 1 × 1 and projects
the features onto a trainable manifold of lower dimension.
    The chosen baseline model GAS-CNN and the proposed GAS14-ACNN are compared in Fig. 9
and Fig. 10. Processing one image (256 × 256) takes approximately 1.03 s on CPU and
0.05 s on GPU for GAS-CNN, and approximately 1.13 s and 0.08 s on the same CPU and GPU
for our GAS14-ACNN (see footnote 3 for the hardware).


                  Fig. 9. PSNR on the validation set over epoch while training.


 3
     Intel(R) Core(TM) i7-8700; NVIDIA GeForce RTX 2080 Ti


Fig. 10. Estimation of visual quality growth. (a) Gibbs-corrupted images, (b) ground truth im-
ages, (c) GAS-CNN results, (d) GAS14-ACNN results (proposed).


5   Conclusion
We proposed a new attention-based convolutional architecture, GAS14-ACNN, for MRI
Gibbs-ringing suppression. The architecture extends the recently proposed GAS-CNN model:
the non-linear mapping is significantly simplified and is followed by an attention LRNN
unit that preserves the generalization ability. The presented attention mechanism acts
both as an auxiliary constraint on the LRNN post-processing and as a feature filtering
module. The proposed GAS14-ACNN model outperforms the baseline GAS-CNN on the generated
synthetic testing set.

References
1. Zhao, X., Zhang, H., Zhou, Y., Bian, W., Zhang, T., Zou, X.: Gibbs-ringing artifact suppres-
   sion with knowledge transfer from natural images to MR images. Multimedia Tools and Ap-
   plications, 1–23 (2019)
2. Zhang, J., Pan, J., Ren, J., Song, Y., Bao, L., Lau, R. W., Yang, M. H.: Dynamic scene deblur-
   ring using spatially variant recurrent neural networks. In: Proceedings of the IEEE Confer-
   ence on Computer Vision and Pattern Recognition, pp. 2521–2529. Salt Lake City, UT, USA
   (2018). https://doi.org/10.1109/CVPR.2018.00267
3. Al-Najjar, Y. A., Soong, D. C.: Comparison of image quality assessment: PSNR, HVS, SSIM,
   UIQI. Int. J. Sci. Eng. Res 3(8), 1–5 (2012)
4. Sitdikov, I. T., Krylov, A. S.: Variational Image Deringing Using Varying Regularization Pa-
   rameter. Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Ap-
   plications 25(1), 96–100 (2015)
5. Umnov, A. V., Krylov, A. S.: Sparse Approach to Image Ringing Detection and Suppression.
   Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications
   27(4), 754–762 (2017)
6. Kellner, E., Dhital, B., Kiselev, V. G., Reisert, M.: Gibbs-ringing artifact removal based on
   local subvoxel-shifts. Magnetic resonance in medicine 76(5), 1574–1581 (2016)
7. Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks
   for single image super-resolution. In: Proceedings of the IEEE conference on computer
   vision and pattern recognition workshops, pp. 136–144. Honolulu, HI, USA (2017).
   https://doi.org/10.1109/CVPRW.2017.151
8. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical
   image segmentation. In: International Conference on Medical image computing and
   computer-assisted intervention, pp. 234–241. Springer, Cham (2015).
   https://doi.org/10.1007/978-3-319-24574-4_28
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceed-
   ings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Las
   Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90
10. Zhang, M., Gunturk, B. K.: Multiresolution bilateral filtering for image denoising. IEEE
   Transactions on image processing 17(12), 2324–2333 (2008)
11. Manjón, J. V., Coupé, P., Buades, A., Fonov, V., Collins, D. L., Robles, M.: Non-local MRI
   upsampling. Medical image analysis 14(6), 784–792 (2010)
12. Wang, Y., Song, Y., Xie, H., Li, W., Hu, B., Yang, G.: Reduction of Gibbs artifacts in mag-
   netic resonance imaging based on Convolutional Neural Network. In: 2017 10th international
   congress on image and signal processing, biomedical engineering and informatics (CISP-
   BMEI), pp. 1–5. Shanghai, China (2017). https://doi.org/10.1109/CISP-BMEI.2017.8302197
13. Cheng, X., Wang, P., Yang, R.: Learning depth with convolutional spatial propagation net-
   work. arXiv preprint arXiv:1810.02695 (2018)
14. Sinha, A., Dolz, J.: Multi-scale self-guided attention for medical image segmentation. IEEE
   Journal of Biomedical and Health Informatics, arXiv preprint arXiv:1906.02849 (2020)
15. Park, D., Kim, J., Chun, S. Y.: Down-scaling with learned kernels in multi-scale deep neural
   networks for non-uniform single image deblurring. arXiv preprint arXiv:1903.10157 (2019)
16. Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint
   arXiv:1412.6980 (2014)

</pre>