=Paper=
{{Paper
|id=Vol-2744/paper34
|storemode=property
|title=Attention-based Convolutional Neural Network for MRI Gibbs-ringing Artifact Suppression
|pdfUrl=https://ceur-ws.org/Vol-2744/paper34.pdf
|volume=Vol-2744
|authors=Maksim Penkin,Andrey Krylov,Alexander Khvostikov
}}
==Attention-based Convolutional Neural Network for MRI Gibbs-ringing Artifact Suppression==
Maksim Penkin [0000-0002-8027-9333], Andrey Krylov [0000-0001-9910-4501], and Alexander Khvostikov [0000-0002-4217-7141]

Lomonosov Moscow State University, Moscow, Russia

penkin97@gmail.com, kryl@cs.msu.ru, khvostikov@cs.msu.ru

https://imaging.cs.msu.ru/ru

Abstract. The Gibbs-ringing artifact is a common artifact in MRI image processing. Because raw MRI data is acquired in the frequency domain, a 2D inverse discrete Fourier transform is applied to visualize it. The inability to take the inverse Fourier transform of the full spectrum (full k-space) leads to insufficient sampling of the high-frequency data and results in the well-known Gibbs phenomenon. It is worth noting that truncation of high-frequency information also produces significant blur, so techniques from other image restoration problems (for example, image deblurring) can be used successfully. We propose an attention-based convolutional neural network for Gibbs-ringing reduction that extends the recently proposed GAS-CNN (Gibbs-ringing Artifact Suppression Convolutional Neural Network). The proposed method includes a simplified non-linear mapping, complemented by an LRNN (Layer Recurrent Neural Network) refinement block with a feature attention module that controls the correlation between the input and output tensors of the refinement unit. The research shows that the proposed post-processing refinement considerably simplifies the non-linear mapping.

Keywords: Gibbs-ringing artifacts, Magnetic resonance imaging, Attention CNN, Image deringing.

===1 Introduction===
Gibbs-ringing artifact reduction is an image restoration problem that can be solved by mathematical methods of image processing.

Gibbs oscillations (the Gibbs phenomenon) often occur near high-frequency image features such as edges. The artifacts can be observed when an image is mapped onto a finer grid, during contrast enhancement, in video compression and in MRI data visualization.
Slight image distortions may go unnoticed, but severe Gibbs artifacts can even hinder patient diagnosis when the oscillations are caused by k-space (Fourier space) truncation of the MRI frequency domain (see Fig. 1).

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The reported study was funded by RFBR, CNPq and MOST according to the research project 19-57-80014 (BRICS2019-394).

Fig. 1. Examples of Gibbs-ringing artifacts on MRI images. Arrows point out areas of distortion.

A simple real-valued periodic function illustrates the mathematical origin of the Gibbs phenomenon:

:<math>\xi(t) = \begin{cases} a, & \text{if } t \in [-\tau/2, \tau/2] \\ 0, & \text{if } t \in [-T/2, T/2] \setminus [-\tau/2, \tau/2] \end{cases}, \qquad \xi(t) = \xi(t + T),</math> (1)

where <math>\xi(t)</math> is a basic model of a contrast edge, <math>a</math> is the amplitude of the edge and <math>T</math> is the period of the model function. Assuming the Fourier transform in complex form and <math>T = 2\tau</math>, (1) can be rewritten as

:<math>\xi(t) = \sum_{k=-\infty}^{+\infty} d_k e^{i \omega_k t},</math> (2)

where <math>d_k = \frac{1}{T} \int_{-T/2}^{T/2} \xi(t) e^{-i \omega_k t} \, dt</math>, <math>\omega_k = \Omega k</math>, <math>\Omega = 2\pi/T</math>. Evaluating the coefficients <math>d_k</math> for the edge model (1) and combining the terms with indices <math>\pm k</math> gives

:<math>\xi(t) = \frac{a}{2} + a \sum_{k=0}^{+\infty} 2 \cdot (-1)^k \cdot \frac{1}{(2k + 1)\pi} \cdot \cos\left((2k + 1) \cdot \frac{2\pi}{T} \cdot t\right) = \frac{a}{2} + \frac{2a}{\pi} \left(\cos \Omega t - \frac{1}{3} \cos 3\Omega t + \dots\right),</math> (3)

where <math>\Omega = 2\pi/T</math>.

In practice it is impossible to include all terms of the Fourier series (3), so Gibbs oscillations occur (see Fig. 2). The amplitude of the Gibbs oscillations is constant for a given signal and does not depend on the chosen cut-off frequency.

In this paper we propose a new CNN architecture for MRI Gibbs-ringing suppression. It differs from the recently introduced GAS-CNN [1] model in its simplified non-linear mapping, which is followed by trainable LRNN [2] post-processing with an attention block that controls the correlation between the input and output tensors of the post-processing unit. The proposed architecture outperforms GAS-CNN on the generated synthetic testing dataset in terms of PSNR [3].
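The statement earlier in this section that the amplitude of the Gibbs oscillations does not depend on the cut-off frequency can be checked numerically. The following is a minimal sketch, not part of the original study: the function names `partial_sum` and `overshoot` are hypothetical, and the code assumes the expansion of the edge model (1) with T = 2τ in the form a/2 + (2a/π)(cos Ωt − (1/3) cos 3Ωt + ...), truncated after a chosen number of harmonics.

```python
import numpy as np


def partial_sum(t, n_terms, a=1.0, period=2.0):
    """First n_terms harmonics of the edge model's Fourier series (tau = T/2)."""
    omega = 2.0 * np.pi / period
    k = np.arange(n_terms)[:, None]  # harmonic index as a column vector
    coeffs = 2.0 * a * (-1.0) ** k / ((2 * k + 1) * np.pi)
    harmonics = np.cos((2 * k + 1) * omega * t[None, :])
    # Constant term a/2 plus the truncated sum of odd cosine harmonics.
    return a / 2.0 + np.sum(coeffs * harmonics, axis=0)


def overshoot(n_terms, a=1.0, period=2.0, samples=20001):
    """Peak of the truncated series above the true edge amplitude a."""
    # Half a period is enough: the model function is even in t.
    t = np.linspace(0.0, period / 2.0, samples)
    return partial_sum(t, n_terms, a, period).max() - a


if __name__ == "__main__":
    # Raising the cut-off narrows the ripples but does not lower their peak.
    for n in (10, 40, 160):
        print(n, round(overshoot(n), 4))
```

Under these assumptions the printed overshoot stays close to 9% of the edge height for every cut-off, which is the behaviour the text describes: more harmonics make the ripples narrower, not smaller.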
The remainder of this paper is organized as follows. In Section 2 we review known methods for MRI Gibbs-ringing suppression. In Section 3 we describe the MRI dataset generation, give a detailed overview of the proposed architecture and show the benefit of our modifications to the architecture. In Section 4 the results and comparisons are presented. The work is concluded in Section 5.

Fig. 2. Gibbs oscillations on a model edge function <math>\xi(t)</math>. (a) is the artifact-free signal, (b) is the signal with Gibbs-ringing.

===2 Related Work===
Many methods have been proposed for Gibbs-ringing reduction. For example, the problem can be posed variationally: the solution is sought as a function that minimizes a given functional in some function space (<math>L_2</math> or <math>L_1</math>, for example):

:<math>J(u) = \frac{1}{2} \| u - u_0 \|^2 + \lambda \int_{\Omega} | \nabla u(x) | \, dx \to \min_{u \in U},</math> (4)

where <math>u_0</math> is the input Gibbs-corrupted image, <math>u</math> is the sought Gibbs-free image from the chosen function space <math>U</math>, <math>\Omega</math> is the image area and <math>\lambda</math> is the regularization parameter. The parameter can depend on the distance from the nearest image edge [4]. Joint ringing estimation and suppression can be performed using sparse representations [5].

Another recently introduced method is based on a search for optimal subpixel shifts [6]. The approach finds a unique best shift for each pixel by minimizing the total variation in a predefined neighbourhood of the pixel. Its authors found the neighbourhood K = [1, 3] to be sufficient for most Gibbs-ringing cases. They compared the approach visually with the median filter and Lanczos filtering, and it surpassed both.

Deep learning methods have become very popular in computer vision and image processing.
Convolutional neural networks map input images into high-dimensional feature spaces, filter them with a set of convolutions and produce output images using a final image reconstruction net.

GAS-CNN [1] is an example of a very deep architecture, used by its authors to suppress the Gibbs-ringing artifact on MRI images. The authors proposed it as an extension of the super-resolution model EDSR [7].

The authors presented the following distinctive features of the model:
* external U-Net-like [8] skip connections;
* a reduced model size (a smaller feature space dimension);
* a flat architecture, since Gibbs oscillations are an almost local phenomenon (no spatial reduction layers such as max pooling or convolution with stride 2).

GAS-CNN maps the input tensor into a high-dimensional feature space of depth 64 and then implements non-linear residual filtering with 32 ResBlocks [9]. The architecture is concluded with a simple reconstruction net composed of one projection convolutional layer.

We chose this recently proposed model as a baseline and investigated ways to simplify the non-linear mapping while maintaining its generalization ability. Although the authors of GAS-CNN analysed the model quite extensively, for example by showing the advantages of external skip connections and residual learning and by comparing it with other methods (sinc filtering, bilateral filtering [10], NLM [11], GARCNN [12]), they paid little attention to the model's possible redundancy. It is worth mentioning that the trend of recent years is to propose hybrid refinement modules that reduce the number of convolutions in the ensemble [2, 13, 14], since straightforward excessive stacking of convolutional layers leads to learning degradation, vanishing gradients and so on.
So, in this article we demonstrate a way to roughly halve the number of convolutions in the non-linear mapping while preserving (and even improving) the model's generalization ability by means of the proposed attention LRNN refinement module.

===3 Proposed Architecture===
The proposed architecture is shown in Fig. 3. We call it GAS14-ACNN (Gibbs-ringing Artifact Suppression Attention-based Convolutional Neural Network). It comprises the following structural blocks: a first convolution that maps the corrupted input samples into a high-dimensional feature space of depth 64; a non-linear mapping with 14 RCAN blocks [15] (roughly half as many as in GAS-CNN); trainable post-processing with the proposed attention LRNN refinement module; and a final projection convolutional layer that reconstructs the output image.

====3.1 Dataset generation====
The training, validation and testing synthetic sets were generated from the ground truth MRI dataset IXI<sup>1</sup> using the following pipeline:
* apply the Fourier transform to a 256 × 256 ground truth image from the IXI dataset;
* crop the frequency spectrum: the central 1/9 part of the frequency domain is kept;
* apply zero-padding so that the Gibbs-corrupted image matches the shape of the ground truth image;
* apply the inverse Fourier transform to get the Gibbs-corrupted image.

Fig. 3. Proposed GAS14-ACNN architecture for MRI Gibbs-ringing suppression.

Fig. 4. Dataset generation pipeline. (a) is the ground truth image from the IXI dataset, (b) is the Fourier spectrum of image (a), (c) is the cropped and zero-padded Fourier spectrum, (d) is the Gibbs-corrupted image.

The dataset generation process is visualized in Fig. 4. Zero-padding is not a necessary step in Gibbs data generation: Gibbs-ringing can be synthesized just by cropping frequencies. In this work we include zero-padding to create image pairs <math>\{I_{gt}, I_{Gibbs}\}_{i=1}^{N}</math> of the same spatial size.
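The generation steps listed above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `make_gibbs_image` and the `keep_fraction` parameter are hypothetical, and the sketch assumes that the kept central region spans about one third of the frequencies along each axis (that is, about one ninth of the spectrum area). Cropping and zero-padding are combined into a single masking step, so the output stays on the grid of the ground truth image.

```python
import numpy as np


def make_gibbs_image(gt, keep_fraction=1.0 / 3.0):
    """Return a Gibbs-corrupted copy of a 2D image by truncating its spectrum.

    keep_fraction is the share of frequencies kept along EACH axis
    (assumption: 1/3 per axis, i.e. roughly 1/9 of the spectrum area).
    """
    h, w = gt.shape
    # Forward FFT with the zero frequency shifted to the array centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gt))

    # Half-widths of the kept window, symmetric around the zero frequency
    # so that the truncated spectrum of a real image stays Hermitian.
    rh = int(h * keep_fraction / 2)
    rw = int(w * keep_fraction / 2)
    ch, cw = h // 2, w // 2

    # Crop and zero-pad in one step: everything outside the window is zero.
    truncated = np.zeros_like(spectrum)
    rows = slice(max(ch - rh, 0), ch + rh + 1)
    cols = slice(max(cw - rw, 0), cw + rw + 1)
    truncated[rows, cols] = spectrum[rows, cols]

    # Back to the image domain; the imaginary part is only rounding noise.
    corrupted = np.fft.ifft2(np.fft.ifftshift(truncated))
    return corrupted.real


if __name__ == "__main__":
    # Synthetic test pattern: a bright vertical band with two sharp edges.
    gt = np.zeros((256, 256))
    gt[:, 64:192] = 1.0
    gibbs = make_gibbs_image(gt)
    print(gibbs.shape, gibbs.min() < 0.0, gibbs.max() > 1.0)
```

Keeping the window symmetric around the zero frequency is a design choice of this sketch: it makes the inverse transform real up to rounding error, so taking the real part does not discard signal.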
Zero-padding is often applied before the iFFT to project the image onto a finer grid, and it is often skipped when the image is to be projected onto a coarser grid by the inverse Fourier transform.

The IXI dataset contains 581 T1, 578 T2 and 578 PD volumes. First, the intersection of these volumes was taken, which produced 577 volumes that have all three modalities: T1, T2 and PD. Then the first 400 volumes were used to synthesize the training set, the next 100 volumes to create the testing set, and the rest of the data to generate the validation set. The 25 slices at both ends were discarded and every tenth slice was taken to produce a pair <math>(I_{gt}, I_{Gibbs})_i</math> [1]. So, the training, validation and testing sets consist of 10427, 2016 and 1617 image pairs respectively.

<sup>1</sup> http://brain-development.org/ixi-dataset/

T1, T2 and PD have different data ranges, so max-min normalization was used to map the input features to a single range:

:<math>I_{min} = \min(I_{GT}), \qquad I_{max} = \max(I_{GT}),</math> (5)

:<math>I_{Gibbs\ normed} = \frac{I_{Gibbs} - I_{min}}{I_{max} - I_{min}},</math> (6)

:<math>I_{GT\ normed} = \frac{I_{GT} - I_{min}}{I_{max} - I_{min}},</math> (7)

where <math>I_{Gibbs}</math> is the Gibbs-corrupted image, <math>I_{GT}</math> is the ground truth image, <math>I_{Gibbs\ normed}</math> is the normalized Gibbs-corrupted image and <math>I_{GT\ normed}</math> is the normalized ground truth image.

====3.2 Non-Linear Mapping====
Images with the Gibbs-ringing artifact are also significantly blurred, since Gibbs-corrupted images are generated by truncating high frequencies. Because the RCAN structural module was successfully used in a recently published deep CNN architecture for image deblurring [15], one of our modifications to GAS-CNN is to replace the ResBlock [9] with RCAN (see Fig. 5) in the non-linear mapping.

Fig. 5. RCAN module, used in the non-linear mapping instead of GAS-CNN's ResBlock.

The key difference of the RCAN module is the presence of trainable weights for each slice of the convolution output.
These weights are generated by applying a global pooling operation (calculating expectation values over feature slices) and subsequently fusing the acquired features with a 1 × 1 ResBlock that has a sigmoid as its closing activation.

To the best of our knowledge, the authors of GAS-CNN did not provide code and weights, so to make fair comparisons we trained all the models presented here ourselves, using the same training procedure (refer to Section 4 for details) and the same synthetic generated dataset.

GAS-CNN and RCAN-GAS-CNN were trained to evaluate RCAN performance. RCAN-GAS-CNN precisely matches the GAS-CNN architecture, with the only difference that the RCAN module is used in the non-linear mapping instead of an ordinary ResBlock. The performance growth is shown in Table 1 and in Fig. 6.

{| class="wikitable"
|+ Table 1. Average PSNR values on the generated testing set. Estimation of the benefit of including the RCAN block in the non-linear mapping.
! Model !! Average PSNR (dB)
|-
| Initial Gibbs-corrupted images || 29.79
|-
| GAS-CNN || 32.04
|-
| RCAN-GAS-CNN || 32.19
|}

Fig. 6. RCAN influence on the GAS-CNN architecture. (a) is the Gibbs-corrupted image, (b) is the ground truth image, (c) is the GAS-CNN result, (d) is the RCAN-GAS-CNN result.

====3.3 Attention LRNN Refinement====
The LRNN refinement block was introduced in [2], and its ideas have been effectively incorporated into deblurring [2] and depth generation pipelines [13].

We applied the LRNN approach to the Gibbs-ringing problem and, moreover, extended it with an attention mechanism that controls the correlation between the LRNN input and output features. This attention module aims to force the LRNN to act as a refinement block. The authors of [14] used a similar attention unit to improve biomedical image segmentation; nevertheless, our approach differs from the existing one [14] in the way we employ the attention mechanism.
In our case it serves as an additional constraint on the refinement operation, whereas in [14] it is applied to group features based on their correlation within a single feature tensor.

The LRNN has two input tensors: the feature tensor (the non-linear mapping result), which is processed recursively, and the weights tensor. We obtain the weights from the auxiliary RNN weights generation net (see Fig. 3), which is trained end-to-end with the whole neural network. The LRNN implements 4 recursive updates (left-to-right, right-to-left, top-to-bottom and bottom-to-top) by applying the rule

:<math>H_{t+1} := (1 - \omega) \cdot H_{t+1} + \omega \cdot H_t,</math> (8)

where <math>H_t</math> is the row currently being processed, if we refer to the top-to-bottom or bottom-to-top operations. Subsequent concatenation and convolution fuse the recursively processed tensors and conclude the LRNN operation.

The LRNN can be viewed as an alternative (hybrid) way to enlarge the receptive field and to accumulate global spatial information within the layer. Despite the locality of Gibbs oscillations mentioned above, accounting for the overall information within the layer turned out to be very helpful and remarkably raised the generalization ability of the architecture.

We trained two extra CNNs to reveal the LRNN advantages:
* RCAN8-GAS-CNN: the model matches the original GAS-CNN, but the number of blocks in the non-linear mapping is reduced by a factor of 4;
* RCAN8-GAS-CNN+LRNN×2: the previous model, extended by the proposed LRNN refinement.

The positive impact of the LRNN can be observed in Table 2 and in Fig. 7. RCAN8-GAS-CNN+LRNN×2 has performance comparable with the original, 4 times deeper GAS-CNN, whereas removing the LRNN post-processing degrades the algorithm.

{| class="wikitable"
|+ Table 2. Average PSNR values on the generated testing set. Estimation of the benefit of including the LRNN block after the non-linear mapping.
! Model !! Average PSNR (dB)
|-
| Initial Gibbs-corrupted images || 29.79
|-
| GAS-CNN || 32.04
|-
| RCAN8-GAS-CNN || 31.67
|-
| RCAN8-GAS-CNN+LRNN×2 || 32.19
|}

Fig. 7. PSNR on the validation set over epochs during training.

The attention module is shown in Fig. 8. It gets three tensors as inputs: <math>x_1 \in \mathbb{R}^{C \times H \cdot W}</math>, the non-linear mapping output (LRNN input); <math>x_2 \in \mathbb{R}^{H \cdot W \times C}</math>, the LRNN output; and <math>x_3 \in \mathbb{R}^{C \times H \cdot W}</math>, the transposed copy of <math>x_2</math>. The attention block performs a weighted regrouping of the <math>x_3</math> features so as to promote, at position <math>i</math>, the feature most correlated with the <math>i</math>th feature of the LRNN's input tensor <math>x_1</math>.

Fig. 8. Attention unit.

Let <math>f_1, \dots, f_C</math> be the <math>x_3</math> features. Introduce the latent variables <math>A = \{a_1, \dots, a_C\}</math>. The value <math>a_i \in \{1, \dots, C\}</math> identifies the <math>x_3</math> feature most correlated with the <math>i</math>th feature of the LRNN's input tensor <math>x_1</math>. The weighted regrouping is executed via the computed correlation tensor <math>\mathbf{C} \in \mathbb{R}^{C \times C}</math> (a probability distribution map over the values of the latent variables).

It is worth mentioning that there is an analogy with the word alignment task in classical machine learning (for example, IBM Model 1). The proposed attention LRNN unit acts as a feature aligner of the LRNN input and output tensors, forcing the LRNN to be a refinement block.

Finally, we get the overall proposed architecture: RCAN14-GAS-CNN+Attention LRNN×2 (see Fig. 3). We call it GAS14-ACNN. Table 3 shows the increase in performance caused by the attention unit.

{| class="wikitable"
|+ Table 3. Average PSNR values on the generated testing set. Estimation of the performance growth caused by the attention block.
! Model !! Average PSNR (dB)
|-
| Initial Gibbs-corrupted images || 29.79
|-
| GAS-CNN || 32.04
|-
| RCAN14-GAS-CNN+LRNN×2 || 32.57
|-
| GAS14-ACNN (proposed) || 32.65
|}

===4 Experiments===
The proposed attention-based convolutional neural network for Gibbs-ringing reduction was implemented in Python 3 with the deep learning framework Tensorflow 1.14. Our implementation is available at https://github.com/MaksimPenkin/GAS14-ACNN.

The models were trained with the Adam optimizer [16] (β1 = 0.9, β2 = 0.999, ε = 1e-08) on an NVIDIA GeForce RTX 2080 Ti GPU.
The learning rate had polynomial decay:

:<math>lr(x) = (lr_0 - lr_1) \cdot \left(1 - \frac{x}{M}\right)^p + lr_1,</math> (9)

where <math>x</math> is the training step, <math>lr_0</math> is the initial learning rate value, <math>lr_1</math> is the final learning rate value, <math>M = N_e \cdot N_d / bs</math> is the number of training steps, <math>N_e</math> is the number of epochs, <math>N_d</math> is the number of pairs in the training set, <math>bs</math> is the number of pairs fed to the algorithm on the current training step (batch size) and <math>p</math> is the polynomial power. We used the following values of these parameters: <math>lr_0 = 10^{-4}</math>, <math>lr_1 = 0</math>, <math>N_e = 1000</math>, <math>bs = 20</math>, <math>p = 0.3</math>.

We applied the L1 loss function with l2 weights regularization (<math>\gamma = 10^{-4}</math>) to prevent overfitting of the models.

During training we used augmentation by rotations and flips on patches of shape 48 × 48. Ten random patches were cropped from each training image before augmentation. Validation and testing were performed on full-size images.

All convolution kernels have spatial size 3 × 3, except for one projection convolution just before the LRNN refinement unit: it has a kernel of spatial size 1 × 1 and projects the features onto a trainable manifold of lower dimension.

The chosen baseline model GAS-CNN and the proposed GAS14-ACNN can be compared in Fig. 9 and Fig. 10. GAS-CNN takes approximately 1.03 s on CPU and 0.05 s on GPU to process one 256 × 256 image. Our GAS14-ACNN takes approximately 1.13 s and 0.08 s on the same CPU and GPU respectively (Intel(R) Core(TM) i7-8700; NVIDIA GeForce RTX 2080 Ti).

Fig. 9. PSNR on the validation set over epochs during training.

Fig. 10. Estimation of visual quality growth. (a) Gibbs-corrupted images, (b) ground truth images, (c) GAS-CNN results, (d) GAS14-ACNN results (proposed).

===5 Conclusion===
We proposed a new attention-based convolutional architecture, GAS14-ACNN, for MRI Gibbs-ringing suppression.
This architecture extends the recently proposed GAS-CNN model with a significantly simplified non-linear mapping, followed by an attention LRNN unit that preserves the generalization ability. The presented attention mechanism acts as an auxiliary constraint for the LRNN post-processing and as a feature filtering module. The proposed GAS14-ACNN model outperforms the baseline GAS-CNN on the generated synthetic testing set.

===References===
# Zhao, X., Zhang, H., Zhou, Y., Bian, W., Zhang, T., Zou, X.: Gibbs-ringing artifact suppression with knowledge transfer from natural images to MR images. Multimedia Tools and Applications, 1–23 (2019)
# Zhang, J., Pan, J., Ren, J., Song, Y., Bao, L., Lau, R. W., Yang, M. H.: Dynamic scene deblurring using spatially variant recurrent neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2521–2529. Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00267
# Al-Najjar, Y. A., Soong, D. C.: Comparison of image quality assessment: PSNR, HVS, SSIM, UIQI. Int. J. Sci. Eng. Res 3(8), 1–5 (2012)
# Sitdikov, I. T., Krylov, A. S.: Variational Image Deringing Using Varying Regularization Parameter. Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications 25(1), 96–100 (2015)
# Umnov, A. V., Krylov, A. S.: Sparse Approach to Image Ringing Detection and Suppression. Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications 27(4), 754–762 (2017)
# Kellner, E., Dhital, B., Kiselev, V. G., Reisert, M.: Gibbs-ringing artifact removal based on local subvoxel-shifts. Magnetic Resonance in Medicine 76(5), 1574–1581 (2016)
# Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPRW.2017.151
# Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
# He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90
# Zhang, M., Gunturk, B. K.: Multiresolution bilateral filtering for image denoising. IEEE Transactions on Image Processing 17(12), 2324–2333 (2008)
# Manjón, J. V., Coupé, P., Buades, A., Fonov, V., Collins, D. L., Robles, M.: Non-local MRI upsampling. Medical Image Analysis 14(6), 784–792 (2010)
# Wang, Y., Song, Y., Xie, H., Li, W., Hu, B., Yang, G.: Reduction of Gibbs artifacts in magnetic resonance imaging based on Convolutional Neural Network. In: 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–5. Shanghai, China (2017). https://doi.org/10.1109/CISP-BMEI.2017.8302197
# Cheng, X., Wang, P., Yang, R.: Learning depth with convolutional spatial propagation network. arXiv preprint arXiv:1810.02695 (2018)
# Sinha, A., Dolz, J.: Multi-scale self-guided attention for medical image segmentation. IEEE Journal of Biomedical and Health Informatics, arXiv preprint arXiv:1906.02849 (2020)
# Park, D., Kim, J., Chun, S. Y.: Down-scaling with learned kernels in multi-scale deep neural networks for non-uniform single image deblurring. arXiv preprint arXiv:1903.10157 (2019)
# Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)