=Paper=
{{Paper
|id=Vol-3351/paper02
|storemode=property
|title=Coupled Feedback Attention Networks
|pdfUrl=https://ceur-ws.org/Vol-3351/paper02.pdf
|volume=Vol-3351
|authors=Rong Wang,Chunjiang Duanmu
|dblpUrl=https://dblp.org/rec/conf/aiotc/WangD22
}}
==Coupled Feedback Attention Networks==
Rong Wang¹*, Chunjiang Duanmu²
¹ College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang, China
² College of Physics and Electronic Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang, China
Abstract
In daily life, people frequently need images with both a high dynamic range and a high resolution. Due to the limitations of imaging equipment, high dynamic range images are produced by multi-exposure fusion (MEF) of low dynamic range images, while high resolution images are frequently obtained by super-resolution (SR) of low resolution images. MEF and SR are usually studied separately. This paper examines existing approaches and proposes a coupled feedback attention network, together with a method based on it, to address the issue that current models cannot achieve high dynamic range and high resolution simultaneously.
Keywords
channel attention mechanism; coupled feedback mechanism
1 Introduction
High dynamic range (HDR) images contain a broader dynamic range and richer texture features than typical low dynamic range (LDR) and low resolution (LR) images, and high resolution (HR) images can enhance object detection accuracy. The technical methods to obtain HDR images and HR images are, respectively, multi-exposure image fusion (MEF) and single image super-resolution (SISR).
By fusing two LDR images with extreme exposures, extreme-exposure image fusion creates an HDR image. Ma et al. [8] provided a fast approach for fusing multi-exposure images that refines the initial weights with a guided filter. Later, Xu et al. [7] proposed a unified unsupervised fusion method that overcomes the fusion barrier of most image types by constraining the similarity between the fused image and the original images.
With the continuous development of deep neural networks, many CNN-based methods have been proposed in the field of SISR. RCAN [4] introduces an attention mechanism to further improve reconstruction quality. SRFBN [2] introduces a feedback structure that refines shallow features through iteration to produce deeper features.
The above MEF and SISR methods solve the LDR and LR problems separately, but in real life people often need to view HDR, HR images on phones or televisions, so joint MEF and SR methods are necessary. This paper proposes an image exposure fusion and super-resolution method based on a coupled feedback attention network, which effectively suppresses the accumulation of redundant information across the cyclic iterations and improves parameter sharing as well as the propagation of exposure features.
2 Coupled Feedback Attention Network
To suppress the propagation of redundant features and enhance the propagation of useful features in the coupled feedback network, this paper combines a channel attention mechanism with the feedback mechanism and proposes an image exposure fusion and image super-resolution method based
on the coupled feedback attention network.
2.1 Basic network structure
The structure of the coupled feedback network is shown in Fig. 1. The shallow features $F_o$ and $F_u$ (subscripts $o$ and $u$ denote the over-exposed and under-exposed branches throughout) go through $T$ rounds of iteration in the coupled feedback attention module of the upper and lower sub-network, respectively. In each iteration the feedback features are combined with the feedback features from the other sub-network and the shallow features of the sub-network itself as the input of the next iteration, so that the fused features are progressively refined. The coupled feedback attention layer contains multiple coupled feedback blocks and an attention module.
The extraction of the shallow features $F_o$ and $F_u$ from the LR images can be expressed as
$$F_o = f_{FEB}(I_o^{LR}), \qquad F_u = f_{FEB}(I_u^{LR})$$
where $f_{FEB}$ contains two convolutional layers, Conv(3, 4×m) and Conv(1, m), which extract the LR features and compress them, respectively. The extracted shallow features are first passed through the super-resolution block (SRB) to obtain the deep features $G_o^0$ and $G_u^0$:
$$G_o^0 = f_{SRB}(F_o), \qquad G_u^0 = f_{SRB}(F_u)$$
where $f_{SRB}$ is the SRB operation.
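A minimal PyTorch sketch of this feature extraction step is given below. The module names `FEB` and `SRB`, the channel width `m`, and the residual design of the SRB are illustrative assumptions; the paper does not publish code.

```python
import torch
import torch.nn as nn

class FEB(nn.Module):
    """Feature extraction block: Conv(3, 4*m) extracts LR features,
    then Conv(1, m) compresses them, as described above."""
    def __init__(self, in_ch=3, m=32):
        super().__init__()
        self.extract = nn.Conv2d(in_ch, 4 * m, kernel_size=3, padding=1)
        self.compress = nn.Conv2d(4 * m, m, kernel_size=1)

    def forward(self, I_lr):
        return self.compress(self.extract(I_lr))

class SRB(nn.Module):
    """Super-resolution block producing the initial deep features G^0;
    the internal residual design here is an assumption."""
    def __init__(self, m=32, n_blocks=2):
        super().__init__()
        layers = []
        for _ in range(n_blocks):
            layers += [nn.Conv2d(m, m, 3, padding=1), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, F_shallow):
        return F_shallow + self.body(F_shallow)  # residual connection

# Shallow and deep features for both exposure branches:
feb, srb = FEB(), SRB()
I_over = torch.randn(1, 3, 48, 48)   # over-exposed LR input
I_under = torch.randn(1, 3, 48, 48)  # under-exposed LR input
F_o, F_u = feb(I_over), feb(I_under)
G_o0, G_u0 = srb(F_o), srb(F_u)
```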
Next, the deep exposure features of the two sub-networks are deeply fused over several iterations. At each iteration, the feedback features of the previous iteration from both networks are coupled with the shallow features $F_o$ and $F_u$ of the respective network as the input of the current iteration; the feedback features $C_o^t$ and $C_u^t$ of the $t$-th iteration can be expressed as
$$C_o^t = f_{CFAB}(F_o, G_o^{t-1}, G_u^{t-1}), \qquad C_u^t = f_{CFAB}(F_u, G_u^{t-1}, G_o^{t-1})$$
where $f_{CFAB}$ is the operation of the coupled feedback attention module. At the first iteration, $G_o^0$ and $G_u^0$ are the outputs of the SRB.
Finally, at each iteration, the output of the coupled feedback attention module, i.e. the super-resolution features after the channel attention module, is reconstructed by the reconstruction module REC to obtain the SR residual image, which is then summed with the upsampled LR image to produce the SR image:
$$I_o^{SR,t} = f_{REC}(C_o^t) + f_{UP}(I_o^{LR}), \qquad I_u^{SR,t} = f_{REC}(C_u^t) + f_{UP}(I_u^{LR})$$
where $f_{REC}$ and $f_{UP}$ denote the reconstruction and upsampling operations.
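The overall per-iteration data flow of the two branches can be sketched as follows. This is a schematic under the equations above: `cfab` and `rec` stand for hypothetical implementations of the coupled feedback attention module and the reconstruction module, and bilinear upsampling is assumed for $f_{UP}$.

```python
import torch.nn.functional as F_nn

def forward_branch_pair(F_o, F_u, G_o, G_u, I_o_lr, I_u_lr, cfab, rec, T=4, scale=2):
    """T coupled iterations; each step feeds the other branch's previous
    feedback features back in, then reconstructs an SR residual image."""
    sr_o, sr_u = [], []
    # Upsampled LR inputs for global residual learning.
    up_o = F_nn.interpolate(I_o_lr, scale_factor=scale, mode='bilinear', align_corners=False)
    up_u = F_nn.interpolate(I_u_lr, scale_factor=scale, mode='bilinear', align_corners=False)
    for t in range(T):
        # Couple the other branch's previous feedback features with this
        # branch's shallow features as the input of the current iteration.
        C_o = cfab(F_o, G_o, G_u)
        C_u = cfab(F_u, G_u, G_o)
        G_o, G_u = C_o, C_u  # feedback for the next iteration
        sr_o.append(rec(C_o) + up_o)  # SR residual + upsampled LR image
        sr_u.append(rec(C_u) + up_u)
    return sr_o, sr_u  # one SR image per iteration and branch
```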
Figure 1 Coupled feedback attention network (C: concatenation; WS: weighted sum; each branch: FEB → SRB → iterated CFBs → CA → REC, with upsampled LR inputs added to the reconstructed outputs)
2.2 Coupled Feedback Attention Module
This section describes the iterative process of the coupled feedback block (CFB) and the channel attention module.
As shown in Fig. 2, the coupled feedback attention structure mainly consists of the iterative convolutional and deconvolutional layers that constitute the CFB, together with channel attention gates.
As described in Section 2.1, in the upper sub-network the inputs of the coupled feedback attention module are $G_o^{t-1}$, $G_u^{t-1}$ and $F_o$. First, channel compression is performed by the convolutional layer Conv(1, m) to obtain the input $L_o^t(0)$ of the coupled feedback attention module:
$$L_o^t(0) = f_{1\times1}([G_o^{t-1}, G_u^{t-1}, F_o])$$
where $[\cdot]$ denotes channel-wise concatenation.
Next come multiple working groups consisting of convolutional and deconvolutional layers. The HR feature $H_o^t(n)$ of the $n$-th working group in the $t$-th iteration can be expressed as
$$H_o^t(n) = f_{deconv}([L_o^t(0), L_o^t(1), \ldots, L_o^t(n-1)])$$
where $f_{deconv}$ is the deconvolution layer Deconv(3, m); the HR features are generated by jointly upsampling the LR features of the first $n-1$ working groups. Similarly, the LR features $L_o^t(n)$ can be expressed as
$$L_o^t(n) = f_{conv}([H_o^t(1), H_o^t(2), \ldots, H_o^t(n)])$$
where $f_{conv}$ is the convolutional layer Conv(3, m).
The output of the final, $N$-th working group is generated from the joint LR features of all $N$ working groups, passed through the convolution layer Conv(1, m):
$$G_o^t = f_{1\times1}([L_o^t(1), L_o^t(2), \ldots, L_o^t(N)])$$
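A sketch of the CFB, patterned after the feedback block of SRFBN [2] that the equations above follow, might look like this; the 1×1 compressions before each projection (needed because the dense concatenations grow the channel count) and the projection kernel sizes are implementation assumptions.

```python
import torch
import torch.nn as nn

class CFB(nn.Module):
    """Coupled feedback block: N working groups of deconv (LR -> HR) and
    conv (HR -> LR) layers with dense skips, as in the equations above."""
    def __init__(self, m=32, n_groups=4, scale=2):
        super().__init__()
        k, s, p = (6, 2, 2) if scale == 2 else (8, 4, 2)  # projection kernels (assumption)
        self.compress_in = nn.Conv2d(3 * m, m, 1)  # fuses [G_own^{t-1}, G_other^{t-1}, F]
        self.up_compress = nn.ModuleList(nn.Conv2d((n + 1) * m, m, 1) for n in range(n_groups))
        self.ups = nn.ModuleList(nn.ConvTranspose2d(m, m, k, s, p) for _ in range(n_groups))
        self.down_compress = nn.ModuleList(nn.Conv2d((n + 1) * m, m, 1) for n in range(n_groups))
        self.downs = nn.ModuleList(nn.Conv2d(m, m, k, s, p) for _ in range(n_groups))
        self.compress_out = nn.Conv2d(n_groups * m, m, 1)  # Conv(1, m) over L(1..N)

    def forward(self, G_prev_own, G_prev_other, F_shallow):
        L = [self.compress_in(torch.cat([G_prev_own, G_prev_other, F_shallow], dim=1))]  # L(0)
        H = []
        for n in range(len(self.ups)):
            H.append(self.ups[n](self.up_compress[n](torch.cat(L, dim=1))))      # H(n): LR -> HR
            L.append(self.downs[n](self.down_compress[n](torch.cat(H, dim=1))))  # L(n): HR -> LR
        return self.compress_out(torch.cat(L[1:], dim=1))  # G^t from L(1..N)
```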
The above describes the iterative process of the extremely high-exposure branch; the iterative process of the extremely low-exposure branch is the same.
The feedback features $G_o^t$ and $G_u^t$ output at each iteration go through the channel attention (CA) module for feature optimization. The CA in this paper consists of three steps: global information compression, squeeze and excitation, and recalibration.
1) Global information compression
To obtain the global information of each channel, this paper represents the feature value of each channel by global average pooling:
$$z_o = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} G_o^t(i, j), \qquad z_u = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} G_u^t(i, j)$$
where $G_o^t(i, j)$ and $G_u^t(i, j)$ are the values at each spatial position of the output extreme-exposure features; the pooling compresses the channels into a one-dimensional feature tensor.
2) Squeeze and excitation
To more fully exploit the dependencies between individual channels, this paper introduces a gate mechanism that learns a nonlinear mapping between the channels and uses a sigmoid activation function to avoid forming adversarial relationships between channels:
$$s_o = \sigma(W_2\,\delta(W_1 z_o)), \qquad s_u = \sigma(W_2\,\delta(W_1 z_u))$$
where $W_1$ and $W_2$ are the convolutional layer weights, $\delta$ is the activation function, and $\sigma$ is the sigmoid function.
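Steps 1) and 2) together form a squeeze-and-excitation style gate. A minimal sketch follows, assuming a channel reduction ratio `r` between the two weight layers (a standard SE-block choice that the text does not specify).

```python
import torch.nn as nn

class ChannelGate(nn.Module):
    """Global average pooling (squeeze) followed by two 1x1 convolutions
    with a sigmoid gate (excitation), yielding one weight per channel."""
    def __init__(self, m=32, r=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)         # z: (B, m, 1, 1)
        self.W1 = nn.Conv2d(m, m // r, kernel_size=1)  # channel reduction
        self.W2 = nn.Conv2d(m // r, m, kernel_size=1)  # channel restoration
        self.act = nn.ReLU(inplace=True)
        self.gate = nn.Sigmoid()

    def forward(self, G):
        z = self.squeeze(G)                            # global information compression
        s = self.gate(self.W2(self.act(self.W1(z))))
        return s                                       # per-channel weights in (0, 1)
```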
3) Recalibration
The individual channels of the original input features $G_o^t$ and $G_u^t$ are scaled by the channel attention weights just learned, enhancing useful features and suppressing useless ones:
$$C_o^t = \begin{cases} G_o^t \times (s_o^t + 1), & t = 1 \\ G_o^{t-1} \times s_o^{t-1} + G_o^t \times (s_o^t + 1), & t > 1 \end{cases}$$
$$C_u^t = \begin{cases} G_u^t \times (s_u^t + 1), & t = 1 \\ G_u^{t-1} \times s_u^{t-1} + G_u^t \times (s_u^t + 1), & t > 1 \end{cases}$$
where $s_o^{t-1}$ and $s_u^{t-1}$ are the channel attention weights of the previous iteration.
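The recalibration can then be sketched as below; the only state carried across iterations is the previous features and their attention weights, and the helper name `recalibrate` is ours.

```python
def recalibrate(G_t, s_t, G_prev=None, s_prev=None):
    """C^t = G^t * (s^t + 1) at t = 1; for t > 1 the previous iteration's
    features, scaled by their own attention weights, are added in."""
    C_t = G_t * (s_t + 1.0)
    if G_prev is not None and s_prev is not None:  # t > 1
        C_t = C_t + G_prev * s_prev
    return C_t

# Per-branch usage inside the iteration loop (schematic):
# s_t = channel_gate(G_t)
# C_t = recalibrate(G_t, s_t, G_prev, s_prev)
# G_prev, s_prev = G_t, s_t
```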
Figure 2 Coupled feedback attention structure (each iteration: LR coupled features → CFB → channel gate → SR fused image)
2.3 Loss Function
The method in this paper performs image super-resolution and multi-exposure image fusion jointly, so the model is optimized with a hierarchical loss function, expressed as
$$L = \lambda_o L_1(I_o^{SR}, I_o^{HR}) + \lambda_u L_1(I_u^{SR}, I_u^{HR}) + \sum_{t=1}^{T} \lambda_t \left( L_1(I_{f,o}^{t}, I_f^{HR}) + L_1(I_{f,u}^{t}, I_f^{HR}) \right)$$
where $I_o^{HR}$ and $I_u^{HR}$ are the HR standard images with extreme exposures, $I_f^{HR}$ is the HDR, HR standard image, which is the target of the final fused image, and $I_{f,o}^{t}$ and $I_{f,u}^{t}$ are the fused SR images of the two branches at iteration $t$. $\lambda_o$, $\lambda_u$ and $\{\lambda_t\}$ are the weight coefficients of the loss terms; in this paper we set $\lambda_o = \lambda_u = \lambda_t = 1$.
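Assuming the network returns per-iteration SR outputs for both branches and per-iteration fused images, and taking $L_1$ as the pixel-wise L1 distance, a sketch of this hierarchical loss is given below; supervising the exposure SR terms only at the final iteration is our assumption.

```python
import torch.nn.functional as F

def hierarchical_loss(sr_o, sr_u, fused_o, fused_u, I_o_hr, I_u_hr, I_f_hr,
                      lam_o=1.0, lam_u=1.0, lam_t=1.0):
    """sr_* and fused_* are lists with one tensor per iteration t = 1..T."""
    # Per-branch SR terms against the extreme-exposure HR standard images.
    loss = lam_o * F.l1_loss(sr_o[-1], I_o_hr) + lam_u * F.l1_loss(sr_u[-1], I_u_hr)
    # Fusion terms against the HDR, HR standard image at every iteration.
    for f_o, f_u in zip(fused_o, fused_u):
        loss = loss + lam_t * (F.l1_loss(f_o, I_f_hr) + F.l1_loss(f_u, I_f_hr))
    return loss
```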
3 Experiment and Analysis
3.1 Experiment Establishment
1) Experimental setup
In this paper, the model was trained on a GeForce GTX 1070 Ti. The experiments mainly use the SICE [5] dataset, which contains 589 high-quality reference images and their corresponding multi-exposure image sequences; only the extreme-exposure images are used in this paper.
2) Comparison methods
The network model proposed in this paper performs both image super-resolution and image exposure fusion, so we combine current image super-resolution and image exposure fusion methods as comparison methods. The image super-resolution methods are DBPN [3], RCAN [4], SRFBN [2], and SwinIR [9]; the main image exposure fusion methods are MGFF [10], Fast SPD-MEF [6], MEF-Net [8], and U2Fusion [7]. We combined the SR and MEF methods in both orders, i.e., SR+MEF and MEF+SR, to generate 32 comparison methods. CF-Net [1] was also selected for comparison.
3.2 Objective evaluation
To verify the effectiveness of the proposed method at a magnification factor of 2, we use the SICE dataset and compare against other advanced methods, each obtained by combining an SR method with an MEF method. Table 1 shows the results of our method and the comparison methods at a magnification factor of 2 under three metrics.
In Table 1, the best value of each fusion quality metric is highlighted in bold and the second-ranked value is underlined. Table 1 shows that the method of this paper achieves the best fusion results, ranking first among the 34 methods on all metrics: compared with the second-place CF-Net, PSNR improves by 0.25 dB, SSIM by 0.0028, and MEF-SSIM by 0.0005.
Table 1. Comparison of the fusion results at a magnification factor of 2 (each cell: PSNR / SSIM / MEF-SSIM)

Super Resolution + Image Fusion:

| SR \ MEF | MGFF [10] | Fast SPD-MEF [6] | MEF-Net [8] | U2Fusion [7] |
|---|---|---|---|---|
| DBPN [3] | 17.47 dB / 0.7434 / 0.9121 | 17.30 dB / 0.7615 / 0.8976 | 17.26 dB / 0.7660 / 0.8888 | 17.83 dB / 0.7423 / 0.8807 |
| RCAN [4] | 17.39 dB / 0.7406 / 0.9114 | 17.34 dB / 0.7618 / 0.8974 | 17.24 dB / 0.7653 / 0.8882 | 17.85 dB / 0.7409 / 0.8804 |
| SRFBN [2] | 17.48 dB / 0.7425 / 0.9130 | 17.34 dB / 0.7601 / 0.8983 | 17.29 dB / 0.7641 / 0.8895 | 17.84 dB / 0.7402 / 0.8811 |
| SwinIR [9] | 17.44 dB / 0.7436 / 0.9113 | 17.26 dB / 0.7618 / 0.8968 | 17.23 dB / 0.7667 / 0.8881 | 17.82 dB / 0.7436 / 0.8802 |

Image Fusion + Super Resolution:

| MEF \ SR | DBPN [3] | RCAN [4] | SRFBN [2] | SwinIR [9] |
|---|---|---|---|---|
| MGFF [10] | 17.27 dB / 0.7161 / 0.9144 | 17.18 dB / 0.7122 / 0.9135 | 17.38 dB / 0.7218 / 0.9158 | 17.19 dB / 0.7135 / 0.9131 |
| Fast SPD-MEF [6] | 17.26 dB / 0.7554 / 0.8954 | 17.24 dB / 0.7533 / 0.8949 | 17.31 dB / 0.7557 / 0.8962 | 17.21 dB / 0.7546 / 0.8944 |
| MEF-Net [8] | 17.25 dB / 0.7636 / 0.8886 | 17.23 dB / 0.7624 / 0.8882 | 17.27 dB / 0.7630 / 0.8892 | 17.20 dB / 0.7629 / 0.8878 |
| U2Fusion [7] | 17.81 dB / 0.7384 / 0.8843 | 17.82 dB / 0.7368 / 0.8837 | 17.85 dB / 0.7395 / 0.8850 | 17.76 dB / 0.7374 / 0.8835 |

CF-Net [1]: PSNR = 21.24 dB, SSIM = 0.8140, MEF-SSIM = 0.9332
Ours: PSNR = 21.49 dB, SSIM = 0.8168, MEF-SSIM = 0.9337
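For reference, the PSNR reported in Table 1 follows the standard definition sketched below; SSIM and MEF-SSIM are typically computed with existing image quality implementations rather than reimplemented.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```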
3.3 Subjective evaluation
Fig. 3 visually compares the fused images produced by our method and other advanced methods at a magnification factor of 2. The experimental results show that, compared with the SR+MEF and MEF+SR methods, our method achieves a clear improvement in detail, and compared with the coupled feedback network, it alleviates the redundant information that the coupled feedback mechanism introduces into the image.
Figure 3 Comparison of different methods on "landscape" at 2×: (a) over-exposed input, (b) under-exposed input, (c) DBPN+Fast SPD-MEF, (d) Fast SPD-MEF+RCAN, (e) MEF-Net+DBPN, (f) MGFF+SRFBN, (g) RCAN+U2Fusion, (h) SRFBN+MGFF, (i) SwinIR+MEF-Net, (j) U2Fusion+RCAN, (k) CF-Net, (l) Ours
4 Conclusion
Building on the powerful image reconstruction capability of the feedback mechanism and the ability of the channel attention mechanism to distinguish the importance of features, this paper proposes a coupled feedback attention network that solves the image super-resolution and image exposure fusion problems simultaneously. The experimental results show that the proposed algorithm retains the detailed information of edges, region boundaries, and textures of the original image sequence.
5 References
[1] Deng X., Zhang Y. T., Xu M., et al. Deep coupled feedback network for joint exposure fusion and image super-resolution[J]. IEEE Transactions on Image Processing. 2021, 30: 3098-3112.
[2] Li Z., Yang J. L., Liu Z., et al. Feedback network for image super-resolution[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[3] Haris M., Shakhnarovich G., Ukita N. Deep back-projection networks for single image super-resolution[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021, 43(12): 4323-4337.
[4] Zhang Y. L., Li K. P., Li K., et al. Image super-resolution using very deep residual channel attention networks[C]. European Conference on Computer Vision. 2018.
[5] Cai J. R., Gu S. H., Zhang L. Learning a deep single image contrast enhancer from multi-exposure images[J]. IEEE Transactions on Image Processing. 2018, 27(4): 2049-2062.
[6] Li H., Ma K. D., Yong H. W., et al. Fast multi-scale structural patch decomposition for multi-exposure image fusion[J]. IEEE Transactions on Image Processing. 2020, 29: 5805-5816.
[7] Xu H., Ma J. Y., Jiang J. J., et al. U2Fusion: a unified unsupervised image fusion network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022, 44(1): 502-518.
[8] Ma K., Duanmu Z., Zhu H., et al. Deep guided learning for fast multi-exposure image fusion[J]. IEEE Transactions on Image Processing. 2020, 29: 2808-2819.
[9] Liang J. Y., Cao J. Z., Sun G. L., et al. SwinIR: image restoration using Swin Transformer[C]. IEEE/CVF International Conference on Computer Vision Workshops. 2021.
[10] Bavirisetti D. P., Xiao G., Zhao J. H., et al. Multi-scale guided image and video fusion: a fast and efficient approach[J]. Circuits, Systems, and Signal Processing. 2019, 38(12): 5576-5605.