<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coupled Feedback Attention Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rong Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chunjiang Duanmu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Mathematics and Computer Science, Zhejiang Normal University</institution>
          ,
          <addr-line>Jin Hua, Zhejiang</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>College of Physics and Electronic Information Engineering, Zhejiang Normal University</institution>
          ,
          <addr-line>Jin Hua, Zhejiang</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>In their daily lives, people frequently need images with a high dynamic range and a high resolution. Due to the limitations of imaging equipment, high dynamic range images are usually produced by multi-exposure fusion (MEF) of low dynamic range images, while high resolution images are frequently obtained by super-resolution (SR) of low resolution images. MEF and SR are usually studied separately. This paper examines existing approaches and proposes a coupled feedback attention network to address the issue that current models cannot achieve a high dynamic range and a high resolution simultaneously.</p>
      </abstract>
      <kwd-group>
        <kwd>channel attention mechanism</kwd>
        <kwd>coupled feedback mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2 Coupled Feedback Attention Network</title>
      <p>on the coupled feedback attention network.
2.1 Basic network structure


network, respectively. The feedback features in each iteration combine the feedback features in the other
network and the shallow features in this network, together as the input of the next iteration, to achieve
the refinement fused features. The coupled-feedback attention layer contains multiple coupled-feedback
blocks and an attention module.</p>
      <p>The extraction of shallow features from the LR images can be expressed as</p>
      <p>$F_{in}^{o} = f_{FEB}(I_{LR}^{o}),\quad F_{in}^{u} = f_{FEB}(I_{LR}^{u})$</p>
      <p>where $f_{FEB}(\cdot)$ denotes the feature extraction block (FEB), which contains two convolutional layers Conv(3,4×m) and Conv(1,m) that are used to extract LR features and compress LR features, respectively. The extracted shallow features are first passed through the SRB to obtain the deep features $F_{SR}^{o}$ and $F_{SR}^{u}$, which can be expressed as</p>
      <p>$F_{SR}^{o} = f_{SRB}(F_{in}^{o}),\quad F_{SR}^{u} = f_{SRB}(F_{in}^{u})$</p>
      <p>where $f_{SRB}(\cdot)$ is the super-resolution module (SRB) operation.</p>
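      <p>For concreteness, the following is a minimal PyTorch sketch of the FEB described above. Only the Conv(3,4×m) and Conv(1,m) layers come from the text; the channel width m, the PReLU activations, and the class name FEB are illustrative assumptions.</p>
      <preformat>
import torch
import torch.nn as nn

class FEB(nn.Module):
    """Feature extraction block sketch: Conv(3, 4*m) followed by Conv(1, m)."""
    def __init__(self, in_channels=3, m=32):
        super().__init__()
        # Conv(3, 4*m): extracts shallow features from the LR input
        self.extract = nn.Conv2d(in_channels, 4 * m, kernel_size=3, padding=1)
        # Conv(1, m): compresses the extracted features to m channels
        self.compress = nn.Conv2d(4 * m, m, kernel_size=1)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.compress(self.act(self.extract(x))))

# F_in_o = FEB()(I_LR_o); a second FEB with its own weights handles the
# under-exposed branch.
      </preformat>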
      <p>Next, the deep exposure features of the two sub-networks are deeply fused over several iterations. At each iteration, the feedback features of the previous iteration are coupled, and together with the shallow features of the respective network they form the input of the current iteration. The feedback features $F_{FB}^{o,t}$ and $F_{FB}^{u,t}$ of the t-th iteration can be expressed as</p>
      <p>$F_{FB}^{o,t} = f_{CFAM}(F_{FB}^{o,t-1}, F_{FB}^{u,t-1}, F_{in}^{o})$</p>
      <p>$F_{FB}^{u,t} = f_{CFAM}(F_{FB}^{u,t-1}, F_{FB}^{o,t-1}, F_{in}^{u})$</p>
      <p>where $f_{CFAM}(\cdot)$ is the operation of the coupled feedback attention module. At the first iteration, $F_{FB}^{o,0}$ and $F_{FB}^{u,0}$ are the outputs $F_{SR}^{o}$ and $F_{SR}^{u}$ of the SRB, respectively.</p>
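      <p>A short sketch of the coupled iteration loop is given below, assuming PyTorch. The number of iterations T and the function names are assumptions; the coupling pattern follows the equations above, and the coupled feedback attention modules themselves are sketched in Sec. 2.2.</p>
      <preformat>
def coupled_feedback(f_in_o, f_in_u, f_sr_o, f_sr_u, cfam_o, cfam_u, T=4):
    """Run T coupled iterations; cfam_o / cfam_u are the coupled feedback
    attention modules of the two branches."""
    fb_o, fb_u = f_sr_o, f_sr_u          # iteration 0: outputs of the SRBs
    outputs_o, outputs_u = [], []
    for _ in range(T):
        # Each branch receives its own previous feedback, the other branch's
        # previous feedback, and its own shallow features.
        new_o = cfam_o(fb_o, fb_u, f_in_o)
        new_u = cfam_u(fb_u, fb_o, f_in_u)
        fb_o, fb_u = new_o, new_u
        outputs_o.append(fb_o)
        outputs_u.append(fb_u)
    return outputs_o, outputs_u
      </preformat>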
      <p>Finally, the output of the coupled feedback attention module of each iteration and the super-resolution features are passed through the channel attention module and then reconstructed by the reconstruction module REC to obtain the SR residual image, which is summed with the up-sampled corresponding LR image to produce the SR image:</p>
      <p>$\hat{F}^{o,t} = f_{CA}(F_{FB}^{o,t}),\quad \hat{F}^{u,t} = f_{CA}(F_{FB}^{u,t})$</p>
      <p>$I_{SR}^{o,t} = f_{REC}(\hat{F}^{o,t}) + f_{UP}(I_{LR}^{o}),\quad I_{SR}^{u,t} = f_{REC}(\hat{F}^{u,t}) + f_{UP}(I_{LR}^{u})$</p>
      <p>where $f_{CA}(\cdot)$ is the channel attention operation, $f_{REC}(\cdot)$ is the reconstruction module, and $f_{UP}(\cdot)$ denotes up-sampling of the LR input.</p>
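      <p>Below is a hedged PyTorch sketch of the per-branch reconstruction step. The internal composition of REC is not specified above, so the deconvolution followed by a Conv(3,3) residual layer and the bicubic up-sampling are assumptions.</p>
      <preformat>
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstruct(nn.Module):
    """REC sketch: up-sample the feedback features, predict an RGB residual,
    and add it to the bicubically up-sampled LR image."""
    def __init__(self, m=32, scale=2):
        super().__init__()
        self.scale = scale
        self.up_feat = nn.ConvTranspose2d(m, m, kernel_size=2 * scale,
                                          stride=scale, padding=scale // 2)
        self.to_rgb = nn.Conv2d(m, 3, kernel_size=3, padding=1)

    def forward(self, feat, lr_image):
        residual = self.to_rgb(self.up_feat(feat))          # SR residual image
        base = F.interpolate(lr_image, scale_factor=self.scale,
                             mode='bicubic', align_corners=False)
        return residual + base                              # SR image
      </preformat>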
      <p>[Fig. 1: Overall architecture of the coupled feedback attention network. The over-exposed and under-exposed LR inputs pass through FEB and SRB, then through iterative coupled CFBs with channel attention (CA); the REC modules and up-sampling paths produce the per-branch SR images, which are combined into the fused SR image.]</p>
    </sec>
    <sec id="sec-3">
      <title>2.2 Coupled Feedback Attention Module</title>
      <p>This section describes the iterative process of the coupled feedback block and the channel attention module.</p>
      <p>As shown in Fig. 2, the coupled feedback attention structure mainly contains iterative convolutional
and deconvolutional layers constituting the CFB, and channel attention gates.</p>
      <p>According to Section 2.1, in the upper sub-network the inputs of the coupled feedback attention module at the t-th iteration are $F_{FB}^{o,t-1}$, $F_{FB}^{u,t-1}$ and $F_{in}^{o}$. Firstly, channel compression is performed through the convolutional layer Conv(1,m) to obtain the input $L^{t}(0)$ of the coupled feedback attention module:</p>
      <p>$L^{t}(0) = f_{1\times1}([F_{FB}^{o,t-1}, F_{FB}^{u,t-1}, F_{in}^{o}])$</p>
      <p>Next, the features pass through multiple working groups consisting of convolutional and deconvolutional layers. The HR feature $H^{t}(n)$ of the n-th working group in the t-th iteration can be expressed as</p>
      <p>$H^{t}(n) = f_{deconv}([L^{t}(0), L^{t}(1), \dots, L^{t}(n-1)])$</p>
      <p>where $f_{deconv}(\cdot)$ is the deconvolution layer Deconv(3,m); the HR features are generated by jointly up-sampling the LR features of the first n-1 working groups. Similarly, the LR feature $L^{t}(n)$ can be expressed as</p>
      <p>$L^{t}(n) = f_{conv}([H^{t}(1), H^{t}(2), \dots, H^{t}(n)])$</p>
      <p>where $f_{conv}(\cdot)$ is the convolutional layer Conv(3,m).</p>
      <p>The output of the final N-th working group is generated from the joint LR features of the N working groups passing through the convolution layer Conv(1,m):</p>
      <p>$F_{FB}^{o,t} = f_{1\times1}([L^{t}(1), L^{t}(2), \dots, L^{t}(N)])$</p>
      <p>The process of the extreme low exposure branch is the same.</p>
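      <p>The following is a minimal PyTorch sketch of one CFB. The channel width m, the number of working groups N, and the strides used to move between LR and HR resolutions are assumptions; the concatenation pattern follows the equations above.</p>
      <preformat>
import torch
import torch.nn as nn

class CFB(nn.Module):
    """Coupled feedback block sketch: N working groups of Deconv(3, m) /
    Conv(3, m), preceded and followed by Conv(1, m) compression."""
    def __init__(self, m=32, n_groups=4, scale=2):
        super().__init__()
        # Conv(1, m) on the concatenated [own feedback, other feedback, shallow]
        self.compress_in = nn.Conv2d(3 * m, m, kernel_size=1)
        self.up, self.down = nn.ModuleList(), nn.ModuleList()
        for g in range(n_groups):
            # Deconv(3, m): joint LR features L(0)..L(g) to HR feature H(g+1)
            self.up.append(nn.ConvTranspose2d((g + 1) * m, m, kernel_size=3,
                                              stride=scale, padding=1,
                                              output_padding=scale - 1))
            # Conv(3, m): joint HR features H(1)..H(g+1) to LR feature L(g+1)
            self.down.append(nn.Conv2d((g + 1) * m, m, kernel_size=3,
                                       stride=scale, padding=1))
        # Conv(1, m) on the joint LR features L(1)..L(N)
        self.compress_out = nn.Conv2d(n_groups * m, m, kernel_size=1)

    def forward(self, fb_self, fb_other, shallow):
        l0 = self.compress_in(torch.cat([fb_self, fb_other, shallow], dim=1))
        lr_feats, hr_feats = [l0], []
        for g in range(len(self.up)):
            hr_feats.append(self.up[g](torch.cat(lr_feats, dim=1)))
            lr_feats.append(self.down[g](torch.cat(hr_feats, dim=1)))
        return self.compress_out(torch.cat(lr_feats[1:], dim=1))
      </preformat>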
      <sec id="sec-3-1">
        <title>The feedback features</title>
        <p>and</p>
        <p>are output from each iteration, go through the channel attention
module CA for feature optimization. The CA in this paper consists of three steps, which are global
information compression, scaling and excitation, and recalibration.</p>
        <p>1）</p>
        <sec id="sec-3-1-1">
          <title>Global information compression In order to obtain the global information of each channel, this paper represents the feature values of each channel by global averaging pooling:</title>
          <p>=  × 
 =  × 
1
1


(,  )
(,  )
2）</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Squeeze and excitation</title>
          <p>and compresses the multiple channels into a one-dimensional feature tensor.</p>
          <p>where 
(,  ) and</p>
          <p>(,  ) are the values at each position in the output extreme exposure feature,</p>
          <p>In order to more fully explore the dependencies between individual channels, the paper introduces a
gate mechanism for learning the nonlinear mapping between each channel and uses a sigmoid activation
function to avoid the formation of adversarial relationships between channels, which can be expressed
as
 = (
 = (
(  ))
(  ))</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Where</title>
        <p>3）
and</p>
        <sec id="sec-3-2-1">
          <title>Recalibration</title>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>The original input features</title>
        <p>are the convolutional layer weights.</p>
        <p>individual channels are scaled by the channel attention weight
matrix just learned, thus enhancing useful features and suppressing useless features:</p>
      </sec>
      <sec id="sec-3-4">
        <title>Where</title>
        <p>and 
are the channel attention weights of the previous iteration.
 =
 =
SR fused
image
Channel
Gate
CFB
 × ( + 1)
 × 


× (
×</p>
      </sec>
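      <p>A compact PyTorch sketch of the CA gate described by the three steps above is given below. The reduction ratio r, the 1×1 convolution form of W1 and W2, and the intermediate ReLU are assumptions.</p>
      <preformat>
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CA gate sketch: global average pooling, a two-layer gate with sigmoid,
    and channel-wise recalibration."""
    def __init__(self, m=32, r=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # global information compression
        self.w1 = nn.Conv2d(m, m // r, kernel_size=1)   # squeeze
        self.w2 = nn.Conv2d(m // r, m, kernel_size=1)   # excitation
        self.act = nn.ReLU(inplace=True)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        z = self.pool(x)                                # z_c: mean over H x W
        s = self.gate(self.w2(self.act(self.w1(z))))    # s = sigmoid(W2 act(W1 z))
        return x * s                                    # recalibrate each channel
      </preformat>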
    </sec>
    <sec id="sec-4">
      <title>2.3 Loss Function</title>
      <p>The method in this paper performs both image super-resolution and image multi-exposure fusion, so the model is optimized with a hierarchical loss function, which is expressed as</p>
      <p>$L = \alpha\, l(I_{SR}^{o}, I_{HR}^{o}) + \beta\, l(I_{SR}^{u}, I_{HR}^{u}) + \gamma\,( l(I_{F}^{o}, I_{HR}^{F}) + l(I_{F}^{u}, I_{HR}^{F}) )$</p>
      <p>where $I_{HR}^{o}$ and $I_{HR}^{u}$ are the HR standard images with extreme exposures, $I_{HR}^{F}$ is the HDR, HR standard image that is the target of the final fused image, and $I_{F}^{o}$ and $I_{F}^{u}$ are the fused outputs of the two branches. $\alpha$, $\beta$ and $\gamma$ are the weight coefficients of each loss part; in this paper we set $\alpha = \beta = \gamma = 1$.</p>
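      <p>A hedged PyTorch sketch of the hierarchical loss follows; the use of the L1 distance as $l(\cdot,\cdot)$ and the variable names are assumptions, while the equal weights follow the setting $\alpha = \beta = \gamma = 1$ above.</p>
      <preformat>
import torch
import torch.nn.functional as F

def hierarchical_loss(sr_o, sr_u, fused_o, fused_u, hr_o, hr_u, hr_fused,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of per-branch SR losses and fusion losses."""
    loss_sr = alpha * F.l1_loss(sr_o, hr_o) + beta * F.l1_loss(sr_u, hr_u)
    loss_fuse = gamma * (F.l1_loss(fused_o, hr_fused) + F.l1_loss(fused_u, hr_fused))
    return loss_sr + loss_fuse
      </preformat>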
    </sec>
    <sec id="sec-5">
      <title>3 Experiment and Analysis</title>
    </sec>
    <sec id="sec-6">
      <title>3.1 Experiment Establishment</title>
      <sec id="sec-6-1">
        <title>1）Experimental setup</title>
        <p>In this paper, the training model was trained on a GeForce GTX 1070Ti. The experiments mainly use the SICE [<xref ref-type="bibr" rid="ref5">5</xref>] dataset, which contains 589 high-quality reference images and their corresponding multi-exposure image sequences; only the extreme exposures are used in this paper.</p>
      </sec>
      <sec id="sec-6-2">
        <title>2）Comparison Method</title>
        <p>
          The network model proposed in this paper achieves both image super-resolution and image exposure fusion, so we combine current image super-resolution methods and image exposure fusion methods as comparison methods. The image super-resolution methods are DBPN[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], RCAN[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], SRFBN[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and
SwinIR[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and the main image exposure fusion methods are MGFF [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], FAST SPD-MEF [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], MEF-Net
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and U2Fusion [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We combined the SR methods and MEF methods and changed their order, i.e., SR+MEF or MEF+SR, to generate 32 comparison methods. CF-Net [<xref ref-type="bibr" rid="ref1">1</xref>] was also selected for comparison.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.2 Objective evaluation</title>
      <p>In order to verify the effectiveness of the method in this paper at a magnification factor of 2, we use the SICE dataset and compare our method with other advanced methods. These comparison methods are combinations of an SR method and an MEF method. Table 1 shows the results of our method and the comparison methods at a magnification factor of 2 under three metrics.</p>
      <p>In Table 1, the best value of each fusion quality index is highlighted in bold and the second-best value is underlined. From Table 1, we can see that the method of this paper has the best fusion effect, ranking first among the 34 methods in all metrics. Compared with the second-place CF-Net method, PSNR is improved by 0.25 dB, SSIM by 0.0028, and MEF-SSIM by 0.0005.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Subjective evaluation</title>
    </sec>
    <sec id="sec-9">
      <title>4 Conclusion</title>
      <p>Based on the powerful image reconstruction property of the feedback mechanism and the ability of the channel attention mechanism to distinguish the importance of features, this paper proposes a coupled feedback attention network to solve the image super-resolution and image exposure fusion problems simultaneously. The experimental results show that the proposed algorithm retains the detailed information of edges, region boundaries, and textures of the original image sequence.</p>
    </sec>
    <sec id="sec-10">
      <title>5 References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Deng</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>M.</given-names>
          </string-name>
          , et al.
          <article-title>Deep coupled feedback network for joint exposure fusion and image super-resolution[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2021</year>
          ,
          <volume>30</volume>
          :
          <fpage>3098</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Li</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>J. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Z.</given-names>
          </string-name>
          , et al.
          <article-title>Feedback Network for Image Super-Resolution[C]</article-title>
          .
          <source>IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Haris</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhnarovich</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ukita</surname>
            <given-names>N.</given-names>
          </string-name>
          , et al.
          <article-title>Deep back-projection networks for single image super-resolution[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          .
          <year>2021</year>
          ,
          <volume>43</volume>
          (
          <issue>12</issue>
          ):
          <fpage>4323</fpage>
          -
          <lpage>4337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>K. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>K.</given-names>
          </string-name>
          , et al.
          <article-title>Image Super-Resolution Using Very Deep Residual Channel Attention Networks[C]</article-title>
          .
          <source>European Conference on Computer Vision</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cai</surname>
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            <given-names>S. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>L.</given-names>
          </string-name>
          .
          <article-title>Learning a deep single image contrast enhancer from multi-exposure images[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2018</year>
          ,
          <volume>27</volume>
          (
          <issue>4</issue>
          ):
          <fpage>2049</fpage>
          -
          <lpage>2062</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Li</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>K. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yong</surname>
            <given-names>H. W.</given-names>
          </string-name>
          , et al.
          <article-title>Fast multi-scale structural patch decomposition for multi-exposure image fusion[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2020</year>
          ,
          <volume>29</volume>
          :
          <fpage>5805</fpage>
          -
          <lpage>5816</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Xu</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            <given-names>J. J.</given-names>
          </string-name>
          , et al.
          <article-title>U2Fusion: A unified unsupervised image fusion network[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          .
          <year>2022</year>
          ,
          <volume>44</volume>
          (
          <issue>1</issue>
          ):
          <fpage>502</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ma</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duanmu</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            <given-names>Y.</given-names>
          </string-name>
          , et al.
          <article-title>Deep guided learning for fast multi-exposure image fusion[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2020</year>
          ,
          <volume>29</volume>
          :
          <fpage>2808</fpage>
          -
          <lpage>2819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Liang</surname>
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            <given-names>J. Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>G. L.</given-names>
          </string-name>
          , et al.
          <article-title>SwinIR: Image Restoration Using Swin Transformer[C]</article-title>
          .
          <source>IEEE/CVF International Conference on Computer Vision Workshops</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Bavirisetti</surname>
            <given-names>D. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            <given-names>J. H.</given-names>
          </string-name>
          , et al.
          <article-title>Multi-scale guided image and video fusion: a fast and efficient approach</article-title>
          [J].
          <source>Circuits Systems and Signal Processing</source>
          .
          <year>2019</year>
          ,
          <volume>38</volume>
          (
          <issue>12</issue>
          ):
          <fpage>5576</fpage>
          -
          <lpage>5605</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>