<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coupled Feedback Attention Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rong Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chunjiang Duanmu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Mathematics and Computer Science, Zhejiang Normal University</institution>
          ,
          <addr-line>Jin Hua, Zhejiang</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>College of Physics and Electronic Information Engineering, Zhejiang Normal University</institution>
          ,
          <addr-line>Jin Hua, Zhejiang</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>In their daily lives, people frequently need images with a high dynamic range and a high resolution. Due to the limitations of imaging equipment, high dynamic range images are usually produced by multi-exposure fusion (MEF) of low dynamic range images, while high resolution images are frequently obtained by super-resolution (SR) of low resolution images. MEF and SR are usually studied separately. This paper examines existing approaches and proposes a coupled feedback attention network to address the issue that current models cannot achieve a high dynamic range and a high resolution simultaneously.</p>
      </abstract>
      <kwd-group>
        <kwd>channel attention mechanism</kwd>
        <kwd>coupled feedback mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2 Coupled Feedback Attention Network</title>
      <p>on the coupled feedback attention network.
2.1 Basic network structure


network, respectively. The feedback features in each iteration combine the feedback features in the other
network and the shallow features in this network, together as the input of the next iteration, to achieve
the refinement fused features. The coupled-feedback attention layer contains multiple coupled-feedback
blocks and an attention module.</p>
      <p>The extraction of shallow features from the LR images can be expressed as</p>
      <p>$F_{in}^{o} = f_{FEB}(I_{LR}^{o}),\quad F_{in}^{u} = f_{FEB}(I_{LR}^{u})$</p>
      <p>where $f_{FEB}(\cdot)$ denotes the feature extraction block (FEB), which contains two convolutional layers Conv(3,4×m) and Conv(1,m) that are used to extract LR features and compress LR features, respectively. The extracted shallow features are first passed through the SRB to obtain the deep features $F_{SR}^{o}$ and $F_{SR}^{u}$, which can be expressed as</p>
      <p>$F_{SR}^{o} = f_{SRB}(F_{in}^{o}),\quad F_{SR}^{u} = f_{SRB}(F_{in}^{u})$</p>
      <p>where $f_{SRB}(\cdot)$ is the super-resolution module (SRB) operation.</p>
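      <p>For concreteness, the following is a minimal PyTorch sketch of the FEB described above. Only the Conv(3,4×m) and Conv(1,m) layers come from the text; the channel width m, the PReLU activations, and the class name FEB are illustrative assumptions.</p>
      <preformat>
import torch
import torch.nn as nn

class FEB(nn.Module):
    """Feature extraction block sketch: Conv(3, 4*m) followed by Conv(1, m)."""
    def __init__(self, in_channels=3, m=32):
        super().__init__()
        # Conv(3, 4*m): extracts shallow features from the LR input
        self.extract = nn.Conv2d(in_channels, 4 * m, kernel_size=3, padding=1)
        # Conv(1, m): compresses the extracted features to m channels
        self.compress = nn.Conv2d(4 * m, m, kernel_size=1)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.compress(self.act(self.extract(x))))

# F_in_o = FEB()(I_LR_o); a second FEB with its own weights handles the
# under-exposed branch.
      </preformat>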
      <p>Next, the deep exposure features of the two sub-networks are deeply fused over several iterations. At each iteration, the feedback features of the previous iteration are coupled, and together with the shallow features of the respective network they form the input of the current iteration. The feedback features $F_{FB}^{o,t}$ and $F_{FB}^{u,t}$ of the t-th iteration can be expressed as</p>
      <p>$F_{FB}^{o,t} = f_{CFAM}(F_{FB}^{o,t-1}, F_{FB}^{u,t-1}, F_{in}^{o})$</p>
      <p>$F_{FB}^{u,t} = f_{CFAM}(F_{FB}^{u,t-1}, F_{FB}^{o,t-1}, F_{in}^{u})$</p>
      <p>where $f_{CFAM}(\cdot)$ is the operation of the coupled feedback attention module. At the first iteration, $F_{FB}^{o,0}$ and $F_{FB}^{u,0}$ are the outputs $F_{SR}^{o}$ and $F_{SR}^{u}$ of the SRB, respectively.</p>
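      <p>A short sketch of the coupled iteration loop is given below, assuming PyTorch. The number of iterations T and the function names are assumptions; the coupling pattern follows the equations above, and the coupled feedback attention modules themselves are sketched in Sec. 2.2.</p>
      <preformat>
def coupled_feedback(f_in_o, f_in_u, f_sr_o, f_sr_u, cfam_o, cfam_u, T=4):
    """Run T coupled iterations; cfam_o / cfam_u are the coupled feedback
    attention modules of the two branches."""
    fb_o, fb_u = f_sr_o, f_sr_u          # iteration 0: outputs of the SRBs
    outputs_o, outputs_u = [], []
    for _ in range(T):
        # Each branch receives its own previous feedback, the other branch's
        # previous feedback, and its own shallow features.
        new_o = cfam_o(fb_o, fb_u, f_in_o)
        new_u = cfam_u(fb_u, fb_o, f_in_u)
        fb_o, fb_u = new_o, new_u
        outputs_o.append(fb_o)
        outputs_u.append(fb_u)
    return outputs_o, outputs_u
      </preformat>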
      <p>Finally, the output of the coupled feedback attention module of each iteration and the super-resolution features are passed through the channel attention module and then reconstructed by the reconstruction module REC to obtain the SR residual image, which is summed with the up-sampled corresponding LR image to produce the SR image:</p>
      <p>$\hat{F}^{o,t} = f_{CA}(F_{FB}^{o,t}),\quad \hat{F}^{u,t} = f_{CA}(F_{FB}^{u,t})$</p>
      <p>$I_{SR}^{o,t} = f_{REC}(\hat{F}^{o,t}) + f_{UP}(I_{LR}^{o}),\quad I_{SR}^{u,t} = f_{REC}(\hat{F}^{u,t}) + f_{UP}(I_{LR}^{u})$</p>
      <p>where $f_{CA}(\cdot)$ is the channel attention operation, $f_{REC}(\cdot)$ is the reconstruction module, and $f_{UP}(\cdot)$ denotes up-sampling of the LR input.</p>
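      <p>Below is a hedged PyTorch sketch of the per-branch reconstruction step. The internal composition of REC is not specified above, so the deconvolution followed by a Conv(3,3) residual layer and the bicubic up-sampling are assumptions.</p>
      <preformat>
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstruct(nn.Module):
    """REC sketch: up-sample the feedback features, predict an RGB residual,
    and add it to the bicubically up-sampled LR image."""
    def __init__(self, m=32, scale=2):
        super().__init__()
        self.scale = scale
        self.up_feat = nn.ConvTranspose2d(m, m, kernel_size=2 * scale,
                                          stride=scale, padding=scale // 2)
        self.to_rgb = nn.Conv2d(m, 3, kernel_size=3, padding=1)

    def forward(self, feat, lr_image):
        residual = self.to_rgb(self.up_feat(feat))          # SR residual image
        base = F.interpolate(lr_image, scale_factor=self.scale,
                             mode='bicubic', align_corners=False)
        return residual + base                              # SR image
      </preformat>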
      <p>[Fig. 1: Overall architecture of the coupled feedback attention network. The over-exposed and under-exposed LR inputs pass through FEB and SRB, then through iterative coupled CFBs with channel attention (CA); the REC modules and up-sampling paths produce the per-branch SR images, which are combined into the fused SR image.]</p>
    </sec>
    <sec id="sec-3">
      <title>2.2 Coupled Feedback Attention Module</title>
      <p>This section describes the iterative process of the coupled feedback block and the channel attention module.</p>
      <p>As shown in Fig. 2, the coupled feedback attention structure mainly contains iterative convolutional
and deconvolutional layers constituting the CFB, and channel attention gates.</p>
      <p>According to Section 2.1, in the upper sub-network the inputs of the coupled feedback attention module at the t-th iteration are $F_{FB}^{o,t-1}$, $F_{FB}^{u,t-1}$ and $F_{in}^{o}$. Firstly, channel compression is performed through the convolutional layer Conv(1,m) to obtain the input $L^{t}(0)$ of the coupled feedback attention module:</p>
      <p>$L^{t}(0) = f_{1\times1}([F_{FB}^{o,t-1}, F_{FB}^{u,t-1}, F_{in}^{o}])$</p>
      <p>Next, the features pass through multiple working groups consisting of convolutional and deconvolutional layers. The HR feature $H^{t}(n)$ of the n-th working group in the t-th iteration can be expressed as</p>
      <p>$H^{t}(n) = f_{deconv}([L^{t}(0), L^{t}(1), \dots, L^{t}(n-1)])$</p>
      <p>where $f_{deconv}(\cdot)$ is the deconvolution layer Deconv(3,m); the HR features are generated by jointly up-sampling the LR features of the first n-1 working groups. Similarly, the LR feature $L^{t}(n)$ can be expressed as</p>
      <p>$L^{t}(n) = f_{conv}([H^{t}(1), H^{t}(2), \dots, H^{t}(n)])$</p>
      <p>where $f_{conv}(\cdot)$ is the convolutional layer Conv(3,m).</p>
      <p>The output of the final N-th working group is generated from the joint LR features of the N working groups passing through the convolution layer Conv(1,m):</p>
      <p>$F_{FB}^{o,t} = f_{1\times1}([L^{t}(1), L^{t}(2), \dots, L^{t}(N)])$</p>
      <p>The process of the extreme low exposure branch is the same.</p>
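      <p>The following is a minimal PyTorch sketch of one CFB. The channel width m, the number of working groups N, and the strides used to move between LR and HR resolutions are assumptions; the concatenation pattern follows the equations above.</p>
      <preformat>
import torch
import torch.nn as nn

class CFB(nn.Module):
    """Coupled feedback block sketch: N working groups of Deconv(3, m) /
    Conv(3, m), preceded and followed by Conv(1, m) compression."""
    def __init__(self, m=32, n_groups=4, scale=2):
        super().__init__()
        # Conv(1, m) on the concatenated [own feedback, other feedback, shallow]
        self.compress_in = nn.Conv2d(3 * m, m, kernel_size=1)
        self.up, self.down = nn.ModuleList(), nn.ModuleList()
        for g in range(n_groups):
            # Deconv(3, m): joint LR features L(0)..L(g) to HR feature H(g+1)
            self.up.append(nn.ConvTranspose2d((g + 1) * m, m, kernel_size=3,
                                              stride=scale, padding=1,
                                              output_padding=scale - 1))
            # Conv(3, m): joint HR features H(1)..H(g+1) to LR feature L(g+1)
            self.down.append(nn.Conv2d((g + 1) * m, m, kernel_size=3,
                                       stride=scale, padding=1))
        # Conv(1, m) on the joint LR features L(1)..L(N)
        self.compress_out = nn.Conv2d(n_groups * m, m, kernel_size=1)

    def forward(self, fb_self, fb_other, shallow):
        l0 = self.compress_in(torch.cat([fb_self, fb_other, shallow], dim=1))
        lr_feats, hr_feats = [l0], []
        for g in range(len(self.up)):
            hr_feats.append(self.up[g](torch.cat(lr_feats, dim=1)))
            lr_feats.append(self.down[g](torch.cat(hr_feats, dim=1)))
        return self.compress_out(torch.cat(lr_feats[1:], dim=1))
      </preformat>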
      <sec id="sec-3-1">
        <title>The feedback features</title>
        <p>and</p>
        <p>are output from each iteration, go through the channel attention
module CA for feature optimization. The CA in this paper consists of three steps, which are global
information compression, scaling and excitation, and recalibration.</p>
        <p>1）</p>
        <sec id="sec-3-1-1">
          <title>Global information compression In order to obtain the global information of each channel, this paper represents the feature values of each channel by global averaging pooling:</title>
          <p>=  × 
 =  × 
1
1


(,  )
(,  )
2）</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Squeeze and excitation</title>
          <p>and compresses the multiple channels into a one-dimensional feature tensor.</p>
          <p>where 
(,  ) and</p>
          <p>(,  ) are the values at each position in the output extreme exposure feature,</p>
          <p>In order to more fully explore the dependencies between individual channels, the paper introduces a
gate mechanism for learning the nonlinear mapping between each channel and uses a sigmoid activation
function to avoid the formation of adversarial relationships between channels, which can be expressed
as
 = (
 = (
(  ))
(  ))</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Where</title>
        <p>3）
and</p>
        <sec id="sec-3-2-1">
          <title>Recalibration</title>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>The original input features</title>
        <p>are the convolutional layer weights.</p>
        <p>individual channels are scaled by the channel attention weight
matrix just learned, thus enhancing useful features and suppressing useless features:</p>
      </sec>
      <sec id="sec-3-4">
        <title>Where</title>
        <p>and 
are the channel attention weights of the previous iteration.
 =
 =
SR fused
image
Channel
Gate
CFB
 × ( + 1)
 × 


× (
×</p>
      </sec>
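      <p>A compact PyTorch sketch of the CA gate described by the three steps above is given below. The reduction ratio r, the 1×1 convolution form of W1 and W2, and the intermediate ReLU are assumptions.</p>
      <preformat>
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CA gate sketch: global average pooling, a two-layer gate with sigmoid,
    and channel-wise recalibration."""
    def __init__(self, m=32, r=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # global information compression
        self.w1 = nn.Conv2d(m, m // r, kernel_size=1)   # squeeze
        self.w2 = nn.Conv2d(m // r, m, kernel_size=1)   # excitation
        self.act = nn.ReLU(inplace=True)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        z = self.pool(x)                                # z_c: mean over H x W
        s = self.gate(self.w2(self.act(self.w1(z))))    # s = sigmoid(W2 act(W1 z))
        return x * s                                    # recalibrate each channel
      </preformat>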
    </sec>
    <sec id="sec-4">
      <title>2.3 Loss Function</title>
      <p>The method in this paper performs both image super-resolution and image multi-exposure fusion, so the model is optimized with a hierarchical loss function, which is expressed as</p>
      <p>$L = \alpha\, l(I_{SR}^{o}, I_{HR}^{o}) + \beta\, l(I_{SR}^{u}, I_{HR}^{u}) + \gamma\,( l(I_{F}^{o}, I_{HR}^{F}) + l(I_{F}^{u}, I_{HR}^{F}) )$</p>
      <p>where $I_{HR}^{o}$ and $I_{HR}^{u}$ are the HR standard images with extreme exposures, $I_{HR}^{F}$ is the HDR, HR standard image that is the target of the final fused image, and $I_{F}^{o}$ and $I_{F}^{u}$ are the fused outputs of the two branches. $\alpha$, $\beta$ and $\gamma$ are the weight coefficients of each loss part; in this paper we set $\alpha = \beta = \gamma = 1$.</p>
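      <p>A hedged PyTorch sketch of the hierarchical loss follows; the use of the L1 distance as $l(\cdot,\cdot)$ and the variable names are assumptions, while the equal weights follow the setting $\alpha = \beta = \gamma = 1$ above.</p>
      <preformat>
import torch
import torch.nn.functional as F

def hierarchical_loss(sr_o, sr_u, fused_o, fused_u, hr_o, hr_u, hr_fused,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of per-branch SR losses and fusion losses."""
    loss_sr = alpha * F.l1_loss(sr_o, hr_o) + beta * F.l1_loss(sr_u, hr_u)
    loss_fuse = gamma * (F.l1_loss(fused_o, hr_fused) + F.l1_loss(fused_u, hr_fused))
    return loss_sr + loss_fuse
      </preformat>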
    </sec>
    <sec id="sec-5">
      <title>3 Experiment and Analysis</title>
    </sec>
    <sec id="sec-6">
      <title>3.1 Experiment Establishment</title>
      <sec id="sec-6-1">
        <title>1）Experimental setup</title>
        <p>In this paper, the training model was trained on a GeForce GTX 1070Ti. The experiments mainly use the SICE [<xref ref-type="bibr" rid="ref5">5</xref>] dataset, which contains 589 high-quality reference images and their corresponding multi-exposure image sequences; only the extreme exposures are used in this paper.</p>
      </sec>
      <sec id="sec-6-2">
        <title>2）Comparison Method</title>
        <p>
          The network model proposed in this paper achieves both image super-resolution and image exposure fusion, so we combine current image super-resolution methods and image exposure fusion methods as comparison methods. The image super-resolution methods are DBPN[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], RCAN[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], SRFBN[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and
SwinIR[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and the main image exposure fusion methods are MGFF [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], FAST SPD-MEF [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], MEF-Net
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and U2Fusion [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We combined the SR methods and MEF methods and changed their order, i.e., SR+MEF or MEF+SR, to generate 32 comparison methods. CF-Net [<xref ref-type="bibr" rid="ref1">1</xref>] was also selected for comparison.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.2 Objective evaluation</title>
      <p>In order to verify the effectiveness of the method in this paper at a magnification factor of 2, we use the SICE dataset and compare our method with other advanced methods. These comparison methods are combinations of an SR method and an MEF method. Table 1 shows the results of our method and the comparison methods at a magnification factor of 2 under three metrics.</p>
      <p>In Table 1, the best value of each fusion quality index is highlighted in bold and the second-best value is underlined. From Table 1, we can see that the method of this paper has the best fusion effect, ranking first among the 34 methods in all metrics. Compared with the second-place CF-Net method, PSNR is improved by 0.25 dB, SSIM by 0.0028, and MEF-SSIM by 0.0005.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Subjective evaluation</title>
    </sec>
    <sec id="sec-9">
      <title>4 Conclusion</title>
      <p>Based on the powerful image reconstruction property of the feedback mechanism and the ability of the channel attention mechanism to distinguish the importance of features, this paper proposes a coupled feedback attention network to solve the image super-resolution and image exposure fusion problems simultaneously. The experimental results show that the proposed algorithm retains the detailed information of edges, region boundaries, and textures of the original image sequence.</p>
    </sec>
    <sec id="sec-10">
      <title>5 References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Deng</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>M.</given-names>
          </string-name>
          , et al.
          <article-title>Deep coupled feedback network for joint exposure fusion and image super-resolution[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2021</year>
          ,
          <volume>30</volume>
          :
          <fpage>3098</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Li</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>J. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Z.</given-names>
          </string-name>
          , et al.
          <article-title>Feedback Network for Image Super-Resolution[C]</article-title>
          .
          <source>IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Haris</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhnarovich</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ukita</surname>
            <given-names>N.</given-names>
          </string-name>
          , et al.
          <article-title>Deep back-projection networks for single image super-resolution[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          .
          <year>2021</year>
          ,
          <volume>43</volume>
          (
          <issue>12</issue>
          ):
          <fpage>4323</fpage>
          -
          <lpage>4337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>K. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>K.</given-names>
          </string-name>
          , et al.
          <article-title>Image Super-Resolution Using Very Deep Residual Channel Attention Networks[C]</article-title>
          .
          <source>European Conference on Computer Vision</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cai</surname>
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            <given-names>S. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>L.</given-names>
          </string-name>
          .
          <article-title>Learning a deep single image contrast enhancer from multi-exposure images[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2018</year>
          ,
          <volume>27</volume>
          (
          <issue>4</issue>
          ):
          <fpage>2049</fpage>
          -
          <lpage>2062</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Li</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>K. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yong</surname>
            <given-names>H. W.</given-names>
          </string-name>
          , et al.
          <article-title>Fast multi-scale structural patch decomposition for multi-exposure image fusion[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2020</year>
          ,
          <volume>29</volume>
          :
          <fpage>5805</fpage>
          -
          <lpage>5816</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Xu</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            <given-names>J. J.</given-names>
          </string-name>
          , et al.
          <article-title>U2Fusion: A unified unsupervised image fusion network[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          .
          <year>2022</year>
          ,
          <volume>44</volume>
          (
          <issue>1</issue>
          ):
          <fpage>502</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ma</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duanmu</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            <given-names>Y.</given-names>
          </string-name>
          , et al.
          <article-title>Deep guided learning for fast multi-exposure image fusion[J]</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          .
          <year>2020</year>
          ,
          <volume>29</volume>
          :
          <fpage>2808</fpage>
          -
          <lpage>2819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Liang</surname>
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            <given-names>J. Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>G. L.</given-names>
          </string-name>
          , et al.
          <article-title>SwinIR: Image Restoration Using Swin Transformer[C]</article-title>
          .
          <source>IEEE/CVF International Conference on Computer Vision Workshops</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Bavirisetti</surname>
            <given-names>D. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            <given-names>J. H.</given-names>
          </string-name>
          , et al.
          <article-title>Multi-scale guided image and video fusion: a fast and efficient approach</article-title>
          [J].
          <source>Circuits Systems and Signal Processing</source>
          .
          <year>2019</year>
          ,
          <volume>38</volume>
          (
          <issue>12</issue>
          ):
          <fpage>5576</fpage>
          -
          <lpage>5605</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>