<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Point-Based Weakly Supervised Deep Learning for Water Extraction from High-Resolution Remote Sensing Imagery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ming Lu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leyuan Fang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>the College of Computer Science, Sichuan University</institution>
          ,
          <addr-line>Chengdu 610065</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>the College of Electrical and Information Engineering, Hunan University</institution>
          ,
          <addr-line>Changsha 410082</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The use of deep learning for water extraction requires precise pixel-level labels. However, it is very difficult to label high-resolution remote sensing images at the pixel level. Therefore, we study how to utilize point labels to extract water bodies and propose a novel method called the neighbor feature aggregation network (NFANet). Compared with pixel-level labels, point labels are much easier to obtain, but they lose a lot of information. In this paper, we take advantage of the similarity between the adjacent pixels of a local water body and propose a neighbor sampler to resample remote sensing images. The sampled images are then sent to the network for feature aggregation. Our method uses neighboring features instead of global or local features to learn more representative features. The experimental results show that the proposed NFANet not only outperforms other weakly supervised approaches, but also obtains results similar to those of state-of-the-art ones.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep learning</kwd>
        <kwd>weak supervision</kwd>
        <kwd>semantic segmentation</kwd>
        <kwd>water extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Water-body extraction from high-resolution remote sensing images is an important research topic in the field of remote sensing. Although traditional algorithms have made some progress in water-body extraction, there are still problems such as low automation, cumbersome manual feature extraction, and insufficient extraction accuracy. In recent years, deep learning has become an emerging research hot spot in the field of artificial intelligence. The rapid development of deep learning technology and the improvement of computer hardware performance have made deep learning, especially CNN-based techniques, successful in many important tasks, such as image classification, target detection, and semantic segmentation, and their performance has surpassed many traditional algorithms. The work in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposes a method that combines a graph convolutional network (GCN) and a CNN to fuse different hyperspectral features and improve the performance of hyperspectral classification. Work in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] studies multi-modal models and proposes a variety of plug-and-play fusion modules to fuse the features of remote sensing images of different modalities. Work in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] discusses the importance of nonconvex modeling in interpretable AI models from multiple perspectives. Therefore, it is necessary to apply deep learning to extract water bodies [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].</p>
      <p>Unfortunately, the success of deep learning for feature extraction is highly dependent on the availability of sufficient pixel-level labels for training. However, high-resolution remote sensing images are large in scale and data volume, which makes pixel-level labeling extremely laborious. Pixel-level annotation usually requires a lot of time and labor, as well as professional knowledge to accurately mark uncertain boundaries between different classes of interest, which hinders the extraction of informative features from high-resolution remote sensing images to a certain extent. Training models with weak labels has therefore received more and more attention in the field of computer vision. Compared with fully supervised semantic segmentation, weakly supervised learning does not require pixel-level labels and has the advantages of fast labeling and low cost. However, the use of weak annotations makes the supervision information seriously insufficient; key information such as shape, texture, and edges is usually lost, which makes it difficult to extract water from high-resolution remote sensing images with complex scenes.</p>
      <p>Some researchers try to combine traditional methods with deep learning to solve weak supervision problems. The work in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] combines super-pixels and a local map to obtain rough pseudo-labels to train a water extraction model. Work in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] combines super-pixel pooling with multi-scale feature fusion to detect buildings. Other researchers attempt to obtain better results by using the extraction capabilities of neural networks. Work in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] follows the principle of CAM [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and extracts feature maps from UNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for hard-threshold processing to obtain segmentation predictions. These methods have achieved promising results in the field of weakly supervised learning, but they do not consider the characteristics of the image itself.</p>
      <p>Unlike other natural objects, water bodies are usually liquid, and the colors and textures of local water bodies are very similar. Therefore, there is a high degree of similarity between neighboring pixels in water bodies, which makes the inherent difference between the features of neighboring water-body pixels generally smaller than that of non-water-body pixels. We aim to map the neighboring pixels of the remote sensing image to the same location in space, extract neighbor features from multiple neighboring pixels, and use these neighbor features to jointly decide whether the pixel at this location belongs to a water body. Based on this motivation, we propose the neighbor feature aggregation network (NFANet) to make full use of this property. Specifically, we utilize a sampling method called the neighbor sampler to generate a set of neighbor images from a high-resolution remote sensing image. The neighboring pixels of the original image are separately allocated to each neighbor image, so that the pixel values of any two neighbor images at the same position are similar but not identical. On the whole, neighbor image groups have similar but different characteristics. Then, we use an end-to-end model to perform feature extraction on each image of the neighbor image group, and aggregate the features by using the feature aggregation module. Compared with other methods that only use the local or global information of an image, the neighbor feature aggregation effectively utilizes the neighbor information and, therefore, more representative features can be learned.</p>
    </sec>
    <sec id="sec-m">
      <title>2. Method</title>
      <p>Figure 1 illustrates the proposed weakly supervised water extraction framework. Figure 1.a shows the entire recursive training process, which we describe in Section 2.3. The acquisition of pseudo-labels is shown in Figure 1.b. We input neighbor images into the network and use point labels for supervision to obtain neighbor features. Then the feature aggregation module is used to aggregate the features extracted in the previous step. Finally, post-processing is performed to obtain pseudo-labels. We describe the details of each of the above steps in the following sections.</p>
      <sec id="sec-1-1">
        <title>First, we introduce a neighbor sampler to obtain a neigh</title>
        <p>bor images group (1 () , 2 () , . . . ,  ()) from a
single optical remote sensing image .  represents
the number of neighbor images. Figure 2 shows the
schematic diagram of generating a group of neighbor</p>
      </sec>
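        <p>To make the sampling rule concrete, the following minimal numpy sketch (our illustration under the assumptions above; the function name and shapes are not from the authors' released code) rearranges a tile into its k × k neighbor images:</p>
        <preformat>
# Minimal numpy sketch of the neighbor sampler (k = 2); names are illustrative.
import numpy as np

def neighbor_sampler(image, k=2):
    """Split an (H, W, C) image into k*k neighbor images.

    The n-th neighbor image collects, from every k x k cell, the pixel at
    the n-th within-cell position (row-major), so any two neighbor images
    hold nearby, similar pixels of the original image.
    """
    h, w, c = image.shape
    assert h % k == 0 and w % k == 0, "tile size must be divisible by k"
    # (H/k, k, W/k, k, C) -> (k, k, H/k, W/k, C) -> (k*k, H/k, W/k, C)
    cells = image.reshape(h // k, k, w // k, k, c)
    return cells.transpose(1, 3, 0, 2, 4).reshape(k * k, h // k, w // k, c)

# Example: one 492 x 492 RGB tile yields 4 neighbor images of 246 x 246.
tile = np.random.rand(492, 492, 3)
print(neighbor_sampler(tile).shape)  # (4, 246, 246, 3)
        </preformat>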
      <sec id="sec-1-2">
        <title>CMax pooling is adopted to reduce the number of</title>
        <p>channels of each neighbor feature to one. CMax
pooling is defined mathematically in detail as follows:
Given a three-dimensional feature maps tensor group
 = (1 () , 2 () , . . . ,   ()) ∈ R×  × × ,
The operation of CMax pooling is as follows:
,,() = (,,1,(), ,,2,(), . . . , ,,3,()), (1)</p>
        <p>= 1, 2, . . . , ,  = 1, 2, . . . , ,  = 1, 2, . . . , .</p>
        <p>As a result, the feature maps group  =
(1 () , 2 () , . . . ,  ()) ∈ R×  ×  is
obtained. Then, the OTSU algorithm is used to
binarize each feature in  to obtain the result
 = (1 () , 2 () , . . . ,  ()) ∈ R×  × . The
formula is as follows:
images using the neighbor sampler. Let us assume that
the width, height, and channel of the input image  are
bo,rsa, mp, lreerspect=ive(ly.1T,h2e,i.m. p.l,eme)nitsatdioenscorfibtehde anseifgohl--  =  () ,  = 1, 2, . . . , . (2)
lows: Finally, we vote for all binarized neighbor features of the
1. The image  is divided into  ×  cells, where neighbor feature group to obtain the aggregated result
the size of each cell is  ×  × .  is experimentally  .  is calculated using the following equation:
set to 2 and, therefore,  =  ×  = 4.</p>
        <p>2. For the  − ℎ row and  − ℎ column of the cell, the {︃1, ∑︀ 2
pixels in the adjacent positions of each cell are selected , = 0, ∑︀=1 ,, ≥  (3)
in the order from top to bottom and from left to right, =1 ,, &lt; 2
which are regarded as the (, ) − ℎ elements of  = To sum up, the mathematical definition of the feature
(1 () , 2 () , . . . ,  ()). When  is set to 2, the aggregation module is detailed as follows:
pixels at the upper left, upper right, lower left, and lower
right adjacent positions are selected, respectively.  =   ( (  ( ))) (4)
3. For all  ×  cells being divided in step 1, step
2 will be repeated until all the cells are resampled, and where  ∈ R×  × ×  represents the neighbor
feaa neighbor sampler  = (1, 2, . . . , ) is generated. tures group and  ∈ R×  is the output. Next, the
Given an optical remote sensing image , neighbor im- aggregated result  is input to the post-processing
modages group (1 () , 2 () , . . . ,  ()) is generated, ule. The specific operations include filling small holes in
where the size of each neighbor image is  ×  × . the closed area by using area filling and removing noise</p>
        <p>In this way, the neighbor image dataset can be gen- by using morphological operations. Then, we apply a
erated from the original dataset. Neighbor images are point-label constraint to the processed results. If the area
similar but not identical, because for any two neighbor in the result contains point labels, the entire area is
reimages, (, ) − ℎ pixel comes from the neighboring tained, otherwise it is not retained. The generated results
location of the original remote sensing image. are used as pseudo-labels and input into the recursive
training as supervision information.</p>
        <sec id="sec-1-2-1">
          <title>2.2. Neighbor feature aggregation and</title>
          <p>post-processing</p>
        </sec>
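        <p>A compact sketch of Eqs. (1)-(4) and the post-processing step is given below. It assumes numpy, scipy.ndimage, and skimage's Otsu thresholding; the function names and the default 3 × 3 structuring element are our choices, not the authors' code:</p>
        <preformat>
# Sketch of the feature aggregation module (Eqs. (1)-(4)) and post-processing.
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def aggregate(features):
    """features: (N, W', H', C') neighbor features -> (W', H') binary map."""
    m = features.max(axis=-1)                          # Eq. (1): CMax pooling
    b = np.stack([x >= threshold_otsu(x) for x in m])  # Eq. (2): OTSU per map
    n = b.shape[0]
    return (b.sum(axis=0) * 2 >= n).astype(np.uint8)   # Eq. (3)/(4): majority vote

def post_process(vote_map, point_labels):
    """Fill holes, denoise, and keep only regions containing a point label."""
    mask = ndimage.binary_fill_holes(vote_map)  # area filling of closed regions
    mask = ndimage.binary_opening(mask)         # morphological noise removal
    regions, num = ndimage.label(mask)          # connected components
    keep = np.zeros_like(mask)
    for r in range(1, num + 1):
        if point_labels[regions == r].any():    # point-label constraint
            keep[regions == r] = True
    return keep.astype(np.uint8)                # pseudo-label
        </preformat>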
        <sec id="sec-1-2-2">
          <title>2.3. Recursive training</title>
          <p>
            We input the neighbor images group to an Recursive training is a weakly supervised strategy. When
end-to-end network to extract features, and ob- applying the resulting model over the training set, the
nettain the corresponding neighbor feature group work outputs capture the shape of objects significantly
(1 () , 2 () , . . . ,   ()) ∈ R×  × × , where better than that of just pseudo-labels [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. We have
ob () ∈ R×  ×  represents the feature maps served through experiments that when the training set is
extracted from the  − ℎ image in the neighbor input to the network again, the obtained network output
iamsatghees fgeraotuupre. Wexetraucsteiotnheneentwcoodrekr.-deScpoedceificraslltyru,ctthuere will become smoother than the coarse-grained
pseudofeature maps are extracted from the penultimate label, which improves the accuracy of the prediction
convolutional layer. The network structure is shown result to a certain extent.
in Figure 1.b. It is worth noting that the network is We embed the neighbor sampler into the recursive
replaceable (in the experimental part, a variety of training so that the network can learn neighbor features
network structures are used for feature extraction). (the flowchart is shown in Figure 1.a). Recursive training
consists of three steps. First, the remote sensing image
is used to generate neighbor images group. We apply
the neighbor images group and point-label to train the
network to obtain pseudo-label. Second, the
pseudolabel is used to generate pseudo-labels groups. It is worth
noting that the  − ℎ image of the neighbor images
group and the  − ℎ image of the pseudo-labels group
are resampled in the same way. Third, input the  − ℎ
image into the network and utilize the  − ℎ label as
the supervision information for training. After training
the model with all training sets, the neighbor images
group are input again to obtain the results group. When
 = 2, the number of results is 4. We perform a weighted
average on the results group to obtain a new
pseudolabel.
          </p>
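        <p>The schedule can be summarized with the following sketch, which reuses the neighbor_sampler, aggregate, and post_process helpers above. The extract_features, predict, and train_step stubs stand in for the replaceable segmentation network, and the nearest-neighbor upsampling and uniform averaging weights are our simplifications; none of the names come from the authors' code:</p>
        <preformat>
# High-level sketch of recursive training with the neighbor sampler embedded.
import numpy as np

def extract_features(group):
    n, h, w = group.shape[:3]
    return np.random.rand(n, h, w, 16)  # stub: per-neighbor feature maps

def predict(group):
    n, h, w = group.shape[:3]
    return np.random.rand(n, h, w)      # stub: per-neighbor water scores

def train_step(image, label):
    pass                                # stub: one optimization step

def upsample(label, k):
    # Nearest-neighbor upsampling back to tile size (our simplification).
    return np.repeat(np.repeat(label, k, axis=0), k, axis=1)

def recursive_training(images, point_labels, rounds=3, k=2):
    pseudo = {}
    # Step 1: train on neighbor images with point labels (inside the stubs),
    # then fuse the neighbor features into an initial pseudo-label per image.
    for i, img in enumerate(images):
        group = neighbor_sampler(img, k)
        vote = aggregate(extract_features(group))
        points = point_labels[i][::k, ::k]  # 5x5 point blobs survive subsampling
        pseudo[i] = upsample(post_process(vote, points), k)
    for _ in range(rounds):
        for i, img in enumerate(images):
            group = neighbor_sampler(img, k)
            # Step 2: resample the pseudo-label with the same sampler, pairing
            # the n-th neighbor image with the n-th neighbor label.
            labels = neighbor_sampler(pseudo[i][..., None], k)[..., 0]
            # Step 3: train on each (neighbor image, neighbor label) pair.
            for n in range(k * k):
                train_step(group[n], labels[n])
            # Average the k*k outputs into a refined pseudo-label.
            scores = predict(group).mean(axis=0)
            pseudo[i] = upsample((scores >= 0.5).astype(np.uint8), k)
    return pseudo
        </preformat>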
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Experimental results</title>
      <sec id="sec-2-1">
        <title>3.1. Datasets and evaluation</title>
        <sec id="sec-2-1-1">
          <title>To prove the efectiveness of the proposed method, we</title>
          <p>
            applied the method to high-resolution visible spectrum
images for water extraction. This water-body dataset
comes from the Gaofen Challenge [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], which contains
RGB pan-sharpened images with a resolution of 0.5 m
and does not contain infrared bands or digital elevation
models. All images are taken from Wuhan and Suzhou,
China, mainly in rural areas supplemented by urban
areas. The positive labels in the dataset include rivers,
reservoirs, rice fields, ditches, ponds, and lakes, while all
other non-water pixels are considered negative. The data
set is cropped into 1000 images with the size of 492×492
without any overlap. We re-annotated the dataset. The
rule is that each independent water body is randomly
labeled with a point label of the size 5×5.
          </p>
          <p>In the experiment, the weak supervision models use
point labels as the initial supervision information, while other weak supervision methods can predict the local
the full supervision models use pixel-level labels. Because area of the water body, there are errors in the detection of
the remote sensing image segmentation/classification the water body boundary, while our method is relatively
evaluation index of overall accuracy or Kappa coeficient more accurate. The studied weak supervised methods
cannot efectively describe the real structure of image cannot detect small objects appropriately while this issue
segmentation geometry, we choose to use fgIoU (fore- is solved to a great extent by the proposed method.
ground IoU), bgIoU (background IoU), mIoU (mean IoU),
fgDice (foreground Dice), bgDice (background Dice) and 3.4. Efectiveness of neighbor sampling
mDice (mean Dice) to comprehensively evaluate the
results. For each model, we performed five independent In the ablation experiment, the other settings are
unruns to calculate the aforementioned evaluation indica- changed, and only the value of  is changed. We set the
tors and standard deviations. neighbor sampling parameter k of our proposed network
from 1 to 4 and only use cross entropy and dice loss to
3.2. Comparison with Fully Supervised train the model. For diferent  values of NFANet, in
order to avoid interference from other modules, we only</p>
          <p>Approaches select UNet as the feature extraction network for
comIn Table 1, we report the water extraction performance parative experiments. In particular, when the value of
of our proposed approach and compare it with the fully  is set to 1, the neighbor image group degenerates into
supervised approaches. Figure 3 also provides the visual the input image. As shown in Figure 5, with the gradual
performance of all approaches. These approaches ran- increase of , mIoU first increases and then decreases.
domly use 70% of the samples as the training set, and With the increase of neighborhood sampling
paramethe remaining data as the test set. Experiments demon- ter , the number of adjacent pixels to be considered
strate that our method achieves the best score using the increase geometrically, resulting in information
redunNestedUNet-based model, and the visual performance dancy. The size of each reconstructed neighbor image is
shows that the prediction results obtained by our method gradually reduced, and the boundary of the water body
are very close to the ground truth. The mIoU of our will also become unclear. Therefore, we set  equal to 2,
method reached 75.22%, and the mDice reached 85.04%. because the neighbor features require less computation
Compared with the best fully-supervised model DeepLab and achieves better performance.</p>
          <p>V3+, the mIoU of our method is only reduced by 3.03%,
and mDice is only reduced by 2.23%. But the labeling 3.5. Efectiveness of feature aggregation
cost of our method is much less than that of the fully
supervised method. Nevertheless, it is dificult to achieve
fully supervised performance using only point labels.</p>
        </sec>
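          <p>For reference, the following minimal numpy sketch (our illustration) shows how the six scores can be computed from a predicted mask and a reference mask:</p>
          <preformat>
# Minimal numpy sketch of the reported metrics (water = 1, non-water = 0).
import numpy as np

def iou_and_dice(pred, gt):
    """Return (fgIoU, bgIoU, mIoU, fgDice, bgDice, mDice) for binary masks."""
    scores = []
    for cls in (1, 0):  # foreground first, then background
        p = (pred == cls)
        g = (gt == cls)
        inter = np.logical_and(p, g).sum()
        union = max(np.logical_or(p, g).sum(), 1)  # guard empty classes
        denom = max(p.sum() + g.sum(), 1)
        scores.append((inter / union, 2 * inter / denom))
    (fg_iou, fg_dice), (bg_iou, bg_dice) = scores
    return (fg_iou, bg_iou, (fg_iou + bg_iou) / 2,
            fg_dice, bg_dice, (fg_dice + bg_dice) / 2)
          </preformat>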
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Comparison with Weakly Supervised</title>
      </sec>
      <sec id="sec-2-3">
        <title>Approaches</title>
        <sec id="sec-2-3-1">
          <title>We compare our method with several other weakly su</title>
          <p>pervised remote sensing approaches. The experimental
results are shown in Table 2. To be fair, all methods
are based on UNet. It can be seen that the mIoU of
our method is 8.84%, which is higher than that of the
U-CAM-based method with the mDice of 6.89%. In
addition, Figure 4 shows the prediction results of other
weakly supervised methods and our method. Although
As shown in Table 3, when  is set to 2 ( = 4),
assuming that the features of the  − ℎ neighboring image is ,
∑︀</p>
          <p>=1  means feature aggregation is used. Compared
with the best method that does not use feature
aggregation, the mIoU and mDice of our method are improved
by 4.8% and 3.7% respectively. To a certain extent, the
greater the number of neighboring images, the more
features are available, and these features can complement
each other. Therefore, after the feature aggregation, the
performance of the prediction results can be improved.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>3.6. Time consumption</title>
        <sec id="sec-2-4-1">
          <title>The hardware configurations for the experiments in this</title>
          <p>paper consisted of Intel Core i7-9700k 3.60 GHz CPU,
GeForce RTX 2080Ti GPU, and 16GB RAM. The results of
the GPU inference time are shown in Table 4. The results
in the table are the average GPU inference time of the
data set. After using recursive training to improve the
quality of pseudo-labels, we input the pseudo-labels and
original images into the models consistent with the
fullysupervised methods for training and inference. Therefore,
the GPU inference time of the proposed method is the
same as that of the fully-supervised methods. It can be
observed from the table that the inference time of DLinkNet
is the shortest. because DLinkNet compresses the feature
channel in the decoder to reduce the computational cost.
NestedUNet embeds U-Nets of diferent depths in its
architecture, which requires more convolution calculations,
thus increasing the consumption of inference time.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <sec id="sec-3-1">
        <title>In this paper, we proposed a network entitled NFANet.</title>
        <p>Unlike traditional convolutional neural networks that
only use global or local features for discrimination,
NFANet uses neighbor features, which allows more
representative features to be learned. We fuse these
neighbor features to obtain pseudo-labels, and improve the
label quality by recursive training. We tested it on water
data sets and compared it with advanced fully supervised
and weakly supervised methods. By using only point
labels, the proposed method obtains comparable results
with that of full supervision. As a possible future work,
we will conduct research on weakly supervised or
semisupervised methods of self-correction. Remote sensing
images collected from satellites are usually afected by
spectral variability.</p>
      <p>Work in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] uses an endmember dictionary and a spectral variability dictionary to model different kinds of spectral variability, and provides reasonable prior knowledge for the spectral variability dictionary. Our proposed method considers local neighboring pixels. Therefore, when encountering various degradations, noise influences, and other variability factors, it is necessary to analyze whether these factors cause greater interference between neighboring pixels. If the variability factors affect different local areas differently, the prediction results are very likely to miss water bodies. In future work, we will consider introducing cross-local features to improve the network’s ability to learn features across different local water bodies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          ,
          <article-title>Graph convolutional networks for hyperspectral image classification</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>59</volume>
          (
          <year>2021</year>
          )
          <fpage>5966</fpage>
          -
          <lpage>5978</lpage>
          . doi:10.1109/TGRS.2020.3015157.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yokoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>More diverse means better: Multimodal deep learning meets remote-sensing imagery classification</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>59</volume>
          (
          <year>2021</year>
          )
          <fpage>4340</fpage>
          -
          <lpage>4354</lpage>
          . doi:10.1109/TGRS.2020.3016820.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yokoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Interpretable hyperspectral artificial intelligence: When nonconvex modeling meets hyperspectral remote sensing</article-title>
          ,
          <source>IEEE Geoscience and Remote Sensing Magazine</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>52</fpage>
          -
          <lpage>87</lpage>
          . doi:10.1109/MGRS.2021.3064051.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Sea ice and open water classification of sar images using a deep learning model</article-title>
          ,
          <source>in: IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3051</fpage>
          -
          <lpage>3054</lpage>
          . doi:10.1109/IGARSS39084.2020.9323990.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Poliyapram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Imamoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nakamura</surname>
          </string-name>
          ,
          <article-title>Deep learning model for water/ice/land classification using large-scale medium resolution satellite images</article-title>
          ,
          <source>in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3884</fpage>
          -
          <lpage>3887</lpage>
          . doi:10.1109/IGARSS.2019.8900323.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>Optical remote sensing image waters extraction technology based on deep learning context-unet</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:10.1109/ICSIDP47821.2019.9173433.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Wsf-net: Weakly supervised feature-fusion network for binary segmentation in remote sensing image</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>10</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Spmf-net: Weakly supervised building segmentation by combining superpixel pooling and multi-scale feature fusion</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <fpage>1049</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Azzari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Lobell</surname>
          </string-name>
          ,
          <article-title>Weakly supervised deep learning for segmentation of remote sensing imagery</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <fpage>207</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lapedriza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>Learning deep features for discriminative localization</article-title>
          ,
          <source>CVPR</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          ,
          <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khoreva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hosang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schiele</surname>
          </string-name>
          ,
          <article-title>Simple does it: Weakly supervised instance and semantic segmentation</article-title>
          ,
          <source>in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <article-title>Fully convolutional networks for semantic segmentation</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Sea-land segmentation with res-unet and fully connected crf</article-title>
          ,
          <source>in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3840</fpage>
          -
          <lpage>3843</lpage>
          . doi:10.1109/IGARSS.2019.8900625.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siddiquee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tajbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Unet++: A nested u-net architecture for medical image segmentation</article-title>
          ,
          <source>4th Deep Learning in Medical Image Analysis (DLMIA) Workshop</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>D-linknet: Linknet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction</article-title>
          ,
          <source>in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>1924</lpage>
          . doi:10.1109/CVPRW.2018.00034.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Papandreou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>
          , Springer, Cham (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <article-title>2020 Gaofen challenge on automated high-resolution earth observation image interpretation</article-title>
          , http://en.sw.chreos.org/,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yokoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>An augmented linear mixing model to address spectral variability for hyperspectral unmixing</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          <volume>28</volume>
          (
          <year>2019</year>
          )
          <fpage>1923</fpage>
          -
          <lpage>1938</lpage>
          . doi:10.1109/TIP.2018.2878958.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>