<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multimodal registration of FISH and nanoSIMS images using convolutional neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaojia He</string-name>
          <email>Xiaojia.he@ul.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suchendra M. Bhandarkar</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christof Meile</string-name>
          <email>cmeile@uga.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chemical Insights Research Institute, UL Research Institutes</institution>
          ,
          <addr-line>Marietta, GA 30067</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Marine Sciences, University of Georgia</institution>
          ,
          <addr-line>Athens, GA 30602</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computing, University of Georgia</institution>
          ,
          <addr-line>Athens, GA 30602</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Nanoscale secondary ion mass spectrometry (nanoSIMS) and fluorescence in situ hybridization (FISH) microscopy provide high-resolution, multimodal image representations of cell identity and cell activity, respectively, for studies of targeted microbial communities in microbiological research. Despite its importance to microbiologists, multimodal registration of FISH and nanoSIMS images is challenging given the morphological distortion and background noise in both image modalities. In this paper we propose a scheme for multimodal registration of FISH and nanoSIMS images that employs convolutional neural networks (CNNs) for multiscale feature extraction, shape context for minimum-cost feature matching, and the thin-plate spline (TPS) model for the registration of the two image modalities. Registration accuracy is quantitatively assessed against manually registered images, at both the pixel and structural levels, using standard metrics. Experimental results show that among the six CNN models tested, ResNet18 outperforms VGG16, VGG19, GoogLeNet, ShuffleNet and ResNet101 on most evaluation metrics. This study demonstrates the utility of CNNs in the registration of multimodal images with significant background noise and morphological distortion. We also show the shape of microbial aggregates, preserved by binarization, to be a robust feature for registering multimodal microbiology-related images. The proposed multimodal image registration scheme can serve as a powerful tool in microbiological research.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-modal image registration</kwd>
        <kwd>Convolutional neural network</kwd>
        <kwd>Microorganisms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nanoscale secondary ion mass spectrometry (nanoSIMS) is a powerful tool to quantify elemental
distribution at nanometer-scale resolution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Combining nanoSIMS imaging with fluorescence
in situ hybridization (FISH) microscopy allows one to study microbial activity and correlate it
with the identity of cells [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the nanoSIMS and FISH images display unequal
magnification and distortion. Several image registration algorithms exploit geometrical
information to align the input images [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Notably, feature-based registration methods rely on
point- or shape-based correspondences between two images where the features, such as corners
or contours of structures, are either derived automatically from the underlying image or from
markers with known positions. Once the corresponding points are selected, their locations in the
two images are used to reconstruct a spatial transformation [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. In contrast, in intensity-based
methods, only pixel intensity values, instead of specific features, are considered to determine the
spatial transformation.
      </p>
      <p>
        Deep learning has been increasingly recognized as a powerful toolbox for multimodal image
registration, especially in medical imaging [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] and remote sensing [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. The convolutional
neural network (CNN) is a widely used deep neural network (DNN) architecture comprising
convolutional layers, max-pooling layers and a softmax layer, in addition to problem-specific
layers. CNNs have been used extensively for feature extraction in image classification [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ],
image segmentation [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] and image registration [14, 15], and several variants of the CNN
architecture have been proposed for multimodal image registration [
        <xref ref-type="bibr" rid="ref6">6, 16, 17</xref>
        ], and have been
shown to be successful in solving biomedical image registration problems [18-21].
      </p>
      <p>In this paper, we present an automated scheme to register FISH and nanoSIMS images using
multiple CNN models. Although neither microorganisms nor microbial aggregates are represented
in the ImageNet database, deep CNN architectures that are pre-trained on ImageNet have been
shown to be very effective at general image feature extraction. The convolutional feature map is
extracted at multiple image resolutions and used for feature point selection. The shape context
descriptor is used to identify matched features and the thin-plate spline (TPS) model is employed
to register the FISH and nanoSIMS images by computing a transformation matrix [22]. The
results obtained using the different CNNs, feature matching approaches and transformation
computation and registration methods are compared and discussed. To the best of our
knowledge, this is the first documented application of deep CNN models to extract features from
multimodal microbial images and subsequently register them.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <p>The FISH and nanoSIMS images were acquired using the protocol proposed by McGlynn et al.
[23] and a detailed description of sample collection and preparation, measurement methodology
and data analysis is given in [23]. In brief, anaerobic methane-oxidizing consortia were obtained
from ocean sediment samples collected at Hydrate Ridge North (station HR-7) during the AT
1810 Hydrate Ridge August/September 2011 expedition. Push core sediment samples were
processed on ship and kept under an N2 atmosphere at 4°C. Slurry incubations were carried out
with anoxic filtered seawater at elevated pressure. FISH and nanoSIMS images were then
collected and manually aligned using the MATLAB program Look@nanoSIMS as described in [24].
These manually aligned images were used as ground truth in this study.</p>
      <p>In our workflow, depicted in Figure 1, 41 raw RGB images and their binarized versions were
used as input. In brief, the input images were preprocessed to remove background noise and
then fed to the chosen CNN models with pretrained weights. Features were then extracted at
desired predetermined layer depths (scales) using the CNN architectures ShuffleNet [25],
GoogLeNet [26], ResNet-18 and ResNet-101 [27], VGG16 and VGG19 [28], with pretrained weights
derived from the several million training images in the ImageNet database
(http://www.imagenet.org). A subset of the extracted features was selected and further constrained to generate a
2D array of matched feature points using shape context and bipartite graph matching algorithms
[22]. Finally, the matched feature points were used for image transformation computation and
image registration using the thin-plate spline (TPS) model. Quantitative registration accuracy
metrics such as the root mean squared error (RMSE), structural similarity index (SSIM), and
average absolute intensity difference (AAID) were computed at both the pixel and structural
levels. Additional details on the above-mentioned methods are available at
https://doi.org/10.6084/m9.figshare.26321587.v3.</p>
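      <p>As an illustration of the feature extraction step, the following sketch (assuming PyTorch/torchvision as a stand-in for the authors' implementation; the layer choices mirror the 28×28, 14×14 and 7×7 stages named in Section 2.2) reads multiscale feature maps from a pretrained ResNet-18:</p>
      <preformat>
# Minimal sketch, assuming PyTorch/torchvision (not the authors' original code):
# capture multiscale feature maps from a pretrained ResNet-18 via forward hooks.
import torch
from torchvision import models

cnn = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

features = {}
def hook(name):
    def fn(module, inputs, output):
        features[name] = output.detach()
    return fn

# For a 224x224 input, layer2/3/4 of ResNet-18 end with 28x28, 14x14, 7x7 maps.
cnn.layer2.register_forward_hook(hook("28x28"))
cnn.layer3.register_forward_hook(hook("14x14"))
cnn.layer4.register_forward_hook(hook("7x7"))

x = torch.rand(1, 3, 224, 224)      # stand-in for a preprocessed input image
with torch.no_grad():
    cnn(x)
for name, fmap in features.items():
    print(name, tuple(fmap.shape))  # e.g. "28x28 (1, 128, 28, 28)"
      </preformat>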
      <sec id="sec-2-1">
        <title>2.1. Image preprocessing</title>
        <p>FISH images are intensity measurements represented in their respective coordinate systems in
the individual RGB channels, whereas nanoSIMS images represent ion counts at each pixel
location. A global threshold was first generated using Otsu's method [29] to minimize the
intraclass variance (i.e., weighted sum of variances of black and white pixels in a binary image) and
was modified manually based on trial and error to preserve aggregate morphology. Aggregate(s)
from the FISH image were then chosen and cropped to best match the nanoSIMS image. The
resulting input images to the CNN were either raw RGB or preprocessed binary FISH and
nanoSIMS images. All input images were rescaled to a size of 224×224 pixels and fed through the
convolutional layers in the CNN.</p>
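        <p>A minimal preprocessing sketch, assuming scikit-image; the offset argument below is a hypothetical stand-in for the manual, trial-and-error threshold adjustment described above:</p>
        <preformat>
# Minimal sketch, assuming scikit-image: Otsu binarization [29] plus rescaling
# to the 224x224 CNN input size. The binary map can be replicated to three
# channels before being fed to the pretrained network.
import numpy as np
from skimage import io, color, transform
from skimage.filters import threshold_otsu

def binarize(path, offset=0.0):
    img = io.imread(path)
    gray = color.rgb2gray(img) if img.ndim == 3 else img.astype(float)
    t = threshold_otsu(gray)            # global Otsu threshold
    mask = gray > (t + offset)          # offset mimics the manual adjustment
    # order=0 (nearest neighbor) keeps the mask binary after resizing
    return transform.resize(mask.astype(float), (224, 224), order=0)
        </preformat>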
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Feature extraction and matching</title>
        <p>For the FISH and nanoSIMS images, features were extracted from the final layer of each
individual module in the CNN architecture starting with a layer size of 28×28 and proceeding to
layer sizes of 14×14 and 7×7. The selection of convolutional layers was heuristic and aimed to
include both high- and low-level features. The feature maps obtained from each layer were
normalized by applying the transformation z = (x-μ)⁄σ, where the feature x in each feature map
is assumed to be normally distributed with mean μ and standard deviation σ. Next, we generated
the feature distance map by computing the symmetric matrix of pairwise feature distance values.
We concatenated the feature distance maps from each layer to yield a single feature distance
map for each FISH and nanoSIMS image pair and processed the concatenated feature distance
map by selecting the smallest value from each row and using the match threshold to select the
top 20% matched features.</p>
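        <p>A sketch of this step, assuming NumPy and SciPy; fa and fb stand for (C, H, W) feature maps of the FISH and nanoSIMS images taken from the same CNN layer:</p>
        <preformat>
# Minimal sketch, assuming NumPy/SciPy: z-score normalization, feature distance
# map, row-wise minima, and retention of the top 20% matches by distance.
import numpy as np
from scipy.spatial.distance import cdist

def preliminary_matches(fa, fb, keep=0.2):
    A = fa.reshape(fa.shape[0], -1).T          # (H*W, C) feature vectors
    B = fb.reshape(fb.shape[0], -1).T
    A = (A - A.mean(0)) / (A.std(0) + 1e-8)    # z = (x - mu) / sigma
    B = (B - B.mean(0)) / (B.std(0) + 1e-8)
    D = cdist(A, B)                            # pairwise feature distances
    best = D.argmin(axis=1)                    # smallest value in each row
    cost = D[np.arange(len(A)), best]
    order = np.argsort(cost)[: int(keep * len(A))]   # top 20% matches
    return order, best[order]                  # matched indices in A and B
        </preformat>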
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Shape context descriptor</title>
        <p>After selecting the preliminary matching features, we used the shape context descriptor to
determine the feature correspondence that minimizes a transformation cost function. The
transformation cost function quantifies the shape similarity based on the neighborhood
structure of a feature point on a shape contour. The shape context descriptor at feature point pi
is defined as a histogram $h_i$ of the relative coordinates q of the remaining n-1 feature points [22]:
$$h_i(k) = \#\left\{\, q \neq p_i : (q - p_i) \in \mathrm{bin}(k) \,\right\} \quad (1)$$
where the bins are designed to uniformly partition the log-polar $(\log r, \theta)$ space ($r$ is the radial
distance and $\theta$ is the polar angle). To generate a shape context descriptor, we first computed the
Euclidean distance values between points in the matched feature map and normalized them by
the mean. Next, we computed the shape context descriptor by directly counting the points within
each radial and angular region (bin) as described above.</p>
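        <p>A minimal rendering of Eq. (1), assuming NumPy; the 5 radial by 12 angular bin layout follows the common choice in [22] and is an assumption here, since the exact bin counts are not stated above:</p>
        <preformat>
# Minimal sketch, assuming NumPy: shape context histograms per Eq. (1).
import numpy as np

def shape_context(pts, n_r=5, n_theta=12):
    # pairwise distances, normalized by the mean inter-point distance
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d = d / (d[np.triu_indices(len(pts), 1)].mean() + 1e-12)
    # pairwise angles, wrapped to [0, 2*pi)
    ang = np.arctan2(pts[:, None, 1] - pts[None, :, 1],
                     pts[:, None, 0] - pts[None, :, 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)  # log-polar radii
    t_edges = np.linspace(0.0, 2 * np.pi, n_theta + 1)
    hists = np.zeros((len(pts), n_r * n_theta))
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i          # exclude the point itself
        h, _, _ = np.histogram2d(d[i, mask], ang[i, mask], bins=[r_edges, t_edges])
        hists[i] = h.ravel()                     # count points per (r, theta) bin
    return hists / (hists.sum(axis=1, keepdims=True) + 1e-12)  # normalized histograms
        </preformat>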
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Bipartite graph matching</title>
        <p>We consider minimizing the total cost of matching given by</p>
        <p>$$H(\pi) = \sum_{i} C\left(p_i, q_{\pi(i)}\right) \quad (2)$$
where $\pi$ denotes a permutation, and $C$ is the cost function defined as $C_{ij} = \frac{1}{2}\sum_{k=1}^{K}\frac{\left[h_i(k) - h_j(k)\right]^2}{h_i(k) + h_j(k)}$, where $h_i$ and $h_j$ are the obtained shape context descriptors (normalized K-bin
histograms) for the matched feature points pi and qj on the FISH and nanoSIMS images,
respectively. The resulting weighted bipartite graph matching problem based on H(π) was
solved using the efficient Jonker-Volgenant algorithm [30]. Finally, we computed the Euclidean
distance between each matched feature pair and only retained the matches that fall between the
25% and 75% quantiles as inliers. The values of the matching threshold were chosen based on
trial and error.</p>
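        <p>A sketch of the assignment and inlier-filtering steps, assuming SciPy, whose linear_sum_assignment routine implements a modified Jonker-Volgenant solver in the spirit of [30]:</p>
        <preformat>
# Minimal sketch, assuming NumPy/SciPy: chi-squared matching cost (Eq. 2) plus
# quantile-based inlier filtering of the resulting matches.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_shape_contexts(ha, hb):
    # C_ij = 0.5 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))
    num = (ha[:, None, :] - hb[None, :, :]) ** 2
    den = ha[:, None, :] + hb[None, :, :] + 1e-12
    C = 0.5 * (num / den).sum(axis=-1)
    rows, cols = linear_sum_assignment(C)   # minimizes H(pi) = sum_i C(p_i, q_pi(i))
    return rows, cols

def inlier_mask(pa, pb):
    # retain matches whose Euclidean distance lies between the 25% and 75% quantiles
    dist = np.linalg.norm(pa - pb, axis=1)
    lo, hi = np.quantile(dist, [0.25, 0.75])
    return np.logical_and(dist >= lo, hi >= dist)
        </preformat>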
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Transformation and Registration</title>
        <p>Given a finite set of point correspondences between two shapes, the image transformation and
registration function $f: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ can be realized using the TPS model [31], which performs
nonrigid registration or alignment of deformed images. The underlying transformation was modeled
as a radial basis function in which the foreground pixels of the moving image deform under the
influence of the control points $p_i$, $i = 1, \ldots, n$.</p>
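        <p>A minimal TPS warp sketch, assuming SciPy's RBFInterpolator with its thin-plate-spline kernel; src and dst are the matched control points in the moving (FISH) and fixed (nanoSIMS) images:</p>
        <preformat>
# Minimal sketch, assuming SciPy: TPS warp of a 2D (e.g., binary) moving image
# driven by matched control points. RBFInterpolator's "thin_plate_spline"
# kernel is U(r) = r^2 log r.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

def tps_warp(moving, src, dst):
    # fit the inverse mapping f: fixed-image coords -> moving-image coords
    f = RBFInterpolator(dst, src, kernel="thin_plate_spline")
    h, w = moving.shape[:2]
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    coords = f(grid.reshape(-1, 2)).reshape(h, w, 2)
    # bilinear resampling of the moving image at the mapped coordinates
    return map_coordinates(moving, [coords[..., 0], coords[..., 1]], order=1)
        </preformat>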
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Similarity registration</title>
        <p>Similarity registration was used as a comparison to our proposed non-rigid, TPS-based
registration scheme. Using the features extracted from CNN models, similarity registration
allows for alignment of images via a combination of globally applied rigid-body translation,
rotation, and scaling operations [32, 33].</p>
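        <p>A sketch of this baseline, assuming scikit-image rather than the authors' implementation; a global rotation, scale and translation is estimated from the same matched points:</p>
        <preformat>
# Minimal sketch, assuming scikit-image: similarity registration from matched
# feature points (globally applied rotation, scaling and translation).
import numpy as np
from skimage import transform

def similarity_warp(moving, src, dst):
    tform = transform.SimilarityTransform()
    tform.estimate(np.asarray(src), np.asarray(dst))   # fit rotation, scale, translation
    return transform.warp(moving, tform.inverse)       # warp moving into the fixed frame
        </preformat>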
      </sec>
      <sec id="sec-2-7">
        <title>2.7. Comparison to a state-of-the-art registration method and non-CNN feature extraction-based registration methods</title>
        <p>The Contrastive Multimodal Image Representations (CoMIR) scheme [34;
https://github.com/MIDA-group/CoMIR], which has been shown to outperform several other
state-of-the-art image registration methods in biomedical and remote sensing applications, was used in
this study for the purpose of comparison. To further evaluate the performance of our CNN-based
feature extraction schemes, we also implemented and evaluated several traditional,
non-CNN-based feature extraction methods: SURF, a similarity-invariant, fast and robust
local feature extraction algorithm [35]; KAZE, a scale- and rotation-invariant, fast multiscale
feature detection and description approach for nonlinear scale spaces [36]; BRISK, a scale- and
rotation-invariant, fast feature point extraction algorithm [37]; the Harris corner
detector [38]; and features from accelerated segment test (FAST) [39].</p>
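        <p>As an illustration of one such baseline, the sketch below assumes OpenCV (SURF requires the non-free contrib build, so BRISK [37] is shown) and detects and matches binary keypoint descriptors between two grayscale images:</p>
        <preformat>
# Minimal sketch, assuming OpenCV: BRISK keypoint detection and brute-force
# Hamming matching between two 8-bit grayscale images.
import cv2

def brisk_matches(img_a, img_b):
    det = cv2.BRISK_create()
    ka, da = det.detectAndCompute(img_a, None)   # keypoints + binary descriptors
    kb, db = det.detectAndCompute(img_b, None)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(da, db), key=lambda m: m.distance)
    return ka, kb, matches
        </preformat>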
      </sec>
      <sec id="sec-2-8">
        <title>2.8. Quantitative image registration assessment</title>
        <p>The results of automated registration were compared to manually registered images (i.e., the
ground truth). Three different error metrics were employed to assess registration accuracy at
the pixel and structural levels: root mean squared error (RMSE), structural similarity index
(SSIM), and average absolute intensity difference (AAID). RMSE quantifies the difference
between registered images $(\hat{I}, I)$ by computing the square root of the mean square error of pixel
values over the RGB channels between the two images [40]:
$$\mathrm{RMSE}(\hat{I}, I) = \sqrt{\frac{1}{MNQ} \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{q=1}^{Q} \left(\hat{I}_{i,j,q} - I_{i,j,q}\right)^2} \quad (3)$$</p>
        <p>The SSIM metric measures the perceived similarity in structural information between two
images and entails computing a weighted combination of the luminance index l, the contrast index
c and the structural index s [41]:
$$\mathrm{SSIM}(\hat{I}, I) = \left[l(\hat{I}, I)\right]^{\alpha} \left[c(\hat{I}, I)\right]^{\beta} \left[s(\hat{I}, I)\right]^{\gamma} \quad (4)$$</p>
        <p>Here $l(\hat{I}, I) = \frac{2\mu_{\hat{I}}\mu_{I} + C_1}{\mu_{\hat{I}}^2 + \mu_{I}^2 + C_1}$, $c(\hat{I}, I) = \frac{2\sigma_{\hat{I}}\sigma_{I} + C_2}{\sigma_{\hat{I}}^2 + \sigma_{I}^2 + C_2}$, and $s(\hat{I}, I) = \frac{\sigma_{\hat{I}I} + C_3}{\sigma_{\hat{I}}\sigma_{I} + C_3}$, where $\mu_{\hat{I}}, \mu_{I}$ are the
local means; $\sigma_{\hat{I}}, \sigma_{I}$ the standard deviations; and $\sigma_{\hat{I}I}$ the cross-covariance for images $\hat{I}$ and $I$,
respectively. The weights α, β and γ were set to 1.</p>
        <p>The AAID metric is based on the absolute intensity difference between the two images $(\hat{I}, I)$
[42]:
$$\mathrm{AAID} = \frac{1}{MNQ} \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{q=1}^{Q} \left|\hat{I}_{i,j,q} - I_{i,j,q}\right| \quad (5)$$</p>
        <p>where M, N, and Q represent the dimensions of the images. Smaller RMSE and AAID values
represent a better registration result, whereas the SSIM value is larger for better-aligned images.</p>
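        <p>The three metrics can be sketched as follows, assuming NumPy and scikit-image; a and b are the registered and reference images of matching shape (M, N, Q):</p>
        <preformat>
# Minimal sketch, assuming NumPy and scikit-image, of the three metrics above.
import numpy as np
from skimage.metrics import structural_similarity

def rmse(a, b):
    return np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2))   # Eq. (3)

def ssim(a, b):
    # alpha = beta = gamma = 1, as above; data_range supplied explicitly
    return structural_similarity(a, b, channel_axis=-1,
                                 data_range=float(b.max() - b.min()))   # Eq. (4)

def aaid(a, b):
    return np.mean(np.abs(a.astype(float) - b.astype(float)))           # Eq. (5)
        </preformat>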
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Visual (Figures 2 and 3) and quantitative (Table 1) comparison of our results with manually
registered images shows good agreement, signifying the advantages of the automated
registration. Image preprocessing with binary thresholding significantly improved the accuracy
of image registration compared to the raw input RGB images (the left panels in Figures 2 and 3).
It also yielded substantially better quantitative results than when analyzing the raw
RGB images, as reflected in the smaller pixel differences in the RMSE and AAID values (TPS
registration, Table 1) and the larger SSIM indices, which exceeded 0.8 for a significantly
deformed FISH image and 0.78 for a deformed FISH image with multiple connected components,
respectively (TPS registration). Additional details on the aforementioned results are available at
https://doi.org/10.6084/m9.figshare.26321587.v3.</p>
      <p>The additional intra-aggregate features present during RGB image registration, which may differ
between the FISH and nanoSIMS images, result in a deterioration of the registration results
(Figures 2 and 3). It is also noted that residual component(s) in the FISH and nanoSIMS images
outside the region of interest (ROI) did not match well even after several exhaustive trial and
error iterations (see binary images in Figures 2 and 3). However, mismatches between the small
connected components due to the binarization preprocessing did not impact the registration of
our microbial aggregate images and hence there is no need to first remove the small connected
components in the two images before proceeding to align them.</p>
      <p>Our results show that TPS-based registration outperforms registration based on similarity
metrics (Table 1). With radial basis functions, TPS-based registration is capable of locally
transforming and warping the target FISH image onto the nanoSIMS image. In contrast,
similarity-based registration involving only global linear rigid-body transformations, i.e.,
rotation, scaling, and translation [43], leads to significant disparity in registration results
between TPS-based and similarity-based registration (Figure 2).</p>
      <p>The CNN models also performed well with deformed FISH images containing multiple
connected components. The analysis of a significantly deformed image (Figure 4), and a
deformed image with multiple connected components (Figure 5) revealed that ResNet and
ShuffleNet often outperformed VGG and GoogLeNet implementations. Registration using a
fine-tuned CNN, in which the weights of a pre-trained CNN are refined by training with new data [44],
produced almost the same registration results using binary images as input (not shown). Using
raw (RGB) images as input improved the registration slightly, but the differences in registration
performance were minimal.</p>
      <p>For validation purposes, we first compared our CNN-based methods to other well-recognized
traditional feature extraction-based methods that employ SURF, KAZE, BRISK, Harris corner
detector and FAST features (see Figures S15-S16 in the supplemental materials available at
https://doi.org/10.6084/m9.figshare.26321587.v3). None of the aforementioned traditional
feature extraction-based methods produced satisfactory registration results in our tests for a
modestly deformed, a significantly deformed, and a multiple-component deformed FISH image
with a nanoSIMS image. With RGB images as input, all the aforementioned traditional feature
extraction-based methods failed completely to register the FISH images with the nanoSIMS
image due to the inherent shortcomings of the extracted and matched features.</p>
      <p>To assess the quality of our CNN-based implementations, we compared them with the results
of the state-of-the-art, pretrained CoMIR method [34] that is based on contrastive learning. Here
we considered three distinct types of deformed FISH images: a moderately deformed FISH image
(Figure 6A), significantly deformed FISH image (Figure 6B), and multiple-component deformed
FISH image (Figure 6C). CoMIR registered these FISH images with the corresponding nanoSIMS
images with high accuracy. Our proposed CNN-based methods performed comparably to the
state-of-the-art CoMIR method, while significantly outperforming the rigid-body registration
methods.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>The integration of multiple multimodal data streams is critical to gain new insights into the
functioning of microbial communities. Here we present the results of a processing pipeline that
merges spatially explicit data sets on the identity and activity of microorganisms in the form of
images. The alignment of such multimodal images to resolving individual cells can be challenging
due to image distortion.</p>
      <p>We successfully used CNNs that are pretrained on the ImageNet database to replace tedious
manual alignment and image registration. Our results indicate that all six CNN models yield high
registration accuracy at both the pixel and structural levels (Figures 2 and 3, Table 1), even though
the ImageNet database does not contain microbial imagery. Nevertheless, our pipeline produces
results that compare favorably with manually registered images. This good agreement illustrates
that automated registration is a valuable tool for microbial image analysis.</p>
      <p>The finding that binary thresholding significantly improved image registration shows that
aggregate shape is a useful characteristic or feature to employ and that alignment of (deformed)
aggregate contours that are consistent between image modalities yields robust results (Figures
2 and 3; Table 1). Our analysis also shows that the selection of regions of interest is not extremely
critical and that the results are not sensitive to small mismatches. This is largely due to the
observation that features extracted using the CNNs were mostly found to be from the dominant
object in the image (Figure 5). This facilitates the alignment of FISH and nanoSIMS images, with
the former covering larger areas compared to the more detailed, high-resolution nanoSIMS
observations. However, the registration performance is negatively impacted when objects in the
image are fragmented resulting in the absence of a dominant object.</p>
      <p>We further demonstrate that the use of more involved registration methods can improve the
results substantially. While computationally more intensive, TPS-based registration introduces
smooth, elastic deformations, producing a reasonably well-aligned image even for a significantly
deformed FISH image (Figure 2). This finding is consistent with the reported high accuracy and
robustness of TPS in data interpolation and image registration [45].</p>
      <p>While all CNNs performed well (better than several standard feature extraction methods, and
comparable to CoMIR), there are some differences between them. Notably, features extracted by
ResNet and ShuffleNet were generally more complex than those extracted by their VGG and
GoogLeNet counterparts, thus potentially contributing to slightly better registration results for
a significantly deformed FISH image (Figure 2 and Table 1), or a deformed FISH image with
multiple components (Figure 3). Moreover, we found that fine-tuning did not improve
registration significantly. As it consumes significantly more computing power and takes
substantially longer to finish, we deemed that fine-tuning is not necessary for this type of image
registration task. Lastly, graph theory-based [46] and phase-based [47] image registration
techniques have also demonstrated promising registration accuracy for multimodal images.
Future avenues of work will include the incorporation of these techniques. The code for our
processing pipeline is publicly available on the Bitbucket repository at
https://bitbucket.org//MeileLab/he_imageregistration/src/master.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Our workflow employed advanced CNN models to successfully extract shared feature points in
FISH and nanoSIMS images for multimodal image registration. CNN-derived, feature-based
nonrigid TPS registration methods significantly outperformed conventional similarity-based
rigid-body registration methods and produced registration results comparable to those
of the state-of-the-art CoMIR method, which is based on contrastive learning. We tested six CNN
models using TPS-based non-rigid registration for different FISH and nanoSIMS images. The
differences between the registration results obtained from the different CNN models considered
in this study were minor. We demonstrated that image preprocessing with binarization is critical
for final image registration and aggregate shape is a robust feature for microbiology-derived
images such as FISH and nanoSIMS images. This may be largely owing to the significant
differences in intra-aggregate patterns between the FISH and nanoSIMS images, leading to poor
registration performance when using raw RGB images as input. It is also noted that images with
significant background noise (non-ROI components) that cannot be easily removed via simple
thresholding and binarization still pose a significant challenge. This highlights the importance of
aggregate morphology and reducing background noise in images with multiple aggregates.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We thank Victoria Orphan and Gray Chadwick for providing the FISH and nanoSIMS images and
the data on manual image registration used in [23]. This work was supported by the U.S.
Department of Energy, Office of Science, Office of Biological and Environmental Research,
Genomic Sciences Program under award numbers DE-SC0020373 and DE-SC0022991 to
Christof Meile.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[14] H. Sokooti, et al., Nonrigid image registration using multi-scale 3D convolutional neural networks. In the Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2017. Springer.</p>
      <p>[15] P. Jiang and J.A. Shackleford, CNN driven sparse multi-level B-spline image registration. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.</p>
      <p>[16] H. Uzunova, et al., Training CNNs for image registration from few samples with model-based data augmentation. In the Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2017. Springer.</p>
      <p>[17] E. Ferrante, et al., On the adaptability of unsupervised CNN-based deformable image registration to unseen image domains. In the Proceedings of the International Workshop on Machine Learning in Medical Imaging. 2018. Springer.</p>
      <p>[18] J.A. Lee, et al., A deep step pattern representation for multimodal retinal image registration. In the Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019: p. 5077-5086.</p>
      <p>[19] J. Hu, et al., End-to-end multimodal image registration via reinforcement learning. Medical Image Analysis, 2021. 68: p. 101878.</p>
      <p>[20] A. Hering, et al., CNN-based lung CT registration with multiple anatomical constraints. Medical Image Analysis, 2021: p. 102139.</p>
      <p>[21] H.R. Boveiri, et al., Medical image registration using deep neural networks: A comprehensive review. Computers &amp; Electrical Engineering, 2020. 87: p. 106767.</p>
      <p>[22] S. Belongie and J. Malik, Matching with shape contexts. In the Proceedings of the 2000 IEEE Workshop on Content-based Access of Image and Video Libraries. 2000. Hilton Head Island, SC, USA.</p>
      <p>[23] S.E. McGlynn, et al., Single cell activity reveals direct electron transfer in methanotrophic consortia. Nature, 2015. 526(7574): p. 531-535.</p>
      <p>[24] L. Polerecky, et al., Look@NanoSIMS - a tool for the analysis of nanoSIMS data in environmental microbiology. Environmental Microbiology, 2012. 14(4): p. 1009-1023.</p>
      <p>[25] X. Zhang, et al., ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.</p>
      <p>[26] C. Szegedy, et al., Going deeper with convolutions. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.</p>
      <p>[27] K. He, et al., Deep residual learning for image recognition. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.</p>
      <p>[28] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition. In the Proceedings of the 3rd International Conference on Learning Representations. 2015: San Diego, CA, USA.</p>
      <p>[29] N. Otsu, A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979. 9(1): p. 62-66.</p>
      <p>[30] R. Jonker and A. Volgenant, A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 1987. 38(4): p. 325-340.</p>
      <p>[31] M.J.D. Powell, A thin plate spline method for mapping curves into curves in two dimensions. Computational Techniques and Applications (CTAC '95), 1995.</p>
      <p>[32] A. Goshtasby, Image registration by local approximation methods. Image and Vision Computing, 1988. 6: p. 255-261.</p>
      <p>[33] A. Goshtasby, Piecewise linear mapping functions for image registration. Pattern Recognition, 1986. 19: p. 459-466.</p>
      <p>[34] N. Pielawski, et al., CoMIR: Contrastive multimodal image representation for registration. In the Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). 2021: Vancouver, Canada.</p>
      <p>[35] H. Bay, et al., SURF: Speeded Up Robust Features. Computer Vision and Image Understanding (CVIU), 2008. 110(3): p. 346-359.</p>
      <p>[36] P.F. Alcantarilla, A. Bartoli, and A.J. Davison, KAZE features. In Computer Vision - ECCV 2012, Lecture Notes in Computer Science, vol. 7577, A. Fitzgibbon, et al., Editors. 2012, Springer: Berlin, Heidelberg.</p>
      <p>[37] S. Leutenegger, M. Chli, and R. Siegwart, BRISK: Binary Robust Invariant Scalable Keypoints. In the Proceedings of the 2011 International Conference on Computer Vision. 2011. Barcelona, Spain.</p>
      <p>[38] C. Harris and M. Stephens, A combined corner and edge detector. In the Proceedings of the 4th Alvey Vision Conference, 1988: p. 147-151.</p>
      <p>[39] E. Rosten and T. Drummond, Fusing points and lines for high performance tracking. In the Proceedings of the IEEE International Conference on Computer Vision, 2005. 2: p. 1508-1511.</p>
      <p>[40] Y. Bentoutou, et al., An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 2005. 43(9): p. 2127-2137.</p>
      <p>[41] Z. Wang, et al., Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004. 13(4): p. 600-612.</p>
      <p>[42] Z. Zhang, et al., A new image registration algorithm based on evidential reasoning. Sensors, 2019. 19(5): p. 1091.</p>
      <p>[43] A. Goshtasby, 2-D and 3-D Image Registration for Medical, Remote Sensing, and Industrial Applications. 2005: John Wiley &amp; Sons.</p>
      <p>[44] N. Tajbakhsh, et al., Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging, 2016. 35(5): p. 1299-1312.</p>
      <p>[45] R. Sprengel, et al., Thin-plate spline approximation for image registration. In the Proceedings of the 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1996. 3: p. 1190-1191.</p>
      <p>[46] B.W. Papież, et al., Non-local graph-based regularization for deformable image registration. In the Proceedings of the MICCAI 2016 International Workshops on Medical Computer Vision and Bayesian and Graphical Models for Biomedical Imaging (MCV and BAMBI), Athens, Greece, October 21, 2016: p. 199-207. Springer International Publishing.</p>
      <p>[47] L. Tautz, et al., Phase-based non-rigid registration of myocardial perfusion MRI image sequences. In the Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010: p. 516-519. IEEE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.G.</given-names>
            <surname>Boxer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.L.</given-names>
            <surname>Kraft</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.K.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Advances in imaging secondary ion mass spectrometry for biological samples</article-title>
          .
          <source>Annual Review of Biophysics</source>
          ,
          <year>2009</year>
          .
          <volume>38</volume>
          : p.
          <fpage>53</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.E.</given-names>
            <surname>Dekas</surname>
          </string-name>
          , et al.,
          <article-title>Activity and interactions of methane seep microorganisms assessed by parallel transcription and FISH-NanoSIMS analyses</article-title>
          .
          <source>The ISME Journal</source>
          ,
          <year>2015</year>
          .
          <volume>10</volume>
          : p.
          <fpage>678</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.G.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>A survey of image registration techniques</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <year>1992</year>
          .
          <volume>24</volume>
          (
          <issue>4</issue>
          ): p.
          <fpage>325</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.S.</given-names>
            <surname>Heckbert</surname>
          </string-name>
          ,
          <article-title>Fundamentals of texture mapping and image warping</article-title>
          .
          <source>MS Thesis in Electrical Engineering and Computer Science</source>
          .
          <year>1989</year>
          , University of California, Berkeley: Berkeley, CA. p.
          <fpage>88</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Arad</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Reisfeld</surname>
          </string-name>
          ,
          <article-title>Image warping using few anchor points and radial functions</article-title>
          ,
          <source>In the Proceedings of Computer Graphics Forum</source>
          .
          <year>1995</year>
          , Blackwell Science Ltd. p.
          <fpage>35</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hermessi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mourali</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Zagrouba</surname>
          </string-name>
          ,
          <article-title>Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain</article-title>
          .
          <source>Neural Computing and Applications</source>
          ,
          <year>2018</year>
          .
          <volume>30</volume>
          (
          <issue>7</issue>
          ): p.
          <fpage>2029</fpage>
          -
          <lpage>2045</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Haskins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kruger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Deep learning in medical image registration: a survey</article-title>
          .
          <source>Machine Vision and Applications</source>
          ,
          <year>2020</year>
          .
          <volume>31</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , et al.,
          <article-title>Multimodal image alignment through a multiscale chain of neural networks with application to remote sensing</article-title>
          .
          <source>In the Proceedings of the European Conference on Computer Vision (ECCV)</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Registration of multimodal remote sensing image based on deep fully convolutional neural network</article-title>
          .
          <source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source>
          ,
          <year>2019</year>
          .
          <volume>12</volume>
          (
          <issue>8</issue>
          ): p.
          <fpage>3028</fpage>
          -
          <lpage>3042</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <article-title>Diverse region-based CNN for hyperspectral image classification</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          ,
          <year>2018</year>
          .
          <volume>27</volume>
          (
          <issue>6</issue>
          ): p.
          <fpage>2623</fpage>
          -
          <lpage>2634</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <article-title>A new image classification method using CNN transfer learning and web data augmentation</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <year>2018</year>
          .
          <volume>95</volume>
          : p.
          <fpage>43</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kayalibay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>van der Smagt</surname>
          </string-name>
          ,
          <article-title>CNN-based segmentation of medical imaging data</article-title>
          .
          <source>arXiv preprint arXiv:1701.03056</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bao</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.C.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <article-title>Multi-scale structured CNN with label consistency for brain MR image segmentation</article-title>
          .
          <source>Computer Methods in Biomechanics and Biomedical Engineering: Imaging &amp; Visualization</source>
          ,
          <year>2018</year>
          .
          <volume>6</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>113</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>