Multimodal registration of FISH and nanoSIMS images using convolutional neural networks

Xiaojia He1,2,*, Suchendra M. Bhandarkar3 and Christof Meile1

1 Department of Marine Sciences, University of Georgia, Athens, GA 30602, USA
2 Chemical Insights Research Institute, UL Research Institutes, Marietta, GA 30067, USA
3 School of Computing, University of Georgia, Athens, GA 30602, USA

Abstract
Nanoscale secondary ion mass spectrometry (nanoSIMS) and fluorescence in situ hybridization (FISH) microscopy provide high-resolution, multimodal image representations of cell identity and cell activity, respectively, for studies of targeted microbial communities in microbiological research. Despite its importance to microbiologists, multimodal registration of FISH and nanoSIMS images is challenging given the morphological distortion and background noise in both image modalities. In this paper we propose a scheme for multimodal registration of FISH and nanoSIMS images that employs convolutional neural networks (CNNs) for multiscale feature extraction, shape context for feature matching with minimum transformation cost, and the thin-plate spline (TPS) model for the registration of the two image modalities. Registration accuracy is quantitatively assessed against manually registered images, at both the pixel and structural levels, using standard metrics. Experimental results show that among the six CNN models tested, ResNet18 outperforms VGG16, VGG19, GoogLeNet, ShuffleNet and ResNet101 on most evaluation metrics. This study demonstrates the utility of CNNs in the registration of multimodal images with significant background noise and morphological distortion. We also show that the shape of microbial aggregates, preserved by binarization, is a robust feature for registering multimodal microbiology-related images. The proposed multimodal image registration scheme can serve as a powerful tool in microbiological research.

Keywords
Multimodal image registration, Convolutional neural network, Microorganisms

CVCS2024: the 12th Colour and Visual Computing Symposium, September 5–6, 2024, Gjøvik, Norway
∗ Corresponding author.
Xiaojia.he@ul.org (X. He); suchi@uga.edu (S. Bhandarkar); cmeile@uga.edu (C. Meile)
0000-0001-8274-5564 (X. He); 0000-0003-2930-4190 (S. Bhandarkar); 0000-0002-0825-4596 (C. Meile)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction
Nanoscale secondary ion mass spectrometry (nanoSIMS) is a powerful tool to quantify elemental distributions at nanometer-scale resolution [1]. Combining nanoSIMS imaging with fluorescence in situ hybridization (FISH) microscopy allows one to study microbial activity and correlate it with the identity of cells [2]. However, nanoSIMS and FISH images display unequal magnification and distortion.

Several image registration algorithms exploit geometric information to align the input images [3]. Notably, feature-based registration methods rely on point- or shape-based correspondences between two images, where the features, such as corners or contours of structures, are either derived automatically from the underlying image or obtained from markers with known positions. Once the corresponding points are selected, their locations in the two images are used to reconstruct a spatial transformation [4, 5]. In contrast, intensity-based methods consider only pixel intensity values, rather than specific features, to determine the spatial transformation.
Deep learning has been increasingly recognized as a powerful toolbox for multimodal image registration, especially in medical imaging [6, 7] and remote sensing [8, 9]. The convolutional neural network (CNN) is a widely used deep neural network (DNN) architecture comprising convolutional layers, max-pooling layers and a softmax layer, in addition to problem-specific layers. CNNs have been used extensively for feature extraction in image classification [10, 11], image segmentation [12, 13] and image registration [14, 15]. Several variants of the CNN architecture have been proposed for multimodal image registration [6, 16, 17] and have been shown to be successful in solving biomedical image registration problems [18-21].

In this paper, we present an automated scheme to register FISH and nanoSIMS images using multiple CNN models. Although images of neither microorganisms nor microbial aggregates are in the ImageNet database, deep CNN architectures that are pre-trained on ImageNet have been shown to be very effective at general image feature extraction. The convolutional feature map is extracted at multiple image resolutions and used for feature point selection. The shape context descriptor is used to identify matched features, and the thin-plate spline (TPS) model is employed to register the FISH and nanoSIMS images by computing a transformation matrix [22]. The results obtained using the different CNNs, feature matching approaches, and transformation computation and registration methods are compared and discussed. To the best of our knowledge, this is the first documented application of deep CNN models to extract features from multimodal microbial images and subsequently register them.

2. Materials and Methods
The FISH and nanoSIMS images were acquired using the protocol proposed by McGlynn et al. [23], and a detailed description of sample collection and preparation, measurement methodology and data analysis is given in [23]. In brief, anaerobic methane-oxidizing consortia were obtained from ocean sediment samples collected at Hydrate Ridge North (station HR-7) during the AT 18-10 Hydrate Ridge August/September 2011 expedition. Push core sediment samples were processed on ship and kept under an N2 atmosphere at 4°C. Slurry incubations were carried out with anoxic filtered seawater at elevated pressure. FISH and nanoSIMS images were then collected and manually aligned using the Matlab program Look@nanoSIMS as described in [24]. These manually aligned images were used as ground truth in this study.

In our workflow, depicted in Figure 1, 41 raw RGB images and their binarized versions were used as input. In brief, the input images were preprocessed to remove background noise and then fed to the chosen CNN models with pretrained weights. Features were then extracted at predetermined layer depths (scales) using the CNN architectures ShuffleNet [25], GoogLeNet [26], ResNet-18 and ResNet-101 [27], and VGG16 and VGG19 [28], with pretrained weights derived from the several million training images in the ImageNet database (http://www.image-net.org). A subset of the extracted features was selected and further constrained to generate a 2-D array of matched feature points using shape context and bipartite graph matching algorithms [22]. Finally, the matched feature points were used for image transformation computation and image registration using the thin-plate spline (TPS) model. Quantitative registration accuracy metrics such as the root mean squared error (RMSE), structural similarity index (SSIM), and average absolute intensity difference (AAID) were computed at both the pixel and structural levels. Additional details on the above-mentioned methods are available at https://doi.org/10.6084/m9.figshare.26321587.v3.

Figure 1: Workflow for multimodal registration of FISH and nanoSIMS images.

2.1. Image preprocessing
FISH images are intensity measurements represented in their respective coordinate systems in the individual RGB channels, whereas nanoSIMS images represent ion counts at each pixel location. A global threshold was first generated using Otsu's method [29] to minimize the intra-class variance (i.e., the weighted sum of the variances of the black and white pixels in a binary image) and was then adjusted manually, based on trial and error, to preserve aggregate morphology. Aggregate(s) from the FISH image were then chosen and cropped to best match the nanoSIMS image. The resulting input images to the CNN were either raw RGB or preprocessed binary FISH and nanoSIMS images. All input images were rescaled to a size of 224×224 pixels and fed through the convolutional layers in the CNN.
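To make this step concrete, the following Python sketch (an illustration, not our released pipeline code) applies Otsu's global threshold with an optional manual offset and rescales the binary result to the 224×224 network input size; the file names and the offset value are placeholders.

```python
# Minimal preprocessing sketch: Otsu threshold with an optional manual offset,
# followed by rescaling to the 224x224 CNN input size.
import numpy as np
from skimage import io
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.transform import resize

def binarize_and_resize(path, offset=0.0, size=(224, 224)):
    """Binarize an image with Otsu's global threshold and resize it."""
    img = io.imread(path)
    gray = rgb2gray(img[..., :3]) if img.ndim == 3 else img.astype(float)
    thresh = threshold_otsu(gray) + offset       # offset mimics the manual adjustment
    binary = (gray > thresh).astype(float)       # foreground = aggregate pixels
    return resize(binary, size, order=0, anti_aliasing=False)

# Placeholder file names; in practice these are the cropped FISH aggregate and
# the corresponding nanoSIMS frame.
fish_bin = binarize_and_resize("fish_crop.png", offset=0.02)
sims_bin = binarize_and_resize("nanosims.png")
```

In practice, the offset would be tuned by visual inspection so that the aggregate outline is preserved, mirroring the trial-and-error adjustment described above.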
2.2. Feature extraction and matching
For the FISH and nanoSIMS images, features were extracted from the final layer of each individual module in the CNN architecture, starting with a layer size of 28×28 and proceeding to layer sizes of 14×14 and 7×7. The selection of convolutional layers was heuristic and aimed to include both high- and low-level features. The feature maps obtained from each layer were normalized by applying the transformation z = (x − μ)/σ, where the feature x in each feature map is assumed to be normally distributed with mean μ and standard deviation σ. Next, we generated the feature distance map by computing the symmetric matrix of pairwise feature distance values. We concatenated the feature distance maps from each layer to yield a single feature distance map for each FISH and nanoSIMS image pair, and processed the concatenated feature distance map by selecting the smallest value from each row and using the match threshold to select the top 20% matched features.
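The sketch below illustrates this step with torchvision's ImageNet-pretrained ResNet-18, one of the six architectures used here; in that implementation the nodes layer2, layer3 and layer4 yield 28×28, 14×14 and 7×7 feature maps for a 224×224 input. The sketch z-normalizes each map, treats every spatial location as a feature vector and builds the pairwise feature-distance map with a top-20% cut for a single scale; the input tensors are placeholders, and the concatenation of the per-layer distance maps is omitted for brevity. It is an illustration of the approach, not our released code.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.transforms.functional import normalize

# layer2/layer3/layer4 of ResNet-18 give 28x28, 14x14 and 7x7 maps for a 224x224 input.
extractor = create_feature_extractor(resnet18(weights=ResNet18_Weights.DEFAULT),
                                     return_nodes=["layer2", "layer3", "layer4"])
extractor.eval()

def multiscale_features(img_224):
    """img_224: (3, 224, 224) float tensor in [0, 1]; a binary mask can be repeated over 3 channels."""
    x = normalize(img_224, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]).unsqueeze(0)
    with torch.no_grad():
        maps = extractor(x)
    feats = {}
    for name, fmap in maps.items():                      # fmap: (1, C, H, W)
        z = (fmap - fmap.mean()) / (fmap.std() + 1e-8)   # z-normalization of the feature map
        feats[name] = z.squeeze(0).flatten(1).T          # (H*W, C): one vector per spatial location
    return feats

# Placeholder inputs; in practice these are the preprocessed FISH and nanoSIMS images.
fish_tensor = torch.rand(3, 224, 224)
sims_tensor = torch.rand(3, 224, 224)
fish_feats = multiscale_features(fish_tensor)
sims_feats = multiscale_features(sims_tensor)

# Pairwise feature-distance map at the 28x28 scale; keep the best match per row
# and retain the smallest 20% of those distances as preliminary matches.
dist = torch.cdist(fish_feats["layer2"], sims_feats["layer2"])
row_min, col_idx = dist.min(dim=1)
keep = row_min <= torch.quantile(row_min, 0.20)
matches = torch.stack([torch.nonzero(keep).squeeze(1), col_idx[keep]], dim=1)
```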
2.3. Shape context descriptor
After selecting the preliminary matching features, we used the shape context descriptor to determine the feature correspondence that minimizes a transformation cost function. The transformation cost function quantifies shape similarity based on the neighborhood structure of a feature point on a shape contour. The shape context descriptor at feature point $p_i$ is defined as a histogram $h_i$ of the relative coordinates $q$ of the remaining $n-1$ feature points [22]:

$$h_i(k) = \#\{\, q \neq p_i : (q - p_i) \in \mathrm{bin}(k) \,\} \qquad (1)$$

where the bins are designed to uniformly partition the log-polar $(\log r, \theta)$ space ($r$ is the radial distance and $\theta$ is the polar angle). To generate a shape context descriptor, we first computed the Euclidean distance values between points in the matched feature map and normalized them by their mean. Next, we computed the shape context descriptor by directly counting the points within each radial and angular region (bin) as described above.

2.4. Bipartite graph matching
We consider minimizing the total cost of matching, given by

$$H(\pi) = \sum_i C\!\left(p_i, q_{\pi(i)}\right) \qquad (2)$$

where $\pi$ denotes a permutation and $C$ is the cost function defined as

$$C_{i,j} = \frac{1}{2} \sum_{k=1}^{K} \frac{\left(h_i(k) - h_j(k)\right)^2}{h_i(k) + h_j(k)}$$

where $h_i$ and $h_j$ are the shape context descriptors (normalized $K$-bin histograms) for the matched feature points $p_i$ and $q_j$ on the FISH and nanoSIMS images, respectively. The resulting weighted bipartite graph matching problem based on $H(\pi)$ was solved using the efficient Jonker-Volgenant algorithm [30]. Finally, we computed the Euclidean distance between each matched feature pair and only retained the matches that fall between the 25% and 75% quantiles as inliers. The values of the matching threshold were chosen based on trial and error.
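A compact illustration of Eqs. (1) and (2) is sketched below (not our released code). It builds log-polar shape context histograms for two point sets, forms the chi-square cost matrix and solves the resulting assignment problem with scipy's linear_sum_assignment, a Hungarian-type solver standing in for the Jonker-Volgenant algorithm; the bin counts (5 radial × 12 angular) and the point arrays are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar histogram of the relative positions of all other points (Eq. 1)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d /= d[d > 0].mean()                                     # normalize by the mean distance
    ang = np.arctan2(pts[:, None, 1] - pts[None, :, 1],
                     pts[:, None, 0] - pts[None, :, 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)   # log-spaced radial bins
    t_edges = np.linspace(0, 2 * np.pi, n_theta + 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        mask = np.arange(n) != i
        h, _, _ = np.histogram2d(d[i, mask], ang[i, mask], bins=[r_edges, t_edges])
        hists[i] = h.ravel() / max(mask.sum(), 1)            # normalized K-bin histogram
    return hists

def chi2_cost(h1, h2, eps=1e-10):
    """Pairwise chi-square matching cost between two sets of histograms (Eq. 2)."""
    num = (h1[:, None, :] - h2[None, :, :]) ** 2
    den = h1[:, None, :] + h2[None, :, :] + eps
    return 0.5 * (num / den).sum(axis=-1)

# Placeholder coordinates of preliminarily matched feature points, (n, 2) arrays.
rng = np.random.default_rng(0)
fish_points = rng.uniform(0, 224, size=(40, 2))
sims_points = rng.uniform(0, 224, size=(40, 2))

cost = chi2_cost(shape_context(fish_points), shape_context(sims_points))
rows, cols = linear_sum_assignment(cost)     # globally optimal assignment minimizing H(pi)
```

The matched pairs returned by the solver would then be filtered with the 25-75% quantile rule on their Euclidean distances, as described above.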
2.5. Transformation and registration
Given a finite set of point correspondences between two shapes, the image transformation and registration function $T: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ can be realized using the TPS model [31], which performs non-rigid registration, or alignment, of deformed images. The underlying transformation was modeled as a radial basis function in which the foreground pixels of the moving image deform under the influence of the control points $p_i$, $i = 1, \ldots, n$.

2.6. Similarity registration
Similarity registration was used as a comparison to our proposed non-rigid, TPS-based registration scheme. Using the features extracted from the CNN models, similarity registration aligns the images via a combination of globally applied rigid-body translation, rotation, and scaling operations [32, 33].

2.7. Comparison to a state-of-the-art registration method and non-CNN feature extraction-based registration methods
The Contrastive Multimodal Image Representations (CoMIR) scheme [34; https://github.com/MIDA-group/CoMIR], which has been shown to outperform several other state-of-the-art image registration methods in biomedical and remote sensing applications, was used in this study for the purpose of comparison. To further evaluate the performance of our CNN-based feature extraction schemes, we also implemented and evaluated a variety of traditional, non-CNN-based feature extraction methods: the similarity-invariant, fast and robust local feature extraction algorithm SURF [35]; the scale- and rotation-invariant, fast multiscale feature detection and description approach for nonlinear scale spaces KAZE [36]; the scale- and rotation-invariant, fast feature point extraction algorithm BRISK [37]; the Harris corner detector [38]; and features from accelerated segment test (FAST) [39].

2.8. Quantitative image registration assessment
The results of automated registration were compared to manually registered images (i.e., the ground truth). Three different error metrics were employed to assess registration accuracy at the pixel and structural levels: the root mean squared error (RMSE), the structural similarity index (SSIM), and the average absolute intensity difference (AAID).

RMSE quantifies the difference between the registered image $\hat{y}$ and the reference image $y$ by computing the square root of the mean squared error of the pixel values over the RGB channels [40]:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2} \qquad (3)$$

The SSIM metric measures the perceived similarity of the structural information in two images and entails computing a weighted combination of the luminance index $l$, the contrast index $c$ and the structural index $s$ [41]:

$$SSIM = [l(\hat{y}, y)]^{\alpha}\,[c(\hat{y}, y)]^{\beta}\,[s(\hat{y}, y)]^{\gamma} \qquad (4)$$

Here $l(\hat{y}, y) = \frac{2\mu_{\hat{y}}\mu_{y} + c_1}{\mu_{\hat{y}}^2 + \mu_{y}^2 + c_1}$, $c(\hat{y}, y) = \frac{2\sigma_{\hat{y}}\sigma_{y} + c_2}{\sigma_{\hat{y}}^2 + \sigma_{y}^2 + c_2}$, and $s(\hat{y}, y) = \frac{\sigma_{\hat{y}y} + c_3}{\sigma_{\hat{y}}\sigma_{y} + c_3}$, where $\mu_{\hat{y}}$, $\mu_{y}$ are the local means; $\sigma_{\hat{y}}$, $\sigma_{y}$ the standard deviations; and $\sigma_{\hat{y}y}$ the cross-covariance of the images $\hat{y}$ and $y$. The weights $\alpha$, $\beta$ and $\gamma$ were set to 1.

The AAID metric is based on the absolute intensity difference between the two images $\hat{y}$ and $y$ [42]:

$$AAID = \frac{1}{MNQ}\sum_{i=1}^{M}\sum_{j=1}^{N}\sum_{k=1}^{Q} \left| \hat{y}_{i,j,k} - y_{i,j,k} \right| \qquad (5)$$

where $M$, $N$ and $Q$ are the image dimensions. Smaller RMSE and AAID values represent a better registration result, whereas the SSIM value is larger for better aligned images.
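The sketch below shows how the three metrics can be computed against the manually registered ground truth (an illustration rather than our released code); skimage's structural_similarity is used for SSIM, whose default settings correspond to α = β = γ = 1, and the input arrays are placeholders.

```python
import numpy as np
from skimage.metrics import structural_similarity

def rmse(y_hat, y):
    """Eq. (3): root mean squared error over all pixels and channels."""
    return float(np.sqrt(np.mean((y_hat.astype(float) - y.astype(float)) ** 2)))

def aaid(y_hat, y):
    """Eq. (5): average absolute intensity difference over an M x N x Q image."""
    return float(np.mean(np.abs(y_hat.astype(float) - y.astype(float))))

def ssim_rgb(y_hat, y):
    """Eq. (4) with alpha = beta = gamma = 1; the last axis is the RGB channel axis."""
    return float(structural_similarity(y_hat, y, channel_axis=-1, data_range=255))

# Placeholder arrays; in practice y_hat is the automatically registered image and
# y the manually registered ground-truth image of the same shape.
rng = np.random.default_rng(0)
y = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
y_hat = np.clip(y.astype(int) + rng.integers(-10, 10, size=y.shape), 0, 255).astype(np.uint8)
print(rmse(y_hat, y), aaid(y_hat, y), ssim_rgb(y_hat, y))
```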
3. Results
Visual (Figures 2 and 3) and quantitative (Table 1) comparison of our results with manually registered images shows good agreement, signifying the advantages of automated registration. Image preprocessing with binary thresholding significantly improved the accuracy of image registration compared to the raw input RGB images (left panels in Figures 2 and 3). It also yielded substantially better quantitative results than the analysis of the raw RGB images, as reflected in the smaller RMSE and AAID values (TPS registration, Table 1) and the larger SSIM indices, which exceeded 0.8 for a significantly deformed FISH image and 0.78 for a deformed FISH image with multiple connected components, respectively (TPS registration). Additional details on the aforementioned results are available at https://doi.org/10.6084/m9.figshare.26321587.v3.

The additional intra-aggregate features present during RGB image registration, which may differ between the FISH and nanoSIMS images, result in a deterioration of the registration results (Figures 2 and 3). It is also noted that residual component(s) in the FISH and nanoSIMS images outside the region of interest (ROI) did not match well even after several exhaustive trial-and-error iterations (see binary images in Figures 2 and 3). However, mismatches between the small connected components due to the binarization preprocessing did not impact the registration of our microbial aggregate images, and hence there is no need to first remove the small connected components in the two images before proceeding to align them.

Our results show that TPS-based registration outperforms registration based on similarity metrics (Table 1). With radial basis functions, TPS-based registration is capable of locally transforming and warping the target FISH image onto the nanoSIMS image. In contrast, similarity-based registration involves only global linear rigid-body transformations, i.e., rotation, scaling, and translation [43], which leads to the significant disparity in registration results between TPS-based and similarity-based registration (Figure 2).

Table 1: Image registration accuracy for a significantly deformed FISH image and a deformed FISH image with multiple connected components. RMSE and AAID values are smaller when the registration result is better, whereas the SSIM value is larger for a better aligned image. The best performance measure compared across CNNs is highlighted in gray shade. The best performance measure for a registration scheme and CNN architecture for a given image type is indicated in bold font. The best performance measure for each aggregate is indicated with an asterisk (*).

Significantly deformed FISH image
             Binary, Similarity     Binary, TPS           RGB, Similarity       RGB, TPS
             RMSE   AAID   SSIM     RMSE    AAID   SSIM   RMSE   AAID   SSIM    RMSE   AAID   SSIM
GoogLeNet    19.26  6.41   0.83     19.48   2.68   0.82   61.03  11.4   0.48    75.05  60.91  0.2
ResNet101    19.66  4.93   0.814    13.97   1.66*  0.88   75.07  63.37  0.21    49.53  19.77  0.53
ResNet18     25.01  8.66   0.75     12.44*  1.67   0.89*  54.80  32.06  0.51    32.45  6.65   0.7
ShuffleNet   15.52  4.68   0.85     13.23   1.83   0.88   54.80  33.93  0.48    44.7   9.36   0.62
VGG16        39.04  17.75  0.67     14.38   2.25   0.87   66.49  51.63  0.24    45.74  10.59  0.6
VGG19        15.68  4.4    0.86     19.84   2.1    0.81   67.67  51.45  0.23    54.67  23.39  0.53

Deformed FISH image with multiple components
             Binary, Similarity     Binary, TPS           RGB, Similarity       RGB, TPS
             RMSE   AAID   SSIM     RMSE    AAID   SSIM   RMSE   AAID   SSIM    RMSE   AAID   SSIM
GoogLeNet    38.69  2.5    0.75     27.18   4.02   0.8    39.96  3.06   0.74    52.73  7.53   0.62
ResNet101    38.78  2.56   0.74     31.15   4.56   0.78   38.81  4.28   0.74    45.07  6.57   0.71
ResNet18     43.34  3.09   0.71     29.92   3.56   0.79   40.98  4.8    0.719   52.69  7.51   0.63
ShuffleNet   40.97  3.2    0.72     29.50   5.43   0.78   39.12  2.49*  0.75    34.48  2.8    0.76
VGG16        41.47  3.41   0.72     29.29   5.27   0.78   39     3.74   0.75    44.37  4.19   0.69
VGG19        41.08  3.17   0.73     25.62*  5.22   0.81*  39.18  3.1    0.75    55.21  6.07   0.63

Figure 2: Registration of a significantly deformed FISH image and a nanoSIMS image using binary (left) and RGB (right) images as input with TPS-based (upper two panels) and similarity-based (lower two panels) schemes.

Figure 3: Same as Figure 2, but for a deformed FISH image with multiple components. Orange rectangles indicate missing components in either the nanoSIMS or FISH images after binarization.

The CNN models also performed well with deformed FISH images containing multiple connected components. The analysis of a significantly deformed image (Figure 4) and of a deformed image with multiple connected components (Figure 5) revealed that the ResNet and ShuffleNet implementations often outperformed their VGG and GoogLeNet counterparts. Registration using a fine-tuned CNN, in which the weights of a pre-trained CNN are refined by training with new data [44], produced almost the same registration results when binary images were used as input (not shown). With raw (RGB) images as input, fine-tuning improved the registration slightly, but the differences in registration performance were minimal.

Figure 4: Feature correspondences after final thresholding during registration of a significantly deformed FISH image and a nanoSIMS image using binary (upper row for each method) and RGB (lower row for each method) inputs.

For validation purposes, we first compared our CNN-based methods to other well-recognized, traditional feature extraction-based methods that employ SURF, KAZE, BRISK, Harris corner and FAST features (see Figures S15-S16 in the supplemental materials available at https://doi.org/10.6084/m9.figshare.26321587.v3). None of these traditional feature extraction-based methods produced satisfactory registration results in our tests when registering a modestly deformed, a significantly deformed, or a multiple-component deformed FISH image with a nanoSIMS image. With RGB images as input, all of the traditional feature extraction-based methods failed completely to register the FISH images with the nanoSIMS image due to the inherent shortcomings of the extracted and matched features.
Figure 5: Same as Figure 4, but for a deformed FISH image with multiple components.

To assess the quality of our CNN-based implementations, we compared them with the results of the state-of-the-art, pretrained CoMIR method [34], which is based on contrastive learning. Here we considered three distinct types of deformed FISH images: a moderately deformed FISH image (Figure 6A), a significantly deformed FISH image (Figure 6B), and a multiple-component deformed FISH image (Figure 6C). CoMIR registered these FISH images with the corresponding nanoSIMS images with high accuracy. Our proposed CNN-based methods performed comparably to the state-of-the-art CoMIR method, while significantly outperforming the rigid-body registration methods.

Figure 6: Comparison of the state-of-the-art CoMIR method to our proposed CNN models. Image registration accuracy for a moderately deformed FISH image (A), a significantly deformed FISH image (B), and a deformed FISH image with multiple components (C), quantified by the RMSE (upper panel), AAID (middle panel), and SSIM (lower panel). RMSE and AAID values are smaller when the registration result is better, whereas higher SSIM values denote a better aligned image. The asterisks indicate better performance, i.e., smaller RMSE and AAID values or higher SSIM values.

4. Discussion
The integration of multiple multimodal data streams is critical to gaining new insights into the functioning of microbial communities. Here we present the results of a processing pipeline that merges spatially explicit data sets on the identity and activity of microorganisms in the form of images. The alignment of such multimodal images at a resolution that resolves individual cells can be challenging due to image distortion. We successfully used CNNs that are pretrained on the ImageNet database to replace tedious manual alignment and image registration. Our results indicate that all six CNN models yield high registration accuracy at both the pixel and structural levels (Figures 2 and 3, Table 1), even though the ImageNet database does not contain microbial imagery. Nevertheless, our pipeline produces results that compare favorably with manually registered images. This good agreement illustrates that automated registration is a valuable tool for microbial image analysis.

The finding that binary thresholding significantly improved image registration shows that aggregate shape is a useful feature to employ, and that alignment of (deformed) aggregate contours that are consistent between image modalities yields robust results (Figures 2 and 3; Table 1). Our analysis also shows that the selection of regions of interest is not highly critical and that the results are not sensitive to small mismatches. This is largely because the features extracted using the CNNs were mostly found to lie on the dominant object in the image (Figure 5). This facilitates the alignment of FISH and nanoSIMS images, with the former covering larger areas compared to the more detailed, high-resolution nanoSIMS observations. However, registration performance is negatively impacted when the objects in the image are fragmented, resulting in the absence of a dominant object.

We further demonstrate that the use of more involved registration methods can improve the results substantially. While computationally more intensive, TPS-based registration introduces smooth, elastic deformations, producing a reasonably well-aligned image even for a significantly deformed FISH image (Figure 2). This finding is consistent with the reported high accuracy and robustness of TPS in data interpolation and image registration [45].
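As a minimal illustration of such a TPS warp (a sketch, not our released pipeline code), the backward mapping can be fitted from matched control points with scipy's RBFInterpolator using a thin-plate-spline kernel and applied with map_coordinates; all data in the example are placeholders.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

def tps_warp(moving, fixed_pts, moving_pts, output_shape):
    """Warp `moving` so that moving_pts are pulled onto fixed_pts ((row, col) coordinates)."""
    # Fit the backward map: output (fixed-image) coordinates -> moving-image coordinates.
    tps = RBFInterpolator(fixed_pts, moving_pts, kernel="thin_plate_spline", smoothing=0)
    rows, cols = np.mgrid[0:output_shape[0], 0:output_shape[1]]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    src = tps(grid)                                    # where each output pixel samples from
    warped = map_coordinates(moving.astype(float), [src[:, 0], src[:, 1]], order=1, cval=0.0)
    return warped.reshape(output_shape)

# Placeholder example: a binary moving image and a handful of matched control points.
rng = np.random.default_rng(0)
moving = (rng.random((224, 224)) > 0.5).astype(float)
fixed_pts = rng.uniform(20, 200, size=(12, 2))
moving_pts = fixed_pts + rng.normal(0, 3, size=(12, 2))   # mildly deformed correspondences
registered = tps_warp(moving, fixed_pts, moving_pts, (224, 224))
```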
While all CNNs performed well, outperforming several standard feature extraction methods and performing comparably to CoMIR, there are some differences between them. Notably, the features extracted by ResNet and ShuffleNet were generally more complex than those extracted by their VGG and GoogLeNet counterparts, potentially contributing to slightly better registration results for a significantly deformed FISH image (Figure 2 and Table 1) and for a deformed FISH image with multiple components (Figure 3). Moreover, we found that fine-tuning did not improve registration significantly. As it consumes considerably more computing power and takes substantially longer to complete, we deemed fine-tuning unnecessary for this type of image registration task. Lastly, graph theory-based [46] and phase-based [47] image registration techniques have also demonstrated promising registration accuracy for multimodal images. Future avenues of work will include the incorporation of these techniques. The code for our processing pipeline is publicly available in the Bitbucket repository at https://bitbucket.org/MeileLab/he_imageregistration/src/master.

5. Conclusions
Our workflow employed advanced CNN models to successfully extract shared feature points in FISH and nanoSIMS images for multimodal image registration. CNN-derived, feature-based non-rigid TPS registration significantly outperformed conventional similarity-based rigid-body registration and produced registration results comparable to those of the state-of-the-art CoMIR method, which is based on contrastive learning. We tested six CNN models using TPS-based non-rigid registration for different FISH and nanoSIMS images; the differences between the registration results obtained from the different CNN models were minor. We demonstrated that image preprocessing with binarization is critical for the final image registration and that aggregate shape is a robust feature for microbiology-derived images such as FISH and nanoSIMS images. This may be largely owing to the significant differences in intra-aggregate patterns between the FISH and nanoSIMS images, which lead to poor registration performance when raw RGB images are used as input. It is also noted that images with significant background noise (non-ROI components) that cannot easily be removed via simple thresholding and binarization still pose a significant challenge. This highlights the importance of aggregate morphology and of reducing background noise in images with multiple aggregates.

Acknowledgements
We thank Victoria Orphan and Gray Chadwick for providing the FISH and nanoSIMS images and the data on manual image registration used in [23]. This work was supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomic Sciences Program under award numbers DE-SC0020373 and DE-SC0022991 to Christof Meile.

References
[1] S.G. Boxer, M.L. Kraft, and P.K. Weber, Advances in imaging secondary ion mass spectrometry for biological samples. Annual Review of Biophysics, 2009. 38: p. 53-74.
[2] A.E. Dekas, et al., Activity and interactions of methane seep microorganisms assessed by parallel transcription and FISH-NanoSIMS analyses. The ISME Journal, 2015. 10: p. 678-92.
[3] L.G. Brown, A survey of image registration techniques. ACM Computing Surveys, 1992. 24(4): p. 325-76.
[4] P.S. Heckbert, Fundamentals of texture mapping and image warping. MS Thesis in Electrical Engineering and Computer Science. 1989, University of California, Berkeley: Berkeley, CA. p. 88.
[5] N. Arad and D. Reisfeld, Image warping using few anchor points and radial functions. In the Proceedings of Computer Graphics Forum. 1995, Blackwell Science Ltd. p. 35-46.
[6] H. Hermessi, O. Mourali, and E. Zagrouba, Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain. Neural Computing and Applications, 2018. 30(7): p. 2029-2045.
[7] G. Haskins, U. Kruger, and P. Yan, Deep learning in medical image registration: a survey. Machine Vision and Applications, 2020. 31(1): p. 8.
[8] A. Zampieri, et al., Multimodal image alignment through a multiscale chain of neural networks with application to remote sensing. In the Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[9] H. Zhang, et al., Registration of multimodal remote sensing image based on deep fully convolutional neural network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019. 12(8): p. 3028-3042.
[10] M. Zhang, W. Li, and Q. Du, Diverse region-based CNN for hyperspectral image classification. IEEE Transactions on Image Processing, 2018. 27(6): p. 2623-2634.
[11] D. Han, Q. Liu, and W. Fan, A new image classification method using CNN transfer learning and web data augmentation. Expert Systems with Applications, 2018. 95: p. 43-56.
[12] B. Kayalibay, G. Jensen, and P. van der Smagt, CNN-based segmentation of medical imaging data. arXiv preprint arXiv:1701.03056, 2017.
[13] S. Bao and A.C. Chung, Multi-scale structured CNN with label consistency for brain MR image segmentation. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2018. 6(1): p. 113-117.
[14] H. Sokooti, et al., Nonrigid image registration using multi-scale 3D convolutional neural networks. In the Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2017. Springer.
[15] P. Jiang and J.A. Shackleford, CNN driven sparse multi-level B-spline image registration. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
[16] H. Uzunova, et al., Training CNNs for image registration from few samples with model-based data augmentation. In the Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2017. Springer.
[17] E. Ferrante, et al., On the adaptability of unsupervised CNN-based deformable image registration to unseen image domains. In the Proceedings of the International Workshop on Machine Learning in Medical Imaging. 2018. Springer.
[18] J.A. Lee, et al., A deep step pattern representation for multimodal retinal image registration. In the Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019: p. 5077-5086.
[19] J. Hu, et al., End-to-end multimodal image registration via reinforcement learning. Medical Image Analysis, 2021. 68: p. 101878.
[20] A. Hering, et al., CNN-based lung CT registration with multiple anatomical constraints. Medical Image Analysis, 2021: p. 102139.
[21] H.R. Boveiri, et al., Medical image registration using deep neural networks: A comprehensive review. Computers & Electrical Engineering, 2020. 87: p. 106767.
[22] S. Belongie and J. Malik, Matching with shape contexts. In the Proceedings of the 2000 IEEE Workshop on Content-based Access of Image and Video Libraries. 2000. Hilton Head Island, SC, USA.
[23] S.E. McGlynn, et al., Single cell activity reveals direct electron transfer in methanotrophic consortia. Nature, 2015. 526(7574): p. 531-535.
[24] L. Polerecky, et al., Look@NanoSIMS - a tool for the analysis of nanoSIMS data in environmental microbiology. Environmental Microbiology, 2012. 14(4): p. 1009-1023.
[25] X. Zhang, et al., ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[26] C. Szegedy, et al., Going deeper with convolutions. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[27] K. He, et al., Deep residual learning for image recognition. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[28] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition. In the Proceedings of the 3rd International Conference on Learning Representations. 2015: San Diego, CA, USA.
[29] N. Otsu, A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979. 9(1): p. 62-66.
[30] R. Jonker and A. Volgenant, A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 1987. 38(4): p. 325-340.
[31] M.J.D. Powell, A thin plate spline method for mapping curves into curves in two dimensions. Computational Techniques and Applications (CTAC '95), 1995.
[32] A. Goshtasby, Image registration by local approximation methods. Image and Vision Computing, 1988. 6: p. 255-261.
[33] A. Goshtasby, Piecewise linear mapping functions for image registration. Pattern Recognition, 1986. 19: p. 459-466.
[34] N. Pielawski, et al., CoMIR: Contrastive multimodal image representation for registration. In the Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). 2021: Vancouver, Canada.
[35] H. Bay, et al., SURF: Speeded Up Robust Features. Computer Vision and Image Understanding (CVIU), 2008. 110(3): p. 346-359.
[36] P.F. Alcantarilla, A. Bartoli, and A.J. Davison, KAZE features. In Computer Vision - ECCV 2012, Lecture Notes in Computer Science, A. Fitzgibbon, et al., Editors. 2012, Springer: Berlin, Heidelberg. vol. 7577.
[37] S. Leutenegger, M. Chli, and R. Siegwart, BRISK: Binary Robust Invariant Scalable Keypoints. In the Proceedings of the 2011 International Conference on Computer Vision. 2011. Barcelona, Spain.
[38] C. Harris and M. Stephens, A combined corner and edge detector. In the Proceedings of the 4th Alvey Vision Conference, 1988: p. 147-151.
[39] E. Rosten and T. Drummond, Fusing points and lines for high performance tracking. In the Proceedings of the IEEE International Conference on Computer Vision, 2005. 2: p. 1508-1511.
[40] Y. Bentoutou, et al., An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 2005. 43(9): p. 2127-2137.
[41] Z. Wang, et al., Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004. 13(4): p. 600-612.
[42] Z. Zhang, et al., A new image registration algorithm based on evidential reasoning. Sensors, 2019. 19(5): p. 1091.
[43] A. Goshtasby, 2-D and 3-D Image Registration for Medical, Remote Sensing, and Industrial Applications. 2005: John Wiley & Sons.
[44] N. Tajbakhsh, et al., Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging, 2016. 35(5): p. 1299-1312.
[45] R. Sprengel, et al., Thin-plate spline approximation for image registration. In the Proceedings of the 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1996. 3: p. 1190-1191.
[46] B.W. Papież, et al., Non-local graph-based regularization for deformable image registration. In the Proceedings of the Medical Computer Vision and Bayesian and Graphical Models for Biomedical Imaging: MICCAI 2016 International Workshops, MCV and BAMBI, Athens, Greece, October 21, 2016. p. 199-207. Springer International Publishing.
[47] L. Tautz, et al., Phase-based non-rigid registration of myocardial perfusion MRI image sequences. In the Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010. p. 516-519. IEEE.