Application of Deep Learning Techniques to Digital Holographic Microscopy for Numerical Reconstruction

Raghavendra Vijayanagaram 1
3pc GmbH Neue Kommunikation, Berlin, Germany
rvijayanagaram@3pc.de

Abstract. Digital holography is an emerging imaging paradigm for general as well as microscopic imaging applications. The goal of this project is to conduct experiments on the processing of digital holograms using deep neural networks, as an approach to AI-based, semantic analysis of large-scale and complex information and content. Using deep neural networks, the project develops new techniques for more robust and performant processing of digital holograms, especially for the task of numerical hologram reconstruction. We explore semantic segmentation network models and aim to improve the performance of hologram processing. Among the most popular network models, the U-Net architecture was chosen for this task. Several experiments were conducted applying this method to artificially generated holograms, with promising results. These positive results lead to the belief that a significant improvement could be made on real holographic data, which could lead to a more robust hologram reconstruction.

Keywords: Digital Holography · Deep Neural Network · U-Net.

1 Introduction

1.1 Digital Holography

A hologram is an image that is created when light diffracts on meeting an object. The first practical optical hologram was recorded of a 3-D object. Various types of holograms, such as transmission, rainbow and reflection holograms, have been developed. Digital Holography (DH) was developed as a new tool to record 3-D images of objects such as microspheres, cells and other biological specimens. In addition, DH was also designed to capture the motion of objects in water and of bacteria in cells.
One of the differences between a hologram and a normal image is that an image records only the light intensity, while a hologram records both the intensity and the phase of a light field. Holograms contain the 3-D structure of the object, which means that the reconstructed image is visible from different viewpoints. In general, holography is based on the physical properties of light propagation, most importantly the diffraction of light and the interference of light waves.

Digital Holography is a well-known technique for obtaining volumetric information about a sample of interest, e.g. particles in a solution in water processing. Besides having a simpler setup, DH can be performed at video rate, which imposes other challenges such as the processing of much more data; hence an automatic system is of high value [1]. To recreate the object, the hologram is recorded digitally and reconstructed. To perform this, the amplitude and phase of the object waves are computed based on the diffraction theory of light. This derivation of the wave front in a particular plane is called numerical reconstruction.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 Master Thesis, March 08, 2019. A Thesis submitted in partial fulfilment of the requirements for the Degree of Master of Information Engineering (MIE) at University of Applied Sciences, Kiel, Germany.

1.2 Project Motivation

For the automatic processing of holograms in DH, two main tasks are of interest: finding the right reconstruction distance and performing the numerical reconstruction for a given hologram. Usually, both tasks are performed separately, which may result in very long processing times. In the first phase, it has to be evaluated whether each of these two tasks can be performed by a dedicated deep neural net.
If each problem can be solved by a neural net, the idea is to solve both problems with a single pass through a single neural network. The main goals of the project are:

– To detect particles in streaming water.
– To explore whether deep neural nets can be used to process digital holograms.
– To evaluate whether the use of U-Net will speed up holographic processing and increase overall performance and accuracy.

We propose a modified U-Net model that can improve performance on a wide range of images. We show promising results on both computer-generated and real hologram datasets. Figure 1 shows the vision of the project.

Fig. 1: Project Vision

2 Background

2.1 Loss Functions

In any learning network model, an error of fit is estimated to decide whether the model is a good fit or not. The error is the difference between the ground truth g and the predicted output p:

\[ L(x) = \frac{1}{n} \sum_{i=1}^{n} (g_i - p_i) \tag{1} \]

The function used to evaluate this error is called the loss function L(x). Different loss functions are applied for different types of tasks, and the best-fitting ones for the task at hand have to be identified. The loss function is the backbone of the training algorithm: the goodness of the model depends on the size of the error in the network, which is calculated by the loss function.

In this task, the loss is evaluated on the image pixels, which are compared between the ground truth object and the predicted output. For the proposed network model, which works on different datasets (artificially generated holograms and real holographic data), a few loss functions are of interest: Cross Entropy, Huber Loss, Mean Squared Error and Mean Pairwise Squared Error.

Cross Entropy: It measures the performance of a model whose outputs are probability values between 0 and 1. This loss function is also called log loss, or sigmoid cross entropy, because it is applied together with the sigmoid activation function.
The loss is defined as:

\[ C = -\frac{1}{n} \sum_{x} \left[ g \log(p) + (1 - g) \log(1 - p) \right] \tag{2} \]

where n is the total number of items, the sum runs over the training inputs x, g is the ground truth and p is the prediction.

Huber Loss: It is used in robust regression problems. It inherits properties from two popular loss functions: Mean Absolute Error (MAE), or L1 loss, and Mean Squared Error (MSE), or L2 loss [2]. If the error is small, the loss behaves like MSE; if the error is large, it behaves like MAE. This leads to robustness against outliers when the estimated data is very noisy. The Huber loss is defined as follows:

\[ L(a) = \begin{cases} \frac{1}{2}(g - p)^2 & \text{if } |g - p| < \delta \\ \delta\,|g - p| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \tag{3} \]

where g is the ground truth, p is the prediction output and δ is the threshold that separates the quadratic from the linear regime.

Mean Pairwise Squared Error (MPSE): MPSE is similar to Mean Squared Error (MSE) [2]. It measures the difference between corresponding pairs of elements of the predictions and the ground truth values. The MPSE is formulated as

\[ L(a) = \frac{1}{n} \sum_{i=1}^{n} |p(x_i) - g(x_i)|^2 \tag{4} \]

where n is the number of pixels in the ground truth, p(x) is the prediction output and g(x) is the ground truth.

2.2 U-Net Architecture

U-Net is named after its U-shaped architecture. It is one of the most popular end-to-end auto-encoder networks for semantic segmentation. It was developed and first used for biomedical image segmentation by Olaf Ronneberger, Philipp Fischer and Thomas Brox at the Computer Science Department of the University of Freiburg, Germany [3]. Semantic segmentation is the partition of an image into different regions, for example by classifying each pixel as belonging to a person, place, object or any other entity in the dataset. An auto-encoder is a type of neural network that is used to learn efficient data encodings. The main goal of this network is to learn to compress the data from the input layer into a short code and then to decompress it into a size that is similar to the original image used as input.
The encoder gradually decreases the spatial dimensions of the input image, and the decoder gradually recovers the object information and the spatial dimensions of the image. To support this recovery, the decoder receives shortcut connections from the encoder. The architecture is purely based on traditional convolutional neural networks. Its main advantage is that, during upsampling, deeper features are learned and concatenated with the feature maps of matching resolution from the downsampling path.

3 State of the Art

In the book Digital Holography, Pascal Picart and Jun-chang Li state the fundamentals of holography [4]. They explain the process of digital holography and wave propagation theory. A hologram contains encoded 3-D information about sample particles, which is obtained with the help of the interference of light. A drawback of this approach is that, to decode an image from a hologram, phase recovery and amplitude recovery are necessary steps of the reconstruction process. Furthermore, the distance of the object from the sensor has to be evaluated. Standard methods that recalculate the image for each candidate distance generate large numbers of images. These images need to be analyzed, and a decision has to be made regarding which of the objects will be reconstructed. This whole process is very time-consuming.

In the article Practical algorithms for simulation and reconstruction of digital in-line holograms, Tatiana Latychevskaia and Hans-Werner Fink [5] describe methods for the simulation and reconstruction of in-line holograms recorded with plane and spherical waves. For such holograms, they discuss the optimal parameters related to distances, sampling rates and other major factors that help to reconstruct the holograms. They show results based on numerical procedures that are helpful for the reconstruction of recorded holograms. In our research work, we followed these numerical procedures to reconstruct the recorded holograms in our task.
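To make the idea of numerical propagation more concrete, the following is a minimal sketch of one standard technique, the angular spectrum method for plane-wave illumination. It is an illustration under stated assumptions and not necessarily the procedure used in this work (the dataset generation described later relies on a Fresnel algorithm). The function names, the 8x8 toy grid and the optical parameters (wavelength 0.5, pixel pitch 2.0, distance 200.0, all in the same arbitrary length unit) are hypothetical, and a naive pure-Python DFT replaces an FFT library only to keep the sketch self-contained.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)); fine for tiny demo grids."""
    n = len(seq)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(seq[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def dft2(field, inverse=False):
    """2-D DFT of a square grid: transform the rows, then the columns."""
    rows = [dft(row, inverse) for row in field]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def frequencies(n, pitch):
    """Spatial frequency of each DFT bin in cycles per unit length."""
    return [(k if k < (n + 1) // 2 else k - n) / (n * pitch) for k in range(n)]


def propagate(field, wavelength, pitch, distance):
    """Propagate a complex field by `distance` with the angular spectrum method."""
    n = len(field)
    freqs = frequencies(n, pitch)
    spectrum = dft2(field)
    for i, fy in enumerate(freqs):
        for j, fx in enumerate(freqs):
            arg = 1.0 - (wavelength * fx) ** 2 - (wavelength * fy) ** 2
            if arg > 0.0:
                kz = 2.0 * math.pi / wavelength * math.sqrt(arg)
                spectrum[i][j] *= cmath.exp(1j * kz * distance)
            else:
                spectrum[i][j] = 0j  # drop evanescent components
    return dft2(spectrum, inverse=True)


def intensity(field):
    """Intensity |U|^2, which is what a camera sensor would record."""
    return [[abs(v) ** 2 for v in row] for row in field]


# Hypothetical toy object: a unit plane wave blocked by a small opaque square.
N = 8
aperture = [[0.0 if 3 <= r <= 4 and 3 <= c <= 4 else 1.0 for c in range(N)]
            for r in range(N)]

# Forward propagation to the sensor plane gives the field behind the hologram.
hologram_field = propagate(aperture, wavelength=0.5, pitch=2.0, distance=200.0)
hologram = intensity(hologram_field)

# Back-propagating the full complex field by the same distance recovers the object.
recovered = propagate(hologram_field, wavelength=0.5, pitch=2.0, distance=-200.0)
```

Note that the last step back-propagates the complex field. A real sensor records only the intensity, so the phase is lost and the reconstruction distance is unknown; these are the two difficulties that the phase recovery and auto-focusing approaches discussed in this section address.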
In their article, Yair Rivenson et al. [6] implemented an approach for phase recovery and holographic image reconstruction using deep neural nets. The article demonstrates that, after training, neural networks can perform phase recovery on holographic images. A Convolutional Neural Network (CNN) performs auto-focusing on the sample object image and extends the depth of field (DOF) to obtain the image reconstruction. In the first step of this auto-focusing process, the neural network is trained on the data. The holograms are recorded at particular distances specified by the imaging set-up. As a result, the image reconstruction quality is poor. In a later step, the deep learning-based holographic image reconstruction method performs auto-focusing and phase recovery using the sample hologram intensity. This approach is called Holographic Imaging using Deep Learning for Extended Focus (HIDEF).

Olaf Ronneberger and his team at the University of Freiburg implemented a new approach especially applicable for reconstructing medical images of objects that are smaller than 50 micro millimeters [3]. They proposed a new method called a sliding-window convolutional network for semantic segmentation problems. U-Net is based on a convolutional architecture for fast and precise segmentation of images. The proposed architecture contains two different paths: a contracting path that captures the context and an expanding path that enables precise localization of the object. This semantic segmentation network is very useful for our hologram reconstruction task.

4 Experiments

4.1 Description of Dataset

Data preparation is an important step in any learning network model. The quality of the input data strongly influences the results produced by the model. As part of the baseline system, we used holograms that were artificially generated by a computer. These holograms were generated for three simple object shapes: rectangle, triangle and star.
Each hologram was generated with varying rotations and object sizes and for a broad range of propagation distances. Figure 2 shows samples of the generated holograms.

Fig. 2: Generated holograms of (a) a triangle, (b) a star and (c) a rectangle.

After successfully training the baseline systems, we used two different datasets, the diffraction dataset and the real diffraction dataset, for training on real holograms. Both datasets contain generated holograms; however, they are based on real recorded holograms. There are two reasons for using generated holograms: it was not possible to record, and especially to annotate, a huge number of holograms, and data was needed that was much closer to real recorded holograms for conducting the experiments.

4.2 More Realistic Data

The diffraction dataset was used to train the model in the first experiments. As mentioned above, this dataset consists of generated holograms based on recordings of real holograms. The process starts with a given real hologram as input, cropped to a fixed region of 256x256 pixels. Afterward, a reconstruction step is performed with a standard reconstruction algorithm, the so-called Fresnel algorithm, and a binary mask is applied to the reconstructed image. The binary mask marks the region containing the object of interest. Using the binary mask, the object is cropped, which removes all the background noise and leaves only the object shape. Afterward, various geometrical transformations, e.g. rotations and translations, are applied to the object. The transformed input is fed into the hologram generator, which generates holograms for many different propagation distances. This process results in a large dataset of hologram images and their respective ground truths. The whole process is depicted in Figure 3.

Fig. 3: Data Generating Process
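The generation steps described above (masking, geometric transformation, hologram generation per distance) can be sketched as follows. This is a simplified illustration, not the pipeline used in this work: all function names are hypothetical, rotations are restricted to multiples of 90 degrees to avoid interpolation, and the hologram generator is passed in as a placeholder callback `make_hologram` because its implementation is not specified here.

```python
from itertools import product


def apply_mask(image, mask):
    """Keep pixels where the binary mask is 1 and zero the background elsewhere."""
    return [[px if m else 0.0 for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def rotate90(image, quarter_turns):
    """Rotate a square image counter-clockwise by multiples of 90 degrees."""
    for _ in range(quarter_turns % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return image


def translate(image, dy, dx):
    """Shift an image by (dy, dx) pixels, filling vacated pixels with 0."""
    n_rows, n_cols = len(image), len(image[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                out[rr][cc] = image[r][c]
    return out


def generate_pairs(reconstruction, mask, turns, shifts, distances, make_hologram):
    """Yield (hologram, ground_truth) pairs for every transformation and distance.

    `make_hologram(obj, z)` is a hypothetical stand-in for the hologram generator.
    """
    obj = apply_mask(reconstruction, mask)  # background noise removed
    for k, (dy, dx), z in product(turns, shifts, distances):
        truth = translate(rotate90(obj, k), dy, dx)
        yield make_hologram(truth, z), truth
```

With, for example, 2 rotations, 2 translations and 3 propagation distances, a single recorded object already yields 2 x 2 x 3 = 12 training pairs, which illustrates why this process results in a large dataset.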
The first set of experiments gave good results on the training set, the test set and the blind test sets. The results on real holograms were still not satisfying. We therefore pursued two strategies: we applied different levels of noise to the dataset to see whether this helps to generalize to real holograms, and we improved the hologram generation to produce even more realistic holograms. The resulting holograms are similar to the previous dataset, with one important difference: the background noise was not removed after the reconstruction step, so the generated holograms contain interference patterns from several noise sources, such as interference from other particles, sensor noise or dust particles in the background. This limits the number of image transformations that can be applied and therefore reduces the amount of generated data. The ground truth of this set still contains the target objects without the noisy background. The goal is thus to learn a model that implicitly removes the noise during the hologram reconstruction. This dataset consists of 30100 hologram images. A few sample images are shown in Figure 4.

Fig. 4: Real Diffraction dataset example holograms

4.3 Evaluation Datasets

The trained models were evaluated using several evaluation sets, which represent different levels of difficulty. The evaluation datasets are as follows:

– Training set - This is the simplest set, which includes the training data, depending on the experiment. It will always be similar to the training set.
– Test set - This is the usual held-out set: a subset of the training corpus that was not used for model training or weight updates, but only for testing.
Although this set does not contain the same images as found in training, they are still very similar, because the small steps in propagation distance used during hologram generation lead to very small differences from one image to the next.
– Blind test - This set contains the same objects as the training datasets but with different transformations, e.g. rotations, that were not part of the training.
– Blind test of another dataset - These sets are even more difficult: they contain holograms of objects that are not part of the training corpus. For example, if the model was trained on the diffraction corpus, the evaluation was done on the real diffraction set. They also contain different translations of objects.
– Real holograms blind test - The most difficult evaluation data for the network is always the real holograms test set. It contains the unaltered holograms as recorded by the in-line holography system. It is also important to note that this subset does not contain any holograms of the particles that were used for hologram generation.

5 Implementation

5.1 Building Input Pipeline

The first part of the experiment is to prepare the holograms for the input pipeline. Handling this large amount of data is quite challenging. In the baseline system used so far, loading and preprocessing of the artificial holograms was done with external libraries, and the data was then passed into the model with the feed_dict method. TensorFlow's Dataset API offers an easy way to work with TFRecords, a simple record-oriented format that is popular for handling large data [7]. A TFRecord is simply a binary file format used by TensorFlow. In simple terms, when training a deep network, there are two ways to feed the input data to the network model.
The first is to load the data with plain Python code and feed it into the computational graph, as we did in our baseline system. The other is to use an input pipeline that takes a list of filenames, shuffles them, creates a file queue and decodes the data. This can be done with TFRecords, which we used in our current model. The advantage of this file format is that it takes less space on disk, takes less time to copy and can be read from disk much more efficiently.

5.2 Model Definition

Figure 5 illustrates the current model architecture. We conducted all experiments with this model as the default and updated the hyperparameters between experiments.

Fig. 5: U-Net Model Architecture

The experiments start with preparing the input pipeline for passing the data into the model. In Figure 5, the colored boxes on the right-hand side represent the downsampling (contracting) path and the colored boxes on the left-hand side represent the upsampling (expanding) path. These blocks are joined by grey arrows, which represent skip connections. A skip connection is a shortcut that bypasses the intermediate layers. The feature map is passed through two types of connections: the continuous connection and the skip connection (shortcut). Continuous connections link directly adjacent layers, while skip connections bypass layers. Each convolutional layer is followed by a ReLU activation function and batch normalization. The feature maps are afterward fed to the next block, which is similar to the first block. The second block convolves and downsamples from 32 feature maps of size 128x128 to feature maps of size 64x64. Furthermore, the feature maps are fed to five consecutive blocks with the same parameter configuration, namely 256 kernels of size 3x3. The output of the seventh downsampling block, which is the last block of this part, is 256 feature maps of size 4x4. The schematic representation of downsampling is shown in Figure 6.

Fig. 6: Schematic representation of downsampling process

In the upsampling (expanding) path, the eighth block convolves and upsamples from 256 feature maps of size 4x4 to 256 feature maps of size 8x8. This block contains a transposed convolution layer with 256 kernels of size 3x3 that slides 2 pixels at each step (stride 2). The decoding block contains a concatenation operation with the feature maps from the preceding layers through skip connections. The combined feature maps are then followed by 3 regular convolutional layers with ReLU activation, batch normalization and dropout. The schematic representation of upsampling is shown in Figure 7.

Fig. 7: Schematic representation of upsampling process

Furthermore, the feature maps are fed to five consecutive upsampling blocks with the same parameter configuration, namely 32 kernels of size 3x3. The output of the final block is a single feature map of size 256x256.

6 Results and Analysis

The model was trained on difficult datasets using an NVIDIA Graphics Processing Unit with 12 GB of memory. The default network settings are as follows:

– Adam optimizer
– Learning rate of 0.0001
– Image size of 256x256 pixels
– Training data shuffled randomly

6.1 Baseline Results

The U-Net model was trained with the Adam optimizer and an initial learning rate of 0.001. The model was trained for 200 epochs with a size of 64 training holograms. The evaluation criterion is sigmoid cross entropy. The best results come from holograms with a very small distance of 0.0001 millimeter, while the worst outputs were generated for a large distance of 0.0600 millimeter. Different rotations did not affect the result. The loss curve of the training process is depicted in Figure 8.
The loss, measured with the sigmoid cross entropy loss function, decreases over the training epochs.

Fig. 8: Loss curve

Figure 9 illustrates the heat-map of U-Net for the three object shapes. In the figure, the curves of the rectangle and the triangle decrease as the distance increases, whereas the curve of the star does not. The loss curve of the star increases at small distances, but nearly levels off at large distances.

Based on these results, we applied state-of-the-art deep neural methods with promising results. These positive results let us believe that a significant improvement could be made on real holographic data.

Fig. 9: Comparison of the loss curves of the objects

Exp. Id | Loss Function | Activation Function | Experimental Observation
1 | Sigmoid Cross Entropy | Linear Function | The training and predicted results are blurred
2 | Sigmoid Cross Entropy | Linear Function with the subset of dataset | Images are blank
3 | Mean Squared Error | Linear Function with the subset of dataset | Training images are getting better but evaluation images are still blank
4 | Mean Squared Error | Linear | Trained images are better with full set, still validation images are not good

Table 1: List of Conducted Experiments

The first experiments were trained using the baseline architecture without changing any parameters. The idea was to see whether the baseline model could be used on the real holographic dataset, i.e. the diffraction dataset, which consists of generated holograms based on real objects. After training, the predicted results were blurred images. The reason might be the increased complexity of the data variations, which the network was not able to learn. Table 1 lists the experiments conducted with different hyperparameters; the results did not meet expectations.
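For reference, the loss functions compared in these experiments (Section 2.1) can be written out as a minimal pure-Python sketch over flattened pixel lists. The function names are illustrative and not taken from the training code; the cross entropy takes raw network outputs (logits) and applies the sigmoid internally, which is an assumption about how the sigmoid cross entropy was used. As written in Eq. (4), the Mean Pairwise Squared Error coincides with the mean squared error, so a single function covers both here.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean_squared_error(truth, pred):
    """Mean of squared pixel differences; matches Eq. (4) as written."""
    return sum((g - p) ** 2 for g, p in zip(truth, pred)) / len(truth)


def sigmoid_cross_entropy(truth, logits, eps=1e-12):
    """Eq. (2) with p = sigmoid(logit); eps avoids log(0)."""
    total = 0.0
    for g, x in zip(truth, logits):
        p = min(max(sigmoid(x), eps), 1.0 - eps)
        total += g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return -total / len(truth)


def huber_loss(truth, pred, delta=1.0):
    """Quadratic for small errors, linear for large ones (standard Huber form)."""
    total = 0.0
    for g, p in zip(truth, pred):
        a = abs(g - p)
        total += 0.5 * a * a if a < delta else delta * (a - 0.5 * delta)
    return total / len(truth)
```

For a single pixel with ground truth 0 and prediction 3, the squared error is 9 while the Huber loss with delta = 1 is only 2.5, which shows how the Huber loss limits the influence of outlier pixels.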
6.2 Improving generalization on unknown data

After the model could be applied to the generated dataset, the next experiments aimed at improving performance on more difficult data, such as rotations or object shapes that were not part of the training set. In other words, the goal was to test the generalization capabilities of the model. Therefore, in addition to the default network training settings, different loss functions (Mean Pairwise Squared Error, Huber Loss and Mean Squared Error) were tested in combination with linear and sigmoid activation functions. The maximum training time for each experiment was 48 hours. The experiments conducted during this period are shown in Table 2: the first column gives the experiment id, and the following columns give the loss function, the output activation function together with the data augmentation, and the observations for each experiment.

Exp. Id | Loss Function | Activation + Data Augmentation | Experimental Observations
5 | Mean Pairwise Squared Error | Sigmoid Activation | The training corpus curve is decreasing, but the evaluation curve in TensorBoard fluctuates.
6 | Huber Loss | Sigmoid Activation | The reconstructed images show only a white background. The training was erroneous.
7 | Mean Pairwise Squared Error | Sigmoid Activation | It shows better performance, but the real hologram set images are predicted as white circle images instead of circle objects.
8 | Mean Pairwise Squared Error | Sigmoid Activation + Added Noise | Both training and evaluation curves are decreasing, but some images are predicted slightly erroneously.
9 | Mean Pairwise Squared Error | Sigmoid Activation + Added Noise + Random Standard Deviation | The trained model predicted well and showed good results.

Table 2: List of Conducted Experiments with Diffraction.

7 Interesting Observations

7.1 Using more realistic training data

The previous experiments showed good performance of the model on generated data, for different levels of difficulty.
However, the model did not improve substantially for real holograms. Therefore, in the final experiments, the real diffraction dataset of generated holograms was used, which resembles real holograms more closely. The goal was to check whether this kind of data helps to improve the results on real holograms. For this, different combinations of loss and activation functions were used again. The experiments conducted with this dataset are listed in Table 3.

Exp. Id | Loss Function | Activation + Data Augmentation
10 | Mean Pairwise Squared Error | Sigmoid Activation + (Added Noise + Random deviation + Image Rotation)
11 | Mean Pairwise Squared Error | Sigmoid Activation + (Without Noise + Image Rotation)
12 | Huber Loss | Sigmoid Activation + (Added Noise + Random deviation + Image Rotation)

Table 3: List of Conducted Experiments with Real Diffraction.

7.2 Exp. Id 10

Quantitative Analysis: The results of this experiment were impressive. They improved in terms of both training and evaluation. The loss curve is depicted in Figure 10. On the training corpus, the curve fluctuates slightly but decreases overall. Even the evaluation sets of different difficulty levels showed good results compared with the previous steps, and their curves leveled off together with the training curve. The blind test set of real holograms performed better than before: its curve decreases, although at times it rises at large step counts.

Fig. 10: Experiment Id:10 Training Loss

7.3 Exp. Id 11

Quantitative Analysis: The evaluation loss curves leveled off within a small range of values for the different blind datasets, including the real holograms test set. The decrease of the loss over the training steps is shown in Figure 11. Among the loss curves, the light blue curve is the real hologram blind test set. It showed the best results among the experiments so far. This clearly shows that the experiment was a great success.

Fig. 11: Experiment Id:11 Training Loss

7.4 Exp. Id 12

Quantitative Analysis: On the training corpus and during evaluation, the blind test sets were predicted very well and the loss decreased strongly. In previous experiments, a similar configuration without noise and augmentation was used, and it predicted blank images. Here, in contrast, the model predicts good images because noise and augmentation were applied. We used them to avoid overfitting. This is quite an interesting finding of this experiment. The evaluation loss curves look similar to those of the previous experiment but are less accurate. The curve fluctuates at large step counts, which indicates overfitting.

8 Conclusion

The proposed network architecture U-Net has proved to be a suitable approach for reconstructing real recorded holograms. The network design is highly adaptable for solving image translation problems, especially for recorded holograms, regardless of ground truth. During the experimental process, different loss functions were used because the architecture had problems reconstructing the ground truth images, particularly for real holograms. In the end, the U-Net showed the best performance using the Mean Pairwise Squared Error loss function combined with the sigmoid activation function.

In previous work, U-Net was successfully applied to several kinds of medical images, but not to generated images of holographic microscopy. We applied this network to reconstruct objects from real holograms and showed that this is an additional way to utilize the benefits of the architecture. Furthermore, we modified the network by adding data augmentation and changing hyperparameters, which increased its performance. The updated network was evaluated on several test sets of different difficulty levels to test its generalization capability.
The entire holographic reconstruction process with the U-Net architecture depended on the four major steps outlined in this experimental setup: datasets, input pipeline, training and evaluating observations.

References

[1] Myung K. Kim. “Principles and Techniques of digital holographic Microscopy”. In: CoRR 21 (2011). http://faculty.cas.usf.edu/mkkim/papers.pdf, pp. 85–91.
[2] “Visualization of some loss functions for Deep Learning with Tensorflow”. In: (2018 (Accessed: February, 2019) @Online). https://mc.ai/visualization-of-some-loss-functions-for-deep-learning-with-tensorflow/.
[3] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. In: CoRR abs/1505.04597 (2015). https://arxiv.org/pdf/1505.04597.pdf.
[4] Pascal Picart and Jun-chang Li. Digital Holography. ISTE Ltd and John Wiley Sons, Inc., 2012. isbn: 978-1-84821-344-9.
[5] Tatiana Latychevskaia and Hans-Werner Fink. “Practical algorithms for simulation and reconstruction of digital in-line holograms”. In: CoRR abs/1412.3674v7 (2016). https://arxiv.org/pdf/1412.3674.pdf.
[6] Yair Rivenson et al. “Phase recovery and holographic image reconstruction using deep learning in neural networks”. In: CoRR abs/1705.04286 (2017). http://arxiv.org/abs/1705.04286.
[7] Daniil Pakhomov. “Tfrecords Guide”. In: (2016 (Accessed: February, 2019) @Online). http://warmspringwinds.github.io/tensorflow/tf-slim/2016/12/21/tfrecords-guide.