Embryo cell detection using regions with convolutional neural networks

Arnoldas Matusevičius1, Darius Dirvanauskas1, Rytis Maskeliūnas1, Vidas Raudonis2
1 Multimedia Engineering Department, Kaunas University of Technology, Kaunas, Lithuania, email: arnoldas.matusevicius@ktu.edu, darius.dirvanauskas@ktu.edu
2 Department of Automation, Kaunas University of Technology, Kaunas, Lithuania, email: arnoldas.matusevicius@ktu.edu, darius.dirvanauskas@ktu.edu

Copyright © 2017 held by the authors

Abstract—This research provides an approach for embryo cell detection from images based on convolutional neural networks. The deep neural network used in the experiment consists of 15 layers and is trained using a GPU for calculations. The impact of training data set size on model training duration is identified. The embryo cell detection results of the R-CNN model are compared with human expert labeling data to evaluate its precision.

Keywords—Machine vision; Object recognition; Supervised learning

I. INTRODUCTION

Identifying and counting objects in an image or a sequence of images is a challenging computer vision problem that arises in many applications and systems, ranging from traffic monitoring to biological research. This paper focuses on embryo cell detection in biological research. However, the developed methodology can be used in numerous medical procedures that require counting and detection, such as red or white blood cell counts for assessing a patient's health, clinical pathology or cell concentration investigation.

Manual embryo cell detection is very monotonous and time-consuming work that is prone to errors. Automating the detection process therefore has many benefits, such as reducing time consumption, the possibility of errors and cost. In addition, it improves the consistency of results between individuals and clinics. Our goal is to simplify the task and improve its robustness.

One of the difficulties is counting non-stained cells in dark images, because of constraints such as light intensity, transparency or exposure time. All these factors degrade image quality and result in faint cell boundaries. Another challenge is that embryo cells vary widely in appearance and shape. Furthermore, every embryo grows in its own individual manner, and its cells overlap each other. Extracellular material can also be found between cells, which is difficult for hand-crafted algorithms.

In this paper, we develop a Convolutional Regression Network (CNN) approach for regression of a density map. Our main goal is to automatically detect and count the number of human cells in developing embryos. In addition, experimental results show that a CNN can provide state-of-the-art cell counting and can also detect overlapping cells.

II. GENERAL DESCRIPTION OF R-CNN METHOD

The R-CNN object detection system can be divided into three main modules. The first module generates category-independent region proposals [1]. The proposals define the set of candidate detections available to the detector. The second module is a convolutional network that extracts a fixed-length feature vector from each region. The last module is a set of class-specific linear SVMs.

According to the Pan and Yang taxonomy [2], R-CNN training is based on inductive transfer learning. To train R-CNN correctly, the first step is to take ImageNet classification as the source dataset and task. The second step is supervised training of the network, after which the network is transferred to the target task and dataset using supervised fine-tuning. At first glance, this methodology is related to traditional multi-task learning [3], [4]. However, here the tasks are trained sequentially and the only aim is to perform well on the specific target task.

Donahue et al. [5] also describe CNN learning using supervised transfer. They state that, once trained on ImageNet, the network can be treated as a black-box feature extractor. This approach is suitable for recognition tasks including scene classification and domain adaptation. Hoffman et al. [6] state that transfer learning is the right choice for R-CNN training and can be used for classes that have image-level labels but no bounding-box training data.

R-CNN has two sibling output layers. The first outputs a discrete probability distribution, $p = (p_0, \ldots, p_K)$, over $K + 1$ categories. As usual, $p$ is computed by a softmax over the outputs of the layer. The second layer outputs bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, with the parameterization given in [7], in which $t^k$ specifies a scale-invariant translation and a log-space height and width shift relative to an object proposal [8].

Each training region of interest (RoI) is labeled with a ground-truth class $u$ and a ground-truth bounding-box regression target $v$. The multitask loss $L$ for each RoI, following [8], is then given by (1).

$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v) \tag{1}$$

Here $L_{cls}(p, u) = -\log p_u$ is the log loss for the true class $u$, and the hyper-parameter $\lambda$ controls the balance between the two task losses. The ground-truth regression targets $v_i$ are normalized to have zero mean and unit variance. The second task loss, $L_{loc}$, is defined over a tuple of true bounding-box regression targets for class $u$, $v = (v_x, v_y, v_w, v_h)$, and a predicted tuple $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$, also for class $u$. For background regions of interest there is no notion of a ground-truth bounding box, so $L_{loc}$ is not involved: the indicator $[u \geq 1]$ equals 1 when $u \geq 1$ and 0 otherwise. For bounding-box regression, expression (2) is used.

$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i) \tag{2}$$

Here $\mathrm{smooth}_{L_1}$ is defined by (3).

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \tag{3}$$

If the regression targets are unbounded, training with an L2 loss can require careful tuning of learning rates to prevent exploding gradients.

III. THE ALGORITHM USED IN THE STUDY

Deep learning is a family of machine learning methods that attempt to represent detailed features of data in a multi-layer structure.
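As a brief numerical aside on the multitask loss of Section II, the sketch below evaluates expressions (1)-(3) for a single RoI, following the Fast R-CNN formulation in [8]. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, class index 0 is assumed to denote background, and the default value lam=1.0 for the balancing hyper-parameter is an assumption because the text does not state the value used.

```python
import math

def smooth_l1(x):
    """Smooth L1 of expression (3): quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def loc_loss(t_u, v):
    """Localization loss of expression (2), summed over (x, y, w, h)."""
    return sum(smooth_l1(t - g) for t, g in zip(t_u, v))

def multitask_loss(p, u, t_u, v, lam=1.0):
    """Multitask loss of expression (1) for one RoI.

    p   : class probabilities (softmax output), index 0 assumed background
    u   : ground-truth class index
    t_u : predicted box offsets for class u
    v   : ground-truth box regression targets
    lam : balance between the two losses (assumed value, not from the paper)
    """
    cls = -math.log(p[u])
    # Background RoIs (u == 0) have no ground-truth box, so the term is dropped.
    loc = loc_loss(t_u, v) if u >= 1 else 0.0
    return cls + lam * loc

# Example: a foreground RoI and a background RoI with the same box error.
print(multitask_loss([0.2, 0.8], 1, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0)))
print(multitask_loss([0.5, 0.5], 0, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0)))
```

The linear branch of the smooth L1 term means a large box error contributes a gradient of constant magnitude, which is why this loss is less sensitive to unbounded regression targets than an L2 loss.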
R-CNN is one of the most effective learning techniques and is able to reduce the number of learnable parameters significantly by using the same basis function across different image locations.

In this research, we suggest an automatic learning-based cell detection framework that is suitable for 3D and 2D microscopy images. To improve the efficiency and accuracy of training a CNN from larger images, an SVM classifier is applied to detect cell regions for collecting the CNN training set [9].

The exposure time range is dynamic and may not be equal for each recording session through the light microscope, so the color of each stack may differ. For this reason we apply Image Intensity Standardization (IIS), introduced in [10], whose main advantage is intensity normalization of 2D grayscale images. According to Bogunovic et al. [11], after some modifications the IIS algorithm is suitable for normalizing the intensity of three-dimensional grayscale rotational angiography. We use the original intensity standardization as a color normalization method for 3D microscopy images. First, the three histograms of the three channels of the whole RGB stack are calculated. Then the stack histogram of each channel is aligned to the corresponding reference using the non-linear registration method described in [10]. The algorithm of this operation is shown in Fig. 1.

Fig. 1 The workflow of the embryo cell detection framework.

Other algorithms for embryo cell detection are also used in machine vision and medical image analysis [12, 13]. Nonetheless, most of these automated three-dimensional cell detection methods are not a suitable replacement for manual cell detection [13]. There are two main types of cell detection algorithms. The first type is based on segmentation or thresholding [14], and various software implementations have appeared, including plugins for "ImageJ" [12] and the "FARSIGHT" toolkit [15]. The second type comprises feature-based or modeling-based methods [16, 17]. With the development of machine learning techniques, the capabilities of learning-based cell detection have increased. There are also learning-based cell detection methods for two-dimensional immunohistochemistry images [18, 19]. However, there is no universal automatic cell detection method for microscopy images.

In this research, cell regions R are determined in order to discard the irrelevant background regions. Selecting background patches is important for training a CNN. Therefore, a rough but efficient detection of cell regions is performed with an SVM classifier, after which cell and background training patches are gathered from R instead of the whole stack.

The Support Vector Machine (SVM) detector is used for cell region detection and for collecting CNN training patches, which removes a large part of the background pixels. This part of the process acts like a feature selection pre-process. In our case, the accuracy of the CNN can be increased by using cell detection samples from the cell region. Similarly, at test time, we first apply the SVM detector to identify those regions in order to increase accuracy. Compared with the training of a conventional CNN, the cell samples are the same, but the non-cell samples are different.

Once the cell region R has been detected with the SVM RGB histogram detector, the second step is to extract cell and background patches in region R from the stacks used for training the CNN; the patch size equals the neighborhood size. Pixels in the cell region R have almost the same colors, so the color feature in the cell region is not reliable for distinguishing cell patches from background patches. To reduce training time, all RGB patches are converted to the YUV color space and only the Y-channel patches are kept. Every Y-channel cell patch is rotated by 0, 90, 180 and 270 degrees to make the detector rotation invariant and to increase the number of cell samples. Cell and background patches may also have overlapping pixels, which helps increase the probability of correct cell detection. Approximately half a million cell patches are extracted from all training stacks, together with the same number of background patches from the cell region R.

After the last step, the max-pooling CNN is ready for testing on the test stacks. The cell region is detected by the SVM RGB histogram detector for each frame of every stack in the test dataset. After that, the pre-trained CNN identifies embryo cells by scanning each pixel in the region, and every pixel is given a probability value P.

IV. R-CNN TRAINING

The experiment was run in MATLAB 2016b on a personal computer with an i5-4570 CPU clocked at 3.2 GHz, 8 GB of memory, a 64-bit operating system and a GeForce GTX 650 Ti video card. Training was performed on the GPU instead of the CPU to accelerate the procedure.

For the experiment, one thousand embryo photos (Fig. 3) were randomly selected and labeled by a human expert. To evaluate the impact of training data set size on detection precision, 14 R-CNN networks were trained with training data sets of different sizes. The training set size was increased from 5% to 70% of the data set in 5% steps. 30% of the data set was used to evaluate the cell detection precision of the networks. To decrease training time, a network pre-trained on CIFAR-10 was used; its biases and neuron weights were adjusted to detect embryo cells in photos.

Fig. 3 Embryo images

The impact of training data set size on training time is linear, as can be seen in Fig. 4. Training with the largest training data set, 700 images, took 38 minutes 40 seconds.

We train the R-CNN network shown in Fig. 2.
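The data split described in this section can be sketched in a few lines of Python. This is a minimal sketch and not the authors' MATLAB code: the function names, the fixed random seed, the single shuffle and the use of the last 30% of the shuffled list as the evaluation set are illustrative assumptions. Only the counts (1000 images, training sets from 5% to 70% in 5% steps, 30% for evaluation) come from the text.

```python
import random

def training_set_sizes(total_images=1000, start_pct=5, stop_pct=70, step_pct=5):
    """Return (percent, image count) pairs, one per trained network."""
    return [(pct, total_images * pct // 100)
            for pct in range(start_pct, stop_pct + 1, step_pct)]

def split_dataset(image_ids, train_pct, test_pct=30, seed=0):
    """Shuffle once, take the first train_pct% for training and
    hold out the last test_pct% for evaluation (assumed scheme)."""
    if train_pct + test_pct > 100:
        raise ValueError("training and test subsets would overlap")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * train_pct // 100
    n_test = len(ids) * test_pct // 100
    return ids[:n_train], ids[len(ids) - n_test:]

sizes = training_set_sizes()
train_ids, test_ids = split_dataset(range(1000), train_pct=70)
print(len(sizes), sizes[0], sizes[-1])  # number of runs, smallest and largest set
print(len(train_ids), len(test_ids))    # images used for training and evaluation
```

With these assumptions the largest run uses 700 training images and 300 evaluation images, so the two subsets exactly partition the 1000 labeled photos.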
The network consists of 1 input layer, 13 hidden layers (convolutional, ReLU, max pooling, fully connected, softmax) and a classification output layer. Training ran for 100 epochs with a base learning rate of 0.001, using the stochastic gradient descent training method.

Fig. 2 The outline of the convolutional neural network architecture.

Fig. 4 Neural network training time

V. SIMULATION RESULTS

The trained R-CNN network was tested with 300 new embryo images that were not used in the training process (Fig. 5). The predicted embryo position and size were compared with the embryo size and position labeled by a human expert.

Fig. 5 Detected embryo cells

The mean square error and standard deviation of the detected size, obtained by comparing the expert's labels with the deep neural network results, are presented in Table I. Some trained neural networks failed to detect one or two embryo cells in the images. Models trained with 30% or more of the data set detected all embryos in the images.

Fig. 6 Detected cell size error

TABLE I. PREDICTED SIZE RESULTS

Training data   Mean square   Standard       Undetected
set size        error, %      deviation, %   embryos
5%              20.73         13.86          1
10%             17.32         11.08          2
15%             13.82         8.74           0
20%             11.55         7.54           1
25%             15.79         10.83          1
30%             13.48         8.53           0
35%             13.43         8.87           0
40%             14.67         8.41           0
45%             12.40         7.88           0
50%             12.57         7.77           0
55%             17.60         9.31           0
60%             18.32         8.99           0
65%             12.15         7.65           0
70%             11.92         7.18           0

Fig. 6 shows that the best results were obtained with the 20% and 70% training data sets, where the model results compared with the human expert give mean square errors of 11.55% and 11.92%, respectively (Table I). This shows that model accuracy is influenced not only by the training data set size but also by the distribution of images in the training data set.

Comparing the model's position predictions with the human expert's labels (Table II), the error rate is smaller than the size error rate. The smallest mean square error was obtained with the models trained on 30%, 40% and 65% of the data set. A similar error rate was obtained with the 25% training data set, but this model failed to detect one embryo cell.

TABLE II. PREDICTED POSITION RESULTS

Training data   Mean square   Standard       Undetected
set size        error, %      deviation, %   embryos
5%              6.05          2.55           1
10%             5.59          2.42           2
15%             5.45          2.76           0
20%             5.29          2.58           1
25%             4.64          2.07           1
30%             4.64          2.17           0
35%             6.82          2.92           0
40%             4.63          2.28           0
45%             5.29          2.50           0
50%             5.06          2.25           0
55%             5.42          2.31           0
60%             5.35          2.35           0
65%             4.62          2.18           0
70%             5.68          2.49           0

Fig. 7 shows the whole error distribution. Inaccuracies appear with the model trained on the 35% training data set. This means that a few images can distort the model parameters and decrease its accuracy.

Fig. 7 Detected cell position error

VI. CONCLUSIONS

The experimental results confirm that deep neural network training time depends linearly on the training data set size. When the detected region size is compared with the human expert's labels, the best result without any undetected embryo cells, a mean square error of 11.92%, was obtained with the largest (70%) training data set. A more precise result was obtained when comparing embryo cell position: the smallest error, 4.62%, was obtained with the 65% training data set. This shows that the proposed model works better for position prediction.

REFERENCES

[1] Region-based Convolutional Networks for Accurate Object Detection and Segmentation.
[2] S. J. Pan, Q. Yang, “A survey on transfer learning,” TPAMI, 2010.
[3] R. Caruana, “Multitask learning: A knowledge-based source of inductive bias,” in ICML, 1993.
[4] S. Thrun, “Is learning the n-th thing any easier than learning the first?” NIPS, 1996.
[5] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition,” in ICML, 2014.
[6] J. Hoffman, S. Guadarrama, E. Tzeng, J. Donahue, R. Girshick, T. Darrell, K. Saenko, “From large-scale object classifiers to large-scale object detectors: An adaptation approach,” in NIPS, 2014.
[7] R. Girshick, J. Donahue, T. Darrell, J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014.
[8] R. Girshick, “Fast R-CNN,” Microsoft Research, 27 Sep. 2015.
[9] B. Dong, L. Shao, M. Da Costa, O. Bandmann, A. F. Frangi, “Deep Learning for Automatic Cell Detection in Wide-Field Microscopy Zebrafish Images,” IEEE, pp. 772-776, 2015.
[10] L. G. Nyúl, J. K. Udupa, “On standardizing the MR image intensity scale,” Magnetic Resonance in Medicine, vol. 42, pp. 1072-1081, 1999.
[11] H. Bogunovic, J. M. Pozo, M. C. Villa-Uriol, C. B. Majoie, R. van der Berg, H. A. Gramata van Andel, J. M. Macho, J. Blasco, L. S. Roman, A. F. Frangi, “Automated segmentation of cerebral vasculature with aneurysms in 3DRA and TOFMRA using geodesic active regions: An evaluation study,” Medical Physics, vol. 38(1), pp. 210-222, 2011.
[12] M. D. Abramoff, P. J. Magalhaes, S. J. Ram, “Image processing with ImageJ,” Biophotonics International, vol. 11(7), pp. 36-43, 2004.
[13] C. Schmitz, B. S. Eastwood, S. J. Tappan, J. R. Glaser, D. A. Peterson, P. R. Hof, “Current automated 3D cell detection methods are not a suitable replacement for manual stereologic cell counting,” Frontiers in Neuroanatomy, vol. 8, 2014.
[14] Y. Al-Kofahi, W. Lassoued, W. Lee, B. Roysam, “Improved automatic detection and segmentation of cell nuclei in histopathology images,” Biomedical Engineering, IEEE Transactions on, vol. 57(4), pp. 841-852, 2010.
[15] G. Lin, M. K. Chawla, “A multi-model approach to simultaneous segmentation and classification of heterogeneous populations of cell nuclei in 3D confocal microscope images,” Cytometry Part A, vol. 71(9), pp. 724-736, 2007.
[16] M. K. K. Niazi, A. A. Satoskar, M. N. Gurcan, “An automated method for counting cytotoxic T-cells from CD8 stained images of renal biopsies,” in SPIE Medical Imaging, vol. 8676, 2013.
[17] S. Wienert, D. Heim, K. Saeger, “Detection and segmentation of cell nuclei in virtual microscopy images: a minimum-model approach,” Scientific Reports, vol. 2, 2012.
[18] T. Chen, C. Chefdhotel, “Deep learning based automatic immune cell detection for immunohistochemistry images,” in Machine Learning in Medical Imaging, pp. 17-24, 2014.