Image Processing, Geoinformatics and Information Security

INJURED LUNG VOLUME ESTIMATION ON CT DATA USING LINEAR METRICS

N.A. Smelkina1, A.V. Kolsanov2, S.S. Chaplygin2, P.M. Zelter2, A.G. Khramov1, A.V. Nikonorov1

1 Samara National Research University, Samara, Russia
2 Samara State Medical University, Samara, Russia

Abstract. The article proposes a method for measuring the volume of injured regions of human lung tissue on CT scans by subtracting the volume of healthy lung tissue from the total lung volume expected in the absence of pathology. The total lung volume is calculated from a regression model that uses a combination of manually measured linear metrics. The volume of healthy lung tissue is determined using adaptive thresholding. Experimental studies have shown that the error of the proposed technique does not exceed 10% in comparison with the ground-truth evaluation made by an expert. The technique is useful in cases of large injury, when accurate lung segmentation is impossible.

Keywords: diagnostic images, image linear metrics, image processing, data mining, linear regression model

Citation: Smelkina NA, Kolsanov AV, Chaplygin SS, Zelter PM, Khramov AG, Nikonorov AV. Injured lung volume estimation on CT data using linear metrics. CEUR Workshop Proceedings, 2016; 1638: 401-410. DOI: 10.18287/1613-0073-2016-1638-401-410

1 Introduction

Historically, medicine was considered a kind of art, but nowadays it is a science, which is why there is a strong tendency to make medical diagnosis more objective and, where possible, quantitative. Segmentation based on computed tomography (CT) scans is the most accurate method for lung volume evaluation; quantitative principles have been used in patients with emphysema and cancer. These parameters help a doctor to predict the effectiveness of surgery or, sometimes, to avoid it because of high risk. Unfortunately, segmentation of contused or injured tissue cannot be done accurately enough.
Currently, the main methods for lung segmentation [1] are based on voxel-wise analysis of the intensity function and of texture features of the CT scans. All of these methods require a visible pulmonary contour. In the case of interstitial lung lesions (pneumonia, alveolar infiltration) (fig. 1) or severe mechanical injury, the contour of the lung is often poorly expressed or even absent. Various methods exist to segment lungs with such pathology and calculate their volume [1, 2, 3], but they are computationally expensive while achieving an error of 10%, and they require a lot of initial information and manual image processing, such as the location of the damage, the main points that characterize the lung, etc.

Information Technology and Nanotechnology (ITNT-2016) 401

Fig. 1. Alveolar infiltration in the lung: a) a 2D slice; b) a 3D polygonal model (red: pathology, blue: lungs)

All of these segmentation methods are aimed at diagnosing lung disease and determining its quantitative characteristics. Because the methods for finding the volume of injuries are complicated, slow and insufficiently accurate, some papers have addressed ways of quantifying the volume of lung disease without organ segmentation. For example, [4] proposes a method for determining the amount of pleural effusion, which has a characteristic form, using simple measurements on the plane and in space, without segmenting the pathology. The error of this method is 4.5-5.1%, depending on the selected measurements. In this paper, we propose to estimate the total lung volume with the help of simple linear metrics, without using segmentation tools.

2 Linear metrics for abnormal lung volume estimation

We propose the following linear metrics for abnormal lung volume estimation.
Metric a (fig. 2a) is the Euclidean distance from the maximum lung point in the vertical projection on the lower slice to the minimum lung point in the vertical projection on the upper slice. Metrics b and c (fig. 2b, 2c) represent the "depth" and "thickness" of the lungs. Metric d (fig. 2d) is measured between the extreme points, which are the easiest points to select. Metric e (fig. 2e) is a diametral characteristic of the pleural cavity: the Euclidean distance from the minimum lung point to the maximum lung point in the horizontal projection.

Thus, by estimating the total lung volume, we can find the volume of the pathological areas. It should also be noted that the anatomical structure of the lung (position, shape, texture) is unique, so, in contrast to [4], it is impossible to use prior information about the lung shape to increase the model accuracy.

We evaluated linear regression models on a CT dataset consisting of 49 patients with healthy lungs. For each of these patients, the metrics and the lung volume were found. The test sample consisted of 8 patients. Two patients had severe mechanical injury to the lungs as a result of an accident, and one patient had alveolar infiltration in the lung. The best model was chosen as the one with the minimum sum of squared residuals, combined with cross-validation, which was used as an additional criterion of the quality of the regression models.

Fig. 2. Lung metrics: a) metric a; b) metric b; c) metric c; d) metric d; e) metric e

3 Linear regression model for lung volume estimation

Here we used metrics a-e and their products as regressors to estimate the lung volume using a linear regression model.
The classical linear regression model with the parameter vector θ is given by [5]:

$$Y = F\theta + \varepsilon,$$

where $F$ is the matrix of input regressors, $Y$ is the observation vector, and $\varepsilon$ is the observation noise vector. One of the quality criteria of this model is the minimum sum of squared residuals:

$$R_0 = \| Y - F\theta \|_2^2 \to \min_{\theta}. \tag{1}$$

The cross-validation procedure was chosen as an additional quality criterion. This procedure is a standard method of testing and comparing regression models and gives an empirical evaluation of the generalization capability of algorithms. In contrast with the average error on the training set, used earlier as a criterion, the average cross-validation error is unbiased [6]. In our research we used the following cross-validation procedure: the CT dataset was randomly divided into training and control samples, and criterion (1) was calculated on the control sample to obtain a more accurate estimation.

Stepwise regression [7] was also used to select the most informative combination of metrics, that is, the one with the largest contribution to the variation of the dependent variable. Forward stepwise regression consists of successively including factors in the regression equation and checking their significance. The criterion of informativeness of the different metrics is also criterion (1). The best stepwise regression model is the following:

$$Y = 7.4646 - 5.2426\,a + 3.2191\,b - 6.63\,e + 3.3367\,(a \cdot e), \qquad R_0 = 0.4928. \tag{2}$$

Thus, the most informative metrics were a, b and e. These measurements describe the lungs most fully in three basic dimensions. Note that models containing information about the "thickness" of the lungs (metric c) and the distance between the extreme points of the lungs (metric d) gave less accurate results.
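The fitting and hold-out check described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, uses synthetic metric values and hypothetical coefficients (`true_theta`), and the helper names `design_matrix`, `fit_ols` and `criterion` are introduced only for this example. The regressors are the intercept, a, b, e and the product a·e, as in model (2); the split into 41 training and 8 control cases is an assumption based on the sample sizes quoted in the text.

```python
import numpy as np

def design_matrix(a, b, e):
    """Regressor matrix F with columns: 1, a, b, e, a*e."""
    a, b, e = (np.asarray(v, dtype=float) for v in (a, b, e))
    return np.column_stack([np.ones_like(a), a, b, e, a * e])

def fit_ols(F, y):
    """Least-squares estimate of theta in Y = F theta + eps."""
    theta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return theta

def criterion(F, y, theta):
    """Sum of squared residuals, criterion (1)."""
    r = y - F @ theta
    return float(r @ r)

# Synthetic stand-in for the measured metrics (dm) and lung volumes (l).
rng = np.random.default_rng(0)
n = 49
a = rng.uniform(2.0, 3.0, n)
b = rng.uniform(1.4, 2.4, n)
e = rng.uniform(2.3, 3.3, n)
true_theta = np.array([1.0, 0.5, 1.5, 0.2, 0.3])  # hypothetical values
F = design_matrix(a, b, e)
y = F @ true_theta + rng.normal(0.0, 0.05, n)

# Random split: fit on the training cases, score on the control cases.
idx = rng.permutation(n)
control, train = idx[:8], idx[8:]
theta = fit_ols(F[train], y[train])
r0_control = criterion(F[control], y[control], theta)
print("theta:", np.round(theta, 3))
print("R0 on control sample:", round(r0_control, 4))
```

Scoring criterion (1) on cases that were not used for fitting is what keeps the comparison between candidate regressor sets from favouring the models that merely fit the training data best.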
The best value of criterion (1) was obtained for the model depending on the product of the three most informative metrics, which has the following form:

$$Y = 0.1608 + 0.2496\,(a \cdot b \cdot e) + 0.0097\,(a \cdot b \cdot e)^2, \qquad R_0 = 0.1635. \tag{3}$$

The graph of this model is shown in fig. 3.

The source data show a fairly large variation about the regression curve. That is why we also built the regression with a more robust criterion, using Huber's M-estimator [8]. The key feature of this robust regression method is that points located far from the main regression curve make only a linear, rather than quadratic, contribution to the fitting criterion. On criterion (1) this method showed slightly more accurate results than ordinary regression. The model is as follows:

$$Y = 0.8243 + 0.1135\,(a \cdot b \cdot e) + 0.0148\,(a \cdot b \cdot e)^2, \qquad R_0 = 0.1445. \tag{4}$$

The graph of this model is also shown in fig. 3.

Fig. 3. Graphs of regression models (3) and (4)

The best value of criterion (1) obtained using cross-validation for a model depending on only one metric is significantly larger. The model has the following form:

$$Y = 0.3049 + 0.6282\,(b \cdot b) + 0.1582\,(b \cdot b)^2, \qquad R_0 = 1.5215. \tag{5}$$

The corresponding robust regression model is:

$$Y = 4.0058 - 1.5649\,(b \cdot b) + 0.4493\,(b \cdot b)^2, \qquad R_0 = 1.7005. \tag{6}$$

The graphs of these models are shown in fig. 4.

4 Estimation of the healthy lung tissue volume

The volume of healthy lung regions is estimated using Otsu's method. We divide the image into two regions according to intensity. The first region includes the healthy lung tissue and the air around the body, and the second region consists of the body. We then set a point inside the lung area in the first region and keep only the areas connected with this point. In this way we find the area of healthy lung tissue.
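The procedure just described (thresholding, then keeping only the region connected to a seed point) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes NumPy and SciPy, runs on a synthetic volume with made-up intensities, and the names `otsu_threshold`, `healthy_lung_mask` and `volume_liters`, the seed position and the voxel spacing are all hypothetical. The conversion of the voxel count to liters uses 1 l = 10^6 mm³.

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(values, bins=256):
    """Otsu threshold from a histogram. Maximising the between-class
    variance is equivalent to minimising the within-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w1 = np.cumsum(p)                  # probability of the darker class
    w2 = 1.0 - w1                      # probability of the brighter class
    cum_mean = np.cumsum(p * centers)
    total_mean = cum_mean[-1]
    between = np.zeros_like(w1)
    ok = (w1 > 1e-12) & (w2 > 1e-12)   # skip empty classes
    between[ok] = (total_mean * w1[ok] - cum_mean[ok]) ** 2 / (w1[ok] * w2[ok])
    return edges[np.argmax(between) + 1]   # upper edge of the best bin

def healthy_lung_mask(volume, seed):
    """Low-intensity voxels that are connected to the seed voxel."""
    low = volume < otsu_threshold(volume)  # air and aerated lung tissue
    labels, _ = ndimage.label(low)         # face-connected 3D components
    if labels[seed] == 0:
        raise ValueError("seed is not inside the low-intensity region")
    return labels == labels[seed]

def volume_liters(mask, spacing_mm):
    """Voxel count times voxel volume (mm^3), converted to liters."""
    return mask.sum() * float(np.prod(spacing_mm)) / 1e6

# Synthetic scan (assumed intensities): air, a body block, two lung
# blocks and a small airway-like bridge that joins them.
ct = np.full((20, 40, 40), -1000.0)   # air around the body
ct[:, 5:35, 5:35] = 40.0              # body tissue
ct[2:18, 10:30, 8:18] = -800.0        # first lung
ct[2:18, 10:30, 22:32] = -800.0       # second lung
ct[8:10, 14:16, 18:22] = -800.0       # bridge between the lungs

mask = healthy_lung_mask(ct, seed=(10, 20, 12))
v_healthy = volume_liters(mask, spacing_mm=(5.0, 2.0, 2.0))
print("healthy voxels:", int(mask.sum()), "volume, l:", v_healthy)
```

Because the air outside the body falls into the same intensity class as the lungs, the connectivity step is what separates the two; in this sketch a seed placed in a voxel that is not in the low-intensity class raises an error instead of silently returning the body.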
Fig. 4. Graphs of models (5) and (6)

4.1 Otsu's method

Otsu's method, due to its universality, is widely used in image processing and pattern recognition. The algorithm divides the pixels of a grayscale image into "useful" and "background" classes by finding the threshold that minimizes the intraclass variance, which is defined as a weighted sum of the variances of the two classes [9]:

$$\sigma_w^2(t) = w_1(t)\,\sigma_1^2(t) + w_2(t)\,\sigma_2^2(t),$$

where the weights $w_i$ are the probabilities of the two classes separated by the threshold $t$, and $\sigma_i^2$ are the variances of these classes. The resulting binary image is as follows:

$$y(i,j,k) = \begin{cases} 1, & x(i,j,k) \in D, \\ 0, & x(i,j,k) \notin D, \end{cases}$$

where $D$ is the region of lung and air, $x(i,j,k)$ is a pixel of the source grayscale image, and $y(i,j,k)$ is a pixel of the resulting binary image.

4.2 Determination of connected region

After dividing the image into two regions, we have an image containing the lungs and the air around the body. We set a point in the lungs, which defines the area of interest, and discard the air. Only the area connected with this point is then processed. Thus, the resulting image is given by:

$$y(i,j,k) = \begin{cases} 1, & x(i,j,k) \in D \wedge x(i,j,k) \in O, \\ 0, & x(i,j,k) \notin D \vee x(i,j,k) \notin O, \end{cases}$$

where $O$ is the connected region containing the given initial point. Then we calculate the lung volume from the resulting binary image by the formula:

$$V = \frac{1}{10^6} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{l} x_1(i,j,k) \cdot size(i) \cdot size(j) \cdot size(k),$$

where $x_1(i,j,k)$ are the pixels with value 1, $size(i) \cdot size(j) \cdot size(k)$ is the size of the corresponding voxel in cubic millimeters (mm³), and $[V]$ = liters.

5 Experimental results

Cross-validation of the proposed technique was performed by dividing the dataset of 49 CT scans into training and control samples.
The control sample consists of 5 patients with normal lungs and 3 patients with pathology. The pathology volume is calculated as the difference between the total volume of healthy lungs, determined by the linear regression formula, and the volume of healthy lung tissue:

$$V_{path} = V_{total} - V_{healthy}.$$

For CT scans of patients with normal lungs, this difference should be close to 0. The error of the method for patients with pathology is calculated as follows:

$$eps = \frac{\left| V_{total} - (V_{healthy} + V_{path}) \right|}{V_{path}}. \tag{7}$$

For patients with normal lungs, the error is calculated as follows:

$$eps = \frac{\left| V_{total} - V_{healthy} \right|}{V_{healthy}}.$$

Table 1 shows the results for model (2). Dashes in the eps row indicate that the model gives an unacceptable error value, in which case the model cannot be used. As shown in table 2, the error of the method does not exceed 10% and in some cases is as low as 0.3%. The ground-truth value of the pathology volume was evaluated manually for the estimation of criterion (7).

A model based on a single metric is less accurate than a model depending on a combination of different metrics.

Table 1. The experimental results for the model (2)

| Patient number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Pathology | no | no | no | no | no | yes | yes | yes |
| a, dm | 2.452 | 2.210 | 2.681 | 2.362 | 2.641 | 2.613 | 2.302 | 2.722 |
| b, dm | 2.073 | 1.990 | 2.211 | 1.988 | 2.319 | 1.563 | 1.446 | 1.756 |
| e, dm | 3.233 | 2.588 | 3.031 | 2.725 | 2.535 | 2.623 | 2.392 | 2.607 |
| V_healthy, l | 6.305 | 4.09 | 8.388 | 4.705 | 6.200 | 3.14 | 1.974 | 4.003 |
| V_total, l | 6.299 | 4.383 | 7.792 | 5.086 | 6.819 | 4.484 | 2.732 | 5.456 |
| V_path, l | - | - | - | - | - | 0.596 | 0.697 | 0.584 |
| eps | 0.001 | 0.072 | 0.071 | 0.081 | 0.099 | - | 0.087 | - |

Table 2. The experimental results for the models (3), (4)

| Patient number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Pathology | no | no | no | no | no | yes | yes | yes |
| a, dm | 2.452 | 2.210 | 2.681 | 2.362 | 2.641 | 2.613 | 2.302 | 2.722 |
| b, dm | 2.073 | 1.990 | 2.211 | 1.988 | 2.319 | 1.563 | 1.446 | 1.756 |
| e, dm | 3.233 | 2.588 | 3.031 | 2.725 | 2.535 | 2.623 | 2.392 | 2.607 |
| V_healthy, l | 6.305 | 4.09 | 8.388 | 4.705 | 6.200 | 3.14 | 1.974 | 4.003 |
| V_total (3), l | 6.869 | 4.253 | 7.760 | 4.933 | 6.363 | 3.948 | 2.742 | 4.777 |
| V_total (4), l | 6.675 | 4.029 | 7.627 | 4.691 | 6.145 | 3.739 | 2.666 | 4.537 |
| V_path, l | - | - | - | - | - | 0.596 | 0.697 | 0.584 |
| eps (3) | 0.09 | 0.04 | 0.075 | 0.049 | 0.026 | 0.212 | 0.101 | 0.326 |
| eps (4) | 0.059 | 0.015 | 0.091 | 0.003 | 0.009 | 0.004 | 0.007 | 0.086 |

According to table 3, the error of models (5), (6) significantly exceeds the error of models (3), (4). In some cases the model is inapplicable because of a very large calculation error. This is due to the lack of knowledge about the shape of the lungs and the lack of spatial dimensions. The fairly high error is due to the rather large variance of the metrics.

Table 3. The experimental results for the models (5), (6)

| Patient number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Pathology | no | no | no | no | no | yes | yes | yes |
| b, dm | 2.073 | 1.990 | 2.211 | 1.988 | 2.319 | 1.563 | 1.446 | 1.756 |
| V_healthy, l | 6.305 | 4.09 | 8.388 | 4.705 | 6.200 | 3.14 | 1.974 | 4.003 |
| V_total (5), l | 5.922 | 5.275 | 7.154 | 5.255 | 8.258 | 1.840 | 2.310 | 3.746 |
| V_total (6), l | 5.573 | 4.856 | 7.090 | 4.835 | 8.584 | 2.864 | 2.698 | 3.452 |
| V_path, l | - | - | - | - | - | 0.596 | 0.697 | 0.584 |
| eps (5) | 0.061 | 0.29 | 0.147 | 0.117 | 0.332 | - | 0.518 | - |
| eps (6) | 0.116 | 0.187 | 0.155 | 0.028 | 0.384 | - | 0.039 | - |

6 Conclusion

This work aims to determine the volume of pathological lung tissue for the purpose of diagnosing the severity of injuries. We studied a simple method of calculating the pathological lung volume using linear metrics. The estimation error obtained for the proposed method is less than 10% when compared with the ground-truth value. There are two main causes of model inaccuracy.
The first is human intervention, that is, the manual arrangement of the simple measurements so that they describe the lung most accurately. The second cause is the anatomical structure of the lung. Also, the accuracy of the regression model is somewhat reduced because the quality of the images in the training sample varies widely. The main error of the algorithm is associated with the large spread of the metrics. This problem will be addressed in future work. Currently we are researching the possibility of arranging the measurements automatically in order to minimize computation errors.

Acknowledgements

The work was performed as part of the project "Autoplan" (automatic system of planning and monitoring of the operation) under the state contract to perform research and development work "Development of technology and organization of production of the automated systems of planning, management and monitoring of results of surgical treatment" ("4.3-Autoplan-2014").

References

1. Mansoor A, Bagci U, Foster B, Xu Z, Papadakis GZ, Folio LR, Udupa JK, Mollura DJ. Segmentation and Image Analysis of Abnormal Lungs at CT: Current Approaches, Challenges, and Future Trends. RadioGraphics, 2015; 35(1): 1056-1076.
2. Birkbeck N, Sofka M, Kohlberger T, Zhang J, Wetzl J, Kaftan J, Zhou SK. Robust Segmentation of Challenging Lungs in CT Using Multi-stage Learning and Level Set Optimization. Computational Intelligence in Biomedical Imaging. Springer, New York, 2014; 185-208.
3. Mansoor A, Bagci U, Foster B, Xu Z, Papadakis GZ, Folio LR, Udupa JK, Mollura DJ. A Generic Approach to Pathological Lungs Segmentation. IEEE Transactions on Medical Imaging, 2014; 33(12): 2293-2310.
4. Hazlinger M, Ctvrtlik F, Langova K, Herman M. Quantification of pleural effusion on CT by simple measurement. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub, 2014; 158(1): 107-111.
5. Draper NR, Smith H, Pownell E. Applied regression analysis. New York: Wiley, 1966; 3.
6. Vorontsov KV. Combinatorial approach to the assessment of the quality of the trained algorithms. Mathematical questions of Cybernetics, 2004; 13: 5-36. [in Russian]
7. Efroymson MA. Multiple regression analysis. Mathematical methods for digital computers, 1960; 1: 191-203.
8. Huber PJ. Robust statistics. Springer Berlin Heidelberg, 2011: 1248-1251.
9. Otsu N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979; 9(1): 62-66.