Image Processing, Geoinformatics and Information Security


     INJURED LUNG VOLUME ESTIMATION ON CT DATA USING LINEAR METRICS

    N.A. Smelkina¹, A.V. Kolsanov², S.S. Chaplygin², P.M. Zelter², A.G. Khramov¹,
    A.V. Nikonorov¹
    ¹ Samara National Research University, Samara, Russia
    ² Samara State Medical University, Samara, Russia


        Abstract. The article proposes a method for measuring the volume of injured
        regions of human lung tissue on CT scans. The injured volume is obtained by
        subtracting the volume of healthy lung tissue from the total lung volume that
        the lungs would have in the absence of pathology. The total lung volume is
        calculated from a regression model that uses a combination of manually
        measured linear metrics. The volume of healthy lung tissue is determined using
        adaptive thresholding. Experimental studies have shown that the error of the
        proposed technique does not exceed 10% in comparison with the ground-truth
        evaluation made by an expert. The technique is useful in cases of large
        injury, when accurate lung segmentation is impossible.

        Keywords: diagnostic images, image linear metrics, image processing, data
        mining, linear regression model


        Citation: Smelkina NA, Kolsanov AV, Chaplygin SS, Zelter PM, Khramov
        AG, Nikonorov AV. Injured lung volume estimation on CT data using linear
        metrics. CEUR Workshop Proceedings, 2016; 1638: 401-410.
        DOI: 10.18287/1613-0073-2016-1638-401-410


1       Introduction

Historically, medicine was considered a kind of art, but nowadays it is a science, which
is why there is a strong tendency to make medical diagnosis more objective and, where
possible, quantitative. Segmentation based on computed tomography (CT) scans is the most
accurate method for lung volume evaluation; quantitative principles have been used in
patients with emphysema and cancer. Such parameters help the doctor to predict the
effectiveness of surgery or, sometimes, to avoid it because of high risk. Unfortunately,
segmentation of contused or injured tissue cannot be done accurately enough.
Currently, the main methods for lung segmentation [1] are based on voxel-wise analysis
of the intensity function and texture features of the CT scans. All of these methods
require a visible pulmonary contour. In the case of interstitial lung lesions (pneumonia,
alveolar infiltration) (fig. 1) or severe mechanical injury, the contour of the lung is
often poorly expressed or even absent. Various methods to segment lungs with such
pathology and calculate their volume exist [1, 2, 3], but they are computationally
expensive while achieving an error of 10%, and they require a lot of initial information
and manual image processing, such as the location of the damage, the main points that
characterize the lung, etc.

Information Technology and Nanotechnology (ITNT-2016)


    Fig. 1. Alveolar infiltration in the lung: a) a 2D slice; b) a 3D polygonal model
    (red: pathology, blue: lungs)

All of these segmentation methods are aimed at diagnosing lung disease and determining
its quantitative characteristics. Because the methods for finding the volume of injuries
are complicated, slow, and insufficiently accurate, some papers have addressed the
quantification of lung disease volume without organ segmentation. For example, [4]
proposes a method for determining the amount of pleural effusion, which has a
characteristic shape, using simple measurements on the plane and in space, without
segmenting the pathology. The error of this method is 4.5-5.1%, depending on the
selected measurements. In this paper, we propose to estimate the total lung volume with
the help of simple linear metrics, without using segmentation tools.


2           Linear metrics for abnormal lungs volume estimation

We propose the following linear metrics for abnormal lung volume estimation. Metric a
(fig. 2a) is the Euclidean distance from the maximum lung point in the vertical
projection on the lower slice to the minimum lung point in the vertical projection on the
upper slice. Metrics b and c (fig. 2b, 2c) represent the “depth” and “thickness” of the
lungs. Metric d (fig. 2d) is measured between the extreme points, which are the easiest
to select. Metric e (fig. 2e) is a diametral characteristic of the pleural cavity and
represents the Euclidean distance from the minimum lung point to the maximum lung point
in the horizontal projection.
Thus, by estimating the total lung volume, we can find the volume of the pathological
areas. It should also be noted that the anatomical structure of the lung (its position,
shape, texture) is unique to each patient, so, in contrast to [4], it is impossible to
use prior information about the lung shape to increase the model accuracy.
We evaluated linear regression models on a CT dataset of 49 patients with healthy lungs.
For each of these patients, the metrics and the lung volume were determined. The test
sample consisted of 8 patients. Two patients had severe mechanical injury to the lungs
as a result of an accident, and one patient had alveolar infiltration in the lung. The
best model was chosen as the one with the minimum sum of squared residuals.
Cross-validation was used as an additional criterion of the quality of the regression
models.


      Fig. 2. Lung metrics: a) metric a; b) metric b; c) metric c; d) metric d; e) metric e


3      Linear regression model for lung volume estimation

Here we used metrics a-e and their products as regressors to estimate the lung volume
using a linear regression model. The classical linear regression model with the
parameter vector θ is given by [5]:

$$Y = F\theta + \varepsilon,$$

where $F$ is the matrix of input regressors, $Y$ is the observation vector, and
$\varepsilon$ is the observation noise vector.
One of the quality criteria of this model is the minimum sum of squared residuals:

$$R_0 = \| Y - F\theta \|_2^2 \to \min_{\theta}. \tag{1}$$

Cross-validation was chosen as an additional quality criterion. This procedure is a
standard method of testing and comparing regression models and gives an empirical
estimate of the generalization capability of algorithms. In contrast to the average
error on the training set, used earlier as a criterion, the average cross-validation
error is unbiased [6].
In our research we used the following cross-validation procedure: the CT dataset was
randomly divided into training and control samples. Criterion (1) was calculated on the
control sample to obtain a more accurate estimate.
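The fitting and hold-out steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `control_fraction` value, and the synthetic data are assumptions introduced here, and NumPy's `numpy.linalg.lstsq` stands in for whatever solver was actually used.

```python
import numpy as np

def fit_least_squares(F, y):
    """Return theta minimising ||y - F @ theta||_2^2, i.e. criterion (1)."""
    theta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return theta

def criterion_r0(F, y, theta):
    """Sum of squared residuals R0 = ||y - F @ theta||_2^2."""
    residuals = y - F @ theta
    return float(residuals @ residuals)

def holdout_r0(F, y, control_fraction=0.2, seed=0):
    """Fit on a random training part and report R0 on the control part.

    control_fraction is an assumed value; the paper does not state the split rule.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_control = max(1, int(round(control_fraction * len(y))))
    control, train = order[:n_control], order[n_control:]
    theta = fit_least_squares(F[train], y[train])
    return theta, criterion_r0(F[control], y[control], theta)

# Synthetic, noise-free illustration (NOT the paper's data): y = 1 + 2 * x.
x = np.linspace(0.0, 1.0, 20)
F = np.column_stack([np.ones_like(x), x])   # intercept column + one regressor
y = 1.0 + 2.0 * x
theta, r0 = holdout_r0(F, y)
```

On noise-free data the control-sample criterion is numerically zero; with real measurements it is the quantity that the models are compared on.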
Stepwise regression [7] was also used to select the most informative combination of
metrics, i.e. the one with the largest contribution to the variation of the dependent
variable. Forward stepwise regression consists of successively including factors in the
regression equation and checking their significance. Criterion (1) also serves as the
criterion of informativeness of the metrics.
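A greedy forward selection driven by criterion (1) might look like the sketch below. It is a simplification under stated assumptions: the paper checks the significance of each included factor, whereas this sketch substitutes a plain improvement threshold (`min_gain`, a hypothetical parameter), and the data are synthetic.

```python
import numpy as np

def r0(F, y):
    """Minimum of criterion (1) for the regressor matrix F."""
    theta, *_ = np.linalg.lstsq(F, y, rcond=None)
    r = y - F @ theta
    return float(r @ r)

def forward_stepwise(candidates, y, min_gain=1e-6):
    """Greedy forward selection.

    candidates: dict mapping a name to a 1-D array (one candidate regressor each).
    Returns the list of selected names in order of inclusion.
    """
    selected = []
    F = np.ones((len(y), 1))                 # start from the intercept-only model
    best = r0(F, y)
    while len(selected) < len(candidates):
        trials = {
            name: r0(np.column_stack([F, col]), y)
            for name, col in candidates.items() if name not in selected
        }
        name = min(trials, key=trials.get)   # regressor giving the smallest R0
        if best - trials[name] < min_gain:   # no useful improvement: stop
            break
        selected.append(name)
        F = np.column_stack([F, candidates[name]])
        best = trials[name]
    return selected

# Synthetic illustration (NOT the paper's data): y depends on a and b only.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 40))
y = 2.0 + 3.0 * a - 1.0 * b
selected = forward_stepwise({"a": a, "b": b, "c": c}, y)
```

The uninformative regressor `c` is left out because adding it no longer reduces the criterion.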
The best model obtained by stepwise regression is the following:

$$Y = 7.4646 - 5.2426\,a + 3.2191\,b - 6.63\,e + 3.3367\,(a \cdot e), \qquad R_0 = 0.4928. \tag{2}$$

Thus, the most informative metrics were a, b, and e. Together these measurements
describe the lungs in three basic dimensions. Note that models containing information
about the "thickness" of the lungs (metric c) and the distance between the extreme points
of the lungs (metric d) showed less accurate results.
The best value of criterion (1) was obtained for the model that depends on the product
of the three most informative metrics and has the following form:

$$Y = 0.1608 + 0.2496\,(a \cdot b \cdot e) + 0.0097\,(a \cdot b \cdot e)^2, \qquad R_0 = 0.1635. \tag{3}$$

The graph of this model is shown in fig. 3.
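As a worked check, model (3) can be evaluated directly from the three metrics. The sketch below assumes that both terms enter with a plus sign and that the metrics are given in decimetres, as in the tables of the experimental section; the function name is illustrative. With these assumptions the values reproduce the tabulated total volumes for patients 1 and 2 to within about 0.02 l, the remaining difference being consistent with rounded coefficients.

```python
def lung_volume_model3(a, b, e):
    """Total lung volume in litres from metrics a, b, e in decimetres, model (3)."""
    p = a * b * e
    return 0.1608 + 0.2496 * p + 0.0097 * p ** 2

v1 = lung_volume_model3(2.452, 2.073, 3.233)   # patient 1; Table 2 lists 6.869 l
v2 = lung_volume_model3(2.210, 1.990, 2.588)   # patient 2; Table 2 lists 4.253 l
```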
The source data show a fairly large scatter about the regression curve. That is why we
also constructed the regression with a more robust criterion, using Huber’s M-estimator
[8]. The distinguishing feature of this robust regression is that points located far
from the main regression curve make a linear, rather than quadratic, contribution to the
loss. According to criterion (1), this method showed slightly more accurate results than
ordinary regression. The model is as follows:

$$Y = 0.8243 + 0.1135\,(a \cdot b \cdot e) + 0.0148\,(a \cdot b \cdot e)^2, \qquad R_0 = 0.1445. \tag{4}$$

The graph of this model is also shown in fig. 3.
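One common way to compute a Huber M-estimate is iteratively reweighted least squares (IRLS); the sketch below illustrates it. The paper does not say which algorithm or tuning constant was used, so `delta`, the fixed iteration count, and the absence of a residual scale estimate are all assumptions of this example, and the data are synthetic.

```python
import numpy as np

def huber_irls(F, y, delta=1.0, n_iter=50):
    """Huber M-estimate of theta by iteratively reweighted least squares."""
    theta, *_ = np.linalg.lstsq(F, y, rcond=None)      # ordinary LS start
    for _ in range(n_iter):
        r = y - F @ theta
        abs_r = np.maximum(np.abs(r), 1e-12)           # avoid division by zero
        # weight 1 inside [-delta, delta], delta/|r| outside (linear penalty)
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(F * sw[:, None], y * sw, rcond=None)
    return theta

# Synthetic illustration (NOT the paper's data): a line with one gross outlier.
x = np.linspace(0.0, 1.0, 21)
F = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[10] += 10.0                                          # outlier at x = 0.5
theta_ols, *_ = np.linalg.lstsq(F, y, rcond=None)
theta_huber = huber_irls(F, y, delta=0.1)
```

In this example the ordinary least-squares intercept is pulled well away from 1 by the single outlier, while the Huber estimate stays close to the underlying line.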


                       Fig. 3. Graphs of regression models (3) and (4)

The best value of criterion (1) obtained using cross-validation for a model that depends
on only one metric is significantly larger. The model has the following form:

$$Y = 0.3049 + 0.6282\,(b \cdot b) + 0.1582\,(b \cdot b)^2, \qquad R_0 = 1.5215. \tag{5}$$

The corresponding robust regression model is:

$$Y = 4.0058 - 1.5649\,(b \cdot b) + 0.4493\,(b \cdot b)^2, \qquad R_0 = 1.7005. \tag{6}$$

The graphs of these models are shown in fig. 4.


4      Estimation of the healthy lung tissue volume

The volume of healthy lung regions is estimated using Otsu’s method. We divide the image
into two regions according to intensity. The first region includes the healthy lung
tissue and the air around the body, and the second region consists of the body. We then
set a seed point inside the lung area of the first region and keep only the areas
connected with this point. In this way we find the area of healthy lung tissue.




                                   Fig. 4. Graphs of models (5) and (6)

4.1     Otsu’s method
Otsu’s method, due to its universality, is widely used in image processing and pattern
recognition. The algorithm divides the pixels of a grayscale image into “useful” and
“background” classes by finding the threshold that minimizes the intraclass variance,
defined as the weighted sum of the variances of the two classes [9]:

$$\sigma_w^2(t) = w_1(t)\,\sigma_1^2(t) + w_2(t)\,\sigma_2^2(t),$$

where the weights $w_i$ are the probabilities of the two classes separated by the
threshold $t$, and $\sigma_i^2$ are the variances of these classes.
The resulting binary image is as follows:

$$y(i,j,k) = \begin{cases} 1, & x(i,j,k) \in D \\ 0, & x(i,j,k) \notin D \end{cases},$$

where $D$ is the region of lungs and air, $x(i,j,k)$ is a pixel of the source grayscale
image, and $y(i,j,k)$ is a pixel of the resulting binary image.
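The threshold search can be sketched as an exhaustive scan over histogram bins that minimizes the intraclass variance defined above. This is an illustrative implementation only: the number of bins, the function name, and the toy intensity values (roughly air-like and soft-tissue-like) are assumptions, not parameters taken from the paper.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Return the threshold t that minimises the intraclass variance."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                      # probability of each bin

    best_t, best_var = centers[0], np.inf
    for k in range(1, n_bins):                 # class 1: bins < k, class 2: bins >= k
        w1, w2 = p[:k].sum(), p[k:].sum()
        if w1 == 0.0 or w2 == 0.0:
            continue                           # one class is empty: not a valid split
        m1 = (p[:k] * centers[:k]).sum() / w1
        m2 = (p[k:] * centers[k:]).sum() / w2
        var1 = (p[:k] * (centers[:k] - m1) ** 2).sum() / w1
        var2 = (p[k:] * (centers[k:] - m2) ** 2).sum() / w2
        within = w1 * var1 + w2 * var2         # sigma_w^2(t)
        if within < best_var:
            best_var, best_t = within, edges[k]
    return best_t

# Toy volume (NOT patient data): a dark 2x2x2 block inside brighter tissue.
volume = np.full((4, 4, 4), 40.0)
volume[1:3, 1:3, 1:3] = -900.0
t = otsu_threshold(volume)
binary = volume < t                            # region D: lungs and air (dark voxels)
```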


4.2     Determination of connected region
After dividing the image into two regions, we have an image containing the lungs and the
air around the body. We set a point in the lungs, which defines the area of interest,
and discard the air. Thereafter, only the area connected with this point is processed.
The resulting image is given by:

$$y(i,j,k) = \begin{cases} 1, & x(i,j,k) \in D \wedge x(i,j,k) \in O \\ 0, & x(i,j,k) \notin D \vee x(i,j,k) \notin O \end{cases},$$

where $O$ is the connected region containing the given initial point.
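Extracting the region connected with the seed point can be sketched as a breadth-first flood fill over the binary mask. The choice of 6-connectivity (face neighbours only) and the function name are assumptions made for this example; the paper does not specify the connectivity rule.

```python
from collections import deque
import numpy as np

def connected_region(mask, seed):
    """Return the part of a 3-D boolean mask that is 6-connected to `seed`."""
    mask = np.asarray(mask, dtype=bool)
    region = np.zeros_like(mask)
    if not mask[seed]:
        return region                       # seed lies outside D: empty region
    region[seed] = True
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in steps:
            n = (i + di, j + dj, k + dk)
            inside = all(0 <= n[d] < mask.shape[d] for d in range(3))
            if inside and mask[n] and not region[n]:
                region[n] = True
                queue.append(n)
    return region

# Toy mask (NOT patient data): an "air" slab and a separate "lung" block.
mask = np.zeros((5, 5, 5), dtype=bool)
mask[0, :, :] = True                        # air around the body, 25 voxels
mask[2:4, 2:4, 2:4] = True                  # lung block, 8 voxels
region = connected_region(mask, (2, 2, 2))  # seed placed inside the lung block
```

Only the block that contains the seed survives; the slab of surrounding air is discarded because it is not connected to it.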
Then we calculate the volume of the lungs from the resulting binary image by the
formula:

$$V = \frac{1}{10^6} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{l} x_1(i,j,k) \cdot size(i) \cdot size(j) \cdot size(k),$$

where $x_1(i,j,k)$ are the pixels with value 1, $size(i) \cdot size(j) \cdot size(k)$ is
the size of the corresponding voxel in cubic millimeters ($mm^3$), and $[V]$ = liters.
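For a scan with uniform voxel spacing, the triple sum reduces to the number of voxels equal to 1 multiplied by the volume of one voxel. The sketch below assumes such a uniform grid; the function name and the spacing values are illustrative.

```python
import numpy as np

def volume_litres(binary, spacing_mm):
    """Volume of the voxels equal to 1, in litres.

    spacing_mm = (size_i, size_j, size_k): voxel edge lengths in millimetres,
    assumed to be the same for every voxel.
    """
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = int(np.count_nonzero(binary))
    return n_voxels * voxel_mm3 / 1e6        # 1 litre = 10^6 mm^3

# Toy example: a 100 x 100 x 100 block of ones.
block = np.ones((100, 100, 100), dtype=np.uint8)
v_unit = volume_litres(block, (1.0, 1.0, 1.0))   # 1 mm voxels
v_thick = volume_litres(block, (0.5, 0.5, 2.0))  # anisotropic voxels of 0.5 mm^3
```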


5       Experimental results

Cross-validation of the proposed technique was performed by dividing the dataset of 49
CT scans into training and control samples. The control sample consists of 5 patients
with normal lungs and 3 patients with pathology. The pathology volume is calculated as
the difference between the total volume of healthy lungs, determined by the linear
regression formula, and the volume of healthy lung tissue:

$$V_{path} = V_{total} - V_{healthy}.$$

For CT scans of patients with normal lungs, this difference should be close to 0. The
error of the method for patients with pathology is calculated as follows:

$$eps = \frac{|V_{total} - (V_{healthy} + V_{path})|}{V_{path}}. \tag{7}$$

For patients with normal lungs, the error is calculated as follows:

$$eps = \frac{|V_{total} - V_{healthy}|}{V_{healthy}}.$$
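The two error measures can be checked numerically against the tabulated results. In the sketch below the pathology volume passed to criterion (7) is the manually measured one, and the absolute value in both formulas is an assumption that is consistent with the tabulated errors; the function names are illustrative. The inputs are the Table 2 values for model (4).

```python
def pathology_volume(v_total, v_healthy):
    """Estimated pathology volume: V_path = V_total - V_healthy."""
    return v_total - v_healthy

def eps_pathology(v_total, v_healthy, v_path_true):
    """Criterion (7); v_path_true is the manually measured pathology volume."""
    return abs(v_total - (v_healthy + v_path_true)) / v_path_true

def eps_normal(v_total, v_healthy):
    """Relative error for a patient with normal lungs."""
    return abs(v_total - v_healthy) / v_healthy

# Patient 7 (pathology), model (4): Table 2 lists eps = 0.007.
e7 = eps_pathology(2.666, 1.974, 0.697)
# Patient 1 (normal lungs), model (4): Table 2 lists eps = 0.059.
e1 = eps_normal(6.675, 6.305)
```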

Table 1 shows the results for model (2). Dashes in the eps row indicate that the model
gives an unacceptable error value, so the model cannot be used in this case.
As shown in table 2, the error of the method with the robust model (4) does not exceed
10% and in some cases is as low as 0.3%. The ground-truth pathology volume used in
criterion (7) was evaluated manually.
A model based on a single metric is less accurate than a model that depends on a
combination of different metrics.




                 Table 1. The experimental results for the model (2)

Patient number    1       2       3       4       5       6       7       8
Pathology         no      no      no      no      no      yes     yes     yes
a, dm             2.452   2.210   2.681   2.362   2.641   2.613   2.302   2.722
b, dm             2.073   1.990   2.211   1.988   2.319   1.563   1.446   1.756
e, dm             3.233   2.588   3.031   2.725   2.535   2.623   2.392   2.607
Vhealthy, l       6.305   4.09    8.388   4.705   6.200   3.14    1.974   4.003
Vtotal, l         6.299   4.383   7.792   5.086   6.819   4.484   2.732   5.456
Vpath, l          -       -       -       -       -       0.596   0.697   0.584
eps               0.001   0.072   0.071   0.081   0.099   -       0.087   -

                 Table 2. The experimental results for the models (3), (4)

Patient number    1       2       3       4       5       6       7       8
Pathology         no      no      no      no      no      yes     yes     yes
a, dm             2.452   2.210   2.681   2.362   2.641   2.613   2.302   2.722
b, dm             2.073   1.990   2.211   1.988   2.319   1.563   1.446   1.756
e, dm             3.233   2.588   3.031   2.725   2.535   2.623   2.392   2.607
Vhealthy, l       6.305   4.09    8.388   4.705   6.200   3.14    1.974   4.003
Vtotal (3), l     6.869   4.253   7.760   4.933   6.363   3.948   2.742   4.777
Vtotal (4), l     6.675   4.029   7.627   4.691   6.145   3.739   2.666   4.537
Vpath, l          -       -       -       -       -       0.596   0.697   0.584
eps (3)           0.09    0.04    0.075   0.049   0.026   0.212   0.101   0.326
eps (4)           0.059   0.015   0.091   0.003   0.009   0.004   0.007   0.086

According to table 3, the error of models (5), (6) significantly exceeds the error of
models (3), (4). In some cases, the model is inapplicable because of a very large
calculation error. This is due to the lack of information about the shape of the lungs
and about their other spatial dimensions. The error is fairly high because the variance
of the metrics is rather large.




                 Table 3. The experimental results for the models (5), (6)

Patient number    1       2       3       4       5       6       7       8
Pathology         no      no      no      no      no      yes     yes     yes
b, dm             2.073   1.990   2.211   1.988   2.319   1.563   1.446   1.756
Vhealthy, l       6.305   4.09    8.388   4.705   6.200   3.14    1.974   4.003
Vtotal (5), l     5.922   5.275   7.154   5.255   8.258   1.840   2.310   3.746
Vtotal (6), l     5.573   4.856   7.090   4.835   8.584   2.864   2.698   3.452
Vpath, l          -       -       -       -       -       0.596   0.697   0.584
eps (5)           0.061   0.29    0.147   0.117   0.332   -       0.518   -
eps (6)           0.116   0.187   0.155   0.028   0.384   -       0.039   -


6             Conclusion

This work aims to determine the volume of pathological lung tissue for the purpose of
diagnosing the severity of injuries. We studied a simple method of calculating the
pathological lung volume using linear metrics. The estimation error obtained for the
proposed method is less than 10% when compared with the ground-truth value. There are
two main causes of model inaccuracy. The first is human intervention: the placement of
the measurements that most accurately describe the lung is chosen manually. The second
is the anatomical structure of the lung. In addition, the accuracy of the regression
model is somewhat reduced because the quality of the images in the training sample
varies greatly. The main error of the algorithm is associated with a large spread of
the metrics. This problem will be addressed in future work. Currently we are researching
the possibility of automatic measurement placement to minimize computation errors.


Acknowledgements

The work was performed as part of the project "Autoplan" (automatic system of plan-
ning and monitoring of the operation) under the state contract to perform research and
development work "Development of technology and organization of production of the
automated systems of planning, management and monitoring of results of surgical
treatment" ("4.3-Autoplan-2014").




References
 1. Mansoor A, Bagci U, Foster B, Xu Z, Papadakis GZ, Folio LR, Udupa JK, Mollura DJ.
    Segmentation and Image Analysis of Abnormal Lungs at CT: Current Approaches,
    Challenges, and Future Trends. RadioGraphics, 2015; 35(1): 1056-1076.
 2. Birkbeck N, Sofka M, Kohlberger T, Zhang J, Wetzl J, Kaftan J, Zhou SK. Robust Seg-
    mentation of Challenging Lungs in CT Using Multi-stage Learning and Level Set Optimi-
    zation. Computational Intelligence in Biomedical Imaging. Springer, New York, 2014;
    185-208.
 3. Mansoor A, Bagci U, Foster B, Xu Z, Papadakis GZ, Folio LR, Udupa JK, Mollura DJ. A
    Generic Approach to Pathological Lungs Segmentation. IEEE Transactions on Medical
    Imaging, 2014; 33(12): 2293-2310.
 4. Hazlinger M, Ctvrtlik F, Langova K, Herman M. Quantification of pleural effusion on CT
    by simple measurement. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub,
    2014; 158(1): 107-111.
 5. Draper NR, Smith H, Pownell E. Applied regression analysis. New York: Wiley, 1966; 3.
 6. Vorontsov KV. Combinatorial approach to the assessment of the quality of the trained al-
    gorithms. Mathematical questions of Cybernetics, 2004; 13: 5-36. [in Russian]
 7. Efroymson MA. Multiple regression analysis. Mathematical methods for digital comput-
    ers, 1960; 1: 191-203.
 8. Huber PJ. Robust statistics. Springer Berlin Heidelberg, 2011: 1248-1251.
 9. Otsu N. A threshold selection method from gray-level histograms. Transactions on sys-
    tems, man, and cybernetics, 1979; 9(1): 62-66.

