    Automatic Classification of Body Parts X-ray
                       Images

             Moshe Aboud1, Assaf B. Spanier1,2, and Leo Joskowicz2
         1 Department of Software Engineering, Jerusalem College of Engineering
         2 The Selim and Rachel Benin School of Engineering, The Hebrew University, Jerusalem, Israel



        Abstract. The development of automatic analysis and classification
        methods for large databases of X-ray images is a pressing need that
        may have a great impact on clinical practice. To advance this objective,
        the ImageCLEF-2015 clustering of body part X-ray images challenge was
        created. The aim of the challenge is to group digital X-ray images
        into five structural groups: head-neck, upper-limb, body, lower-limb, and
        other. This paper presents the results of an experimental evaluation of
        X-ray image classification in the ImageCLEF-2015 challenge. We apply
        state-of-the-art classification and feature extraction methods for image
        classification and optimize them for the challenge task with an emphasis
        on features indicating bone size and structure. The best classification
        results were obtained using the intensity, texture and HoG features and
        the KNN classifier. This combination has an accuracy of 86% and 73%
        for the 500 training images and 250 test images, respectively.

        Keywords: Classification, X-ray images, ImageCLEF-2015




                           Source code is available at:
             https://bitbucket.org/mosheab/classifying-medical-images


1     Introduction

The increasing amount of medical imaging data acquired in clinical practice con-
stitutes a vast database of untapped, diagnostically relevant information; millions
of images are acquired worldwide each year. Clinicians are struggling under the
burden of diagnosing and following up on such an immense number of images. This
situation has given rise to a plethora of methods that assist clinicians with
efficient search capabilities.
    Content-Based Image Retrieval (CBIR) is a popular and growing research topic
[1]. The goal of CBIR is to assist physicians with diagnosis by finding cases
similar to the one at hand; it therefore requires efficient search capabilities over
a vast database of medical images. The problem is most acute in X-ray imaging,
the most widely used medical imaging modality today, as many clinics and home
health-care centers are equipped with X-ray scanners and maintain their own
databases of images.
    This paper addresses the problem of classifying digital X-ray images into
five groups: head-neck, upper-limb, body, lower-limb, and other (Fig. 1).
    A variety of methods exist for medical image feature extraction and classi-
fication. Haralick et al. [14] suggest feature extraction based on gray-level co-
occurrence matrices, whereas Weszka et al. [23] perform classification based on
texture measures. Another strategy, presented by Mueen et al. [18], is to combine
local and global features, using pixel values and shape features extracted with the
Canny edge detector; the pixel values and shape features are then combined into
a single multi-feature vector used for classification.
    Advanced methods include image classification based on the IRMA code
[16]. In this method, features are extracted from the modality, body orientation,
anatomic region and biological system. More recently, the Bag of Visual Words
model (BoVW) [4] was applied to X-ray images. In the BoVW approach, feature
descriptors are extracted around interest points, and a visual-word vocabulary
built from these local image patches is used to represent each image.
    Ghofrani et al. [12] recently proposed a classification method based on fuzzy
set theory. They perform fuzzy-set classification with feature extraction based on
a combination of shape and texture, using the Canny edge detector and the
discrete Gabor transform. Zare et al. [24] present three techniques for image
annotation: probabilistic latent semantic analysis (PLSA) annotation, binary
classification annotation, and annotation based on similar images. In their
approach, semantic information is captured from the textual and visual modalities
and the correlation between them is learned.
    This paper presents the results of an experimental evaluation of X-ray image
classification [3] in the ImageCLEF-2015 challenge [21]. The goal of the challenge
is to group digital X-ray images into five groups: head-neck, upper-limb, body,
lower-limb, and other. In the context of the challenge, we apply state-of-the-
art classification and feature extraction methods for image classification and
optimize them for the challenge task with an emphasis on features indicating
bone size and structure.


2   Method

The aim of our method is to group digital X-ray images into five groups: head-
neck, upper-limb, body, lower-limb, and other (see Fig. 1). Our objective is to
apply state-of-the-art classifiers and feature selection methods and to optimize
them for the challenge task with an emphasis on features indicating bone size
and structure.
    The input to our method is (1) labeled X-ray images from the five groups;
(2) feature extraction techniques; and (3) classifiers. The output is the 10
feature-classifier pairs that achieve the highest classification accuracy on the
given five-group classification task.
Fig. 1. Examples of the five X-ray image groups: Body, Head-Neck, Upper-Limb,
Lower-Limb, and Other.


    Our method consists of two steps. (1) A two-class experiment is used to
select the feature-classifier pairs that best distinguish between X-ray images
containing big and long bones (e.g., skull, arm, and leg) and those containing
small and short bones (e.g., chest and abdomen bones). (2) The feature-classifier
pairs that provide an average accuracy greater than 90% in the first step are
evaluated on the five groups of X-ray images (head-neck, upper-limb, body,
lower-limb, and other; Fig. 1) to find the 10 best feature-classifier combinations.
These 10 best feature-classifier pairs were submitted to the challenge evaluation.
    Fig. 2 illustrates the flow of our method. Next, we describe each step in detail.

2.1    Feature Extraction
Nine feature extraction methods were evaluated in this study. Below we describe
the features that were examined; a short extraction sketch in Python follows the list.
 – Color. A gray-scale histogram [22] is used to represent the intensity
   distribution of the image. We divide the image into a 7x7 grid of equal patches
   and compute an 8-bit histogram for each patch. All patch histograms are then
   concatenated into a single vector that serves as the color feature of our method.
 – Texture. Texture features are extracted using the Local Binary Pattern
   (LBP) [13], which provides highly discriminative and robust pattern-related
   information. We divide each image into a 10x10 grid of equal patches and
   extract LBP values for each patch. The patch values are then concatenated
   into a single vector to serve as the texture feature of our method.
Fig. 2. The input to our method is (1) labeled X-ray images, (2) feature extraction
techniques, and (3) classifiers. Our method consists of two steps. Step 1: a two-class
experiment is used to select the feature-classifier pairs that best distinguish between
X-ray images containing big and long bones and those containing small and short
bones. Step 2: the feature-classifier pairs that provide an average accuracy greater
than 90% in the first step are trained on the five groups of X-ray images to find the
10 best feature-classifier combinations. These 10 best feature-classifier pairs were
submitted to the challenge evaluation.


 – HoG [8]. A histogram of neighborhood pixels binned by gradient orientation
   and weighted by gradient magnitude. HoG features have been shown to be
   particularly discriminative for people and body shapes. We extract the HoG
   values for each patch of a 10x10 grid over the image and concatenate them
   into a single vector to serve as the HoG feature of our method.
 – BoVW [9]. This method builds a visual vocabulary. Descriptors were extracted
   from detected key points using the following algorithms:
     • Scale invariant feature transform (SIFT) [17].
     • Speeded-up robust features (SURF) [5].
     • Binary robust independent elementary features (BRIEF) [6].
     • Oriented FAST and rotated BRIEF (ORB) [20].
   These descriptors were then clustered using the k-means algorithm, and the
   cluster centers act as the BoVW feature of our method. Applying this scheme
   with each of the above descriptor algorithms yields four different BoVW features.
 – Color+Texture. A combination of the color and texture values represented
   as a multi-feature vector.
 – Color+Texture+HoG. A combination of the color, texture, and HoG values
   represented as a multi-feature vector.
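
    As an illustration, the following Python sketch shows how the color, texture,
and HoG values can be concatenated into a single multi-feature vector. It is not
our released code: the patch grid sizes follow the text above, while the histogram
bin counts, the HoG cell/block sizes, and the use of NumPy and scikit-image are
illustrative assumptions.

# Hedged sketch of the Color+Texture+HoG multi-feature vector.
# Grid sizes follow the paper; bin counts and HoG parameters are assumptions.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from skimage.transform import resize

def patch_grid(img, rows, cols):
    # Split a gray-level image into a rows x cols grid of patches.
    h, w = img.shape
    return [img[i * h // rows:(i + 1) * h // rows,
                j * w // cols:(j + 1) * w // cols]
            for i in range(rows) for j in range(cols)]

def color_feature(img, grid=7, bins=256):
    # Gray-scale histogram per patch of a 7x7 grid, concatenated.
    hists = [np.histogram(p, bins=bins, range=(0, 256))[0]
             for p in patch_grid(img, grid, grid)]
    return np.concatenate(hists).astype(float)

def texture_feature(img, grid=10, points=8, radius=1):
    # Uniform LBP histogram per patch of a 10x10 grid, concatenated.
    feats = []
    for p in patch_grid(img, grid, grid):
        lbp = local_binary_pattern(p, points, radius, method="uniform")
        feats.append(np.histogram(lbp, bins=points + 2,
                                  range=(0, points + 2))[0])
    return np.concatenate(feats).astype(float)

def hog_feature(img, size=(128, 128)):
    # HoG over a resized copy of the image; parameters are assumptions.
    return hog(resize(img, size), orientations=9,
               pixels_per_cell=(12, 12), cells_per_block=(2, 2))

def multi_feature(img):
    # Color+Texture+HoG: one concatenated descriptor per image.
    return np.concatenate([color_feature(img),
                           texture_feature(img),
                           hog_feature(img)])
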
2.2    Classifiers
We tested the following four classification methods (a brief setup sketch follows
the list):
 1. KNN assigns a label according to the majority label of the K nearest neigh-
    bors in feature space [7].
 2. SVM linearly separates the points in feature space into two distinct classes
    [11].
 3. LR is a probabilistic model that predicts a binary output from the model's
    predictor variables [10].
 4. DBN constructs deep hierarchical layers based on a representation of the
    training data. The DBN performs unsupervised pre-training and then fine-
    tunes the network weights with supervised learning for classification [2].
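
    The first three classifiers are available directly in scikit-learn; the snippet
below is a minimal setup sketch, in which the hyper-parameters (K, the SVM
kernel, the iteration limit) are illustrative assumptions. A DBN is not part of
scikit-learn and would require a separate library, so it is omitted here.

# Minimal scikit-learn setup for three of the four classifiers.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),  # majority vote of 5 nearest neighbors
    "SVM": SVC(kernel="linear"),                 # linear separating hyperplane
    "LR": LogisticRegression(max_iter=1000),     # probabilistic linear model
}

# X: (n_images, n_features) matrix of multi-feature vectors, y: group labels.
# for name, clf in classifiers.items():
#     clf.fit(X, y)
#     predictions = clf.predict(X_test)
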

2.3   Model Selection
Given a set of feature extraction and classifier methods, our goal is to find the 10
feature-classifier combinations that provide the highest classification accuracy
for the five groups of X-ray images.
     To reduce the number of combinations and the complexity of the problem, a
two-class selection is first applied to distinguish between X-ray images containing
big and long bones (e.g., skull, arm, and leg) and those with small and short
bones (e.g., chest and abdomen bones). An additional motivation is to identify
the features that best isolate different bone structures.
     Next, we select the feature-classifier pairs that provide an average accuracy
greater than 90% in the two-class experiment and train them on the five-group
X-ray image set to find the 10 best combinations. We test each feature-classifier
pair with leave-one-out cross-validation, in which the classifier is trained on all
cases except a single held-out case used for testing.
     The 10 best feature-classifier combinations were then submitted to the chal-
lenge to be tested on an unlabeled test set of images released by the ImageCLEF
organization [21][3]. We use the OpenCV-Python library [15] for feature ex-
traction and the scikit-learn machine learning library [19] for the four selected
classifiers. A sketch of the selection procedure is given below.
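
    The following sketch outlines the two-step selection with leave-one-out cross-
validation. The 90% threshold and the top-10 cut follow the description above;
the variable names (X2, y2, X5, y5, feature_sets, classifiers) are placeholders for
the extracted feature matrices and labels, not names from our released code.

# Sketch of the two-step model selection with leave-one-out cross-validation.
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_accuracy(clf, X, y):
    # Mean accuracy over leave-one-out folds.
    return cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# Step 1: two-class filter (big/long vs. small/short bones).
step1 = {(f, c): loo_accuracy(clf, X2[f], y2)
         for f in feature_sets
         for c, clf in classifiers.items()}
kept = [pair for pair, acc in step1.items() if acc > 0.90]

# Step 2: five-class evaluation of the surviving pairs; keep the best 10.
step2 = {(f, c): loo_accuracy(classifiers[c], X5[f], y5)
         for f, c in kept}
best10 = sorted(step2, key=step2.get, reverse=True)[:10]
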

3     Results
The data set released by ImageCLEF [21][3] consists of 750 X-ray images: (a)
500 labeled images (100 from each group) released for training purposes, where
the five image groups are Head-Neck, Body, Upper-Limb, Lower-Limb, and Other
(Fig. 1); and (b) 250 unlabeled images released for the challenge evaluation and
benchmarking.
    We first present the results obtained with our two-step approach on the 500
labeled training images. Then, we present the results on the 250 unlabeled images
as validated by the challenge organizers.
3.1   Training

In the two-class experiment we use all four classifiers and nine sets of features,
creating 36 (4 × 9) feature-classifier combinations. In the first step we select the
feature-classifier pairs that provide an average accuracy greater than 90%; this
selection reduced the number of combinations from 36 to 16. The results for all
36 feature-classifier combinations are shown in Table 1.


Table 1. Results of the first training step (classification of long vs. short bones).
Each of the nine feature sets is represented as a row; the columns represent the four
classifiers. Values are classification accuracies from the leave-one-out cross-validation
process.

             Feature/Classifier LR     DBN SVM KNN
             BoVW BRIEF         79.55% 74.06% 80.05% 75.81%
             BoVW ORB           79.05% 74.06% 80.30% 78%
             BoVW SIFT          82.54% 74.06% 81.30% 76.81%
             BoVW SURF          87%    74.31% 87.28% 84.29%
             HoG                90.77% 79.55% 92.52% 93.02%
             TEXTURE            89.78% 91.27% 92.27% 90.52%
             COLOR              89.28% 92.77% 91.02% 92.52%
             COLOR+TEXTURE      91.27% 93.27% 91.52% 91.77%
             COLOR+TEXTURE+HoG 93.52% 92.77% 92.27% 92%



    The BoVW features exhibit low accuracy regardless of the classifier tested.
The combination of color and texture yields a high accuracy of 89-93%. Using
the color, texture, and HoG features together yields the highest average
classification accuracy across all classifiers.
    In the second step, 5-class classification was performed on all five groups
using the 16 feature-classifier combinations that yielded an accuracy greater
than 90% in the first step. The goal of this second step is to investigate the
performance of the methods and to reduce the number of feature-classifier
combinations to the best 10.
    Table 2 presents the results of the second step on the training set.


Table 2. Results of the second training step: 5-group classification on the training
set. Each of the four feature sets is represented as a row; the columns represent the
four classifiers. The 10 best combinations are marked with (*) and were submitted
to the challenge.

       Feature/Classifier LR        DBN       SVM       KNN
       TEXTURE            73.40%    77.40%    75.00%    78.80(*)%
       COLOR              72.40%    79.20%    74.80%    80.80(*)%
       COLOR+TEXTURE      75.60(*)% 80.20(*)% 79.80(*)% 83.20(*)%
       COLOR+TEXTURE+HoG 80.80(*)% 83.40(*)% 83.80(*)% 86(*)%
    To sum up, the 10 best feature-classifier pairs are: 1. Color+Texture+HoG
and KNN; 2. Color+Texture+HoG and SVM; 3. Color+Texture+HoG and DBN;
4. Color+Texture+HoG and LR; 5. Color+Texture and KNN; 6. Color+Texture
and SVM; 7. Color+Texture and DBN; 8. Color+Texture and LR; 9. Color and
KNN; 10. Texture and KNN. These 10 combinations are marked in Table 2 with
(*) and were sent to the challenge organizers for evaluation.


3.2   Challenge Results

Table 3 shows the classification results on the 250 test images for the 10 best
feature-classifier combinations that were submitted to the ImageCLEF organi-
zation for evaluation. Our best submission, using the color histogram features
and the KNN classifier, ranked 12th out of 30, achieving an accuracy of 73.2%.
The challenge results for all 10 methods are presented in the table below.

Table 3. Challenge results on the 250 test images for our 10 best feature-classifier
combinations. Combinations that were not submitted are marked with a dash.

               Feature/Classifier LR    DBN   SVM   KNN
               TEXTURE            –     –     –     66.4%
               COLOR              –     –     –     73.2%
               COLOR+TEXTURE      71.2% 68.0% 71.2% 71.2%
               COLOR+TEXTURE+HoG  69.2% 69.2% 72.8% 72.4%



    Note that our best submission (73.2% accuracy) is lower than the best accu-
racies obtained on the five-group training set (80-86%) with the color, texture,
and HoG combinations; the challenge results are closer to the average results
achieved on the training set. This may indicate that the variability of the
training dataset does not fully reflect the image variability of the challenge
dataset.


4     Discussion

In this work we have evaluated four state-of-the-art classifiers and nine sets of
features, resulting in 36 feature-classifier combinations. Surprisingly, KNN, the
simplest classifier in terms of implementation and computational complexity,
exhibited the best results. Moreover, despite the major trend of using Deep Belief
Network (DBN) methods for many image-based classification problems, the DBN
exhibits reliable results only when applied to large-scale datasets; in our study,
suboptimal results were obtained on the small-scale dataset (see Table 2). This
is in line with the theory of DBNs, which require large-scale databases for
reliable performance. The BoVW method was the least effective of all the feature
extraction schemes tested.
    Of all the feature extraction methods we evaluated, the color feature yielded
the highest accuracy. This is surprising, considering that X-ray images are gray-
level images. It may be explained by our implementation, which extracts gray-
scale histogram features from different regions of the image and thus provides
more specific information about the spatial distribution of intensities. Another
advantage of the color features is their low computational complexity compared
to the texture, HoG, and BoVW features.


5   Conclusions

This paper presents research on medical X-ray image classification. We analyze
state-of-the-art classifiers and feature extraction methods for image classifica-
tion. The image features used include color, texture, HoG, and BoVW, fed into
the tested classifiers: SVM, KNN, LR, and DBN. We used the datasets of the
ImageCLEF-2015 [21] clustering of body part X-ray images challenge [3]: 500
X-ray images were used for training and 250 for testing. The highest classification
accuracy was obtained when using the intensity, texture, and HoG features and
the KNN classifier. This combination achieves an accuracy of 86% and 73% on
the 500 training images and 250 test images, respectively.
    Future work consists of examining additional classifiers and extending our
algorithm to partition the initial clusters into sub-clusters. For example, the
upper-limb cluster can be further divided into the following categories: clavicle,
scapula, humerus, radius, ulna, and hand.


References

 1. Akgül, C.B., Rubin, D.L., Napel, S., Beaulieu, C.F., Greenspan, H., Acar, B.:
    Content-based image retrieval in radiology: Current status and future directions.
    Journal of Digital Imaging 24(2), 208–222 (2011)
 2. Ali, K.H., Wang, T.: Learning features for action recognition and identity with
    deep belief networks. In: 2014 International Conference on Audio, Language and
    Image Processing (ICALIP). pp. 129–132. IEEE (2014)
 3. Amin, M.A., Mohammed, M.K.: Overview of the ImageCLEF 2015 medical clus-
    tering task. In: CLEF2015 Working Notes. CEUR Workshop Proceedings, CEUR-
    WS.org, Toulouse, France (September 8-11 2015)
 4. Avni, U., Goldberger, J., Sharon, M., Konen, E., Greenspan, H.: Chest x-ray char-
    acterization: from organ identification to pathology categorization. In: Proceedings
    of the international conference on Multimedia information retrieval. pp. 155–164.
    ACM (2010)
 5. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (surf).
    Computer vision and image understanding 110(3), 346–359 (2008)
 6. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: Brief: Binary robust independent el-
    ementary features. In: Computer Vision–ECCV 2010, pp. 778–792. Springer (2010)
 7. Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers. Multiple Classifier
    Systems pp. 1–17 (2007)
 8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In:
    IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
    CVPR 2005. vol. 1, pp. 886–893 (2005)
 9. Deselaers, T., Pimenidis, L., Ney, H.: Bag-of-visual-words models for adult im-
    age classification and filtering. In: Pattern Recognition, 2008. ICPR 2008. 19th
    International Conference on. pp. 1–4 (2008)
10. Dreiseitl, S., Ohno-Machado, L.: Logistic regression and artificial neural network
    classification models: a methodology review. Journal of biomedical informatics
    35(5), 352–359 (2002)
11. Fan, Y., Shen, D., Davatzikos, C.: Classification of structural images via high-
    dimensional image warping, robust feature extraction, and svm. In: Medical Image
    Computing and Computer-Assisted Intervention–MICCAI 2005, pp. 1–8. Springer
    (2005)
12. Ghofrani, F., Helfroush, M.S., Rashidpour, M., Kazemi, K.: Fuzzy-based medical
    x-ray image classification. Journal of medical signals and sensors 2(2), 73 (2012)
13. Guo, Z., Zhang, L., Zhang, D.: Rotation invariant texture classification using lbp
    variance (lbpv) with global matching. Pattern recognition 43(3), 706–719 (2010)
14. Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classi-
    fication. IEEE Transactions on Systems, Man and Cybernetics (6), 610–621 (1973)
15. Howse, J.: OpenCV Computer Vision with Python. Packt Publishing Ltd (2013)
16. Lehmann, T.M., Schubert, H., Keysers, D., Kohnen, M., Wein, B.B.: The irma
    code for unique classification of medical images. In: Medical Imaging 2003. pp.
    440–451. International Society for Optics and Photonics (2003)
17. Liu, X., Shao, Z., Liu, J.: Ontology-based image retrieval with sift features. In:
    First International Conference on Pervasive Computing Signal Processing and Ap-
    plications (PCSPA). pp. 464–467. IEEE (2010)
18. Mueen, A., Zainuddin, R., Baba, M.S.: Automatic multilevel medical image anno-
    tation and retrieval. Journal of digital imaging 21(3), 290–295 (2008)
19. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Ma-
    chine learning in python. The Journal of Machine Learning Research 12, 2825–2830
    (2011)
20. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: an efficient alternative to
    sift or surf. In: IEEE International Conference on Computer Vision (ICCV). pp.
    2564–2571 (2011)
21. Villegas, M., Müller, H., Gilbert, A., Piras, L., Wang, J., Mikolajczyk, K., de Her-
    rera, A.G.S., Bromuri, S., Amin, M.A., Mohammed, M.K., Acar, B., Uskudarli,
    S., Marvasti, N.B., Aldana, J.F., del Mar Roldán Garcı́a, M.: General Overview of
    ImageCLEF at the CLEF 2015 Labs. Lecture Notes in Computer Science, Springer
    International Publishing (2015)
22. Wang, S.L., Liew, A.: Information-based color feature representation for image
    classification. In: International Conference on Image Processing (ICIP). vol. 6, pp.
    VI–353 (2007)
23. Weszka, J.S., Dyer, C.R., Rosenfeld, A.: A comparative study of texture measures
    for terrain classification. IEEE Transactions on Systems, Man and Cybernetics (4),
    269–285 (1976)
24. Zare, M.R., Mueen, A., Seng, W.C.: Automatic medical x-ray image classification
    using annotation. Journal of digital imaging 27(1), 77–89 (2014)