=Paper= {{Paper |id=Vol-1173/CLEF2007wn-ImageCLEF-ChengEt2007 |storemode=property |title=CYU_IM@ImageCLEF 2007: Medical Image Annotation Task |pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-ChengEt2007.pdf |volume=Vol-1173 |dblpUrl=https://dblp.org/rec/conf/clef/ChengY07 }} ==CYU_IM@ImageCLEF 2007: Medical Image Annotation Task== https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-ChengEt2007.pdf
          CYU_IM@ImageCLEF 2007: Medical image annotation task

                                      Pei-Cheng Cheng1 and Wei-Pang Yang2

                          1 Department of Information Management, Ching Yun University
                                  229, Chien-Hsin Rd., Jung-Li, Taiwan 320, R.O.C.
                                               pccheng@cyu.edu.tw
                          2 Department of Information Management, National Dong Hwa University
                             1, Sec. 2, Da Hsueh Rd., Shou-Feng, Hualien, Taiwan, R.O.C.
                                           wpyang@mail.ndhu.edu.tw



                                                    Abstract
      The ImageCLEF 2007 Medical Automatic Annotation Task is based on the IRMA project: a database of
      11,000 fully classified radiographs is provided to train a classification system, and 1,000 radiographs
      have to be classified. Radiographs always show particular anatomic regions (lung,
      liver, head, and so on), so similar images have similar spatial structures. We propose a relative
      vector representation that captures the local spatial relationship between pixels. In this experiment,
      we transform gray values into relative vectors, which are an illumination-invariant feature. We
      calculate the occurrence frequency and standard deviation of the relative vectors as the image feature.
      Based on this feature, similar images are those at a smaller distance. Finally, we use the
      nearest neighbor method to classify the 1,000 test images.
      We submitted one run to the medical annotation task. Our run obtained an error score of 79.303
      and ranked 33rd of 68 runs. The best score among all runs is 26.847 and the worst score, at rank 68, is
      505.618. The image feature we propose is easy to implement and places our run near the middle of
      the ranking.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and
Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]:
Languages—Query Languages


Keywords
Content-based image retrieval; Pattern recognition; Medical Image Classification;

1. Introduction

Automatic image annotation is an important step when searching for images in a database. In this task a
database of 11,000 fully classified radiographs is made available and can be used to train a classification system.
1,000 test radiographs for which classification labels are not available have to be classified. The task
provides no text as input and is aimed at image analysis technologies.
In this task, we propose a relative vector representation that captures the local spatial relationship between
pixels. In this experiment, we transform gray values into relative vectors, which are an illumination-invariant
feature. We calculate the occurrence frequency and standard deviation of the relative vectors as the image feature.
Based on this feature, similar images are those at a smaller distance. Finally, we use the nearest neighbor
method to classify the 1,000 test images.
The evaluation considers the complete IRMA code in order to assess the level of detail at which the images are
annotated. Errors in the annotation are therefore counted depending on the depth in the code tree and the
difficulty of the choice. The rest of this paper is organized as follows. In Section 2, the employed image features
are described. Section 3 describes our submission to the ImageCLEF 2007 evaluation. Finally, Section 4
provides concluding remarks and future directions for medical image retrieval.
2. Image features
This section describes the features used in this paper for the ImageCLEF 2007 evaluation. In an image retrieval
system, image features are extracted from pixels. The extracted features are then used for similarity comparison.
For fast response, image features must be concise, and for precision, they must contain meaningful
information that represents the image itself. Image features directly affect the retrieval result. In this paper we
propose illumination-invariant image features that emphasize the contrast of an image and are little
affected by illumination.

2.1 Relative local image feature.

The color histogram [Swain91] is a basic method that represents image
content well. It gathers statistics about the proportion of each color as the signature of an
image. Medical images, however, are gray-level images. The histogram method quantizes the gray value into
256 levels (0 to 255) and calculates the ratio of each level as a histogram. It captures no spatial
information between neighboring pixels and is sensitive to illumination.
      In this paper, we propose an illumination-invariant relative feature to represent the content of medical images.
Brightness affects the gray values stored in a digital image, whereas human perception emphasizes the
contrast rather than the illumination of an image. The representation we designed combines this aspect of human
perception with low-level image features that are illumination invariant.
      We consider the relationship between neighboring pixels as a relative feature and define three relative
relationships: equal, bigger, and less. Previous techniques that use a single pixel as the basic unit of measure are
dominated by the absolute gray value. We propose a relative local feature in which each pixel refers to two
neighboring pixels to generate a relative vector. The basic unit of measure is therefore a
relative vector, which is illumination invariant. The relative vector is defined as follows:

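Definition 1: Relation R. Any two neighboring pixels pi and pj have a relationship; the relation R is defined as
follows:

\[
R(p_i, p_j) =
\begin{cases}
0 & \text{if } \mathrm{gray\_value}(p_i) = \mathrm{gray\_value}(p_j) \\
1 & \text{if } \mathrm{gray\_value}(p_i) > \mathrm{gray\_value}(p_j) \\
2 & \text{if } \mathrm{gray\_value}(p_i) < \mathrm{gray\_value}(p_j)
\end{cases}
\]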
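
As an illustration, Definition 1 can be sketched in Python with NumPy. The function name `relation` and the toy image are assumptions made for this example only; they are not taken from the original system. The sketch also checks the illumination-invariance argument: adding the same brightness offset to both pixels (assuming no clipping) leaves R unchanged.

```python
import numpy as np

def relation(a, b):
    """Definition 1, element-wise: 0 if a == b, 1 if a > b, 2 if a < b."""
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    return np.where(a == b, 0, np.where(a > b, 1, 2))

# Toy 2x3 gray image (hypothetical values).
img = np.array([[10, 10, 30],
                [50, 20, 20]])

# Compare every pixel with its right-hand neighbor.
right = relation(img[:, :-1], img[:, 1:])

# A uniform brightness shift (no clipping assumed) gives the same relations.
shifted = relation(img[:, :-1] + 40, img[:, 1:] + 40)
```

Only the ordering of the two gray values matters, which is why an additive brightness change does not alter the result.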

      Every pixel of an image is translated into a relative vector RV ∈ {0,1,2} × {0,1,2} × {0,1,2}. Because each
of the three components takes one of three values, relative vectors fall into 3^3 = 27 classes, and we can
translate the relative vector RV(pixel(x,y)) into an integer for fast computation. We hash the relative vector
into one of the 27 classes by equation (1).

Let RV(pixel(x,y)) = <i, j, k>. Then

\[
\mathrm{Class}(RV(\mathrm{pixel}(x,y))) = i \cdot 3^2 + j \cdot 3^1 + k \qquad (1)
\]
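
Equation (1) reads the triple <i, j, k> as a base-3 number. A minimal sketch follows. The text states that each pixel refers to its down and right neighbors but does not spell out the three components of RV, so the choice made in `class_map` (R(p, down), R(p, right), R(down, right)) is an assumption of this example, as are all function names.

```python
import numpy as np

def relation(a, b):
    """Definition 1, element-wise: 0 if equal, 1 if bigger, 2 if less."""
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    return np.where(a == b, 0, np.where(a > b, 1, 2))

def class_index(i, j, k):
    """Equation (1): map <i, j, k> with components in {0, 1, 2} to 0..26."""
    return i * 3 ** 2 + j * 3 ** 1 + k

def class_map(img):
    """Class of every pixel that has both a down and a right neighbor.

    ASSUMPTION: RV = <R(p, down), R(p, right), R(down, right)>.
    The source does not define the third component.
    """
    img = np.asarray(img, dtype=np.int64)
    p, down, right = img[:-1, :-1], img[1:, :-1], img[:-1, 1:]
    return class_index(relation(p, down), relation(p, right), relation(down, right))

# Hypothetical 3x3 gray image; the result has shape 2x2.
img = np.array([[5, 5, 9],
                [7, 2, 2],
                [7, 7, 1]])
classes = class_map(img)
```

Pixels in the last row and last column have no complete neighborhood and are skipped here; padding them instead would be an equally plausible choice.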

     Because the representation is based on the relative relation between pixels, it is
illumination invariant. In addition, as in the above definition, each pixel refers to two neighbors (down, right)
to generate the RV, so the relative vector carries local spatial information about neighboring pixels.
At this point an image has been translated into an illumination-invariant representation in which each pixel,
together with its two neighboring pixels, falls into one class. Next, we partition the image into four areas and
calculate the occurrence frequency, average distribution distance and spatial standard deviation of each class in
each area separately as the image feature. The final signature of an image therefore has 324 features in total
(4 areas × 27 classes × 3 measures). We now describe the calculation of
the average distribution distance and the spatial standard deviation.
     Suppose a class contains n pixels. Let (xi, yi) be the coordinates of pixel pi and (ux, uy) the centroid of
these pixels. ud is the average distance of the pixels from (ux, uy), and std is the standard deviation of these
distances. They are calculated as follows.
\[
u_x = \frac{1}{n} \sum_{i=1}^{n} x_i , \qquad u_y = \frac{1}{n} \sum_{i=1}^{n} y_i
\]

\[
u_d = \frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - u_x)^2 + (y_i - u_y)^2 \right]
\]

\[
d_i = (x_i - u_x)^2 + (y_i - u_y)^2
\]

\[
std = \sum_{i=1}^{n} (d_i - u_d)^2
\]
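
The per-class measures can be sketched as follows. Several points are assumptions of this example rather than statements from the paper: the formulas as printed show no square roots, whereas the text speaks of a distance and a standard deviation, so the sketch uses the Euclidean distance and the population standard deviation; the four areas are taken to be the four quadrants; and the occurrence frequency is taken as the fraction of pixels in the area. All function names are hypothetical.

```python
import numpy as np

def class_moments(class_map, n_classes=27):
    """Return (freq, ud, std), each of length n_classes, for one area."""
    class_map = np.asarray(class_map)
    freq = np.zeros(n_classes)
    ud = np.zeros(n_classes)
    std = np.zeros(n_classes)
    for c in range(n_classes):
        ys, xs = np.nonzero(class_map == c)
        n = xs.size
        if n == 0:
            continue  # class absent in this area: its features stay 0
        freq[c] = n / class_map.size                  # ASSUMPTION: fraction of pixels
        ux, uy = xs.mean(), ys.mean()                 # centroid (ux, uy)
        d = np.hypot(xs - ux, ys - uy)                # ASSUMPTION: Euclidean d_i
        ud[c] = d.mean()                              # average distance ud
        std[c] = np.sqrt(np.mean((d - ud[c]) ** 2))   # ASSUMPTION: population std
    return freq, ud, std

def image_signature(class_map):
    """Concatenate the measures of four quadrants: 4 * 27 * 3 = 324 values."""
    class_map = np.asarray(class_map)
    h, w = class_map.shape
    quadrants = [class_map[:h // 2, :w // 2], class_map[:h // 2, w // 2:],
                 class_map[h // 2:, :w // 2], class_map[h // 2:, w // 2:]]
    return np.concatenate([np.concatenate(class_moments(q)) for q in quadrants])

# Toy class map: class 3 sits at the four corners of a 3x3 area.
toy = np.array([[3, 0, 3],
                [0, 0, 0],
                [3, 0, 3]])
freq, ud, std = class_moments(toy)
signature = image_signature(np.zeros((8, 8), dtype=int))
```

In the toy map, the four pixels of class 3 are all at distance sqrt(2) from their centroid, so the average distance is sqrt(2) and the spread is zero.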




      For each class, we obtain ud and std as moments of the spatial distribution to describe the global spatial
relationship. In summary, an image is first translated into the illumination-invariant relative vector space. We
then calculate the occurrence frequency, average distance and standard deviation of each class as the feature of
the image. The signature of an image has 324 features in total.
      The similarity between two features is evaluated by a distance metric: two images Ia and Ib with a
smaller distance are more similar. Equation (2) gives the distance metric.

\[
\mathrm{SIM\_RF}(I_a, I_b) = \sum_{i=0}^{26} \Big[ \, | f_i(I_a) - f_i(I_b) | + \alpha \times | f_i(I_a) + f_i(I_b) | \times \big( | ud_i(I_a) - ud_i(I_b) | + \lambda \times | std_i(I_a) - std_i(I_b) | \big) \Big] \qquad (2)
\]


Here fi is the occurrence frequency of class i, udi is the average distance of the pixels of class i from their
centroid, and stdi is the standard deviation of these distances with respect to the average distance. We set α=0.4
and λ=2 in this experiment.
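
Equation (2) translates directly into a few lines of NumPy. The sketch below handles the 27 classes of a single area; how the four areas are combined is not stated in the text, so that step is left out. The function name `sim_rf` and the example values are assumptions made for illustration.

```python
import numpy as np

def sim_rf(feat_a, feat_b, alpha=0.4, lam=2.0):
    """Equation (2) for one area.

    feat_a and feat_b are (freq, ud, std) triples of equal-length arrays,
    one entry per class. A smaller value means more similar images.
    """
    fa, uda, stda = (np.asarray(v, dtype=float) for v in feat_a)
    fb, udb, stdb = (np.asarray(v, dtype=float) for v in feat_b)
    spatial = np.abs(uda - udb) + lam * np.abs(stda - stdb)
    return float(np.sum(np.abs(fa - fb) + alpha * np.abs(fa + fb) * spatial))

# Hypothetical features: only class 0 is populated in both images.
zeros = np.zeros(27)
a = (zeros.copy(), zeros.copy(), zeros.copy())
b = (zeros.copy(), zeros.copy(), zeros.copy())
a[0][0], a[1][0], a[2][0] = 0.5, 2.0, 1.0   # freq, ud, std of class 0 in image a
b[0][0], b[1][0], b[2][0] = 0.3, 3.0, 1.5   # freq, ud, std of class 0 in image b

d_ab = sim_rf(a, b)   # 0.2 + 0.4 * 0.8 * (1.0 + 2 * 0.5) = 0.84
d_aa = sim_rf(a, a)   # identical features give distance 0
```

The product term means that a difference in spatial layout counts more for classes that are frequent in both images.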


2.2 Correlogram feature.
Because it is based on the relative vector, the extracted image feature is more stable under different
illumination than one computed directly from gray values. After the relative feature transform, we analyze the
image pixels with a modified correlogram algorithm. The correlogram [Ma97] is defined in Eq. (3). Let D denote a
set of fixed distances {d1, d2, d3,…, dn}. The correlogram of an image I is defined as the probability of a color
pair (ci, cj) at a distance d.
\[
\gamma^{d}_{c_i, c_j}(I) = \Pr_{p_1 \in c_i,\; p_2 \in I} \{\, p_2 \in c_j \;\big|\; |p_1 - p_2| = d \,\} \qquad (3)
\]


For computational efficiency, the autocorrelogram is defined in Eq. (4)

\[
\lambda^{d}_{c_i}(I) = \Pr_{p_1 \in c_i,\; p_2 \in I} \{\, p_2 \in c_i \;\big|\; |p_1 - p_2| = d \,\} \qquad (4)
\]


Our modified correlogram algorithm works as follows. Any two pixels are separated by some distance, and we
estimate the probability that this distance falls within an interval. The distance intervals we set are {(0,2),
(2,4), (4,6), (6,8), (8,12), (12,16), (16,26), (26,36), (36,46), (46,56), (56,76), (76,100)}. We calculate the
probability of each interval to form the correlogram vector. For 27 classes and 12 intervals, we have 324 feature
bins in total.
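
A brute-force sketch of the interval-based correlogram is given below. The text leaves several details open, so the following are assumptions of this example: pairs are formed between pixels of the same class (as in the autocorrelogram of Eq. (4)), the distance is Euclidean, each interval is read as half-open (lo, hi], and distances above 100 are ignored. The function name is hypothetical, and the quadratic pair enumeration is only practical for small images.

```python
import numpy as np

# Distance intervals from the text, read here as half-open (lo, hi].
INTERVALS = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 12), (12, 16),
             (16, 26), (26, 36), (36, 46), (46, 56), (56, 76), (76, 100)]
EDGES = np.array([lo for lo, _ in INTERVALS] + [INTERVALS[-1][1]], dtype=float)

def interval_correlogram(class_map, n_classes=27):
    """Return an (n_classes, 12) array of interval probabilities."""
    class_map = np.asarray(class_map)
    out = np.zeros((n_classes, len(INTERVALS)))
    for c in range(n_classes):
        ys, xs = np.nonzero(class_map == c)
        if xs.size < 2:
            continue  # no pixel pair exists for this class
        pts = np.stack([xs, ys], axis=1).astype(float)
        first, second = np.triu_indices(len(pts), k=1)   # each unordered pair once
        diff = pts[first] - pts[second]
        dist = np.hypot(diff[:, 0], diff[:, 1])
        # right=True assigns a distance d with lo < d <= hi to that interval.
        bins = np.digitize(dist, EDGES, right=True) - 1
        valid = (bins >= 0) & (bins < len(INTERVALS))    # drops distances above 100
        counts = np.bincount(bins[valid], minlength=len(INTERVALS))
        out[c] = counts / len(dist)
    return out

# Four pixels of class 0 in a row: pair distances are 1, 2, 3, 1, 2, 1.
cg = interval_correlogram(np.zeros((1, 4), dtype=int))
```

In the toy row, five of the six pair distances fall into (0,2] and one into (2,4], so the first two bins of class 0 hold 5/6 and 1/6.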
The similarity between two features is evaluated by a distance metric: two images Ia and Ib with a smaller
distance are more similar.
\[
\mathrm{SIM\_CF}(I_a, I_b) = \sum_{i=0}^{26} \sum_{j=0}^{11} \left( cf_a(i,j) - cf_b(i,j) \right)
\]
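
The following sketch shows how such a distance can drive the nearest neighbor classification mentioned in the introduction. It is an assumption of this example that the bin differences are taken in absolute value, so that the sum behaves as a distance; the function names, the toy features and the labels "code-A" and "code-B" are hypothetical.

```python
import numpy as np

def sim_cf(cf_a, cf_b):
    """Distance between two 27x12 correlogram features.

    ASSUMPTION: absolute bin differences are summed (L1 distance).
    """
    cf_a = np.asarray(cf_a, dtype=float)
    cf_b = np.asarray(cf_b, dtype=float)
    return float(np.abs(cf_a - cf_b).sum())

def nearest_neighbor_label(query, train_features, train_labels, dist=sim_cf):
    """Return the label of the training image closest to the query."""
    distances = [dist(query, feature) for feature in train_features]
    return train_labels[int(np.argmin(distances))]

# Two hypothetical training images and one query.
train = [np.zeros((27, 12)), np.full((27, 12), 0.10)]
labels = ["code-A", "code-B"]
query = np.full((27, 12), 0.08)

predicted = nearest_neighbor_label(query, train, labels)
```

The query is assigned the label of the second training image because its feature is closer to 0.10 than to 0 in every bin.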


3. Submissions to the ImageCLEF 2007 Evaluation

In ImageCLEF 2007, the medical automatic annotation task considers the complete IRMA code and penalizes
misclassifications at different levels of the code differently. For every code, the maximal possible error is
calculated and the errors are normalized such that a completely wrong decision (i.e. all positions wrong) gets an
error count of 1 and a completely correctly classified image has an error of 0. A detailed description of the error
counting scheme is given in [Thomas07].
Examples (correct code: 318a):

classified    error count
318a          0.0
318*          0.0244653860094
3187          0.0489307720188
31*a          0.0824574121058
31**          0.0824574121058
3177          0.164914824212
3***          0.34342152954
32**          0.686843059079
1000          1.0
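
The example table can be reproduced with a short sketch for a single code axis. Two things are assumptions here and are not stated in this paper: the position weight 1/(b_i · i) with penalties 0, 0.5 ("*", don't know) and 1 (wrong), which is our reading of the scheme in [Thomas07], and the branching factors (10, 3, 9, 16), which were chosen only because they reproduce the table above. Once a position is wrong or marked "*", all later positions inherit that penalty.

```python
def error_count(truth, predicted, branching):
    """Normalized hierarchical error for one code axis (0 = correct, 1 = all wrong)."""
    # Position i (1-based) with b_i possible values has weight 1 / (b_i * i).
    weights = [1.0 / (b * (i + 1)) for i, b in enumerate(branching)]
    penalty = 0.0   # 0 = correct so far, 0.5 = "*" seen, 1 = wrong digit seen
    error = 0.0
    for t, p, w in zip(truth, predicted, weights):
        if penalty == 0.0:
            if p == "*":
                penalty = 0.5
            elif p != t:
                penalty = 1.0
        error += penalty * w
    return error / sum(weights)

# ASSUMPTION: branching factors fitted to the example table, not from the paper.
BRANCHING = (10, 3, 9, 16)

predictions = ["318a", "318*", "3187", "31*a", "31**", "3177", "3***", "32**", "1000"]
table = {p: error_count("318a", p, BRANCHING) for p in predictions}
```

With these values, an error in the last position costs about 0.049, while an error in the second position already costs about 0.687, which shows how strongly early mistakes are penalized.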

We submitted one run to the medical annotation task, in which 1,000 radiographs have to be
classified. Our run obtained an error score of 79.303 and ranked 33rd of 68 runs.
The best score among all runs is 26.847 and the score at rank 68 is 505.618. The image feature we propose is
easy to implement and places our run near the middle of the ranking.


4. Conclusions and future work
Medical images differ from general-purpose images. For general-purpose images, the image
representation usually has to be invariant to rotation, zooming and shift. Medical images are acquired with more
stable camera settings than general-purpose images; spatial information therefore becomes very important in
medical images, and the representation of spatial relations must be improved for this kind of image. On the
other hand, radiographs are gray-level images, so the similarity computation is influenced by
illumination. In this paper, an illumination-invariant image feature is proposed for medical image data. The
proposed feature is robust to illumination changes and carries local spatial information. In the evaluation, it
ranked 33rd of 68 runs. In future work, we plan to design a pyramid relative vector for fast medical image retrieval.

Acknowledgements
This work was supported by the National Science Council (grant number: NSC- 96-2221-E-259-017-MY3). Any
opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors only and
do not necessarily reflect the views of the National Science Council.


References
[Flickner95] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee,
D. Petkovic, D. Steele, P. Yanker, Query by Image and Video Content: The QBIC system, IEEE Computer 28 (9)
(1995) 23-32.
[Keysers04] D. Keysers, W. Macherey, H. Ney, and J. Dahmen. Adaptation in Statistical Pattern Recognition
using Tangent Vectors. IEEE transactions on Pattern Analysis and Machine Intelligence, 26(2):269-274,
February 2004.
[Swain91] M. J. Swain and D. H. Ballard, “Color Indexing”, International Journal of Computer Vision, Vol. 7,
pp. 11-32, 1991.
[Cheng 04] Pei-Cheng Cheng, Been-Chian Chien, Hao-Ren Ke, and Wei-Pang Yang, “KIDS’s evaluation in
medical image retrieval task at ImageCLEF 2004”, Working Notes for the CLEF 2004 Workshop, September,
Bath, UK, pp. 585-593.
[Cheng 06] Pei-Cheng Cheng, Been-Chian Chien, Hao-Ren Ke, and Wei-Pang Yang, “Combining Textual and
Visual Features for Cross-Language Medical Image Retrieval”, In Multilingual Information Access for Text,
Speech and Images, Lecture Notes in Computer Science (LNCS 4022), Springer, 2006, pp.712-723.
[Thomas07] Thomas Deselaers, Jayashree Kalpathy-Cramer, Henning Müller, and Thomas M. Deserno,
Hierarchical classification for ImageCLEF 2007 Medical Image Annotation.
http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef07/medaat.html
[Ma97] W. Y. Ma, Y. Deng, B. S. Manjunath, Tools for texture and color-based search of images, Human Vision
and Electronic Imaging II, Vol. 3016 of SPIE Proceedings, San Jose, CA, 1997.