KIDS Lab at ImageCLEF 2012 Personal Photo Retrieval

Chia-Wei Ku, Been-Chian Chien, Guan-Bin Chen, Li-Ji Gaou, Rong-Sing Huang, and
                                Siao-En Wang

                Knowledge, Information, and Database System Laboratory
               Department of Computer Science and Information Engineering
                      National University of Tainan, Tainan, Taiwan
                                cooperku@msn.com
                           bcchien@mail.nutn.edu.tw



           Abstract. The personal photo retrieval task at ImageCLEF 2012 is a pilot
       task for testing QBE-based retrieval scenarios in the scope of personal informa-
       tion retrieval. This pilot task is organized as two subtasks: the visual concepts
       retrieval and the events retrieval. In this paper, we develop a framework of
       combining different visual features, EXIF data and similarity measures based
       on two clustering methods to retrieve the relevant images having similar visual
       concepts. We first analyze and select the effective visual features including
       color, shape, texture, and descriptor to be the basic elements of recognition. A
       flexible similarity measure is then given to achieve highly precise image retrieval
       automatically. The experimental results show that the proposed framework
       provides good effectiveness under distinct evaluation measures.

       Keywords: Image retrieval, Concept retrieval, Features clustering, Similarity
       measure


1      Introduction

The main aim of the ImageCLEF 2012 personal photo retrieval task is to provide a test
bed for image retrieval based on given query images [1]. The task is divided into two
subtasks: the visual concepts retrieval and the events retrieval. Compared with
traditional image retrieval, the topics of this task are more abstract or more general,
which makes the retrieval more difficult. The benchmark data set used in this task
consists of 5,555 images downloaded from Flickr. Both the visual concepts retrieval
and the events retrieval use the same dataset.
   The visual concepts retrieval is a great challenge to developers. Some of the
concepts are abstract, like the topic “sign,” and some of them are very subjective, like
“art object.” Different people may even form different opinions about the same image.
The events retrieval is to find the images depicting the same kind of event. Some of the
target topics, like “fire” and “conference,” are too general to define as a visual concept.
Part of the events in this subtask are connected with geographical topics. Thus, most of
the topics are difficult to retrieve by visual features alone. In such cases, EXIF features
may provide much more information about the event concept.
   In our participation in the ImageCLEF 2012 personal photo retrieval task, we
developed a framework for the visual concepts retrieval and the events retrieval. First,
we selected 7 visual features from the feature set given for the task. Each selected
feature is used to cluster all the images into groups individually. We then define the
similarity degree for the visual features and the EXIF information, and the similarity
measures for the different image features are integrated to estimate the similarity score
between each image and the query image. The cluster of each feature is used to help
weight the image similarity. Finally, the framework combines and ranks the similarity
degrees between an image and the different QBE images to retrieve the photos with
the same concept.
   The remainder of this paper is organized as follows. Section 2 describes the features
provided by the organizers and used in our system. Section 3 introduces the proposed
similarity measures and retrieval methods. Section 4 presents the experimental results
of the proposed framework. Finally, we conclude the paper with a discussion and future
work.


2      Processing of Image Features

2.1    Visual Features
The original dataset in the personal photo retrieval task provides 19 extracted visual
features. After our evaluation tests, 7 of the 19 features were selected: AutoColor-
Correlogram [2], BIC [3], CEDD [4], Color Structure, Edge Histogram, FCTH [5],
and SURF [6]. The selected features cover different kinds of popular visual perception,
including color, shape, and texture. SURF is a robust scale-invariant and rotation-
invariant descriptor. The features are summarized in Table 1.

                            Table 1. The selected 7 features.

          Visual Features           Color         Shape         Texture   Descriptor
      AutoColorCorrelogram           ○
      BIC                                           ○             ○
      CEDD                            ○             ○
      Color Structure                 ○
      Edge Histogram                                ○
      FCTH                            ○             ○
      SURF                                                                    ○

Visual Features Clustering. We first cluster the images by each individual visual
feature to find groups of images with similar visual characteristics. Two different
clustering methods are developed, one for the SURF descriptor and one for the other
visual features. The clustering algorithms are described in the following.

SURF feature clustering. The SURF descriptor is scale-invariant and rotation-
invariant. In this paper we define matching pairs to measure the similarity between
two images. If the SURF descriptor di of the image Ii matches another descriptor dj in
the image Ij and vice versa, the descriptors di and dj form a matching pair. The distance
between two images Ii and Ij is defined as

                          distSURF(Ii, Ij) = 1 / Nmp(Ii, Ij),                               (1)

where Nmp(Ii, Ij) is the number of matching pairs between the two images Ii and Ij. The
larger Nmp is, the more similar the two images are. Based on the matching-pair
measure, we propose the clustering algorithm for SURF descriptors shown in Table 2.
Before describing the detailed algorithm, we define two cluster distances: the intra-
cluster distance Dintra(Ck) and the inter-cluster distance Dinter(Ij, Ck).

                   Table 2. The clustering algorithm for SURF feature.

     Algorithm: SurfCluster
     Input: the set of images I
     Output: the clusters of images C
     C = {};
      while ( min{distSURF(Ii, Ij)} < θ )  // θ is the threshold of the SURF distance
        Case 1: Ii ∈ Ck and Ij ∈ Cl for Ck, Cl ∈ C and Ck ≠ Cl
          if ( Dintra(Ck ∪ Cl) ≦ α1 × min{Dintra(Ck), Dintra(Cl)} )  // α1 is a constant.
             C = C ∪ {Ck ∪ Cl} - {Ck} - {Cl};
          else
             SurfCluster(Ck ∪ Cl);
          end if
        Case 2: Ii ∈ Ck and Ij ∉ Ck for Ck ∈ C
          if ( Dinter(Ij, Ck) ≦ α2 × Dintra(Ck) )  // α2 is a constant.
             C = C ∪ {Ck ∪ {Ij}};
          else
             C = C ∪ {{Ii, Ij}};
          end if
        Case 3: Ii ∉ Ck and Ij ∉ Ck for all Ck ∈ C
          C = C ∪ {{Ii, Ij}};
      end while

      for Ck, Cl in C
        if ( Ck ∩ Cl ≠ ∅ )
           if ( Dintra(Ck ∪ Cl) ≦ α1 × min{Dintra(Ck), Dintra(Cl)} )
              C = C ∪ {Ck ∪ Cl} - {Ck} - {Cl};
           else
              SurfCluster(Ck ∪ Cl);
           end if
        end if
      end for
                 Dintra(Ck) = (1 / |Ck|²) Σ distSURF(Ii, Ij),   for Ii, Ij ∈ Ck;            (2)

                 Dinter(Ij, Ck) = (1 / |Ck|) Σ distSURF(Ii, Ij),   for Ii ∈ Ck.             (3)

   According to our observation, if the number of matching pairs is larger than four,
the images look visually similar. Hence, we define the similarity for the SURF feature as

                                                 N mp ( I i , I j )  4 
                    S SURF ( I i , I j )  max 1,                         .                       (4)
                                                           4             


Other Visual Features. For the other visual features, the clustering method considers
only the similarity between two images, using the distance measures listed in Table 3.
The detailed algorithm is listed in Table 4.

                      Table 3. Features and their distance measures.

                      Visual Feature                          Distance Measure
               AutoColorCorrelogram                        L1 measure
               BIC                                         L1 measure
               CEDD                                        Tanimoto measure
               Color Structure                             L1 measure
               Edge Histogram                              L1 measure
               FCTH                                        Tanimoto measure



               Table 4. The clustering algorithm for general visual features.

     Algorithm: VisualCluster
     Input: the set of images I
     Output: the cluster of images C
     C = {};
     for Ii  I
       C = C ∪{{Ii}};
     end for
     for Ck, Cl in C
        if ( min{dist(Ck, Cl)} < δ )  // δ is the threshold of the minimum distance.
           C = C ∪ {Ck ∪ Cl} - {Ck} - {Cl};
       end if
     end for
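
   The following minimal Java sketch illustrates the VisualCluster procedure of
Table 4 under the assumption that a pairwise distance matrix for one visual feature is
already available; δ (delta) is the merge threshold, and merging is repeated until no
two clusters are closer than δ, which is a single-linkage style reading of the procedure.

     // Minimal sketch of the VisualCluster procedure in Table 4, assuming a
     // precomputed pairwise distance matrix dist[i][j] for one visual feature.
     import java.util.ArrayList;
     import java.util.List;

     final class VisualCluster {

         static List<List<Integer>> cluster(double[][] dist, double delta) {
             // Start with one singleton cluster per image.
             List<List<Integer>> clusters = new ArrayList<>();
             for (int i = 0; i < dist.length; i++) {
                 List<Integer> c = new ArrayList<>();
                 c.add(i);
                 clusters.add(c);
             }
             // Merge any two clusters whose minimum member distance is below delta,
             // and repeat until no further merge is possible.
             boolean merged = true;
             while (merged) {
                 merged = false;
                 outer:
                 for (int a = 0; a < clusters.size(); a++) {
                     for (int b = a + 1; b < clusters.size(); b++) {
                         if (minDistance(dist, clusters.get(a), clusters.get(b)) < delta) {
                             clusters.get(a).addAll(clusters.remove(b));
                             merged = true;
                             break outer;
                         }
                     }
                 }
             }
             return clusters;
         }

         private static double minDistance(double[][] dist, List<Integer> ca, List<Integer> cb) {
             double min = Double.POSITIVE_INFINITY;
             for (int i : ca) for (int j : cb) min = Math.min(min, dist[i][j]);
             return min;
         }
     }
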
2.2    Textual Features
The textual features are mainly extracted from the EXIF data. There are 63 EXIF
features in total, for example ApertureValue, BrightnessValue, ColorSpace,
CompressedBitsPerPixel, Contrast, etc. However, only two features, the GPS position
and the time, were considered and used in our methods. The values of the GPS position
and the time are also clustered by the clustering algorithm for general visual features
shown in Table 4, using the L1 distance measure.


3      Similarity Measures for Image Retrieval

3.1    Normalization of Visual Features
The ranges of the feature distances differ considerably among the visual features.
Before combining all the features to measure the similarity of images, a normalization
step is necessary. We use the approximation proposed by Abramowitz & Stegun [7] to
compute the normalized values. The approximation is very fast and accurate. Let x be
the similarity between two images for one image feature; the normalization is
calculated by the following equation:

       Φ(x) = 1 − φ(x)(b1·t + b2·t² + b3·t³ + b4·t⁴ + b5·t⁵) + ε(x),   t = 1 / (1 + b0·x),   (5)

where (x) is the normal probability density function of the similarity degrees among
all images in the feature, b0 to b5 are constants, and the absolute error |ε(x)| would be
smaller than 7.5 × 10−8.
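
   A minimal Java sketch of this normalization step is given below. The constants b0
to b5 are the standard values of the Abramowitz & Stegun approximation [7];
standardizing the raw similarity by the mean and standard deviation of the feature's
similarity degrees is our reading of the text, not a detail stated explicitly in the paper.

     // Minimal sketch of Eq. (5): polynomial approximation of the standard
     // normal CDF, used to normalize per-feature similarities.
     final class FeatureNormalizer {

         // Standard Abramowitz & Stegun constants [7].
         private static final double B0 = 0.2316419;
         private static final double B1 = 0.319381530;
         private static final double B2 = -0.356563782;
         private static final double B3 = 1.781477937;
         private static final double B4 = -1.821255978;
         private static final double B5 = 1.330274429;

         /** Standard normal density phi(x). */
         private static double phi(double x) {
             return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
         }

         /** Standard normal CDF Phi(x); absolute error below 7.5e-8. */
         static double cdf(double x) {
             if (x < 0) return 1.0 - cdf(-x);        // use symmetry for negative x
             double t = 1.0 / (1.0 + B0 * x);
             double poly = t * (B1 + t * (B2 + t * (B3 + t * (B4 + t * B5))));
             return 1.0 - phi(x) * poly;
         }

         /** Normalize a raw similarity x against the feature's distribution (assumed z-score). */
         static double normalize(double x, double mean, double stdDev) {
             return cdf((x - mean) / stdDev);
         }
     }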

3.2    Similarity Measures of Image Features

The Similarity Measure of Visual Features. Let Ii, Ij denote two images. Then the
visual similarity between the images Ii and Ij, SV(Ii, Ij), is defined as

                    SV(Ii, Ij) = SSURF(Ii, Ij) + Σk wk · Sfk(Ii, Ij),                       (6)


where Sfk(Ii, Ij) denotes the similarity between the images Ii and Ij for the k-th feature,
and wk is the weight of the k-th feature. Two weighting methods, cluster weighting and
non-cluster weighting, are proposed as follows:
● Cluster Weighting. We use the clustering results of Section 2 to weight the features
  automatically. If a query image belongs to a cluster for a specific visual feature, the
  average similarity between the query image and each image in the cluster is
  computed and used as the weight of that visual feature.
● Non-Cluster Weighting. In this method, the weights wk are set to 1, except that the
  weights of the AutoColorCorrelogram, Color Structure, and SURF features are set
  to double those of the other visual features. (A sketch of the combined visual
  similarity follows this list.)
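
   The sketch below illustrates how Eq. (6) and the cluster weighting fit together,
assuming the per-feature similarities and the clustering results are already computed;
the names are illustrative.

     // Minimal sketch of Eq. (6) and the cluster weighting described above.
     import java.util.List;

     final class VisualSimilarity {

         /** SV(Ii, Ij) = SSURF(Ii, Ij) + sum_k wk * Sfk(Ii, Ij)  (Eq. 6). */
         static double combine(double surfSim, double[] featureSims, double[] weights) {
             double sum = surfSim;
             for (int k = 0; k < featureSims.length; k++) {
                 sum += weights[k] * featureSims[k];
             }
             return sum;
         }

         /**
          * Cluster weighting for one feature: average similarity between the query
          * image and every image in the cluster the query belongs to
          * (0 if the query image is not clustered for this feature).
          */
         static double clusterWeight(int queryImage, List<Integer> queryCluster,
                                     double[][] featureSim) {
             if (queryCluster == null || queryCluster.isEmpty()) return 0.0;
             double sum = 0.0;
             for (int i : queryCluster) {
                 sum += featureSim[queryImage][i];
             }
             return sum / queryCluster.size();
         }
     }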

The Similarity Measure of the GPS Feature. Two similarity measures based on the
geographical distance are proposed:
● Boolean measure. The Boolean measure of the GPS feature is defined as

        SG(B)(Ii, Ij) = 1 if GPS(Ii) and GPS(Ij) are in the same cluster, and 0 otherwise,   (7)

  where GPS(Ii) and GPS(Ij) denote the values of the GPS feature in EXIF for Ii, Ij.
● Similarity measure. The continuous similarity measure on the geographical distance
  is defined as

          SG(S)(Ii, Ij) = 1 / (1 + exp(λ · (dist(GPS(Ii), GPS(Ij)) − radius))),              (8)

  where λ and radius are smoothing parameters, and dist(GPS(Ii), GPS(Ij)) denotes the
  real geographical distance on earth between the two positions GPS(Ii) and GPS(Ij).
  A sketch of both GPS measures follows below.
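
   The following minimal Java sketch implements the two GPS measures. The
haversine formula is an assumed choice for the "real geographical distance on earth";
the continuous measure follows the sigmoid form of Eq. (8) as reconstructed above.

     // Minimal sketch of the GPS similarity measures (Eqs. 7 and 8).
     final class GpsSimilarity {

         private static final double EARTH_RADIUS_KM = 6371.0;

         /** Great-circle (haversine) distance in kilometres between two GPS points. */
         static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
             double dLat = Math.toRadians(lat2 - lat1);
             double dLon = Math.toRadians(lon2 - lon1);
             double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                      + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                      * Math.sin(dLon / 2) * Math.sin(dLon / 2);
             return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
         }

         /** Boolean measure (Eq. 7): 1 if both images fall into the same GPS cluster. */
         static double booleanMeasure(int clusterOfI, int clusterOfJ) {
             return clusterOfI == clusterOfJ ? 1.0 : 0.0;
         }

         /** Continuous measure (Eq. 8): close to 1 within 'radius', decays beyond it. */
         static double continuousMeasure(double distKm, double lambda, double radiusKm) {
             return 1.0 / (1.0 + Math.exp(lambda * (distKm - radiusKm)));
         }
     }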

The Similarity Measure of the Time feature. Two time similarity measures are
proposed for time duration:
● Boolean measure. The Boolean measure of the time feature is defined as

           ST(B)(Ii, Ij) = 1 if T(Ii) and T(Ij) are in the same cluster, and 0 otherwise,    (9)

  where T(Ii) and T(Ij) denote the time feature in EXIF of Ii and Ij.
● Similarity Measure. The continuous similarity measure on time is defined as

                       ST(S)(Ii, Ij) = 1 − λ · dist(T(Ii), T(Ij)),                          (10)

  where dist(T(Ii), T(Ij)) denotes the time difference in seconds between the two
  timestamps T(Ii) and T(Ij), and λ is a smoothing parameter. A sketch of both time
  measures follows below.
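
   A minimal Java sketch of the two time measures follows; clipping the continuous
measure at zero is our assumption, made so that the similarity stays non-negative.

     // Minimal sketch of the time similarity measures (Eqs. 9 and 10).
     import java.time.Duration;
     import java.time.Instant;

     final class TimeSimilarity {

         /** Boolean measure (Eq. 9): 1 if both timestamps fall into the same cluster. */
         static double booleanMeasure(int clusterOfI, int clusterOfJ) {
             return clusterOfI == clusterOfJ ? 1.0 : 0.0;
         }

         /** Continuous measure (Eq. 10): 1 - lambda * |time difference in seconds|,
             clipped at 0 (the clipping is an assumption). */
         static double continuousMeasure(Instant ti, Instant tj, double lambda) {
             long seconds = Math.abs(Duration.between(ti, tj).getSeconds());
             return Math.max(0.0, 1.0 - lambda * seconds);
         }
     }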

3.3      The Ranking of Image Similarity
Finally, we define the similarity between two images Ii and Ij by integrating the
features SV(Ii, Ij), SG(·)(Ii, Ij), and ST(·)(Ii, Ij) into a linear combination. The image
similarity Sim(Ii, Ij) is defined as

           Sim( I i , I j )  wV  SV ( I i , I j )  wG  SG () ( I i , I j )  wT  ST () ( I i , I j ) .     (11)
   Given a set of query images Qj, 1 ≤ j ≤ m, the similarity of each query image Qj
and an image Ii in the image set is measured by Sim(Ii, Qj). The maximum of Sim(Ii, Qj)
over the m query images is taken as the similarity degree of the image Ii for the visual
concept. It can be formally defined as

                              max_{1 ≤ j ≤ m} {Sim(Ii, Qj)}.                                (12)




4      Experiments and Discussion
4.1    Experimental Environments
The system is implemented on a 2.33 GHz PC with 3.00 GB RAM running Microsoft
Windows XP SP3. The developed software and related systems are written in the Java
language, so the system is cross-platform. The five runs use different combinations of
image features, as shown in Table 5. The notations in the table are: V stands for the
visual features, G denotes the GPS feature, and T is the time feature. The parameters
C and N denote the cluster weighting and the non-cluster weighting, respectively.
Finally, the parameter B represents the Boolean measures and S the continuous
similarity measures.

                        Table 5. Features we used in our methods.

                  Features       Visual Weighting        GPS     Time
               V                        C
               V+G                      N                  B
               V+T                      N                          B
               V+G+T                    N                  B       B
               G+T                                         S       S
               T                                                   S


4.2    Results of Subtask 1: Retrieval of Visual Concepts
In this subtask, 24 visual concept queries out of the total of 32 concepts were given to
be evaluated. The retrieval results for the visual concepts are evaluated by three
different measures: precision, NDCG (normalized discounted cumulative gain) [8],
and MAP (mean average precision). The experimental results are shown in Table 6.
   As Table 6 shows, Run 5, which uses all of the image features, is the best one for
all measures. The second place is Run 2, which uses the time feature only. Run 3, with
the visual features and the GPS feature, takes the third place. Run 1 and Run 4 are
worse than the above three runs.
   The results show that the visual features are not useful for most of the visual
concepts in the task. The reason is that most of the concept topics are only semantically
related to each other; there are few common visual characteristics among the QBE
images. When the visual features are combined with the EXIF features, the
performance increases noticeably. The GPS feature helps to find images photographed
at neighboring positions easily. The geography-related topics like “Asian temple &
palace” and “temple (ancient)” have good results. However, some topics that were not
expected to perform well, like “animals” and “submarine scene,” also returned high
precision. The main reason is that a photographer generally tries to take pictures of
similar topics at the same place. Although the GPS feature is precise for geography-
related topics, missing values of the GPS feature degrade the precision greatly. Some
non-geographical topics, like “clouds,” have obviously bad results in the runs using
the GPS feature. The time feature is also an important factor for searching personal
photos, since images photographed within a short time span are usually very similar
or related in visual concept. As discussed above, Run 5 obtaining the best results shows
that our image similarity measure can combine the different image features effectively.

                   Table 6. Performance on retrieval of visual concepts.

                 Run 1      Run 2        Run 3             Run 4                 Run 5
 Features          V          T           V+G               G+T                 V+G+T
 Weights        wV = 1     wT = 1      wV = 0.45        wG = 0.975            wV = 0.45
                                       wG = 0.17        wT = 0.025            wG = 0.18
                                                                              wT = 0.22
 P@5            0.6750    0.8000        0.7667           0.6500               0.8333
 P@10           0.6125    0.7292        0.6583           0.6500               0.7833
 P@15           0.5778    0.6667        0.6222           0.6083               0.7222
 P@20           0.5354    0.6354        0.6104           0.5771               0.6896
 P@30           0.4486    0.6083        0.5639           0.5611               0.6347
 P@100          0.3054    0.4117        0.3925           0.3925               0.4379
 NDCG@5         0.5701    0.5858        0.5800           0.4073               0.6405
 NDCG@10        0.5062    0.5348        0.5184           0.4268               0.6017
 NDCG@15        0.4798    0.5028        0.4951           0.4123               0.5658
 NDCG@20        0.4545    0.4836        0.4872           0.4066               0.5459
 NDCG@30        0.4016    0.4728        0.4615           0.4046               0.5213
 NDCG@100       0.3303    0.4144        0.3979           0.3717               0.4436
 MAP@30         0.0632    0.0952        0.0906           0.0854               0.1026
 MAP@100        0.0930    0.1589        0.1558           0.1518               0.1777


4.3    Results of Subtask 2: Retrieval of Events
In this subtask, a total of 15 different event queries are given to find the pictures
showing the same event. Each query contains three QBE images. The evaluations are
done with precision, NDCG, and MAP, as in subtask 1. The experimental results are
shown in Table 7.
   As Table 7 shows, the best results are obtained by Run 2 and Run 5. Run 1, using
only the visual features, is still the worst, as in subtask 1. Run 3, using the visual and
the GPS features, is a little better than Run 4, which takes the visual and the time
features.
   Because the event queries usually describe images with the property of happening
within a specific time duration or location area, the time and the GPS features are
relatively important here. For example, the topics “Australia,” “Bali,” and “Egypt” are
geographically related; the topics of activities like “conference,” “party,” and “rock
concert” are temporally related. Hence, the provided EXIF data of the images are very
useful in this subtask of events retrieval. Unlike in subtask 1, Run 5, which combines
all features, is not the best. The reason might be that the event queries are not so related
to the visual concepts, but highly dependent on time and location. However, the
proposed similarity measure method did not degrade the precision much.

                          Table 7. Performance on retrieval of events.

                 Run 1          Run 2          Run 3           Run 4           Run 5
Features           V             G+T            V+G             V+T           V+G+T
Weights         wV = 1       wG = 0.975      wV = 0.45       wV = 0.45      wV = 0.45
                             wT = 0.025      wG = 0.17       wT = 0.22      wG = 0.18
                                                                            wT = 0.22
P@5              0.6533        1.0000         0.9333           0.9200         1.0000
P@10             0.5800        1.0000         0.9000           0.8733         1.0000
P@15             0.5156        0.9644         0.8533           0.8400         0.9644
P@20             0.4833        0.9333         0.8100           0.7867         0.9267
P@30             0.4467        0.8889         0.7622           0.6956         0.8756
P@100            0.2693        0.6787         0.5740           0.4613         0.6307
NDCG@5           0.6904        1.0000         0.9417           0.9201         1.0000
NDCG@10          0.6247        1.0000         0.9153           0.8877         1.0000
NDCG@15          0.5727        0.9837         0.8884           0.8681         0.9841
NDCG@20          0.5446        0.9697         0.8636           0.8357         0.9655
NDCG@30          0.5186        0.9586         0.8458           0.7854         0.9489
NDCG@100         0.4101        0.9126         0.8042           0.6638         0.8601
MAP@30           0.1100        0.3305         0.2800           0.2287         0.3225
MAP@100          0.1484        0.5533         0.4282           0.3179         0.4947


5      Conclusion

In this paper we proposed a framework and similarity measure methods that combine
different image features for retrieving images with a given concept from a set of
personal photos. The proposed method can handle the visual concepts retrieval subtask
in part. In the events retrieval subtask, however, the time and position information are
more important than the visual features. Although the proposed method can adjust the
weights to fit the requirements, there are still many problems to be solved. In most
cases, the proposed framework retrieved the relevant images with manually assigned
weights. As we know, feature selection is important when retrieving an individual
concept. For example, the experimental results show that the GPS and the time features
are very useful for retrieval in this dataset; however, they may not be as effective on
other datasets. Selecting and weighting the features automatically remains a challenge
in this task.
   This pilot task was announced at ImageCLEF for the first time this year. The dataset
seems too small for evaluating modern applications. Further, the concept queries often
contain some visually irrelevant images. The procedure of determining concepts and
their relevant images may need to be refined before the collection can serve as a
benchmark.


References
 1. Zellhöfer, D.: Overview of the Personal Photo Retrieval Pilot Task at ImageCLEF 2012. In:
    CLEF 2012 working notes, Rome, Italy (2012)
 2. Huang, J., Kumar, S., Mitra, M., Zhu, W. J., Zabih, R.: Image Indexing Using Color Cor-
    relograms. IEEE Computer Society Conference on Computer Vision and Pattern Recogni-
    tion, pp. 762-768 (1997)
 3. Stehling, R. O., Nascimento, M. A., Falcão, A. X.: A Compact and Efficient Image Re-
    trieval Approach Based on Border/Interior Pixel Classification. In: Proceedings of the
    eleventh international conference on Information and knowledge management, pp. 102-
    109 (2002)
 4. Chatzichristofis, S. A., Boutalis, Y. S.: CEDD: Color and Edge Directivity Descriptor. A
    Compact Descriptor for Image Indexing and Retrieval. In: Proceedings of the 6th Interna-
    tional Conference on Computer Vision Systems, vol. 5008/2008, pp. 312-322, Springer
    (2008)
 5. Chatzichristofis, S. A., Boutalis, Y. S.: FCTH: Fuzzy Color and Texture Histogram a Low
    Level Feature for Accurate Image Retrieval. In: Proceedings of the ninth International
    Workshop on Image Analysis for Multimedia Interactive Services, pp. 191-196 (2008)
 6. Bay, H., Tuytelaars, T., Gool, L. V.: SURF: Speeded-Up Robust Features. In: 9th Euro-
    pean Conference on Computer Vision, vol. 3951/2006, pp. 404-417, Springer (2006)
 7. Abramowitz, M. and Stegun, I. A.: Handbook of Mathematical Functions: with Formulas,
    Graphs, and Mathematical Tables. ISBN: 0-486-61272-4. Dover Publications (1965)
 8. Järvelin, K., Kekäläinen, J.: Cumulated Gain-based Evaluation of IR Techniques. ACM
    Transactions on Information Systems, vol. 20, no. 4, pp. 422-446 (2002)