=Paper= {{Paper |id=Vol-1178/CLEF2012wn-ImageCLEF-FekiEt2012 |storemode=property |title=REGIMvid at ImageCLEF2012: Improving Diversity in Personal Photo Ranking Using Fuzzy Logic |pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-ImageCLEF-FekiEt2012.pdf |volume=Vol-1178 }} ==REGIMvid at ImageCLEF2012: Improving Diversity in Personal Photo Ranking Using Fuzzy Logic == https://ceur-ws.org/Vol-1178/CLEF2012wn-ImageCLEF-FekiEt2012.pdf
1   G. Feki et al.




           REGIMvid at ImageCLEF2012: Improving Diversity in
               Personal Photo Ranking Using Fuzzy Logic

               Ghada Feki , Amel Ksibi, Anis Ben Ammar and Chokri Ben Amar

                               REGIM: REsearch Group on Intelligent Machines,
                               University of Sfax, ENIS, BP W, 3038, Sfax, Tunisia

                                      ghada.feki@yahoo.fr

     {amel.ksibi, anis.benammar, chokri.benamar}@ieee.org


             Abstract. This paper handles with two main challenges: retrieving the best
             matching images to a given query and improving diversity in ranking using
             fuzzy logic. The proposed scheme proceeds as follows: First, an off line module
             is performed before starting the image retrieval process in order to reduce
             both, the execution time and the algorithm complexity. This module contains
             an inter-images semantic similarity graph and an inter-images visual similarity
             graph. Second, an on-line part implies the relevance-based ranking, the diver-
             sity-based ranking and their combination. We deal with the redundancy prob-
             lem using fuzzy logic. Moreover, the vector of the relevance scores and the
             vector of the diversity scores are joined in order to have final scores of each
             image according to a given query. The experiments are conducted on
             ImageCLEF12 benchmark for the Personal Photo Retrieval task and show satis-
             fying results.

             Keywords: concept-based image retrieval, diversity-based ranking, fuzzy logic,
             inter-images visual similarity graph, inter-images semantic similarity graph.


    1        Introduction

    Our group REGIMvid within REGIM laboratory research participates in the
    Personal Photo Retrieval task [1]. This task aims to overcome the redundancy
    in the returned results. The general scheme of our proposed approach is as
    following. Before starting the image retrieval process, we compute an inter-
    images semantic similarity graph and an inter-images visual similarity graph
    in order to reduce both, the execution time and the algorithm complexity.
    This off-line part of the algorithm is supplied to decrease the collection ac-
    Feki et al., p. 1, 2012.

    © Springer-Verlag Berlin Heidelberg 2012
2   G. Feki et al.




    cess and to better organize the images in. The on-line part of the algorithm
    implies the following steps: First, the relevance-based ranking is the basic
    part of the process, in which we fix the image semantic similarity scores ac-
    cording to a given query. Second, the diversity-based ranking is a refine-
    ment’s process, in which we attribute a diversity score for each image ac-
    cording to its relations with other images from the collection while respect-
    ing its position in the rank list. Finally, both of these scores are combined.




                                 Fig. 1. General scheme
3   G. Feki et al.




    The rest of the paper is organized as follows. Section 2 details our proposed
    relevance and diversity-based approach. Section 3 discusses the experimen-
    tal results.


    2       Relevance and diversity-based approach: Personal Photo
            Retrieval task

    Relevance computing in our approach implies three phases. First, a matching
    process is performed basing on concepts from the query and the image data.
    Second, we compute the diversity scores. Finally, we establish the combina-
    tion.

    The following notations will be used. Given a set of query concepts                =
    { , ,..,         }, we denote by     = {     ,   ,..,   } the collection of images
    that are associated with the set of query concepts        . This collection is a part
    of the large collection                    .

    Giving an image       , we denote by       = { , ,..,    } the set of its associated
    concepts [2]. The relevance scores of all images in D are represented in a
    vector                                   , where        denotes the relevance
    score of image      with respect to the set of query concepts .


                                                                                     (1)

     Relevance score      reflects the degree of the existence of a given concept
    in the image . This score is normalized that we range it from 0 to 1.


    2.1     Relevance-based scores
    Based on experimental semantic similarity measures study, we decide to
    adopt an approach for semantic similarity between a given query and an
    image that is analogous in term of Cosine similarity measure. The semantic
    similarity between    and , which are respectively the sets of query
    concepts and image        concepts, is defined as:
4   G. Feki et al.




    This semantic similarity is computed between the query set of concepts and
    each concept sets of images that belong to the sub-collection    relative to
    this query. For other images belonging to , their similarity scores are evi-
    dently equal to zero.

    We denote by the vector of semantic similarity between a query and the
    collection . It is defined as follows:



    This vector supposed be used as an input for refinement phase is provided
    thanks to an inter-images semantic similarity based random walk with re-
    start. However, we note that when we use random walk with the collections,
    which present redundant images, we risk decreasing the diversity. In fact, it
    supposes that images, which are similar, have generally the same relevance
    scores. As consequence, in a first step, we use inter-images relationship to
    get closer the similar images and in a second, we want to separate them.
    Therefore, we have ignored the random walk.


    2.2     Diversity-based scores
    To satisfy more the user, results should be not only relevant but also diverse.
    Therefore, the final scores are the combination of relevance scores and di-
    versity scores. In fact, relevance scores, which are computed as mentioned in
    the previous section, serve not only for the combination but they are the
    input for the diversity-based algorithm that when we have two related imag-
    es, we should put the most relevant and neglect the other. So if we verify the
    diversity characteristic while considering the order, the most relevant will be
    selected the first and after, when we come to verify the other, we discover
    that they are similar and we class it at the end without doubt that it can be
    more relevant than the other.
5   G. Feki et al.




    Indeed, we attempt to give higher scores to diverse images. Thanks to the
    greedy [3] algorithm, we guarantee that images in the top of the list will be
    diverse and the other images supposed less diversified will be the last ones.
    Indeed, it is a double-edged weapon that there is relatively no loss of infor-
    mation since user can find not diverse images in the end of the list but this
    part is rarely visited.


    Greedy search ranking.

    The greedy algorithm works in phases. At each phase, we take the best we
    can get right now, without regard for future consequences. Moreover, we
    hope that by choosing a local optimum at each step, we will end up at a
    global optimum. This strategy incrementally builds a more diverse set of re-
    sults from the existing result set.

    The greedy algorithm seeks to provide a more efficient approach to improve
    diversity by using a specific condition to guide the construction of a result set
    in an incremental fashion. During each iteration, the remaining images are
    ordered according to their diversity degree. The images are chosen according
    to the order in relevance based ranking list. In other words, the first image to
    be selected is always the one with the highest similarity to the query. More-
    over, we will verify if this selected image is the one with the highest diversity
    degree with respect to the set of images selected during the previous itera-
    tion.
6   G. Feki et al.




    The following figure shows the greedy search algorithm procedure for image
    ranking:




                                    Fig. 3. Greedy search process


    Hence, greedy algorithm builds up a solution piece by piece, always choosing
    the next piece that offers the most evident and immediate benefit. Indeed,
    we just rank all images and keep the diverse ones in the top of the list.
    Therefore, it is a permutation problem, in which users will not miss infor-
    mation.


    Fuzzy logic necessity.

    Diversity can have more than unique definition. In fact it can be solution for
    ambiguity, uncertainty, redundancy and vagueness which are usually present
    in the image content, the user query and the similarity measures. Indeed, it
    is a source of novelty and optimal understanding of the query that results
    will be different from each other.

    To model these constraints, we make appeal to the fuzzy logic [4] thanks to
    its flexibility and its ease-of-use. Fuzzy set theory provides many tools for
    dealing with this type of problem. In effect, since users communicate and
    express their needs in linguistic terms, we would suppose that for receiving
7   G. Feki et al.




    correct image retrieval, extracting the information with fuzzy logic would be
    more natural [5].In addition, image collection content is not sufficiently or-
    ganized and cleaned from copies or near-duplicate images. Therefore, select-
    ing diverse images entails a particular dealing with the image collection that
    demands a fuzzy decision.


    Diversification strategy.

    We search to be more practical in dealing with diversity intention. For giving
    high score to an image for a given query, it must be not redundant compar-
    ing to others ranked before it.

    Diversity refers to no redundancy. Therefore, we define diversity score of an
    image as its minimal difference with the images appearing before it for a
    given query. It must be visually far from each image ranked before it. In oth-
    er words, there are no images having higher relevance scores, similar to this
    image. In fact, scores reflecting the redundancy of an image are computed
    tanks to the inter-images visual similarity graph representing the collection.




                      Fig. 4. Inter-images visual similarity graph example

    Another factor can have an impact on the quality of the diversity-based rank-
    ing that the collection contains an image, which has low relevance score but
    which is highly semantically dissimilar from all other images in this collection.
8   G. Feki et al.




    As a consequent, it will necessarily have a high diversity score that will make
    wrong the final score. Therefore, we must include another intermediary
    score. Based on the inter-images semantic similarity graph, this score should
    reflect the degree of homogeneity of this image with all images in the collec-
    tion.

      The following graph illustrates an example of semantic similarities be-
    tween some images extracted from a collection from ImageCLEF 2012.




                     Fig. 5. Inter-images semantic similarity graph example

    For optimal diversity, scores take into account these two factors, which are
    redundancy and prevalence. In fact, we verify the situation of this image not
    only according to images ranked before it but also in relation with the whole
    collection. The used connective to combine rules is the logical disjunction
    taken from the Lukasiewicz logic [6].


    2.3     Combined scores
    In order to balance between the relevance and the diversity user’s needs,
    the final ranking list is obtained after combining the relevance score and di-
    versity score of an image for given query. As a final stage, the combination
    between relevance-based ranking scores and diversity-based ranking scores
    is a decisive phase. In fact, we have thinking a lot about the balancing man-
    ner especially about the importance that we should give to one factor at the
9   G. Feki et al.




    expense of the other. We consider that diversity requirement has the same
    degree of importance that has relevance necessity. As a result, global score
    not only reflects the similarity between the query and the image collection
    but also respects some specificity in queries like image redundancy con-
    straint.


    3        Experimental results

    We describe the experimental study conducted to evaluate the proposed
    approach within the relevance computing and the diversity enhancement.
    The relevance and diversity-based approach for image retrieval is evaluated
    with the Personal photo task1, in which the challenge is to overcome the re-
    dundancy problem.

    The submitted runs are inspired from our proposed approach described
    above that for a run, we use only the redundancy factor (run1), for another
    we add the prevalence factor (run 4) and a baseline run using only relevance
    constraint (run5). Moreover, we try our two proposed graphs, the semantic
    graph and the visual one, in calculating these two factors previously men-
    tioned.

    We notice that the forth run, which represents the complete proposed ap-
    proach has the best result but with restricted difference. In fact, the used
    diversity strategy, which is proposed by [7] shows experimentally that it is
    designed for limited collection. Like for prevalence score previously men-
    tioned, which depends enormously of the number of images in the collec-
    tion. Therefore, when we add this factor, the original diversity score slightly
    changes.

    In addition, our results concerning this task revel better in P_5, P_10, P_15
    and P_20 than in P_30 and P_100, which is explained by the use of the
    greedy search algorithm. In fact, we think that user need diversity in the top


    1
        http://www.imageclef.org/2012/personal
10   G. Feki et al.




     of the list and prefer to keep the other images judged no diverse in the rest
     of the list.




                          Fig. 6. Personal Photo Retrieval task results

     Diversity-based ranking proved its necessity in eliminating redundancy within
     the Retrieval of visual concepts task with our five submitted runs.

     Furthermore, the proposed approach shows acceptable results with the Re-
     trieval of events task. Comparing with the results achieved for the Retrieval
     of visual concepts task, we notice a notable degradation, which can be ex-
     plained that our proposed approach is designed for image retrieval whereas
     event retrieval is closer to shot detection.


     4       Conclusion

     In this interesting participation in ImageCLEF, we propose an approach,
     which improves diversity in ranking using fuzzy logic. The proposed scheme
     proceeds as follow: First, an off line module was performed before starting
     the image retrieval process in order to reduce both, the execution time and
11   G. Feki et al.




     the algorithm complexity. This module contains an inter-images semantic
     similarity graph and inter-images visual similarity graph. Second, an on-line
     part implies the relevance-based ranking, the diversity-based ranking and
     their combination. Thanks to fuzzy logic, we deal with the redundancy prob-
     lem. Moreover, the vector of the relevance scores and the vector of the di-
     versity scores are joined in order to have final scores of each image according
     to a given query. The experiments are conducted on ImageCLEF12 bench-
     mark for the Personal Photo Retrieval task and show good results.


     References
      1. N.Elleuch, M. Zarka, A.B. Ammar, A.M. Alimi: A Fuzzy Ontology-Based Framework for rea-
         soning in Visual Video Content Analysis and Indexing, In the intl Workshop on Multimedia
         Data Mining (2011)

      2. A. Ksibi, A. Ben Ammar, C. Ben Amar: Effective Concept Detection using Second Order Co-
         occurence Flickr Context Similarity measure SOCFCS, CBMI2012 (2012)

      3. T. Pahikkala, A. Airola, P. Naula, T. Salakoski, Greedy RankRLS: a Linear Time Algorithm for
         Learning Sparse Ranking Models, In: EvgeniyGabrilovich, Alexander J. Smola, NaftaliTishby
         (Eds.), SIGIR 2010 Workshop on Feature Generation and Selection for Information Re-
         trieval, 11-18, ACM, 2010, Geneva, Switzerland (2010)

      4. J. Janssen, S. Schockaert, D. Vermeir, M. De Cock: General Fuzzy Answer Set Programs,
         http://ebookbrowse.com/wilf2009-pdf-d5924426 (2009)

      5. E. L. Walker: Image Retrieval on the Internet - How can Fuzzy Help?, Fuzzy Information
         Processing Society, 2002. Proceedings. NAFIPS. 2002 Annual Meeting of the North Ameri-
         can, 526 – 528 (2002)

      6. T. Lukasiewicz and U. Straccia: Tightly Integrated Fuzzy Description Logic Programs under
         the Answer Set Semantics for the Semantic Web, Infsys Research Report 1843-07-03,
         February 2007 (2007)

      7. S. Schockaert and M. De Cock: Diversification of search results as a fuzzy satisfiability
         problem, DDR-2011: 18th April 2011, Dublin, Ireland, in conjunction with ECIR 2011 - the
         33rd European Conference on Information Retrieval (2011)