=Paper= {{Paper |id=Vol-1172/CLEF2006wn-DomainSpecific-StempfhuberEt2006 |storemode=property |title=Domain-Specific Track CLEF 2006: Overview of Results and Approaches, Remarks on the Assessment Analysis |pdfUrl=https://ceur-ws.org/Vol-1172/CLEF2006wn-DomainSpecific-StempfhuberEt2006.pdf |volume=Vol-1172 |dblpUrl=https://dblp.org/rec/conf/clef/StempfhuberB06a }} ==Domain-Specific Track CLEF 2006: Overview of Results and Approaches, Remarks on the Assessment Analysis== https://ceur-ws.org/Vol-1172/CLEF2006wn-DomainSpecific-StempfhuberEt2006.pdf
     Domain-Specific Track CLEF 2006: Overview of Results and
        Approaches, Remarks on the Assessment Analysis

                                    Maximilian Stempfhuber and Stefan Baerisch

             Informationszentrum Sozialwissenschaften (IZ), Lennéstrasse 30, 53113 Bonn, Germany
                                   { stempfhuber, baerisch}@iz-soz.de

                                                   Abstract

      The CLEF domain-specific track uses databases in different languages from the social science
      domain as the basis for retrieval of documents relevant to a user’s query. Predefined topics
      simulate the user’s information needs and are used by the research groups participating in this
      track to generate actual queries. The documents are structured into individual elements so that they
      can be used for optimizing the search strategies. One type of retrieval task was cross-language
      retrieval, that is, finding information using queries in a language different from that of the
      documents. English, German and Russian were used as languages for queries and
      documents. Queries therefore had to be translated from one of the languages to one or more of the
      other languages. Besides the cross-language tasks, monolingual retrieval tasks were also carried
      out, where query and documents are in the same language. The focus here was to map the query onto
      the language and internal structure of the documents.
      This paper gives an overview of the domain-specific track and reports on noteworthy trends in
      approaches and results as well as on the topic creation and assessment process.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search
and Retrieval

General Terms
Measurement, Performance, Experimentation

Keywords
Information Retrieval, Evaluation

1     Introduction
    The CLEF domain-specific task is focused on cross-language retrieval in English, German and Russian.
Queries as well as documents are taken from the social science domain and are provided in all three languages.
This section presents the domain-specific task, with a focus on the corpora used, and gives an overview of the
2006 track. The following sections cover the methods used and the results achieved in retrieval, as well as the
processes of topic creation and assessment. The paper concludes with an outlook on the future development of
the domain-specific track.

2     The CLEF Domain-specific track
   The domain-specific track can be divided into three kinds of subtasks, depending on the language of the query
and the documents.
       • In the monolingual subtasks, queries and documents are in the same language. The Russian subtask
           uses the INION collection, while the English and German subtasks work on the GIRT corpus.
       • In the bilingual subtasks, queries in one language are used with a collection in a different language.
           The domain-specific track includes 6 bilingual subtasks, covering all combinations of English, German,
           and Russian.
       • The multilingual subtask uses queries in one language against the combined content of all
           collections.
    Queries in CLEF are provided as topics. Each topic represents an information need from the
social science domain. Topics as well as documents are structured: a topic includes a title, a short
description of the information need and a longer narrative for further clarification. The documents include,
among other information, fields such as author, title and year of publication, terms from a thesaurus and an
abstract. The following is an example topic:

        Title          Childlessness in Germany
        Description    Find information on the factors for childlessness in Germany.
        Narrative      All documents examining causes and consequences of childlessness among
                       couples in Germany are of interest. This also includes documents covering the
                       topic’s macroeconomic aspects.
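
The three-part topic structure can be sketched as a small data type. This is only an illustration: the class name `Topic`, the helper `build_query` and the default choice of fields are assumptions made here, not something prescribed by the track. Each participating group decided for itself which topic fields to turn into a query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """One CLEF topic with its three textual fields."""
    title: str
    description: str
    narrative: str


def build_query(topic: Topic, fields: tuple = ("title", "description")) -> str:
    """Concatenate the chosen topic fields into one plain-text query.

    Hypothetical helper: which fields to use is a design decision of
    each retrieval system.
    """
    return " ".join(getattr(topic, name) for name in fields)


example = Topic(
    title="Childlessness in Germany",
    description="Find information on the factors for childlessness in Germany.",
    narrative=(
        "All documents examining causes and consequences of childlessness "
        "among couples in Germany are of interest. This also includes "
        "documents covering the topic's macroeconomic aspects."
    ),
)

print(build_query(example))
print(build_query(example, fields=("title",)))
```

Using only the title gives a short, keyword-like query, while adding the description or narrative gives the system more vocabulary to match at the risk of adding noise.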

   The participating groups submit one or more runs for each subtask. A run includes the 1000 highest-ranked
documents for each of the 25 topics. The runs from all groups for a given subtask are pooled, and the highest-ranking
documents are then assessed by domain experts in order to obtain reference relevance judgements for the documents.
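
A minimal sketch of the pooling step follows, under stated assumptions. A run is modelled as a mapping from topic number to a ranked list of document identifiers. The function name `build_pool`, the toy document identifiers and the pool depth of 2 are all invented for illustration; the pool depth actually used in the track is not stated here.

```python
from collections import defaultdict


def build_pool(runs, depth):
    """Return, per topic, the union of the top-`depth` documents of all runs.

    runs  : list of dicts mapping topic id -> ranked list of document ids
    depth : number of top-ranked documents taken from each run (assumed value)
    """
    pool = defaultdict(set)
    for run in runs:
        for topic_id, ranked_docs in run.items():
            pool[topic_id].update(ranked_docs[:depth])
    return dict(pool)


# Two toy runs over two topics (identifiers are made up).
run_a = {151: ["d1", "d2", "d3"], 152: ["d7", "d8"]}
run_b = {151: ["d2", "d4", "d1"], 152: ["d8", "d9"]}

pool = build_pool([run_a, run_b], depth=2)
for topic_id in sorted(pool):
    print(topic_id, sorted(pool[topic_id]))
```

Documents retrieved by several runs enter the pool only once, so the pool size per topic depends on how much the submitted runs overlap.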

2.1    The GIRT4 and INION Collections
    GIRT, the German Indexing and Retrieval Testdatabase, is a pseudo-parallel corpus with 302,638 documents
in English and German. The documents primarily cover the domain of the German social sciences, in particular
scientific publications. Details about the GIRT corpus, especially about the development of the current fourth
version, can be found in [1,2]. The INION corpus covers the Russian social sciences and economics. The
collection includes 145,802 documents and replaces the Russian Social Science Corpus used in 2005.

2.2    Participants and submitted Runs
    The University of California in Berkeley, the University of Hagen in Germany, the University of Technology
in Chemnitz and the University of Neuchatel took part in the 2006 domain-specific track, submitting 36 runs in
total: 22 monolingual runs, 12 bilingual runs and 2 multilingual runs. For detailed figures see the following table:

  Sub-task                        # Participants      # Runs      Topic Language
  Multi-lingual                   1                   2           DE:1 EN:1
  Bilingual X → DE                2                   6           EN:6
  Bilingual X → EN                1                   3           DE:2 RU:2
  Bilingual X → RU                1                   3           EN:2 DE:1
  Monolingual DE                  4                   13
  Monolingual EN                  3                   8
  Monolingual RU                  1                   1
  Total                           4                   36

 Table 1: Submitted runs
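
As a quick consistency check, the run counts in Table 1 can be summed by subtask type. The snippet below only restates the figures from the "# Runs" column; the dictionary and the helper name `runs_for` are illustrative.

```python
# Run counts per subtask, copied from the "# Runs" column of Table 1.
runs_per_subtask = {
    "Multi-lingual": 2,
    "Bilingual X -> DE": 6,
    "Bilingual X -> EN": 3,
    "Bilingual X -> RU": 3,
    "Monolingual DE": 13,
    "Monolingual EN": 8,
    "Monolingual RU": 1,
}


def runs_for(prefix):
    """Sum the runs of all subtasks whose name starts with `prefix`."""
    return sum(n for name, n in runs_per_subtask.items() if name.startswith(prefix))


total_runs = sum(runs_per_subtask.values())
monolingual_runs = runs_for("Monolingual")
bilingual_runs = runs_for("Bilingual")
multilingual_runs = runs_for("Multi-lingual")

print(total_runs, monolingual_runs, bilingual_runs, multilingual_runs)
# prints: 36 22 12 2
```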

Overview of the 2006 Domain Specific Track
2.3    Methods and Results Overview
    Details on the results and employed methods of all groups are given in the corresponding chapters of this
volume. This chapter gives only a brief overview of the methods employed. The University of Hagen used a
re-ranking approach based on antagonistic terms, which improved the mean average precision. The Chemnitz Technical
University used Apache Lucene as the basis for the retrieval and achieved the best results with a combination of
suffix stripping, stemming, and decompounding. The University of California, Berkeley, did not use the methods for
thesaurus-based query expansion and decompounding that it had employed in previous years. The University of
Neuchatel used the DFR GL2 and Okapi models with several combinations of the topic fields against the
different fields of the GIRT4 documents.
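
Mean average precision (MAP), the measure mentioned above, can be sketched as follows. This is the textbook definition (non-interpolated average precision per topic, averaged over topics), not the evaluation code used in the campaign. The function names and the toy data are assumptions for illustration only.

```python
def average_precision(ranked_docs, relevant):
    """Average of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of the per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    scores = [average_precision(run.get(topic, []), rel) for topic, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy example: re-ranking moves the relevant document "c" ahead of "b".
qrels = {"T1": {"a", "c"}}
before = {"T1": ["a", "b", "c", "d"]}
after = {"T1": ["a", "c", "b", "d"]}

print(round(mean_average_precision(before, qrels), 4))  # 0.8333
print(round(mean_average_precision(after, qrels), 4))   # 1.0
```

The toy example shows why re-ranking can raise MAP without retrieving any new documents: only the order of the result list changes.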

2.4    Topics
   The team responsible for the topic-creation process changed in 2006. This has led to changes in the
topic-creation process itself as well. In order to produce a large number of potential information needs covering a wide
area of the social sciences, a number of domain experts familiar with the GIRT corpus were asked for
contributions to the topics. 25 topics were selected from 42 submissions. These topics were compared to the
topics from the years 2001-2005 in order to remove topics too similar to topics already used. Removed topics
were replaced from the pool of previously unused topics. The definition of topics by domain experts with
knowledge about the corpus resulted in many topics of interest but also caused more effort in the selection of the
actual topic-set. This was due to the fact that not all submitters of topics had detailed knowledge of the CLEF
topic-creation rules and corrections had to be made afterwards to provide consistent structure and wording. For
future CLEF-campaigns, this process will be further optimized.
    The definition of topics suitable for the GIRT corpus as well as the Russian INION corpus remains
challenging. Lack of Russian language skills in the topic-creation team increases the difficulty of creating a set
of topics well suited for both corpora. An additional factor in 2006 was the lack of experience with the INION
corpus. In this regard, too, we hope to improve the topic-creation process in the next campaign, as the
multilingual corpus is now stable again.

2.5       Assessment Process
    As in 2005, the assessments for English and German were done using a Java Swing program, while our partner
from the Research Computing Center of the M.V. Lomonosov Moscow State University used the CLEF
assessment tool for the assessment of the Russian documents. The decision to use the Java Swing program for English
and German was based on the fact that the assessments of English and German documents from the
pseudo-parallel corpus should be compared to each other. An interesting finding was the statement of both assessors that
they liked the ability to choose between keyboard shortcuts and a mouse-driven interface, since it allowed for
different working styles in the otherwise monotonous assessment task. Figures 1a and 1b show the number of
documents in the pool for each topic and the percentage of relevant documents. It should be noted that, while the
number of documents found for each topic is relatively stable, the percentage of relevant documents shows
outliers for topics 167 and 169. Investigation of these outliers and discussion with the assessors
showed that they were caused by complex exclusion criteria in the case of topic 167, and by the use of the
complex concepts of gender-specific education and the specifics of the German primary education system in the case
of topic 169.
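
The percentage of relevant documents per topic can be derived from the relevance judgements as sketched below. The data layout (topic mapped to document mapped to a boolean judgement), the function name and the toy values are assumptions for illustration and do not reproduce the actual assessment data.

```python
def percent_relevant(judgements):
    """Return the share of relevant documents per topic, in percent.

    judgements: dict mapping topic id -> dict mapping doc id -> bool
                (True means the assessor judged the document relevant)
    Topics without any judged document are skipped.
    """
    return {
        topic: 100.0 * sum(docs.values()) / len(docs)
        for topic, docs in judgements.items()
        if docs
    }


# Invented toy judgements for two topics.
toy = {
    "T1": {"a": True, "b": False, "c": False, "d": True},
    "T2": {"a": False, "b": False, "c": False, "d": True},
}

shares = percent_relevant(toy)
for topic in sorted(shares):
    print(topic, shares[topic])
```

Computing this value separately for the German and the English pool of each topic gives the two series that are compared per topic.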

[Figure: chart "Assessment Results: Number of Documents per Topic for GIRT4"; x-axis: topics 151 to 175; y-axis: number of documents (0 to 1000); series: EN and DE]

Figure 1a: Number of documents per topic
[Figure: chart "Assessment Results: Percentage of relevant documents"; x-axis: topics 151 to 175; y-axis: percentage of relevant documents (0 to 80); series: EN and DE]

Figure 1b: Percentage of relevant documents per topic

    The reassessment of "pair" documents present in both the English and the German pools of the parallel corpus,
first introduced in the 2005 campaign, was repeated in 2006. When the two documents of a pair had different
relevance judgements after the first round of the assessment, a second round of assessment was done after a
discussion between the assessors on their individual criteria for the assessment.
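
Finding the pairs that need a second look can be sketched as follows, for the judgements of a single topic. The sketch assumes that the two halves of a pair carry identifiers that differ only in a language tag (`-DE` versus `-EN`). That naming scheme, the helper `pair_key` and the toy identifiers are assumptions made for this illustration.

```python
def pair_key(doc_id):
    """Hypothetical: drop the language tag so both halves of a pair share a key."""
    return doc_id.replace("-DE", "-").replace("-EN", "-")


def conflicting_pairs(de_judgements, en_judgements):
    """Return the keys of pairs judged in both pools with differing relevance.

    Both arguments map document id -> bool for one topic.
    Documents present in only one pool are ignored.
    """
    de_by_key = {pair_key(doc): rel for doc, rel in de_judgements.items()}
    conflicts = []
    for doc, en_rel in en_judgements.items():
        key = pair_key(doc)
        if key in de_by_key and de_by_key[key] != en_rel:
            conflicts.append(key)
    return sorted(conflicts)


# Invented toy judgements for one topic.
de = {"GIRT-DE001": True, "GIRT-DE002": False, "GIRT-DE003": True}
en = {"GIRT-EN001": True, "GIRT-EN002": True, "GIRT-EN004": False}

print(conflicting_pairs(de, en))  # ['GIRT-002']
```

Only the pair judged differently in the two pools is returned; documents that appear in just one pool cannot conflict and are left out.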
     In addition to the reassessment of pairs a second round of assessment on the topic-level was conducted in
2006. This reassessment focused on the topics with the most significant difference in the percentage of the
relevant documents between German and English.
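
One way to pick the topics for such a topic-level reassessment is to rank topics by the absolute difference between the German and English percentages of relevant documents. The selection rule, the function name `largest_gaps` and the toy numbers below are assumptions. How many topics were actually reassessed, and by which threshold, is not stated here.

```python
def largest_gaps(percent_de, percent_en, n):
    """Return the n topics with the largest DE/EN gap in percent relevant.

    Both arguments map topic id -> percentage of relevant documents.
    Ties are broken by topic id so that the result is deterministic.
    """
    shared = percent_de.keys() & percent_en.keys()
    gaps = {topic: abs(percent_de[topic] - percent_en[topic]) for topic in shared}
    return sorted(gaps, key=lambda topic: (-gaps[topic], topic))[:n]


# Invented toy percentages for three topics.
percent_de = {"T1": 40.0, "T2": 10.0, "T3": 25.0}
percent_en = {"T1": 35.0, "T2": 30.0, "T3": 24.0}

print(largest_gaps(percent_de, percent_en, n=2))  # ['T2', 'T1']
```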

3         Results
   Detailed information on recall and precision for the participants and their runs can be found in the groups'
chapters in this volume [3,4,5,6] and shall not be repeated here in full. The recall-precision graphs for the subtasks
that attracted the most participants can be found in Figures 2, 3 and 4.
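
Recall-precision graphs of this kind are commonly drawn from interpolated precision at 11 standard recall levels (0%, 10%, ..., 100%). The sketch below shows that standard computation for a single topic. It is an assumption that the graphs here were produced this way, and the function name and toy data are invented.

```python
def interpolated_precision(ranked_docs, relevant, levels=11):
    """Interpolated precision at evenly spaced recall levels for one topic.

    At each recall level r, the value is the highest precision observed at
    any rank whose recall is at least r (0.0 if that recall is never reached).
    `relevant` must be a non-empty set of document ids.
    """
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))

    curve = []
    for i in range(levels):
        recall_level = i / (levels - 1)
        reached = [prec for rec, prec in points if rec >= recall_level]
        curve.append(max(reached, default=0.0))
    return curve


curve = interpolated_precision(["a", "b", "c", "d"], {"a", "c"})
print([round(value, 2) for value in curve])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.67, 0.67, 0.67, 0.67, 0.67]
```

Averaging these per-topic curves over all topics of a run gives one line of a recall-precision graph.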

[Figure: recall-precision graph "Top 4 Participants DS Monolingual German Recall vs Precision"; runs shown:
Neuchatel-DS-MONO-DE-CLEF2006_unine_uninede3,
Hagen-DS-MONO-DE-CLEF2006_hagen_fuhggyynbfl500r,
Chemnitz-DS-MONO-DE-CLEF2006_tuchemniz_tucmigirtde4,
Berkeley-DS-MONO-DE-CLEF2006_berkeley_berk_mo_de_t2fb]

                     Figure 2: DS Monolingual Task German
[Figure: recall-precision graph "Top 3 Participants DS Monolingual English Recall vs Precision"; runs shown:
Chemnitz-DS-MONO-EN-CLEF2006_tuchemniz_tucmigirten2,
Neuchatel-DS-MONO-EN-CLEF2006_unine_unineen3,
Berkeley-DS-MONO-EN-CLEF2006_berkeley_berk_mo_en_t2fb]

                 Figure 3: DS Monolingual Task English


[Figure: recall-precision graph "Top 2 Participants DS Bilingual EN to DE Recall vs Precision"; runs shown:
Berkeley-DS-BILI-X2DE-CLEF2006_berkeley_berk_bi_ende_t2fb_b,
Hagen-DS-BILI-X2DE-CLEF2006_hagen_fuhegpyynl500]

                  Figure 4: DS Bilingual Task English to German

4      Outlook

    Several methods are being investigated to improve the quality of the topic-creation and assessment process.
In order to create topics covering a broader scope of the social sciences as well as realistic information needs, it
is planned to involve more domain experts in the topic-creation process. These experts will be asked for topic
proposals in the form of narratives. The most suitable proposals from the resulting pool will then be selected and
turned into topics conforming to the topic-creation guidelines. In order to better verify the topics against the
INION and GIRT4 corpora, a new retrieval system will be used which will contain both corpora (deployment
planned for autumn of 2006) and will also allow our Russian partner from the Research Computing Center of the
M.V. Lomonosov Moscow State University to better participate in the topic-creation process.

   Concerning the assessment process, several methods are being evaluated in order to allow the assessors to better
coordinate their assessment criteria. Proposed methods include breaking the assessment and reassessment
processes into smaller batches of 5 topics, or notifying the assessors in case of notable differences in the
results for a topic.

Acknowledgements
    We gratefully acknowledge the partial funding by the European Commission through the DELOS Network of
Excellence on Digital Libraries. Natalia Lookachevitch from the Research Computing Center of the
M.V. Lomonosov Moscow State University helped in acquiring the Russian corpus and organized the translation
of topics into Russian and the assessment process for the Russian corpus. Claudia Henning and Jeof Spiro did
the German and English assessments.

References

  1. Kluck, M.: The GIRT Data in the Evaluation of CLIR Systems – from 1997 until 2003. In: Peters, C., Gonzalo, J.,
  Braschler, M., Kluck, M. (eds.): Comparative Evaluation of Multilingual Information Access Systems. 4th Workshop of
  the Cross-Language Evaluation Forum, CLEF 2003, Trondheim, Norway, August 21-22, 2003, Revised Selected Papers.
  Lecture Notes in Computer Science, Vol. 3237. Springer-Verlag Berlin Heidelberg New York, 379-393, (2004)

  2. Kluck, M.: The Domain-Specific track in CLEF 2004: Overview of the Results and Remarks on the Assessment
  Process. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds.): Multilingual Information
  Access for Text, Speech and Images. 5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, Bath, UK,
  September 2004, Revised Selected Papers. Lecture Notes in Computer Science, Vol. 3491. Springer-Verlag Berlin
  Heidelberg New York, 260-270 (2005)

  3. Jens Kürsten, Maximilian Eibl: Monolingual Retrieval Experiments with a Domain Specific Document Corpus at the
  Chemnitz Technical University (this volume) (2006)

  4. Ray R. Larson: Domain Specific Retrieval: Back to Basics (this volume) (2006)

  5. Johannes Leveling: University of Hagen at CLEF2006: Reranking documents for the domain-specific task (this
  volume) (2006)

  6. Jacques Savoy, Samir Abdou: UniNE at CLEF 2006: Experiments with Monolingual, Bilingual, Domain-Specific and
  Robust Retrieval (this volume) (2006)