Domain Specific Retrieval Experiments with MIMOR at the University of Hildesheim

René Hackl, Ralph Kölle, Thomas Mandl, Christa Womser-Hacker
University of Hildesheim, Information Science, Marienburger Platz 22, D-31141 Hildesheim, Germany
{koelle, mandl, womser}@rz.uni-hildesheim.de

Abstract. For our first participation in CLEF we chose the domain specific GIRT corpus. We implemented the adaptive fusion model MIMOR (Multiple Indexing and Method-Object Relations), which is based on relevance feedback, and optimized the linear combination of several retrieval engines. As the basic retrieval engine, IRF from NIST was employed. The results are promising: for several topics, our runs achieved a performance above the average, and the optimization based on last year's topics and relevance judgements proved to be a fruitful strategy.

1 Introduction

For the CLEF 2002 campaign we tested an adaptive fusion system based on the MIMOR model [Womser-Hacker 1997; Mandl & Womser-Hacker 2001b]. The experiments were fully automatic and dealt with the domain specific corpus of social science documents. We chose the GIRT data for our first participation because we are especially interested in the challenges of the domain specific task and because it allows mono-lingual retrieval. As the basic retrieval engine for our fusion system, we used the IRF package from NIST.

2 Fusion in Information Retrieval

The basic idea behind fusion is to delegate a task to different algorithms and to consider all the results returned. These single results are then combined into one final result. This approach is especially promising when the single results are very different. As investigations of the outcomes of the TREC conferences have shown, the results of information retrieval systems performing similarly well are often different: the systems find the same percentage of relevant documents, but the overlap between their ranked lists is sometimes low [Womser-Hacker 1997]. An overview of fusion methods in information retrieval is given by [McCabe et al. 1999]. Research concentrates on issues such as which methods can be combined, how the retrieval status values of two or more systems can be treated mathematically, and which features of collections indicate that a fusion might lead to positive results.

Different retrieval methods can be defined according to various parameters. One possibility is using different indexing approaches, such as word and phrase indexing. The retrieval status values are combined statically by taking the sum, the minimum or the maximum of the results from the individual systems [Mandl & Womser-Hacker 2001a]. Linear combinations assign a weight to each method which determines its influence on the final result. These weights may be improved, for example, by heuristic optimization or by learning methods [Vogt & Cottrell 1998].

In experimental systems, the methods to be fused are applied to the same collection. However, fusion has also been applied to collections without overlap: a corpus can be split into artificial sub-sets, each of which is treated by a separate retrieval system. In such a case, the goal of the fusion can be regarded as an attempt to derive knowledge about which collection leads to good results. For internet meta search engines, fusion usually means eliminating duplicate documents returned by more than one search engine.
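To make these combination schemes concrete, the following sketch shows how retrieval status values (RSVs) from several systems could be fused statically (sum, minimum, maximum) or by a weighted linear combination. This is a minimal illustration in Python under our own assumptions; the function names and the dictionary layout are not taken from any of the systems cited above.

```python
# Illustrative sketch of fusion over retrieval status values (RSVs).
# Each system's result is modeled as a dict: document id -> RSV.

def fuse_static(rsv_lists, mode="sum"):
    """Combine per-document RSVs statically (sum, min or max).

    Documents missing from a system simply contribute nothing;
    min/max are taken only over the systems that returned the document.
    """
    fused = {}
    for rsvs in rsv_lists:
        for doc, score in rsvs.items():
            if doc not in fused:
                fused[doc] = score
            elif mode == "sum":
                fused[doc] += score
            elif mode == "min":
                fused[doc] = min(fused[doc], score)
            elif mode == "max":
                fused[doc] = max(fused[doc], score)
    return fused

def fuse_linear(rsv_lists, weights):
    """Weighted linear combination: each system's weight sets its influence."""
    fused = {}
    for rsvs, weight in zip(rsv_lists, weights):
        for doc, score in rsvs.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused

# Example: two systems that score an overlapping document set differently.
system_a = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
system_b = {"d1": 0.2, "d2": 0.8, "d4": 0.5}
fused = fuse_linear([system_a, system_b], [0.7, 0.3])
ranking = sorted(fused.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # d1 ranks first, since both systems score it
```

A higher weight lets one system dominate the final ranking; the static variants treat all systems equally, which is exactly the distinction the learning approaches above try to exploit.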
3 MIMOR

MIMOR (Multiple Indexing and Method-Object Relations) represents a learning approach to the fusion task. It is based on results of information retrieval research which show that the overlap between different systems is often small [Womser-Hacker 1997]. On the other hand, relevance feedback is a very promising strategy for improving retrieval quality. As a consequence, the linear combination of different results is optimized through learning from relevance feedback. MIMOR represents an information retrieval system managing the poly-representation of queries and documents by selecting appropriate methods for indexing and matching [Mandl & Womser-Hacker 2001b]. By learning from user feedback on the relevance of documents, the model adapts itself by assigning weights to the different basic retrieval engines. MIMOR therefore follows a long-term learning strategy in which the relevance assessments are not used for the current query only. MIMOR is not limited to text documents but is open to other data types such as structured data and multimedia objects. Learning is implemented as a delta rule; more complex learning schemes remain to be evaluated [Joachims 1998; Drucker et al. 1999; Iyer et al. 2000].

4 CLEF Retrieval Experiments with the GIRT Corpus

The GIRT data is part of a digital library for the social sciences. The domain specific task is described in [Kluck & Gey 2000].

4.1 Tools

Our main tool was the IRF retrieval package available from NIST (http://www.itl.nist.gov/iaui/894.02/projects/irf/irf.html). IRF is an object-oriented retrieval framework programmed in Java. The sample IR application proved to meet our retrieval needs sufficiently, i.e. ranked results, setting of parameters etc., so only few changes were necessary.

In addition, we used the Lucene package for topic and data preprocessing. It is also written in Java and can be obtained from the Jakarta Apache Project's homepage (http://jakarta.apache.org/lucene/docs/index.html). Among many other features, Lucene comprises both a built-in German analyzer and a German stemmer, and both may easily be altered as desired. The indexing application could be run without major changes; we slightly modified the filter settings and provided a different, more thorough stop word list (http://www.unine.ch/info/clef/).

[Diagram 1. Sequence of operations: the GIRT collection and the topics are preprocessed with Lucene, retrieved with IRF, and the individual results are merged by MIMOR.]

4.2 Processing steps

The current MIMOR implementation of the University of Hildesheim could not be employed for all processing steps. Therefore, the necessary intermediate actions were carried out by AWK scripts (http://members.cox.net/dos/txtfrmt.htm). We did some formatting and cut off all information except document number, title and text. The most important function, however, was to simulate MIMOR's merging algorithm (cf. Diagram 1). Several scripts were necessary to statically simulate this functionality. Three different indexing parameter settings of IRF represented the individual retrieval systems:

• inverse document frequency (weight title: 1, weight text: 1)
• inverse document frequency (weight title: 3, weight text: 1)
• keyword (weight title: 1, weight text: 1)

We also ran last year's topics with the same setup. Subsequently, we compared the results obtained with the relevant documents assessed in last year's campaign. Based on these results, we globally optimized the fusion by assigning a higher weight to the system with better results.
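The AWK scripts themselves are not reproduced in this paper. The following Python sketch merely illustrates the kind of merging they simulated: the ranked lists of the three IRF settings are fused by a weighted linear combination, and the weights are adapted from relevance assessments in the spirit of the delta rule mentioned in Section 3. All names, the data layout, the learning rate, and the concrete update formula are simplified assumptions of our own, not MIMOR's actual implementation.

```python
# Hypothetical sketch: fuse three IRF runs (one per indexing setting)
# and adapt the fusion weights from relevance assessments.
# Each run is modeled as a dict: document id -> RSV in [0, 1].

def merge_runs(runs, weights):
    """Fuse several runs by a weighted linear combination of RSVs."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, rsv in run.items():
            fused[doc] = fused.get(doc, 0.0) + weight * rsv
    # Return document ids ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)

def delta_rule_update(weights, runs, relevant, eta=0.1):
    """Raise the weight of systems that score relevant documents highly.

    relevant: set of document ids judged relevant (e.g. from the
    previous year's assessments, used here for global optimization).
    This is a simplified delta-rule step: the error between the
    relevance target (1 or 0) and the RSV drives the weight change.
    """
    for i, run in enumerate(runs):
        for doc, rsv in run.items():
            target = 1.0 if doc in relevant else 0.0
            weights[i] += eta * (target - rsv) * rsv
    total = sum(weights)
    return [w / total for w in weights]  # renormalize to sum to 1

# Toy example with three "indexing settings" and one assessed topic.
runs = [{"d1": 0.9, "d2": 0.3}, {"d1": 0.4, "d2": 0.8}, {"d3": 0.6}]
weights = [1 / 3, 1 / 3, 1 / 3]
weights = delta_rule_update(weights, runs, relevant={"d1"})
print(merge_runs(runs, weights))
```

In this toy setup the second run loses weight because it scores the non-relevant document d2 highly, which mirrors the global optimization described above: systems with better results on the assessed topics gain influence on the merged ranking.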
For comparison we submitted two runs:

• a run with equal weights for the individual systems
• a run with optimized weights derived from last year's results

4.3 Results

As Table 1 shows, the average precision for both runs is almost equal. In absolute figures, run UHi02r2 (where the optimized fusion was applied) produced ten more relevant documents.

Table 1. Result overview

Run      Retrieved  Relevant  Rel_ret  Avg. prec.
UHi02r1  23,751     961       387      0.1093
UHi02r2  23,751     961       397      0.1097

Query 67 – where none of the five relevant documents was retrieved – is considered an outlier and is left out of the following analysis. Taking the per-topic difference between the two precision scores as a method of comparison, the fused result was better on nine, equal on two and worse on twelve occasions. However, the sum of differences

$$\sum_{i=51}^{75} \mathrm{prec}_i(r1) - \sum_{i=51}^{75} \mathrm{prec}_i(r2)$$

produces a slight advantage in favor of the optimized fusion: -0.0089, which is less than one percent and corresponds to a mean deviation of 0.000387 per query in the fourth decimal place. Twenty out of 23 topics (86.96%) lay within a range of 0.0060, and nine of 23 (39.13%) were still within a range of 0.0010. However, these deviations are not necessarily statistically significant. The precision at five documents is 0.2667 and 0.2750, respectively, which we consider not too bad for our first participation. Some of our results were above the average of all participants, others were below. The analysis of the particular documents will presumably reveal IRF's strengths and weaknesses.

To summarize, we gained valuable experience from the CLEF experiments. The lack of genuinely different information retrieval engines makes a more sophisticated evaluation of fusion versus non-fusion unavailing, as can be seen from the overall rather narrow differences. Consequently, high priority will be given to the integration of further information retrieval engines into the MIMOR architecture. We expect that a learning process over several cycles will eventually lead to an advantageous system performance.

5 Outlook

Several improvements need to be implemented for next year's participation in the GIRT experiments:

• MIMOR needs to be further automated
• the optimization process needs to be refined
• more basic retrieval systems need to be integrated

Furthermore, we plan to extend our work on the CLEF challenge and also participate in the multi-lingual track with our fusion system.

Acknowledgements

We would like to thank NIST, and especially Paul Over and Darrin Dimmick, for providing the source code of the IRF package. We also thank the Jakarta and Apache projects' teams for sharing Lucene with a wide community. Furthermore, we acknowledge the work of several students of the University of Hildesheim who implemented MIMOR as part of their course work.

References

Drucker, H.; Wu, D.; Vapnik, V. (1999): Support Vector Machines for Spam Categorization. In: IEEE Transactions on Neural Networks, vol. 10(5), pp. 1048-1054.

Joachims, T. (1998): Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: European Conference on Machine Learning (ECML), pp. 137-142.

Kluck, M.; Gey, F. (2000): The Domain-Specific Task of CLEF - Specific Evaluation Strategies in Cross-Language Information Retrieval. In: Peters, C. (ed.): Cross-Language Information Retrieval and Evaluation. Workshop of the Cross-Language Evaluation Forum (CLEF 2000), Lisbon, Portugal, Sept. 21-22, 2000. Springer [LNCS 2069], pp. 48-56.
Iyer, R.; Lewis, D.; Schapire, R.; Singer, Y.; Singhal, A. (2000): Boosting for Document Routing. In: Ninth ACM Conference on Information and Knowledge Management (CIKM).

Mandl, T.; Womser-Hacker, C. (2001a): Fusion Approaches for Mappings Between Heterogeneous Ontologies. In: Constantopoulos, P.; Sølvberg, I. (eds.): Research and Advanced Technology for Digital Libraries: 5th European Conference (ECDL 2001), Darmstadt, Sept. 4-8, 2001. Springer [LNCS 2163], pp. 83-94.

Mandl, T.; Womser-Hacker, C. (2001b): Probability Based Clustering for Document and User Properties. In: Ojala, T. (ed.): Infotech Oulu International Workshop on Information Retrieval (IR 2001), Oulu, Finland, Sept. 19-21, 2001, pp. 100-107.

McCabe, M. C.; Chowdhury, A.; Grossman, D.; Frieder, O. (1999): A Unified Framework for Fusion of Information Retrieval Approaches. In: Eighth ACM Conference on Information and Knowledge Management (CIKM), pp. 330-334.

Vogt, C.; Cottrell, G. (1998): Predicting the Performance of Linearly Combined IR Systems. In: 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98), pp. 190-196.

Womser-Hacker, C. (1997): Das MIMOR-Modell. Mehrfachindexierung zur dynamischen Methoden-Objekt-Relationierung im Information Retrieval. Habilitationsschrift. Universität Regensburg, Informationswissenschaft.