Hummingbird's Fulcrum SearchServer at CLEF 2001

Stephen Tomlinson 1
Hummingbird
Ottawa, Ontario, Canada
August 4, 2001

1 Core Technology, Research and Development, stephen.tomlinson@hummingbird.com

Abstract

Hummingbird submitted ranked result sets for all 5 Monolingual Information Retrieval tasks (German, French, Italian, Spanish and Dutch) of the Cross-Language Evaluation Forum (CLEF) 2001. For each language, at least one SearchServer run had more average precision scores above the median than below. The submitted German, Dutch and Spanish runs, compared to the medians in average precision by topic, had a significance level of less than 1% (by the sign test), which is statistically significant. SearchServer's linguistic expansion was found to boost all investigated precision scores, including average precision and Precision@5, for all 6 investigated languages (German, French, Italian, Spanish, Dutch and English). Enabling linguistic expansion was found to increase average precision by 43% in German, 30% in Dutch, 18% in French, 16% in Italian, 12% in Spanish and 12% in English. In terms of the number of topics of higher and lower average precision, the German, Dutch and Spanish runs with linguistics enabled, compared to corresponding runs with linguistics disabled, had a significance level of less than 1% (by the sign test).

1 Introduction

Hummingbird's Fulcrum SearchServer kernel is an indexing, search and retrieval engine for embedding in Windows and UNIX information applications. SearchServer, originally a product of Fulcrum Technologies, was acquired by Hummingbird in 1999. Fulcrum, founded in 1983 in Ottawa, Canada, produced the first commercial application program interface (API) for writing information retrieval applications, Fulcrum Ful/Text. The SearchServer kernel is embedded in 5 Hummingbird products, including SearchServer, an application toolkit used for knowledge-intensive applications that require fast access to unstructured information.

SearchServer supports a variation of the Structured Query Language (SQL), called SearchSQL, which has extensions for text retrieval. SearchServer conforms to subsets of the Open Database Connectivity (ODBC) interface for C programming language applications and the Java Database Connectivity (JDBC) interface for Java applications. Almost 200 document formats are supported, such as Word, WordPerfect, Excel, PowerPoint, PDF and HTML. Many character sets and languages are supported, including the major European languages, Japanese, Korean, Chinese, Greek and Arabic.

SearchServer's Intuitive Searching algorithms were updated for version 4.0, which shipped in Fall 1999, and in subsequent releases of other products. SearchServer 5.0, which shipped in Spring 2001, works in Unicode internally [3] and contains improved natural language processing technology, particularly for languages with many compound words, such as German, Dutch and Finnish.

2 System Description

All experiments were conducted on a single-CPU desktop system, OTWEBTREC, with a 600MHz Pentium III CPU, 512MB RAM, 186GB of external disk space on one e: partition, and running Windows NT 4.0 Service Pack 6. For the official CLEF runs, internal development builds of SearchServer 5.0 were used (5.0.501.115 plus some experimental changes motivated by tests on the CLEF 2000 collections). For the diagnostic runs, internal build 5.0.504.157 was used.
3 Setup

We describe how SearchServer was used to handle the 5 Monolingual tasks of CLEF 2001.

3.1 Data

The CLEF 2001 collections consisted of tagged (SGML-formatted) news articles (mostly from 1994) in 6 different languages: German, French, Italian, Spanish, Dutch and English. The articles (herein called "documents" or "logical documents") were combined into "library files" of typically a few hundred logical documents each.

The collections were made available in compressed tar archives on the Internet. We downloaded the archives, uncompressed them, untarred them and removed any support files (e.g. dtd files) which weren't considered part of the collection. The library files were stored in 6 subdirectories of e:\data\CLEF (German, French, Italian, Spanish, Dutch and English). No further pre-processing was done on the data, i.e. SearchServer indexed the library files directly. Operations such as identifying logical documents and enforcing the restrictions on fields permitted for indexing were handled by the text reader (described below).

Language   Text Size (uncompressed)     Number of Documents   Number of Library Files
German     555,285,140 bytes (530 MB)   225,371               520
French     253,528,734 bytes (242 MB)    87,191               682
Italian    290,771,116 bytes (277 MB)   108,578               721
Spanish    544,347,121 bytes (519 MB)   215,738               364
Dutch      558,560,087 bytes (533 MB)   190,604               1228
English    441,048,231 bytes (421 MB)   113,005               365

Table 1: Sizes of CLEF 2001 Collections

For more information on the CLEF collections, see the CLEF web site [1].

3.2 Text Reader

To index and retrieve data, SearchServer requires the data to be in Fulcrum Technologies Document Format (FTDF). SearchServer includes "text readers" for converting most popular formats (e.g. Word, WordPerfect, etc.) to FTDF. A special class of text readers, "expansion" text readers, can insert a row into a SearchServer table for each logical document inside a container, such as a directory or library file. Users can also write their own text readers in C for expanding proprietary container formats and converting proprietary data formats to FTDF.

The library files of the CLEF 2001 collections consisted of several logical documents, each starting with a <DOC> tag and ending with a </DOC> tag. After the <DOC> tag, the unique id of the document, e.g. SDA.940101.0001, was included inside <DOCNO>..</DOCNO> tags. The custom text reader called cTREC, originally written for handling TREC collections [6], handled expansion of the library files of the CLEF collections and was extended to support the CLEF guidelines of only indexing specific fields of specific documents.

In expansion mode (/E switch), cTREC scans the library file and, for each logical document, determines its start offset in the file (i.e. the offset of its <DOC> tag), its length in bytes (i.e. the distance to its </DOC> tag), and extracts its document id (from inside the <DOCNO>..</DOCNO> tags). SearchServer is instructed to insert a row for each logical document. The filename column (FT_SFNAME) stores the library filename. The text reader column (FT_FLIST) includes the start offset and length for the logical document (e.g. nti/t=Win_1252_UCS2:cTREC/C/100000/30000). The document id column (controllable with the /d switch) contains the document id. A hypothetical sketch of this kind of expansion scan is given below.
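To make the expansion step concrete, here is a minimal C sketch of the kind of scan just described. It is purely illustrative and is not the actual cTREC source: the real reader streams data through SearchServer's text reader interface rather than printing, and it handles the CLEF field restrictions; this sketch simply assumes the library file fits in memory and does not trim whitespace around the document id.

/*
 * expand_sketch.c -- illustrative only; not the actual cTREC text reader.
 * Scans a CLEF/TREC-style library file and, for each logical document,
 * reports its start offset, its length in bytes and its document id,
 * i.e. the information an expansion text reader would supply for each row.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    char *buf;
    long size;
    const char *p, *doc, *end, *no, *noend;

    if (argc != 2) {
        fprintf(stderr, "usage: %s libraryfile\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "rb");
    if (fp == NULL) { perror(argv[1]); return 1; }
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    buf = (char *) malloc(size + 1);
    if (buf == NULL || fread(buf, 1, size, fp) != (size_t) size) {
        fprintf(stderr, "read error\n");
        return 1;
    }
    buf[size] = '\0';   /* assumes no embedded NUL bytes; fine for a sketch */
    fclose(fp);

    p = buf;
    while ((doc = strstr(p, "<DOC>")) != NULL) {
        end = strstr(doc, "</DOC>");
        if (end == NULL)
            break;                          /* truncated document; stop */
        end += strlen("</DOC>");

        /* extract the document id from inside <DOCNO>..</DOCNO> */
        no = strstr(doc, "<DOCNO>");
        noend = no ? strstr(no, "</DOCNO>") : NULL;
        if (no != NULL && noend != NULL && noend < end) {
            no += strlen("<DOCNO>");
            /* offset, length and id: what a row insert would need
               (a real reader would also trim surrounding whitespace) */
            printf("offset=%ld length=%ld docno=%.*s\n",
                   (long)(doc - buf), (long)(end - doc),
                   (int)(noend - no), no);
        }
        p = end;
    }
    free(buf);
    return 0;
}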
In CLEF format translation mode (/C switch), cTREC inserts a control sequence to turn off indexing at the beginning of the document. cTREC inserts a control sequence to enable indexing after each tag for which indexing is permitted (e.g. after <TEXT> in a Frankfurter Rundschau document), and inserts control sequences to disable indexing just before the corresponding closing tag (e.g. before </TEXT>). The document type is determined from the prefix of the DOCNO (e.g. all Frankfurter Rundschau document ids had "FR" as a prefix). The entities described in the DTD files were also converted, e.g. the entity for the equal sign was converted to the literal character "=".

The documents were assumed to be in the Latin-1 character set, the code page which, for example, assigns e-acute (é) hexadecimal 0xe9 or decimal 233. cTREC passes through the Latin-1 characters, i.e. it does not convert them to Unicode. SearchServer's Translation Text Reader (nti) was chained on top of cTREC, and the Win_1252_UCS2 translation was specified via its /t option to translate from Latin-1 to the Unicode character set desired by SearchServer.

3.3 Indexing

A separate SearchServer table was created for each language, with a SearchSQL statement such as the following:

CREATE SCHEMA CLEF01DE CREATE TABLE CLEF01DE
  (DOCNO VARCHAR(256) 128)
  TABLE_LANGUAGE 'GERMAN'
  STOPFILE 'GER_AW.STP'
  PERIODIC
  BASEPATH 'E:\DATA\CLEF';

The TABLE_LANGUAGE parameter specifies which language to use when performing linguistic operations at index time, such as breaking compound words into component words and stemming them to their base form. The STOPFILE parameter specifies a stop file containing typically a couple hundred stop words to not index; the stop file also contains instructions on changes to the default indexing rules, for example, to enable accent indexing or to change the apostrophe to a word separator. The PERIODIC parameter prevents immediate indexing of rows at insertion time. The BASEPATH parameter specifies the directory from which relative filenames of insert statements will be applied. The DOCNO column was assigned number 128 and a maximum length of 256 characters.

Here are the first few lines of the stop file used for the French task:

IAC = "\u0300-\u0345"
PST="'`"
STOPLIST =
a
à
afin
# 112 stop words not shown

The IAC line enables indexing of the specified accents (Unicode combining diacritical marks 0x0300-0x0345). Accent indexing was enabled for all runs except the Italian and English runs. Accents were known to be specified in the Italian queries but were not consistently used in the Italian documents. The PST line adds the specified characters (apostrophes in this case) to the list of word separators. The apostrophe was changed to a word separator for all submitted runs except the German and English runs; probably it would have made no difference to have also included it in the German runs. Note that the IAC syntax is new to SearchServer 5.0, and the interpretation of the PST line may differ from previous versions.

Into each table, we just needed to insert one row, specifying the top directory of the library files for the language, using an Insert statement such as the following:

INSERT INTO CLEF01DE ( FT_SFNAME, FT_FLIST )
  VALUES ('GERMAN', 'cTREC/E/d=128:s!nti/t=Win_1252_UCS2:cTREC/C/@:s');

To index each table, we just executed a Validate Index statement such as the following:

VALIDATE INDEX CLEF01DE VALIDATE TABLE
  TEMP_FILE_SIZE 2000000000 BUFFER 256000000;

The VALIDATE TABLE option of the VALIDATE INDEX statement causes SearchServer to review whether the contents of container rows, such as directory rows and library files, are correctly reflected in the table.
In this particular case, SearchServer initially validated the directory row by inserting each of its sub-directories and files into the table. Then SearchServer validated each of those directory and library file rows in turn, etc. Validating library file rows invoked the cTREC text reader in expansion mode to insert a row for each logical document in the library file, including its document id.

After validating the table, SearchServer indexed the table, in this case using up to 256MB of memory for sorting (as per the BUFFER parameter) and using temporary sort files of up to 2GB (as per the TEMP_FILE_SIZE parameter); no CLEF collection actually required a sort file that big. The index includes a dictionary of the distinct words (after some Unicode-based normalizations, such as converting to upper-case and decomposed form) and a reference file with the locations of the word occurrences. Additionally, by default, each distinct word is stemmed and enough information saved so that SearchServer can efficiently find all occurrences of any word which has a particular stem.

4 Search Techniques

The CLEF organizers created 50 "topics" and translated them into many languages. Each translated topic set was provided in a separate file (e.g. the German topics were in a file called "Top-de01.txt"). The topics were numbered from C041 to C090. Each topic contained a "Title" (subject of the topic), "Description" (a one-sentence specification of the information need) and "Narrative" (more detailed guidelines for what a relevant document should or should not contain). The participants were asked to use the Title and Description fields for at least one automatic submission per task this year to facilitate comparison of results.

We created an ODBC application, called QueryToRankings.c, based on the example stsample.c program included with SearchServer, to parse the CLEF topics files, construct and execute corresponding SearchSQL queries, fetch the top 1000 rows, and write out the rows in the results format requested by CLEF. SELECT statements were issued with the SQLExecDirect API call. Fetches were done with SQLFetch (typically 1000 SQLFetch calls per query).

4.1 Intuitive Searching

For all runs, we used SearchServer's Intuitive Searching, i.e. the IS_ABOUT predicate of SearchSQL, which accepts unstructured text. For example, for the German version of topic C041, the Title was "Pestizide in Babykost" (Pesticides in Baby Food), and the Description was "Berichte über Pestizide in Babynahrung sind gesucht" (Find reports on pesticides in baby food). A corresponding SearchSQL query would be:

SELECT RELEVANCE('V2:3') AS REL, DOCNO FROM CLEF01DE
  WHERE FT_TEXT IS_ABOUT 'Pestizide in Babykost Berichte über Pestizide in Babynahrung sind gesucht'
  ORDER BY REL DESC;

This query would create a working table with the 2 columns named in the SELECT clause: a REL column containing the relevance value of the row for the query, and a DOCNO column containing the document's identifier. The ORDER BY clause specifies that the most relevant rows should be listed first. The statement "SET MAX_SEARCH_ROWS 1000" was previously executed so that the working table would contain at most 1000 rows. A hypothetical sketch of this submit-and-fetch loop appears below.
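For illustration, here is a minimal ODBC/C sketch of submitting an Intuitive Searching query and fetching the top rows. It is not the actual QueryToRankings.c: the data source name "CLEF01DE", the empty credentials, the run tag "hum" and the exact output line format are assumptions, and error handling is reduced to a bare minimum.

/*
 * query_sketch.c -- illustrative ODBC/C sketch only; not the actual
 * QueryToRankings.c.  Assumes an ODBC data source named "CLEF01DE"
 * exists for the German table.
 */
#include <windows.h>
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int main(void)
{
    SQLHENV henv = SQL_NULL_HENV;
    SQLHDBC hdbc = SQL_NULL_HDBC;
    SQLHSTMT hstmt = SQL_NULL_HSTMT;
    SQLINTEGER rel;
    SQLCHAR docno[257];
    SQLLEN relInd, docnoInd;
    int rank = 0;

    SQLCHAR *sql = (SQLCHAR *)
        "SELECT RELEVANCE('V2:3') AS REL, DOCNO FROM CLEF01DE "
        "WHERE FT_TEXT IS_ABOUT 'Pestizide in Babykost Berichte über "
        "Pestizide in Babynahrung sind gesucht' ORDER BY REL DESC";

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
    SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);

    /* connect to a data source for the German table (DSN name assumed) */
    if (!SQL_SUCCEEDED(SQLConnect(hdbc, (SQLCHAR *) "CLEF01DE", SQL_NTS,
                                  (SQLCHAR *) "", SQL_NTS,
                                  (SQLCHAR *) "", SQL_NTS))) {
        fprintf(stderr, "connect failed\n");
        return 1;
    }
    SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);

    /* cap the working table at 1000 rows, then run the query */
    SQLExecDirect(hstmt, (SQLCHAR *) "SET MAX_SEARCH_ROWS 1000", SQL_NTS);
    SQLExecDirect(hstmt, sql, SQL_NTS);

    /* bind the two result columns and fetch row by row */
    SQLBindCol(hstmt, 1, SQL_C_SLONG, &rel, 0, &relInd);
    SQLBindCol(hstmt, 2, SQL_C_CHAR, docno, sizeof(docno), &docnoInd);
    while (rank < 1000 && SQL_SUCCEEDED(SQLFetch(hstmt))) {
        rank++;
        /* one line per retrieved document: topic, docno, rank, score
           (approximate results-file layout; run tag "hum" is made up) */
        printf("C041 Q0 %s %d %ld hum\n", docno, rank, (long) rel);
    }

    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
    SQLDisconnect(hdbc);
    SQLFreeHandle(SQL_HANDLE_DBC, hdbc);
    SQLFreeHandle(SQL_HANDLE_ENV, henv);
    return 0;
}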
4.2 Linguistic Expansion

SearchServer uses lexicon-based natural language processing technology to "stem" each distinct word to one or more base forms, called stems. For example, in English, "baby", "babied", "babies", "baby's" and "babying" all have "baby" as a stem. Compound words in languages such as German, Dutch and Finnish should produce multiple stems; e.g., in German, "babykost" has "baby" and "kost" as stems.

By default, Intuitive Searching stems each word in the query, counts the number of occurrences of each stem, and creates a vector. Optionally some stems are discarded (secondary term selection) if they have a high document frequency or to enforce a maximum number of stems, but we didn't discard any stems for our CLEF runs. The index is searched for documents containing terms which stem to any of the stems of the vector. Whether or not to stem, whether or not to allow multiple stems per term, and which language to use are controlled by the VECTOR_GENERATOR set option:

Language   Recommended VECTOR_GENERATOR (SearchServer 5.0)
German     'word!ftelp/lang=german/base | * | word!ftelp/lang=german/expand'
French     'word!ftelp/lang=french/base/single | * | word!ftelp/lang=french/expand'
Italian    'word!ftelp/lang=italian/base/single | * | word!ftelp/lang=italian/expand'
Spanish    'word!ftelp/lang=spanish/base/single | * | word!ftelp/lang=spanish/expand'
Dutch      'word!ftelp/lang=dutch/base | * | word!ftelp/lang=dutch/expand'
English    'word!ftelp/lang=english/base/single | * | word!ftelp/lang=english/expand'

Table 2: Recommended VECTOR_GENERATOR Settings (SearchServer 5.0)

The above settings were the ones used for the submitted runs, and subsequent experiments (below) confirmed they were the best ones. The main issue was whether to specify "/base" or "/base/single" in the first part of the VECTOR_GENERATOR. Experiments on last year's collections found that "/base" worked better for German and "/base/single" worked better for French and Italian. For languages with compound words, such as German, "/base" is necessary for all components of the compound words to be included. For languages with few compound words, in the rare cases when multiple stems are returned, they often are spurious; e.g. in English, for "Acknowledgements", the 2 stems currently returned are "acknowledgement" and "acknowledgment", which appear to be just alternate spellings, so keeping both stems would improperly double-weight the term, hurting ranking. Based on this reasoning, we assumed "/base" would be better for Dutch (because Dutch contains many compound words) and "/base/single" would be better for Spanish (because it does not contain many compound words); again, the experiments below confirmed these were the best choices.

Besides linguistic expansion, we did not do any other kinds of query expansion. For example, we did not use approximate text searching for spell-correction because the queries were believed to be spelled correctly. We did not use row expansion or any other kind of blind feedback technique.

4.3 Statistical Relevance Ranking

SearchServer calculates a relevance value for a row of a table with respect to a vector of stems based on several statistics. The inverse document frequency of the stem is estimated from information in the dictionary. The term frequency (the number of occurrences of the stem in the row, including any term that stems to it) is determined from the reference file. The length of the row (based on the number of indexed characters in all columns of the row, which is typically dominated by the external document) is optionally incorporated. The already-mentioned count of the stem in the vector is also used.
SearchServer synthesizes this information into a relevance value in a manner similar to [4] (particularly the Okapi approach to term frequency dampening) and also shares some elements of [5]. SearchServer's relevance values are always an integer in the range 0 to 1000. SearchServer's RELEVANCE_METHOD setting can be used to optionally square the importance of the inverse document frequency (by choosing a RELEVANCE_METHOD of 'V2:4' instead of 'V2:3'). SearchServer's RELEVANCE_DLEN_IMP parameter controls the importance of document length (scale of 0 to 1000) to the ranking.

Because the evaluation program (trec_eval) may re-order rows with the same relevance value, we post-process the results files by adding a descending fraction (0.999, 0.998, 0.997, etc.) to the relevance values of the rows for each topic to ensure that the order of evaluation is identical to the order in which SearchServer returned the documents.

4.4 Query Stop Words

Our QueryToRankings program removed words such as "find", "relevant" and "document" from the topics before presenting them to SearchServer, i.e. words which are not stop words in general but were commonly used in the topics as general instructions. The lists for the CLEF languages were developed by examining the CLEF 2000 topics (not this year's topics). After receiving the relevance assessments this year, we did an experiment, and this step had only a minor benefit; the average precision increased by just 1% or 2% in all languages (in absolute terms, from 0.0031 in Italian to 0.0107 in French). It doesn't appear to be important to comb the old topics files for potential query stop words.

5 Results

Below we present an analysis of our results, including results of some unofficial "diagnostic" runs. We look at the following evaluation measures:

Precision is the percentage of retrieved documents which are relevant. Precision@n is the precision after n documents have been retrieved. Average precision for a topic is the average of the precision after each relevant document is retrieved (using zero as the precision for relevant documents which are not retrieved). Recall is the percentage of relevant documents which have been retrieved. Interpolated precision at a particular recall level for a topic is the maximum precision achieved for the topic at that or any higher recall level. For a set of topics, each measure is the average of the measure for each topic (i.e. all topics are weighted equally).

The Monolingual Information Retrieval tasks were to run 50 queries against document collections in the same language and submit a list of the top-1000 ranked documents to CLEF for judging (in June 2001). The 5 languages were German, French, Italian, Spanish and Dutch. CLEF produced a "qrels" file for each of the 5 tasks: a list of documents judged to be relevant or not relevant for each topic. From these, the evaluation measures were calculated with Chris Buckley's trec_eval program.

Additionally, the CLEF organizers translated the topics into many more languages, including English, and also provided a comparable English document collection for use in the multilingual task. By grepping the English results out of the multilingual qrels, we were able to produce a comparable monolingual English test collection for diagnostic runs.

For some topics and languages, no documents were judged relevant. The precision scores are averaged over only the topics for which at least one document was judged relevant. An illustrative sketch of the average precision computation is given below.
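As a concrete illustration of the average precision measure defined above, here is a small C sketch. It is hypothetical code (not part of trec_eval): it takes the relevance flags of a ranked list in rank order plus the total number of relevant documents for the topic, and averages the precision at each relevant retrieved document, with unretrieved relevant documents contributing zero.

#include <stdio.h>

/* Average precision for one topic: rel[i] is 1 if the document at rank i+1
 * is relevant, 0 otherwise; num_retrieved is the length of the ranked list;
 * num_relevant is the total number of relevant documents for the topic
 * (relevant documents that were never retrieved contribute a precision of 0). */
static double average_precision(const int *rel, int num_retrieved, int num_relevant)
{
    int i, relevant_so_far = 0;
    double sum = 0.0;

    if (num_relevant <= 0)
        return 0.0;                    /* topic has no relevant documents */

    for (i = 0; i < num_retrieved; i++) {
        if (rel[i]) {
            relevant_so_far++;
            sum += (double) relevant_so_far / (double) (i + 1);
        }
    }
    return sum / (double) num_relevant;
}

int main(void)
{
    /* toy ranked list: relevant documents at ranks 1, 3 and 6; suppose the
     * topic has 4 relevant documents in total, so one was never retrieved */
    int rel[] = { 1, 0, 1, 0, 0, 1 };
    printf("AvgP = %.4f\n", average_precision(rel, 6, 4));
    /* precisions at the relevant ranks: 1/1, 2/3, 3/6; divided by 4 -> 0.5417 */
    return 0;
}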
When comparing two runs and finding that one run had a higher score on more topics than the other, we would like to know how likely it is that this result could have happened by chance. For each result, we use the sign test to compute the significance level (also known as the p-value) of the result in relation to the hypothesis that, when the two runs differ in their average precision score on a topic (to the 4th decimal place), they are equally likely to have the higher score. If the significance level is 1% or less, then we can say the hypothesis is contradicted by the result at the 1% level and consider the result to be statistically significant, i.e. this result is unlikely to have happened if the runs were really using equally viable approaches regarding the average precision measure.

The computation of the significance level is straightforward. Let f(x,n) = (n!/((n-x)!x!))*(0.5^n), which is the probability of x successes in n trials when the probability of success is 0.5. Let X1 be the number of topics on which run 1 scored higher, and let X2 be the number of topics on which run 2 scored higher. Let n = X1 + X2. Let Xmin = min(X1, X2). If X1 <> X2, then the significance level is 2*(f(0,n) + f(1,n) + ... + f(Xmin,n)). If X1 == X2, then the significance level is 100%. Example: if in 50 topics, one run scores higher in 34, lower in 15, and ties in 1, then the significance level is 2*(f(0,49) + f(1,49) + ... + f(15,49)), which is 0.9%. A small code sketch of this computation is given below.
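For illustration, here is a small C sketch of this two-sided sign test computation. It is hypothetical code, not the program actually used; the binomial term is accumulated in double precision, which is accurate enough for topic counts of this size.

#include <stdio.h>

/* Probability of exactly x successes in n trials with success probability 0.5,
 * i.e. f(x,n) = C(n,x) * 0.5^n, computed iteratively to avoid large factorials. */
static double f(int x, int n)
{
    double p = 1.0;
    int i;
    for (i = 0; i < n; i++)
        p *= 0.5;                                   /* 0.5^n */
    for (i = 1; i <= x; i++)
        p = p * (double) (n - i + 1) / (double) i;  /* times C(n,x) */
    return p;
}

/* Two-sided sign test significance level: x1 and x2 are the numbers of topics
 * on which run 1 and run 2 scored higher, respectively (ties are excluded). */
static double significance_level(int x1, int x2)
{
    int n = x1 + x2;
    int xmin = (x1 < x2) ? x1 : x2;
    double sum = 0.0;
    int x;

    if (x1 == x2)
        return 1.0;                                 /* 100% */
    for (x = 0; x <= xmin; x++)
        sum += f(x, n);
    return 2.0 * sum;
}

int main(void)
{
    /* the example from the text: 34 higher, 15 lower, 1 tie -> about 0.9% */
    printf("significance level = %.4f\n", significance_level(34, 15));
    return 0;
}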
5.1 Submitted runs

Tables 3.1 to 3.5 show the precision scores of our submitted runs for each language of the Monolingual Information Retrieval task. The CLEF organizers have also given the participants the median average precision scores on each topic for each language. We show the number of topics on which our runs scored higher, lower and tied (to 4 decimal places) with the median in average precision, and compute the significance level. For each language, at least one SearchServer run had more average precision scores above the median than below. The submitted German, Dutch and Spanish runs had a significance level of less than 1%.

All submitted runs used both the Title and Description fields. Runs humDE01, humFR01, humIT01, humES01 and humNL01 used relevance method 'V2:3' and RELEVANCE_DLEN_IMP 500; these settings typically worked best on last year's collections. Runs humDE01x, humFR01x, humIT01x, humES01x and humNL01x used relevance method 'V2:4' and RELEVANCE_DLEN_IMP 750, which worked well on the TREC-9 Main Web Task last year [6]. After receiving the relevance assessments, preliminary experiments suggest that a combination of 'V2:3' and 750 would have been best for most languages this year, but it makes little difference.

Run        AvgP    P@5    P@10   P@20   Rec0    Rec30   vs Median  SigL
humDE01    0.4403  54.7%  49.8%  44.6%  0.7836  0.5377  39-8-2     0.0%
humDE01x   0.4474  56.3%  51.4%  44.8%  0.8206  0.5521  36-9-4     0.0%
Median     0.3660  n/a    n/a    n/a    n/a     n/a     0-0-49     -

Table 3.1: Precision of Submitted German runs

Run        AvgP    P@5    P@10   P@20   Rec0    Rec30   vs Median  SigL
humFR01    0.4825  55.5%  44.3%  36.3%  0.8149  0.5936  28-14-7    4.4%
humFR01x   0.4789  49.8%  42.2%  35.4%  0.7788  0.5811  27-18-4    23%
Median     0.4635  n/a    n/a    n/a    n/a     n/a     0-0-49     -

Table 3.2: Precision of Submitted French runs

Run        AvgP    P@5    P@10   P@20   Rec0    Rec30   vs Median  SigL
humIT01    0.4555  54.0%  49.4%  41.2%  0.7830  0.5727  23-20-4    76%
humIT01x   0.4332  50.6%  46.2%  39.8%  0.7069  0.5421  15-30-2    2.6%
Median     0.4578  n/a    n/a    n/a    n/a     n/a     0-0-47     -

Table 3.3: Precision of Submitted Italian runs

Run        AvgP    P@5    P@10   P@20   Rec0    Rec30   vs Median  SigL
humES01    0.5378  68.6%  60.8%  52.8%  0.8708  0.6644  35-7-7     0.0%
humES01x   0.5363  68.2%  60.0%  51.1%  0.8514  0.6681  38-7-4     0.0%
Median     0.4976  n/a    n/a    n/a    n/a     n/a     0-0-49     -

Table 3.4: Precision of Submitted Spanish runs

Run        AvgP    P@5    P@10   P@20   Rec0    Rec30   vs Median  SigL
humNL01    0.3844  52.0%  41.8%  34.1%  0.7558  0.5145  39-6-5     0.0%
humNL01x   0.3831  50.4%  45.4%  33.7%  0.7504  0.4940  37-10-3    0.0%
Median     0.2986  n/a    n/a    n/a    n/a     n/a     0-0-50     -

Table 3.5: Precision of Submitted Dutch runs

Glossary:
AvgP: Average Precision (defined above)
P@5, P@10, P@20: Precision after 5, 10 and 20 documents retrieved, respectively
Rec0, Rec30: Interpolated Precision at 0% and 30% Recall, respectively
vs Median: Number of topics on which the run scored higher, lower and equal (respectively) to the median average precision (to 4 decimal places)
SigL: Significance Level: the probability of a result at least as extreme (vs Median), assuming it is equally likely that a differing score will be higher or lower

5.2 Impact of Linguistic Expansion

To measure the benefits of SearchServer's linguistic technology, Tables 4.1 to 4.6 show runs which were done with a more recent SearchServer build in August 2001. For each language, the runs vary in their VECTOR_GENERATOR setting. The first run set VECTOR_GENERATOR to the empty string, which disables linguistic processing. The second run set /base but not /single in the first part of the VECTOR_GENERATOR, which allowed all stems for a term to be added to the vector. The third run set /base and /single in the first part of the VECTOR_GENERATOR, allowing only a single stem to be added to the vector for each term. A hypothetical sketch of issuing such a setting is given below.
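The exact statement syntax for changing this option is not shown in this paper; assuming it follows the same SET form as the "SET MAX_SEARCH_ROWS 1000" statement shown earlier, the three German configurations might be issued over ODBC roughly as in the following C snippet. This is a hedged sketch only: the SET VECTOR_GENERATOR syntax and the German "/base/single" value (constructed by analogy with Table 2) are assumptions.

#include <windows.h>
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

/* hstmt is assumed to be an allocated ODBC statement handle on a connection
 * to the German table, as in the earlier query sketch */
static void set_vector_generator(SQLHSTMT hstmt, const char *value)
{
    char stmt[512];
    /* assumed syntax, by analogy with "SET MAX_SEARCH_ROWS 1000" */
    sprintf(stmt, "SET VECTOR_GENERATOR '%s'", value);
    SQLExecDirect(hstmt, (SQLCHAR *) stmt, SQL_NTS);
}

/* The three German configurations compared in Table 4.1 would then be:
 * 1. linguistics disabled:
 *      set_vector_generator(hstmt, "");
 * 2. /base (all stems kept):
 *      set_vector_generator(hstmt,
 *        "word!ftelp/lang=german/base | * | word!ftelp/lang=german/expand");
 * 3. /base/single (one stem per term; value assumed by analogy with Table 2):
 *      set_vector_generator(hstmt,
 *        "word!ftelp/lang=german/base/single | * | word!ftelp/lang=german/expand");
 */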
For all these runs, both the Title and Description were used, the relevance method was 'V2:3' and the RELEVANCE_DLEN_IMP setting was 750:

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
DE: no ling       0.3269  46.1%  43.7%  36.4%  0.7102  0.4038  -           -
DE: /base         0.4669  58.0%  52.4%  47.4%  0.8186  0.5699  42-6-1      0.0%
DE: /base/single  0.3843  49.8%  45.7%  38.3%  0.7489  0.4957  35-13-1     0.2%

Table 4.1: Impact of Linguistic Expansion in German

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
FR: no ling       0.4114  46.9%  38.0%  33.0%  0.7301  0.5047  -           -
FR: /base         0.4712  49.8%  42.7%  35.5%  0.7991  0.5776  27-18-4     23%
FR: /base/single  0.4855  52.2%  43.1%  36.2%  0.8121  0.5927  28-17-4     14%

Table 4.2: Impact of Linguistic Expansion in French

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
IT: no ling       0.3900  48.1%  43.0%  35.5%  0.7076  0.4729  -           -
IT: /base         0.4309  50.6%  48.5%  40.1%  0.7401  0.5422  25-19-3     29%
IT: /base/single  0.4514  50.6%  48.5%  40.9%  0.7456  0.5553  28-16-3     9.6%

Table 4.3: Impact of Linguistic Expansion in Italian

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
ES: no ling       0.4848  63.3%  55.9%  47.1%  0.8117  0.6154  -           -
ES: /base         0.5316  68.2%  60.2%  51.0%  0.8621  0.6608  28-18-3     18%
ES: /base/single  0.5429  70.6%  61.6%  51.9%  0.8842  0.6726  33-13-3     0.5%

Table 4.4: Impact of Linguistic Expansion in Spanish

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
NL: no ling       0.3189  42.4%  36.8%  28.0%  0.6603  0.4197  -           -
NL: /base         0.4144  54.0%  45.6%  35.1%  0.7939  0.5313  37-10-3     0.0%
NL: /base/single  0.3700  49.2%  40.8%  32.6%  0.7349  0.4912  31-16-3     4.0%

Table 4.5: Impact of Linguistic Expansion in Dutch

Run               AvgP    P@5    P@10   P@20   Rec0    Rec30   vs no ling  SigL
EN: no ling       0.4732  48.9%  39.1%  29.7%  0.7765  0.5959  -           -
EN: /base         0.5245  52.8%  41.5%  31.6%  0.8112  0.7026  26-13-8     5.3%
EN: /base/single  0.5317  52.8%  41.7%  31.7%  0.8213  0.7029  26-13-8     5.3%

Table 4.6: Impact of Linguistic Expansion in English

Impact of Linguistic Expansion: For all 6 investigated languages (German, French, Italian, Spanish, Dutch and English), SearchServer's linguistic expansion was found to boost all investigated precision scores, including average precision and Precision@5. For 3 languages, German, Dutch and Spanish, the results (of comparing the number of topics of higher and lower average precision with linguistics enabled and disabled) had a significance level of less than 1%.

Table 5 summarizes the percentage increases in the precision scores using the recommended VECTOR_GENERATOR setting compared to disabling linguistics. It appears that German and Dutch are the biggest beneficiaries of linguistic expansion. The percentage changes for French and Italian were generally larger than for Spanish, even though more topics were helped in Spanish:

Language  AvgP  P@5   P@10  P@20  Rec0  Rec30  Range
German    +43%  +26%  +20%  +30%  +15%  +41%   15-43%
Dutch     +30%  +27%  +24%  +25%  +20%  +27%   20-30%
French    +18%  +11%  +13%  +10%  +11%  +17%   10-18%
Italian   +16%  +5%   +13%  +15%  +5%   +17%   5-17%
Spanish   +12%  +12%  +10%  +10%  +9%   +9%    9-12%
English   +12%  +8%   +7%   +7%   +6%   +18%   6-18%

Table 5: Percentage Increase from Linguistic Expansion, Descending Order by Average Precision
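As a cross-check of how Table 5 is derived from Tables 4.1 to 4.6, the following small C program (illustrative only) recomputes the German row from the Table 4.1 scores of the "no ling" run and the recommended "/base" run.

#include <stdio.h>

int main(void)
{
    /* German scores from Table 4.1: no linguistics vs. the recommended
     * /base setting (AvgP, P@5, P@10, P@20, Rec0, Rec30) */
    const char *measure[] = { "AvgP", "P@5", "P@10", "P@20", "Rec0", "Rec30" };
    double noling[] = { 0.3269, 0.461, 0.437, 0.364, 0.7102, 0.4038 };
    double base[]   = { 0.4669, 0.580, 0.524, 0.474, 0.8186, 0.5699 };
    int i;

    for (i = 0; i < 6; i++) {
        double pct = 100.0 * (base[i] - noling[i]) / noling[i];
        /* prints +43%, +26%, +20%, +30%, +15%, +41%, matching the German
         * row of Table 5 after rounding to whole percentages */
        printf("%-6s +%.0f%%\n", measure[i], pct);
    }
    return 0;
}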
References

[1] Cross-Language Evaluation Forum web site. http://www.clef-campaign.org/

[2] Martin Braschler. CLEF 2000 Result Overview. Slides of presentation at CLEF 2000 Workshop. http://www.iei.pi.cnr.it/DELOS/CLEF/CLEF_OVE.pdf

[3] Andrew Hodgson. Converting the Fulcrum Search Engine to Unicode. In Sixteenth International Unicode Conference, Amsterdam, The Netherlands, March 2000.

[4] S.E. Robertson, S. Walker, S. Jones, M.M. Hancock-Beaulieu, M. Gatford. (City University.) Okapi at TREC-3. In D.K. Harman, editor, Overview of the Third Text REtrieval Conference (TREC-3). NIST Special Publication 500-226. http://trec.nist.gov/pubs/trec3/t3_proceedings.html

[5] Amit Singhal, John Choi, Donald Hindle, David Lewis and Fernando Pereira. AT&T at TREC-7. In E.M. Voorhees and D.K. Harman, editors, Proceedings of the Seventh Text REtrieval Conference (TREC-7). NIST Special Publication 500-242. http://trec.nist.gov/pubs/trec7/t7_proceedings.html

[6] Stephen Tomlinson and Tom Blackwell. Hummingbird's Fulcrum SearchServer at TREC-9. To appear in E.M. Voorhees and D.K. Harman, editors, Proceedings of the Ninth Text REtrieval Conference (TREC-9). NIST Special Publication 500-xxx. http://trec.nist.gov/pubs/trec9/t9_proceedings.html

____________________________
Hummingbird, Fulcrum, SearchServer, SearchSQL and Intuitive Searching are the intellectual property of Hummingbird Ltd. All other company and product names are trademarks of their respective owners.