Report from TREC-9
Donna Harman, Ellen Voorhees
Retrieval Group
Information Access Division
National Institute of Standards and Technology

TREC Tasks
[Figure: timeline of TREC tasks by year, 1992-2001]
• Static text: Ad Hoc
• Streamed text: Routing, Filtering
• Human-in-the-loop: Interactive
• Beyond just English: Spanish, Chinese, X → Euro., Chin., Arab.
• Beyond text: OCR, Speech, Video
• Web searching: Very large collection, Web
• Answers, not documents: Q&A

TREC-9 Tracks
• Cross-language (English to Chinese)
• Filtering
• Interactive
• Query
• Question Answering
• Spoken Document Retrieval
• Web

Cross Language Track
• Task: ad hoc search for documents written in one language using topics in another language
  – 25 topics in English created by bilingual assessors; Chinese version also available
  – 126,937 documents; 188 MB in BIG5
  – Hong Kong newspapers donated by Wiser Ltd.
    • Hong Kong Commercial Daily (Aug 98-Jul 99)
    • Hong Kong Daily News (Feb 99-Jul 99)
    • Takungpao (Oct 98-Mar 99)

Relevance Judgments
• Judged highest priority mono- and cross-lingual run from each group
  – 39 cross (75%) / 13 mono (25%)
  – 51 auto / 1 manual (Thank you, Berkeley!)
• Added top 50 documents from each judged run to the pool
• Mean actual pool size = 598 (39% of max), within expected range

% Contributions to Pool by Run Type (Relevant documents)
[Figure: pie chart with labels Monolingual and Crosslingual; values shown: 59, 28, 13]

Participants
BBN Technologies
Fudan University
IBM T.J. Watson Research Center
Johns Hopkins University
Korea Advanced Institute of Science and Technology
Microsoft Research, China
MNIS-TextWise Labs
National Taiwan University

More participants
Queens College, CUNY
RMIT University
Telcordia Technologies, Inc.
The Chinese University of Hong Kong
Trans-EZ Inc.
University of California at Berkeley
University of Maryland
University of Massachusetts

Resources: dictionaries/word lists
– LDC English-Mandarin word list (~120,000 pairs)
– Chinese-English Translation Assistance (CETA) dictionary
– KingSoft online bilingual dictionary
– WordNet
– other local (proprietary) dictionaries

Resources: software & services
• MT
  – HuaJian MT system
  – IBM AlphaWorks translation server
  – Alis Gist-in-Time MT system
• English analysis
  – InXight LinguistX (English linguistic analysis)
  – Apple Pie parser
  – Brill’s POS tagger
• Chinese analysis/conversion
  – Various Chinese segmenters (e.g., NMSU’s ch_seg)
  – BIG5->GB converters (e.g., NJStar’s)
• Miscellaneous
  – CMU’s WEAVER translation-pair extraction
  – Yahoo search

English to Chinese Results
[Figure: recall-precision curves (x-axis: Recall, y-axis: Precision, both 0 to 1) for runs BBN9XLA, msrcn1, fdut9xl2, CHUHK00XEC1, pir0XHxD, INQ7XL3, ibmcl9a, KAIST9xlqm]

Cross-language vs. Monolingual
[Figure: recall-precision curves (x-axis: Recall, y-axis: Precision, both 0 to 1) for runs BBN9MONO, BBN9XLA, msrcn3, msrcn1, pir0Xori, pir0XHxD, ibmcl9m, ibmcl9a]

Average Precision by Topic: Crosslingual
[Figure: average precision (0 to 1) per topic, topics 55-79]

What was learned from the Chinese CLIR track?
• Many approaches to English-to-Chinese topic translation, including use of various dictionaries, word lists, parallel text, and commercial MT systems
• Extensive set of Chinese retrieval experiments performed, ranging from various n-gram methods to word-based methods to complete language modeling
• Because of the tight focus of this track, cross-system comparison is possible

TREC 2001
trec.nist.gov
• Cross language
  – Chinese → NTCIR workshop (NII, Japan)
  – TREC task will be English, French → Arabic
• Filtering track using new Reuters corpus
• Interactive track to investigate live web
• Expanded web and QA tracks
• New video track
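
The pooling step described under Relevance Judgments (the top 50 documents from each judged run are added to the pool) can be sketched as below. This is a minimal illustration only, not NIST's actual tooling: the function names `build_pool` and `pool_fill_ratio`, the run names, and the document IDs are all hypothetical, and each run is assumed to be a ranked list of document IDs for a single topic.

```python
from typing import Dict, List, Set

POOL_DEPTH = 50  # depth stated on the slide: top 50 documents per judged run


def build_pool(runs: Dict[str, List[str]], depth: int = POOL_DEPTH) -> Set[str]:
    """Return the union of the top-`depth` document IDs over all judged runs."""
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


def pool_fill_ratio(pool: Set[str], n_runs: int, depth: int = POOL_DEPTH) -> float:
    """Actual pool size as a fraction of the maximum (n_runs * depth, no overlap)."""
    return len(pool) / (n_runs * depth)


# Hypothetical toy data for one topic (three runs, pooled to depth 3).
toy_runs = {
    "runA": ["d1", "d2", "d3", "d4"],
    "runB": ["d2", "d3", "d5", "d6"],
    "runC": ["d1", "d5", "d7", "d8"],
}
toy_pool = build_pool(toy_runs, depth=3)
toy_ratio = pool_fill_ratio(toy_pool, n_runs=len(toy_runs), depth=3)
```

Overlap between runs is what keeps the actual pool below its maximum size; the slides report a mean pool of 598 documents, 39% of the maximum.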