<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLIR System Evaluation at NTCIR Workshops</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Noriko Kando</string-name>
          <email>kando@nii.ac.jp</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NTCIR: NII-NACSIS Test Collections for Information Retrieval and Text Processing</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces the NTCIR Workshop, a series of evaluation workshops designed to enhance research in information retrieval and related text processing techniques, such as summarization and extraction, by providing large-scale test collections and a forum for researchers. A brief history, the tasks, the participants, the test collections, CLIR evaluation at the workshops, and the plan for the next workshop are described in this paper. To conclude, some thoughts on future directions are suggested. The purposes of the NTCIR Workshop [1] are the following: 1. to encourage research in information retrieval (IR) and related text processing technologies, including term recognition and summarization, by providing large-scale reusable test collections and a common evaluation setting that allows cross-system comparisons; 2. to provide a forum for research groups interested in comparing results and exchanging ideas or opinions in an informal atmosphere; 3. to investigate methods for constructing test collections or data sets usable for experiments, and methods for laboratory-type testing of IR and related technologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        We call the whole process, from the data
distribution to the final meeting, the "NTCIR Workshop",
since we have placed emphasis on the interaction
among participants and on the experience gained as
all participants learn from each other's experience.
The first NTCIR Workshop started with the
distribution of the training data set on 1 November
1998, and ended with the workshop meeting, which
was held on 30 August - 1 September 1999 in
Tokyo, Japan [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Many interesting papers with
various approaches were presented at the meeting.
The third day of the meeting was organized as an
open session. The IREX Workshop [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], another evaluation workshop on information
retrieval and information extraction (named entities)
using Japanese newspaper articles, was held
consecutively. IREX and NTCIR joined in 2000 and
have worked together to organize the NTCIR
Workshop. New tasks became feasible with this
collaboration.
      </p>
      <p>International collaboration to organize Asian-language
IR evaluation was proposed at a workshop held in
November 1999 in Taipei, Taiwan. According to
the proposal, the Chinese text retrieval tasks are
organized by Hsin-Hsi Chen and Kuang-hua Chen of
National Taiwan University at the second workshop,
and CLIR of Asian languages is organized at the
third workshop.</p>
      <p>In terms of organization, the first and
second workshops were co-sponsored by the
National Institute of Informatics (NII, formerly the
National Center for Science Information Systems,
NACSIS) and the Japan Society for the Promotion of
Science (JSPS) as part of a JSPS Research for the
Future program (JSPS-RFTF 96P00602). After the first
workshop, NACSIS was reorganized and changed its
name to the NII in April 2000. At the same time, the
RCIR, a permanent host of the NTCIR Project, was
launched by the NII. The third workshop will be
sponsored by the RCIR at the NII.</p>
      <p>
        From the second workshop [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], tasks have been
proposed and organized by separate groups outside
of the NII. This venture added a variety of tasks to
the NTCIR Workshop and, as a result, attracted
participants from various groups.
      </p>
      <p>From the beginning of the NTCIR project, we have
focused on two directions of investigation, i.e. (1)
traditional laboratory-type text retrieval system
testing, and (2) challenging issues.</p>
      <p>For the former, we have placed emphasis on
retrieval with Japanese and other Asian languages
and cross-lingual information retrieval (CLIR).
Indexing texts written in Japanese or other East
Asian languages, such as Chinese, is quite different
from indexing texts in English, French or other
European languages since there is no explicit
boundary (i.e., no space) between words in a
sentence. CLIR is critical in the Internet
environment, especially between languages with
completely different origins and structure, such as
English and Japanese.</p>
      <p>
        Moreover, in scientific texts or everyday-life
documents, for example Web documents, in East
Asian languages, foreign language terms often
appear in the native language texts both in their
original spelling and in transliterated forms. To
overcome the word mismatch that may be caused by
such expression variance, cross-linguistic strategies
are needed for even the monolingual retrieval of
documents of this type [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Traditionally, IR has meant the technology that
retrieves documents from a huge document
collection and produces a ranked list of the retrieved
documents in order of their likelihood of
relevance. However, retrieving documents that may
contain relevant information is not all that the user
may require, and the information in the documents is
not always immediately usable. Research is needed
on techniques that help make the information in
documents more usable, for example by pinpointing
the answer passages in the documents or by
summarization, and on the appropriate evaluation
methods for such techniques.</p>
      <p>Each document genre has its own characteristics
and usage patterns, and the criteria determining a
"successful search" may vary accordingly, although
traditional IR research has looked at generalized
systems which can handle any kind of document
based on the generalized criteria of "successful
search". For example, Web document retrieval has
different characteristics from those of newspaper or
patent retrieval, both with respect to the nature of the
document itself and the way it is used. We have been
interested in the appropriate evaluation methods for
each document genre as well as generalized ones.</p>
      <p>In the next section we outline the previous
workshops. Section 3 describes the test collections
used, and Section 4 reports the results. Section 5
introduces the tasks for the third workshop and
discusses some thoughts on future directions.</p>
      <sec id="sec-1-1">
        <title>2. The Previous NTCIR Workshops</title>
        <p>This section outlines the previous NTCIR
Workshops.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Tasks and Participants</title>
        <p>Each participant conducted one or more of the
following tasks at the workshop.
Ad Hoc Information Retrieval Task: to investigate
the retrieval performance of systems that search a
static set of documents using new search
topics (J&gt;JE).
Cross-Lingual Information Retrieval Task: an ad
hoc task in which the documents are in English
and the topics are in Japanese (J&gt;E).</p>
        <p>Term Recognition Task: (1) to extract terms from
titles and abstracts of documents, and (2) to identify
the terms representing the "object", "method", and
"main operation" of the main topic of each
document.</p>
        <p>The test collection NTCIR-1 was used in these
three tasks. In the Ad Hoc Information Retrieval
Task, a document collection containing Japanese,
English and Japanese-English paired documents is
retrieved with Japanese search topics. In Japan,
document collections often naturally consist of such
a mixture of Japanese and English. Therefore, the Ad
Hoc IR Task at the NTCIR Workshop 1 was
substantially CLIR, though some of the participating
groups discarded the English part and did the task as
Japanese monolingual IR.
At the second workshop the tasks were the following.
Chinese Text Retrieval (CHTR) tasks: including
English-Chinese CLIR (ECIR; E&gt;C) and Chinese
monolingual IR (CHIR; C&gt;C) using the test
collection CIRB010, consisting of newspaper
articles from five newspapers in Taiwan R.O.C.
Japanese and English IR (JEIR) tasks: using the test
collections NTCIR-1 and -2, including
monolingual retrieval of Japanese and English
(J&gt;J, E&gt;E) and CLIR of Japanese and English
(J&gt;E, E&gt;J, J&gt;JE, E&gt;JE).
Text Summarization Challenge (TSC): text summarization of
Japanese newspaper articles of various kinds. The
NTCIR-2 Summ Collection was used.</p>
        <p>Each task has been proposed and organized by a
different research group in a rather independent
way, while keeping in close contact and discussion
with the NTCIR Project organizing group headed by
the author. How to evaluate and what should be
evaluated have been thoroughly discussed in a
discussion group.</p>
        <p>Below is the list of active participating groups that
submitted task results. Thirty-one groups enrolled to
participate in the first NTCIR Workshop. Of these,
twenty-eight groups enrolled in the IR tasks (23
in the Ad Hoc Task and 16 in the Cross-Lingual
Task), and nine in the Term Recognition Task.
Twenty-eight groups from six countries submitted
results. Two groups worked without any Japanese
language expertise.</p>
        <p>Communications Research Laboratory (Japan),
Fuji Xerox (Japan), Fujitsu Laboratories (Japan),
Central Research Laboratory, Hitachi
Co.(Japan), JUSTSYSTEM Corp. (Japan),
Kanagawa Univ. (2) (Japan),
KAIST/KORTERM (Korea), Manchester
Metropolitan Univ. (UK), Matsushita Electric
Industrial (Japan), NACSIS (Japan), National
Taiwan Univ.(Taiwan ROC), NEC (2) (Japan),
NTT (Japan), RMIT &amp; CSIRO (Australia),
Tokyo Univ. of Technology (Japan), Toshiba
(Japan), Toyohashi Univ. of Technology (Japan),
Univ. of California Berkeley (US), Univ. of Lib.
and Inf. Science (Tsukuba, Japan), Univ. of
Maryland (US), Univ. of Tokushima (Japan),
Univ. of Tokyo (Japan), Univ. of Tsukuba
(Japan), Yokohama National Univ.(Japan),
Waseda Univ.(Japan)
As shown in Table 1, 45 groups from eight
countries registered for the Second NTCIR
Workshop and 36 groups submitted results. Among
the above, four groups submitted results to both
CHTR and JEIR, and three groups submitted results
to both JEIR and TSC, and one group did all three
tasks. Table 2 shows the distribution of the attribute
of each participating group across the tasks.</p>
        <p>ATT Labs &amp; Duke Univ. (US), Communications
Research Laboratory (Japan), Fuji Xerox
(Japan), Fujitsu Laboratories (Japan), Fujitsu
R&amp;D Center (China), Central Research
Laboratory, Hitachi Co. (Japan), Hong Kong
Polytechnic (Hong Kong, China), Institute of
Software, Chinese Academy of Sciences (China),
Johns Hopkins Univ. (US), JUSTSYSTEM Corp.
(Japan), Kanagawa Univ. (Japan), Korea
Advanced Institute of Science and Technology
(KAIST/KORTERM) (Korea), Matsushita
Electric Industrial (Japan), National Tsing Hua
Univ. (Taiwan, ROC), NEC Media Research
Laboratories (Japan), National Institute of
Informatics (Japan), NTT-CS &amp; NAIST (Japan),
OASIS, Aizu Univ. (Japan), Osaka Kyoiku Univ.
(Japan), Queens College, City Univ. of New York
(US), Ricoh Co. (2) (Japan), Surugadai Univ.
(Japan), Trans EZ Co. (Taiwan ROC), Toyohashi
Univ. of Technology (2) (Japan), Univ. of</p>
      </sec>
      <sec id="sec-1-12">
        <title>Participation Across Tasks</title>
        <p>[Table 2: the distribution of participating groups
across the tasks and subtasks: CHTR (CHIR, ECIR),
JEIR (monolingual J-J and E-E; CLIR J-E, E-J,
J-JE and E-JE), and TSC (A: extrinsic, B:
intrinsic).]</p>
        <p>Among them, four groups participated in JEIR
without any Japanese language expertise. Many
groups could not submit results (more precisely,
could not conduct the task) in the TSC because they
could not obtain the document data.
Of the 18 participants in the Ad Hoc IR of Japanese
and English documents at the first workshop, 10
groups participated in the equivalent tasks at the
second workshop, i.e., the JEIR monolingual IR tasks,
or added participating tasks; one changed to
JEIR CLIR; one changed to TSC; and six did
not participate.</p>
        <p>Among 10 CLIR participants at the first
workshop: six continued to participate in the
equivalent task, i.e., JEIR-CLIR; two groups
changed the tasks to CHTR; and two changed to
TSC.</p>
        <p>Among nine participating groups in the Term
Recognition Task at the first workshop: six changed
tasks to JEIR; two changed to TSC; and two did not
participate in the second workshop.</p>
        <p>[Figure 1: the number of participating groups per
task (Term Extraction, CHTR, CLIR/JEIR-CLIR,
Ad Hoc/JEIR-mono, TSC) at the first (ntcir-ws1)
and second (ntcir-ws2) NTCIR Workshops.]</p>
        <p>Of the eight groups from the first workshop that
did not participate in the second workshop, six are
from Japanese universities, one is from a Japanese
company and one is from a university in the UK.</p>
        <p>Among the participants of CHTR, JEIR, and TSC
at the second workshop, seven, 12, and four,
respectively, are new to the NTCIR Workshop.
A participant could submit the results of more than
one run for each task. Both automatic and manual
query constructions were allowed. In the case of
automatic construction in the JEIR task, the
participants had to submit at least one set of results
of the searches using only &lt;Description&gt; fields of
the topics as the mandatory runs. The intention of
this is to enhance cross-system comparison. For
optional automatic runs and manual runs, any field,
or fields, of the topics could be used. In addition,
each participant had to complete a system
description form describing the detailed features of
the system.</p>
        <p>
          The relevance judgments were undertaken by
the pooling method. The same number of runs was
selected from each participating group, and the same
number of top-ranked documents from each run for
the topic was extracted and put into the document
pool to be judged, in order to retain "fairness" and
"equal opportunities" among the participating
groups. In order to increase the exhaustiveness of the
relevance judgments, additional manual searches
were conducted for those topics with more relevant
documents than a certain threshold (50 in NTCIR-1
and 100 in NTCIR-2). A detailed description of the
pooling procedure and the analysis of "fairness" are
reported in Kuriyama et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] in this volume.
        </p>
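        <p>As a concrete illustration, the pooling scheme described above can be sketched as follows; the run contents and the cut-off depth are hypothetical examples, not NTCIR data.</p>
        <preformat>
```python
# Sketch of the pooling method: the same number of top-ranked
# documents is taken from each selected run and merged into a single
# pool of unique documents for relevance judgment.
def build_pool(runs, depth):
    """runs maps a run name to its ranked list of document IDs.
    Returns the sorted set of unique documents to be judged."""
    pool = set()
    for ranked_docs in runs.values():
        # The same cut-off depth applies to every run, which keeps the
        # contribution of each participating group equal ("fairness").
        pool.update(ranked_docs[:depth])
    return sorted(pool)

runs = {
    "groupA-run1": ["d03", "d01", "d07", "d02"],
    "groupB-run1": ["d01", "d05", "d03", "d09"],
}
print(build_pool(runs, depth=3))  # ['d01', 'd03', 'd05', 'd07']
```
        </preformat>
        <p>Documents retrieved by several runs are judged only once, so the pool is typically smaller than the number of runs times the depth.</p>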
        <p>
          Human analysts assessed the relevance of
retrieved documents to each topic in multi-grades:
three grades in the NTCIR-1 and IREX-IR, and four
grades in the NTCIR-2 and CIRB010: highly
relevant (S), relevant (A), partially relevant (B),
irrelevant (C). Some documents are more relevant
than others, either because they contain more
relevant information or because the information they
contain is highly relevant; we therefore believe that
multi-grade relevance judgments are more natural,
or closer to the judgments made in real
life [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7-9</xref>
          ]. However the majority of test collections
have viewed relevance judgments as binary and this
simplification is helpful for evaluators and system
designers.
        </p>
        <p>For NTCIR-1 and -2, two assessors judged the
relevance to a topic separately and assigned one of
the three or four degrees of relevance. After
crosschecking, the primary assessors of the topic, who
created the topic, made the final judgment. The
evaluation was run against two different lists of
relevant documents produced by two different
thresholds of relevance: rigid relevance (the
"relevant level file" in NTCIR-1, with a counterpart
in CIRB010), in which S- and A-judgments were
rated as "relevant", and relaxed relevance (the
"partial relevant level file" in NTCIR-1, likewise in
CIRB010), in which S-, A- and B-judgments were
rated as "relevant", even though NTCIR-1 does not
contain S.</p>
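        <p>The two thresholds can be sketched as follows; the grade labels follow the text above, while the sample judgments are hypothetical.</p>
        <preformat>
```python
# Sketch of the two relevance thresholds: under "rigid" relevance only
# S- and A-judged documents count as relevant; under "relaxed"
# relevance, B-judged documents count as well.
RIGID = {"S", "A"}
RELAXED = {"S", "A", "B"}

def binarize(judgments, grades):
    """Map graded judgments (doc ID to grade) to the set of documents
    treated as relevant under the given set of grades."""
    return {doc for doc, grade in judgments.items() if grade in grades}

judgments = {"d1": "S", "d2": "A", "d3": "B", "d4": "C"}
print(sorted(binarize(judgments, RIGID)))    # ['d1', 'd2']
print(sorted(binarize(judgments, RELAXED)))  # ['d1', 'd2', 'd3']
```
        </preformat>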
        <p>
          In addition, we proposed new measures for IR
system testing with ranked output, based on
multi-grade relevance judgments [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Intuitively,
highly relevant documents are more important to
users than partially relevant ones, and documents
retrieved at higher ranks of the ranked list are more
important. Therefore, systems that place more
relevant documents at higher ranks should be rated
as better. Based on a review of existing IR system
evaluation measures, the proposed measures were
designed so that each is a single number that is
averageable over the number of topics.
        </p>
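        <p>The exact definitions of the proposed measures are given in [10] and are not reproduced here; purely as an illustration of the requirements just stated (a single number per topic, averageable over topics, rewarding runs that place more relevant documents at higher ranks), the following sketch uses hypothetical gain values per grade.</p>
        <preformat>
```python
# Illustrative only (not the actual measures of [10]): a gain-based
# score per topic that rewards ranking highly relevant documents
# before partially relevant ones. Gain values per grade are
# hypothetical.
GAIN = {"S": 3.0, "A": 2.0, "B": 1.0, "C": 0.0}

def graded_score(ranked_docs, judgments):
    """Cumulated gain at each rank divided by the ideal cumulated
    gain, averaged over ranks: a single number per topic."""
    gains = [GAIN.get(judgments.get(doc, "C"), 0.0) for doc in ranked_docs]
    ideal = sorted(gains, reverse=True)
    total, cg, icg = 0.0, 0.0, 0.0
    for g, ig in zip(gains, ideal):
        cg += g
        icg += ig
        if icg:  # skip ranks where no gain is achievable yet
            total += cg / icg
    return total / len(ranked_docs) if ranked_docs else 0.0

judgments = {"d1": "S", "d2": "B", "d3": "A"}
perfect = graded_score(["d1", "d3", "d2", "d4"], judgments)
worse = graded_score(["d4", "d2", "d3", "d1"], judgments)
assert max(perfect, worse) == perfect  # better ordering scores higher
```
        </preformat>
        <p>Averaging this number over a topic set then gives one figure per system, which is the property the text above asks of such measures.</p>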
        <p>
          Most IR systems and experiments have
assumed that the highly relevant items are useful to
all users. However, some user-oriented studies have
suggested that partially relevant items may be
important for specific users, and that they should not
be collapsed into relevant items but should be
analyzed separately [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. More investigation is needed.
        </p>
        <p>More than half of the documents in the NTCIR-1
JE Collection are English-Japanese paired. NTCIR-2
contains author abstracts of conference papers and
extended summaries of grant reports. About
one-third of the documents are Japanese- and
English-paired, but the correspondence between
English and Japanese is unknown during the
workshop. A sample document record of the JE
Collection in the NTCIR-1 is shown in Fig. 2.
Documents are plain text with SGML-like tags in the
NTCIR collections and the IREX-IR. A record may
contain a document ID, title, a list of author(s), the
name and date of the conference, an abstract,
keyword(s) that were assigned by the author(s) of the
document, and the name of the host society.
&lt;REC&gt;
&lt;ACCN&gt;gakkai-0000011144&lt;/ACCN&gt;
&lt;TITL TYPE="kanji"&gt;[Japanese title]&lt;/TITL&gt;
&lt;TITE TYPE="alpha"&gt;Electronic manuscripts, electronic
publishing, and electronic library&lt;/TITE&gt;
&lt;AUPK TYPE="kanji"&gt;[Japanese author name]&lt;/AUPK&gt;
&lt;AUPE TYPE="alpha"&gt;Negishi, Masamitsu&lt;/AUPE&gt;
&lt;CONF TYPE="kanji"&gt;[Japanese conference name]&lt;/CONF&gt;
&lt;CNFE TYPE="alpha"&gt;The Special Interest Group Notes of
IPSJ&lt;/CNFE&gt;
&lt;CNFD&gt;1991. 11. 19&lt;/CNFD&gt;
&lt;ABST TYPE="kanji"&gt;&lt;ABST.P&gt;[Japanese abstract]&lt;/ABST.P&gt;&lt;/ABST&gt;
&lt;ABSE TYPE="alpha"&gt;&lt;ABSE.P&gt;Current situation on
electronic processing in preparation, editing, printing, and
distribution of documents is summarized and its future trend is
discussed, with focus on the concept "electronic publishing".
Movements in the country concerning an international standard
for electronic publishing, Standard Generalized Markup
Language (SGML), are assumed to be important, and the results
from an experiment at NACSIS to publish an "SGML
Experimental Journal" and to make its full-text CD-ROM version
are reported. Various forms of "Electronic Library" are also
investigated. The author puts emphasis on standardization, as
technological problems for those social systems based on the
cultural settings of publication of the country are the problems of
acceptance and penetration of the technology in the
society.&lt;/ABSE.P&gt;&lt;/ABSE&gt;
&lt;KYWD TYPE="kanji"&gt;[Japanese keywords]&lt;/KYWD&gt;
&lt;KYWE TYPE="alpha"&gt;Electronic publishing // Electronic
library // Electronic manuscripts // SGML // NACSIS // Full text
databases&lt;/KYWE&gt;
&lt;SOCN TYPE="kanji"&gt;[Japanese society name]&lt;/SOCN&gt;
&lt;SOCE TYPE="alpha"&gt;Information Processing Society of
Japan&lt;/SOCE&gt;
&lt;/REC&gt;</p>
        <p>A sample document record used in the CLIR at
the NTCIR Workshop 3 is shown in Fig. 3. All the
document collections in the four languages are coded
with the same set of mandatory tags and some
optional tags. A document record in the CIRB010 is
coded in XML, but the elements are similar.</p>
        <p>
          A sample topic record which will be used in the
CLIR at the NTCIR Workshop 3 is shown in Fig. 4.
Topics are defined as statements of "users requests"
rather than "queries", which are the strings actually
submitted to the system, since we wish to allow both
manual and automatic query construction from the
topics. Among the 83 topics of the NTCIR-1, 20
topics were translated into Korean and were used
with the Korean HANTEC Collection [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>The topics contain SGML-like tags. A topic in
NTCIR-1, NTCIR-2 and CIRB010 contains a similar
tag set, though the tags are longer than those shown
here (e.g., &lt;DESCRIPTION&gt;), and consists of the
title of the topic, a description (question), a detailed
narrative, and a list of concepts and field(s). The title
is a very short description of the topic and can be
used as a very short query that resembles those often
submitted by end-users of Internet search engines.
Each narrative may contain a detailed explanation of
the topic, term definitions, background knowledge,
the purpose of the search, criteria for judgment of
relevance, etc.
&lt;TOPIC&gt;
&lt;NUM&gt;013&lt;/NUM&gt;
&lt;SLANG&gt;CH&lt;/SLANG&gt;
&lt;TLANG&gt;EN&lt;/TLANG&gt;
&lt;TITLE&gt;NBA labor dispute&lt;/TITLE&gt;
&lt;DESC&gt;
To retrieve the labor dispute between the two parties of the US
National Basketball Association at the end of 1998 and the
agreement that they reached.
&lt;/DESC&gt;
&lt;NARR&gt;
The content of the related documents should include the causes of
the NBA labor dispute, the relations between the players and the
management, the main controversial issues of both sides,
compromises after negotiation and the content of the new
agreement, etc. A document will be regarded as irrelevant if it
only touches upon the influence of closing the court on each game
of the season.
&lt;/NARR&gt;
&lt;CONC&gt;
NBA (National Basketball Association), union, team, league,
labor dispute, league and union, negotiation, to sign an agreement,
salary, lockout, Stern, Bird Regulation.
&lt;/CONC&gt;
&lt;/TOPIC&gt;</p>
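        <p>To illustrate how such SGML-like topic records can be consumed, here is a minimal field extractor; the helper names are hypothetical, and the literal angle brackets are built with chr() to keep the sketch free of raw markup characters.</p>
        <preformat>
```python
import re

LT, GT = chr(60), chr(62)  # the "less-than" and "greater-than" signs

def field(record, tag):
    """Extract the text of one SGML-like field such as TITLE or DESC."""
    pattern = (re.escape(LT + tag + GT) + "(.*?)"
               + re.escape(LT + "/" + tag + GT))
    match = re.search(pattern, record, re.S)
    return match.group(1).strip() if match else ""

def tagged(tag, text):
    """Build a small sample record fragment for the demonstration."""
    return LT + tag + GT + text + LT + "/" + tag + GT

record = tagged("TOPIC", tagged("NUM", "013")
                + tagged("TITLE", "NBA labor dispute"))
print(field(record, "TITLE"))  # NBA labor dispute
print(field(record, "NUM"))    # 013
```
        </preformat>
        <p>The re.S flag lets fields such as the narrative span multiple lines, as they do in the sample above.</p>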
        <p>The relevance judgments were conducted using
multi-grades, as stated in Section 2.3. In NTCIR-1
and -2, the relevance judgment files contain not only
the relevance of each document in the pool, but also
extracted phrases or passages showing the reason the
analyst assessed the document as "relevant". These
statements were used to confirm the judgments and
are also hoped to be of future use in experiments on
extracting answer passages and the like.
NTCIR-1 contains a "Tagged Corpus". This contains
detailed hand-assigned part-of-speech (POS) tags for
2,000 Japanese documents selected from NTCIR-1.
Spelling errors were manually corrected. Because of
the absence of explicit boundaries between words in
Japanese sentences, we set three levels of lexical
boundaries (i.e., word boundaries, and strong and
weak morpheme boundaries).</p>
        <p>
          In NTCIR-2, the segmented data of the whole J
(Japanese document) collection is provided. They
are segmented into three levels of lexical boundaries
using a commercially available morphological
analyzer called HAPPINESS. An analysis of the
effect of segmentation is reported in Yoshioka et al.
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
The test collections NTCIR-1 and -2 have been
tested on the following aspects, so that they can be
used as reliable tools for IR system testing:
- exhaustiveness of the document pool;
- inter-analyst consistency and its effect on system
evaluation;
- topic-by-topic evaluation.
        </p>
        <p>
          The results have been reported and published on
various occasions [
          <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16">13-16</xref>
          ]. In terms of
exhaustiveness, pooling the top 100 documents from
each run worked well for topics with fewer than 100
relevant documents. For topics with more than 100
relevant documents, although the top 100 pooling
covered only 51.9% of the total relevant documents,
coverage was higher than 90% if combined with
additional interactive searches. Therefore, we
conducted additional interactive searches for the
topics with more than 50 relevant documents in the
first workshop, and those with more than 100
relevant documents in the second workshop.
        </p>
        <p>When the pool size was larger than 2500 for a
specific topic, the number of documents collected
from each submitted run was reduced to 90 or 80.
This was done to keep the pool size practical and
manageable for the assessors, so that consistency
within the pool could be maintained. Even though the
number of documents collected into the pool differed
from topic to topic, the number of documents
collected from each run is exactly the same for a
specific topic.</p>
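        <p>The rule above can be sketched as follows; the candidate depths (100, 90, 80) and the cap of 2500 follow the figures in the text, while the helper and the run data are hypothetical.</p>
        <preformat>
```python
# Sketch of the per-topic depth adjustment: if the pool built at the
# default depth exceeds the practical cap, retry with a uniformly
# smaller per-run depth, so that for a given topic every run still
# contributes exactly the same number of documents.
def build_pool(runs, depth):
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

def pool_with_cap(runs, depths=(100, 90, 80), cap=2500):
    chosen = depths[-1]  # fall back to the smallest depth
    for depth in depths:
        pool = build_pool(runs, depth)
        # keep the first depth whose pool size stays within the cap
        # (min(a, b) == a holds exactly when a is at most b)
        if min(len(pool), cap) == len(pool):
            chosen = depth
            break
    return chosen, build_pool(runs, chosen)

# three hypothetical runs with disjoint documents, 120 each
runs = {"run%d" % i: ["t%d-%d" % (i, r) for r in range(120)]
        for i in range(3)}
depth, pool = pool_with_cap(runs, cap=250)
print(depth, len(pool))  # 80 240
```
        </preformat>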
        <p>
          A strong correlation was found between the
system rankings produced using different relevance
judgments and different pooling methods, regardless
of the inconsistency of the relevance assessments
among analysts [
          <xref ref-type="bibr" rid="ref13 ref14 ref15 ref6">6,13-15</xref>
          ]. This served as additional
support for the analysis reported by Voorhees [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
The 17 sets of search results for the ECIR task were
submitted by 7 participating groups. According to the task
overview report [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], query expansion is a good
method for increasing system performance. In
general, the probabilistic model shows better
performance. For the ECIR task, the select-all
approach seems better than other select-X
approaches in dictionary look-up if no further
techniques are adopted. PIRCS used an MT approach
and outperformed the others. For the ECIR task, a
word-based indexing approach is better.
        </p>
        <p>[Figure: recall-precision curves of all ECIR
(E&gt;C) runs under rigid relevance.]</p>
        <p>There were 95 submitted runs for CLIR of Japanese
and English from 14 groups. For J-E, E-J, J-JE and
E-JE, 40 runs from 12 groups, 30 runs from 10, 14
runs from 6, and 11 runs from 4 were submitted,
respectively.</p>
        <p>Most groups used the query translation approach,
but the LISIF group used an approach combining
query translation and document translation: the top
1000 documents in the initial search were translated,
and further processing was done on them. Three
groups used corpus-based approaches, but their
performances were generally less effective compared
with the other approaches, though for some of them,
which had participated in the NTCIR Workshop 1,
the relative performance was better. New approaches,
including flexible pseudo-relevance feedback and
segmented LSI, were proposed.</p>
        <p>In the round-table discussion toward the NTCIR
Workshop 3, in the Program Committee meeting,
and after the workshop meeting, some issues were
raised about conducting more appropriate and valid
evaluation at the next workshop.</p>
        <p>CHTR and JEIR at the second workshop were
organized in a rather independent way, but we aimed
to follow consistent, or at least compatible,
procedures. However, we regrettably found
unintended incompatibilities between CHTR and
JEIR, including the categories of query types and the
pooling methods. The CLIR task at the NTCIR
Workshop 3 will be organized jointly by the
organizers of CHTR and JEIR and the HANTEC
group. The organizers had face-to-face meetings and
decided the detailed procedures, including topic
creation, topic format, document format, query types
and mandatory runs. Pooling will be done once, so
there will be no inconsistency. For query type, the
mandatory run is the one using &lt;DESCRIPTION&gt;
only, and we are also keen to examine the difference
between searches using &lt;CONCEPT&gt; and those
without it. For the details, please consult
http://research.nii.ac.jp/ntcir/workshop/clir/CFPinN
TCIR3CLIRr.htm</p>
        <p>The other issue is the reuse of the training set and
the experiment design using a paired corpus. At the
NTCIR Workshop 3, a bigger and higher-quality
paired corpus of English and Japanese will be
provided in the Patent Retrieval Task. We plan to
allow the 1995-1997 parallel corpus to be used for
training and dictionary development; the test will be
done using the full patent documents of 1998-1999,
and the 1998-1999 parallel corpus is not allowed to
be used.</p>
        <p>Document sets were also problematic. At the
second workshop, the text summarization task used
the Mainichi Newspaper corpus of 1994, 1995 and
1998 and asked the participants to obtain the data
from the newspaper company, since the company
sells the corpus for research use. As a result, some of
the participating groups could not obtain the data
and could not conduct the task. For the next
workshop, the NII will provide all the data to
participants, although the Mainichi Newspaper
documents are allowed for only a limited number of
years of use: two years for Japanese participants, and
up to seven years for participants from outside
Japan.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. NTCIR Workshop 3</title>
      <p>The third NTCIR Workshop will start in
September 2001, and the workshop meeting will be
held in October 2002. We picked five areas of
research as tasks. Updated information can be
found at http://research.nii.ac.jp/ntcir/workshop/.
Below is a brief summary of the tasks envisaged for
the workshop. A participant will conduct one or
more of the tasks or subtasks below; participation in
only one subtask (for example, Japanese monolingual
IR (J-J) in the CLIR Task) is possible.
In the CLIR Task, documents and topics are in four
languages (Chinese, Korean, Japanese and English):
50 topics for the collections of 1998-1999 (Topic98)
and 30 topics for the collection of 1994 (Topic94).
Both topic sets contain the four languages (Chinese,
Korean, English and Japanese).
(a) Multilingual CLIR: search of a document
collection in more than one language using topics in
one of the four languages, excepting Korean
documents because of the difference in time range
(Xtopic98&gt;CEJ).</p>
      <sec id="sec-2-1">
        <title>Search of Chinese, Korean, or Japanese Documents</title>
        <p>(b) Bilingual CLIR: search across any two
different languages, one as the topic language and
the other as the document language, excepting the
search of English documents (Xtopic98&gt;C,
Xtopic94&gt;K, Xtopic98&gt;J).
(c) Monolingual IR: search of Chinese, Korean, or
Japanese documents (Ctopic98&gt;C, Ktopic94&gt;K,
Jtopic98&gt;J).
DOCUMENT: newspapers published in Asia:
- Chinese: (1998-1999)
- Korean:
- Japanese:
- English:</p>
        <p>In the Patent Retrieval Task:
(a) retrieve patents in response to J/E/C newspaper
articles associated with technology and commercial
products; 30 query articles with short descriptions
of the search requests.
(b) retrieve patents associated with an input
Japanese patent; 30 query patents with short
descriptions of the search requests.</p>
        <p>In addition, any research reports are invited on
patent processing using the above data, including,
but not limited to: generating patent maps,
paraphrasing claims, aligning claims and examples,
summarization of patents, and clustering of
patents.</p>
      </sec>
      <sec id="sec-2-2">
        <title>DOCUMENT:</title>
        <p>- Japanese patents: 1998-1999 (ca. 17 GB, 700K
docs)
- Japio patent abstracts: 1995-1999 (ca. 1,750K
docs)
- Patent Abstracts of Japan (English translations
of the Japio patent abstracts): 1995-1999 (ca.
1,750K docs)
- Patolis test collection (34 topics and relevance
assessments on the Patent 1998 collection)
- Newspaper articles (Japanese/English/
Traditional Chinese)</p>
        <p>Task 1: the system extracts five answers from the
documents, in some order, for each of 100 questions.
The system is required to return supporting
information for each answer. We assume the
supporting information is a paragraph, a passage of
about 100 characters, or a document that includes the
answer.</p>
        <p>Task 2: the system extracts only one answer from
the documents for each of 100 questions. Supporting
information is required.</p>
        <p>Task 3: evaluation of a series of questions.
Related questions are given for 30 of the
questions of Task 2.</p>
        <p>DOCUMENT: Japanese newspaper articles
(Mainichi Newspaper 1998-1999)</p>
        <p>DOCUMENT (Web Task): Web documents, mainly collected
from the jp domain (ca. 100 GB and ca. 10 GB), available at
the "Open-Lab" in the NII</p>
      </sec>
      <sec id="sec-2-3">
        <title>Schedule</title>
        <p>Application Due</p>
        <p>Document release (newspapers)</p>
        <p>Dry Run and Round-Table
Discussion (varies with each task)</p>
        <p>Open Lab start</p>
        <p>Formal Run (varies with each
task)</p>
      </sec>
      <sec id="sec-2-4">
        <title>Evaluation Results Delivery</title>
        <p>Paper for Working Note Due</p>
        <p>NTCIR Workshop 3 Meeting
Days 1-2: Closed session (task participants only)
Day 3: Open session</p>
        <p>Paper for Final Proceedings Due</p>
        <p>For the next workshop, we plan several new
ventures, including the following:</p>
        <p>(1) Multilingual CLIR (CLIR)
(2) Search by Document (Patent, Web)
(3) Passage Retrieval, or submission of "evidential
passages", passages that show the reason why
the documents are supposed to be relevant
(Patent, QA, Web)
(4) Optional Task (Patent, Web)
(5) Multigrade Relevance Judgments (CLIR,
Patent, Web)
(6) Precision-Oriented Evaluation (QA, Web)</p>
        <p>For (1), this is our first trial of the CLEF model in
Asia. We would also like to invite any other
language groups who wish to join us, by providing
document data and relevance judgments or by
providing query translations.</p>
        <p>For (3), we believe that identifying the most
relevant passages in the retrieved documents is needed
when retrieving longer documents such as Web documents
or patents. The primary evaluation will be done on a
document basis, but we will use the submitted
passages as secondary information for further
analysis.</p>
        <p>For (4), in the Patent and Web tasks we invite any
research groups who are interested in using the
document collections provided in the tasks for their
own research projects. These document
collections are rather new to our research
community and include many interesting
characteristics. We also expect that this venture will
reveal possible new tasks for future
workshops.</p>
        <p>Summarization Task (single document): given the
texts to be summarized and the summarization
lengths, the participants submit summaries of each
text in plain-text format.</p>
        <p>Summarization Task (multiple documents): given a
set of texts, the participants produce summaries of it
in plain-text format. The information that was used
to produce the document set, such as queries, as well
as the summarization lengths, are given to the
participants.</p>
        <p>DOCUMENT: Japanese newspaper articles
(Mainichi Newspaper 1998-1999)</p>
        <p>Web Task:
A. Survey Retrieval (both recall and precision
are evaluated)
- A1. Topic Retrieval
- A2. Similarity Retrieval
B. Target Retrieval (precision-oriented)
C. Optional Task
- C1. Search Results Classification
- C2. Speech-Driven Retrieval
- C3. Other</p>
        <p>For (5), we have used multigrade relevance
judgments so far, and have proposed new measures,
Weighted Average Precision and Weighted
R-Precision, for this purpose. We will continue this
line of investigation, and will add a "top relevant"
level for the Web Task, as well as evaluation by
trec_eval.</p>
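        <p>The exact definitions of Weighted Average Precision and Weighted R-Precision are given in [10]. Purely as an illustration of the underlying idea, the sketch below computes a graded-relevance variant of average precision, in which the binary precision at each relevant rank is replaced by the ratio of the cumulative relevance gain to the ideal cumulative gain at that rank; the function name and this particular formulation are assumptions for illustration, not the published measures.</p>

```python
def graded_average_precision(ranking, grades):
    """Illustrative graded-relevance average precision (an assumption,
    not the published NTCIR measure): average, over ranks where a
    relevant document appears, of the cumulative gain achieved so far
    divided by the ideal cumulative gain attainable at that rank.
    `ranking` is the system's ranked list of document ids; `grades`
    maps document ids to integer relevance grades (0 = not relevant)."""
    ideal = sorted(grades.values(), reverse=True)  # gains of an ideal ranking
    cum_gain = 0.0
    score = 0.0
    for k, doc in enumerate(ranking, start=1):
        gain = grades.get(doc, 0)
        cum_gain += gain
        if gain > 0:
            score += cum_gain / sum(ideal[:k])  # compare with ideal ranking cut at rank k
    n_relevant = sum(1 for g in grades.values() if g > 0)
    return score / n_relevant if n_relevant else 0.0
```

        <p>With multiple relevance grades, a measure of this kind rewards rankings that place the most highly relevant documents first, which binary average precision cannot distinguish.</p>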
        <p>In the future, we hope to enhance our
investigations in the following directions:</p>
      </sec>
      <sec id="sec-2-5">
        <title>Future Directions</title>
        <p>- Evaluation of CLIR systems
- Evaluation of retrieval of new document genres,
and more realistic evaluation
- Evaluation of technologies to make the information
in documents immediately usable</p>
        <p>One of the problems of CLIR is the availability of
resources that can be used for translation.
Enhancement of the processes of creating and
sharing the resources is important. In the NTCIR
Workshops, some groups automatically constructed
a bilingual lexicon from a quasi-paired document
collection. Such paired documents can be easily
found in non-English speaking countries and on the
Web. Studying the algorithms to construct such
resources and sharing them is one practical way to
enrich the applicability of CLIR. International
collaboration is needed to construct multilingual test
collections and to organize the evaluation of CLIR,
since creating topics and relevance judgments is
language- and culture-dependent and must be done
by native speakers. Cross-lingual summarization and
question answering are also being considered for
future workshops.</p>
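        <p>The lexicon-construction idea mentioned above can be sketched as follows. This is a generic co-occurrence approach under assumed, simplified inputs (a set of content terms per side of each document pair), not the method of any particular participating group; the function name and inputs are hypothetical.</p>

```python
from collections import defaultdict

def extract_lexicon(doc_pairs, min_count=2):
    """Build a rough bilingual lexicon from quasi-paired documents
    (illustrative sketch): score each (source term, target term) pair
    by the Dice coefficient of their document-level co-occurrence, and
    keep the best-scoring target for each source term.  `doc_pairs` is
    a list of (source_terms, target_terms) sets, one per document pair."""
    src_df = defaultdict(int)   # document frequency of source terms
    tgt_df = defaultdict(int)   # document frequency of target terms
    co = defaultdict(int)       # cross-language co-occurrence counts
    for src, tgt in doc_pairs:
        for s in src:
            src_df[s] += 1
        for t in tgt:
            tgt_df[t] += 1
        for s in src:
            for t in tgt:
                co[(s, t)] += 1
    best = {}
    for (s, t), c in co.items():
        if c >= min_count:
            dice = 2.0 * c / (src_df[s] + tgt_df[t])
            if s not in best or dice > best[s][1]:
                best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}
```

        <p>A lexicon built this way is noisy, but even noisy translation resources of this kind have proven useful for query translation in CLIR.</p>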
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] NTCIR Project: http://research.nii.ac.jp/ntcir/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] NTCIR Workshop 1: Proceedings of the First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition, Tokyo, 30 Aug.-1 Sept. 1999 (ISBN 4-924600-77-6). http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] IREX: http://cs.nyu.edu/cs/projects/proteus/irex/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] NTCIR Workshop 2: Proceedings of the Second NTCIR Workshop on Research in Chinese &amp; Japanese Text Retrieval and Text Summarization, Tokyo, June 2000-March 2001 (ISBN 4-924600-96-2)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Kando, N.: Cross-Linguistic Scholarly Information Transfer and Database Services in Japan. Annual Meeting of the ASIS, Washington DC, 1 Nov. 1997
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] Kuriyama, K., Yoshioka, M., Kando, N.: Effect of Cross-Lingual Pooling. In NTCIR Workshop 2: Proceedings of the Second NTCIR Workshop on Research in Chinese &amp; Japanese Text Retrieval and Text Summarization, Tokyo, June 2000-March 2001 (ISBN 4-924600-96-2)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Spink</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bateman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>From highly relevant to not relevant: Examining different regions of relevance</article-title>
          .
          <source>Information Processing and Management</source>
          , Vol.
          <volume>34</volume>
          , No.
          <issue>5</issue>
          , pp.
          <fpage>599</fpage>
          -
          <lpage>622</lpage>
          ,
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dunlop</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          <article-title>Reflections on Mira</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          , Vol.
          <volume>51</volume>
          , No.
          <issue>14</issue>
          , pp.
          <fpage>1269</fpage>
          -
          <lpage>1274</lpage>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Spink</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greisdorf</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Regions and levels: Measuring and mapping users' relevance judgments</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          , Vol.
          <volume>52</volume>
          , No.
          <issue>2</issue>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Evaluation based on multi-grade relevance judgements</article-title>
          .
          <source>IPSJ SIG Notes</source>
          , Vol.
          <volume>2001</volume>
          <source>-FI-63</source>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>112</lpage>
          ,
          <year>July 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          <article-title>"HANTEC Collection"</article-title>
          . Presented at the panel on IR Evaluation at the 4th IRAL, Hong Kong, 30 Sept.-3 Oct. 2000.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.:</given-names>
          </string-name>
          <article-title>Analysis on the Usage of Japanese Segmented Texts in the NTCIR Workshop 2</article-title>
          .
          <source>In NTCIR Workshop 2 : Proceedings of the Second NTCIR Workshop on Research in Chinese &amp; Japanese Text Retrieval and Text Summarization</source>
          , Tokyo, June 2000-March
          <year>2001</year>
          (ISBN 4-924600-96-2)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N</given-names>
          </string-name>
          , Nozue,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Kuriyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Oyama</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          : NTCIR-1
          <article-title>: Its Policy and Practice</article-title>
          ,
          <source>IPSJ SIG Notes</source>
          , Vol.
          <volume>99</volume>
          , No.
          <volume>20</volume>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>1999</year>
          [in Japanese].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nozue</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Pooling for a Large Scale Test Collection: Analysis of the Search Results for the Pre-test of the NTCIR-1 Workshop</article-title>
          . IPSJ SIG Notes, Vol. 99-FI-54, pp. 25-32, May 1999 [in Japanese]
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Construction of a Large Scale Test Collection: Analysis of the Training Topics of the NTCIR-1</article-title>
          ,
          <source>IPSJ SIG Notes</source>
          , Vol. 99-FI-55
          , pp.
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>July 1999</year>
          [in Japanese].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eguchi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Construction of a Large Scale Test Collection: Analysis of the Test Topics of the NTCIR-1</article-title>
          . In Proceedings of the IPSJ Annual Meeting, pp. 3-107 - 3-108, 30 Sept.-3 Oct. 1999 [in Japanese]
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness</article-title>
          ,
          <source>In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>315</fpage>
          -
          <lpage>323</lpage>
          , Melbourne, Australia, August
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          :
          <article-title>The Chinese Text Retrieval Tasks of NTCIR Workshop II</article-title>
          .
          <source>In NTCIR Workshop 2 : Proceedings of the Second NTCIR Workshop on Research in Chinese &amp; Japanese Text Retrieval and Text Summarization</source>
          , Tokyo, June 2000-March
          <year>2001</year>
          (ISBN 4-924600-96-2)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of Japanese and English Information Retrieval Tasks (JEIR) at the Second NTCIR Workshop</article-title>
          .
          <source>In NTCIR Workshop 2 : Proceedings of the Second NTCIR Workshop on Research in Chinese &amp; Japanese Text Retrieval and Text Summarization</source>
          , Tokyo, June 2000-March 2001 (ISBN 4-924600-96-2)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>