=Paper=
{{Paper
|id=None
|storemode=property
|title=Temporal Information Retrieval: Challenges and Opportunities
|pdfUrl=https://ceur-ws.org/Vol-707/TWAW2011-paper1.pdf
|volume=Vol-707
|dblpUrl=https://dblp.org/rec/conf/www/AlonsoSBG11
}}
==Temporal Information Retrieval: Challenges and Opportunities==
Omar Alonso, Microsoft Corp., Mountain View, CA, omar.alonso@microsoft.com
Jannik Strötgen, Institute of Computer Science, University of Heidelberg, Germany, stroetgen@uni-hd.de
Ricardo Baeza-Yates, Yahoo! Research, Barcelona, Spain, rbaeza@acm.org
Michael Gertz, Institute of Computer Science, University of Heidelberg, Germany, gertz@uni-hd.de
ABSTRACT
Time is an important dimension of any information space. It can be very useful for a wide range of information retrieval tasks such as document exploration, similarity search, summarization, and clustering. Traditionally, information retrieval applications do not take full advantage of the temporal information embedded in documents to provide alternative search features and user experiences. However, in the last few years there has been exciting work on analyzing and exploiting temporal information for the presentation, organization, and in particular the exploration of search results.
In this paper, we review the current research trends and present a number of interesting applications along with open problems. The goal is to discuss interesting areas and future work for this exciting field of information management.

Categories and Subject Descriptors
I.2.7 [Artificial Intelligence]: Natural Language Processing—Language models, Text analysis

Keywords
temporal information, information retrieval

Copyright 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
TWAW 2011, March 28, 2011, Hyderabad, India.

1. INTRODUCTION
Time clearly plays a central role in any information space, and it has been studied in several areas like information extraction, topic detection, question answering, query log analysis, and summarization. Time and temporal measurements can help recreate a particular historical period or describe the chronological context of a document or a collection of documents. As an extension to existing ranking techniques, which are primarily based on popularity or reputation, time can be particularly valuable for exploring search results along well-defined timelines and at multiple time granularities, due to the key characteristics of temporal information:
• Temporal information is well-defined: Given two points in time or two intervals, the relationship between them can be identified, e.g., the relationship can be of the types before, overlap, or after [3].
• Temporal information can be normalized: Regardless of the terms or the language used, every temporal expression referring to the same semantics can be normalized to the same value in some standard format. This property makes temporal information term- and language-independent.
• Temporal information can be organized hierarchically: Temporal expressions can be of different granularities, e.g., of type day ("May 20, 2011") or of type year ("2011"). Because years consist of months, and months and weeks consist of days, temporal expressions can be mapped to coarser granularities based on the hierarchy of temporal expressions.
Using these key characteristics, temporal information about documents can be used to develop time-specific information retrieval and exploration applications. The most obvious type of temporal information associated with a document is its creation time or the date of its last modification. This kind of information, which is directly accessible through the metadata of a document, can already be used for several tasks such as time-aware search or temporal clustering. However, the document creation time is only valuable in specific contexts such as the news domain. In other areas, and even in the news domain itself, a lot of temporal information is neglected if the document creation time is used as the only time information associated with a document. This is because a lot of temporal information is latently available in a document's text. Assume a news document reports on an event that is already dated. Then, if only the
document creation time is taken into account, the information about when the event occurred is ignored. To make use of such information, so-called temporal taggers are applied to extract and normalize the temporal expressions contained in documents.
The remainder of the paper is organized as follows. After a discussion in Section 2 of how time appears in documents and how such temporal data can be extracted, we survey research on temporal tagging in Section 3. In Section 4, we present the current research trends in temporal information retrieval and then describe application areas and their challenges. Finally, we present our concluding remarks in Section 5.

2. TIME IN DOCUMENTS
As indicated in the introduction, there is a lot of temporal information in any collection of documents, be it ranked documents in a hit list or a corpus of topic-specific documents. To take advantage of such time-related information, in particular for document exploration purposes, it is important in a document processing step to extract this information, anchor it in time, compute some (aggregated) measures, and make all this information explicit to subsequent exploration tasks.
In this section, we describe the different types of temporal information mentioned in documents (Sec. 2.1), explain how temporal expressions can be realized in natural language (Sec. 2.2), and demonstrate how they can be extracted and normalized (Sec. 2.3).

2.1 Types of Temporal Information
Temporal expressions mentioned in text documents can be grouped into four types according to TimeML [27], the standard markup language for temporal information: date, time, duration, and set. Duration expressions provide information about the length of an interval (e.g., "three years" in the sentence "they have been traveling through the U.S. for three years"). Set expressions describe the periodic aspect of an event (e.g., "twice a week" in the sentence "she goes to the gym twice a week"). In contrast, time and date expressions (e.g., "3 p.m." or "January 25, 2010") both refer to a specific point in time, though at different granularities.
An interesting key feature of temporal information is that it can be normalized to some standard format. Assuming a Gregorian calendar as the representation of time, expressions of time and date can be directly placed on a timeline. A date is then typically represented as YYYY-MM-DD, e.g., the expression "January 25, 2010" is normalized to "2010-01-25". However, normalization is not always as simple as in this example. It depends on the way temporal information is expressed in a document, which is discussed in the next subsection.

2.2 Occurrences of Temporal Expressions
There are many different ways to express temporal information of the types date and time in documents. Following the work by Schilder and Habel [31], we distinguish between explicit, implicit, and relative temporal expressions.
Explicit temporal expressions refer to a specific point in time. Note that this point in time can be of different granularities. For example, the expression "January 25, 2010" refers to a specific day, while the expression "November 2005" refers to a specific month. The key characteristic of explicit temporal expressions is that they can be normalized without any further knowledge. That is, the expression itself contains all the information needed for normalization and is thus fully specified.
In contrast, relative temporal expressions cannot be normalized without taking some context information into account. For example, the expression "today" cannot be normalized without knowing the corresponding reference time. This reference time can either be the document creation time or another temporal expression in the document. In news articles, the document creation time is typically important and often used as the reference time. Note that this kind of information is directly accessible as a timestamp in the metadata of a document. The expression "yesterday" in the sentence "Thousands of prisoners in Egypt managed to escape from prison yesterday" can be normalized to "2011-01-29" if we know the document creation time to be "2011-01-30". In other types of documents, the reference time is usually represented by another temporal expression in the document. In general, relative temporal expressions occur frequently. While the reference time is sometimes sufficient for normalization, further information is needed if the relation to the reference time has to be identified as well. For example, "on Monday" can refer either to the previous or to the next Monday with respect to the reference time. The tense of the sentence can be an indicator for determining this relationship: future and present tense indicate an after-relationship to the reference time, and past tense indicates a before-relationship. Figure 1 shows parts of a news article containing explicit and relative temporal expressions and illustrates what kind of context information is needed to normalize the relative expressions.
The third type of temporal expressions are implicit expressions such as names of holidays or events. These expressions can be anchored on a timeline if a mapping of the expression to its normalized value is available. For example, "New Year's Day 2002" can be normalized to "2002-01-01" since "New Year's Day" always refers to January 1. In addition, there are expressions for which a temporal function has to be applied. "Labor Day", for example, refers to the first Monday in September, so "Labor Day 2009" can be normalized to "2009-09-07" once we know that this day is the first Monday in September 2009.
Although there are many different ways to refer to a specific point in time, all expressions referring to the same point in time should be normalized to the same value in the standard format. This normalization process is one of the tasks of so-called temporal taggers, as described in the next subsection.

2.3 Temporal Tagging
Temporal tagging is a specific task in named entity recognition and normalization. The goals of temporal taggers are (i) the extraction of temporal expressions and (ii) the normalization of these expressions to some standard format. TIMEX2 and TIMEX3 are often used as this standard format. TIMEX2 tags include pre- and post-modifiers of the temporal expression itself (e.g., dependent clauses) and allow for nested temporal expressions [11]. TIMEX3 tags include neither such modifiers nor nested tags. Instead, TIMEX3 is part of the TimeML markup language, in which further annotation types are available for capturing more complex temporal semantics. Nevertheless, although there are significant differences between TIMEX2
and TIMEX3, they are very similar in many ways, and a detailed analysis is beyond the scope of this paper.¹ According to the TimeML annotation guidelines, a TIMEX3 tag contains, among others, the following information about a temporal expression:
• offset: the start and end position of the expression in the document
• type: whether the expression is of type date, time, duration, or set
• value: the normalized value of the expression
¹ For further information on temporal annotation according to TimeML and on the differences between TIMEX2 and TIMEX3, see http://www.timeml.org.
To identify this information, i.e., to extract and normalize temporal expressions, temporal taggers are applied after the text is preprocessed. Usually, sentence and token boundaries are detected and a part-of-speech tag is associated with every token. The temporal tagger can then use this information to identify temporal expressions. The first goal, i.e., identifying the boundaries of temporal expressions, can be seen as a typical classification task. Therefore, some work has addressed this problem by applying machine learning techniques (e.g., [15, 38]). The classification problem can be described as follows: for every token t, decide whether t is outside (O) of temporal expressions, inside (I) a temporal expression, or at the beginning (B) of a temporal expression. The well-known IOB format can be used for annotating tokens according to this property.
In addition to machine-learning approaches, there are several rule-based approaches to extract temporal expressions (e.g., [24, 34]). These are usually based on regular expressions, although they may use other information about the text as well, such as part-of-speech information.
The more difficult task is the normalization of the temporal expressions. While explicit expressions can be normalized without further knowledge, the normalization of relative expressions is challenging. As described above, context information has to be identified to determine the correct reference time and the temporal relation between a temporal expression and its reference time. While there are both rule-based and machine-learning-based approaches for the extraction of temporal expressions, all temporal taggers usually perform normalization in a rule-based way.
Due to their importance for temporal information retrieval, we give an overview of existing temporal taggers and their quality in the next section. In addition, we present resources for evaluating temporal taggers and survey the temporal evaluation challenges organized so far.

Document Creation Time: 1998-04-18
Hungarian astronaut Bertalan Farkas is leaving for the United States to start a new career, he said today.
. . . On May 22, 1995, Farkas was made a brigadier general, and the following year he was appointed military attache . . . However, cited by District of Columbia traffic police in December for driving under the influence of . . .
Figure 1: Examples of temporal expressions in a news article with explicit (transparent boxes) and relative (solid boxes) expressions. Arrows indicate what kind of context information is needed to normalize the temporal expression.

3. RESEARCH ON TEMPORAL TAGGING
Temporal processing of text documents is very important for several NLP tasks that require a deep understanding of language, such as question answering or document summarization. Such processing covers the extraction and normalization of temporal expressions as well as the extraction of temporal relations between events. Consequently, there has been significant research on the temporal annotation of text documents. The markup language TimeML has become an ISO standard for temporal annotation [27], and the TimeBank corpus was developed [28]. The latest version of the TimeBank corpus contains 183 news articles and can be regarded as the gold standard for temporal annotation. However, important research took place before TimeML as well, and several evaluation challenges have been held to advance research on temporal information extraction, as described in the following subsection.

Figure 2: Annotation of a timeline by workers using crowdsourcing.

3.1 Evaluation Challenges
The earliest competitions for the extraction of temporal expressions were the named entity recognition tasks of the Message Understanding Conferences MUC 1995 and 1997 [8, 12]. A combination of extraction and normalization was introduced in the ACE (Automatic Content Extraction) time expression recognition and normalization (TERN) challenges in 2004, 2005, and 2007.² Participants in these challenges developed several temporal taggers (see Section 3.2). The TERN 2004 and 2005 corpora³ are often used to compare the quality of temporal taggers. The TERN corpora are annotated according to the TIMEX2 annotation guidelines [11].
² http://www.itl.nist.gov/iad/mig/tests/ace/
³ The TERN development corpora are available through the Linguistic Data Consortium. See: http://fofoca.mitre.org/tern.html.
The TempEval challenges further indicate the importance of temporal annotation and the level of activity in this research domain. Motivated by the importance of temporal annotation for many NLP tasks, TempEval was organized for the first time as one task of the SemEval workshop in 2007 [39]. In this competition, the organizers provided annotated text documents based on the TimeBank corpus. The annotations of events and temporal expressions were given. The participating tools had to identify temporal relations between events and the document creation time, between events and temporal expressions, and between two events.
In 2010, the follow-up challenge addressed the full task of identifying all temporally related expressions and relations. That is, two further tasks were added for TempEval-2 [41]: the extraction and normalization of temporal expressions, and the extraction of events. In addition, the discovery of relations between two events was split into two tasks. One task was to identify relations between two main events in consecutive sentences. The other was to identify relations between two events where one event syntactically dominates the other. The TempEval corpora are based on the TimeBank corpus and annotated according to the TimeML annotation guidelines, i.e., using TIMEX3 tags for temporal expressions. A further novelty in the second TempEval challenge was that the tasks were offered not only in English but in six languages. However, participants addressed only two languages, namely English and Spanish. Nevertheless, the creation of an annotation standard, a gold standard corpus, and competitions such as the TempEval challenges have led to significant improvements in temporal relation identification and temporal tagging. The next subsection presents some existing temporal taggers and their quality by comparing their results in the TempEval-2 challenge.
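The following minimal Python sketch makes the extraction and normalization steps of Sections 2.2 and 2.3 concrete. It is a toy illustration under stated assumptions, not HeidelTime, GuTime, or any other system discussed in this paper. The names `RULES`, `first_weekday`, `tag`, and `to_iob` are hypothetical, and so is the TIMEX3-like dictionary layout (`offset`, `type`, `value`). The sketch handles only a few DATE patterns:
• explicit days and months ("January 25, 2010", "November 2005");
• the implicit holidays "New Year's Day" and "Labor Day", where Labor Day is computed as the first Monday in September;
• the relative words "today", "yesterday", and "tomorrow", which are resolved against a given document creation time.
It ignores tense, so expressions such as "on Monday" are not handled. It also tokenizes on whitespace only when producing IOB labels.

```python
import re
from datetime import date, timedelta

# Hypothetical toy rule set for DATE expressions only; not the rules of any real tagger.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
MONTH_RE = "|".join(MONTHS)


def first_weekday(year, month, weekday):
    """Return the first occurrence of `weekday` (Monday=0) in the given month."""
    first = date(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7)


# Each rule pairs a regex with a function (match, dct) -> normalized value.
RULES = [
    # Implicit expressions: holiday names anchored by an explicit year.
    (r"\bNew Year's Day (\d{4})\b", lambda m, dct: f"{m.group(1)}-01-01"),
    (r"\bLabor Day (\d{4})\b",
     lambda m, dct: first_weekday(int(m.group(1)), 9, 0).isoformat()),
    # Explicit expressions: a fully specified day or month, no context needed.
    (rf"\b({MONTH_RE}) (\d{{1,2}}), (\d{{4}})\b",
     lambda m, dct: date(int(m.group(3)), MONTHS[m.group(1)],
                         int(m.group(2))).isoformat()),
    (rf"\b({MONTH_RE}) (\d{{4}})\b",
     lambda m, dct: f"{m.group(2)}-{MONTHS[m.group(1)]:02d}"),
    # Relative expressions: resolved against the document creation time (dct).
    (r"\btoday\b", lambda m, dct: dct.isoformat()),
    (r"\byesterday\b", lambda m, dct: (dct - timedelta(days=1)).isoformat()),
    (r"\btomorrow\b", lambda m, dct: (dct + timedelta(days=1)).isoformat()),
]


def tag(text, dct):
    """Extract and normalize temporal expressions into TIMEX3-like dicts."""
    candidates = []
    for pattern, normalize in RULES:
        for m in re.finditer(pattern, text):
            try:
                value = normalize(m, dct)
            except ValueError:  # e.g. "February 30, 2010" is not a valid date
                continue
            candidates.append((m.start(), m.end(), value))
    # Prefer the earliest, then the longest match; drop overlapping matches.
    candidates.sort(key=lambda c: (c[0], c[0] - c[1]))
    timexes, last_end = [], 0
    for start, end, value in candidates:
        if start >= last_end:
            timexes.append({"offset": (start, end), "type": "DATE",
                            "text": text[start:end], "value": value})
            last_end = end
    return timexes


def to_iob(text, timexes):
    """Label whitespace-separated tokens with B/I/O according to the TIMEX spans."""
    labeled = []
    for token in re.finditer(r"\S+", text):
        label = "O"
        for timex in timexes:
            start, end = timex["offset"]
            if start <= token.start() < end:
                label = "B" if token.start() == start else "I"
                break
        labeled.append((token.group(), label))
    return labeled


if __name__ == "__main__":
    dct = date(2011, 1, 30)  # document creation time from the Egypt example
    sentence = "Thousands of prisoners in Egypt managed to escape from prison yesterday."
    for timex in tag(sentence, dct):
        print(timex["text"], "->", timex["value"])  # prints: yesterday -> 2011-01-29
```

With the document creation time 2011-01-30, the Egypt sentence from Section 2.2 yields the value "2011-01-29". "Labor Day 2009" is normalized to "2009-09-07" through the first-Monday-in-September function. For "On May 22, 1995 Farkas ...", `to_iob` labels the tokens O, B, I, I, O. Because all normalization here is rule-based, this sketch follows the pattern the survey reports for real taggers. A usable tagger would still need far more rules, tense analysis, and reference times other than the document creation time.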

3.2 Temporal Taggers
Once a temporal tagger has been applied to a document collection, the previously hidden temporal information becomes available for tasks such as temporal relation extraction or temporal clustering. One frequently used temporal tagger is GuTime, which is part of the Tarsqi toolkit [40]. It is based on the TempEx tagger, which was the first temporal tagger for the extraction of temporal expressions and their normalization [22]. GuTime was developed as an automatic evaluation tool for TimeML and extends the capabilities of the TempEx tagger. On the TERN 2004 training corpus, it achieves F-measures of 85%, 78%, and 82% for lenient detection, strict detection, and normalization, respectively.
In the TempEval-2 challenge, eight teams participated in the temporal expression extraction and normalization task for English documents. The best-performing system was HeidelTime, with an F-score of 86% for extraction and an accuracy of 85% for normalization [34]. For Spanish documents, the best extraction result was an F-score of 91% [16], while another system achieved the highest normalization accuracy (83%) [42]. Both machine-learning and rule-based approaches were applied for extraction, but all systems performed normalization in a rule-based way. HeidelTime, the best-performing system, uses rules consisting of an extraction part and a normalization part. Thus, every temporal expression it identifies is normalized as well. Due to the strict separation of the source code and the rules, HeidelTime is applicable to multilingual temporal tagging.⁴
⁴ For details, see http://dbs.ifi.uni-heidelberg.de/stixx/.
Although temporal tagging has advanced significantly, there is still room for improvement, especially when switching the processing language or the domain of the document collection. For example, Mazur and Dale recently presented a new corpus for research on temporal expressions containing long, narrative-style documents, namely Wikipedia articles describing the historical course of wars [25]. Using their temporal tagger, they show that normalizing temporal expressions in such documents is very challenging compared to the short news documents usually used. The difficulty stems from the rich discourse structure and the large number of often underspecified temporal expressions.

4. RESEARCH TRENDS
Research on fully utilizing the temporal information embedded in the text of documents for exploration and search purposes is very recent. The work by Alonso et al. presents an approach for extracting temporal information and shows how it can be used for clustering search results [5]. Berberich et al. describe a model for temporal information needs [7]. Figure 2 shows the annotated timeline for the NYTimes data set used in the latter work, created with the Timeline widget.⁵ Both projects rely on crowdsourcing, mainly using Amazon Mechanical Turk, for evaluating parts of their work.
⁵ http://simile.mit.edu/
News sources have been the primary focus of a number of projects on exploiting time information in documents. For example, the Time Frames project augments news articles by extracting time information [14]. Google's news timeline⁶ is an experimental feature that allows a user to explore news by time.
⁶ http://www.newstimeline.googlelabs.com
Makkonen and Ahonen-Myka present extensions to document operations, such as comparing the temporal similarity of two news articles [17]. Swan and Allan present an interesting approach that combines topic detection and tracking with timelines as a browsing interface [37]. Time information is also used in the temporal mining of blogs to extract useful information [29]. New research has also emerged on future retrieval, where temporal information is used for searching the future [6].
There is exciting research on adding a time dimension to applications like news summaries [2], temporal patterns [33], and temporal Web search [26]. The special issue on temporal information processing gives a clear map of current directions [20]. Harvesting temporal information and using it for entities and relationships is also a very recent and rich area [43].
Closely related to information extraction is the recent research on temporal annotation, which is covered in depth in the book by Mani et al. [21]. The identification of time-related information depends heavily on the language and the corpora, so traditional information extraction systems tend to fall short in terms of temporal extraction. Building on the latest advances, new research is emerging on the automatic assignment of document event-time periods and the automatic tagging of news messages using entity extraction [31].
Now, we outline a number of applications that can benefit from leveraging more temporal information, whether from temporal expressions or from timestamps. For each application, we
describe why it is important and present a number of challenges.

4.1 Exploratory Search
Research on exploratory search systems has gained a lot of attention lately. These systems add a significant user interface component to help users search, navigate, and discover new facts and relationships. As the amount of information on the Web keeps growing, exploratory search interfaces are starting to surface. That said, it is not clear how such interfaces should leverage temporal information. A few problems are:
• How can temporal information be exposed in exploratory search systems?
• What is the best way of presenting temporal information as a retrieval cue?
• For which data sources, besides news, does exploratory search make sense?
• Is e-discovery a vertical application that can benefit from temporal information?

4.2 Micro-blogging and Real-time Search
Micro-blogging sites like Twitter have gained a lot of attention lately as the ultimate mechanism to broadcast what is going on. By nature, a typical message is very short, and its lifespan is basically determined by the crowd's interest in the particular event, be it a football final or an earthquake. In the case of Twitter, a news article can hardly match the speed with which an important event is broadcast. Each tweet has a timestamp, but how to organize such information is still not clear. In the news context, the reporter has to write an article that contains a few paragraphs. The final version is then submitted through some content management system that pushes it to an external website, where a search engine can hopefully crawl and index it in time. In parallel, if the tweeted event is important, the main idea would already be trending on Twitter by the time the reporter finishes the article, thereby highlighting its importance at a world scale. This is very similar to the traditional notion of topic detection and tracking [1, 18], with one key difference: the speed at which one must detect that a topic is important and therefore a candidate for trending. Some problems are:
• What is the best way to provide a timeline of events in micro-blogging?
• What is the lifespan of the main event?
• How quickly and precisely can one detect trending events?
• What is the fraction of new content on the topic stream?

4.3 Temporal Summaries
There has been seminal work on temporal summaries of news topics by Allan [2] that shows how important temporal information is. One extension is to generate time-sensitive summaries that can be used as temporal snippets [4].
By design, the main goal of a snippet (or caption) is to present a document surrogate. The user can quickly scan it on the search results page without having to click through and read the full content of the document. There is a limit to the number of lines of text that a snippet should present. Interesting questions include:
• When should a timestamp or temporal expressions be shown?
• Should the snippet present the matching lines in a timeline?
• Is a temporal summary a good surrogate for a document?
• For which kinds of queries is a temporal summary appropriate?
• Should temporal summaries be query independent?

4.4 Temporal Clustering
The notion of clustering search results by temporal attributes has been presented in [5]. Preliminary results indicate that users are interested in dissecting a document collection by time. At the same time, it is not clear for which kinds of scenarios besides "research-like" questions this approach would work. Key issues are:
• Can we identify documents that are contemporary and therefore related?
• Which chronons are most useful for clustering?
• How can we cluster micro-blogging data by time?
• Is a timeline the best way to cluster search results?

4.5 Temporal Querying
The temporal information extracted from documents can directly be used to let the user of a search engine constrain his or her query temporally. That is, in addition to a textual part, a query contains a temporal part. For example, in addition to "world war", a temporal constraint like "1944-1945" could be specified. The user would obviously expect documents about World War II as results for this query. The objective when combining a textual and a temporal query can thus be formulated as follows: the better both the textual and the temporal part of the query fit a document, the higher that document should be ranked. The main problems for such a combination of constraints are the following:
• How can a combined score for the textual part and the temporal part of a query be calculated in a reasonable way?
• Should a document in which the "textual match" and the "temporal match" are far away from each other be penalized?
• What about documents that satisfy one of the constraints but "slightly" fail to satisfy the other?

4.6 Temporal Question Answering
To be able to answer time-related questions, a question answering system has to know when specific events took place. For this, temporal information can be associated with facts extracted from text documents [26]. While this may work for famous facts and events, question answering systems often face imperfect temporal information. In such cases, identifying relationships between the events described in documents is important, as it is for many further NLP tasks (see Section 3.1). Historic events in particular tend
to have a gradual beginning and ending. Knowing the temporal relationship between two events may therefore allow a system to answer a temporal query even though no explicit temporal information is associated with the events [30]. Research issues are:
• How can inconsistent temporal information be dealt with?
• How can temporal reasoning be executed if temporal relationships are missing?
• How can we evaluate the quality of such systems?

4.7 Temporal Similarity
A research question related to temporal querying is temporal document similarity. Instead of comparing a temporal query with the temporal information of a document, two documents can be compared with respect to their temporal similarity. The main problem arising here is: what makes two documents temporally similar? This leads to the following questions:
• Should two documents be considered similar if they cover the same temporal interval?
• Should the temporal focus of the documents be important for their temporal similarity?
• Can two documents be regarded as temporally similar if one of them describes, in detail, a small part of the temporal interval covered by the other?

4.8 Timelines and User Interfaces
One important use of the time entities of a document is to create a sorted list of events, i.e., a timeline. A timeline can be shown as a vertical list of textual items or visualized in many different ways, for example, as in Yahoo!'s Correlator.⁷ More sophisticated visualizations let users focus on specific named entities with respect to time, as in Yahoo!'s News Explorer [10, 19]. Here, interesting questions are:
• What is the appropriate way to present a timeline?
• Is a linear timeline the only way to present and anchor documents in time?
• How can one leverage document temporal measures to present a good display?
• Are there specific visualizations or user interfaces that can benefit from temporal information?
⁷ http://correlator.sandbox.yahoo.com/

4.9 Searching in Time
Time entities can also be used to search documents or log files, for example to search the past for purposes such as digital forensics, historical analysis, or linguistic analysis. We can even search the future [6, 13], for example, news about events that are scheduled or may happen. The Yahoo! News Explorer tool mentioned above supports this idea [19]. Microsoft Academic Search⁸ is an example of presenting publications and citations in a timeline. Some problems are:
• Besides news, what other sources would one like to use to search the past and/or the future?
• How far does one need to go back in time?
• Can we improve bibliographic search instead of just sorting by publication date?

4.10 Web Archiving
The goal of Web archiving is to collect and store digital content so that it is accessible for future tasks. Spam can be detected by analyzing the evolution of the link structure of Web pages [9]. Besides spam detection, a main challenge in Web archiving is ensuring the temporal coherence of Web pages, since it is not possible to collect all pages at the same time. Thus, the content of parts of the collection may change during the crawling process. In [32], Spaniol et al. introduce a coherence framework to overcome the temporal diffusion of Web crawls, i.e., to minimize the risk of incoherence. Nevertheless, open problems remain:
• How can temporal information be used to predict which pages are likely to change over time?
• How can temporal coherence be achieved for any point in time or time interval?

4.11 Spatio-temporal Information Exploration
Recently, there has been some research on combining spatial and temporal information extracted from documents for exploration tasks [23, 36]. Just as temporal information can be normalized using a timeline, spatial information can be normalized to latitude/longitude coordinates. To extract geographic expressions from documents, so-called geo taggers can be applied. Events usually happen at a specific time and place. Combining the output of a temporal tagger with that of a geo tagger therefore allows documents to be explored according to the events mentioned in the text. TimeTrails [35] is one system for exploring such spatio-temporal information in documents. Some questions are:
• What is the best way to represent maps and time?
• Which kinds of scenarios can benefit from spatio-temporal exploration?

5. CONCLUDING REMARKS
Temporal information embedded in documents in the form of temporal expressions offers an interesting means to further enhance the functionality of current information retrieval applications.
We have presented a number of examples and scenarios where temporal information can be very useful. We have identified research trends in this new area, a number of interesting practical applications, and open problems.
The problems we outline are difficult because they span several areas of computer science, mainly information retrieval, natural language processing, and user interfaces. Moreover, several of them are multidisciplinary because they touch issues related to psychology or design, to mention just
8
http://academic.research.microsoft.com/. two, making them even more challenging.
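To make the temporal similarity questions raised above concrete, the following is a minimal Python sketch under stated assumptions, not a method proposed in this paper. It assumes that each document has already been reduced to a list of inclusive date intervals normalized to day granularity (e.g., by a temporal tagger such as HeidelTime [34]); the names covered_days, temporal_jaccard, and temporal_containment are hypothetical. It contrasts two candidate measures: the Jaccard overlap of the covered days, and a containment score (overlap coefficient) that divides by the smaller document's coverage.

```python
from datetime import date, timedelta

# Hypothetical representation: a document's temporal profile is a list of
# inclusive (start, end) date intervals, assumed to be already normalized
# to day granularity by a temporal tagger.
Interval = tuple[date, date]


def covered_days(intervals: list[Interval]) -> set[date]:
    """Expand inclusive intervals into the set of calendar days they cover."""
    days: set[date] = set()
    for start, end in intervals:
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days


def temporal_jaccard(a: list[Interval], b: list[Interval]) -> float:
    """Shared days divided by all days covered by either document."""
    days_a, days_b = covered_days(a), covered_days(b)
    union = days_a | days_b
    if not union:
        return 0.0  # neither document carries temporal information
    return len(days_a & days_b) / len(union)


def temporal_containment(a: list[Interval], b: list[Interval]) -> float:
    """Shared days divided by the coverage of the smaller document
    (overlap coefficient): 1.0 when one profile lies inside the other."""
    days_a, days_b = covered_days(a), covered_days(b)
    smaller = min(len(days_a), len(days_b))
    if smaller == 0:
        return 0.0
    return len(days_a & days_b) / smaller


# Usage: a document about all of 2010 vs. one about June 2010 only.
year_2010 = [(date(2010, 1, 1), date(2010, 12, 31))]
june_2010 = [(date(2010, 6, 1), date(2010, 6, 30))]
print(round(temporal_jaccard(year_2010, june_2010), 3))  # 0.082
print(temporal_containment(year_2010, june_2010))  # 1.0
```

For these two documents the Jaccard score is 30/365 ≈ 0.08, while the containment score is 1.0: Jaccard treats them as dissimilar, whereas containment treats the shorter one as fully subsumed, which is exactly the tension behind the question of whether a detailed description of a small sub-interval should count as temporally similar. Expanding intervals into individual days is only practical for bounded, coarse ranges; interval arithmetic would scale better.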
TWAW 2011, Hyderabad, India
6. REFERENCES
[1] J. Allan, editor. Topic Detection and Tracking: Event-based Information Organization. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[2] J. Allan, R. Gupta, and V. Khandelwal. Temporal Summaries of New Topics. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’01), pages 10–18, 2001.
[3] J. F. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832–843, 1983.
[4] O. Alonso, R. Baeza-Yates, and M. Gertz. Effectiveness of Temporal Snippets. In Proceedings of the Workshop on Web Search Result Summarization and Presentation (WSSP ’09), pages 1–4, 2009.
[5] O. Alonso, M. Gertz, and R. Baeza-Yates. Clustering and Exploring Search Results Using Timeline Constructions. In Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM ’09), pages 97–106, 2009.
[6] R. Baeza-Yates. Searching the Future. In Proceedings of the ACM SIGIR 2005 Workshop on Mathematical/Formal Methods in Information Retrieval (MF/IR ’05), pages 1–6, 2005.
[7] K. Berberich, S. Bedathur, O. Alonso, and G. Weikum. A Language Modeling Approach for Temporal Information Needs. In Proceedings of the 32nd European Conference on Information Retrieval Research (ECIR ’10), pages 13–25, 2010.
[8] N. Chinchor. Overview of MUC-7/MET-2. In Proceedings of the 7th Conference on Message Understanding (MUC ’97), pages 1–11, 1997.
[9] Y. Chung, M. Toyoda, and M. Kitsuregawa. A Study of Link Farm Distribution and Evolution Using a Time Series of Web Snapshots. In Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb ’09), pages 9–16, 2009.
[10] G. Demartini, M. Missen, R. Blanco, and H. Zaragoza. TAER: Time-aware Entity Retrieval. Exploiting the Past to Find Relevant Entities in News Articles. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM ’10), pages 1517–1520, 2010.
[11] L. Ferro, L. Gerber, I. Mani, B. Sundheim, and G. Wilson. TIDES - 2005 Standard for the Annotation of Temporal Expressions. 2005. http://fofoca.mitre.org/annotation_guidelines/2005_timex2_standard_v1.1.pdf
[12] R. Grishman and B. Sundheim. Design of the MUC-6 Evaluation. In Proceedings of the 6th Conference on Message Understanding (MUC ’95), pages 1–11, 1995.
[13] A. Jatowt, K. Kanazawa, S. Oyama, and K. Tanaka. Supporting Analysis of Future-related Information in News Archives and the Web. In Proceedings of the 9th Joint Conference on Digital Libraries (JCDL ’09), pages 115–124, 2009.
[14] D. Koen and W. Bender. Time Frames: Temporal Augmentation of the News. IBM Systems Journal, 39(4):597–616, 2000.
[15] O. Kolomiyets and M.-F. Moens. Meeting TempEval-2: Shallow Approach for Temporal Tagger. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW ’09), pages 52–57, 2009.
[16] H. Llorens, E. Saquete, and B. Navarro. TIPSEM (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 284–291, 2010.
[17] J. Makkonen and H. Ahonen-Myka. Utilizing Temporal Information in Topic Detection and Tracking. In Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL ’03), pages 393–404, 2003.
[18] J. Makkonen, H. Ahonen-Myka, and M. Salmenkivi. Topic Detection and Tracking with Spatio-temporal Evidence. In Proceedings of the 25th European Conference on Information Retrieval Research (ECIR ’03), pages 251–265, 2003.
[19] M. Matthews, P. Tolchinsky, R. Blanco, J. Atserias, P. Mika, and H. Zaragoza. Searching Through Time in the New York Times. In Proceedings of the Fourth Workshop on Human-Computer Interaction and Information Retrieval (HCIR ’10), pages 41–44, 2010.
[20] I. Mani, J. Pustejovsky, and B. Sundheim. Introduction to the Special Issue on Temporal Information Processing. ACM Transactions on Asian Language Information Processing, 3(1):1–10, 2004.
[21] I. Mani, J. Pustejovsky, and R. Gaizauskas, editors. The Language of Time. Oxford University Press, New York, NY, USA, 2005.
[22] I. Mani and G. Wilson. Robust Temporal Processing of News. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL ’00), pages 69–76, 2000.
[23] B. Martins, H. Manguinhas, and J. Borbinha. Extracting and Exploring the Geo-temporal Semantics of Textual Resources. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC ’08), pages 1–9, 2008.
[24] P. Mazur and R. Dale. The DANTE Temporal Expression Tagger. In Proceedings of the 3rd Language and Technology Conference (LTC ’09), pages 245–257, 2009.
[25] P. Mazur and R. Dale. WikiWars: A New Corpus for Research on Temporal Expressions. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP ’10), pages 913–922, 2010.
[26] M. Pasca. Towards Temporal Web Search. In Proceedings of the 2008 ACM Symposium on Applied Computing (SAC ’08), pages 1117–1121, 2008.
[27] J. Pustejovsky, J. M. Castaño, R. Ingria, R. Sauri, R. J. Gaizauskas, A. Setzer, G. Katz, and D. R. Radev. TimeML: Robust Specification of Event and Temporal Expressions in Text. In Proceedings of the AAAI Spring Symposium on New Directions in Question Answering, pages 28–34, 2003.
[28] J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, and M. Lazo. The TIMEBANK Corpus. In Proceedings of the Corpus Linguistics Conference, pages 647–656, 2003.
[29] A. Qamra, B. Tseng, and E. Chang. Mining Blog Stories Using Community-based and Temporal Clustering. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM ’06), pages 58–67, 2006.
[30] S. Schockaert, D. Ahn, M. De Cock, and E. Kerre. Question Answering with Imperfect Temporal Information. In Proceedings of the 7th Conference on Flexible Query Answering Systems (FQAS ’06), pages 647–658, 2006.
[31] F. Schilder and C. Habel. From Temporal Expressions to Temporal Information: Semantic Tagging of News Messages. In Proceedings of the Workshop on Temporal and Spatial Information Processing (TASIP ’01), pages 65–72, 2001.
[32] M. Spaniol, D. Denev, A. Mazeika, G. Weikum, and P. Senellart. Data Quality in Web Archiving. In Proceedings of the 3rd Workshop on Information Credibility on the Web (WICOW ’09), pages 19–26, 2009.
[33] B. Shaparenko, R. Caruana, J. Gehrke, and T. Joachims. Identifying Temporal Patterns and Key Players in Document Collections. In Proceedings of the IEEE ICDM Workshop on Temporal Data Mining: Algorithms, Theory and Applications (TDM ’05), pages 165–174, 2005.
[34] J. Strötgen and M. Gertz. HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 321–324, 2010.
[35] J. Strötgen and M. Gertz. TimeTrails: A System for Exploring Spatio-Temporal Information in Documents. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB ’10), pages 1569–1572, 2010.
[36] J. Strötgen, M. Gertz, and P. Popov. Extraction and Exploration of Spatio-temporal Information in Documents. In Proceedings of the 6th Workshop on Geographic Information Retrieval (GIR ’10), pages 1–8, 2010.
[37] R. Swan and J. Allan. TimeMine: Visualizing Automatically Constructed Timelines. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), page 393, 2000.
[38] N. UzZaman and J. Allen. TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 276–283, 2010.
[39] M. Verhagen, R. Gaizauskas, F. Schilder, M. Hepple, G. Katz, and J. Pustejovsky. SemEval-2007 Task 15: TempEval Temporal Relation Identification. In Proceedings of the 4th International Workshop on Semantic Evaluation (SemEval ’07), pages 75–80, 2007.
[40] M. Verhagen and J. Pustejovsky. Temporal Processing with the TARSQI Toolkit. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling ’08), pages 189–192, 2008.
[41] M. Verhagen, R. Sauri, T. Caselli, and J. Pustejovsky. SemEval-2010 Task 13: TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 57–62, 2010.
[42] M. T. Vicente-Díez, J. Moreno-Schneider, and P. Martínez. UC3M System: Determining the Extent, Type and Value of Time Expressions in TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 329–332, 2010.
[43] G. Weikum and M. Theobald. From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’10), pages 65–76, 2010.