=Paper=
{{Paper
|id=None
|storemode=property
|title=Temporal Information Retrieval: Challenges and Opportunities
|pdfUrl=https://ceur-ws.org/Vol-707/TWAW2011-paper1.pdf
|volume=Vol-707
|dblpUrl=https://dblp.org/rec/conf/www/AlonsoSBG11
}}
==Temporal Information Retrieval: Challenges and Opportunities==
<pdf width="1500px">https://ceur-ws.org/Vol-707/TWAW2011-paper1.pdf</pdf>
<pre>
                                    Temporal Information Retrieval:
                                     Challenges and Opportunities

                               Omar Alonso                                                      Jannik Strötgen
                              Microsoft Corp.                                            Institute of Computer Science
                             Mountain View, CA                                         University of Heidelberg, Germany
                   omar.alonso@microsoft.com                                                stroetgen@uni-hd.de

                         Ricardo Baeza-Yates                                                     Michael Gertz
                              Yahoo! Research                                            Institute of Computer Science
                              Barcelona, Spain                                         University of Heidelberg, Germany
                            rbaeza@acm.org                                                     gertz@uni-hd.de


ABSTRACT                                                                         describing the chronological context of a document or a col-
Time is an important dimension of any information space.                         lection of documents. As an extension to existing ranking
It can be very useful for a wide range of information re-                        techniques, which are primarily based on popularity or
trieval tasks such as document exploration, similarity search,                   reputation, time can be particularly valuable for exploring
summarization, and clustering. Traditionally, information                        search results along well-defined timelines and at multiple
retrieval applications do not take full advantage of all the                     time granularities due to the key characteristics of temporal
temporal information embedded in documents to provide                            information:
alternative search features and user experience. However, in
                                                                                    • Temporal information is well-defined: Assuming two
the last few years there has been exciting work on analyzing
                                                                                      points in time or two intervals, the relationship be-
and exploiting temporal information for the presentation,
                                                                                      tween them can be identified, e.g., the relationship can
organization, and in particular the exploration of search re-
                                                                                      be of the types before, overlap, or after [3].
sults.
   In this paper, we review the current research trends and                         • Temporal information can be normalized: Regardless
present a number of interesting applications along with open                          of the used terms or the used language, every temporal
problems. The goal is to discuss interesting areas and future                         expression referring to the same semantics can be nor-
work for this exciting field of information management.                               malized to the same value in some standard format.
                                                                                      This property makes temporal information term- and
                                                                                      language-independent.
Categories and Subject Descriptors
I.2.7 [Artificial Intelligence]: Natural Language Process-                          • Temporal information can be organized hierarchically:
ing—Language models, Text analysis                                                    Temporal expressions can be of different granularities,
                                                                                      e.g., of type day (“May 20, 2011”) or of type year
                                                                                      (“2011”). Due to the fact that years consist of months,
Keywords                                                                              and months and weeks consist of days, temporal ex-
temporal information, information retrieval                                           pressions can be mapped to coarser granularities based
                                                                                      on the hierarchy of temporal expressions.
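
To make the characteristics listed above more concrete, the following minimal Python sketch compares two intervals and maps a normalized date to a coarser granularity. The helper names relation and coarsen are hypothetical and not taken from any existing temporal tagger. Values are assumed to be plain YYYY-MM-DD style strings, and weeks are left out because they do not nest into months by simple truncation.

```python
from datetime import date
from typing import Tuple

# Granularities ordered from coarse to fine (assumption: weeks are omitted).
GRANULARITIES = ("year", "month", "day")


def relation(a: Tuple[date, date], b: Tuple[date, date]) -> str:
    """Classify two closed intervals as 'before', 'overlap', or 'after'."""
    if a[1] < b[0]:
        return "before"
    if b[1] < a[0]:
        return "after"
    return "overlap"


def coarsen(value: str, target: str) -> str:
    """Map a normalized value such as '2011-05-20' to a coarser granularity."""
    parts = value.split("-")
    keep = GRANULARITIES.index(target) + 1
    if keep > len(parts):
        raise ValueError(f"{value!r} is already coarser than {target!r}")
    return "-".join(parts[:keep])


january = (date(2011, 1, 1), date(2011, 1, 31))
february = (date(2011, 2, 1), date(2011, 2, 28))
print(relation(january, february))     # before
print(coarsen("2011-05-20", "month"))  # 2011-05
print(coarsen("2011-05-20", "year"))   # 2011
```

Because the normalized values are language-independent, the same two functions would apply regardless of how the expression was worded in the source text.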

1.    INTRODUCTION                                                                  Using these key characteristics, temporal information about
  Time clearly plays a central role in any information space,                    documents can be used to develop time-specific information
and it has been studied in several areas like information                        retrieval and exploration applications. The most obvious
extraction, topic-detection, question-answering, query log                       type of temporal information associated with a document is
analysis, and summarization. Time and temporal measure-                          its creation time or the date of its last modification. This
ments can help recreate a particular historical period or                        kind of information, which is directly accessible through
                                                                                 the metadata of a document, can already be used for sev-
                                                                                 eral tasks such as time-aware search or temporal clustering.
                                                                                 However, the document creation time is only valuable in a
                                                                                 specific context such as the news domain. In other areas,
                                                                                 and even in the news domain itself, a lot of temporal infor-
                                                                                 mation is neglected if the document creation time is used as
                                                                                 the only time information associated with a document. This
Copyright 2011 for the individual papers by the papers’ authors. Copying         is because there is a lot of temporal information latently
permitted only for private and academic purposes. This volume is published
and copyrighted by its editors.                                                  available in a document’s text. Assume a news document
TWAW 2011, March 28, 2011, Hyderabad, India.                                     reports on an event that is already dated. Then, if only the


TWAW 2011, Hyderabad, India

document creation time is taken into account, the informa-             out any further knowledge. That is, the expression itself
tion when the event occurred is ignored. But to make use               contains all the information needed for normalization and is
of such information, so-called temporal taggers are applied            thus fully specified.
to extract and normalize temporal expressions contained in                In contrast, relative temporal expressions cannot be nor-
documents.                                                             malized without taking into account some context informa-
   The remainder of the paper is organized as follows. After           tion. For example, the expression “today” cannot be nor-
a discussion of how time appears in documents and how                  malized without knowing the corresponding reference time.
it is possible to extract such temporal data, in Section 3,            This reference time can either be the document creation time
we survey research on temporal tagging. In Section 4, we               or another temporal expression in the document. Typically,
present the current research trends on temporal information            in news articles, the document creation time is important
retrieval. We then describe application areas and challenges.          and often used as reference time. Note that this kind of
Finally, we present our concluding remarks.                            information is directly accessible in form of a timestamp
                                                                       through the metadata of a document. The expression “yes-
2.    TIME IN DOCUMENTS                                                terday” in Thousands of prisoners in Egypt managed to es-
                                                                       cape from prison yesterday can be normalized to “2011-01-
   As indicated in the introduction, there is a lot of temporal
                                                                       29” if we know the document creation time to be “2011-
information in any collection of documents, be it ranked doc-
                                                                       01-30”. In other types of documents, the reference time is
uments in a hit list or a corpus of topic specific documents.
                                                                       usually represented by another temporal expression in the
To take advantage of such time related information in par-
                                                                       document. In general, there are many occurrences of relative
ticular for document exploration purposes, in a document
                                                                       temporal expressions. While sometimes the reference time is
processing step, it is important to extract this information,
                                                                       sufficient for normalization, further information is needed if
anchor it in time, compute some (aggregated) measures, and
                                                                       the relation to the reference time has to be identified as well.
make all this information explicit to subsequent exploration
                                                                       For example, “on Monday” can either refer to the previous or
tasks.
                                                                       to the next Monday with respect to the reference time. An
   In this section, we give a description of the different types
                                                                       indicator for determining the relationship can be the tense
of temporal information mentioned in documents (Sec. 2.1),
                                                                       of the sentence with future tense and present tense indicat-
explain how temporal expressions can be realized in natu-
                                                                       ing an after-relationship to the reference time and past tense
ral language (Sec. 2.2), and demonstrate how they can be
                                                                       a before-relationship. Figure 1 shows some parts of a news
extracted and normalized (Sec. 2.3).
                                                                       article containing explicit and relative temporal expressions
2.1    Types of Temporal Information                                   and illustrates what kind of context information is needed for
                                                                       normalizing the relative expressions.
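
The normalization of relative expressions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reference time is taken to be the document creation time, and the tense of the sentence is assumed to be supplied by some earlier processing step. The sketch also assumes that "on Monday" never refers to the reference day itself.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_yesterday(reference: date) -> str:
    """Normalize 'yesterday' relative to the given reference time."""
    return (reference - timedelta(days=1)).isoformat()


def resolve_weekday(name: str, reference: date, tense: str) -> str:
    """Normalize 'on <weekday>' using the tense as relation indicator.

    Past tense selects the previous such weekday (before-relationship);
    present and future tense select the next one (after-relationship).
    """
    target = WEEKDAYS.index(name.lower())
    if tense == "past":
        days_back = (reference.weekday() - target) % 7 or 7
        return (reference - timedelta(days=days_back)).isoformat()
    days_ahead = (target - reference.weekday()) % 7 or 7
    return (reference + timedelta(days=days_ahead)).isoformat()


dct = date(2011, 1, 30)  # document creation time from the example in the text
print(resolve_yesterday(dct))                     # 2011-01-29
print(resolve_weekday("Monday", dct, "past"))     # 2011-01-24
print(resolve_weekday("Monday", dct, "future"))   # 2011-01-31
```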
   Temporal expressions mentioned in text documents can
                                                                          The third type of temporal expressions are implicit expres-
be grouped into four types according to TimeML [27], the
                                                                       sions such as names of holidays or events. These expressions
standard markup language for temporal information: date,
                                                                       can be anchored on a timeline if a mapping of the expres-
time, duration, and set. While duration expressions are used
                                                                       sion to its normalized value is available. For example, “New
to provide information about the length of an interval (e.g.,
                                                                       Year’s Day 2002” can be normalized to “2002-01-01” since
“three years” in they have been traveling through the U.S.
                                                                       “New Year’s Day” always refers to January 1. In addition,
for three years), set expressions inform about the periodical
                                                                       there are expressions for which a temporal function has to
aspect of an event (e.g., “twice a week” in she goes to the
                                                                       be applied. “Labor Day”, for example, refers to the first
gym twice a week ). In contrast, time and date expressions
                                                                       Monday in September so that “Labor Day 2009” can be nor-
(e.g., “3 p.m.” or “January 25, 2010”) both refer to a specific
                                                                       malized to “2009-09-07” if we know this day to be the first
point in time – though in a different granularity.
                                                                       Monday in September 2009.
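
The two kinds of implicit expressions described above, fixed dates and dates that require a temporal function, can be illustrated with a small Python sketch. The lookup table HOLIDAYS and the helper names are hypothetical; a real tagger would need a far larger and locale-aware mapping. "Labor Day" is assumed here to mean the first Monday in September, as in the text.

```python
from datetime import date, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int = 1) -> date:
    """Return the n-th given weekday (Monday = 0) of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


# Each holiday name maps to a function from a year to a calendar date.
HOLIDAYS = {
    "new year's day": lambda year: date(year, 1, 1),        # fixed date
    "labor day": lambda year: nth_weekday(year, 9, 0, 1),   # temporal function
}


def normalize_holiday(name: str, year: int) -> str:
    """Normalize an implicit expression such as 'Labor Day 2009'."""
    return HOLIDAYS[name.lower()](year).isoformat()


print(normalize_holiday("New Year's Day", 2002))  # 2002-01-01
print(normalize_holiday("Labor Day", 2009))       # 2009-09-07
```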
   An interesting key feature of temporal information is that
                                                                          Although there are many different ways to refer to a spe-
it can be normalized to some standard format. Assuming
                                                                       cific point in time, all expressions referring to the same point
a Gregorian calendar as representation of time, expressions
                                                                       in time shall be normalized to the same value in the stan-
of time and date can be directly placed on a timeline. A
                                                                       dard format. This normalization process is one of the tasks
date is then typically represented as YYYY-MM-DD, e.g.,
                                                                       of so-called temporal taggers, as described in the next para-
the expression “January 25, 2010” is normalized to “2010-01-
                                                                       graph.
25”. However, the normalization is not always as simple as in
this example, but depends on the way temporal information
is expressed in a document, which will be discussed in the             2.3    Temporal Tagging
next paragraph.                                                           Temporal tagging is a specific task in named entity recog-
                                                                       nition and normalization. The goals of so-called temporal
2.2    Occurrences of Temporal Expressions                             taggers are (i) the extraction of temporal expressions and
   There are many different ways to express temporal                   (ii) the normalization of these expressions to some standard
information of the types date and time in documents. Sim-              format. As this standard format, TIMEX2 and TIMEX3
ilar to the work by Schilder and Habel [31], we distinguish            are often used. While TIMEX2 tags include pre- and post-
between explicit, implicit, and relative temporal expressions.         modifiers of the temporal expression itself (e.g., dependent
   Explicit temporal expressions refer to a specific point in          clauses) and allow for nested temporal expressions [11], such
time. Note that this point in time can be of different gran-           modifiers and nested tags are not included by TIMEX3 tags.
ularities. For example, the expression “January 25, 2010”              Instead, TIMEX3 is part of the TimeML markup language
refers to a specific day while the expression “November 2005”          in which further annotation types are available for captur-
refers to a specific month. The key characteristic of explicit         ing more complex temporal semantics. Nevertheless, al-
temporal expressions is that they can be normalized with-              though there are significant differences between TIMEX2



Document Creation Time: 1998-04-18                                    temporal expressions, the normalization is usually done in a
                                                                      rule-based way by all temporal taggers.
Hungarian astronaut Bertalan Farkas is leaving for the                  Due to their importance for temporal information retrieval,
United States to start a new career, he said            today .       we give an overview of existing temporal taggers and their
                                                                      quality in the next section. In addition, we present resources
. . . On May 22, 1995 , Farkas was made a brigadier general,
                                                                      for evaluating temporal taggers and survey temporal evalu-
and the following year he was appointed military attache              ation challenges organized so far.
. . . However, cited by District of Columbia traffic police in
    December for driving under the influence of . . .                 3.    RESEARCH ON TEMPORAL TAGGING
                                                                         Temporal processing of text documents in terms of the ex-
                                                                      traction and normalization of temporal expressions as well
Figure 1: Examples of temporal expressions in a                       as the extraction of temporal relations between events is
news article with explicit (transparent boxes) and                    very important for several NLP tasks requiring a deep un-
relative (solid boxes) expressions. Arrows indicate                   derstanding of language such as question answering or doc-
what kind of context information is needed to nor-                    ument summarization. Due to this fact, there has been sig-
malize the temporal expression.                                       nificant research in temporal annotation of text documents.
                                                                      The markup language TimeML has become an ISO standard
                                                                      for temporal annotation [27], and the TimeBank corpus was
and TIMEX3, they are very similar in many ways and a
                                                                      developed [28]. The latest version of the TimeBank corpus
detailed analysis goes beyond the scope of this paper.¹ Ac-
                                                                      contains 183 news articles and can be regarded as the gold
cording to the TimeML annotation guidelines, a TIMEX3
                                                                      standard for temporal annotation. However, there has been
tag contains, among others, the following information about
                                                                      important research activity before, and several evaluation
a temporal expression:
                                                                      challenges have been held to bring forward research in the
     • offset: the start and end position of the expression in        area of temporal information extraction as described in the
       the document                                                   following section.

     • type: whether the expression is of type date, time,            3.1    Evaluation Challenges
       duration, or set                                                  The earliest competitions for the extraction of temporal
                                                                      expressions were the named entity recognition tasks
     • value: the normalized value of the expression
                                                                      of the Message Understanding Conferences MUC 1995 and
   To identify this information, i.e., to extract and normal-         1997 [8, 12]. A combination of the extraction and the nor-
ize temporal expressions, temporal taggers are applied after          malization was introduced in the ACE (Automatic Content
the text is preprocessed. Usually, sentence and token bound-          Extraction) time expression recognition and normalization
aries are detected and a part-of-speech tag is associated with        (TERN) challenges in 2004, 2005, and 2007². Several tem-
every token. This information can then be used by the tem-            poral taggers have been developed by the participants of
poral tagger to identify temporal expressions. The first goal,        these challenges (see Section 3.2). Often, the TERN 2004
i.e., the identification of the boundaries of temporal expres-        and 2005 corpora³ are used to compare the quality of tempo-
sions, can be seen as a typical classification task. Therefore,       ral taggers. The TERN corpora are annotated with respect
there has been some work on addressing this problem by                to the TIMEX2 annotation guidelines [11].
applying machine learning techniques (e.g., [15, 38]). The               A further indication of the importance of temporal an-
classification problem can be described in the following way:         notation and the activity in the research domain are the
For every token t, decide whether t is outside (O) of tem-            TempEval challenges. Motivated by the importance of tem-
poral expressions, inside (I) a temporal expression, or the           poral annotation for many NLP tasks, TempEval was first
beginning (B) of a temporal expression. The well-known                organized as one task of the SemEval workshop
IOB-format can be used for annotating tokens according to             in 2007 [39]. In this competition, the organizers provided
their property.                                                       annotated text documents based on the TimeBank corpus.
   In addition to machine-learning approaches, there are sev-         While the annotations of events and temporal expressions
eral rule-based approaches to extract temporal expressions            were given, the task for the tools to be developed was to
(e.g., [24, 34]). These are usually based on regular expres-          identify temporal relations between events and the docu-
sions although they may use other information about the               ment creation time, between events and temporal expres-
text as well, such as part-of-speech information.                     sions, and between two events.
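
As an illustration of rule-based extraction and of the IOB labeling scheme described above, the following Python sketch uses a single regular expression for explicit dates of the form "Month D, YYYY". It is a deliberately small example and not the rule set of any existing tagger; the names DATE_RE, extract_dates, and iob_labels are hypothetical, and tokens are simply split on whitespace. Each match yields the character offsets and the normalized value, loosely mirroring the offset and value information of a TIMEX3 tag.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# One extraction rule: "<Month> <day>, <year>", e.g. "May 22, 1995".
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def extract_dates(text):
    """Return (start, end, normalized value) for each explicit date."""
    results = []
    for m in DATE_RE.finditer(text):
        month = MONTHS.index(m.group(1)) + 1
        value = f"{m.group(3)}-{month:02d}-{int(m.group(2)):02d}"
        results.append((m.start(), m.end(), value))
    return results


def iob_labels(text):
    """Label each whitespace token as B, I, or O based on the matches."""
    spans = [(start, end) for start, end, _ in extract_dates(text)]
    labels = []
    for tok in re.finditer(r"\S+", text):
        label = "O"
        for start, end in spans:
            if start <= tok.start() < end:
                label = "B" if tok.start() == start else "I"
        labels.append((tok.group(), label))
    return labels


sentence = "On May 22, 1995, Farkas was made a brigadier general"
print(extract_dates(sentence))  # [(3, 15, '1995-05-22')]
print(iob_labels(sentence)[:5])
```

In a machine-learning setting, the labels produced here by a rule would instead be predicted per token by a trained classifier.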
   The more difficult task is the normalization of the tempo-            In 2010, the full task of identifying all temporally related
ral expressions. While explicit expressions can be normal-            expressions and relations was addressed in the follow-up challenge.
ized without further knowledge, the normalization of rela-            That is, for TempEval-2, two further tasks were added [41]:
tive expressions is challenging. As described above, context          the extraction and normalization of temporal expressions
information has to be identified to determine the correct ref-        and of events. In addition, the discovery of relations be-
erence time and the temporal relation between a temporal              tween two events was split into two tasks, namely the iden-
expression and its reference time. While there are rule-based         tification of relations between two main events in consecu-
and machine learning based approaches for the extraction of

1 For further information on temporal annotation according to TimeML and differences between TIMEX2 and TIMEX3, see http://www.timeml.org.
2 http://www.itl.nist.gov/iad/mig/tests/ace/.
3 The TERN development corpora are available through the Linguistic Data Consortium. See: http://fofoca.mitre.org/tern.html.



tive sentences and relations between two events where one
event syntactically dominates the other event. The Temp-
Eval corpora are based on the TimeBank corpus and an-
notated according to the TimeML annotation guidelines, i.e.,
using TIMEX3 tags for temporal expressions. A further nov-
elty in the second TempEval challenge was that the tasks
were offered not only in English but in six languages. How-
ever, only two languages were addressed by the partici-
pants, namely English and Spanish. Nevertheless, thanks
to the creation of an annotation standard, a gold standard
corpus, and competitions such as the TempEval challenges,
there have been significant improvements in temporal relation      Figure 2: Annotation of a timeline by workers using
identification and temporal tagging. Some existing temporal        crowdsourcing.
taggers and their quality are presented in the next paragraph
by comparing their results in the TempEval-2 challenge.
                                                                   4.     RESEARCH TRENDS
                                                                      Research work on fully utilizing the temporal informa-
3.2   Temporal Taggers                                             tion embedded in the text of documents for exploration and
   Having applied a temporal tagger on a document collec-          search purposes is very recent. The work by Alonso et al.
tion, the previously hidden temporal information is made           presents an approach for extracting temporal information
available for tasks such as temporal relation extraction or        and how it can be used for clustering search results [5].
temporal clustering. One often applied temporal tagger is          Berberich et al. describe a model for temporal information
GuTime, which is part of the Tarsqi tool kit [40]. It is           needs [7]. Figure 2 shows the annotated timeline for the
based on the TempEx tagger, which was the first temporal           NYTimes data set for the latter reference using the Time-
tagger for the extraction of temporal expressions and their        line widget⁵. These last two projects rely on crowdsourcing,
normalizations [22]. GuTime was developed as an automatic          mainly using Amazon Mechanical Turk, for evaluating parts
evaluation tool for TimeML and extends the capabilities of         of their work.
the TempEx tagger. It was evaluated on the TERN 2004                  News sources have been the primary focus of a number of
training corpus and achieves F-measures of 85%, 78%, and           projects on exploiting time information in documents. For
82% for lenient and strict detection and for normalization,        example, the Time Frames project realizes an approach to
respectively.                                                      augment news articles by extracting time information [14].
   In the TempEval-2 challenge, eight teams participated in        Google’s news timeline⁶ is an experimental feature that al-
the task for temporal expression extraction and normaliza-         lows a user to explore news by time.
tion for English documents. The best-performing system                Extensions to document operations such as comparing the
was HeidelTime with an F-Score of 86% for the extraction           temporal similarity of two documents in the context of news
and an accuracy of 85% for the normalization [34]. For Span-       articles are presented by Makkonen and Ahonen-Myka [17].
ish documents, the best result for the extraction was an F-        An interesting approach that combines topic detection and
Score of 91% [16], while another system achieved the highest       tracking with timelines as a browsing interface is presented
accuracy for normalization (83%) [42]. While both machine-         by Swan and Allan [37]. Time information is also used in
learning and rule-based approaches were applied for the ex-        temporal mining of blogs to extract useful information [29].
traction, the normalization was done in a rule-based way by        New research has also emerged for future retrieval where
all systems. As the best performing system, HeidelTime uses        temporal information is used for searching the future [6].
rules consisting of an extraction and a normalization part.           There is exciting research on adding a time dimension
Thus, all temporal expressions that are identified are nor-        to certain applications like news summaries [2], temporal
malized as well. Due to the strict separation of the code and      patterns [33], and temporal Web search [26]. The special
the rules, HeidelTime is applicable for multi-lingual tempo-       issue on temporal information processing gives a clear map
ral tagging⁴.                                                      of current directions [20]. Harvesting temporal information
   Although there have been significant advances in tempo-         and how it can be used for entities and relationships is also
ral tagging, there is still room for improvement, especially       a very recent rich area [43].
when switching the processing language or the domain of the           Closely related to information extraction is the recent re-
document collection. For example, Mazur and Dale recently          search on temporal annotations, which is covered in depth in
presented a new corpus for research on temporal expressions        the book by Mani et al. [21]. Identification of time related
containing long, narrative-style documents, namely Wikipedia       information depends heavily on the language and the cor-
articles describing the historical course of wars [25]. Using      pora, so traditional information extraction systems tend to
their temporal tagger, they show that the normalization of         fall short in terms of temporal extraction. Based on the lat-
temporal expressions in such documents is very challenging         est advances, new research is emerging for automatic assign-
due to the rich discourse structure and the huge number            ment of document event-time periods and automatic tagging
of often underspecified temporal expressions in these docu-        of news messages using entity extraction [31].
ments compared to the usually used short news documents.              Now, we outline a number of applications that can benefit
                                                                   from leveraging more temporal information either by tem-
                                                                   poral expressions or timestamps. For each application, we
4 For details, see http://dbs.ifi.uni-heidelberg.de/stixx/.        5 http://simile.mit.edu/
                                                                   6 http://www.newstimeline.googlelabs.com


                                                                                                TWAW 2011, Hyderabad, India

describe why it is important and present a number of challenges.

4.1    Exploratory Search
  Research in exploratory search systems has gained a lot of attention lately, as such systems add a
significant user interface component that helps users search, navigate, and discover new facts and
relationships. As the amount of information on the Web keeps growing, exploratory search interfaces
are starting to surface. That said, it is not clear how to leverage temporal information. A few
problems are:

   • How to expose temporal information in exploratory search systems?
   • What’s the best way of presenting temporal information as a retrieval cue?
   • For which data sources, besides news, does exploratory search make sense?
   • Is e-discovery a vertical application that can benefit from temporal information?

4.2    Micro-blogging and Real-time Search
  Micro-blogging sites like Twitter have gained a lot of attention lately as the ultimate mechanism
to broadcast what’s going on. Due to the nature of the medium, a typical message is very short, and
its lifespan is essentially determined by the crowd’s interest in the underlying event, be it a
football final or an earthquake.
  In the case of Twitter, it is very difficult to beat the timely broadcasting of an important event
if one compares this to a news article. Each tweet has a timestamp, but the organization of such
information is still not clear. In the news context, the reporter has to write an article that
contains a few paragraphs and submit the final version through some content management system that
pushes it to an external website, so that a search engine can hopefully crawl and index it in time.
Meanwhile, if a tweet is important enough, its main idea would already be trending on Twitter by the
time the reporter finishes the article, therefore highlighting its importance at a world scale. This
is very similar to the traditional notion of topic detection and tracking [1, 18], with one key
difference: the speed needed to detect that the topic is important and therefore a candidate for
trending. Some problems are:

   • What is the best way to provide a timeline of events in micro-blogging?
   • What is the lifespan of the main event?
   • How quickly and precisely can one detect trending events?
   • What is the fraction of new content in the topic stream?

4.3    Temporal Summaries
  There has been seminal work on temporal summaries of news topics by Allan et al. [2] that shows how
important temporal information is. One extension is to generate time-sensitive summaries that can be
used as temporal snippets [4].
  By design, the main goal of a snippet (or caption) is to present a document surrogate that the user
can quickly scan in the search results page without the need to click and read the full content of a
document. There is a limit to the number of lines of text that the snippet should present.
Interesting questions include:

   • When to show a timestamp or temporal expressions?
   • Should the snippet present the matching lines in a timeline?
   • Is a temporal summary a good surrogate for a document?
   • For which kind of queries is a temporal summary appropriate?
   • Should temporal summaries be query independent?

4.4   Temporal Clustering
   The notion of clustering search results by temporal attributes has been presented in [5].
Preliminary results indicate that users are interested in dissecting a document collection by time.
At the same time, it is not clear for which kinds of scenarios, besides “research-like” questions,
this approach would work. Key issues are:

   • Can we identify documents that are contemporary and therefore related?
   • Which chronons can be more useful for clustering?
   • How can we cluster micro-blogging data by time?
   • Is a timeline the best way to cluster search results?

4.5   Temporal Querying
   The temporal information extracted from documents can directly be used to allow the user of a
search engine to constrain a query in a temporal manner. That is, in addition to a textual part, a
query contains a temporal part. For example, in addition to “world war” a temporal constraint like
“1944-1945” could be specified. The user would obviously expect documents about World War II as
results for this query. The objective when using a combination of a text and a temporal query can
thus be formulated in the following way: The more both parts of the query are satisfied, i.e., the
better the textual and the temporal parts fit a document, the higher the rank of this document should
be. The main problems for such a combination of constraints are the following:

   • How can a combined score for the textual part and the temporal part of a query be calculated in
     a reasonable way?
   • Should a document in which the “textual match” and the “temporal match” are far away from each
     other be penalized?
   • What about documents that satisfy one of the constraints but “slightly” fail to satisfy the
     other constraint?

4.6   Temporal Question Answering
   To be able to answer time-related questions, a question answering system has to know when specific
events took place. For this, temporal information can be associated with extracted facts from text
documents [26]. While this may be applicable for famous facts and events, question answering systems
are often faced with imperfect temporal information. For this, identifying relationships between
events described in documents is important, as it is for many further NLP tasks (see Section 3.1).
Especially historic events tend
to have a gradual beginning and ending, so that knowing the temporal relationship between two events
may make it possible to answer a temporal query even though no explicit temporal information is
associated with the events [30]. Research issues are:

     • How can inconsistent temporal information be dealt with?
     • How can temporal reasoning be executed if temporal relationships are missing?

4.7     Temporal Similarity
  A research question related to temporal querying is temporal document similarity. Instead of
comparing a temporal query with the temporal information of a document, two documents can be compared
with respect to their temporal similarity. The main problem arising here is: what makes two documents
temporally similar? This leads to the following questions:

     • Should two documents be considered similar if they cover the same temporal interval?
     • Should the temporal focus of the documents be important for their temporal similarity?
     • Can two documents be regarded as temporally similar if one contains a small temporal interval
       of the other document in a detailed way?

4.8     Timelines and User Interfaces
   One important use of the time entities of a document is to create a sorted list of events, a
timeline. A timeline can be shown as a vertical list of textual items or visualized in many different
ways, for example, as in Yahoo!’s Correlator (footnote 7). More sophisticated visualizations allow the
user to focus on specific named entities with respect to time, as in Yahoo!’s News Explorer [10, 19].
Here, interesting questions are:

     • What is the appropriate way to present a timeline?
     • Is a linear timeline the only way to present and anchor documents in time?
     • How can one leverage document temporal measures to present a good display?
     • Are there specific visualizations or user interfaces that can benefit from temporal
       information?

4.9     Searching in Time
   Time entities can also be used to search documents or log files in order to explore the past for
different purposes such as digital forensics, historical analysis, or linguistic analysis. We can
even search the future [6, 13], for example, in news for events that are scheduled or may happen in
the future. This idea is supported in the Yahoo!’s News Explorer tool already mentioned [19].
Microsoft Academic Search (footnote 8) is an example of presenting publications and citations in a
timeline. Some problems are:

     • Besides news, what other sources would one like to use to search in the past and/or the
       future?
     • How far does one need to go back in time?
     • Can we improve bibliographic search instead of just sorting by publication date?
     • How can we evaluate the quality of such systems?

4.10     Web Archiving
   The goal of Web archiving is to collect and store digital content so that it is accessible for
future tasks. Besides the detection of spam, which can be dealt with by analyzing the evolution of
the link structure of Web pages [9], a main challenge in Web archiving is to take care of the
temporal coherence of Web pages, since it is not possible to collect all pages at the same time.
Thus, the content of parts of the collection may change during the crawling process. In [32], Spaniol
et al. introduce a coherence framework to overcome the temporal diffusion of the Web crawls, i.e., to
minimize the risk of incoherences. Nevertheless, open problems remain:

     • How can temporal information be used to predict which pages are likely to change over time?
     • How can temporal coherence be achieved for any point in time or time interval?

4.11     Spatio-temporal Information Exploration
   Recently, there has been some research on combining spatial and temporal information extracted
from documents for exploration tasks [23, 36]. In the same way as temporal information can be
normalized using a timeline, spatial information can be normalized according to its latitude/longitude
information. To extract geographic expressions from documents, so-called geo taggers can be applied.
Combining the information extracted from a temporal tagger with the information extracted from a geo
tagger allows the exploration of documents according to the events mentioned in the text, since
events usually happen at some specific time and place. A system for the exploration of such
spatio-temporal information from documents is TimeTrails [35]. Some questions are:

     • What’s the best way to represent maps and time?
     • Which kinds of scenarios can benefit from spatio-temporal exploration?

5.    CONCLUDING REMARKS
   Temporal information embedded in documents in the form of temporal expressions offers an
interesting means to further enhance the functionality of current information retrieval applications.
   We have presented a number of examples and scenarios where temporal information can be very
useful. We have identified research trends in this new area and a number of interesting practical
applications as well as problems.
   The problems we outline are difficult because they span several areas of computer science, mainly
information retrieval, natural language processing, and user interfaces. Moreover, several of them
are multidisciplinary because they touch on issues related to psychology or design, to mention just
two, making them even more challenging.

7 http://correlator.sandbox.yahoo.com/.
8 http://academic.research.microsoft.com/.

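Section 4.4 asks which chronons are useful for clustering search results by time. As a minimal
sketch of the idea, the following Python code groups documents into buckets at a chosen granularity
and returns the buckets in chronological order, which is also the simplest form of the sorted event
list described in Section 4.8. The function names, the (doc_id, date) input format, and the three
granularities are assumptions made for illustration; the paper does not prescribe a clustering method.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def chronon_key(day: date, granularity: str) -> str:
    """Map a date to a zero-padded bucket label so that labels sort chronologically."""
    if granularity == "year":
        return f"{day.year:04d}"
    if granularity == "month":
        return f"{day.year:04d}-{day.month:02d}"
    if granularity == "day":
        return day.isoformat()
    raise ValueError(f"unsupported granularity: {granularity}")


def cluster_by_time(
    docs: Iterable[Tuple[str, date]], granularity: str = "month"
) -> List[Tuple[str, List[str]]]:
    """Group (doc_id, date) pairs by chronon and return the buckets in time order."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for doc_id, day in docs:
        buckets[chronon_key(day, granularity)].append(doc_id)
    # Bucket labels are unique, so sorting the items orders them by label only.
    return sorted(buckets.items())


docs = [
    ("d1", date(2011, 3, 28)),
    ("d2", date(2010, 12, 1)),
    ("d3", date(2011, 3, 2)),
]
timeline = cluster_by_time(docs, "month")
```

Switching the granularity argument from "month" to "year" merges d1 and d3 with any other 2011
document, which is one way to experiment with the question of which chronon suits a given collection.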

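Section 4.5 leaves open how a combined score for the textual and the temporal part of a query
should be calculated. One possible answer, shown here only as a sketch, is a linear interpolation of
a textual score and the fraction of the query interval that a document covers. The names
temporal_overlap and combined_score, the weight alpha, the year-level intervals, and the assumption
that the textual score is already normalized to [0, 1] are all hypothetical choices and not taken
from the paper.

```python
from typing import Tuple

# Inclusive (start_year, end_year); a hypothetical, simplified interval representation.
Interval = Tuple[int, int]


def temporal_overlap(query: Interval, doc: Interval) -> float:
    """Fraction of the query interval that is covered by the document interval."""
    q_start, q_end = query
    d_start, d_end = doc
    shared = min(q_end, d_end) - max(q_start, d_start) + 1
    if shared <= 0:
        return 0.0
    return shared / (q_end - q_start + 1)


def combined_score(
    text_score: float, query: Interval, doc: Interval, alpha: float = 0.5
) -> float:
    """Interpolate a textual score in [0, 1] with the temporal overlap."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_score + (1.0 - alpha) * temporal_overlap(query, doc)


# Query "world war" with the temporal constraint 1944-1945 from the running example.
query_interval = (1944, 1945)
full_match = combined_score(0.8, query_interval, (1939, 1945))
partial_match = combined_score(0.8, query_interval, (1945, 1950))
```

With this choice, a document that "slightly" fails the temporal constraint is demoted rather than
excluded, which is one way to handle the third question in Section 4.5; a hard filter would be the
alternative design.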

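The questions on temporal document similarity in Section 4.7 can be made concrete with a simple
baseline. The sketch below represents each document by the set of years covered by its temporal
intervals and compares two documents with the Jaccard coefficient. This measure, the year-level
granularity, and the function names are assumptions chosen for illustration, not a method proposed
in the paper.

```python
from typing import Iterable, Set, Tuple

# Inclusive (start_year, end_year); a hypothetical, simplified interval representation.
Interval = Tuple[int, int]


def covered_years(intervals: Iterable[Interval]) -> Set[int]:
    """Expand a document's temporal intervals into the set of years they cover."""
    years: Set[int] = set()
    for start, end in intervals:
        years.update(range(start, end + 1))
    return years


def temporal_jaccard(doc_a: Iterable[Interval], doc_b: Iterable[Interval]) -> float:
    """Jaccard coefficient of the year sets of two documents (0.0 if both are empty)."""
    a, b = covered_years(doc_a), covered_years(doc_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


overview = [(1939, 1945)]   # document covering a whole war
detailed = [(1944, 1944)]   # document covering one year of it in detail
similarity = temporal_jaccard(overview, detailed)
```

Under this baseline the detailed document scores only 1/7 against the overview, which illustrates
why the third question in Section 4.7 matters: a containment-based measure would rate the same pair
as highly similar.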
6.   REFERENCES                                                       TempEval-2: Shallow Approach for Temporal Tagger.
 [1] J. Allan, editor. Topic Detection and Tracking:                  In Proceedings of the Workshop on Semantic
     Event-based Information Organization. Kluwer                     Evaluations: Recent Achievements and Future
     Academic Publishers, Norwell, MA, USA, 2002.                     Directions (SEW ’09), pages 52–57, 2009.
 [2] J. Allan, R. Gupta, and V. Khandelwal. Temporal             [16] H. Llorens, E. Saquete, and B. Navarro. TIPSEM
     Summaries of New Topics. In Proceedings of the 24th              (English and Spanish): Evaluating CRFs and
     Annual International ACM SIGIR Conference on                     Semantic Roles in TempEval-2. In Proceedings of the
     Research and Development in Information Retrieval                5th International Workshop on Semantic Evaluation
     (SIGIR ’01), pages 10–18, 2001.                                  (SemEval ’10), pages 284–291, 2010.
 [3] J. F. Allen. Maintaining Knowledge about Temporal           [17] J. Makkonen and H. Ahonen-Myka. Utilizing
     Intervals. In Communications of the ACM,                         Temporal Information in Topic Detection and
     26(11):832–843, 1983.                                            Tracking. In Proceedings of 7th European Conference
 [4] O. Alonso, R. Baeza-Yates, and M. Gertz.                         on Research and Advanced Technology for Digital
     Effectiveness of Temporal Snippets. In Proceedings of            Libraries (ECDL ’03), pages 393–404, 2003.
     the Workshop on Web Search Result Summarization             [18] J. Makkonen, H. Ahonen-Myka, and M. Salmenkivi.
     and Presentation (WSSP ’09), pages 1–4, 2009.                    Topic Detection and Tracking with Spatio-temporal
 [5] O. Alonso, M. Gertz, and R. Baeza-Yates. Clustering              Evidence. In Proceedings of the 25th European
     and Exploring Search Results Using Timeline                      Conference on Information Retrieval Research
     Constructions. In Proceedings of the 18th ACM                    (ECIR ’03), pages 251–265, 2003.
     International Conference on Information and                 [19] M. Matthews, P. Tolchinsky, R. Blanco, J. Atserias,
     Knowledge Management (CIKM ’09), pages 97–106,                   P. Mika, and H. Zaragoza. Searching Through Time in
     2009.                                                            the New York Times. In Proceedings of the Fourth
 [6] R. Baeza-Yates. Searching the Future. In Proceedings             Workshop on Human-Computer Interaction and
     of the ACM SIGIR 2005 Workshop on                                Information Retrieval (HCIR ’10), pages 41–44, 2010.
     Mathematical/Formal Methods in Information                  [20] I. Mani, J. Pustejovsky, and B. Sundheim.
     Retrieval (MF/IR ’05), pages 1–6, 2005.                          Introduction to the Special Issue on Temporal
 [7] K. Berberich, S. Bedathur, O. Alonso, and                        Information Processing. In ACM Transactions on
     G. Weikum. A Language Modeling Approach for                      Asian Language Information Processing 3(1): 1–10,
     Temporal Information Needs. In Proceedings of the                2004.
     32nd European Conference on Information Retrieval           [21] I. Mani, J. Pustejovsky, and R. Gaizauskas, editors.
     Research (ECIR ’10), pages 13–25, 2010.                          The Language of Time. Oxford University Press, New
 [8] N. Chinchor. Overview of MUC-7/MET-2. In                         York, NY, USA, 2005.
     Proceedings of the 7th Conference on Message                [22] I. Mani and G. Wilson. Robust Temporal Processing
     Understanding (MUC ’97), pages 1–11, 1997.                       of News. In Proceedings of the 38th Annual Meeting on
 [9] Y. Chung, M. Toyoda, and M. Kitsuregawa. A Study                 Association for Computational Linguistics (ACL ’00),
     of Link Farm Distribution and Evolution Using a Time             pages 69–76, 2000.
     Series of Web Snapshots. In Proceedings of the 5th          [23] B. Martins, H. Manguinhas, and J. Borbinha.
     International Workshop on Adversarial Information                Extracting and Exploring the Geo-temporal Semantics
     Retrieval on the Web (AIRWeb ’09), pages 9–16, 2009.             of Textual Resources. In Proceedings of the IEEE
[10] G. Demartini, M. Missen, R. Blanco, and H. Zaragoza.             International Conference on Semantic Computing
     TAER: Time-aware Entity Retrieval. Exploiting the                (ICSC ’08), pages 1–9, 2008.
     Past to Find Relevant Entities in News Articles. In         [24] P. Mazur and R. Dale. The DANTE Temporal
     Proceedings of the 19th ACM International Conference             Expression Tagger. In Proceedings of the 3rd Language
     on Information and Knowledge Management                          and Technology Conference (LTC ’09), pages 245–257,
     (CIKM ’10), pages 1517–1520, 2010.                               2009.
[11] L. Ferro, L. Gerber, I. Mani, B. Sundheim, and              [25] P. Mazur and R. Dale. WikiWars: A New Corpus for
     G. Wilson. TIDES - 2005 Standard for the Annotation              Research on Temporal Expressions. In Proceedings of
     of Temporal Expressions. 2005.                                   the 2010 Conference on Empirical Methods in Natural
     http://fofoca.mitre.org/annotation_guidelines/                   Language Processing (EMNLP ’10), pages 913–922,
     2005_timex2_standard_v1.1.pdf                                    2010.
[12] R. Grishman and B. Sundheim. Design of the MUC-6            [26] M. Pasca. Towards Temporal Web Search. In
     Evaluation. In Proceedings of the 6th Conference on              Proceedings of the 2008 ACM Symposium on Applied
     Message Understanding (MUC ’95), pages 1–11, 1995.               Computing (SAC ’08), pages 1117–1121, 2008.
[13] A. Jatowt, K. Kanazawa, S. Oyama, and K. Tanaka.            [27] J. Pustejovsky, J. M. Castaño, R. Ingria, R. Sauri,
     Supporting Analysis of Future-related Information in             R. J. Gaizauskas, A. Setzer, G. Katz, and D. R.
     News Archives and the Web. In Proceedings of the 9th             Radev. TimeML: Robust Specification of Event and
     Joint Conference on Digital Libraries (JCDL ’09),                Temporal Expressions in Text. In Proceedings of the
     pages 115–124, 2009.                                             AAAI Spring Symposium on New Directions in
[14] D. Koen and W. Bender. Time Frames: Temporal                     Question Answering, pages 28–34, 2003.
     Augmentation of the News. In IBM Systems Journal,           [28] J. Pustejovsky, P. Hanks, R. Sauri, A. See,
     39(4): 597–616, 2000.                                            R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim,
[15] O. Kolomiyets and M.-F. Moens. Meeting                           D. Day, L. Ferro, and M. Lazo. The TIMEBANK
     Corpus. In Proceedings of Corpus Linguistics                      International Conference on Computational
     Conference, pages 647–656, 2003.                                  Linguistics (Coling ’08), pages 189–192, 2008.
[29] A. Qamra, B. Tseng, and E. Chang. Mining Blog                [41] M. Verhagen, R. Sauri, T. Caselli, and J. Pustejovsky.
     Stories Using Community-based and Temporal                        SemEval-2010 Task 13: TempEval-2. In Proceedings of
     Clustering. In Proceedings of the 15th ACM                        the 5th International Workshop on Semantic
     International Conference on Information and                       Evaluation (SemEval ’10), pages 57–62, 2010.
     Knowledge Management (CIKM ’06), pages 58–67,                [42] M. T. Vicente-Dı́ez, J. Moreno-Schneider, and
     2006.                                                             P. Martı́nez. UC3M System: Determining the Extent,
[30] S. Schockaert, D. Ahn, M. De Cock, and E. Kerre.                  Type and Value of Time Expressions in TempEval-2.
     Question Answering with Imperfect Temporal                        In Proceedings of the 5th International Workshop on
     Information. In Proceedings of the 7th Conference on              Semantic Evaluation (SemEval ’10), pages 329–332,
     Flexible Query Answering Systems (FQAS ’06), pages                2010.
     647–658, 2006.                                               [43] G. Weikum and M. Theobald. From Information to
[31] F. Schilder and C. Habel. From Temporal Expressions               Knowledge: Harvesting Entities and Relationships
     to Temporal Information: Semantic Tagging of News                 from Web Sources. In Proceedings of the 29th ACM
     Messages. In Proceedings of the Workshop on                       SIGMOD-SIGACT-SIGART Symposium on Principles
     Temporal and Spatial Information Processing                       of Database Systems (PODS ’10), pages
     (TASIP ’01), pages 65–72, 2001.                                   65–76, 2010.
[32] M. Spaniol, D. Denev, A. Mazeika, G. Weikum, and
     P. Senellart. Data Quality in Web Archiving. In
     Proceedings of the 3rd Workshop on Information
     Credibility on the Web (WICOW ’09), pages 19–26,
     2009.
[33] B. Shaparenko, R. Caruana, J. Gehrke, and
     T. Joachims. Identifying Temporal Patterns and Key
     Players in Document Collections. In Proceedings of the
     IEEE ICDM Workshop on Temporal Data Mining:
     Algorithms, Theory and Applications (TDM ’05),
     pages 165–174, 2005.
[34] J. Strötgen and M. Gertz. HeidelTime: High Quality
     Rule-based Extraction and Normalization of Temporal
     Expressions. In Proceedings of the 5th International
     Workshop on Semantic Evaluation (SemEval ’10),
     pages 321–324, 2010.
[35] J. Strötgen and M. Gertz. TimeTrails: A System for
     Exploring Spatio-Temporal Information in
     Documents. In Proceedings of the 36th International
     Conference on Very Large Data Bases (VLDB ’10),
     pages 1569–1572, 2010.
[36] J. Strötgen, M. Gertz, and P. Popov. Extraction and
     Exploration of Spatio-temporal Information in
     Documents. In Proceedings of the 6th Workshop on
     Geographic Information Retrieval (GIR ’10), pages
     1–8, 2010.
[37] R. Swan and J. Allan. TimeMine: Visualizing
     Automatically Constructed Timelines. In Proceedings
     of the 23rd Annual International ACM SIGIR
     Conference on Research and Development in
     Information Retrieval (SIGIR ’00), page 393, 2000.
[38] N. UzZaman and J. Allen. TRIPS and TRIOS System
     for TempEval-2: Extracting Temporal Information
     from Text. In Proceedings of the 5th International
     Workshop on Semantic Evaluation (SemEval ’10),
     pages 276–283, 2010.
[39] M. Verhagen, R. Gaizauskas, F. Schilder, M. Hepple,
     G. Katz, and J. Pustejovsky. SemEval-2007 Task 15:
     TempEval Temporal Relation Identification. In
     Proceedings of the 4th International Workshop on
     Semantic Evaluation (SemEval ’07), pages 75–80,
     2007.
[40] M. Verhagen and J. Pustejovsky. Temporal Processing
     with the TARSQI Toolkit. In Proceedings of the 22nd



</pre>