Temporal Information Retrieval: Challenges and Opportunities

Omar Alonso
Microsoft Corp.
Mountain View, CA
omar.alonso@microsoft.com

Jannik Strötgen
Institute of Computer Science
University of Heidelberg, Germany
stroetgen@uni-hd.de

Ricardo Baeza-Yates
Yahoo! Research
Barcelona, Spain
rbaeza@acm.org

Michael Gertz
Institute of Computer Science
University of Heidelberg, Germany
gertz@uni-hd.de

ABSTRACT
Time is an important dimension of any information space. It can be very useful for a wide range of information retrieval tasks such as document exploration, similarity search, summarization, and clustering. Traditionally, information retrieval applications do not take full advantage of all the temporal information embedded in documents to provide alternative search features and user experiences. However, in the last few years there has been exciting work on analyzing and exploiting temporal information for the presentation, organization, and in particular the exploration of search results. In this paper, we review current research trends and present a number of interesting applications along with open problems. The goal is to discuss interesting areas and future work for this exciting field of information management.

Categories and Subject Descriptors
I.2.7 [Artificial Intelligence]: Natural Language Processing—Language models, Text analysis

Keywords
temporal information, information retrieval

Copyright 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
TWAW 2011, March 28, 2011, Hyderabad, India.

1. INTRODUCTION
Time clearly plays a central role in any information space, and it has been studied in several areas like information extraction, topic detection, question answering, query log analysis, and summarization. Time and temporal measurements can help recreate a particular historical period or describe the chronological context of a document or a collection of documents. As an extension to existing ranking techniques, which are primarily based on popularity or reputation, time can be particularly valuable for exploring search results along well-defined timelines and at multiple time granularities, due to the key characteristics of temporal information:

• Temporal information is well-defined: Given two points in time or two intervals, the relationship between them can be identified, e.g., the relationship can be of the type before, overlap, or after [3].
• Temporal information can be normalized: Regardless of the terms or the language used, every temporal expression referring to the same semantics can be normalized to the same value in some standard format. This property makes temporal information term- and language-independent.
• Temporal information can be organized hierarchically: Temporal expressions can be of different granularities, e.g., of type day ("May 20, 2011") or of type year ("2011"). Because years consist of months, and months and weeks consist of days, temporal expressions can be mapped to coarser granularities based on the hierarchy of temporal expressions.

Using these key characteristics, temporal information about documents can be used to develop time-specific information retrieval and exploration applications. The most obvious type of temporal information associated with a document is its creation time or the date of its last modification. This kind of information, which is directly accessible through the metadata of a document, can already be used for several tasks such as time-aware search or temporal clustering. However, the document creation time is only valuable in specific contexts such as the news domain. In other areas, and even in the news domain itself, a lot of temporal information is neglected if the document creation time is used as the only time information associated with a document. This is because a lot of temporal information is latently available in a document's text. Assume a news document reports on an event whose date is mentioned in its text. If only the document creation time is taken into account, the information about when the event occurred is ignored. To make use of such information, so-called temporal taggers are applied to extract and normalize the temporal expressions contained in documents.

The remainder of the paper is organized as follows. In Section 2, we discuss how time appears in documents and how such temporal data can be extracted. In Section 3, we survey research on temporal tagging. In Section 4, we present current research trends in temporal information retrieval and then describe application areas and their challenges. Finally, Section 5 presents our concluding remarks.

2. TIME IN DOCUMENTS
As indicated in the introduction, there is a lot of temporal information in any collection of documents, be it ranked documents in a hit list or a corpus of topic-specific documents. To take advantage of such time-related information, in particular for document exploration purposes, it is important to extract this information in a document processing step, anchor it in time, compute some (aggregated) measures, and make all this information explicit to subsequent exploration tasks.

In this section, we describe the different types of temporal information mentioned in documents (Sec. 2.1), explain how temporal expressions can be realized in natural language (Sec. 2.2), and demonstrate how they can be extracted and normalized (Sec. 2.3).

2.1 Types of Temporal Information
Temporal expressions mentioned in text documents can be grouped into four types according to TimeML [27], the standard markup language for temporal information: date, time, duration, and set. Duration expressions provide information about the length of an interval (e.g., "three years" in they have been traveling through the U.S. for three years), and set expressions inform about the periodical aspect of an event (e.g., "twice a week" in she goes to the gym twice a week). In contrast, time and date expressions (e.g., "3 p.m." or "January 25, 2010") both refer to a specific point in time, though at different granularities.

A key feature of temporal information is that it can be normalized to some standard format. Assuming a Gregorian calendar as the representation of time, expressions of time and date can be placed directly on a timeline. A date is then typically represented as YYYY-MM-DD, e.g., the expression "January 25, 2010" is normalized to "2010-01-25". However, normalization is not always as simple as in this example. It depends on the way temporal information is expressed in a document, which is discussed in the next subsection.

2.2 Occurrences of Temporal Expressions
There are many different ways to express temporal information of the types date and time in documents. Similar to the work by Schilder and Habel [31], we distinguish between explicit, implicit, and relative temporal expressions.

Explicit temporal expressions refer to a specific point in time. Note that this point in time can be of different granularities. For example, the expression "January 25, 2010" refers to a specific day, while the expression "November 2005" refers to a specific month. The key characteristic of explicit temporal expressions is that they can be normalized without any further knowledge. That is, the expression itself contains all the information needed for normalization and is thus fully specified.

In contrast, relative temporal expressions cannot be normalized without taking into account some context information. For example, the expression "today" cannot be normalized without knowing the corresponding reference time. This reference time can be either the document creation time or another temporal expression in the document. In news articles, the document creation time is typically important and often used as the reference time. Note that this kind of information is directly accessible in the form of a timestamp through the metadata of a document. The expression "yesterday" in Thousands of prisoners in Egypt managed to escape from prison yesterday can be normalized to "2011-01-29" if we know the document creation time to be "2011-01-30". In other types of documents, the reference time is usually represented by another temporal expression in the document. In general, relative temporal expressions occur frequently. While the reference time is sometimes sufficient for normalization, further information is needed if the relation to the reference time has to be identified as well. For example, "on Monday" can refer either to the previous or to the next Monday with respect to the reference time. One indicator for determining this relationship is the tense of the sentence: future and present tense indicate an after-relationship to the reference time, and past tense indicates a before-relationship. Figure 1 shows parts of a news article containing explicit and relative temporal expressions and illustrates what kind of context information is needed to normalize the relative expressions.

The third type of temporal expressions are implicit expressions such as names of holidays or events. These expressions can be anchored on a timeline if a mapping of the expression to its normalized value is available. For example, "New Year's Day 2002" can be normalized to "2002-01-01" since "New Year's Day" always refers to January 1. In addition, there are expressions for which a temporal function has to be applied. "Labor Day", for example, refers to the first Monday in September, so "Labor Day 2009" can be normalized to "2009-09-07" if we know this day to be the first Monday in September 2009.

Although there are many different ways to refer to a specific point in time, all expressions referring to the same point in time should be normalized to the same value in the standard format. This normalization process is one of the tasks of so-called temporal taggers, as described in the next subsection.

2.3 Temporal Tagging
Temporal tagging is a specific task in named entity recognition and normalization. The goals of so-called temporal taggers are (i) the extraction of temporal expressions and (ii) the normalization of these expressions to some standard format. TIMEX2 and TIMEX3 are often used as this standard format. While TIMEX2 tags include pre- and post-modifiers of the temporal expression itself (e.g., dependent clauses) and allow for nested temporal expressions [11], such modifiers and nested tags are not part of TIMEX3 tags. Instead, TIMEX3 is part of the TimeML markup language, in which further annotation types are available for capturing more complex temporal semantics. Although there are significant differences between TIMEX2 and TIMEX3, they are very similar in many ways, and a detailed analysis goes beyond the scope of this paper.¹ According to the TimeML annotation guidelines, a TIMEX3 tag contains, among others, the following information about a temporal expression:

• offset: the start and end position of the expression in the document
• type: whether the expression is of type date, time, duration, or set
• value: the normalized value of the expression

To identify this information, i.e., to extract and normalize temporal expressions, temporal taggers are applied after the text has been preprocessed. Usually, sentence and token boundaries are detected and a part-of-speech tag is associated with every token. The temporal tagger can then use this information to identify temporal expressions. The first goal, i.e., the identification of the boundaries of temporal expressions, can be seen as a typical classification task. Therefore, some work has addressed this problem by applying machine learning techniques (e.g., [15, 38]). The classification problem can be described as follows: for every token t, decide whether t is outside (O) of temporal expressions, inside (I) a temporal expression, or at the beginning (B) of a temporal expression. The well-known IOB format can be used to annotate tokens according to this property.

In addition to machine-learning approaches, there are several rule-based approaches to extract temporal expressions (e.g., [24, 34]). These are usually based on regular expressions, although they may also use other information about the text, such as part-of-speech information.

The more difficult task is the normalization of the temporal expressions. While explicit expressions can be normalized without further knowledge, the normalization of relative expressions is challenging. As described above, context information has to be identified to determine the correct reference time and the temporal relation between a temporal expression and its reference time. While there are both rule-based and machine-learning-based approaches for the extraction of temporal expressions, the normalization is usually done in a rule-based way.

Due to their importance for temporal information retrieval, we give an overview of existing temporal taggers and their quality in the next section. In addition, we present resources for evaluating temporal taggers and survey the temporal evaluation challenges organized so far.

¹ For further information on temporal annotation according to TimeML and differences between TIMEX2 and TIMEX3, see http://www.timeml.org.

Document Creation Time: 1998-04-18
Hungarian astronaut Bertalan Farkas is leaving for the United States to start a new career, he said today.
. . . On May 22, 1995, Farkas was made a brigadier general, and the following year he was appointed military attache . . .
. . . However, cited by District of Columbia traffic police in December for driving under the influence of . . .
Figure 1: Examples of temporal expressions in a news article with explicit (transparent boxes) and relative (solid boxes) expressions. Arrows indicate what kind of context information is needed to normalize the temporal expression.

3. RESEARCH ON TEMPORAL TAGGING
Temporal processing of text documents, in terms of the extraction and normalization of temporal expressions as well as the extraction of temporal relations between events, is very important for several NLP tasks requiring a deep understanding of language, such as question answering or document summarization. Consequently, there has been significant research on the temporal annotation of text documents. The markup language TimeML has become an ISO standard for temporal annotation [27], and the TimeBank corpus was developed [28]. The latest version of the TimeBank corpus contains 183 news articles and can be regarded as the gold standard for temporal annotation. However, there was important research activity even before this, and several evaluation challenges have been held to advance research in the area of temporal information extraction, as described in the following subsection.

3.1 Evaluation Challenges
The earliest competitions for the extraction of temporal expressions were the named entity recognition tasks of the Message Understanding Conferences MUC 1995 and 1997 [8, 12]. A combination of extraction and normalization was introduced in the ACE (Automatic Content Extraction) time expression recognition and normalization (TERN) challenges² in 2004, 2005, and 2007. Several temporal taggers have been developed by the participants of these challenges (see Section 3.2). The TERN 2004 and 2005 corpora³ are often used to compare the quality of temporal taggers. The TERN corpora are annotated according to the TIMEX2 annotation guidelines [11].

A further indication of the importance of temporal annotation and of the activity in this research domain are the TempEval challenges. Motivated by the importance of temporal annotation for many NLP tasks, TempEval was organized for the first time as one task of the SemEval workshop in 2007 [39]. In this competition, the organizers provided annotated text documents based on the TimeBank corpus. While the annotations of events and temporal expressions were given, the participating tools had to identify temporal relations between events and the document creation time, between events and temporal expressions, and between two events.

In 2010, the follow-up challenge addressed the full task of identifying all time-related expressions and relations. That is, for TempEval-2, two further tasks were added [41]: the extraction and normalization of temporal expressions, and the extraction of events. In addition, the discovery of relations between two events was split into two tasks, namely the identification of relations between two main events in consecutive sentences and of relations between two events where one event syntactically dominates the other. The TempEval corpora are based on the TimeBank corpus and annotated according to the TimeML annotation guidelines, i.e., using TIMEX3 tags for temporal expressions. A further novelty in the second TempEval challenge was that the tasks were offered not only in English but in six languages. However, the participants addressed only two languages, namely English and Spanish. Nevertheless, thanks to the creation of an annotation standard, a gold standard corpus, and competitions such as the TempEval challenges, there have been significant improvements in temporal relation identification and temporal tagging. Some existing temporal taggers and their quality are presented in the next subsection by comparing their results in the TempEval-2 challenge.

² http://www.itl.nist.gov/iad/mig/tests/ace/.
³ The TERN development corpora are available through the Linguistic Data Consortium. See: http://fofoca.mitre.org/tern.html.

3.2 Temporal Taggers
Once a temporal tagger has been applied to a document collection, the previously hidden temporal information becomes available for tasks such as temporal relation extraction or temporal clustering. One frequently applied temporal tagger is GuTime, which is part of the Tarsqi toolkit [40]. It is based on the TempEx tagger, which was the first temporal tagger for the extraction of temporal expressions and their normalization [22]. GuTime was developed as an automatic evaluation tool for TimeML and extends the capabilities of the TempEx tagger. It was evaluated on the TERN 2004 training corpus and achieves F-measures of 85%, 78%, and 82% for lenient detection, strict detection, and normalization, respectively.

In the TempEval-2 challenge, eight teams participated in the task of temporal expression extraction and normalization for English documents. The best-performing system was HeidelTime, with an F-score of 86% for the extraction and an accuracy of 85% for the normalization [34]. For Spanish documents, the best result for the extraction was an F-score of 91% [16], while another system achieved the highest accuracy for normalization (83%) [42]. While both machine-learning and rule-based approaches were applied for the extraction, the normalization was done in a rule-based way by all systems. HeidelTime, the best-performing system, uses rules consisting of an extraction part and a normalization part. Thus, all temporal expressions that are identified are normalized as well. Due to the strict separation of the code and the rules, HeidelTime is applicable to multilingual temporal tagging.⁴

Although there have been significant advances in temporal tagging, there is still room for improvement, especially when switching the processing language or the domain of the document collection. For example, Mazur and Dale recently presented a new corpus for research on temporal expressions that contains long, narrative-style documents, namely Wikipedia articles describing the historical course of wars [25]. Using their temporal tagger, they show that the normalization of temporal expressions in such documents is very challenging. The reasons are the rich discourse structure and the large number of often underspecified temporal expressions in these documents, compared to the short news documents that are usually used.

⁴ For details, see http://dbs.ifi.uni-heidelberg.de/stixx/.

4. RESEARCH TRENDS
Research work on fully utilizing the temporal information embedded in the text of documents for exploration and search purposes is very recent. Alonso et al. present an approach for extracting temporal information and show how it can be used for clustering search results [5]. Berberich et al. describe a model for temporal information needs [7]. Figure 2 shows the annotated timeline for the NYTimes data set for the latter reference, using the Timeline widget.⁵ These last two projects rely on crowdsourcing, mainly using Amazon Mechanical Turk, for evaluating parts of their work.

Figure 2: Annotation of a timeline by workers using crowdsourcing.

News sources have been the primary focus of a number of projects on exploiting time information in documents. For example, the Time Frames project realizes an approach to augment news articles by extracting time information [14]. Google's news timeline⁶ is an experimental feature that allows a user to explore news by time. Makkonen and Ahonen-Myka present extensions to document operations, such as comparing the temporal similarity of two documents in the context of news articles [17]. Swan and Allan present an interesting approach that combines topic detection and tracking with timelines as a browsing interface [37]. Time information is also used in temporal mining of blogs to extract useful information [29]. New research has also emerged on future retrieval, where temporal information is used for searching the future [6].

There is exciting research on adding a time dimension to certain applications like news summaries [2], temporal patterns [33], and temporal Web search [26]. The special issue on temporal information processing gives a clear map of current directions [20]. Harvesting temporal information and using it for entities and relationships is also a very recent and rich area [43].

Closely related to information extraction is the recent research on temporal annotations, which is covered in depth in the book by Mani et al. [21]. The identification of time-related information depends heavily on the language and the corpora, so traditional information extraction systems tend to fall short in terms of temporal extraction. Based on the latest advances, new research is emerging on the automatic assignment of document event-time periods and the automatic tagging of news messages using entity extraction [31].

Now, we outline a number of applications that can benefit from leveraging more temporal information, either from temporal expressions or from timestamps. For each application, we describe why it is important and present a number of challenges.

⁵ http://simile.mit.edu/
⁶ http://www.newstimeline.googlelabs.com

4.1 Exploratory Search
Research on exploratory search systems has gained a lot of attention lately, as such systems add a significant user interface component to help users search, navigate, and discover new facts and relationships. As the amount of information on the Web keeps growing, exploratory search interfaces are starting to surface. That said, it is not clear how to leverage temporal information in them. A few problems are:

• How can temporal information be exposed in exploratory search systems?
• What is the best way of presenting temporal information as a retrieval cue?
• For which data sources, besides news, does exploratory search make sense?
• Is e-discovery a vertical application that can benefit from temporal information?

4.2 Micro-blogging and Real-time Search
Micro-blogging sites like Twitter have gained a lot of attention lately as the ultimate mechanism to broadcast what's going on. By its nature, a typical message is very short, and its lifespan is essentially bounded by the crowd's interest in the particular event, be it a football final or an earthquake. In the case of Twitter, it is very difficult for a news article to beat the timeliness with which an important event is broadcast. Each tweet has a timestamp, but how to organize such information is still not clear. In the news context, the reporter has to write an article that contains a few paragraphs and submit the final version through some content management system that pushes it to an external website, so that a search engine can hopefully crawl and index it in time. In parallel, if the event is important, its main idea will already be trending on Twitter by the time the reporter finishes the article, which highlights its importance at a world scale. This is very similar to the traditional notion of topic detection and tracking [1, 18], with one key difference: the speed at which one must detect that a topic is important and therefore a candidate for trending. Some problems are:

• What is the best way to provide a timeline of events in micro-blogging?
• What is the lifespan of the main event?
• How fast and how precisely can one detect trending events?
• What is the fraction of new content on the topic stream?

4.3 Temporal Summaries
There has been seminal work on temporal summaries of news topics by Allan et al. [2] that shows how important temporal information is. One extension is to generate time-sensitive summaries that can be used as temporal snippets [4].

By design, the main goal of a snippet (or caption) is to present a document surrogate that the user can quickly scan on the search results page without the need to click and read the full content of a document. There is a limit to the number of lines of text that a snippet should present. Interesting questions include:

• When should a timestamp or temporal expressions be shown?
• Should the snippet present the matching lines in a timeline?
• Is a temporal summary a good surrogate for a document?
• For which kinds of queries is a temporal summary appropriate?
• Should temporal summaries be query-independent?

4.4 Temporal Clustering
The notion of clustering search results by temporal attributes has been presented in [5]. Preliminary results indicate that users are interested in dissecting a document collection by time. At the same time, it is not clear for which kinds of scenarios besides "research-like" questions this approach would work. Key issues are:

• Can we identify documents that are contemporary and therefore related?
• Which chronons can be more useful for clustering?
• How can we cluster micro-blogging data by time?
• Is a timeline the best way to cluster search results?

4.5 Temporal Querying
The temporal information extracted from documents can be used directly to allow the user of a search engine to constrain his/her query in a temporal manner. That is, in addition to a textual part, a query contains a temporal part. For example, in addition to "world war", a temporal constraint like "1944-1945" could be specified. The user would obviously expect documents about World War II as results for this query. The objective of combining a textual and a temporal query can thus be formulated as follows: the more both parts of the query are satisfied, i.e., the better the textual and the temporal parts fit a document, the higher the document should be ranked. The main problems for such a combination of constraints are the following:

• How can a combined score for the textual part and the temporal part of a query be calculated in a reasonable way?
• Should a document in which the "textual match" and the "temporal match" are far away from each other be penalized?
• What about documents that satisfy one of the constraints but "slightly" fail to satisfy the other?

4.6 Temporal Question Answering
To be able to answer time-related questions, a question answering system has to know when specific events took place. For this, temporal information can be associated with facts extracted from text documents [26]. While this may be applicable for famous facts and events, question answering systems are often faced with imperfect temporal information. For this reason, identifying relationships between the events described in documents is important, as it is for many further NLP tasks (see Section 3.1). Historic events in particular tend to have a gradual beginning and ending, so knowing the temporal relationship between two events may allow a temporal query to be answered even though no explicit temporal information is associated with the events [30]. Research issues are:

• How can inconsistent temporal information be dealt with?
• How can temporal reasoning be executed if temporal relationships are missing?

4.7 Temporal Similarity
A research question related to temporal querying is temporal document similarity. Instead of comparing a temporal query with the temporal information of a document, two documents can be compared with respect to their temporal similarity. The main problem arising here is: what makes two documents temporally similar? This leads to the following questions:

• Should two documents be considered similar if they cover the same temporal interval?
• Should the temporal focus of the documents be important for their temporal similarity?
• Can two documents be regarded as temporally similar if one of them describes in detail a small temporal interval covered by the other?

4.8 Timelines and User Interfaces
One important use of the time entities of a document is to create a sorted list of events, a timeline. A timeline can be shown as a vertical list of textual items or visualized in many different ways, for example, as in Yahoo!'s Correlator.⁷ More sophisticated visualizations allow the user to focus on specific named entities with respect to time, as in Yahoo!'s News Explorer [10, 19]. Here, interesting questions are:

• What is the appropriate way to present a timeline?
• Is a linear timeline the only way to present and anchor documents in time?
• How can one leverage document temporal measures to present a good display?
• Are there specific visualizations or user interfaces that can benefit from temporal information?

⁷ http://correlator.sandbox.yahoo.com/.

4.9 Searching in Time
Time entities can also be used to search documents or log files, for instance, to search the past for purposes such as digital forensics, historical analysis, or linguistic analysis. We can even search the future [6, 13], for example, in news for events that are scheduled or may happen in the future. This idea is supported in the Yahoo! News Explorer tool already mentioned [19]. Microsoft Academic Search⁸ is an example of presenting publications and citations in a timeline. Some problems are:

• Besides news, what other sources would one like to use to search in the past and/or the future?
• How far does one need to go back in time?
• Can we improve bibliographic search instead of just sorting by publication date?
• How can we evaluate the quality of such systems?

⁸ http://academic.research.microsoft.com/.

4.10 Web Archiving
The goal of Web archiving is to collect and store digital content so that it is accessible for future tasks. Besides the detection of spam, which can be dealt with by analyzing the evolution of the link structure of Web pages [9], a main challenge in Web archiving is to ensure the temporal coherence of Web pages, since it is not possible to collect all pages at the same time. Thus, the content of parts of the collection may change during the crawling process. In [32], Spaniol et al. introduce a coherence framework to overcome the temporal diffusion of Web crawls, i.e., to minimize the risk of incoherence. Nevertheless, open problems remain:

• How can temporal information be used to predict which pages are likely to change over time?
• How can temporal coherence be achieved for any point in time or time interval?

4.11 Spatio-temporal Information Exploration
Recently, there has been some research on combining spatial and temporal information extracted from documents for exploration tasks [23, 36]. In the same way as temporal information can be normalized using a timeline, spatial information can be normalized according to its latitude/longitude information. To extract geographic expressions from documents, so-called geo taggers can be applied. Combining the information extracted by a temporal tagger with the information extracted by a geo tagger allows the exploration of documents according to the events mentioned in the text, since events usually happen at a specific time and place. TimeTrails [35] is a system for exploring such spatio-temporal information from documents. Some questions are:

• What is the best way to represent maps and time?
• Which kinds of scenarios can benefit from spatio-temporal exploration?

5. CONCLUDING REMARKS
Temporal information embedded in documents in the form of temporal expressions offers an interesting means to further enhance the functionality of current information retrieval applications.

We have presented a number of examples and scenarios where temporal information can be very useful. We have identified research trends in this new area and a number of interesting practical applications as well as problems.

The problems we outline are difficult because they involve several areas of computer science, mainly information retrieval, natural language processing, and user interfaces. Moreover, several of them are multidisciplinary because they touch on issues related to psychology or design, to mention just two, making them even more challenging.

6. REFERENCES
[1] J. Allan, editor. Topic Detection and Tracking: Event-based Information Organization. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[2] J. Allan, R. Gupta, and V. Khandelwal. Temporal Summaries of New Topics. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01), pages 10–18, 2001.
[3] J. F. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832–843, 1983.
[4] O. Alonso, R. Baeza-Yates, and M. Gertz. Effectiveness of Temporal Snippets. In Proceedings of the Workshop on Web Search Result Summarization and Presentation (WSSP '09), pages 1–4, 2009.
[5] O. Alonso, M. Gertz, and R. Baeza-Yates. Clustering and Exploring Search Results Using Timeline Constructions. In Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM '09), pages 97–106, 2009.
[6] R. Baeza-Yates. Searching the Future. In Proceedings of the ACM SIGIR 2005 Workshop on Mathematical/Formal Methods in Information Retrieval (MF/IR '05), pages 1–6, 2005.
[7] K. Berberich, S. Bedathur, O. Alonso, and G. Weikum. A Language Modeling Approach for Temporal Information Needs. In Proceedings of the 32nd European Conference on Information Retrieval Research (ECIR '10), pages 13–25, 2010.
[8] N. Chinchor. Overview of MUC-7/MET-2. In Proceedings of the 7th Conference on Message Understanding (MUC '97), pages 1–11, 1997.
[9] Y. Chung, M. Toyoda, and M.
TempEval-2: Shallow Approach for Temporal Tagger. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW '09), pages 52–57, 2009.
[16] H. Llorens, E. Saquete, and B. Navarro. TIPSEM (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval '10), pages 284–291, 2010.
[17] J. Makkonen and H. Ahonen-Myka. Utilizing Temporal Information in Topic Detection and Tracking. In Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '03), pages 393–404, 2003.
[18] J. Makkonen, H. Ahonen-Myka, and M. Salmenkivi. Topic Detection and Tracking with Spatio-temporal Evidence. In Proceedings of the 25th European Conference on Information Retrieval Research (ECIR '03), pages 251–265, 2003.
[19] M. Matthews, P. Tolchinsky, R. Blanco, J. Atserias, P. Mika, and H. Zaragoza. Searching Through Time in the New York Times. In Proceedings of the Fourth Workshop on Human-Computer Interaction and Information Retrieval (HCIR '10), pages 41–44, 2010.
[20] I. Mani, J. Pustejovsky, and B. Sundheim. Introduction to the Special Issue on Temporal Information Processing. ACM Transactions on Asian Language Information Processing, 3(1):1–10, 2004.
[21] I. Mani, J. Pustejovsky, and R. Gaizauskas, editors. The Language of Time. Oxford University Press, New York, NY, USA, 2005.
[22] I. Mani and G. Wilson. Robust Temporal Processing of News. In Proceedings of the 38th Annual Meeting on
Kitsuregawa. A Study Association for Computational Linguistics (ACL ’00), of Link Farm Distribution and Evolution Using a Time pages 69–76, 2000. Series of Web Snapshots. In Proceedings of the 5th [23] B. Martins, H. Manguinhas, and J. Borbinha. International Workshop on Adversarial Information Extracting and Exploring the Geo-temporal Semantics Retrieval on the Web (AIRWeb ’09), pages 9–16, 2009. of Textual Resources. In Proceedings of the IEEE [10] G. Demartini, M. Missen, R. Blanco, and H. Zaragoza. International Conference on Semantic Computing TAER: Time-aware Entity Retrieval. Exploiting the (ICSC ’08), pages 1–9, 2008. Past to Find Relevant Entities in News Articles. In [24] P. Mazur and R. Dale. The DANTE Temporal Proceedings of the 19th ACM International Conference Expression Tagger. In Proceedings of the 3rd Language on Information and Knowledge Management and Technology Conference (LTC ’09), pages 245–257, (CIKM ’10), pages 1517–1520, 2010. 2009. [11] L. Ferro, L. Gerber, I. Mani, B. Sundheim, and [25] P. Mazur and R. Dale. WikiWars: A New Corpus for G. Wilson. TIDES - 2005 Standard for the Annotation Research on Temporal Expressions. In Proceedings of of Temporal Expressions. 2005. the 2010 Conference on Empirical Methods in Natural http://fofoca.mitre.org/annotation_guidelines/ Language Processing (EMNLP ’10), pages 913–922, 2005_timex2_standard_v1.1.pdf 2010. [12] R. Grishman and B. Sundheim. Design of the MUC-6 [26] M. Pasca. Towards Temporal Web Search. In Evaluation. In Proceedings of the 6th Conference on Proceedings of the 2008 ACM Symposium on Applied Message Understanding (MUC ’95), pages 1–11, 1995. Computing (SAC ’08), pages 1117–1121, 2008. [13] A. Jatowt, K. Kanazawa, S. Oyama, and K. Tanaka. [27] J. Pustejovsky, J. M. Castaño, R. Ingria, R. Sauri, Supporting Analysis of Future-related Information in R. J. Gaizauskas, A. Setzer, G. Katz, and D. R. News Archives and the Web. In Proceedings of the 9th Radev. 
TimeML: Robust Specification of Event and Joint Conference on Digital Libraries (JCDL ’09), Temporal Expressions in Text. In Proceedings of the pages 115–124, 2009. AAAI Spring Symposium on New Directions in [14] D. Koen and W. Bender. Time Frames: Temporal Question Answering, pages 28–34, 2003. Augmentation of the News. In IBM Systems Journal, [28] J. Pustejovsky, P. Hanks, R. Sauri, A. See, 39(4): 597–616, 2000. R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, [15] O. Kolomiyets and M.-F. Moens. Meeting D. Day, L. Ferro, and M. Lazo. The TIMEBANK 7 TWAW 2011, Hyderabad, India Corpus. In Proceedings of Corpus Linguistics International Conference on Computational Conference, pages 647–656, 2003. Linguistics (Coling ’08), pages 189–192, 2008. [29] A. Qamra, B. Tseng, and E. Chang. Mining Blog [41] M. Verhagen, R. Sauri, T. Caselli, and J. Pustejovsky. Stories Using Community-based and Temporal SemEval-2010 Task 13: TempEval-2. In Proceedings of Clustering. In Proceedings of the 15th ACM the 5th International Workshop on Semantic International Conference on Information and Evaluation (SemEval ’10), pages 57–62, 2010. Knowledge Management (CIKM ’06), pages 58–67, [42] M. T. Vicente-Dı́ez, J. Moreno-Schneider, and 2006. P. Martı́nez. UC3M System: Determining the Extent, [30] S. Schockaert1, D. Ahn, M. De Cock, and E. Kerre. Type and Value of Time Expressions in TempEval-2. Question Answering with Imperfect Temporal In Proceedings of the 5th International Workshop on Information. In Proceedings of the 7th Conference on Semantic Evaluation (SemEval ’10), pages 329–332, Flexible Query Answering Systems (FQAS 06), pages 2010. 647-658, 2006. [43] G. Weikum and M. Theobald From Information to [31] F. Schilder and C. Habel. From Temporal Expressions Knowledge: Harvesting Entities and Relationships to Temporal Information: Semantic Tagging of News from Web Sources. In Proceedings of the 29th ACM Messages. 
In Proceedings of the Workshop on SIGMOD-SIGACT-SIGART Symposium on Principles Temporal and Spatial Information Processing of Database Systems of Data (PODS’ 10), pages (TASIP ’01), pages 65–72, 2001. 65–76, 2010. [32] M. Spaniol, D. Denev, A. Mazeika, G. Weikum, and P. Senellart. Data Quality in Web Archiving. In Proceedings of the 3rd Workshop on Information Credibility on the Web (WICOW ’09), pages 19–26, 2009. [33] B. Shaparenko, R. Caruana, J. Gehrke, and T. Joachims. Identifying Temporal Patterns and Key Players in Document Collections. In Proceedings of the IEEE ICDM Workshop on Temporal Data Mining: Algorithms, Theory and Applications (TDM ’05), pages 165–174, 2005. [34] J. Strötgen and M. Gertz. HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 321-324, 2010. [35] J. Strötgen and M. Gertz. TimeTrails: A System for Exploring Spatio-Temporal Information in Documents. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB ’10), pages 1569–1572, 2010. [36] J. Strötgen, M. Gertz, and P. Popov. Extraction and Exploration of Spatio-temporal Information in Documents. In Proceedings of the 6th Workshop on Geographic Information Retrieval (GIR ’10), pages 1–8, 2010. [37] R. Swan and J. Allan. TimeMine: Visualizing Automatically Constructed Timelines. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), page 393, 2000. [38] N. UzZaman and J. Allen. TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval ’10), pages 276–283, 2010. [39] M. Verhagen, R. Gaizauskas, F. Schilder, M. Hepple, G. Katz, and J. Pustejovsky. SemEval-2007 Task 15: TempEval Temporal Relation Identification. 
In Proceedings of the 4th International Workshop on Semantic Evaluation (SemEval ’07), pages 75–80, 2007. [40] M. Verhagen and J. Pustejovsky. Temporal Processing with the TARSQI Toolkit. In Proceedings of the 22nd 8
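The first and third questions of Section 4.7 can be made concrete with a small Python sketch. For simplicity, it assumes that each document has already been reduced to one inclusive day interval derived from its normalized temporal expressions. Real documents usually yield many such expressions. The sketch contrasts two generic set measures that this paper does not prescribe: the symmetric Jaccard coefficient and an asymmetric containment score. The function names are hypothetical.

```python
from datetime import date
from typing import Tuple

# A covered interval: (first day, last day), both inclusive.
Interval = Tuple[date, date]


def _days(iv: Interval) -> int:
    """Number of days in an inclusive interval."""
    return (iv[1] - iv[0]).days + 1


def _overlap_days(a: Interval, b: Interval) -> int:
    """Number of days shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)


def jaccard(a: Interval, b: Interval) -> float:
    """Symmetric: shared days divided by the days covered by either interval."""
    shared = _overlap_days(a, b)
    return shared / (_days(a) + _days(b) - shared)


def containment(a: Interval, b: Interval) -> float:
    """Asymmetric: fraction of the days of `a` that also lie in `b`."""
    return _overlap_days(a, b) / _days(a)


year_2010 = (date(2010, 1, 1), date(2010, 12, 31))
may_2010 = (date(2010, 5, 1), date(2010, 5, 31))
print(round(jaccard(may_2010, year_2010), 3))  # 0.085
print(containment(may_2010, year_2010))        # 1.0
```

Compared with a document covering all of 2010, a document about May 2010 scores low under Jaccard but fully under containment. This is exactly the ambiguity raised by the third question.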
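The timeline construction of Section 4.8 can be sketched in Python, combined with the hierarchical organization of temporal expressions described in the introduction. The sketch makes two assumptions. First, a temporal tagger has already normalized every expression to an ISO 8601-style value of the form YYYY, YYYY-MM, or YYYY-MM-DD; weeks, seasons, and durations are out of scope. Second, each value is paired with a label for the event it anchors. `build_timeline` and `PREFIX_LEN` are hypothetical names, and the events are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Prefix length of a "YYYY-MM-DD"-style normalized value per granularity.
PREFIX_LEN = {"year": 4, "month": 7, "day": 10}


def build_timeline(
    events: List[Tuple[str, str]], granularity: str = "day"
) -> List[Tuple[str, List[str]]]:
    """Group (normalized value, label) pairs into chronologically sorted buckets.

    Values finer than `granularity` are truncated, i.e., mapped to the coarser
    unit that contains them. Values that are already coarser (e.g. "2011" on a
    month timeline) keep their own key, since refining them would be a guess.
    """
    cut = PREFIX_LEN[granularity]
    buckets: Dict[str, List[str]] = defaultdict(list)
    # Sorting first keeps the labels inside each bucket in chronological order.
    for value, label in sorted(events):
        buckets[value[:cut]].append(label)
    # Zero-padded ISO-style strings sort chronologically as plain strings.
    return sorted(buckets.items())


events = [
    ("2011-05-20", "A"),  # e.g. "May 20, 2011"
    ("2010-11", "B"),     # e.g. "November 2010"
    ("2011-02-03", "C"),
    ("2011", "D"),        # e.g. "2011"
]
print(build_timeline(events, "year"))
# [('2010', ['B']), ('2011', ['D', 'C', 'A'])]
```

On a month timeline, the year-only value "2011" sorts before the months of that year. This is only one design choice. Whether such values belong in a separate lane of the display is part of the presentation questions raised in Section 4.8.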
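The coherence problem of Section 4.10 can be illustrated with a much simpler model than the framework of Spaniol et al. [32]. The Python sketch below is not their method. It assumes, hypothetically, that for each crawled page we know the interval during which the captured version was valid. In practice such bounds can only be estimated, for example from revisits. Under this assumption, a set of captures reflects a single point in time exactly when all intervals share a common instant, which holds if and only if the latest start is not after the earliest end. The function name, URLs, and timestamps are made up.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# (valid_from, valid_until) of the captured version of a page (assumed known).
Window = Tuple[datetime, datetime]


def coherent_window(validity: Dict[str, Window]) -> Optional[Window]:
    """Return the window in which all captured versions were valid at once.

    Returns None for an empty crawl or when no single instant lies inside
    every validity interval, i.e., the captures are temporally incoherent.
    """
    if not validity:
        return None
    latest_start = max(start for start, _ in validity.values())
    earliest_end = min(end for _, end in validity.values())
    return (latest_start, earliest_end) if latest_start <= earliest_end else None


crawl = {
    "http://example.org/a": (datetime(2011, 3, 1, 8), datetime(2011, 3, 1, 20)),
    "http://example.org/b": (datetime(2011, 3, 1, 12), datetime(2011, 3, 2, 9)),
}
# Both versions coexisted from 12:00 to 20:00 on 2011-03-01.
print(coherent_window(crawl))
crawl["http://example.org/c"] = (datetime(2011, 3, 1, 21), datetime(2011, 3, 3, 0))
print(coherent_window(crawl))  # None
```

This simple test answers the single-instant case only. Coherence over a time interval, or under uncertain validity bounds, needs a richer model.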