TREC 2018 News Track

Shudong Huang, Ian Soboroff, Donna K. Harman
National Institute of Standards and Technology
100 Bureau Drive, Gaithersburg, Maryland 20899-8940
shudong.huang@nist.gov, ian.soboroff@nist.gov, donna.harman@nist.gov

Abstract
While more and more people rely on social media for news feeds, serious news consumers still turn to well-established news outlets for more accurate and in-depth reporting and analysis. They may also look for reports on related events that happened earlier, and for other background information, in order to better understand the event being reported. Many news outlets already create sidebars and embed hyperlinks to help news readers, often through manual effort. Technologies in IR and NLP already exist to support those features, but standard test collections do not address the tasks of modern news consumption. To help advance such technologies and transfer them to news reporting, NIST, in partnership with the Washington Post, is starting a new TREC track in 2018 known as the News Track.

Copyright © 2018 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete, A. Vlachos (eds.): Proceedings of the NewsIR’18 Workshop at ECIR, Grenoble, France, 26-March-2018, published at http://ceur-ws.org

1 Motivation
News content has long been part of information retrieval test collections, but the search task those collections measure is ad hoc search. Ad hoc search is a task where the user is seeking any and all information about a topic of interest. As such, articles are judged relevant to a topic if they mention the topic, even in a minimal way, as long as that mention is worth including in a report on the topic.

In 2018, people consume news overwhelmingly via social media recommendation, but also through web browsing, search, and advertising recommendation. Traditional news outlets are increasingly taking a “digital first” strategy rather than hewing to the notion of a newspaper front page. But the biggest change has come from social recommendation and news aggregators. Google News, started in 2002, marked the end of publisher-driven news delivery by pivoting the focus from the publisher to the story. The diversification of news delivery has democratized news publishing, and current news outlets reflect an enormous range of journalistic standards and methods.

NIST realized the time had come to reinvent news search as a focus for information retrieval and natural language processing research. In partnership with the Washington Post, NIST launched the News Track as part of the 2018 Text Retrieval Conference (TREC, http://trec.nist.gov/). One component of this effort is a new document collection, known as the TREC Washington Post Collection, which is available as a free download from NIST. The second component is a pair of IR tasks driven by how content is structured for the Post’s website.

2 Data
In partnership with the Washington Post, we have made a large archive of digital news content available to participants, extending from 2012 through August 2017. It contains both news articles and blogs as originally published by the Washington Post, with a total of 608,180 documents (about 6.9 GB uncompressed), divided into 12 text files. Each text file holds either the news articles or the blog posts from one of those 6 years. The documents are stored in JSON format, with each line representing a single news or blog document. Each document has metadata including the article title, original article URL, author, date of publication, and sources for text and embedded media. For more information on how to obtain the TREC Washington Post Collection, visit http://trec.nist.gov/data/wapost/. We also have a reformatted dump of English Wikipedia, taken close to the time of the latest news articles, available for download.
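Because the collection stores one JSON document per line, it can be processed as a stream rather than loaded whole. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the metadata field names in METADATA_FIELDS are illustrative guesses that should be checked against the collection’s own documentation before use.

```python
import io
import json
from typing import Iterator, TextIO


def iter_documents(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed document per non-blank line of a JSON Lines stream.

    Because each line is a complete JSON object, a multi-gigabyte file can be
    processed one document at a time without loading it all into memory.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing on them
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"malformed JSON on line {line_number}") from err


# Hypothetical field names for illustration only; verify them against the
# collection's documentation at http://trec.nist.gov/data/wapost/.
METADATA_FIELDS = ("id", "title", "article_url", "author", "published_date")


def metadata(doc: dict) -> dict:
    """Pick out the metadata fields, using None for any that are absent."""
    return {field: doc.get(field) for field in METADATA_FIELDS}


if __name__ == "__main__":
    # In practice you would pass an open file instead, e.g.
    # open("collection_file.jl", encoding="utf-8")  (hypothetical file name).
    sample = io.StringIO(
        '{"id": "doc-1", "title": "Example story", "author": "A. Reporter"}\n'
        "\n"
        '{"id": "doc-2", "title": "Another story"}\n'
    )
    for doc in iter_documents(sample):
        print(metadata(doc))
```

Returning None for missing fields keeps downstream code simple when some documents (for example, blog posts) lack a field that news articles carry.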
3 Tasks
On news outlets’ websites, article content and hyperlinks are used to provide context and background. In other words, browsing is not arbitrary but is guided by stories in the sidebar and hyperlinks in the story that let the reader read more deeply. On the Washington Post’s website, for example, related stories are manually linked both on the side and at the end of articles, and links within the article frequently point to related stories or further information about entities in the story.

However, creating such links manually is a tedious and cost-ineffective process. It is not surprising that crucial background stories, whether previously reported or externally available, are not always provided. Consider, for example, an article on February 4, 2018 titled “N. Korea to send nominal head of state to S. Korea”. It contains no link to background information on the current state of the Korean conflict (the only related link, about Kim Jong Un’s sister, was generated under “Most Read World” at the time the article was accessed and is dated later than the article itself). In particular, there are no links to recent stories such as “Hot heads or cold feet? North Korea’s mixed Olympic messages” and “North Korean athletes arrive in South Korea for Olympics”, reported just a few days earlier, or “North Korea agrees to send athletes to Winter Olympics, South says” and “Vice President Pence will lead U.S. delegation to Olympics in South Korea”, from a month before. There was also a report back in 2014 about the North’s high-level visit to the South at the end of the Asian Games, titled “North Korean officials pay rare and surprising visit to the South”. Needless to say, many names mentioned in the current story have appeared in previous news articles and/or have entries in other online resources such as Wikipedia. If the journalist had had a utility that could automatically retrieve those relevant stories in order of significance and link important entity mentions to more in-depth articles about them elsewhere, they could easily have made that context available to the reader.

Getting context to the reader is very difficult in the modern news landscape, but it is more important than ever. IR and NLP technology can support journalists by suggesting links to articles and entities that provide background and promote a deeper understanding of a news story. For the first tasks in the News Track, we have chosen to work on background linking and entity ranking.

3.1 Task 1: Background Linking
The main task for this new track will be “Background Linking”, defined as follows: given a news article, the system should retrieve other news articles that provide important context and/or background information that helps the reader better understand the query article. This task is essentially an ad hoc search with a specialized relevance criterion. Relevance for this task will be graded on a five-level scale:

0: the document provides little or no useful background or contextual information that would help the user understand the broader context of the query article.
1: the document provides some useful background . . .
2: the document provides significant useful background . . .
3: the document provides essential useful background . . .
4: the document MUST appear in the sidebar; otherwise critical context is missing.

We will refine this scale with the help of our partners at the Washington Post. The critical points are that relevance hinges on providing “useful background information or context”, and that there are levels that align with utility for the reader. We anticipate that these relevance judgments will be made at NIST by NIST assessors, with training support from journalists and data scientists at the Washington Post.

As a research problem, we would like to investigate how this relevance criterion differs from “traditional topical relevance”, both in how it is applied by assessors and in how it measures systems differently. To that end, we may also ask whether the article is topically relevant to the query article. This could be implemented by adding one more level to the above scale to capture topical relevance.

We will use NDCG@5 [Jarvelin:2002] as the primary effectiveness measure: the sidebar has limited real estate and should ideally contain the best contextualizing links. We will also report average precision and the other standard trec_eval measures.

The initial task is intentionally simple: we want to establish a baseline for the state of the art and use that performance to consider refinements to the task. These might include:
• Having assessors cluster equivalent background articles, so that the measure can support “retrieve one of these critical articles”.
• Snippet generation for the sidebar, where the snippet should provide the critical context without the need to click through.
• Categories of background, for example about people and organizations. This would be measured using diversity metrics.
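To make the primary Background Linking measure concrete, here is a minimal sketch of NDCG@5 for a single query article, using the 0 to 4 grades above as gains. It assumes linear gains and a log2(rank + 1) discount, which is one common formulation; some implementations use 2^rel - 1 gains instead, and the track’s official scoring (via trec_eval) may be configured differently. The function names and the example judgments are hypothetical.

```python
import math
from typing import Mapping, Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top k positions.

    Uses linear gains and a log2(rank + 1) discount, so rank 1 is undiscounted.
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking: Sequence[str], qrels: Mapping[str, int], k: int = 5) -> float:
    """NDCG@k for a single query article.

    ranking: document ids in the order the system returned them.
    qrels:   graded judgments (0 to 4) for this query; unjudged ids score 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    # The ideal ordering sorts all judged documents by grade, retrieved or not.
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True), k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judgments and system output for one query article.
    qrels = {"a": 4, "b": 2, "c": 0, "d": 1}
    system_ranking = ["b", "a", "x", "d", "c"]  # "x" is unjudged
    print(f"NDCG@5 = {ndcg_at_k(system_ranking, qrels):.3f}")
```

Run as a script, this prints NDCG@5 = 0.860: placing the grade-4 article second instead of first costs about 14% of the ideal score, which illustrates why the measure rewards putting the most critical context at the top of a small sidebar.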
3.2 Task 2: Entity Ranking
The second task is “Entity Ranking”: given an article, identify important entities mentioned in the article and rank those entities that can be linked to Wikipedia entries in order of importance, in order to support the reader’s understanding of the story. An example of an important entity might be “the Supreme Court”, whereas an example of an unimportant entity might be “Washington” in a dateline. By structuring this as a ranking task, we separate the core NLP problems of entity detection and linking from the problem of determining importance to the user. The provided mentions and links may also be useful to researchers working on entity extraction.

Again working with criteria developed in conjunction with Post staff, we will identify the top entities in each article along a graded relevance scale, and measure the task as a retrieval task using NDCG.

4 Summary
We have set out the News Track with two initial tasks, Background Linking and Entity Ranking, which we believe are valuable to both news creators and consumers. At the time of submitting this position paper, we are still refining the tasks and performance measures, working with journalists and data scientists at the Washington Post and with researchers in the IR and NLP communities. We welcome feedback and suggestions on the current tasks, as well as recommendations for future tasks. We also encourage participation from researchers around the world.

Acknowledgements
Special thanks to Sam Han at the Washington Post for coordinating the efforts between the two organizations. We also appreciate the input from the other members of the Retrieval Group at NIST.

References
[Jarvelin:2002] K. Järvelin and J. Kekäläinen. Cumulated Gain-based Evaluation of IR Techniques. ACM Trans. Inf. Syst., 20(4):422–446, October 2002.