FootballWhispers: Transfer Rumour Detection

Neil Ireson and Fabio Ciravegna
Sheffield University, Sheffield, UK

Abstract. Social media has been shown to have the potential to predict various real-world events, such as movements in the stock market and the outcomes of political elections. In this paper we present Football Whispers (FW), a website dedicated to fans discussing transfer rumours. The unique selling point of the site is that it provides a crowdsourced assessment of those rumours, measuring the relative likelihood of a player's movements from social media chatter. This talk will focus on the rumour identification process, highlighting the role of open knowledge graphs and linked data in augmenting a domain knowledge base to enable effective Named Entity Linking in noisy, informal social media messages.

1 Introduction

Football is the world's most popular sport: it is responsible for generating the majority of sporting revenue, in excess of $20 billion per annum, and it is estimated that over a billion people can be counted as football fans. Social media has become an important tool for maintaining the relationship between fans and their teams, with Twitter activity assured by both the substantial global fan base and dedicated localised fans. Football Whispers (FW) (www.footballwhispers.com) provides a service focused on the rumoured transfer of players between teams. A fundamental feature of FW is the estimation of the veracity of the rumours, providing a relative likelihood of a player moving to a given team. The likelihood is determined using the Twitter conversations concerning transfers. This involves two processes: Named-Entity Linking (NEL), and the assessment and combination of rumours to determine their relative likelihood.
2 Named Entity Linking & Rumour Detection

The NEL process involves the identification of entities (people and teams) mentioned in tweets and the disambiguation of those mentions, linking them to specific instances in a domain knowledge base (KB). The inherent issues when dealing with such short text, with its high degree of abbreviated, informal language use, are accentuated by the international, and thus multilingual, conversations concerning football. The initial domain KB is provided by Opta Sports (http://www.optasports.com/). While Opta is arguably the de facto source of football-related information, it only provides the official name and possibly a single more common name. The 41,238 active players, from 3,266 teams, have only 46,631 name variations, with 17,489 (42%) players sharing a last name and 785 (2%) having identical names. In order to increase the number of alternative names, the Opta entities are mapped to Wikidata and DBpedia entities. Wikidata contains 210,375 players and 30,710 teams. Although a large number of these players are inactive, in order to identify potential ambiguity it is necessary to include all names which may be mentioned. In addition to players, 11,439 managers, pundits, referees, etc. are also extracted. The talk will describe the entity mapping process, which considers the similarity of entities' available features, e.g. string similarity of names and numerical distance of dates, with the importance of a feature being weighted according to the degree of variation in its values. The mapping process was also applied to DBpedia, which contains 126,790 players; this resulted in a slight increase in alternative names, primarily due to the DBpedia extraction of nicknames. In total 32,754 (80%) of Opta players are mapped, and for these players name variations are tripled to 95,535.
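The feature-weighted entity mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set (name, birth date), the similarity functions, the use of the fraction of distinct values as the "degree of variation", the acceptance threshold, and all records are assumptions made purely for illustration.

```python
from datetime import date
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity of two names in [0, 1] (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def date_similarity(a, b, scale_days=365):
    """Numerical distance of two dates, mapped to [0, 1] (assumed linear scale)."""
    return max(0.0, 1.0 - abs((a - b).days) / scale_days)


# Hypothetical feature set; the paper does not list the features actually used.
SIMILARITY = {"name": name_similarity, "birth_date": date_similarity}


def variation_weight(values):
    """Assumed measure of variation: the fraction of distinct values."""
    values = [v for v in values if v is not None]
    return len(set(values)) / len(values) if values else 0.0


def feature_weights(entities):
    """Weight each feature by how much its values vary across the entities."""
    return {f: variation_weight([e.get(f) for e in entities]) for f in SIMILARITY}


def match_score(source, candidate, weights):
    """Weighted mean similarity over the features available on both sides."""
    total = weight_sum = 0.0
    for feature, similarity in SIMILARITY.items():
        a, b = source.get(feature), candidate.get(feature)
        if a is None or b is None:
            continue
        total += weights[feature] * similarity(a, b)
        weight_sum += weights[feature]
    return total / weight_sum if weight_sum else 0.0


def best_match(source, candidates, weights, threshold=0.8):
    """Return the highest-scoring candidate, or None if it is below threshold."""
    scored = [(match_score(source, c, weights), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return candidate if score >= threshold else None


# Made-up records standing in for one Opta player and two Wikidata candidates.
opta_player = {"name": "J. Smith", "birth_date": date(1990, 5, 1)}
wikidata_candidates = [
    {"id": "Q1", "name": "John Smith", "birth_date": date(1990, 5, 1)},
    {"id": "Q2", "name": "Jack Smith", "birth_date": date(1985, 1, 1)},
]
weights = feature_weights(wikidata_candidates)
match = best_match(opta_player, wikidata_candidates, weights)
```

In this toy data both candidates have equally similar names, so the birth date decides the mapping; a feature whose values barely varied across the candidate set would receive a low weight and contribute little to the score.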
The team and player names are then used to generate a Deterministic Finite Automaton to efficiently extract candidate entity mentions from the message text. The talk will describe how the contextual disambiguation processes are used to link mentions to an entity instance, where other entity candidates in the message provide the context. A name which occurs frequently in a small number of (expected) contexts (e.g. a player name mentioned only with their team) is deemed to maintain its meaning outside those contexts, while a name which occurs in multiple (unexpected) contexts is deemed too ambiguous to be used for entity linking when not contextualised. The message language is also considered, as names can be ambiguous within a given language context.

Evidence for a rumour is given by a message containing player and team entities, and at least one transfer term. The talk will briefly outline the four determinants of the veracity of a rumour: consensus (amount of evidence), recency/constancy (evidence time decay), authority (evidence sources) and coherence/consistency (evidence is not contradictory).

3 Football Whispers

In order to select tweets belonging to the football domain, team names are used to filter the messages. This results in between 1 and 2 million tweets per day and, despite only English team names being used in the filter, almost 56% of messages are in other languages. The rumour detection system processes the messages in real-time and the resultant rumours and likelihoods are validated by FW experts before appearing on the website. The talk will present the evaluation of the rumour detection and show how the use of knowledge graph (KG) data has led to significantly increased player NEL performance and identification of the rumours concerning actual 2016 football transfers. Primarily this is due to the availability of multilingual data in the KGs and the use of language-agnostic statistical disambiguation techniques in NEL.
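As a rough illustration of the per-message processing, the mention extraction and the evidence rule from Section 2 (player and team entities plus at least one transfer term) can be sketched as below. A token-level trie stands in for the Deterministic Finite Automaton; the entity names, identifiers and transfer terms are hypothetical, and the contextual and language-based disambiguation steps are omitted.

```python
import re

ACCEPT = "$"  # marker key for an accepting state; never produced by the tokenizer


def tokenize(text):
    """Lower-case word tokens of a message."""
    return re.findall(r"\w+", text.lower())


def build_trie(names):
    """Build a token trie from a dict of surface form -> (entity_type, entity_id)."""
    root = {}
    for surface, entity in names.items():
        node = root
        for token in tokenize(surface):
            node = node.setdefault(token, {})
        node[ACCEPT] = entity
    return root


def extract_mentions(text, trie):
    """Scan left to right, keeping the longest name match at each position."""
    tokens = tokenize(text)
    mentions, i = [], 0
    while i < len(tokens):
        node, j, last = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if ACCEPT in node:
                last = (j, node[ACCEPT])  # remember the longest accepted match
        if last:
            mentions.append(last[1])
            i = last[0]  # continue after the matched name
        else:
            i += 1
    return mentions


# Hypothetical name variations and transfer vocabulary.
NAMES = {
    "John Doe": ("player", "P1"),
    "Doe": ("player", "P1"),
    "Northtown FC": ("team", "T1"),
    "Northtown": ("team", "T1"),
}
TRANSFER_TERMS = {"transfer", "bid", "loan", "signs", "signing", "deal"}
TRIE = build_trie(NAMES)


def is_rumour_evidence(text, trie=TRIE):
    """True if the message mentions a player, a team and a transfer term."""
    types = {entity_type for entity_type, _ in extract_mentions(text, trie)}
    has_term = bool(set(tokenize(text)) & TRANSFER_TERMS)
    return {"player", "team"} <= types and has_term
```

For example, `is_rumour_evidence("John Doe close to Northtown FC transfer, bid accepted")` holds, while a message about a match result that names the same player and team but contains no transfer term does not count as evidence.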
The success of FW (which currently has two million users) has now led to the development of Sports Whispers and the application of this approach to other sporting domains.