3. IMPLEMENTATION

The backend of the news recommender prototype is constructed as a pipeline of operations that transforms Rich Site Summary (RSS) entries and raw text data into a semantic and searchable representation. The pipeline and its operations are implemented using the Apache Storm² framework. This distributed computing framework provides scalability and the ability to continuously handle large amounts of news items from a multitude of publishers.

There are five steps involved in the data processing. The first step creates an input stream by continuously monitoring a set of RSS feeds from a wide range of news publishers. Whenever a new news item occurs, RSS entry properties such as the title, lead text and HTML sources are retrieved. The HTML sources are parsed and cleaned to extract a representative body text. In the second step, natural language processing operations such as language identification, sentence detection and part-of-speech tagging are applied to extract entity mentions from the textual data. The third step uses supervised models to map entity mentions to referent entities in the WikiData knowledge base. These models combine textual similarities, WikiData graph relations, entity frequencies and co-occurrence statistics to classify the relevance of multiple referent candidates. First Story Detection is applied in the fourth step to group news items describing the same news story. In the fifth step this semantic representation is indexed and made searchable. As this backend architecture is stream based, it is able to index and promote recent news items soon after they are discovered.

WikiData is the community-created knowledge base of Wikipedia [13]. Since its public launch in 2012, the knowledge base has gathered more than 15 million entities, including more than 34 million statements and over 80 million labels and descriptions in more than 350 languages [4]. Most geographical entities in WikiData provide a reference to Geonames containing more detailed geographical properties. In the implementation of the Smartmedia prototype, the entity information from these knowledge bases was indexed in a Lucene³ based search index. This index makes the entities searchable and creates a foundation for addressing entity labels, descriptions and aliases, entity relations and geospatial properties.

Figure 1 shows an example of a news article from the Guardian where the text is parsed and enriched with WikiData entity annotations. The fields and nested data structure in this figure are similar to how the news stories are stored and indexed in the Lucene based index. By running the news text from the article in the figure through the data processing pipeline, we identified nine WikiData entities, including Bedfordshire, Home Office and Theresa May. Note that the news texts and the list of entities and associations in the figure are shortened. All entities contain a textual description and a list of associations. These associations are typed relations to other WikiData entities. We can see that Bedfordshire contains eight such entity associations. Examples of entities linked and related to Bedfordshire are the "instance of" relations to Ceremonial county of England and Administrative territorial entity of the United Kingdom. Both Bedfordshire and Home Office are additionally described with geospatial properties. In this case the geospatial properties are longitude-latitude pairs, but the implementation allows for any geospatial shape described as valid GeoJSON⁴.

When a user opens the news app on a mobile device, a request containing user id, location and preferences is sent to the backend. Here, a multi-factor search query is formed to retrieve relevant news entries from the index.

4. USER INTERFACE

A web-based and responsive user interface is developed to make the news stream contents explorable on mobile devices. In this interface, the user can extract news items that are relevant to the geospatial locality context, personal interests and a given point in time. These three relevance factors are customizable, and the user can select whether or not they should influence the retrieved news items.

To customize the geographical locality, the user specifies a circular relevance region on a map. Figure 2a shows an example of such a relevance region. By default, the relevance region is set to the user's current GPS location with a 50 km radius. By moving the region or modifying the radius, users can generate a local newspaper for any region of the world. If the location factor is disabled, the system recommends news from any location in the world as well as news that contains no location information.

In the current Smartmedia prototype, we have predefined a handful of user interest profiles. Each user profile contains an alias and a weighted vector of WikiData entities. Examples of predefined profiles in the system are stock trader, soccer fan and technology geek. By selecting any of these interest profiles, the retrieved news will be biased towards the interest topics. When the personal interest factor is disabled, the user retrieves a news composition which is general and without such bias.

By changing the time factor, the user is presented with a calendar where they can move in time and retrieve either recent or historic news items. When the time factor is disabled, the user will retrieve news solely based on the other relevance factors (location and personal interests).

Figure 2b shows an example of how news stories are presented. Here we see the same article as we had in Figure 1. The three circular buttons at the bottom of the screen allow users to toggle whether their locality, personal interest profile and time setting should influence news story retrieval.

By clicking on a news story, the user gets the lead text (ingress) of the news story and a list of the most salient entities for the selected news story. Figure 2c shows the lead text and relevant WikiData entities from the news article about Theresa May. As we can see, our news story about politics and terror is related to Syria, Theresa May, ISIL and Sky News. By hovering over these items, the user is presented with their textual WikiData description. In Figure 2c, we can see that the WikiData entity for Theresa May contains the description “British politician”.

In general, the three buttons at the bottom of the screen for location, interest profile and time can at any time be activated and de-activated in combinations to provide very different recommendation strategies. For example, keeping all buttons active with default parameters means that the system will recommend news articles about events that have recently taken place in the vicinity of the reader and are consistent with her profile. A screencast video describing the features of the system and its user interface is available at https://vimeo.com/121835936

5. CONCLUSIONS AND FUTURE WORK

Many see the full stack of semantic web technologies as a complex implementation of some really simple and good ideas about adding meaning to data. There are great rewards in understanding the full stack and what it can do, but most news organizations find great rewards by looking into linked data in combination with traditional information retrieval techniques.

In this paper we have shown a prototype of a news recommender system that demonstrates some of the context-aware and geospatially aware features online news services can achieve by using available and open knowledge bases and data processing and storage technologies.

Future work for the Smartmedia prototype will focus on improving entity linking quality and on evaluations of user needs.
The user evaluations will examine to what extent users find the ability to control their news feed in terms of location, interest profile and time valuable and useful.

² http://storm.apache.org/
³ https://lucene.apache.org/core/
⁴ http://geojson.org/

articleId: "Guardian_254439378"
type: "article"
title: "Theresa May 'allowed state-sanctioned abuse of women' at Yarl's Wood"
leadText: "Shadow home secretary criticises minister after TV documentary alleges rape and self-harm at detention centre were ignoredTheresa May, the home secretary, has been accused of allowing the “state-sponsored abuse of women” at the Yarl’s Wood detention centre after a Channel 4 investigation uncovered guards ignoring self-harm and referring to inmates in racist terms.Yvette Cooper..."
entities: [9]
    0: {
        entityId: "Q23143"
        name: "Bedfordshire"
        description: "county in England"
        associations: [... 8]
        shape: {
            type: "Point"
            coordinates: [2]
                0: -0.41666666666667
                1: 52.083333333333
        }
    }
    1: {
        entityId: "Q763388"
        name: "Home Office"
        description: "ministerial department of the Government of the United Kingdom"
        associations: [... 3]
        shape: {
            type: "Point"
            coordinates: [2]
                0: -0.129948
                1: 51.4958
        }
    }
    2: {
        entityId: "Q264766"
        name: "Theresa May"
        description: "British politician"
        associations: [... 21]
    }

Figure 1. Example of a news article enriched with WikiData entities.

Figure 2. Screenshots from the Smartmedia prototype. a) The map query interface. b) Presentation of news stories. c) Presentation of news details.

6. REFERENCES

[1] Asikin, Y. and Wörndl, W. 2014. Stories around You: Location-based Serendipitous Recommendation of News Articles. Proceedings of 2nd International Workshop on News Recommendation and Analytics. (2014).
[2] Cantador, I., Bellogín, A. and Castells, P. 2008. News@hand: A semantic web approach to recommending news. Adaptive hypermedia and adaptive web-based systems. (2008).
[3] Cantador, I., Bellogín, A. and Castells, P. 2008. Ontology-based personalised and context-aware recommendations of news items. Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. 1, (2008).
[4] Erxleben, F., Günther, M. and Krötzsch, M. 2014. Introducing Wikidata to the Linked Data Web. The Semantic Web–ISWC 2014. (2014).
[5] Goossen, F. and IJntema, W. 2011. News personalization using the CF-IDF semantic recommender. Proceedings of the International Conference on Web Intelligence, Mining and Semantics (WIMS). (2011).
[6] Gulla, J.A., Ingvaldsen, J.E., Fidjestøl, A.D., Nilsen, J.E., Haugen, K.R. and Su, X. 2013. Learning User Profiles in Mobile News Recommendation. Journal of Print and Media Technology Research. II, 3 (2013), 183–194.
[7] IJntema, W. and Goossen, F. 2010. Ontology-based news recommendation. Proceedings of the 2010 EDBT/ICDT Workshops. (2010).
[8] Meguebli, Y. and Kacimi, M. 2014. Building rich user profiles for personalized news recommendation. Proceedings of 2nd International Workshop on News Recommendation and Analytics. (2014).
[9] Ozgobek, O., Gulla, J. and Erdur, R. 2014. A survey on challenges and methods in news recommendation. In Proceedings of the 10th International Conference on Web Information System and Technologies (WEBIST 2014). (2014).
[10] Samet, H., Sankaranarayanan, J., Lieberman, M.D., Adelfio, M.D., Fruin, B.C., Lotkowski, J.M., Panozzo, D., Sperling, J. and Teitler, B.E. 2014. Reading news with maps by exploiting spatial synonyms. Communications of the ACM. 57, 10 (Sep. 2014), 64–77.
[11] Tavakolifard, M., Gulla, J.A., Almeroth, K.C., Ingvaldsen, J.E., Nygreen, G. and Berg, E. 2013. Tailored news in the palm of your hand: a multi-perspective transparent approach to news recommendation. WWW ’13 Companion Proceedings of the 22nd International Conference on World Wide Web. (May 2013), 305–308.
[12] Teitler, B. and Lieberman, M. 2008. NewsStand: A new view on news. Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems. (2008).
[13] Vrandečić, D. and Krötzsch, M. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM. (2014).
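
As a supplementary illustration of the multi-factor query described in Sections 3 and 4, the following is a minimal Python sketch of how the three optional relevance factors (circular region, weighted interest profile, time window) might be combined. The paper does not specify how the prototype combines or scores the factors, and the prototype queries a Lucene based index rather than an in-memory list, so everything here is an assumption: the names `NewsItem`, `recommend` and `haversine_km`, the choice to treat location and time as filters and interest as a score, and the sample publication dates and non-Guardian items are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


@dataclass
class NewsItem:
    """Hypothetical, simplified version of the record shown in Figure 1."""
    article_id: str
    published: datetime
    entity_ids: List[str]
    # (longitude, latitude) pairs, in GeoJSON coordinate order.
    points: List[Tuple[float, float]] = field(default_factory=list)


def recommend(
    items: List[NewsItem],
    region: Optional[Tuple[float, float, float]] = None,  # (lon, lat, radius_km)
    profile: Optional[Dict[str, float]] = None,           # entity id -> weight
    window: Optional[Tuple[datetime, datetime]] = None,   # (start, end)
) -> List[NewsItem]:
    """Rank items; passing None for a factor disables it (assumed semantics)."""
    scored = []
    for item in items:
        if region is not None:
            lon, lat, radius_km = region
            # Keep the item only if at least one of its entities lies in the circle.
            if not any(haversine_km(lon, lat, p[0], p[1]) <= radius_km for p in item.points):
                continue
        if window is not None:
            start, end = window
            if not (start <= item.published <= end):
                continue
        # Interest score: sum of profile weights over the item's entities.
        score = sum(profile.get(e, 0.0) for e in item.entity_ids) if profile else 0.0
        scored.append((score, item))
    # Highest interest score first; ties are broken by recency.
    scored.sort(key=lambda pair: (pair[0], pair[1].published), reverse=True)
    return [item for _, item in scored]


# Sample data: entity ids and coordinates of the first item come from Figure 1;
# the dates and the other two items are invented for illustration.
ITEMS = [
    NewsItem("Guardian_254439378", datetime(2015, 3, 2),
             ["Q23143", "Q763388", "Q264766"],
             [(-0.41666666666667, 52.083333333333), (-0.129948, 51.4958)]),
    NewsItem("example_oslo", datetime(2015, 3, 3), ["Q_hypothetical_1"], [(10.75, 59.91)]),
    NewsItem("example_no_location", datetime(2015, 3, 1), ["Q_hypothetical_2"]),
]

# A 50 km region around central London, mirroring the default radius in Section 4.
local_news = recommend(ITEMS, region=(-0.1276, 51.5072, 50.0))
```

Treating a disabled factor as `None` mirrors the toggle buttons in the user interface: with the region disabled, items from any location, and items without location information, remain candidates.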