<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main"></title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E403F90FA341C2279896CE02C3EEF267</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">IMPLEMENTATION</head><p>The backend of the news recommender prototype is built as a pipeline of operations that transform Rich Site Summary (RSS) entries and raw text into a semantic, searchable representation. The pipeline and its operations are implemented using the Apache Storm<ref type="foot" target="#foot_0">2</ref> framework. This distributed computing framework lets the system scale and continuously handle large volumes of news items from a multitude of publishers.</p><p>There are five steps in the data processing. The first step creates an input stream by continuously monitoring a set of RSS feeds from a wide range of news publishers. Whenever a new news item appears, RSS entry properties such as the title, lead text and HTML source are retrieved. The HTML source is parsed and cleaned to extract a representative body text. In the second step, natural language processing operations such as language identification, sentence detection and part-of-speech tagging are applied to extract entity mentions from the text. The third step uses supervised models to map entity mentions to referent entities in the WikiData knowledge base. These models combine textual similarities, WikiData graph relations, entity frequencies and co-occurrence statistics to classify the relevance of multiple referent candidates. In the fourth step, First Story Detection is applied to group news items that describe the same news story. In the fifth step, the resulting semantic representation is indexed and made searchable. Because the backend architecture is stream-based, it can index and promote recent news items soon after they are discovered.</p><p>WikiData is the community-created knowledge base of Wikipedia <ref type="bibr" target="#b13">[13]</ref>. 
Since its public launch in 2012, the knowledge base has gathered more than 15 million entities, along with more than 34 million statements and over 80 million labels and descriptions in more than 350 languages <ref type="bibr" target="#b4">[4]</ref>. Most geographical entities in WikiData provide a reference to Geonames, which contains more detailed geographical properties. In the implementation of the Smartmedia prototype, the entity information from these knowledge bases was indexed in a Lucene<ref type="foot" target="#foot_1">3</ref>-based search index. This index makes the entities searchable and creates a foundation for addressing entity labels, descriptions and aliases, entity relations and geospatial properties. Figure <ref type="figure">1</ref> shows an example of a news article from the Guardian in which the text is parsed and enriched with WikiData entity annotations. The fields and nested data structure in this figure are similar to how the news stories are stored and indexed in the Lucene-based index. By running the text of the news article in the figure through the data processing pipeline, we identified nine WikiData entities, including Bedfordshire, Home Office and Theresa May. Note that the news text and the lists of entities and associations in the figure are shortened. All entities contain a textual description and a list of associations. These associations are typed relations to other WikiData entities. We can see that Bedfordshire contains eight such entity associations. Examples of entities related to Bedfordshire are Ceremonial county of England and Administrative territorial entity of the United Kingdom, both linked through "instance of" relations. Both Bedfordshire and Home Office are additionally described with geospatial properties. 
In this case the geospatial properties are longitude-latitude pairs, but the implementation allows for any geospatial shape described as valid GeoJSON<ref type="foot" target="#foot_2">4</ref>.</p><p>When a user opens the news app on a mobile device, a request containing the user id, location and preferences is sent to the backend. Here, a multi-factor search query is formed to retrieve relevant news entries from the index.</p></div>
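<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the nested record of Figure 1 and the multi-factor query formed from a user request might be sketched in Python as follows. The class names, the geo field, the build_query helper and the layout of the query dictionary are assumptions made for this sketch; they are not the prototype's actual index schema or Lucene query syntax. The 50 km default radius is the default relevance region of the user interface.</p>
<p>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    # Field names follow Figure 1; the geo field name is an assumption.
    entity_id: str
    name: str
    description: str
    associations: list = field(default_factory=list)
    geo: Optional[tuple] = None  # (longitude, latitude), if known


@dataclass
class NewsItem:
    article_id: str
    title: str
    lead_text: str
    entities: list = field(default_factory=list)


def build_query(user_id, location=None, radius_km=50.0,
                interests=None, day=None):
    """Form a multi-factor query; a factor left as None is omitted."""
    query = {"user": user_id, "factors": {}}
    if location is not None:
        query["factors"]["geo"] = {"center": location, "radius_km": radius_km}
    if interests:
        query["factors"]["interests"] = dict(interests)
    if day is not None:
        query["factors"]["time"] = day
    return query
```
</p>
<p>A factor passed as None is left out of the query, which mirrors the idea that a disabled relevance factor should not influence retrieval.</p></div>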
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">USER INTERFACE</head><p>A web-based, responsive user interface was developed to make the contents of the news stream explorable on mobile devices. In this interface, the user can retrieve news items that are relevant to the geospatial locality context, personal interests and a given point in time. These three relevance factors are customizable, and the user can select whether or not each of them should influence the retrieved news items.</p><p>To customize the geographical locality, the user specifies a circular relevance region on a map. Figure <ref type="figure">2a</ref> shows an example of such a relevance region. By default, the relevance region is centered on the user's current GPS location with a 50 km radius. By moving the region or modifying the radius, users can generate a local newspaper for any region of the world. If the location factor is disabled, the system recommends news from any location in the world, including news that contains no location information.</p><p>In the current Smartmedia prototype, we have predefined a handful of user interest profiles. Each user profile contains an alias and a weighted vector of WikiData entities. Examples of predefined profiles in the system are stock trader, soccer fan and technology geek. When one of these interest profiles is selected, the retrieved news is biased towards the topics of interest. When the personal interest factor is disabled, the user retrieves a general news composition without such bias.</p><p>By changing the time factor, the user is presented with a calendar in which they can move in time and retrieve either recent or historic news items. When the time factor is disabled, the user retrieves news based solely on the other relevance factors (location and personal interests). Figure <ref type="figure">2b</ref> shows an example of how news stories are presented. 
Here we see the same article as in Figure <ref type="figure">1</ref>. The three circular buttons at the bottom of the screen allow users to toggle whether their locality, personal interest profile and time setting should influence news story retrieval.</p><p>By clicking on a news story, the user gets the lead text of the news story and a list of its most salient entities. Figure <ref type="figure">2c</ref> shows the lead text and relevant WikiData entities from the news article about Theresa May. As we can see, this news story about politics and terror is related to Syria, Theresa May, ISIL and Sky News. By hovering over these items, the user is presented with their textual WikiData description. In Figure <ref type="figure">2c</ref>, we can see that the WikiData entity for Theresa May contains the description "British politician".</p><p>In general, the three buttons at the bottom of the screen for location, interest profile and time can be activated and deactivated in any combination at any time to provide very different recommendation strategies. For example, keeping all buttons active with default parameters means that the system will recommend news articles about events that have recently taken place in the vicinity of the reader and are consistent with her profile. A screencast video describing the features of the system and its user interface is available at https://vimeo.com/121835936.</p></div>
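<div xmlns="http://www.tei-c.org/ns/1.0"><p>The circular relevance region and the weighted interest profiles described in this section can be illustrated with a minimal Python sketch. It assumes a haversine great-circle distance, a profile represented as a mapping from WikiData entity ids to weights, and news items given as dictionaries with the hypothetical keys id, lat, lon and entities; dropping items without coordinates when the location factor is enabled is likewise an assumption. The time factor is omitted for brevity. None of this is taken from the prototype's code.</p>
<p>
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def interest_score(profile, entity_ids):
    """Sum the profile weights of the entities found in a news item."""
    return sum(profile.get(e, 0.0) for e in entity_ids)


def recommend(items, center=None, radius_km=50.0, profile=None):
    """Filter by region (if enabled) and rank by interest (if enabled).

    center is a (lat, lon) pair or None when the location factor is off;
    profile is a dict of entity id to weight or None when it is off.
    """
    kept = []
    for item in items:
        if center is not None:
            if item["lat"] is None:
                continue  # assumption: unlocated items are skipped
            d = haversine_km(center[0], center[1], item["lat"], item["lon"])
            if d > radius_km:
                continue  # outside the circular relevance region
        kept.append(item)
    if profile is not None:
        # Stable sort: items with equal scores keep their input order.
        kept.sort(key=lambda it: interest_score(profile, it["entities"]),
                  reverse=True)
    return [it["id"] for it in kept]
```
</p>
<p>Calling recommend with both center and profile left as None returns every item in its original order, which corresponds to having the location and interest buttons switched off.</p></div>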
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">CONCLUSIONS AND FUTURE WORK</head><p>Many see the full stack of semantic web technologies as a complex implementation of some really simple and good ideas about adding meaning to data. There are great rewards in understanding the full stack and what it can do, but most news organizations find great rewards by looking into linked data in combination with traditional information retrieval techniques.</p><p>In this paper we have shown a prototype of a news recommender system that demonstrates some of the context-aware and geospatially aware features that online news services can achieve by using available, open knowledge bases together with data processing and storage technologies.</p><p>Future work for the Smartmedia prototype will focus on improving entity linking quality and evaluating user needs. The user evaluations will examine to what extent users find the ability to control their news feed in terms of location, interest profile and time valuable and useful.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. Example of a news article enriched with WikiData entities. The excerpt shows articleId "Guardian_254439378" of type "article", titled "Theresa May 'allowed state-sanctioned abuse of women' at Yarl's Wood", with a leadText field and a list of nine entities; the first entity has entityId "Q23143", name "Bedfordshire", description "county in England" and a list of associations.</figDesc><graphic coords="3,54.14,351.22,499.20,330.48" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">http://storm.apache.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://lucene.apache.org/core/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">http://geojson.org/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Stories around You: Location-based Serendipitous Recommendation of News Articles</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Asikin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wörndl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 2nd International Workshop on News Recommendation and Analytics</title>
				<meeting>2nd International Workshop on News Recommendation and Analytics</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Cantador</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bellogín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Castells</surname></persName>
		</author>
		<title level="m">News@hand: A semantic web approach to recommending news</title>
				<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Adaptive hypermedia and adaptive web-based systems</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Ontology-based personalised and context-aware recommendations of news items</title>
		<author>
			<persName><forename type="first">I</forename><surname>Cantador</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bellogín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Castells</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology</title>
				<meeting>the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Introducing Wikidata to the Linked Data Web</title>
		<author>
			<persName><forename type="first">F</forename><surname>Erxleben</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Günther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2014</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">News personalization using the CF-IDF semantic recommender</title>
		<author>
			<persName><forename type="first">F</forename><surname>Goossen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ijntema</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Web Intelligence, Mining and Semantics (WIMS)</title>
				<meeting>the International Conference on Web Intelligence, Mining and Semantics (WIMS)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning User Profiles in Mobile News Recommendation</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Gulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Ingvaldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Fidjestøl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Nilsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Haugen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Su</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Print and Media Technology Research</title>
		<imprint>
			<biblScope unit="volume">II</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="183" to="194" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Ontology-based news recommendation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Ijntema</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Goossen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2010 EDBT/ICDT Workshops</title>
				<meeting>the 2010 EDBT/ICDT Workshops</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Building rich user profiles for personalized news recommendation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Meguebli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kacimi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 2nd International Workshop on News Recommendation and Analytics</title>
				<meeting>2nd International Workshop on News Recommendation and Analytics</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A survey on challenges and methods in news recommendation</title>
		<author>
			<persName><forename type="first">O</forename><surname>Ozgobek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Erdur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Web Information System and Technologies (WEBIST 2014)</title>
				<meeting>the 10th International Conference on Web Information System and Technologies (WEBIST 2014)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Reading news with maps by exploiting spatial synonyms</title>
		<author>
			<persName><forename type="first">H</forename><surname>Samet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sankaranarayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Lieberman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Adelfio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Fruin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Lotkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Panozzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sperling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Teitler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="64" to="77" />
			<date type="published" when="2014-09">Sep. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Tailored news in the palm of your hand: a multi-perspective transparent approach to news recommendation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tavakolifard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Gulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Almeroth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Ingvaldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nygreen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Berg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW &apos;13 Companion Proceedings of the 22nd International Conference on World Wide Web</title>
				<imprint>
			<date type="published" when="2013-05">May 2013</date>
			<biblScope unit="page" from="305" to="308" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">NewsStand: A new view on news</title>
		<author>
			<persName><forename type="first">B</forename><surname>Teitler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lieberman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems</title>
				<meeting>the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Wikidata: a free collaborative knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
