UrbanMatch – linking and improving Smart Cities Data

Irene Celino, CEFRIEL, Italy, irene.celino@cefriel.it
Simone Contessa, CEFRIEL, Italy, simone.contessa@cefriel.it
Marta Corubolo, Politecnico di Milano, Italy, marta.corubolo@polimi.it
Daniele Dell’Aglio, CEFRIEL, Italy, daniele.dellaglio@cefriel.it
Emanuele Della Valle, Politecnico di Milano, Italy, emanuele.dellavalle@polimi.it
Stefano Fumeo, CEFRIEL, Italy, stefano.fumeo@cefriel.it
Thorsten Krüger, SIEMENS, Germany, thorsten.krueger@siemens.com

ABSTRACT

Urban-related data and geographic information are becoming mainstream in the Linked Data community, partly because of the popularity of Location-based Services. In this paper, we introduce the UrbanMatch game, a mobile gaming application that combines data linkage with data quality/trustworthiness assessment in an urban environment. By putting together Linked Data and Human Computation, we create a new interaction paradigm to consume and produce location-specific linked data by involving and engaging the final user. The UrbanMatch game is also offered as an example of the value proposition and business model of a new family of linked data applications based on gaming in Smart Cities.

1. INTRODUCTION

Urban environments are experiencing a progressive digitization that is leading to the creation and release of large amounts of data: information about interesting aspects of cities (ranging from street topology and traffic conditions to business activities, from points of interest (POI) to events, from environmental measures to people's life-logs) is increasingly present on the Web and even proactively fed by the open community. The growing attention given to Smart Cities themes and problems and the ever increasing popularity of location-based applications place urban ecosystems at the center of the research and innovation agenda of public authorities and big industrial players such as IBM with its Smarter Planet initiative¹, and CISCO with its Smart+Connected Communities initiative².

The Linked Data world has also turned to urban data, especially in the area of spatial information. One of the most curated datasets available in this field is GeoNames³. Recently, open Web APIs like Open Street Map⁴ and geographic datasets from public administrations were (partially) turned into a Linked Data form by efforts like LinkedGeoData⁵ [2] and the Spanish GeoLinkedData.es⁶.

Still, this massive bulk of urban data is largely unexplored and poorly exploited [15]. Two main drawbacks hamper a larger adoption of Linked Data in Smart Cities scenarios: the doubtful quality of the available information and the lack of user-centred tools to consume such data. Those two (well-known) problems create a vicious cycle: the unreliable quality of data makes people distrust Linked Data content, and the lack of usable tools prevents people from contributing to the improvement of Linked Data.

Our research hypothesis consists in employing user-friendly mobile gaming applications to engage people in mobility; through those games, Linked Data related to urban environments are consumed, created, improved and corrected. While a similar approach was already partially explored for Linked Data at large [22], we believe that the popularity of location-based services (LBS) can make this approach successful for urban-specific Linked Data: people are increasingly used to “checking in” at physical places with their mobile devices and to adding small bits of information related to their activities and actions in the physical world.

In this paper we present our first experiment to prove our hypothesis: UrbanMatch is a mobile and location-aware Game with a Purpose that engages players to provide information related to the city of Milano. Specifically, UrbanMatch is aimed at linking points of interest in the city with the most representative photos retrieved from Web sources.

The remainder of this paper is organized as follows.
Section 2 introduces motivation and related work; Section 3 explains the mechanics and purpose of the UrbanMatch game, while the evaluation methodology is shown in Section 4; finally, Section 5 draws some conclusions and outlines future work.

¹ http://www.ibm.com/uk/smarterplanet
² http://www.cisco.com/web/strategy/smart_connected_communities.html
³ http://geonames.org/
⁴ http://www.openstreetmap.org/
⁵ http://linkedgeodata.org/
⁶ http://geo.linkeddata.es/

Copyright is held by the author/owner(s).
LDOW2012 April 16, 2012, Lyon, France.

2. MOTIVATION AND BACKGROUND

Our work focuses on link creation and quality assessment for Linked Data in urban scenarios. In this section, we illustrate the motivating problem, explain the specificity of Smart City solutions and introduce and discuss related work.

2.1 Data Quality and Linked Data

Data Quality [19] is the discipline that studies the most appropriate and relevant features to describe the value of data. Examples of dimensions defined by Data Quality are consistency, completeness, accuracy, and relevance.

A key point of Data Quality is that quality measurements are context-dependent: given a dataset, its quality can be very high with respect to the fulfilment of some tasks but very poor for others. In other words, it is not relevant (and not always possible) to define absolute quality values [18]. As pointed out in [16]: “The perception of information quality on the WWW is highly dependent on the fitness for use being relative to the specific task that users have at their hands”.

In recent years, the Linked Data community has started to follow this topic with growing interest: the birth and growth of the Linked Open Data (LOD) cloud introduced a huge amount of RDF data distributed across several datasets. The Linked Data best practices alone [14] assure more quality than “raw data” in closed databases because: a) data becomes accessible over the Web rather than being closed up in silos; b) the use of shared vocabularies makes the data both easier to “read” (i.e. user information needs can be satisfied by a single SPARQL query instead of requiring many dataset-specific queries) and easier to “interpret” (i.e. shared vocabulary semantics can be used to verify data integrity and/or infer implied data); c) the presence of links also makes it possible to verify consistency across different sources.

Still, problems related to the quality of the available data soon arose in the LOD cloud. While at the beginning the number of statements and the number of links were used to estimate the relevance of the published data sets, more recently the definition of more detailed and expressive metrics to describe the available data has become increasingly important. Flemming worked on the definition of quality criteria for linked data sources [10]. Fürber and Hepp studied how to integrate Data Quality Management processes into the Semantic Web. In [12] they presented their approach to use SPARQL and SPIN to model Data Quality rules and execute them, while in [13] they defined an ontology (named DQM ontology) to describe Data Quality dimensions in RDF.

However, the assessment of data quality factors like accuracy, timeliness, completeness, relevance and comprehensiveness of data is intrinsically a hard task that Linked Data technologies do not make any easier.

2.2 Urban Linked Data

In recent years we have experimented extensively with urban-related linked (and non-linked) data in the development of a number of Smart City demonstrators using Linked Data related to urban environments. The Urban LarKC⁷ [7] and then the Traffic LarKC⁸ [3] integrated DBpedia, an Eventful wrapper and two datasets from the Milano municipality covering all Milano streets and three years of traffic sensor data; those applications made it possible to answer queries like “which are the modern art exhibitions that I can reach today in less than 25 minutes if I can get into my car this afternoon at 4pm?” [8]. The Seoul Road Sign Management [17] integrated LinkedGeoData POIs, OpenStreetMap streets and a private dataset describing Seoul road signs to check the validity of all road sign information. Finally, the mobile app BOTTARI [4] explored social media to provide location-based recommendations for POIs.

These experiences allow us to assert that: a) the quality of urban Linked Data unpredictably ranges from very good to very poor; b) it is possible to detect missing information or inconsistencies by cross-validating datasets that describe the same urban space from different points of view (e.g., if a road sign says to turn right to reach a POI, and the POI is nearby, but no street reaches it, a street is missing in the topology dataset); c) when inconsistencies or missing data are detected, data quality can be easily increased by a small amount of manual work that does not require specific skills, but often requires physical presence in the urban environment.

Those three assertions may not be valid for Linked Data in general, but appear to be valid for urban Linked Data. The third point suggests that a part of the manual work required to fix urban Linked Data could be “crowd-sourced”. The recent popularity of Location-based Services (LBS) like Foursquare demonstrates that people are willing to share small bits of information on the go by exploiting the capabilities of their mobile devices. “Unlock your city”, the slogan of Foursquare, reveals that LBS can be considered an effective means to collect useful information for Smart City applications.

2.3 Related work

Assessing data quality is a hard problem for computers. We, as humans, are perfectly capable of it, but we are not necessarily willing to do it. Human Computation [25] and Social Computing [6], however, demonstrated that a number of different “computations” can be carried out by groups of people. In this area, Games with a Purpose [24] (GWAP) emerged as a means to engage people to perform activities that are almost trivial for humans and very complex for computers; these tasks range from labelling images to improve web searching, to transcribing text (where OCR software fails), to any activity requiring common sense or human experience.

The incentives to make people contribute to Human Computation can be of different kinds: they can give the participant an explicit and concrete reward (as in the popular Amazon Mechanical Turk, also named MTurk⁹, in which people are paid to perform small and simple tasks) or they can provide a different kind of implicit or more abstract return, for example by means of entertainment, as in GWAPs.

In the Semantic Web community, Games with a Purpose were already used to cover the complete Semantic Web life-cycle [21]. A dedicated community portal was recently set up¹⁰ to collect those games. A good showcase is the Linked Data Movie Quiz¹¹ [1], which builds a cinematographic game based on the available movie-related Linked Data, showing that “the answers are out there; and so are the questions”. Recently, [20] investigated which Linked Data management tasks can be easily and semi-automatically turned into crowd-sourcing assignments for the MTurk platform.

⁷ http://larkc.cefriel.it/alpha-Urban-LarKC/
⁸ http://larkc.cefriel.it/traffic-larkc/
⁹ http://mturk.com/
¹⁰ http://www.semanticgames.org/
¹¹ http://lamboratory.com/hacks/ldmq/
3. THE URBANMATCH GAME

UrbanMatch¹² is a mobile location-based Game with a Purpose that aims at selecting the most representative photos related to the points of interest (POI) in an urban environment; more specifically, UrbanMatch is oriented to link the monuments and relevant places of the city of Milano with their respective photos as retrieved from social media Web sites and to “rank” those links, so as to identify the most characteristic ones and to discard the others, thus improving the quality.

3.1 Data input

Figure 1: Input data sources and output links in UrbanMatch.

The input data come from available Web sources (cf. Figure 1). Points of interest in Milano were collected and chosen among those available from OpenStreetMap; an RDF description of those POIs is also available in LinkedGeoData, the linked data version of OpenStreetMap.

For each of the 34 POIs of this set, we manually selected 5-6 photos depicting them (some chosen on the Web, some taken by ourselves). In this way we built a “trusted set” of 196 links that relate the POIs with their respective images; those links are expressed in the form: <POI> foaf:depiction <photo> .

A much higher number of photos of Milano POIs was collected from Wikimedia Commons¹³, the media collection of Wikipedia, and from Flickr¹⁴, probably the most popular social media sharing site dedicated to photos. The images were collected either by keyword/concept search (i.e., photos explicitly related to Milano POIs) or via location-based queries (e.g., search by geographical coordinates). Among the collected photos, we considered only those released with an open license, allowing for a free reuse of the image (like the CreativeCommons “Attribution” license).

This second set of information is considered, for the game purpose, an “uncertain source”: it consists of more than 37,000 “candidate” links that relate the POIs with the images that potentially depict them. This link-set is uncertain or untrusted because the retrieved photos can be incorrect (i.e. they are not related to Milano POIs even if returned by the search), their metadata can be wrong or incomplete, or they cannot be considered as representative for the game purpose (e.g., a photo is actually taken in the proximity of a POI, but it does not depict it or it is focused on an irrelevant detail).

Those candidate links are expressed as RDF links using the foaf:depiction predicate as explained before; however, those links are further annotated with a confidence value that expresses the lack of certainty about their trustworthiness (e.g., the initial confidence of links to Wikimedia images is set to 60%, links to Flickr to 40%).

3.2 Gameplay

The UrbanMatch game is a photo coupling game. The game mechanics respect the best practice of casual games and Games with a Purpose [11]: the game consists of a simple and intuitive interface that presents the player with 8 photos of POIs in the vicinity of the player and asks for their coupling (cf. Figure 2).

Figure 2: Screenshots of UrbanMatch.

The nature of the links between the surrounding POIs and the presented photos is not the same for all 8 photos: some links are certain, because they come from the trusted source; some are uncertain, because they are taken from the set of candidate links; finally, some are distractors, i.e. they are not related to the surrounding POIs and are used to check the reliability of players. Each game is organized in multiple levels of increasing difficulty, i.e. with a varying number of certain/uncertain/distracting links (the higher the level, the greater the uncertainty degree).

The user's geographic position, taken from the mobile device sensors, is used to compute the user's proximity to the “playable places”, i.e. the game locations; even if the game allows for playing from any place, the user location is used to distinguish between the choices made “on site”, based on the player's direct experience, and couples selected on the basis of the player's domain knowledge.

The UrbanMatch game is available at http://bit.ly/um-itunes on the iTunes store. Being a research prototype, the game will be available only for a few months.

¹² http://bit.ly/urbanmatch
¹³ http://commons.wikimedia.org/
¹⁴ http://www.flickr.com/

3.3 Game purpose and Data Analysis

Through the UrbanMatch game we aim at identifying and selecting the “correct” POI-photo links among the candidate links in the uncertain input source. Through a Human Computation approach, we aim at collecting evidence of the players' decisions to correlate images.

The approach we follow to post-process the collected data is similar to that of other Games with a Purpose [24]. From the collection of all evidence of image coupling as performed by the game players, with majority voting and other statistically-relevant algorithms [5], we alter the confidence value of each POI-photo link.
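The confidence bookkeeping described in this section can be illustrated with a minimal sketch. It rests on stated assumptions: the initial confidence values (60% for Wikimedia, 40% for Flickr) and the thresholds (70% and 20%) are the example values quoted in the text, while the fixed step of 5 percentage points per player action and all names (`CandidateLink`, `record_evidence`, `STEP`) are hypothetical. The game itself relies on majority voting and other statistical algorithms [5], which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field

# Example values quoted in the text, stored as integer percentages
# to avoid floating-point surprises at the threshold boundaries.
INITIAL_CONFIDENCE = {"wikimedia": 60, "flickr": 40}
UPPER_THRESHOLD = 70
LOWER_THRESHOLD = 20
STEP = 5  # hypothetical: the paper does not say how much one action counts


@dataclass
class CandidateLink:
    """A POI-photo candidate link annotated with a confidence value."""
    poi: str
    photo: str
    source: str  # must be a key of INITIAL_CONFIDENCE
    confidence: int = field(init=False)

    def __post_init__(self) -> None:
        self.confidence = INITIAL_CONFIDENCE[self.source]

    @property
    def status(self) -> str:
        """Classify the link by comparing its confidence to the thresholds."""
        if self.confidence > UPPER_THRESHOLD:
            return "trusted"
        if self.confidence < LOWER_THRESHOLD:
            return "incorrect"
        return "uncertain"


def record_evidence(link: CandidateLink, coupled: bool) -> None:
    """Apply one piece of player evidence to a candidate link.

    coupled=True:  the photo was paired with a trusted photo of the same POI.
    coupled=False: the photo was presented but left unpaired.
    """
    delta = STEP if coupled else -STEP
    link.confidence = max(0, min(100, link.confidence + delta))


# Three couplings with a trusted photo move a Wikimedia link from 60 to 75.
link = CandidateLink("Duomo di Milano", "photo-123", "wikimedia")
for _ in range(3):
    record_evidence(link, coupled=True)
print(link.confidence, link.status)
```

With these assumed values, a Wikimedia candidate needs three positive actions to exceed the 70% threshold, whereas a Flickr candidate that is repeatedly left unpaired needs five negative actions to fall below 20%.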
Figure 3: Computing UrbanMatch links.

Intuitively, the game purpose processing is represented in Figure 3: in each game level, trusted POI-photo links (the green one on the left with the picture of Milano's Duomo) are presented together with a number of candidate links related to the same POI (the yellow ones on the left, with uncertain photos retrieved from Wikimedia Commons and Flickr as being related to Milano's Duomo) and with a number of distractors (for simplicity not drawn in the figure).

If players couple the same two photos, those photos reasonably belong to the same POI: thus, if a player associates a trusted photo with an uncertain photo, the candidate link related to the latter is given a sign of “trust” and its confidence value is increased.

On the contrary, if there is no evidence of the association between an uncertain photo and any other one, the candidate link to that photo is not validated: the lack of coupling actions is considered as a sign of “distrust” and decreases the confidence value of the candidate link.

The same POI-photo candidate link is given as input to multiple users; each player action modifies the link confidence value, by increasing or decreasing it. After a variable number of played games, this confidence value crosses one of the thresholds, so that the link leaves its uncertain status and becomes either a trusted link or an incorrect one.

When the confidence value of a link becomes greater than a given upper threshold (e.g., 70%), the POI-photo link becomes “trustable” and is inserted in the trusted link-set; in the figure, the green link on the right is associated with Milano's Duomo and it can be used to validate other candidate links. Similarly, if the confidence value of a POI-photo link becomes smaller than a lower limit (e.g., 20%), the link is discarded and is no longer given as input to other players, as for the red one on the right of Figure 3 (in which the photo evidently depicts Roma's Colosseum).

4. EVALUATION

The evaluation of the UrbanMatch game is currently being performed and it is aimed at two different results: the assessment of the game “purpose” (the ability of our Human Computation approach to actually derive meaningful and quality links between Milano POIs and their related photos) and the appraisal of the game “playability” (the intrinsic fun or entertaining characteristic of the game).

Regarding the assessment of the game purpose, we identified a number of metrics to measure the UrbanMatch capability to improve the “fitness-for-use” quality [19] of the urban-related data involved in the game. More specifically, we measure the completeness and the accuracy of the new trusted links produced by the game.

We define completeness as the capability of the game to assess all the input candidate links, deciding if they are either trustable or incorrect. The completeness is calculated by dividing the number of assessed links (i.e. the links that became either trusted or incorrect after the gameplay) by the total number of input uncertain links.

We define accuracy as the capability of the game to make correct assessments about the input links, minimizing the “false positive” outcomes (i.e., POI-photo links considered trustable but actually incorrect) and “false negative” outcomes (i.e., POI-photo links considered incorrect but actually trustable). To measure the game accuracy, we need to know the ground truth, thus we manually check the assessed links to identify the false positive/negative items. The accuracy is then calculated by dividing the number of correct assessments (true positive and true negative items) by the total number of input uncertain links.

Our preliminary evaluation is based on an early set of played games: we collected evidence from 54 unique players (not including the development team), who played 290 games for a total of 781 levels, in which they tested 2,006 uncertain links. Setting the thresholds on the confidence value to 70% and 20% for the upper and lower limits respectively, the game assessed the correctness/incorrectness of 1,284 uncertain links, leading to an improvement of the global completeness from 1.54% to 4.98% with a final accuracy of 99.4% (4 false positive and 8 false negative links).

On the other hand, it is clear that an important success factor for Games with a Purpose lies in the gaming feature: the more engaging and entertaining the game, the higher the number of participant players and thus the amount of collected data. For this reason, we believe that an important part of the UrbanMatch evaluation consists in assessing the “playability” of the game itself.

As suggested by several studies about traditional games [9, 23] and taking into consideration the peculiarities of UrbanMatch, we built an evaluation questionnaire that UrbanMatch players can find at http://bit.ly/um-survey. The questions are oriented at assessing the game characteristics as well as “measuring” how much the purpose is hidden and immersed within the gameplay.

At the time of writing, we had collected the feedback of 12 players whose opinion was quite positive: UrbanMatch was evaluated to be easy (91%) and clear (61%); most players “spread the word” by suggesting that their friends play (64%) and supported our hypothesis that the physical presence in the urban environment makes the gameplay easier (55%).
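The two metrics defined in this section reduce to simple ratios, sketched below. The function names and the choice to pass the denominator explicitly are assumptions of this sketch, not the authors' implementation. The example numbers reuse the error counts (4 false positives, 8 false negatives) and the number of tested links (2,006) from the preliminary evaluation purely as an illustration; which population the accuracy is divided by in the reported figure is an assumption here.

```python
def completeness(assessed: int, total_uncertain: int) -> float:
    """Fraction of input uncertain links that became trusted or incorrect."""
    if total_uncertain <= 0:
        raise ValueError("total_uncertain must be positive")
    return assessed / total_uncertain


def accuracy(correct: int, total: int) -> float:
    """Fraction of correct assessments (true positives plus true negatives)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Illustrative pairing of figures (an assumption): 12 wrong assessments
# counted against the 2,006 links tested by the players.
false_positives, false_negatives = 4, 8
tested = 2006
correct = tested - (false_positives + false_negatives)
print(f"{accuracy(correct, tested):.3f}")   # prints 0.994

# Made-up counts: 50 assessed links out of 200 uncertain ones.
print(f"{completeness(50, 200):.2f}")       # prints 0.25
```

Keeping the denominator as an explicit argument makes it easy to compare the ratio over all input uncertain links with the ratio over only the links actually assessed or tested.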
5. CONCLUSIONS

While Linked Data research is continuously evolving and improving, new ways to interlink information from different independent sources are being formulated and explored. In this paper, we presented our UrbanMatch application, a mobile and location-based Game with a Purpose, oriented to create high-quality links between existing datasets, namely OpenStreetMap, LinkedGeoData and Flickr. UrbanMatch is our first experiment to prove that mobile gaming applications can be successfully employed to consume, create and improve urban-related Linked Data; our early experience seems to confirm our research hypothesis, even if further evaluation is needed.

It is also worth noting that the data gathered via UrbanMatch have a clear business value: linked and ranked photos of places represent a valuable dataset which can be used to improve a number of services ranging from image search to geo-marketing. More generally, Games with a Purpose aimed at linking, collecting or correcting Linked Data related to urban spaces and Smart Cities constitute a potential business for a number of different stakeholders: local public authorities, local businesses (shops, transportation companies, tourism actors, etc.) and citizens.

Different datasets can be created or improved via Human Computation approaches for local services, trade, tourism, traffic optimization, environmental sustainability or simple information. This includes both urban-related Linked Data and non-linked data such as municipality datasets or other private datasets related to Smart Cities. We believe that applications like UrbanMatch show a clear business model to be exploited with special regard to location-aware Linked Data.

A final note on the “gaming” approach: data linkage and content creation or assessment tasks need an active and productive engagement of users, but not all crowdsourcing campaigns are successful and effective. Shaping the Human Computation missions as gaming activities (as in Games with a Purpose) is a potential solution to this issue. A further research question could be oriented to prove that a fun or entertaining flavour can accomplish more than an extrinsic reward: finding a good balance between the game rules and the purpose achievement is often a hard task.

Acknowledgements

This work was partially supported by Siemens Corporate Research and Technologies and by the PlanetData EU project (FP7-257641).

6. REFERENCES

[1] G. Álvaro and J. Álvaro. A Linked Data Movie Quiz: the answers are out there, and so are the questions. http://bit.ly/linkedmovies, Aug. 2010.
[2] S. Auer, J. Lehmann, and S. Hellmann. LinkedGeoData - Adding a Spatial Dimension to the Web of Data. In Proceedings of ISWC 2009, 2009.
[3] I. Celino, D. Dell’Aglio, E. Della Valle, R. Grothmann, F. Steinke, and V. Tresp. Integrating Machine Learning in a Semantic Web Platform for Traffic Forecasting and Routing. In Proceedings of IRMLES Workshop 2011, 2011.
[4] I. Celino, D. Dell’Aglio, E. Della Valle, Y. Huang, T. Lee, S. Park, and V. Tresp. Making Sense of Location-based Micro-posts Using Stream Reasoning. In Proceedings of #MSM Workshop 2011, 2011.
[5] K. T. Chan, I. King, and M.-C. Yuen. Mathematical modeling of social games. In Proceedings of IEEE CSE Conference 2009, volume 4, pages 1205–1210, 2009.
[6] C. Charron, J. Favier, and C. Li. Social Computing – How Networks Erode Institutional Power, And What to Do About It. Forrester Research, 2006.
[7] E. Della Valle, I. Celino, and D. Dell’Aglio. The Experience of Realizing a Semantic Web Urban Computing Application. Transactions in GIS, 14(2), 2010.
[8] E. Della Valle, I. Celino, D. Dell’Aglio, F. Steinke, R. Grothmann, and V. Tresp. Semantic Traffic-Aware Routing for the City of Milano using the LarKC Platform. IEEE Internet Computing, 15(6), 2011.
[9] H. Desurvire, M. Caplan, and J. A. Toth. Using heuristics to evaluate the playability of games. In Proceedings of CHI Conference 2004, 2004.
[10] A. Flemming and O. Hartig. Quality Criteria for Linked Data sources. Online at http://bit.ly/ld-quality.
[11] T. Fullerton. Game Design Workshop, A Playcentric Approach to Creating Innovative Games. Gama Network Series. Morgan Kaufmann, 2008.
[12] C. Fürber and M. Hepp. Using SPARQL and SPIN for Data Quality Management on the Semantic Web. In BIS, pages 35–46, 2010.
[13] C. Fürber and M. Hepp. Towards a Vocabulary for Data Quality Management in Semantic Web Architectures. In Proceedings of LWDM Workshop 2011, March 2011.
[14] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool, 2011.
[15] P. Jain, P. Hitzler, P. Z. Yeh, K. Verma, and A. P. Sheth. Linked data is merely more data. In AAAI Spring Symposium: Linked Data Meets AI, 2010.
[16] S. A. Knight and J. Burn. Developing a framework for assessing information quality on the World Wide Web. Informing Science, 8, 2005.
[17] T. Lee, S. Park, Z. Huang, and E. Della Valle. Toward Seoul Road Sign Management on the LarKC Platform. In Posters and Demos of ISWC 2010, 2010.
[18] F. Naumann and C. Rolker. Assessment Methods for Information Quality Criteria. In B. D. Klein and D. F. Rossin, editors, IQ. MIT, 2000.
[19] T. C. Redman. Data Quality: The Field Guide. Digital Press, 2001.
[20] E. Simperl, B. Norton, and D. Vrandecic. Crowdsourcing Tasks within Linked Data Management. In Proceedings of COLD Workshop 2011, 2011.
[21] K. Siorpaes and M. Hepp. Games with a Purpose for the Semantic Web. Intelligent Systems, 23:50–60, 2008.
[22] K. Siorpaes and E. Simperl. Human Intelligence in the Process of Semantic Content Creation. World Wide Web, 13(1-2), 2010.
[23] P. Sweetser and P. Wyeth. Gameflow: a model for evaluating player enjoyment in games. Computers in Entertainment, 3(3):3, 2005.
[24] L. von Ahn. Games with a Purpose. IEEE Computer, 39(6):92–94, 2006.
[25] L. von Ahn. Human computation. In G. Alonso, J. A. Blakeley, and A. L. P. Chen, editors, ICDE, pages 1–2. IEEE, 2008.