=Paper= {{Paper |id=None |storemode=property |title=To Trust, or not to Trust: Highlighting the Need for Data Provenance in Mobile Apps for Smart Cities |pdfUrl=https://ceur-ws.org/Vol-1075/07.pdf |volume=Vol-1075 |dblpUrl=https://dblp.org/rec/conf/immoa/EmaldiPLLVM13 }} ==To Trust, or not to Trust: Highlighting the Need for Data Provenance in Mobile Apps for Smart Cities== https://ceur-ws.org/Vol-1075/07.pdf
     To trust, or not to trust: Highlighting the need for data
          provenance in mobile apps for smart cities∗

                   Mikel Emaldi                            Oscar Peña                                Jon Lázaro
          Deusto Institute of Technology          Deusto Institute of Technology            Deusto Institute of Technology
                  - DeustoTech                            - DeustoTech                              - DeustoTech
            m.emaldi@deusto.es                     oscar.pena@deusto.es                         jlazaro@deusto.es
            Diego López-de-Ipiña                    Sacha Vanhecke                               Erik Mannens
          Deusto Institute of Technology            Ghent University - iMinds -               Ghent University - iMinds -
                  - DeustoTech                          Multimedia Lab                            Multimedia Lab
               dipina@deusto.es                  sacha.vanhecke@ugent.be                   erik.mannens@ugent.be

ABSTRACT
The popularity of smartphones makes them the most suitable devices to ensure access to services
provided by smart cities; furthermore, as one of the main features of smart cities is the
participation of citizens in their governance, it is not unusual for these citizens to generate
and share their own data through their smartphones. But how can we know if these data are
reliable? How can we identify whether a given user and, consequently, the data generated by
him/her can be trusted? In this paper, we present how the IES Cities platform integrates the
PROV Data Model and the related PROV-O ontology, allowing the exchange of provenance information
about user-generated data in the context of smart cities.

1. INTRODUCTION
   According to the “Apps for Smart Cities Manifesto”¹, smart city applications could be
sensible, connectable, accessible, ubiquitous, sociable, sharable and visible/augmented. It is
not a coincidence that all of these features can be found in a standard smartphone: the
popularity of these devices makes them the most suitable to ensure access to the services
provided by smart cities. As one of the main features of smart cities is the participation of
citizens in their governance, it is not unusual for these citizens to generate and share their
own data through their smartphones. Reviewing the literature, some examples of apps that deal
with user-generated data can be found, like Urbanopoly [4], Urbanmatch [5] or popular mobile
apps related to the 311 service in cities like Calgary, Minneapolis, Baltimore or San Diego, all
of them available in Google Play. The IES Cities project goes one step beyond, providing an
entire architecture to foster the development of urban apps based on Linked Open Data² provided
by government, through user-friendly JSON APIs. All of these works that manage user-generated
data share the same concern about these data: are they reliable? How can we know whether a given
user and, consequently, the data generated by him/her can be trusted? Recently, the W3C has
created the PROV Data Model [14] for provenance interchange on the Web. The PROV Data Model
describes the entities, activities and people involved in the creation of a piece of data,
allowing the consumer to evaluate the reliability of the data based on their provenance
information. Furthermore, PROV was deliberately kept extensible, allowing various extended
concepts and custom attributes to be used. For example, the Uncertainty Provenance (UP) [8] set
of attributes can be used to model the uncertainty of data, aggregated from heterogeneously
divided trusted and untrusted sources, or with varying confidence. In this paper, we present how
the IES Cities platform integrates the PROV Data Model and the related PROV-O ontology [13],
allowing the exchange of provenance information about user-generated data in the context of
smart cities. The final aim is to enrich the knowledge gathered about a city not only with data
provided by government or by networked sensors, but also with high quality and trustable data
coming from the citizens themselves.
   The remainder of this paper is organized as follows: in Section 2 the current state of the art
on apps that deal with user data in the context of smart cities is presented. Section 3 outlines
the main concepts of the IES Cities project. Sections 4 and 5 describe the semantic
representation of the provenance through a use case and the metrics to calculate the reliability
of the data, respectively. Finally, in Section 6 the conclusions and the future work are
presented.

∗ This research is funded by project CIP-ICT-PSP-2012-6 “IES Cities: Internet Enabled Services
  for the Cities across Europe”, under “The Information and Communication Technologies Policy
  Support Programme”. More info at
  http://ec.europa.eu/information_society/apps/projects/factsheet/index.cfm?project_ref=325097
¹ http://www.appsforsmartcities.com/?q=manifesto
² http://linkeddata.org/

2. RELATED WORK
   The following works can be highlighted regarding smart cities’ mobile applications.
Urbanopoly [4] presents an app


         Proceedings IMMoA’13                                       681          http://www.dbis.rwth-aachen.de/IMMoA2013/
for smartphones which combines Human Computation, gamification and Linked Open Data to verify,
correct and gather data about tourism venues. To achieve this, Urbanopoly offers different games
to the users, like quizzes, photo taking contests, etc. Similar to Urbanopoly is Urbanmatch [5],
a game in which the user takes photos of some tourism venues, which are then published as Linked
Open Data by the system. Another work that uses Human Computation, in this case for
movie-related data curation, is Linked Movie Quiz³. In [3], the authors present csxPOI, an
application that allows its users to collaboratively create, share, and modify semantically
annotated POIs. These semantic POIs are modelled through a set of ontologies developed to fulfil
this specific task, and published following the Linked Open Data principles. csxPOI allows users
to create custom ontology classes, modelling new POI categories, and to establish subclass,
superclass or equality relationships among them. In addition to creating new classes, users can
link these categories to concepts extracted from DBpedia⁴. In order to detect duplicate POIs,
csxPOI clusters the available POIs with the aim of finding similarities among them.
   As can be seen, the authors that work with user-generated Linked Open Data have to deal with
duplication, misclassification, mismatching and data enrichment issues; and, as previously
described, the end-user has arisen as the most important agent in smart cities’ environments. In
the next sections we explain how the IES Cities project uses the Provenance Data Model to
represent provenance information about user-generated data.

³ http://lamboratory.com/hacks/ldmq/
⁴ http://dbpedia.org

3. IES CITIES
   ‘IES Cities’⁵ is the latest iteration in a chain of inter-related projects promoting
user-centric and user-provided mobile services that exploit both Open Data and user-supplied
data in order to develop innovative services.
   The project encourages the re-use of already deployed sensor networks in European cities and
the existing Open Government related datasets. It envisages smartphones as both a sensor-rich
device and a browser with increasing computational capabilities which is carried by almost every
citizen.
   IES Cities’ main contribution is to design and implement an open technological platform to
encourage the development of Linked Open Data based services, which will be later consumed by
mobile applications. This platform will be deployed in 4 different European cities: Zaragoza and
Majadahonda (Spain), Bristol (United Kingdom), and Rovereto (Italy), giving citizens the
opportunity to get the most out of their city’s data.
   Remarkably, IES Cities wants to analyse the impact that citizens may have on improving,
extending and enriching the data these services will be based upon, as they will become leading
actors of the new open data environment within the city. Nonetheless, the quality of the
provided data may vary significantly from one citizen to another, not to mention the possibility
of someone being interested in populating the system with fake data.
   Thus, the need for evaluating the value and trust of the user contributed data requires the
inclusion of a validation module [12]. In other words, we should be able to express special
meta-information about the data submitted by IES Cities’ users. The idea that a single way of
representing and collecting provenance could be internally adopted by all systems does not seem
to be realistic today, so current approaches model their provenance information in a core data
model, and applications that need to make sense of provenance information can then import it,
process it, and reason over it [6].
   In addition, when dealing with user-provided data, measures for data consolidation have to be
considered. Contributions from one user have to be cross-validated with contributions from other
users in order to avoid information duplication and foster validation of others’ data. Thus,
data contributions from different users presenting spatial, linguistic and semantic similarity
should be clustered [2]. Before a user contributes new data, other users’ contributions at
nearby locations should be shown, to avoid recreating already existing data and to encourage
additions and enhancements to the existing data. After contributing new data, the contributing
user should be presented with earlier submitted contributions that are similar in both contents
and location, in order to confirm whether the new contribution is actually new or amends an
existing one. In essence, aids before and after editing new entries have to be provided and a
two-phase commit process for user provided data should be put in place to ensure that contents
of the highest quality are always added. Future work in IES Cities will tackle these issues by
providing REST interfaces to invoke services for clustering data entries and to retrieve related
entries associated to a given one.

⁵ http://iescities.eu

4. SEMANTIC REPRESENTATION OF PROVENANCE
   To illustrate the semantic representation of trust and provenance data through the Provenance
Ontology, a use case is presented: 311 Bilbao. This app uses Linked Open Data to get an overview
of reports addressing faults in public infrastructures. From the data owner’s point of view, the
enrichment of datasets carried out by third parties (such as users of the 311 Bilbao app)
revealed two problems: 1) data does not need to be approved before being published and there is
no mechanism to control the amount of data a citizen can add, and 2) there is still the need for
a way to differentiate the default trustworthiness of the different authors, such as citizens
and city council staff. The following code represents the provenance of a user-generated
report⁶:

    1  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    2  @prefix prov: <http://www.w3.org/ns/prov#> .
    3  @prefix iesc: <http://studwww.ugent.be/~satvheck/IES/
    4  schemas/iescities.owl> .
    5  @prefix up: <http://users.ugent.be/~tdenies/up/> .
    6  @prefix : <http://bilbao.iescities.org#> .
    7
    8  entity(:report_23456, [prov:value="The paper bin is
    9  broken"])
   10  wasGeneratedBy(:report_23456, :reportActivity_23456)
   11  wasAttributedTo(:report_23456, :jdoe)
   12  wasInvalidatedBy(:report_23456, :invActivity_639,
   13  2013-07-22T03:05:03)
   14
   15  activity(:reportActivity_23456, 2013-07-22T01:01:01,
   16  2013-07-22T01:05:03)
   17  wasAssociatedWith(:reportActivity_23456, :jdoe)
   18
   19  agent(:jdoe, [prov:type='prov:Person', foaf:name=
   20  "John Doe", foaf:mbox='<mailto:jdoe@example.org>'])
   21
   22  entity(:report_23457, [prov:value="It is incorrect,
   23  another paper bin has replaced the old one, but 2
   24  meters beyond"])
   25  wasAttributedTo(:report_23457, :jane)
   26  wasDerivedFrom(:report_23457, :report_23456,
   27  :invActivity_639, -, -, [prov:type='prov:Revision'])
   28
   29  activity(:invActivity_639, 2013-07-22T02:58:01,
   30  2013-07-22T03:04:47)
   31  wasAssociatedWith(:invActivity_639, :jane)
   32
   33  agent(:jane, [prov:type='prov:Person', foaf:name=
   34  "Jane", foaf:mbox='<mailto:jane@bilbao.iescities.org>'
   35  ])
   36  actedOnBehalfOf(:jane, :bilbao_city_council)
   37
   38  agent(:bilbao_city_council, [prov:type=
   39  'prov:Organization', foaf:name="Bilbao City Council"
   40  ])

   In this piece of semantic information, the :report_23456 resource represents the report made
by the user. This report is identified by its own unique URL and provides information about the
user that made it and the activity that generated it (lines 8-13). The :reportActivity_23456
resource shows details about the activity that generated the report, like when the user started
reporting the issue and when it ended. At line 19 the information about “John Doe”, the user
that reported the fault, can be seen. In the example given, another user, Jane (lines 33-36),
has revised the report made by John (lines 22-31). As actedOnBehalfOf asserts, Jane is some kind
of municipal worker of Bilbao City Council (line 38). As Jane’s report has more authority than
John’s report, John’s report is invalidated, as wasInvalidatedBy asserts. Thanks to these
semantic descriptions of the provenance of the reports made in the 311 Bilbao app, the data
generated by a specific user can be retrieved through SPARQL [15] queries.

⁶ The provenance data is represented using Provenance Notation (PROV-N). More information at
  http://www.w3.org/TR/prov-n/

5. PROVENANCE BASED RELIABILITY
   There exist some approaches on how to calculate trust in the semantic web using provenance
information. IWTrust [16] uses provenance in the trust component of an answering engine, in
which a trust value for answers is measured based on the trust in sources and in users. In [10]
provenance data is used to evaluate the reliability of users based on trust relationships within
a social network. [11] presents an assessment method for evaluating the quality of data on the
Web using provenance graphs, and provides a way to calculate trust values based on timeliness.
In [7] the authors propose generic procedures for computing reputation and trust assessments
based on provenance information.
   In [9] the authors identify 19 parameters that affect how users determine trust in content
provided by web information sources, such as the authority of the creator of the information or
the popularity and recency of that information, among others. Based on these factors, we have
built a generic model for the measurement of a trust value in the context of IES Cities, in
which the trust according to each factor is calculated independently:

   $$\mathrm{trust}(report) = \frac{\sum_{p=[auth,\, agree,\, \ldots]}^{n} \alpha_p \cdot \mathrm{trust}_p(report)}{n} \tag{1}$$

where p is the measured property and n is the total number of measured properties. α_p is a
value between 0 and 1 that denotes the relevance of this property, making the measure based on a
certain property more or less relevant. trust_p is a function that returns a value between 0 and
1 determining the trust of a given report according to a certain property.
   Both the α_p values and the trust_p functions can be defined by the developers using the IES
Cities platform, because both of them are dependent on the context and the needs of the
application domain.
   To clarify, we are using this model in the 311 Bilbao use case. To that end, we have selected
the most relevant trust properties concerning our use case:
   Authority: It refers to the fact that if a resource is created by an authority in a given
context, this information is more reliable. For our use case a basic function like the following
can be used:

   $$\mathrm{trust}_{authority} = \begin{cases} 0 & \text{if } user \neq authority \\ 1 & \text{if } user = authority \end{cases} \tag{2}$$

in which being an authority can be checked with a SPARQL ASK query:

      PREFIX prov: <http://www.w3.org/ns/prov#>
      PREFIX : <http://bilbao.iescities.org#>
      ASK { :jane prov:actedOnBehalfOf :bilbao_city_council }

   Popularity: The number of references and uses of a piece of information is a key aspect to
determine its trust. In the case of 311 Bilbao we measure the popularity of a report based on
the number of visits that the report receives, with the following formula:

   $$\mathrm{trust}_{popularity} = \frac{visits_{report}}{visits_{open\ reports}} \tag{3}$$

in which the number of visits of the report is normalized by the overall number of visits to the
reports that are open at the moment.
   Recommendation: Recommendation refers to the importance that the ratings given by other users
to a resource have for its trust. The function to measure the relevance of user ratings can be
as sophisticated as the developer wants, but for our case we have selected a very naive and
simple one, in which other users can vote on the reports with +1 / -1 buttons and the trust
value is calculated with this formula:

   $$\mathrm{trust}_{recommendation} = \frac{positive\ votes_{report}}{total\ votes_{report}} \tag{4}$$

   Provenance / Reputation: In this case, provenance refers to the trust that the entities
responsible for generating a piece of information may transfer to the information itself. A key
aspect to measure the trust in a publisher is its reputation. There exist many approaches to
measure the reputation of a user; some of them measure the reputation based on trust
relationships between users [10], while some others like [7] are based on the historical
evidence of each user. For our use case, we propose using the three-step procedure presented in
[7]. In the ‘evidence selection’ step every report made by a given user is retrieved, in the
‘evidence weighting’ step the recommendation trust function is executed for every report, and in
the last step all these trust values are aggregated through subjective logic to get the
trustworthiness of a given user.
   Recency / Timeliness: Timeliness can be defined as the up-to-date degree of a data item in
relation with the

task at hand. We propose an adaptation of the formula in [11] to measure timeliness, based on the
work described in [1]:

   $$\mathrm{trust}_{timeliness} = \left(\max\left(1 - \frac{currency}{volatility},\ 0\right)\right)^{sensitivity} \tag{5}$$

where currency is the difference between the time the data is presented to the user and the time
it was reported to the system. Volatility refers to the maximum amount of time a given report
should remain active (for example, if a broken street lamp is reported, it should be repaired
within a month at most), and sensitivity may change its value by observing the updates made to
the status of the report: it would adopt a high value for data being constantly updated, and a
low value for data that does not change often.
   Other trust factors: Apart from the aspects identified in [9], the model is flexible enough
to include other factors affecting the trust. In the case of the 311 Bilbao mobile app, the
geographical distance could be a key aspect of the trust, as reports about events happening near
the place from which the user sends the report would be more reliable.

   $$\mathrm{trust}_{distance} = \frac{1}{geodistance(loc_{report},\ loc_{reportedplace})} \tag{6}$$

The function that computes the geographical distance takes as input the geographic coordinates
of the report, retrieved from the smartphone GPS sensor, and the geographic coordinates of the
reported place, obtained with geolocation services like Nominatim⁷.
   After applying our model we will get a trust value between 0 and 1, which could be inserted in
the provenance graph with a triple like :report_23456 up:contentConfidence ‘0.6’ [8], assuming
the confidence level was ‘0.6’.

⁷ http://wiki.openstreetmap.org/wiki/Nominatim

6. CONCLUSIONS AND FUTURE WORK
   The approach proposed in this paper will allow the provenance of user-submitted data in the
IES Cities platform to be evaluated. The proposed metrics will measure the trustworthiness level
of the data, providing an extra confidence layer in the project’s framework. City council staff
and platform administrators will be able to query data quality through SPARQL queries,
retrieving only those results with a confidence level above a parameterised threshold.
   The evaluation and validation of the proposed metrics against other implementations following
the PROV-O ontology is left for a future iteration of IES Cities, which may also aggregate other
significant metrics should they improve the provenance of the generated data.

7. REFERENCES
 [1] D. Ballou, R. Wang, H. Pazer, and G. K. Tayi. Modeling information manufacturing systems to
     determine information product quality. pages 462–484, 1998.
 [2] M. Braun, A. Scherp, and S. Staab. Collaborative semantic points of interests. In The
     Semantic Web: Research and Applications, pages 365–369. Springer, 2010.
 [3] M. Braun, A. Scherp, S. Staab, et al. Collaborative creation of semantic points of interest
     as linked data on the mobile phone. 2007.
 [4] I. Celino, D. Cerizza, S. Contessa, M. Corubolo, D. Dell’Aglio, E. D. Valle, and S. Fumeo.
     Urbanopoly – a social and location-based game with a purpose to crowdsource your urban
     data. In Privacy, Security, Risk and Trust, pages 910–913, Amsterdam, 2012.
 [5] I. Celino, S. Contessa, M. Corubolo, D. Dell’Aglio, E. D. Valle, S. Fumeo, and T. Krüger.
     UrbanMatch - linking and improving smart cities data. In C. Bizer, T. Heath,
     T. Berners-Lee, and M. Hausenblas, editors, Linked Data on the Web, volume 937 of CEUR
     Workshop Proceedings. CEUR-WS, 2012.
 [6] D. Ceolin, P. T. Groth, W. R. van Hage, A. Nottamkandath, and W. Fokkink. Trust evaluation
     through user reputation and provenance analysis. In Uncertainty Reasoning for the Semantic
     Web, volume 900, pages 15–26. CEUR-WS, 2012.
 [7] D. Ceolin, P. T. Groth, W. R. van Hage, A. Nottamkandath, and W. Fokkink. Trust evaluation
     through user reputation and provenance analysis. In F. Bobillo, R. N. Carvalho,
     P. C. G. da Costa, C. d’Amato, N. Fanizzi, K. B. Laskey, K. J. Laskey, T. Lukasiewicz,
     T. Martin, M. Nickles, and M. Pool, editors, URSW, volume 900 of CEUR Workshop Proceedings,
     pages 15–26. CEUR-WS.org, 2012.
 [8] T. De Nies, S. Coppens, E. Mannens, and R. Van de Walle. Modeling uncertain provenance and
     provenance of uncertainty in W3C PROV. In International World Wide Web Conference, pages
     167–168, Rio de Janeiro, Brazil, 2013.
 [9] Y. Gil and D. Artz. Towards content trust of web resources. Web Semantics: Science,
     Services and Agents on the World Wide Web, 5(4):227–239, 2007.
[10] J. Golbeck. Combining provenance with trust in social networks for semantic web content
     filtering. In Provenance and Annotation of Data, pages 101–108. Springer, 2006.
[11] O. Hartig and J. Zhao. Using web data provenance for quality assessment. In Proc. of the
     Workshop on Semantic Web and Provenance Management at ISWC, 2009.
[12] O. Hartig and J. Zhao. Publishing and consuming provenance metadata on the web of linked
     data. In D. L. McGuinness, J. R. Michaelis, and L. Moreau, editors, Provenance and
     Annotation of Data and Processes, number 6378 in Lecture Notes in Computer Science, pages
     78–90. Springer Berlin Heidelberg, Jan. 2010.
[13] T. Lebo, S. Sahoo, D. McGuinness, K. Belhajjame, J. Cheney, D. Corsar, D. Garijo,
     S. Soiland-Reyes, S. Zednik, and J. Zhao. PROV-O: The PROV ontology. W3C Recommendation,
     http://www.w3.org/TR/prov-o/ (accessed 30 Apr 2013), 2013.
[14] L. Moreau, P. Missier, K. Belhajjame, R. B’Far, J. Cheney, S. Coppens, S. Cresswell,
     Y. Gil, P. Groth, G. Klyne, et al. PROV-DM: The PROV data model. Candidate Recommendation,
     2012.
[15] E. Prud’hommeaux and A. Seaborne. SPARQL query language for RDF, 2008.
[16] I. Zaihrayeu, P. P. Da Silva, and D. L. McGuinness. IWTrust: Improving user trust in
     answers from the web. In Trust Management, pages 384–392. Springer, 2005.
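
As an illustration of the trust model of Section 5, the following Python sketch combines the per-property trust functions (authority, popularity, recommendation, timeliness and distance) into the weighted average of Eq. 1. It is only a minimal sketch under stated assumptions, not the IES Cities implementation: the Report fields, the function names, the use of kilometres and of the haversine formula for geodistance, and the clamping of the distance measure to [0, 1] are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Callable, Dict, Tuple


@dataclass
class Report:
    """Hypothetical container for the inputs of the trust functions."""
    author_is_authority: bool         # e.g. the result of the SPARQL ASK query
    visits: int                       # visits received by this report
    visits_open_reports: int          # visits over all currently open reports
    positive_votes: int
    total_votes: int
    currency: float                   # time since the report was submitted
    volatility: float                 # maximum time it should stay active (same unit)
    sensitivity: float                # exponent of the timeliness measure
    loc_report: Tuple[float, float]   # (lat, lon) where the report was sent from
    loc_place: Tuple[float, float]    # (lat, lon) of the reported place


def trust_authority(r: Report) -> float:
    return 1.0 if r.author_is_authority else 0.0


def trust_popularity(r: Report) -> float:
    # Returns 0 when there are no visits at all (assumption: avoids division by zero).
    return r.visits / r.visits_open_reports if r.visits_open_reports else 0.0


def trust_recommendation(r: Report) -> float:
    # Returns 0 when nobody has voted yet (assumption).
    return r.positive_votes / r.total_votes if r.total_votes else 0.0


def trust_timeliness(r: Report) -> float:
    return max(1.0 - r.currency / r.volatility, 0.0) ** r.sensitivity


def geodistance_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km (haversine; the choice of formula is an assumption)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def trust_distance(r: Report) -> float:
    # 1 / distance is unbounded below 1 km and undefined at 0, so it is
    # clamped to 1 here to stay in [0, 1] (assumption, not in the paper).
    d = geodistance_km(r.loc_report, r.loc_place)
    return 1.0 if d <= 1.0 else 1.0 / d


FACTORS: Dict[str, Callable[[Report], float]] = {
    "authority": trust_authority,
    "popularity": trust_popularity,
    "recommendation": trust_recommendation,
    "timeliness": trust_timeliness,
    "distance": trust_distance,
}


def trust(report: Report, weights: Dict[str, float]) -> float:
    """Eq. 1: sum of alpha_p * trust_p(report) over the measured properties, divided by n."""
    return sum(alpha * FACTORS[p](report) for p, alpha in weights.items()) / len(weights)


if __name__ == "__main__":
    example = Report(True, 10, 100, 3, 4, 15.0, 30.0, 1.0,
                     (43.263, -2.935), (43.263, -2.935))
    print(round(trust(example, {p: 1.0 for p in FACTORS}), 2))  # prints 0.67
```

Passing only a subset of properties in the weights dictionary restricts the average to those properties, which mirrors the statement that both the weights and the trust functions are left to the developer of each application.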

