=Paper= {{Paper |id=Vol-417/paper-6 |storemode=property |title=select * where { :I :trust :you } |pdfUrl=https://ceur-ws.org/Vol-417/paper6.pdf |volume=Vol-417 |dblpUrl=https://dblp.org/rec/conf/samt/HalbH08 }} ==select * where { :I :trust :you }== https://ceur-ws.org/Vol-417/paper6.pdf
        select * where { :I :trust :you }
                 How to Trust Interlinked Multimedia Data

                         Wolfgang Halb and Michael Hausenblas

                Institute of Information Systems & Information Management,
               JOANNEUM RESEARCH, Steyrergasse 17, 8010 Graz, Austria
                         firstname.lastname@joanneum.at




       Abstract. Finding, accessing and consuming multimedia content on the Web is
       still a challenge. In this position statement we discuss three still widely neglected
       issues arising when one is interacting with multimedia content in social media
       environments: provenance, trust, and privacy. We will introduce a generic model
       allowing us to identify potential risks and problems, further discuss this model
       regarding multimedia content and finally outline how Semantic Web technologies
       can help.



1   Motivation

It is a trivial truth that, in order to use any kind of content or service on the Web, one
must know how to access it (that is, know its URI). What is true for the Web of
Documents is equally true for the Web of Data. With the rise of linked data [1, 2] the
situation has changed, and publishing and consuming data is now straightforward, but
there are still a number of issues in the discovery process. For example, with
our multimedia interlinking demonstrator CaMiCatzee [3] we have identified issues
around trust in, and belief of, information regarding linked data in general and
multimedia content in particular. CaMiCatzee allows FOAF-profile-based search for person
depictions on Flickr. However, when looking at Fig. 1, how can we find out whether one of the
depicted persons actually is Sir Tim Berners-Lee?




                          Fig. 1. Exemplary result from CaMiCatzee.




     Motivated by this observation we will address, from a practical point of view,
 widely neglected issues arising when one is interacting with multimedia content on
 social platforms:

     – Provenance. Where does content stem from? Who provided annotations?
     – Trust. Is a person who provided an annotation trustworthy? Is the interlinking
       legitimate?
     – Privacy. What are the consequences of interacting with content?

 In the following, we will first introduce a generic model addressing the three issues
 listed above. Based on our experience with interlinked multimedia data we will then take a
 detailed look at the consequences of applying this model to multimedia content.


 2     The Abstract Provenance-Trust-Privacy Model

 In order to identify issues with the usage of content on the Web, we have developed the
 “Provenance-Trust-Privacy” (PTP) model (Fig. 2). The model covers two aspects:
 real life and the online world, the Web. Within the PTP model we deal with
 three orthogonal but nevertheless interdependent dimensions: (i) the social dimension, (ii)
 the interaction dimension, and (iii) the content dimension.




                                Fig. 2. The abstract PTP model.






 The Social Dimension. The dotted arrows in Fig. 2 represent social relations between
 humans, either in real life or online. While it is straightforward to deal with the
 case in which people know each other in real life (and maybe continue this relation in the
 online world), the other way round can cause substantial problems. For example, the fact
 that I have added someone to my “buddy list” on a social platform such as LinkedIn
 does not mean that I really know the person, or that this person also knows (or wants to
 know) me.

 The Interaction Dimension. In the PTP model we understand the interaction dimension
 as taking place online only. Referring again to Fig. 2, it covers everything that
 happens between the user-agent (instructed by a human) and other participants
 (servers, etc.) on the Web. In general, two interaction patterns for the discovery of and
 access to resources on the Web can be observed:

   – Direct access to the content. In this case the URI of the content is known in
     advance by the end-user, who instructs her user-agent to access the content. The URI
     may originally stem from a newspaper advertisement, a friend may have pointed it
     out in an e-mail, etc.
   – Indirect access to the content by consulting an intermediary such as
     a search engine, a recommendation system, etc.

 Based on these two interaction patterns, four possible paths can be identified:

   1. The user-agent, equipped with the end-user’s profile and her request, consults an
      intermediary. For example, a user enters a search string into Google and is presented
      with a list of URIs. The end-user happens to select a trustworthy source.
   2. The user-agent, equipped with a URI from the end-user, accesses a trustworthy
      source.
   3. The user-agent, equipped with the end-user’s profile and her request, consults an
      intermediary. This time, the end-user happens to select a troublesome source.
   4. The user-agent, equipped with a URI from the end-user, accesses a troublesome
      source.
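
 The four paths are the cross product of the two interaction patterns and the two
 possible outcomes for the source. A minimal sketch of that enumeration follows; the
 names Access, Source, enumerate_paths and is_desirable are illustrative and not part
 of the PTP model itself.

```python
from enum import Enum
from itertools import product


class Access(Enum):
    INDIRECT = "via intermediary"  # search engine, recommendation system, ...
    DIRECT = "known URI"           # URI known to the end-user in advance


class Source(Enum):
    TRUSTWORTHY = "trustworthy"
    TROUBLESOME = "troublesome"


def enumerate_paths():
    """Return (path number, access, source) tuples in the order used in the text."""
    # Iterating over the source outcome first and the access pattern second
    # reproduces the numbering 1-4 of the list above.
    return [
        (number, access, source)
        for number, (source, access) in enumerate(product(Source, Access), start=1)
    ]


def is_desirable(source):
    """Paths that end at a trustworthy source (1 and 2) are the desirable ones."""
    return source is Source.TRUSTWORTHY


for number, access, source in enumerate_paths():
    print(number, access.value, source.value, is_desirable(source))
```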

     Obviously, the first two situations are desirable. The end-user has found (for example
 based on previous experience or trust in the search results) some content that
 she can use and that does not cause her trouble (a virus, Trojan horse, etc.). We
 equally aim to avoid the latter two cases, in which the end-user finds
 herself using content and/or services that are harmful and/or violate her privacy.

 The Content Dimension. Regarding the content dimension one can generally state that
 the more is known about the content, the easier it is to assess its usefulness and its
 potential to cause damage. Wherever possible, we aim for self-descriptive
 resources, that is, we require a minimum level of metadata to be available. In our case,
 we focus on multimedia content. In the next section we will therefore first discuss
 this kind of content, along with its metadata, in greater detail.




 3      Multimedia Content in the Provenance-Trust-Privacy Model
 In this position paper we focus on multimedia content. In the following we discuss
 characteristics of spatio-temporal multimedia content in the context of the
 emerging interlinking multimedia effort¹. Further, we apply the PTP model introduced
 above to multimedia content and try to derive requirements for it.

 Characteristics of Multimedia Content and Multimedia Metadata. Multimedia content
 has some specific characteristics that allow and/or require special treatment. We have
 reported on this in great detail elsewhere [4]. A basic observation, however, with impact
 on many parts of the interaction process is that with multimedia content we are almost
 always dealing with spatio-temporal dimensions.
     From the prosumer's point of view, multimedia content is cheap to produce and
 available in high volumes (mobile phones, etc.). Further, most current content of this
 kind is publicly and freely available (Flickr, YouTube). Business models remain
 vague. On the other hand, fees for professionally created content in very specific domains,
 such as broadcasters' archives, adult content, etc., are considerable.
     Multimedia content is in general well suited to consumption in mobile environments (as
 opposed to reading lengthy text on a mobile device).
     In general it is hard and expensive to create good and detailed multimedia content
 descriptions (for example in MPEG-7). This makes fine-grained search and automated
 summaries difficult.
     In Table 1 the characteristics discussed above are summarised and weighted with regard
 to the content itself on the one hand and the metadata on the other hand.


       Issue                       Content  Metadata  Remark
       production (prosumers)      ++       -         easy to produce high-volume content (e.g., mobile phone)
       production (professionals)  ++       -         esp. high-level semantic content descriptions expensive
       consumption                 ++       -         easy to consume (also in mobile environments)
       search                      -        --        practically, only global descriptions are available
       summaries                   --       --        little automation possible

                   Table 1. Overview of Multimedia Content Characteristics.




 Applying the PTP Model to Multimedia Content. With the above characteristics
 of multimedia content in mind, we claim the following requirements for applying the
 PTP model:
     – Any solution addressing PTP issues must at least avoid accessing “bad” content
       and should support the discovery and consumption of “good” content.
   ¹ http://www.interlinkingmultimedia.info/




     – Existing and deployed multimedia metadata formats (such as Exif, ID3, etc.) have
       to be taken into account.
     – The solution needs to scale to the size of the Web.
     – It must be practically relevant, that is, available in widely used platforms such
       as Drupal, MediaWiki, etc. (for example as a plug-in; in any case it needs to be
       integrated into the platform).
     – Providers must be able to offer and administer it easily (e.g., “enable” it with little
       configuration effort).
     – Consumers must be able to use it in a non-disruptive way, for example as a part of
       their everyday tools.

 In the next section we will report on how Semantic Web technologies can be used in
 combination with other, deployed technologies (such as those for identification and
 authentication) in order to address the requirements listed above.


 4      How Semantic Web Technology Can Help

 We strongly believe that Semantic Web technologies can help to realise a PTP model for
 multimedia content. A considerable body of research is already available², but with little
 practical impact so far. Apart from avoiding unreliable or even malicious content, the main
 aim of applying PTP to multimedia content is to help the user find trustworthy
 information sources. As a first step, we consider all content created by a trusted person
 or authority to be trustworthy. This implies that techniques are needed that can
 establish a content item’s provenance and the content producer’s identity, respectively.
 Consequently, means have to be made available for deciding which persons to trust.
      In the case of multimedia content it also has to be taken into account that the
 information associated with a single content item can potentially have a multitude of
 contributors. A photo along with some metadata (title, description) on Flickr, for instance,
 might be uploaded by the fictitious trusted user “T. Rustworthy”. Another user, “B. Adguy”,
 could add a fake note about who is depicted in the photo. When accessing the photo and
 the associated metadata it is thus not sufficient to consider only who contributed the
 image; we also need to be able to figure out who contributed the metadata about
 it. Taking this further to video content, it might also be relevant to take into account who
 contributed which parts of a video (consider, for example, advertisements inserted into
 a video stream).
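
      The Flickr example can be sketched as follows, assuming a deliberately simplified
 data model in which every statement about a content item records who contributed it.
 The Statement record, the split_by_trust helper, the photo identifier, the title and
 description values and the set of trusted users are all hypothetical and serve only to
 illustrate per-contributor filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str      # the content item, e.g. a photo identifier
    prop: str         # e.g. "title", "depicts"
    value: str
    contributor: str  # who asserted this statement


def split_by_trust(statements, trusted):
    """Partition statements into (accepted, rejected) by their contributor."""
    accepted = [s for s in statements if s.contributor in trusted]
    rejected = [s for s in statements if s.contributor not in trusted]
    return accepted, rejected


photo = "photo-1"  # hypothetical identifier
statements = [
    # Uploaded together with the photo by the trusted user.
    Statement(photo, "title", "Conference keynote", "T. Rustworthy"),
    Statement(photo, "description", "Opening session", "T. Rustworthy"),
    # A note added later by a different, untrusted user.
    Statement(photo, "depicts", "Sir Tim Berners-Lee", "B. Adguy"),
]

accepted, rejected = split_by_trust(statements, trusted={"T. Rustworthy"})
print([s.prop for s in accepted])
print([s.prop for s in rejected])
```

 The point of the sketch is that trust is decided per statement rather than per content
 item: the photo itself is kept, while the unverified note about who is depicted is set
 aside.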
      In the following we will discuss already available technologies that may be able
 to address the PTP issues along the three identified dimensions. However, to date only
 isolated solutions exist, and there is still a lack of systems that incorporate all
 available technologies for the user’s benefit. We envision a framework that would allow
 combining the technologies listed below and developing plug-ins for widely used platforms
 (Flickr, YouTube, etc.) and systems (Drupal, etc.).
   ² http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/




 Social Dimension. Several technologies are available for identification as well as for
 authentication. A user can, for example, provide her OpenID³ to identify herself to a
 service. Further, OAuth⁴ can be used for publishing and interacting with protected data.
 Big players such as Google already offer support for these
 technologies⁵. It seems advisable to build on this and to consider what might be missing
 to align it with the Web of Data, which is based on RDF [5].
     While FOAF-based whitelisting and other related approaches [6, 7] are available,
 there is still a need for an agreed-upon way to deploy them in widely used systems. The
 same issue can be observed with privacy: there are proposals on the table (for example
 P3P⁶), but there is no measurable uptake in terms of users, systems that offer it,
 etc.
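
     One possible reading of FOAF-based whitelisting is to accept contributors who are
 reachable from the user through a bounded number of foaf:knows links. The sketch below
 makes that assumption explicit; it is not the method of [6] or [7], and the person
 names, the whitelist function and its max_hops parameter are hypothetical.

```python
from collections import deque


def whitelist(knows, me, max_hops=2):
    """Return everyone reachable from `me` within `max_hops` foaf:knows links."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    seen.discard(me)  # the whitelist contains other people only
    return seen


# Directed foaf:knows edges: knowing someone does not imply being known back.
knows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
}

print(sorted(whitelist(knows, "alice", max_hops=2)))
```

 Keeping the edges directed reflects the remark on the social dimension of the PTP
 model: adding someone to a buddy list says nothing about the reverse relation.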

 Interaction Dimension. For data provenance in particular, it seems to us that named
 graphs [8] offer a solid and scalable solution. With the rise of RDFa⁷ one can think of new
 provenance mechanisms, as the hosting document can actually be understood as the “name”
 of the graph. Imagine Flickr (which already offers licensing information in RDFa)
 including provenance information on both the content and the metadata, based on
 vocabularies such as the “Semantic Web Publishing Vocabulary” [9, Chapter 6]. Finally, we
 note that, regarding the discovery and usage of linked (multimedia) data, we are
 currently working on VoiD, the “Vocabulary of Interlinked Datasets”⁸; again, provenance
 and trust issues are in scope here.
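
 To make the named-graph idea concrete, the following sketch stores statements as quads
 whose fourth element names the graph, here taken to be the URI of the hosting document.
 It is a toy in-memory model rather than an RDF store; the example.org URIs, the
 prefixed names and the match helper are hypothetical. The first query loosely mirrors
 the pattern in the title of this paper.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    graph: str  # name of the graph, i.e. where the statement was published


def match(quads, s=None, p=None, o=None, graph=None):
    """Return the quads matching the pattern; None acts as a wildcard."""
    pattern = (s, p, o, graph)
    return [
        q for q in quads
        if all(want is None or want == have for want, have in zip(pattern, q))
    ]


quads = [
    Quad(":photo", ":depicts", ":timbl", "http://example.org/notes/adguy"),
    Quad(":photo", ":title", "Keynote", "http://example.org/photos/rustworthy"),
    Quad(":I", ":trust", "http://example.org/photos/rustworthy", "http://example.org/me"),
]

# Which graphs do I trust? Roughly: select * where { :I :trust ?g }
trusted = {q.o for q in match(quads, s=":I", p=":trust")}

# Keep only statements about the photo that were published in a trusted graph.
accepted = [q for q in match(quads, s=":photo") if q.graph in trusted]
print(accepted)
```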

 Content Dimension. In our understanding, the content dimension of the PTP model
 requires the most attention. Basic mechanisms proposed to represent the type of
 multimedia content in RDF⁹ are available. Still, practical ways of creating and consuming
 rich multimedia content descriptions are missing. Recently, we have proposed ramm.x [10],
 which allows existing multimedia metadata formats such as MPEG-7, Exif, ID3,
 etc. to be used in the Web of Data. However, we expect that a fair amount of further
 research is required to address provenance and trust issues properly and to make tools
 and applications available in a real-world setup.


 References
   1. C. Bizer, T. Heath, D. Ayers, and Y. Raimond. Interlinking Open Data on the Web (Poster).
      In 4th European Semantic Web Conference (ESWC2007), pages 802–815, 2007.
   2. C. Bizer, R. Cyganiak, and T. Heath. How to Publish Linked Data on the Web.
      http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/, 2007.
   ³ http://openid.net/
   ⁴ http://oauth.net/
   ⁵ http://googledataapis.blogspot.com/2008/06/oauth-for-google-data-apis.html
   ⁶ http://www.w3.org/P3P/
   ⁷ http://www.w3.org/TR/xhtml-rdfa-primer/
   ⁸ http://community.linkeddata.org/MediaWiki/index.php?VoiD
   ⁹ http://www.w3.org/TR/Content-in-RDF/




  3. M. Hausenblas and W. Halb. Interlinking Multimedia Data. In Linking Open Data Tripli-
     fication Challenge at the International Conference on Semantic Systems (I-Semantics08),
     2008.
  4. T. Bürger and M. Hausenblas. Why Real-World Multimedia Assets Fail to Enter the Se-
     mantic Web. In International Workshop on Semantic Authoring, Annotation and Knowledge
     Markup (SAAKM07), Whistler, Canada, 2007.
  5. G. Klyne, J. J. Carroll, and B. McBride. RDF/XML Syntax Specification (Revised). W3C
     Recommendation, RDF Core Working Group, 2004.
  6. J. Golbeck. Combining Provenance with Trust in Social Networks for Semantic Web Content
     Filtering. In Provenance and Annotation of Data, International Provenance and Annotation
     Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers, volume
     4145 of Lecture Notes in Computer Science, pages 101–108. Springer, 2006.
  7. A. Harth, A. Polleres, and S. Decker. Towards a social provenance model for the Web. In
     Workshop on Principles of Provenance (PrOPr), Edinburgh, Scotland, 2007.
  8. J. J. Carroll, C. Bizer, P. Hayes, and P. Stickler. Named Graphs, Provenance and Trust.
     In WWW 05: Proceedings of the 14th international conference on World Wide Web, pages
     613–622. ACM Press, 2005.
  9. C. Bizer. Quality-Driven Information Filtering in the Context of Web-Based Information
     Systems. PhD thesis, Freie Universität Berlin, 2007.
 10. M. Hausenblas, W. Bailer, T. Bürger, and R. Troncy. Deploying Multimedia Metadata on the
     Semantic Web (Poster). In 2nd International Conference on Semantics And digital Media
     Technologies (SAMT 07), 2007.



