=Paper= {{Paper |id=None |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-932/paper3.pdf |volume=Vol-932 |dblpUrl=https://dblp.org/rec/conf/i-semantics/PattanaphanchaiOH12 }} ==None== https://ceur-ws.org/Vol-932/paper3.pdf
             Proceedings of the I-SEMANTICS 2012 Posters & Demonstrations Track, pp. 12-16, 2012.
             Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only
             for private and academic purposes. This volume is published and copyrighted by its editors.




HETWIN: Helping Evaluate the Trustworthiness
 of Web Information for Web Users Framework
       using Semantic Web Technologies

          Jarutas Pattanaphanchai, Kieron O’Hara, and Wendy Hall

    Electronics and Computer Science, Faculty of Physical and Applied Science,
                           University of Southampton,
                    Southampton, SO17 1BJ, United Kingdom
                       {jp11g09, kmo, wh}@ecs.soton.ac.uk


      Abstract. Assessing the trustworthiness of information found on the
      Web is challenging because of two factors. First, there is little control
      over publishing quality. Second, Web users have little information
      available on which to base a judgment of the trustworthiness of Web
      information while they are interacting with it. This work addresses this
      problem by collecting and presenting metadata about authority, currency,
      accuracy and relevance to evaluate the trustworthiness of Web information
      during information seeking processes. In this poster, we propose the
      HETWIN application framework and present the design of a prototype
      tool that applies this framework to academic publications.

      Keywords: Trust, Credibility, Information Quality, Semantic Web


1   Introduction
It is known that ordinary Web users decide whether to trust information on the
Web using heuristic factors (pertaining to its presentation and layout).
Because these heuristic factors are mainly based on surface-level
characteristics of the Web page, and such characteristics are easily disguised,
Web users can arrive at the wrong conclusions about the trustworthiness of the
information they consume [3]. A number of studies have suggested that
additional information, such as the identity of the author (e.g. name,
position) and the expertise of the author, could increase Web users’
confidence and help them to make better assessments than by using their own
heuristic criteria alone [2, 4–6]. In particular, Bizer et al. [1] proposed the
TriQL.P browser, an RDF browser that presents recommended RDF datasets that
should be trusted based on trust policies. However, in their work, the user
needs to go to a certain Web page from which the browser can extract Semantic
Web content. In contrast, it would be more useful to provide Web users with a
tool with which they can look for the information they need while it
automatically gathers the supporting information to help them evaluate the
trustworthiness of Web information.
    In order to address this problem, we propose a framework called HETWIN to
help users evaluate the trustworthiness of Web information. The framework uses
factors selected from the studies above. In addition, we propose a prototype
tool, implemented as a Chrome extension, which employs the HETWIN framework.
The prototype collects metadata using Semantic Web technologies and presents it
in a useful way in the context of the users’ search for information. In the
following sections, we explain the HETWIN architecture and display an example
result from our prototype. Then, we present planned future work.


 2      Helping Evaluate the Trustworthiness of Web
        Information for Web Users Framework
 Our framework uses Semantic Web technologies to collect RDF data, either
 published alongside Websites or queried from SPARQL endpoints, and integrates
 it to build metadata graphs. These metadata graphs are then used to create
 supporting data, which is presented to users in order to help them evaluate
 the trustworthiness of Web information. In this work, we assume that the RDF
 data on the Web or in the data store is accurate. Evaluation is based on a
 case study of the ePrints repository of the University of Southampton¹, an
 online repository of academic publications in which the accuracy of the
 published RDF data is verified by authorized staff². Our application
 framework, as shown in Figure 1, consists of three main modules.
  – The input module accepts the user’s search keywords and the domain of
    interest, which affects the type of information returned by the search. In
    this work, we define four domains of interest (business, informational,
    news and personal). The input module extracts any RDF data linked from the
    Web page. Our model evaluates the trustworthiness of the information every
    time the user interacts with the system. Therefore, the system obtains the
    most recent information at the time at which the evaluation is performed.
  – The trustworthiness criteria and metacollection module is composed
    of two main components. The trustworthiness criteria component comprises
    four basic criteria: authority, currency, accuracy, and relevance, on which
    the assessment of trustworthiness in each domain of interest is based. Each
    criterion specifies the basic RDF predicate keys that should be used to
    collect metadata. For instance, in the informational domain, the authority
    criterion uses the predicate key “dct:creator”, the currency criterion uses
    “dct:date”, the accuracy criterion uses “bibo:status”, and the relevance
    criterion uses data returned by querying with the predicate keys
    “dct:title” and “dct:abstract”. In the news domain, trustworthiness is
    still evaluated against the same criteria, but different or additional
    predicate keys might be used. For example, the authority criterion might
    use the predicate key “dct:publisher” in addition to “dct:creator”.
  ¹ http://www.eprints.soton.ac.uk
  ² http://www.southampton.ac.uk/library/research/eprints/policies/eprints.html

    Our framework allows additional predicate keys or new domains to be added
    through its configuration file. Therefore, the framework can be adjusted
    for use in different domains and extended to new domains.
    The metacollection component gathers metadata based on the predicates
    which are defined in the trustworthiness criteria. The collected metadata
    is aggregated in order to build a metadata graph. The basic approach to
    aggregating metadata assumes that the metadata from the four basic
    predicates have the same level of importance for assessing the
    trustworthiness of Web information. If the system needs additional data, it
    adds that data to the metadata graph after the basic metadata has been
    added.
  – The output module displays the metadata graph in a human-readable
    format to help users assess the trustworthiness of Web information. In
    addition, it orders the results by the relevance of the information to the
    user’s query, which is computed from the frequency with which the search
    terms appear in the title and the abstract, and from the expertise of the
    authors or creators of the information.
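
The criteria configuration and the metacollection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the prototype's actual code: the dictionary layout, the `collect_metadata` function, and the sample triples are hypothetical. The predicate keys for the informational domain and the extra “dct:publisher” key come from the text; reusing the informational keys for the remaining news criteria is an assumption.

```python
from collections import defaultdict

# Hypothetical configuration layout: domain -> criterion -> predicate keys.
# The informational keys and "dct:publisher" are taken from the text; the
# other news entries simply reuse the informational keys (an assumption).
CRITERIA = {
    "informational": {
        "authority": ["dct:creator"],
        "currency": ["dct:date"],
        "accuracy": ["bibo:status"],
        "relevance": ["dct:title", "dct:abstract"],
    },
    "news": {
        "authority": ["dct:creator", "dct:publisher"],
        "currency": ["dct:date"],
        "accuracy": ["bibo:status"],
        "relevance": ["dct:title", "dct:abstract"],
    },
}


def collect_metadata(triples, subject, domain, criteria=CRITERIA):
    """Group (predicate, object) pairs about `subject` by criterion."""
    graph = defaultdict(list)
    for criterion, predicates in criteria[domain].items():
        for s, p, o in triples:
            if s == subject and p in predicates:
                graph[criterion].append((p, o))
    return dict(graph)


# Hypothetical triples standing in for RDF gathered from a page or endpoint.
TRIPLES = [
    ("ex:paper1", "dct:creator", "ex:alice"),
    ("ex:paper1", "dct:date", "2012"),
    ("ex:paper1", "bibo:status", "peerReviewed"),
    ("ex:paper1", "dct:title", "Privacy on the Web"),
    ("ex:paper1", "dct:publisher", "ex:press"),
    ("ex:paper2", "dct:creator", "ex:bob"),
]

if __name__ == "__main__":
    result = collect_metadata(TRIPLES, "ex:paper1", "informational")
    for criterion, values in result.items():
        print(criterion, values)
```

Keeping the predicate keys in a plain mapping mirrors the idea of a configuration file: adding a domain or a predicate key changes data only, not the collection logic.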




                           Fig. 1. HETWIN architecture
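
When metadata is queried from a SPARQL endpoint, the predicate keys of a criterion could be turned into a query along the following lines. This is a minimal sketch: the `build_query` helper and the example resource URI are hypothetical, and the paper does not show the queries the prototype actually sends. The prefix URIs are the standard namespaces for DCMI Terms and the Bibliographic Ontology.

```python
# Standard namespace URIs for the prefixes used by the predicate keys.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def build_query(resource_uri, predicates):
    """Return a SPARQL SELECT query for the given predicates of one resource.

    `resource_uri` and the helper itself are hypothetical; they only
    illustrate how predicate keys could drive metadata collection.
    """
    prefix_lines = [f"PREFIX {name}: <{uri}>" for name, uri in PREFIXES.items()]
    values = " ".join(predicates)
    body = (
        "SELECT ?p ?o WHERE {\n"
        f"  VALUES ?p {{ {values} }}\n"
        f"  <{resource_uri}> ?p ?o .\n"
        "}"
    )
    return "\n".join(prefix_lines + [body])


if __name__ == "__main__":
    # Hypothetical publication URI, used for illustration only.
    print(build_query("http://example.org/publication/1",
                      ["dct:creator", "dct:date", "bibo:status"]))
```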




 3   Results
 We present example results of the output of our tool, using data from ePrints
 at the University of Southampton. Specifically, we consider the publications
 of the School of Electronics and Computer Science, and we focus our evaluation
 on the informational domain. The result in Figure 2 displays the identifying
 details of the publication, including its title, its abstract and the names of
 its authors. Moreover, the results are ordered by the relevance of the
 information to the user’s
 interests. The tool also shows the authors’ full names and their appellations.
 These can indicate the authority of the author, which represents their
 reputation in producing this content. In addition, it displays the details of
 each author’s publications, which are themselves indicative of the author’s
 expertise in the area. For example, if the author has several publications
 that relate to privacy, this implies that the author is not only interested in
 that area but also has expertise in it. The more publications by that author
 that relate to the keywords, the more likely the author is to be an expert in
 that topic. The publication date indicates the currency of the publication.
 The system gathers the status of the publication to determine the accuracy of
 the information within it because, in the case of publications, there is a
 review process, which can help to evaluate the accuracy of the content. For
 example, if the publication has been peer-reviewed or published in an academic
 publication, it is likely to be accurate, and therefore trustworthy. In the
 current work, the relevance criterion analyses the abstract and the title of
 the publication. If they contain the user’s search keywords, the document is
 more likely to meet the user’s needs. We are also considering another
 approach that could evaluate relevance more efficiently than matching exact
 keywords: using ontology concepts to find terms related to the user’s
 keywords and matching them in key areas of the content, such as the title or
 the first paragraph. This would allow the framework to better match relevant
 information to the user’s needs.
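
The keyword-based relevance and expertise signals described above can be illustrated with a short Python sketch. The function names, the sample publications, and in particular the way the two signals are combined (sorting by keyword frequency first, then by the highest expertise count among the authors) are assumptions made for illustration; the paper does not specify how the prototype combines them.

```python
import re


def term_frequency(text, keywords):
    """Count the tokens in `text` that exactly match one of the keywords."""
    wanted = {k.lower() for k in keywords}
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in wanted)


def relevance(pub, keywords):
    """Keyword frequency over the title and abstract of one publication."""
    return term_frequency(pub["title"] + " " + pub["abstract"], keywords)


def author_expertise(author, publications, keywords):
    """Number of the author's publications that mention any keyword."""
    return sum(
        1 for pub in publications
        if author in pub["authors"] and relevance(pub, keywords) > 0
    )


def rank(publications, keywords):
    """Order publications by (relevance, best author expertise), descending.

    The tuple ordering is an assumed way of combining the two signals.
    """
    def score(pub):
        expertise = max(
            (author_expertise(a, publications, keywords) for a in pub["authors"]),
            default=0,
        )
        return (relevance(pub, keywords), expertise)

    return sorted(publications, key=score, reverse=True)


# Hypothetical sample data, not taken from the ePrints repository.
PUBLICATIONS = [
    {"title": "Linked data browsers", "abstract": "A browser for RDF.",
     "authors": ["Bob"]},
    {"title": "Privacy policies", "abstract": "Policies for data sharing.",
     "authors": ["Alice", "Bob"]},
    {"title": "Privacy on the Web", "abstract": "We study privacy and trust.",
     "authors": ["Alice"]},
]

if __name__ == "__main__":
    for pub in rank(PUBLICATIONS, ["privacy"]):
        print(pub["title"])
```

Because the match is on exact tokens, a query for “privacy” does not match related terms such as “confidentiality”, which is the limitation the ontology-based approach is meant to address.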




 Fig. 2. Example of results using “privacy” as keyword and “informational” as a domain
 (the four criteria are shown in bold)





 4    Conclusions
 We proposed an application framework and a prototype tool which help users to
 evaluate the trustworthiness of Web information using Semantic Web
 technologies. The results from the prototype show the supporting data for each
 criterion, which is explained to the user and can help them assess the
 trustworthiness of Web information. In future work, we will evaluate our
 framework by conducting a user survey. This survey will elicit information
 about how satisfied the users were with the system and how their approach to
 assessing trust changed after using our system, in comparison to an expert.


 References
 1. Bizer, C., Cyganiak, R., Gauss, T., Maresch, O.: The TriQL.P browser: Filtering
    information using context-, content- and rating-based trust policies. In: Proc. of the
    Semantic Web and Policy Workshop. vol. 7, pp. 12–20 (2005)
 2. Fogg, B., Marshall, J., Laraki, O., Osipovich, A.: What makes Web sites credible?:
    a report on a large quantitative study. Proceedings of the SIGCHI Conference on
    Human Factors in Computing Systems pp. 61–68 (2001)
 3. Fogg, B., Soohoo, C., Danielson, D.R., Marable, L., Standford, J., Tauber, E.R.: How
    do users evaluate the credibility of Web sites?: a study with over 2,500 participants.
    Proc. of the 2003 conference on Designing for user experiences pp. 1–15 (2003)
 4. Rieh, S., Belkin, N.: Understanding judgment of information quality and cognitive
    authority in the WWW. In: the 61st annual meeting of the Am. Soc. for Inf. Sci.
    and Technol. vol. 35, pp. 279–89 (1998)
 5. Tate, M.: Web Wisdom: How To Evaluate and Create Information Quality on the
    Web, Second Edition. CRC Press (2009)
 6. Wathen, C.N., Burkell, J.: Believe it or not: Factors influencing credibility on the
    Web. Journal of the Am. Soc. for Inf. Sci. and Technol. 53(2), 134–144 (2002)



