=Paper= {{Paper |id=None |storemode=property |title=Semantic Representation of Provenance in Wikipedia |pdfUrl=https://ceur-ws.org/Vol-670/paper_7.pdf |volume=Vol-670 |dblpUrl=https://dblp.org/rec/conf/semweb/OrlandiCP10 }} ==Semantic Representation of Provenance in Wikipedia== https://ceur-ws.org/Vol-670/paper_7.pdf
Semantic Representation of Provenance in Wikipedia

Fabrizio Orlandi (Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland) fabrizio.orlandi@deri.org
Pierre-Antoine Champin (LIRIS, Université de Lyon, CNRS, UMR5205, Université Claude Bernard Lyon 1, F-69622 Villeurbanne, France) pchampin@liris.cnrs.fr
Alexandre Passant (Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland) alexandre.passant@deri.org



Abstract—Wikis are often considered as being a wide source of information. However, identifying provenance information about their content is crucial, whether it is for computing trust in public wiki pages or for identifying experts in corporate wikis. In this paper, we address this issue by providing a lightweight ontology for provenance management in wikis, based on the W7 model. Furthermore, we showcase the use of our model in a framework that computes provenance information in Wikipedia, also using DBpedia to compute provenance and contribution information per category, and not only per page.

I. INTRODUCTION

From public encyclopedias to corporate knowledge management tools, wikis are often considered as being a wide source of information. Yet, since wikis generally offer an open publishing process where everyone can contribute, identifying provenance information in their pages is an important requirement. In particular, this information can be used to compute trust values for pages or page fragments [2], as well as to identify experts based on the number of their contributions [9] and other criteria such as the users' social graphs [10]. By providing this information as RDF [6], provenance meta-data becomes more transparent and offers new opportunities for the previous use-cases, as well as letting people link to provenance information from other sources and personalise trust metrics based on the trust they place in a person regarding a particular topic [5].

This paper describes three of our contributions to address this issue and make provenance information in MediaWiki-powered wikis^1 available on the Semantic Web:
1) a lightweight ontology to represent provenance information in wikis, based on the W7 theory [13] and using SIOC and its extensions;
2) a software architecture to extract and model provenance information about Wikipedia pages and categories, using the aforementioned ontology;
3) a user interface to make this information openly available on the Web, both to human and software agents and directly within Wikipedia pages.

In the next section, we discuss related work in the realm of provenance management on the Semantic Web. Then, we give some background information regarding SIOC and the various extensions used in our work. In Section IV, we present the W7 theory and the lightweight ontology we have built to represent it in RDFS. We then describe our software architecture and how we compute provenance information in Wikipedia, and finally present the user interface used to access this information, before concluding the paper.

II. RELATED WORK

The representation and extraction of provenance information is not a recent research topic. Many studies have been conducted on representing the provenance of data [15], but few of them have focused on integrating provenance information into the Web of Data [6]. Providing this information as RDF would make provenance meta-data more transparent and interlinked with other sources, and it would also offer new scenarios for evaluating trust and data quality on top of it. In this regard, a W3C Provenance Incubator Group^2 has recently been established. The mission of the group is to "provide a state-of-the-art understanding and develop a roadmap in the area of provenance for Semantic Web technologies, development, and possible standardization". Requirements for provenance on the Web^3, as well as several use cases and technical requirements, have been provided by the working group. A comprehensive analysis of approaches and methodologies for publishing and consuming provenance metadata on the Web is exposed in [7].

Another research topic relevant to our work is the evaluation of trust and data quality in wikis. Recent studies have proposed several algorithms that automatically measure users' contributions and evaluate their quantity and quality, in order to study the authors' behavior, produce trust measures for articles, and find experts. WikiTrust [2] is a project aimed at measuring the quality of author contributions on Wikipedia. Its authors developed a tool that computes the origin and author of every word on a wiki page, as well as "a measure of text trust that indicates the extent with which text has been revised"^4.

This work is funded by the Science Foundation Ireland under grant number SFI/08/CE/I1380 (Líon 2) and by an IRCSET scholarship.
^1 MediaWiki is the wiki engine that powers Wikipedia – www.mediawiki.org
^2 Established in September 2009. http://www.w3.org/2005/Incubator/prov/
^3 http://www.w3.org/2005/Incubator/prov/wiki/User_Requirements
^4 WikiTrust: http://wikitrust.soe.ucsc.edu/
On the same topic, other researchers have tried to solve the problem of evaluating articles' quality not only by examining the users' history quantitatively [9], but also by using social network analysis techniques [10].

From our perspective, there is a need to publish provenance information as Linked Data from websites hosting a wide source of information (such as Wikipedia). Yet, most of the work on data provenance is either not focused on integrating the information generated on the Web of Data, or mainly based on provenance for resource descriptions or already structured data. On the other hand, the interesting work done so far on analyzing trust and quality in wikis does not take into account the importance of making the extracted information available on the Web of Data.

III. BACKGROUND

A. Using SIOC for wiki modelling

The SIOC Ontology — Semantically-Interlinked Online Communities [1] — provides a model for representing online communities and their contributions^5. It is mainly centered around the concepts of users, items and containers, so it can be used to model content created by a particular user on several platforms, enabling a distributed perspective on the management of User-Generated Content on the Web. In particular, the atomic elements of the Web applications described by SIOC are called Items. They are grouped in Containers, which can themselves be contained in other Containers. Finally, every Container belongs to a Space. As an example, a Site (subclass of Space) may contain a number of Wikis (subclass of Container), and every Wiki contains a set of WikiArticles (subclass of Item) generated by UserAccounts. For more details about SIOC, we invite the reader to consult the W3C Member Submission [1] and its online specification^6.

While the SIOC Types module provides several subclasses of Container and Item, including Wiki and WikiArticle, some characteristics of wikis required further modelling. Hence, in our previous work [11] we extended the SIOC Ontology to take such characteristics into account (e.g. multi-authoring, versioning, etc.). Some tools to generate and consume data from wikis using our model have also been developed [12].

B. The SIOC Actions module

While SIOC represents the state of a community at a given time, SIOC-actions [4] can be used to represent its dynamics, i.e. how it evolves. Hence, SIOC provides a document-centric view of online communities, while SIOC-actions focuses on an action-centric view. More precisely, the evolution of an online community is represented as a set of actions, performed by a user (sioc:UserAccount) at some time and impacting a number of objects (sioc:Item). SIOC-actions provides an extensible hierarchy of properties for representing the effect of an action on its items, such as creates, modifies, uses, etc. Besides the SIOC ontology, SIOC-actions relies on the vocabulary for Linking Open Descriptions of Events (LODE)^7. The core of the module is the Action class (a subclass of event:Event from the Event Ontology), which is a timestamped event involving an agent (e.g. a UserAccount) and a number of digital artifacts (e.g. Items). For more details about SIOC-actions and its implementation, see Section IV below.

IV. REPRESENTING THE W7 MODEL USING RDFS/OWL

The W7 model is an ontological model created to describe the semantics of data provenance [13]. It is a conceptual model and, to the best of our knowledge, an RDFS/OWL representation of it has not been implemented yet. Hence, we focus on an implementation of this model for the specific context of wikis. As a comparison, in [14] the authors use the example of Wikipedia to illustrate theoretically how their proposed W7 model can capture domain- or application-specific provenance.

The W7 model is based on Bunge's ontology [3]; furthermore, it is built on the concept of tracking the history of the events affecting the status of things during their life cycle. In this particular case we consider the data life cycle. Bunge's ontology, developed in 1977, is considered one of the main sources of constructs for modelling real systems and information systems. Since Bunge's work is theoretical, there has been some effort from the scientific community to translate it into machine-readable ontologies^8.

The W7 model represents data provenance using seven fundamental elements, or interrogative words: what, when, where, how, who, which, and why. It has been purposely built on general and extensible principles, hence it is possible to capture provenance semantics for data in different domains. We refer to [13] for a detailed description of the mappings between the W7 model and Bunge's ontology, and in Table I we provide a summary of the W7 elements (as in [14]).

Looking at the structure of the W7 model, the motivation for choosing the SIOC Actions module as the core of our model is clear: most of the concepts in the Actions module are the same as in the W7 model. Furthermore, wikis are community sites, and the Actions module has been implemented precisely to represent dynamic, action-centric views of online communities.

In the following sections we give a detailed description of how we answered each of these seven questions.

A. What

The What element represents an event that affected data during its life cycle. It is a change of state and the core of the model. In this regard, there are three main events affecting data: creation, modification and deletion. In the context of wikis, each of them can appear: users can (1) add new sentences (or characters), (2) remove sequences of characters, or (3) modify characters by removing content and then adding new content.

^5 http://sioc-project.org
^6 http://rdfs.org/sioc/spec/
^7 LODE Ontology specification — http://linkedevents.org/ontology/
^8 Evermann J. provides an OWL description of Bunge's ontology at: http://homepages.mcs.vuw.ac.nz/~jevermann/Bunge/v5/index.html
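As an illustration of the SIOC structure introduced in Section III-A, the containment chain from a Site down to a WikiArticle can be written in Turtle as follows. This is our own minimal sketch, not an excerpt from the paper's dataset: the instance URIs are hypothetical, while the class and property names come from the standard SIOC namespaces.

```turtle
@prefix sioc:  <http://rdfs.org/sioc/ns#> .
@prefix sioct: <http://rdfs.org/sioc/types#> .

# A Site (a Space) contains a Wiki (a Container) ...
<http://example.org/site> a sioc:Site ;
    sioc:space_of <http://example.org/wiki> .

# ... which contains WikiArticles (Items) ...
<http://example.org/wiki> a sioct:Wiki ;
    sioc:container_of <http://example.org/wiki/Some_Article> .

# ... generated by UserAccounts.
<http://example.org/wiki/Some_Article> a sioct:WikiArticle ;
    sioc:has_creator <http://example.org/user/alice> .

<http://example.org/user/alice> a sioc:UserAccount .
```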
Such a modification replaces content at the same position of the article. In addition, in systems like Wikipedia, some other specific events can affect the data on the wiki, for example the "quality assessment" or a "change in access rights" of an article [14]; however, these can be expressed with the three broader types defined above.

TABLE I
DEFINITION OF THE 7 WS BY RAM S. AND LIU J.

Provenance element | Construct in Bunge's ontology | Definition
What  | Event  | An event (i.e. a change of state) that happens to data during its life time
How   | Action | An action leading to the event; an event may occur when a thing is acted upon by another thing, often a human or a software agent
When  | Time   | Time, or more accurately the duration of an event
Where | Space  | Locations associated with an event
Who   | Agent  | Agents, including persons or organizations, involved in an event
Which | Agent  | Instruments or software programs used in the event
Why   | -      | Reasons that explain why an event occurred

Since (1) wikis commonly provide a versioning mechanism for their content and (2) every action on a wiki article leads to the generation of a new article revision, the core event describing our What element is the creation of an article version. In particular, we model this creation, and the related modification of the latest version (i.e. the permalink), using the SIOC-Actions model, as shown in Listing 1.

<http://example.com/action?title=Dublin_Core#380106133>
    sioca:creates <http://en.wikipedia.org/w/index.php?title=Dublin_Core&oldid=380106133> ;
    sioca:modifies <http://en.wikipedia.org/wiki/Dublin_Core> ;
    a sioca:Action .

Listing 1. Representing the "What" element

As we can see from the example above, expressed in Turtle syntax, we have a sioca:Action identified by the URI <http://example.com/action?title=Dublin_Core#380106133> that leads to the creation of a revision of the main wiki article about "Dublin Core". The creation of a new revision originates from a modification (sioca:modifies) of the main Wikipedia article <http://en.wikipedia.org/wiki/Dublin_Core>. Details about the type of event are exposed in the next section about the How element, where we identify the type of action involved in the event creation.

B. How

The How element in W7 is equivalent to the Action element from Bunge's ontology, and describes the action leading to an event. In wikis, the possible actions leading to an event (i.e. the creation of a new revision) are all the edits applied to a specific article revision. By analyzing the diff between two subsequent revisions of a page, we can identify the type of action involved in the creation of the newer revision. In particular, we focus on modelling the following types of edits: Insertion, Update and Deletion of both Sentences and References. With the term Sentence we refer here to every sequence of characters that does not include a reference or a link to another source, and with Reference we refer to every action that involves a link or a so-called Wikipedia reference. As discussed in [14], another type of edit would be a Revert, i.e. an undo of the effects of one or more previous edits. However, in Wikipedia a revert does not restore a previous version of the article, but creates a new version with content similar to that of an earlier selected version. Therefore, we decided to model a revert like all the other edits, and not as a particular pattern. The distinction between a revert and other types of action can still be identified, with an acceptable level of precision, by looking at the user comment entered when doing the revert, since most users add a related revert comment^9.

Going further, and to represent provenance data for the action involved in each wiki edit, we modelled the diffs appearing between pages. To model the differences calculated between subsequent revisions we created a lightweight Diff ontology, inspired by the Changeset vocabulary^10. Yet, instead of describing changes to RDF statements, our model aims at describing changes to plain text documents. It provides a main class, the diff:Diff class, and six subclasses: SentenceUpdate, SentenceInsertion, SentenceDeletion and ReferenceUpdate, ReferenceInsertion, ReferenceDeletion, based on the previous How patterns.

Fig. 1. Modeling differences in plain text documents with the Diff vocabulary

The main Diff class represents all the information about the change between two versions of a wiki page (see Fig. 1). The Diff properties subjectOfChange and objectOfChange point respectively to the version changed by this diff and to the newly created version. Details about the time and the creator of the change are provided respectively by dc:created and sioc:has_creator. Moreover, the comment about the change is provided by the diff:comment property, with range rdfs:Literal.

^9 Note that we could also compare the n-1 and n+1 versions of each page to identify whether a change is a revert.
^10 The Changeset schema: http://purl.org/vocab/changeset/schema#
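The diff-based classification of edits described for the How element can be approximated in a few lines of Python. This is an illustrative sketch of ours, not the authors' implementation (which is a PHP script, cf. Section V): it only distinguishes Insertions, Deletions and Updates between line-based revisions, without separating Sentences from References.

```python
import difflib

def classify_changes(old_lines, new_lines):
    """Classify the atomic changes between two page revisions as
    Insertion, Deletion or Update, mirroring the Diff subclasses.
    Each result carries the position of the change (cf. lineNumber)
    and the text involved (cf. TextBlock content)."""
    changes = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":        # a TextBlock added, none removed
            changes.append(("Insertion", j1, new_lines[j1:j2]))
        elif op == "delete":      # a TextBlock removed, none added
            changes.append(("Deletion", i1, old_lines[i1:i2]))
        elif op == "replace":     # one removed and one added -> Update
            changes.append(("Update", i1, new_lines[j1:j2]))
    return changes

old = ["Dublin Core is a metadata standard.",
       "It defines fifteen elements."]
new = ["Dublin Core is a metadata vocabulary.",
       "It defines fifteen elements.",
       "See the DCMI website for details."]

for change in classify_changes(old, new):
    print(change)
```

Run on the two hypothetical revisions above, this reports an Update of the first line and an Insertion at the end, which would map to a SentenceUpdate and a SentenceInsertion in the Diff ontology.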
In Figure 1 we also display a Diff class linking to another Diff class. The latter represents one of the six Diff subclasses described earlier in this section. Since a single diff between two versions can be composed of several atomic changes (or "sub-diffs"), a Diff instance can point to several instances of these subclasses using the dc:hasPart property. Each Diff subclass can have at most one TextBlock removed and one added: if it has both, the type of change is an Update; otherwise the type is an Insertion or a Deletion.

The TextBlock class is part of the Diff ontology and represents a sequence of characters added or removed at a specific position of a plain text document. It exposes the content of this sequence of characters (content) and a pointer to its position inside the document (lineNumber). It is important to note that document content is usually organized in sets of lines, as in wiki articles, but this class is generic enough to be reusable with other types of text organization. Note also that each of the six subclasses of the Diff class inherits the properties defined for the parent class, although this is not displayed in Figure 1 for space reasons.

With the model presented, it is possible to address an important requirement for provenance: the reproducibility of a process. Starting from an older revision of a wiki article, and simply following the diffs between the newer revisions and the TextBlocks added or removed, it is possible to reconstruct the latest version of the article. This approach goes a step further than just storing the different versions of the data: it provides details of the entire process involved in the data life cycle.

C. When

The When element in W7 is equivalent to the Time element from Bunge's ontology, and refers to the time an event occurs, which is recorded by every wiki platform for page edits. As depicted in Figure 1, each Diff class is linked to the timestamp of the event using the dc:created property. The same timestamp is also linked to each Diff subclass using the same property (not shown in Fig. 1 for space reasons). The time of the event is modelled in more detail in the Action element, as shown in Listing 2^11.

<http://example.com/action?title=Dublin_Core#380106133>
    dc:created "2010-08-21T06:36:17Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    lode:atTime [
        a time:Instant ;
        time:inXSDDateTime "2010-08-21T06:36:17Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>
    ] ;
    a sioca:Action .

Listing 2. Representing the "When" element in Turtle syntax

In this context we consider actions to be instantaneous. As in [4], we track the instant at which an action takes effect on a wiki (i.e. when a wiki page is saved). Usually, this creation time is represented using dc:created. Another option, provided by the LODE ontology, is to use the lode:atTime property to link to a class representing a time interval or an instant.

D. Where

The Where element represents the online "Space", or the location, associated with an event. In wikis, and in particular in Wikipedia, this is one of the most controversial elements of the W7 model. If the location of an article update is understood as the location of the user making the update, then this information is not completely provided, or not accurate, on Wikipedia. Indeed, we can extract this information only from the IP addresses of anonymous users, and not for all Wikipedia users. Note that it is possible to link a sioc:UserAccount (e.g. <http://en.wikipedia.org/wiki/User:96.245.230.136>) to the related IP address using the SIOC ip_address property.

E. Who

The Who element describes an agent involved in an event; it therefore includes a person or an organization. On a wiki it represents the editor of a page, who can be either a registered user or an anonymous user. A registered user might also have different roles on the Wikipedia site and, on this basis, different permissions are granted to their account. In this work we are only interested in keeping track of the user account involved in each event, and not in its role on the wiki. Users are modelled with the sioc:UserAccount class and linked to each sioca:Action, sioct:WikiArticle and diff:Diff with the property sioc:has_creator. A sioc:UserAccount represents a user account, in an online community site, owned by a physical person, a group or an organization (i.e. a foaf:Agent). Hence, a physical person, represented by a foaf:Person (a subclass of foaf:Agent), can be linked to several sioc:UserAccounts.

Fig. 2. Modeling the Who element with sioc:UserAccount

F. Which

The Which element represents the instruments used in the event. In our particular case it is the software used to perform the edit, which might be a bot or the wiki software used by the editor. Since there is no direct and precise way to identify whether an edit has been made by a human or a bot, our model does not make this distinction. A naive method could be to look at the username and check whether it contains the "bot" string.

G. Why

The Why element represents the reasons behind the occurrence of an event. On Wikipedia it is defined by the justification for a change that the user enters in the "comment" field.

^11 For all the namespaces see: http://prefix.cc
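Bringing the How, When, Who and Why elements together, a single change could be described along the following lines. This is our own sketch: the diff namespace URI, the instance URIs and the literal values are hypothetical placeholders, while the class and property names are those introduced in Section IV.

```turtle
@prefix diff: <http://example.org/diff#> .   # placeholder namespace for the Diff ontology
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/diff/Dublin_Core/380106133> a diff:Diff ;
    # the revision changed by this diff and the newly created one
    diff:subjectOfChange <http://en.wikipedia.org/w/index.php?title=Dublin_Core&oldid=380100000> ;
    diff:objectOfChange  <http://en.wikipedia.org/w/index.php?title=Dublin_Core&oldid=380106133> ;
    # How: an atomic change composing this diff
    dc:hasPart [ a diff:SentenceUpdate ] ;
    # When: the timestamp of the change
    dc:created "2010-08-21T06:36:17Z"^^xsd:dateTime ;
    # Who: the account that performed the edit
    sioc:has_creator <http://en.wikipedia.org/wiki/User:Example> ;
    # Why: the edit summary entered by the user
    diff:comment "Fixed the definition of Dublin Core" .
```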
This comment is not a mandatory field when editing a wiki page, but the Wikipedia guidelines recommend filling it in. We model the comment left by the user with the property diff:comment, linking the diff:Diff class to the related rdfs:Literal.

V. APPLICATION USING PROVENANCE DATA FROM WIKIPEDIA

A. Collecting the data from the Web

In order to validate and test our modelling solution for provenance on wikis, and in particular on Wikipedia, we collected data from the English Wikipedia and from the DBpedia service. The DBpedia project^12, since it extracts and publishes structured information from the English Wikipedia, is considered its RDF export. Collecting data not only from Wikipedia but also from DBpedia has an important advantage: it directly provides us with structured data modelled with popular, standard, lightweight ontologies in RDF. We use the DBpedia data especially for the categories that hierarchically structure the articles on Wikipedia. We ran our experiment collecting a portion of the Wikipedia articles, in particular the articles belonging to the whole hierarchy under a given category. By doing this we could limit our dataset to articles strongly related to each other, and collect a user community sharing a common interest.

A PHP script has been developed to extract all the articles belonging to a category and all its subcategories and, for each article, its whole revision history. In more detail, this program:
• executes a SPARQL^13 query over the DBpedia endpoint to get the category hierarchy;
• stores the category hierarchy (modelled with the SKOS^14 vocabulary) in a local triplestore;
• queries the DBpedia endpoint again to get all the articles belonging to the categories collected;
• for all the articles collected, generates (and stores locally) RDF data using the SIOC-MediaWiki exporter^15;
• using the sioc:previous_version property, exports RDF for all the previous revisions of each article.

The diffs between subsequent revisions are then analyzed to compute the type of change for each of the differences identified. This allows us to mark each change with one of the Sentence or Reference Insertion/Update/Deletion subclasses of the diff:Diff class. Finally, the script generates RDF data according to the model described before and inserts it into the local triplestore. In order to test our application, we ran the data extraction algorithm starting from the category "Semantic Web" on the English Wikipedia, and we generated data for all the 166 wiki articles belonging to this category and, recursively, to its subcategories. Using Semantic Web technologies, we have the advantage of a single, standard language for querying wiki and provenance data together, while developers who need to query the original systems have to learn a new API for each system.

B. A Firefox plug-in for provenance from Wikipedia

In order to show the potential of the data collected and of the data model created, we built an application that displays some interesting statistics extracted from the provenance information of the analyzed articles. The application displays a table directly on top of each Wikipedia article, exposing some information about the most active users on the article and their edits. It has been developed as a Greasemonkey^16 script: a Mozilla Firefox extension that allows users to install scripts that make on-the-fly changes to HTML web page content. The script is developed in JavaScript and is now compatible with other popular Web browsers. The application is composed of the following elements: 1) the triplestore containing the data collected, exposing a SPARQL endpoint for querying the data; 2) a PHP script, used as an interface between the Greasemonkey script and the triplestore; 3) a Greasemonkey script, which retrieves the URL of the loaded Wikipedia page, sends the request to the PHP script and then displays the returned HTML data on the Wikipedia page. The PHP script in this application is important because it is responsible for executing the SPARQL queries on the triplestore. Furthermore, it retrieves the results and creates the HTML code to embed on the
                                                                      Wikipedia page. A screenshot of the result of the process is
It is clear the advantage of using DBpedia in this process since      displayed in Figure 3.
we collected structured data just executing two lightweight
                                                                         The tables displayed in Figure 3 appear only on the top of
SPARQL queries.
                                                                      the Wikipedia articles and categories that we analyzed with the
   A second PHP script has been developed to extract detailed         method described in Section V-A. A different type of table is
provenance information from the articles collected with the           showed when the page visited is a category page. In Figure 3
previous step. This script calculates the diff function between       on the top table, we can see the top six users who did the
consecutive versions of the articles, and retrieves more related      biggest number of edits on the article. For each of these users
information from the Wikipedia API. The data retrieved from           we then compute: (1) their total number of edits on the page;
the API is composed by all the information needed for the cre-        (2) their percentage of “ownership” on the page (or better, the
ation of the model described in the previous section. Therefore       percentage of their edits compared to all the edits done on the
information about the editor, the timestamp, the comment and          article); (3) their number of lines added on the article; (4) their
the ID of the versions are identified. Moreover the algorithm         number of lines removed on the article; (5) their total number
is not only capable of extracting the diff function, but also         of lines added and removed on all the articles belonging to
  12 http://dbpedia.org                                               the category “Semantic Web”. With the other use-case, when
  13 Query Language for RDF: http://www.w3.org/TR/rdf-sparql-query/   the user visits a Wikipedia category page, we display different
  14 SKOS Reference: http://www.w3.org/TR/skos-reference/
  15 http://ws.sioc-project.org/mediawiki/                              16 http://www.greasespot.net/
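   The per-user statistics described above (total edits, percentage of “ownership”, lines added and removed) can be sketched in code. The paper's harvesting scripts are written in PHP and work on data from the Wikipedia API; the following is only a minimal Python sketch, under the assumption that each revision is available as an (editor, text) pair and that line additions and removals are obtained from a diff between consecutive versions, as in Section V-A.

```python
import difflib
from collections import defaultdict

def revision_stats(revisions):
    """Per-user statistics from an ordered (non-empty) revision history.

    `revisions` is a list of (editor, text) pairs, oldest first; a
    simplified stand-in for the revision data returned by the Wikipedia
    API. The result maps each editor to their number of edits, their
    percentage of "ownership" (share of all edits), and the number of
    lines they added and removed.
    """
    stats = defaultdict(lambda: {"edits": 0, "added": 0, "removed": 0})
    previous_lines = []
    for editor, text in revisions:
        current_lines = text.splitlines()
        s = stats[editor]
        s["edits"] += 1
        # Line-based diff between consecutive versions.
        for line in difflib.ndiff(previous_lines, current_lines):
            if line.startswith("+ "):
                s["added"] += 1
            elif line.startswith("- "):
                s["removed"] += 1
        previous_lines = current_lines
    total_edits = sum(s["edits"] for s in stats.values())
    for s in stats.values():
        # "Ownership": the user's edits as a percentage of all edits.
        s["ownership"] = round(100.0 * s["edits"] / total_edits, 1)
    return dict(stats)
```

In a real deployment the revisions would come from the Wikipedia API, and the resulting figures would be stored in the triplestore as SCOVO items rather than returned as a dictionary.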
The same method is used in this second case; see the table at the bottom of Figure 3. When browsing a wiki category page, the application shows a list of the users with the highest number of edits on the articles of the whole category (and its subcategories), together with the percentage of their edits compared to the total number of edits on the category. A second table, on the right, lists the most edited articles in the category during the last three months. Note also that at the bottom of each table there is a link pointing to a page where a longer list of results is displayed.

Fig. 3. A screenshot of the application on the “Linked Data” page and the table from the Category “Semantic Web” page

   At the moment, the PHP script developed is available at http://vmuss06.deri.ie/WikiProvenance/index.php. Using this script directly, it is possible to obtain the same information displayed by the Greasemonkey script, as well as the RDF description of the requested page. In order to represent this statistical information in RDF we use SCOVO, the Statistical Core Vocabulary [8]. It relies on the concepts of items and dimensions to represent statistical information. In our context, an item is one piece of statistical information (e.g. user “X” edited 10 lines on page “Y”), and several dimensions are involved in its description: (1) the type of information that we want to represent (number of edits, percentage, lines added and removed, etc.); (2) the page or category impacted; (3) the user involved. Hence, we created four instances of scv:Dimension to represent the first dimension, and then relied simply on the scv:dimension property for the other ones. As an example, the following snippet states that the user KingsleyIdehen made 11 edits on the SIOC page.

 ex:123 a scovo:Item ;
    rdf:value 11 ;
    scv:dimension :Edits ;
    scv:dimension <...> ;  # URI identifying the user
    scv:dimension <...> .  # URI identifying the page

   Listing 3. Representing the number of edits by a user with SCOVO

VI. CONCLUSION AND FUTURE WORK

   The goal of this paper was to provide a solution for representing and managing the provenance of data from Wikipedia (and other wikis) using Semantic Web technologies. To this end we provided: a specific lightweight ontology for provenance in wikis, based on the W7 model; a framework for the extraction of provenance data from Wikipedia; and an application for accessing the generated data in a meaningful way and exposing it to the Web of Data. We showed that the W7 model is a good choice for modelling provenance information in general and in wikis in particular but, because of its high abstraction level, it has to be refined using, for instance, other specific lightweight ontologies; in our case this has been done using SIOC and the Actions module. Future developments will include a refinement of the proposed model and a subsequent alignment with other general-purpose ontologies for representing provenance as Linked Data (e.g. the Open Provenance Model). We also plan to improve and extend the capabilities of our application by offering more features, and to provide a wider range of data with an architecture that automatically updates the data as soon as it changes on Wikipedia.

REFERENCES

 [1] SIOC Core Ontology Specification. W3C Member Submission 12 June 2007, World Wide Web Consortium, 2007. http://www.w3.org/Submission/sioc-spec/.
 [2] B.T. Adler, L. de Alfaro, I. Pye, and V. Raman. Measuring author contributions to the Wikipedia. In Proceedings of WikiSym ’08. ACM, 2008.
 [3] M. Bunge. Treatise on Basic Philosophy: Ontology I: The Furniture of the World. Reidel, Boston, 1977.
 [4] P.A. Champin and A. Passant. SIOC in Action - Representing the Dynamics of Online Communities. In Proceedings of the 6th International Conference on Semantic Systems (I-SEMANTICS 2010). ACM, 2010.
 [5] J. Golbeck, B. Parsia, and J. Hendler. Trust networks on the Semantic Web. In Cooperative Information Agents VII, pages 238–249, 2003.
 [6] O. Hartig. Provenance information in the Web of Data. In 2nd Workshop on Linked Data on the Web (LDOW 2009) at WWW, 2009.
 [7] O. Hartig and J. Zhao. Publishing and consuming provenance metadata on the Web of Linked Data. In Proceedings of the 3rd International Provenance and Annotation Workshop, 2010.
 [8] M. Hausenblas, W. Halb, Y. Raimond, L. Feigenbaum, and D. Ayers. SCOVO: Using statistics on the Web of Data. In Semantic Web in Use Track of the 6th European Semantic Web Conference (ESWC 2009), 2009.
 [9] B. Hoisl, W. Aigner, and S. Miksch. Social rewarding in wiki systems - motivating the community. In Proceedings of the 2nd International Conference on Online Communities and Social Computing, pages 362–371. Springer-Verlag, 2007.
[10] N.T. Korfiatis, M. Poulos, and G. Bokos. Evaluating authoritative sources using social networks: an insight from Wikipedia. Online Information Review, 2006.
[11] F. Orlandi and A. Passant. Enabling cross-wikis integration by extending the SIOC ontology. In 4th Semantic Wiki Workshop (SemWiki 2009). CEUR-WS, 2009.
[12] F. Orlandi and A. Passant. Semantic search on heterogeneous wiki systems. In International Symposium on Wikis (WikiSym 2010). ACM, 2010.
[13] S. Ram and J. Liu. Understanding the Semantics of Data Provenance to Support Active Conceptual Modeling, pages 17–29. LNCS, Springer Berlin / Heidelberg, 2007.
[14] S. Ram and J. Liu. A new perspective on semantics of data provenance. In First International Workshop on the Role of Semantic Web in Provenance Management (SWPM 2009), 2009.
[15] Y.L. Simmhan, B. Plale, and D. Gannon. A survey of data provenance techniques. Technical report, Computer Science Department, Indiana University, Bloomington, IN 47405, 2005.