=Paper=
{{Paper
|id=None
|storemode=property
|title=Enhancing MediaWiki Talk pages with Semantics for Better Coordination - A Proposal
|pdfUrl=https://ceur-ws.org/Vol-632/paper20.pdf
|volume=Vol-632
|dblpUrl=https://dblp.org/rec/conf/semwiki/SchneiderPB10
}}
==Enhancing MediaWiki Talk pages with Semantics for Better Coordination - A Proposal ==
Enhancing MediaWiki Talk pages with
Semantics for Better Coordination*
A Proposal
Jodi Schneider, Alexandre Passant, John G. Breslin**
Digital Enterprise Research Institute,
National University of Ireland, Galway
firstname.lastname@deri.org
Abstract. This paper presents a 15-item classification for MediaWiki
Talk page comments, associated with a new lightweight ontology that
extends SIOC to represent these categories. We discuss how this ontology,
expressed in RDFa, can enhance MediaWiki Talk pages, making the content of
such pages easier to parse and to understand.
Key words: MediaWiki, Wikipedia, Talk pages, RDFa, SIOC
1 Introduction
Wikis are often used for collaborative knowledge gathering and sharing, and
coordination of this work may take place on and off the wiki (e.g. [8]). How-
ever, finding relevant conversations may become more difficult as their volume
increases.
MediaWiki software¹, used by Wikipedia, Wikia², and other wikis, is one of
the most popular systems, and we focus on it throughout the paper. Article-level
coordination is common in MediaWiki; by default, MediaWiki installations
provide a Talk namespace. Each article links to a Talk page (originally empty),
which can be used to coordinate, discuss, and dispute the editing of that article.
Figure 1 shows a sample Talk page. Talk pages are heavily used (as we discuss
in Section 2.1), and some improvements to Talk pages have already been made
available as MediaWiki plugins³,⁴. We believe that Talk pages could benefit from
increased semantics.
As Talk pages grow, MediaWiki editors may benefit from tools to help identify
relevant comments. We provide sample RDFa markup for MediaWiki Talk
pages, using a lightweight ontology for Talk page comments which extends SIOC
[2]. This markup and ontology provide underlying metadata which could later
be used to highlight and query for certain types of Talk page comments.
Fig. 1. Talk page for the Semantic Web article in Wikipedia
* The work presented in this paper has been funded in part by Science Foundation
Ireland under Grant No. SFI/08/CE/I1380 (Líon-2).
** John G. Breslin is also a member of the School of Engineering and Informatics,
NUI Galway.
¹ http://www.mediawiki.org/
² http://www.wikia.com/
³ http://www.mediawiki.org/wiki/Extension:LiquidThreads
⁴ http://www.mediawiki.org/wiki/Category:Discussion_and_forum_extensions
In the remainder of the paper, we first review related work, then describe
15 categories used to classify comments on MediaWiki Talk pages. Next we
distill that classification system to a lightweight ontology for relevant Talk page
comments, which we use to mark up a Talk page segment in RDFa. Finally we
outline work in progress on leveraging this ontology with RDFa markup and
JavaScript- and SPARQL-based tools.
2 Related Work
2.1 Talk pages are heavily edited on Wikia and Wikipedia
Based on their studies of Wikia, Kittur & Kraut postulate that article talk
scales linearly with the size of the wiki [5]. They compare coordination and Talk
pages of Wikipedia and over 6000 Wikia wikis, finding differences which they
attribute to differences in community size and type.
Wikipedia’s Talk pages are heavily used, and in recent years, Talk pages have
been added more quickly than articles, growing at a rate of 11x, compared to
9x for articles [11]. Over a 2.5 year period, edits to Wikipedia Talk pages nearly
doubled, from 11% to 19% of all page edits, while article edits nearly halved
from 53% to 28% of all page edits [10]. Further, Wikipedia’s users make a larger
or smaller percentage of edits to Talk pages depending on their social roles [12].
2.2 Studies of Wikipedia Talk pages
While Wikipedia Talk pages have been studied from a content analysis, commu-
nications theory, and data mining perspective, further research is needed because
the variance between Talk pages is significant. For instance, the most common
type of discussion, coordination requests (described in Section 3 below), ranges
widely, from 2% to 97% of the comments on a page, depending on the page [11].
Due to the variance, perhaps it is not surprising that researchers do not agree on
the second most common type of discussion [3][11]. However, despite the evident
variance, few categorical differences between Talk pages have been identified or
systematically described. Furthermore, sample sizes for qualitative studies have
been small (see [10] for a comparison of Featured and non-Featured articles with
the largest sample size, 60 Talk pages). Other studies of Talk pages include [6],
[4], [1], and [3].
Viégas et al. [11] provide both a manual classification of 25 hand-selected Talk
pages, and a quantitative analysis, which reveals that articles with Talk pages
are more highly edited, and have more editors than articles without Talk pages.
In particular, “94% of the pages with more than 100 edits have related Talk
pages”. The dimensions used in their manual classification are further discussed
in Section 3, where they form the basis for our lightweight ontology.
3 Classifying comments in Wikipedia
Our classification began organically from the items in Talk pages we reviewed
for our content analysis [9]. These coalesced into a set of classifications, which we
then compared with the classification frameworks used in [11] and [10]. Since we
planned to develop an ontology for editors to apply to their own comments, the
directness of Viégas’ classifications suited us, especially since these had already
been used for at least two studies, and were very similar to our own classification.
By contrast, Stvilia classifies the possible information quality problems of
an article, so his classifications (such as cohesiveness and verifiability) require
more abstraction: they describe attributes of the article, not of the comment.
Further, some terms (such as semantic consistency and security) might not be
instantly accessible to the lay reader and wiki editor.
To update and extend Viégas’ analysis [11], we undertook a manual content
analysis [9] of Talk page comments, based on 100 Talk pages from five differ-
ent types of Wikipedia Talk pages. Our content analysis used 15 non-mutually-
exclusive classifications. First, we used the 11 classifications defined by Viégas
[11]; Table 1 shows definitions of each term, with examples taken from Wikipedia
Talk pages that we analyzed. To capture other features we were interested in,
we added 4 new, non-mutually-exclusive classifications as shown in Table 2.
We added these types because:
{| class="wikitable"
|+ Table 1. Viégas’ 11 types of Talk page comments [11]
! Classification !! Definition !! Example
|-
| Requests/suggestions for editing coordination
| Ideas, comments, or suggestions involving editing the article.
| Currently some of the refs are YYYY-MM-DD format and some are Month DD, YYYY. Which format do we want to standardize to?
|-
| Requests for information
| Questions asked by someone who doesn’t intend to edit the page.
| Where is Ligurian spoken in the Var ?
|-
| References to vandalism
| Mentions of vandalism.
| I’ve semi-protected the article for another week, the signal-to-noise ratio of the IP edits seemed too low.
|-
| References to wiki guidelines and policies
| References to guidelines and/or policies of this wiki.
| The section I removed had no sources / references - if you have sources they’re no good being kept a secret ;) WP:VERIFY, WP:CITE. Thanks/
|-
| References to internal wiki resources
| References to internal wiki resources such as diffs, Talk page discussions, old version of a page.
| Would it be a good thing to re-add the links that were taken off in August? Somebody made them into a template that was subsequently deleted. The edit to recover the old links is here: [6]
|-
| Off-topic remarks
| Remarks not relating to editing the article.
| PLATO IS THE BEST MAN ALIVE! LONG LIVE PLATO
|-
| Polls
| Formal proposals followed by statements such as Support and Oppose, with justifications.
| A month should be deleted from the “Deaths in [CURRENT YEAR]” page ONE WEEK after the month ends...
|-
| Requests for peer review
| Requests for peer review.
| Users hoping to elevate articles to featured status may solicit a peer review.[11]
|-
| Information boxes
| Special boxes with information, usually found at the top of a Talk page.
| See Fig. 2(a), which proposes and discusses a new info box for the Swine influenza article.
|-
| Images
| Images posted on the Talk page.
| See Fig. 2(b)
|-
| Other
| The sole exclusive category, describes items that don’t fit elsewhere.
| “This review is transcluded from Talk:Wiki/GA1. The edit link for this section can be used to add comments to the review.”
|}
{| class="wikitable"
|+ Table 2. Our 4 additional comment types for Talk pages
! Classification !! Definition !! Example
|-
| References to sources outside the wiki
| References to sources, including print and deep web resources, outside this wiki.
| Exclusive! Mighty Stef records football protest song”Hot Press. Not sure where to put it but I’ll leave it here as somebody might find it useful...
|-
| References to reverts, removed material, or controversial edits
| Discussions of reverts, removing material, or controversial edits.
| I noticed some people edit the page into what it will be in 10 minutes but someone is reverting it...just let it be.
|-
| Reference to edits the discussant made
| Applied when an editor discusses his/her own article edits on the Talk page.
| Added the About.com review since the review was part of the reception section.
|-
| Requests for help with another article, portal, etc.
| Solicitations for assistance elsewhere, or recruiting editorial help in the Talk page for another article.
| This is just to invite attention to the page Facebook statistics just created; of all interested editors. I have just placed a mergeto tag in it. Thanks.
|}
Fig. 2. Comments from the Swine influenza Talk page containing: (a) a proposed
infobox and (b) images.
– Sources are heavily discussed in Talk pages, and some comments seem to be
made solely to deposit a source. While many sources are on the open web
(and can be detected as external links), print resources, inexact references,
and deep web resources may also be provided.
– Disagreements about article content often take place in the context of reverts
to the page. Discussions about removing content or editing controversial
material may also take place on the Talk page before the article is edited.
– The Talk page may be used to notify other editors about a recent edit,
perhaps to provide further description, anticipate questions, or clarify that a
suggestion has been implemented. Editors may also explain their own edits
in discussions of reverts and edit wars.
– The Talk page is often seen as a site for communication with editors who
have interest in or knowledge about a given topic. Requests for help, like
Requests for information, draw on that perceived expertise.
4 A model for structuring wiki contributions
Based on the aforementioned 15 categories (11 from previous work plus the 4
that we introduced), we designed a lightweight vocabulary for annotating Talk
pages. The main purpose of this model is to categorize each comment in the wiki
page, so that, for example, one could immediately identify all the references to
vandalism, all the pages requiring help, or all the sources recommended on the
Talk page. This could be useful since editors may specialize, performing a certain
type of task repeatedly [12]. Categorization could also facilitate automatically
collating comments, for instance transcluding Requests for Information into a
more appropriate spot, such as the Wikipedia Reference Desk⁵ for that category.
To that end, we provide a model (applied to a Talk page in Fig. 3):
– using existing ontologies, namely FOAF and SIOC, to model the users, the
discussion topics (considered as SIOC threads), and the comments. Among
others, we reuse the sioct:WikiArticle class from the SIOC Types module
and the sioc:has_discussion property that was introduced by some of our
previous work regarding modeling wiki structure using semantics [7].
– providing new classes to represent some of the classifications introduced in
Section 3. We focused only on the requests and reference categories, for two
reasons. First, these are the ones that people might indicate when they add
new content (we will describe the process later). It is hard to imagine that
someone would mark their own comment as off-topic; however, labeling it a
“request for help” seems plausible. Second, these categories seem to be the
most relevant for querying and retrieving information.
In addition, further RDF properties could be used, e.g. from the Dublin
Core vocabulary. For instance, a ReferenceToEdit could specify a permalink
to the edit with dcterms:requires, and a ReferenceToSources could specify
the URI of a source with dcterms:references.
Our model, available at http://rdfs.org/sioc/wikitalk, then consists of:
– A class WikiDiscussionItem.
– Two classes, subclasses of the aforementioned one, named ReferenceItem
and RequestItem, for references and requests, respectively, that have various
subclasses as follows:
• For the ReferenceItem class:
◦ ReferenceToEdit;
◦ ReferenceToGuidelinesOrPolicies;
◦ ReferenceToInternalResources;
◦ ReferenceToRevertsOrControversialOrRemovedMaterial;
◦ ReferenceToSources;
◦ ReferenceToVandalism.
• For the RequestItem class:
◦ RequestEditingCoordination;
◦ RequestHelpElsewhere;
◦ RequestInfo;
◦ RequestPeer-review.
⁵ http://en.wikipedia.org/wiki/Wikipedia:Reference_desk
Fig. 3. Annotated Talk page
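The class hierarchy listed in this section can be written down as a small data structure. The following Python sketch is only an illustration: the class names and the namespace http://rdfs.org/sioc/wikitalk# are taken from the paper, while the dictionary layout and the helper names (descendants, leaf_classes, subclass_triples) are assumptions made for this example and are not part of the published ontology.

```python
# Illustrative sketch only: class names follow the list in Section 4;
# the helper functions and the dict layout are hypothetical.
WIKITALK_NS = "http://rdfs.org/sioc/wikitalk#"

# Parent class -> direct subclasses.
SUBCLASSES = {
    "WikiDiscussionItem": ["ReferenceItem", "RequestItem"],
    "ReferenceItem": [
        "ReferenceToEdit",
        "ReferenceToGuidelinesOrPolicies",
        "ReferenceToInternalResources",
        "ReferenceToRevertsOrControversialOrRemovedMaterial",
        "ReferenceToSources",
        "ReferenceToVandalism",
    ],
    "RequestItem": [
        "RequestEditingCoordination",
        "RequestHelpElsewhere",
        "RequestInfo",
        "RequestPeer-review",
    ],
}


def descendants(cls):
    """Return every class below `cls`, depth first."""
    found = []
    for child in SUBCLASSES.get(cls, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def leaf_classes():
    """Classes without subclasses: the ones an editor would choose from."""
    return [c for c in descendants("WikiDiscussionItem") if c not in SUBCLASSES]


def subclass_triples():
    """Yield (subclass URI, 'rdfs:subClassOf', parent URI) for the hierarchy."""
    for parent, children in SUBCLASSES.items():
        for child in children:
            yield (WIKITALK_NS + child, "rdfs:subClassOf", WIKITALK_NS + parent)


if __name__ == "__main__":
    print(len(leaf_classes()), "leaf classes")
    for triple in subclass_triples():
        print(triple)
```

Keeping the hierarchy explicit makes it straightforward to ask for "all references" or "all requests" later, since a query on ReferenceItem or RequestItem can be expanded to its subclasses.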
5 Providing and using the annotations
5.1 RDFa Markup
Using this model, we then describe the type(s) of each comment, and the struc-
tural connections between these comments in MediaWiki Talk pages using RDFa
markup. Here is an example before adding the markup (Listing 1.1), and after
(Listing 1.2). The extracted RDF is also provided in Listing 1.3.
<h2>
<span class="editsection">[<a href="/w/index.php?title=Talk:Semantic_Web&amp;action=edit&amp;section=2" title="Edit section: Opening sentence">edit</a>]</span>
<span class="mw-headline" id="Opening_sentence">Opening sentence</span>
</h2>
Could somebody please put examples of 'semantic web' immediately
after the opening sentence? Otherwise it just sounds a bit waffly
and, more importantly, the intelligent lay reader is lost. Thanks.
<a …>86.42.96.251</a> (<a href="/wiki/User_talk:86.42.96.251" title="User talk:86.42.96.251">talk</a>)
10:38, 30 March 2009 (UTC)
Listing 1.1. Example of a comment in a Talk page
<div xmlns:sioc="http://rdfs.org/sioc/ns#"
     xmlns:siocwt="http://rdfs.org/sioc/wikitalk#"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     about="#Opening_sentence" typeof="sioc:Thread"
     rel="sioc:has_container" href="/w/index.php?title=Talk:Semantic_Web">
<h2>
<span class="editsection">[<a …>edit</a>]</span>
<span class="mw-headline" id="Opening_sentence">Opening sentence</span>
</h2>
<p …>Could somebody please
put examples of 'semantic web' immediately after the opening
sentence? Otherwise it just sounds a bit waffly and, more
importantly, the intelligent lay reader is lost. Thanks.
<a …>86.42.96.251</a> (<a …>talk</a>) 10:38, 30 March 2009 (UTC)
</p>
</div>
Listing 1.2. Example of a comment in a Talk page, with RDFa markup
<#post_1> a siocwt:RequestEditingCoordination ;
  content:encoded """Could somebody please put examples of 'semantic web'
immediately after the opening sentence? Otherwise it just sounds
a bit waffly and, more importantly, the intelligent lay reader is
lost. Thanks.
<a …>86.42.96.251</a> (<a href="/wiki/User_talk:86.42.96.251" title="User talk:86.42.96.251">talk</a>)
10:38, 30 March 2009 (UTC)
"""^^rdf:XMLLiteral ;
  sioc:has_container <#Opening_sentence> .
<#Opening_sentence> a sioc:Thread ;
  sioc:has_container </w/index.php?title=Talk:Semantic_Web> .
Listing 1.3. Example of a comment in a Talk page, in Turtle (without prefixes)
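To make the step from annotated markup to triples more concrete, the following Python sketch shows how the structural statements of Listing 1.3 could be read off RDFa attributes. It is a deliberately simplified illustration and not a conforming RDFa processor: it only handles about, typeof and rel with href on the same element, and it ignores literals such as content:encoded. The attributes placed on the p element in SAMPLE are an assumption derived from the triples in Listing 1.3, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from elements that carry
    an `about` attribute. Only `typeof` and `rel` + `href` are handled; this
    is a teaching sketch, not a full RDFa implementation."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        if subject is None:
            return
        # typeof may list several classes separated by whitespace.
        for rdf_type in (attributes.get("typeof") or "").split():
            self.triples.append((subject, "rdf:type", rdf_type))
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            for predicate in rel.split():
                self.triples.append((subject, predicate, href))


# Assumed markup: the div attributes follow Listing 1.2; the p attributes
# are a guess that would yield the triples shown in Listing 1.3.
SAMPLE = """
<div about="#Opening_sentence" typeof="sioc:Thread"
     rel="sioc:has_container" href="/w/index.php?title=Talk:Semantic_Web">
  <p about="#post_1" typeof="siocwt:RequestEditingCoordination"
     rel="sioc:has_container" href="#Opening_sentence">
    Could somebody please put examples of 'semantic web' immediately
    after the opening sentence?
  </p>
</div>
"""


def extract_triples(html_text):
    """Parse `html_text` and return the collected triples in document order."""
    parser = SimpleRDFaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

A production tool would rely on a complete RDFa parser instead, since prefix resolution, nested subjects and literal values all need more careful handling than this sketch provides.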
5.2 Annotation and extraction tools
We are currently developing several services to provide and use the aforemen-
tioned annotations. First, we are creating two JavaScript plugins, an annotation
plugin and a highlight plugin. Then, we will also investigate the use of SPARQL-
based interfaces to query such annotations.
While editing the Talk page, an editor could use a JavaScript-based anno-
tation plugin to specify which of the 10 classifications of our ontology apply.
(Users do say that they are willing to choose the comment type.) The plugin
would then generate the applicable RDFa markup. The annotation plugin could
also get certain FOAF and SIOC attributes from the username or IP address.
The annotation plugin will also facilitate user testing with the Wikipedia com-
munity, which may lead to further refinement of the Wikitalk module and its
class labels, based on task-based evaluations with frequent wiki editors and other
user testing of the annotation process.
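The markup-generation step of the annotation plugin can be sketched as follows. This is a Python illustration of the idea rather than the JavaScript plugin under development: the function name annotate_comment, its parameters, and the exact attribute layout of the generated p element are assumptions chosen to be consistent with the triples in Listing 1.3. Because the classifications are not mutually exclusive, the sketch accepts several categories per comment.

```python
from html import escape

# The 10 selectable classes of the ontology (names from Section 4).
LEAF_CLASSES = frozenset({
    "ReferenceToEdit",
    "ReferenceToGuidelinesOrPolicies",
    "ReferenceToInternalResources",
    "ReferenceToRevertsOrControversialOrRemovedMaterial",
    "ReferenceToSources",
    "ReferenceToVandalism",
    "RequestEditingCoordination",
    "RequestHelpElsewhere",
    "RequestInfo",
    "RequestPeer-review",
})


def annotate_comment(comment_id, thread_id, categories, comment_html):
    """Wrap an already-rendered comment in a <p> with RDFa attributes.

    Hypothetical helper: `comment_id` and `thread_id` are fragment
    identifiers without the leading '#', `categories` is a list of class
    names, and `comment_html` is inserted unchanged.
    """
    if not categories:
        raise ValueError("at least one category is required")
    unknown = sorted(set(categories) - LEAF_CLASSES)
    if unknown:
        raise ValueError("unknown categories: " + ", ".join(unknown))
    types = " ".join("siocwt:" + name for name in categories)
    template = (
        '<p about="#{0}" typeof="{1}" '
        'rel="sioc:has_container" href="#{2}">{3}</p>'
    )
    return template.format(
        escape(comment_id, quote=True),
        types,
        escape(thread_id, quote=True),
        comment_html,
    )


if __name__ == "__main__":
    print(annotate_comment(
        "post_1",
        "Opening_sentence",
        ["RequestEditingCoordination"],
        "Could somebody please put examples of 'semantic web' "
        "immediately after the opening sentence?",
    ))
```

Rejecting unknown class names up front keeps malformed types out of the page, which matters because downstream tools select comments purely by these type names.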
So far we have created a plugin to use such annotations; relying on the RDFa
markup, it uses a JavaScript RDFa parser⁶ to parse a Talk page and to highlight
relevant comments on a single Talk page, based on the ontology category to which
they belong. We are currently evaluating this plugin and making improvements
based on user feedback.
A third application, based on SPARQL, will allow querying to get “views”
on top of MediaWiki pages. For example, the user could “find all references
to vandalism posted in the last 2 days” or “find all comments mentioning a
source outside Wikipedia”. SPARQL also opens up exciting possibilities, such as
automatically collating comments, for instance transcluding Requests for
Information into a more appropriate spot, such as (for Wikipedia) the Reference
Desk for that topic. This would enable new ways to automatically gather particular
kinds of comments and facilitate the coordination process in MediaWiki instances.
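The kind of "view" described above can be illustrated without a SPARQL endpoint. The Python sketch below filters already-extracted comment records by ontology class, including subclasses, and by date. It is an assumption-laden illustration: the record layout, the created field, the sample comments #post_2 and #post_3, and the function names are hypothetical; only the class names and the timestamp of #post_1 come from the paper.

```python
from datetime import datetime, timedelta

# Child class -> parent class, following the hierarchy in Section 4.
PARENT = {
    "ReferenceItem": "WikiDiscussionItem",
    "RequestItem": "WikiDiscussionItem",
    "ReferenceToEdit": "ReferenceItem",
    "ReferenceToGuidelinesOrPolicies": "ReferenceItem",
    "ReferenceToInternalResources": "ReferenceItem",
    "ReferenceToRevertsOrControversialOrRemovedMaterial": "ReferenceItem",
    "ReferenceToSources": "ReferenceItem",
    "ReferenceToVandalism": "ReferenceItem",
    "RequestEditingCoordination": "RequestItem",
    "RequestHelpElsewhere": "RequestItem",
    "RequestInfo": "RequestItem",
    "RequestPeer-review": "RequestItem",
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or is one of its (indirect) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def find_comments(comments, category, since=None):
    """Return comments typed with `category` (or a subclass of it),
    optionally restricted to those created at or after `since`."""
    return [
        c for c in comments
        if any(is_a(t, category) for t in c["types"])
        and (since is None or c["created"] >= since)
    ]


# Hypothetical sample data; only #post_1 mirrors the example in Listing 1.3.
COMMENTS = [
    {"id": "#post_1", "types": ["RequestEditingCoordination"],
     "created": datetime(2009, 3, 30, 10, 38)},
    {"id": "#post_2", "types": ["ReferenceToVandalism"],
     "created": datetime(2009, 3, 31, 9, 0)},
    {"id": "#post_3", "types": ["ReferenceToSources"],
     "created": datetime(2009, 3, 20, 15, 0)},
]

if __name__ == "__main__":
    now = datetime(2009, 4, 1, 12, 0)  # fixed "current time" for the example
    recent = find_comments(COMMENTS, "ReferenceToVandalism",
                           since=now - timedelta(days=2))
    print([c["id"] for c in recent])
```

In a SPARQL-based implementation the subclass expansion would typically be delegated to the query engine or to RDFS reasoning; here it is done by walking the PARENT map.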
6 Conclusion
Talk pages, as we have seen, are heavily used, making it challenging to find
relevant comments. To help address this need, we used a 15-item classification for
MediaWiki Talk page comments, extended from Viégas, and then developed a
new lightweight ontology extending SIOC to represent the relevant categories.
We then enhanced MediaWiki Talk pages with RDFa markup to indicate
comment types and structural elements. In ongoing and future work, that markup
can be extracted with JavaScript and SPARQL, making the content of such
pages easier to parse and to understand.
While the classifications in Tables 1 and 2 suit our immediate purpose, other
alternatives are possible. Different classifications aiming towards a different
ontology might focus more narrowly on the changes suggested (or indicated as
made) by each comment (see, e.g. Table 3 in Stvilia [10]). Alternatively, an
ontology dedicated to a particular wiki could be based on information quality
dimensions and editorial policies specific to that wiki. As our work progresses,
we will be guided by user evaluations, to discover which such approaches might
be beneficial for editors collaborating in wiki spaces.
⁶ http://www.w3.org/2001/sw/BestPractices/HTML/rdfa-bookmarklet/
References
1. Nicolas Bencherki and Jeanne d’Arc Uwatowenimana. Writing a Wikipedia ar-
ticle: Data mining and organizational communication to explain the practices by
which contributors maintain the article’s coherence. In Annual Meeting of the
International Communication Association, Montreal, Quebec, May 2008.
2. John G. Breslin, Andreas Harth, Uldis Bojars, and Stefan Decker. Towards
Semantically-Interlinked Online Communities. In The Semantic Web: Research and
Applications, Proceedings of the 2nd European Semantic Web Conference (ESWC
’05), number 3532 in LNCS, pages 500–514. Heraklion, Greece, 2005.
3. Katherine Ehmann, Andrew Large, and Jamshid Beheshti. Collaboration in con-
text: Comparing article evolution among subject disciplines in Wikipedia. First
Monday, 13(10), October 2008.
4. Sean Hansen, Nicholas Berente, and Kalle Lyytinen. Wikipedia as rational dis-
course: An illustration of the emancipatory potential of information systems. In
40th Annual Hawaii International Conference on System Sciences, 2007.
5. Aniket Kittur and Robert E. Kraut. Beyond Wikipedia: Coordination and conflict
in online production groups. In CSCW 2010. ACM, February 2010.
6. Travis Kriplean, Ivan Beschastnikh, David W. McDonald, and Scott A. Golder.
Community, consensus, coercion, control: cs*w or how policy mediates mass par-
ticipation. In Proceedings of the 2007 International ACM Conference on Supporting
Group Work, pages 167–176, Sanibel Island, Florida, 2007. ACM.
7. Fabrizio Orlandi and Alexandre Passant. Enabling cross-wikis integration by ex-
tending the SIOC ontology. In Proceedings of the Fourth Semantic Wiki Workshop
(SemWiki 2009), co-located with 6th European Semantic Web Conference (ESWC
2009), volume 464, Hersonissos, Heraklion, Crete, Greece, June 2009.
8. Christian Pentzold and Sebastian Seidenglanz. Foucault@Wiki first steps towards
a conceptual framework for the analysis of wiki discourses. In WikiSym ’06: Pro-
ceedings of the 2006 International Symposium on Wikis, 2006.
9. Jodi Schneider, Alexandre Passant, and John G. Breslin. A content analysis: How
Wikipedia talk pages are used. In WebScience 2010, Raleigh, North Carolina, April
2010. http://websci10.org/.
10. Besiki Stvilia, Michael B. Twidale, Linda C. Smith, and Les Gasser. Informa-
tion quality work organization in Wikipedia. Journal of the American Society for
Information Science and Technology, 59(6):983–1001, 2008.
11. Fernanda B. Viégas, Martin Wattenberg, Jesse Kriss, and Frank van Ham. Talk
before you type: Coordination in Wikipedia. In 40th Annual Hawaii International
Conference on System Sciences, pages 78–87, 2007.
12. Howard T. Welser, Dan Cosley, Gueorgi Kossinets, Austin Lin, Fedor Dokshin,
Geri Gay, and Marc Smith. Finding social roles in Wikipedia. In Proceedings of
the American Sociological Association 2008, Boston, MA, 2008.