<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Characterizing and Predicting Activity in Semantic MediaWiki Communities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Walk</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Strohmaier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Koblenz-Landau</institution>
          ,
          <addr-line>Koblenz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Information Systems and Computer Media, Graz University of Technology</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic MediaWikis represent shared and discretionary databases that allow a community of contributors to capture knowledge and to specify semantic features, such as properties for articles, relationships between articles, or concepts that filter articles for certain property values. Today, Semantic MediaWikis have received a lot of attention from a range of different groups that aim to organize an array of different subjects and domain knowledge. However, while some Semantic MediaWiki projects have been thriving, others have failed to reach critical mass. We have collected and analyzed a total of 79 publicly available Semantic MediaWiki instances to learn more about these projects and how they differ from each other. Further, we conducted an empirical analysis using critical mass theory on Semantic MediaWiki communities to investigate whether activity or the number of registered users (or a mixture of both) is important for achieving critical mass. In addition, we conduct experiments aiming to predict user activity and the number of registered users at certain points in time. Our work provides new insights into Semantic MediaWiki communities and how they evolve, and first insights into how they can be studied using critical mass theory.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Semantic MediaWikis are open repositories for structured data that can be edited
by a community of users who are interested in digitally modeling and
representing domains. These Wikis have been used to capture knowledge from a wide
variety of different domains, including, for example, beaches (http://beachapedia.org/),
games (http://nobbz.de/wiki/) or academic institutions (http://www.aifb.kit.edu/portal).</p>
      <p>Although Semantic MediaWikis have matured technologically, we still lack
a good understanding of the social processes behind them, e.g. why
some Semantic MediaWiki communities are thriving and others are failing to
reach critical mass. In this paper, we use principles of critical mass theory
to investigate activity and community growth in 79 publicly available
Semantic MediaWikis with the goal of identifying and comparing factors that directly
influence community growth and activity in said instances. In the context of
online platforms, critical mass is often referred to as the amount or number of
"something" (e.g., a feature or quality) that has to be reached for a system to
become self-sustaining [8–10]. In terms of Semantic MediaWiki communities we
want to know what this "something" is and whether it is the same as it is for other
systems and communities. In our empirical analysis we will look at activity, i.e.
the accumulated number of changes contributed by the corresponding
community to each Semantic MediaWiki at certain points in time. In addition, we will
study the role of community growth via the number of accumulated unique users
that have contributed to the Wikis at certain points in time. In particular, we
are going to investigate whether activity or community growth (or a mixture of
both) is important for achieving critical mass and predicting activity as well
as community growth in Semantic MediaWikis at certain points in time.
Answering these questions will fuel our understanding of how Semantic MediaWiki
communities operate and evolve over time.</p>
      <p>The remainder of this paper is structured as follows. In Section 2 we will
present related work as well as work that has inspired the analysis conducted
in this paper. A short characterization of the crawled Semantic MediaWiki
instances and a description of the used methods for our analyses can be found
in Section 3. The results and interpretations of our analyses are presented in
Section 4. We conclude this paper in Section 5 and highlight future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>The work presented in this paper builds upon work in the areas of critical mass
theory and collaborative ontology engineering.</p>
      <sec id="sec-2-1">
        <title>2.1 Critical Mass Theory</title>
        <p>In 1985, Oliver and colleagues [8–10] discussed and analyzed the concept of
critical mass theory by introducing so-called production functions to characterize
decisions made by groups or small collectives. Essentially, these production
functions represent the link between individual benefits and benefits for the group.</p>
        <p>
          They argue that when achieving critical mass of users, collective goods of
groups are limited, thus interest cannot be maintained longer than the limited
(collective) resource allows for. In the case of online communities, the collective
goods are not limited, theoretically allowing for an infinite increase in users.
However, without users motivated to contribute, interest will decrease and
critical mass will lose momentum and ultimately decelerate. In their work, three
different types of production functions are identified: accelerating, decelerating
and linear functions (see Figure 2). The idea behind accelerating production
functions is that each contribution is worth more than its preceding one. In a
decelerating production function the opposite would be the case, resulting in
each succeeding contribution being worth less than the preceding one. To
date, it is still mostly unclear what these production functions look like for online
communities and online production systems. Depending on the investigated or
desired point of view, different aspects of these communities and online
production systems can be used to calculate production functions. According to
Solomon and Wash [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] it is still unclear which features of an online
community characterize critical mass. One approximation they used was the activity
and number of users for calculating and predicting critical mass in traditional
WikiProjects. The authors argue that activity, for online production systems,
after certain amounts of time is the best indicator of a self-sustaining system.
In this work, we will adopt the same approach to characterizing critical mass
for Semantic MediaWikis. Having an accelerating production function for the
number of registered users and activity would indicate that users are not only interested
in the collective good (e.g., the WikiProject) but also contribute to it (measured
through activity). Achieving accelerating production functions for both of these
factors strongly promotes achieving critical mass. Once accelerating functions
are reached, critical mass is likelier to follow, as interest (and pay-off) increases
and user contributions rise, until the maximum potential of a system is reached.
        </p>
        <p>
          The analysis of Oliver and colleagues [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] also highlights that different
production functions can lead to very different outcomes in similar situations. For
example, given an accelerating production function, users who contribute to a
system are likely to find their potential contribution "profitable", as each
subsequent contribution increases the value of their own contribution. Naturally,
this increases the incentive to make larger contributions to begin with. Given a
decelerating production function, users would not immediately see the benefit
of large contributions, given that each subsequent contribution increases the
overall value less, while more effort, in the form of larger contributions, is needed
to turn a decelerating production function into an accelerating one.
        </p>
        <p>
          Raban et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] investigated factors that allow for a prediction of survival
rates for IRC channels and characterized the production function of these chat
channels as the best-fitting function for the curve that is generated when plotting
the number of unique users versus the number of messages posted at certain
(ascending) points in time.
        </p>
        <p>
          Cheng and Bernstein [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] have analyzed concepts of activation thresholds,
which resemble features that, when achieved, can help to reach and sustain
self-sustainability. They created an online platform that allows groups to pitch ideas,
which will only be activated if enough people commit to them.
        </p>
        <p>
          Recently, Ribeiro [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] conducted an analysis of the daily number of active
users that visit specific websites, fitting a dynamic model that allows predicting whether
a website has reached self-sustainability, defined through the shape of the curve of
the daily number of active users over time. He uses two constants α and β, where
α represents the constant rate of active members influencing inactive members to
become active and β describes the rate of an active member spontaneously becoming
inactive. Whenever β/α ≥ 1, a website is unsustainable and, without intervention,
the daily number of active users will converge to zero. If β/α &lt; 1 and the number
of daily active users is initially higher than the asymptotic one, a website is
categorized as self-sustaining.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Collaborative Ontology Engineering</title>
        <p>
          The Semantic Web community has developed a number of tools aimed at
supporting the collaborative development of ontologies. For example, Semantic
MediaWikis [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and some of their derivatives, such as OntoWiki and MoKi [
          <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
          ], add
semantic, ontology modeling and collaborative features to traditional MediaWiki
systems. In particular, OntoWiki represents a semantically enriched Wiki that
supports collaborative ontology engineering, focussing on the acquisition of
instance data and not the ontology or schema itself. MoKi is another collaborative
tool that is implemented as an extension of a MediaWiki, which has already
been deployed in a number of real world use cases.
        </p>
        <p>
          Gil et al. [
          <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
          ] empirically analyzed different aspects of 230 different instances
of Semantic MediaWikis, with a focus on the evolution of semantic features,
such as properties and concepts. Among other things, they found that in
the investigated Semantic MediaWiki instances, categories were still much more
popular than concepts. However, structured properties were used by all Wikis,
with a total of 50 instances exhibiting more than 100 defined properties.
        </p>
        <p>
          Protégé, and its extensions for collaborative development, such as WebProtégé
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and iCAT [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], are prominent stand-alone tools that are used by a large
community worldwide to develop ontologies in a variety of different projects.
        </p>
        <p>
          To learn more about the nature of the engineering processes that occur when
collaboratively developing an ontology, Poschko, Walk and colleagues [
          <xref ref-type="bibr" rid="ref12 ref19">12, 19</xref>
          ]
have created PragmatiX, a web-based tool to visualize and analyze a
collaboratively engineered ontology.
        </p>
        <p>
          Falconer et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] investigated the change-logs of collaborative
ontology-engineering projects, showing that users exhibit specific roles, which can be
used to group and classify users, when contributing to the ontology. Walk et
al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] applied Markov chains to the structured logs of changes of five
collaborative ontology-engineering projects to extract sequential patterns. Pesquita and
Couto [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] analyzed whether the location and specific structural features can be used to
determine if and where the next change is going to occur in a large biomedical
ontology. Strohmaier et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] investigated the hidden social dynamics when
collaboratively developing an ontology, providing new metrics to quantify
various aspects that characterize collaborative engineering processes. Wang et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]
used association-rule mining to analyze user editing patterns in collaborative
ontology-engineering projects.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 Materials &amp; Methods</title>
      <p>
        We first characterize activity and community growth of our collected
Semantic MediaWiki instances by applying principles of critical mass theory. We then
continue our analysis and investigate whether activity and community growth are good
predictors for determining the number of changes and users of Semantic
MediaWikis at certain points in time. We compare our results to what has been
uncovered by Solomon and Wash [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for WikiProjects, investigating whether the
number of users in the beginning stages of Semantic MediaWiki projects plays
an important role for predicting activity and community growth. To study these
effects in Semantic MediaWiki communities, we have crawled a total of 79
Semantic MediaWiki instances, all of which were publicly available at the time of
writing, with the exception of three Wikis (http://artfriendsgroup.com,
http://www.awaycity.com/wiki and http://enlloc.net/hkp/w) that have already been taken
offline.
      </p>
      <sec id="sec-4-1">
        <title>3.1 Semantic MediaWiki Datasets</title>
        <p>The datasets used for the analyses in this paper are all randomly selected from
different domains and vary in multiple aspects. Due to limitations in space we
provide a summary of descriptive statistics for the entirety of our 79 Semantic
MediaWikis in Table 1 (see http://www.simonwalk.at/wikis.html for a full list).
The number of users ranges from 1 to 85 for
our crawled Semantic MediaWiki instances, with a mean of 6.7 unique users
and a median of 2 users contributing to the different Wiki instances within the
first month of their existence. Similar observations can be made for activity in
Semantic MediaWiki communities. Initially we started our analysis with a little
over 110 instances. However, due to restrictions necessary for our analyses we
had to remove all Wikis with an observable lifespan of less than two years, explaining the
minimum duration of 113 weeks. After removing all instances that did not meet
the two-year requirement we ended up with a total of 79 Semantic MediaWiki
communities to investigate.</p>
        <sec id="sec-4-1-1">
          <title>Accumulated Activity over Time</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>Accumulated Number of Users over Time</title>
          <p>[Figure 1: (a) accumulated activity over time and (b) accumulated number of users over time, one line per Wiki instance; x-axes: weeks (0 to 500), y-axes: accumulated counts (log scale).]</p>
          <p>We have aggregated and accumulated activity and the number of users for
each week from the inception of each Semantic MediaWiki until the date of
the last observed change. The duration (observation period) of a Semantic
MediaWiki instance starts with the first and ends with the last change in our
datasets. Figure 1(a) depicts this accumulated activity per week for every
Semantic MediaWiki used in our analyses. Analogously, the accumulated number
of users per week for every Wiki instance in our dataset is shown in Figure 1(b).
The plots highlight the differences in observation lengths (x-axes), intensity of
activity as well as number of users (y-axes, log-scale). Note that the number of
users refers to all users that have contributed at least a single change. Anonymous
users are represented by their IP address and are not filtered. These differences
also indicate that finding features that are suitable for fitting a general
model to predict future information for Semantic MediaWiki communities is a
difficult task.</p>
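          <p>The weekly aggregation described above can be sketched as follows. This is a minimal illustration, not the crawler or pipeline used for the paper: the input format (a list of timestamp and user pairs) and the function name accumulate_weekly are assumptions. As in the text, anonymous users count as users via their IP address.</p>
          <preformat>
```python
from datetime import datetime


def accumulate_weekly(changes):
    """Return one (total changes, total unique users) pair per week.

    `changes` is a list of (timestamp, user) pairs; this input format is an
    assumption made for illustration. Week 0 starts at the first change and
    the last week contains the last observed change.
    """
    ordered = sorted(changes, key=lambda change: change[0])
    start = ordered[0][0]
    n_weeks = (ordered[-1][0] - start).days // 7 + 1
    changes_per_week = [0] * n_weeks
    new_users_per_week = [0] * n_weeks
    seen = set()
    for timestamp, user in ordered:
        week = (timestamp - start).days // 7
        changes_per_week[week] += 1
        if user not in seen:
            # A user is counted in the week of their first change.
            seen.add(user)
            new_users_per_week[week] += 1
    cumulative = []
    total_changes = 0
    total_users = 0
    for n_changes, n_users in zip(changes_per_week, new_users_per_week):
        total_changes += n_changes
        total_users += n_users
        cumulative.append((total_changes, total_users))
    return cumulative


# Hypothetical change log: two registered users and one anonymous IP address.
log = [
    (datetime(2012, 1, 2), "alice"),
    (datetime(2012, 1, 3), "192.0.2.7"),
    (datetime(2012, 1, 10), "alice"),
    (datetime(2012, 1, 24), "bob"),
]
weekly = accumulate_weekly(log)
```
          </preformat>
          <p>Weeks without any change simply repeat the previous totals, which yields the one-data-point-per-week series that the later analyses build on.</p>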
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Critical Mass Theory</title>
        <p>We gathered the accumulated number of revisions and unique users after 1,
6, 12 and 24 months to determine the corresponding production functions for
each Semantic MediaWiki. As depicted in Figure 2, we plotted the accumulated
number of users and activity versus elapsed weeks (one data point per week) and
fitted a linear and a squared function. As described in Solomon and Wash [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ],
if the squared function is not statistically significantly different from the linear
function, the production function was classified as linear. If the difference is
significant, depending on the sign of the second coefficient, representing the
slope of the curve, we classified the production function as accelerating (positive
coefficient) or decelerating (negative coefficient).</p>
        <p>[Figure 2: sample fits of production functions, including (a) an accelerating production function ("Accelerating Growth") and a linear one ("Linear Growth"); x-axes: weeks, y-axes: accumulated percentage of revisions; one sample panel is labeled garetien.de.]</p>
        <p>To determine if and to what extent features of Semantic MediaWiki
communities are usable to determine the overall amount of activity and number of users
after two years, we fit multiple regression models to the extracted activity and
user data. To avoid any bias from differing overall timespans we use fixed
time intervals (1, 6 and 12 months) for extracting the input data for our regression
models. Thus, we collected the accumulated amount of activity and users per
week for each Semantic MediaWiki instance after 1, 6 and 12 months to predict
activity and the number of users after 24 months. Given that the extracted
activity and number of users data from our 79 Semantic MediaWiki instances is
over-dispersed, meaning that the variances are greater than the means (see
Table 1), and the distribution of our extracted Semantic MediaWiki values resembles
a negative binomial distribution, we cannot use a standard logistic regression
approach. Instead, we apply Negative Binomial Regression, which is used with
count data that cannot be smaller than 0 and follows a negative binomial
distribution, on our datasets.
        </p>
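        <p>The classification of production functions described above can be sketched in a few lines. The text does not state which significance test was used; as an assumption, this sketch applies a t-test to the coefficient of the squared term (for one added term this is equivalent to the F-test between the nested linear and squared models) with a large-sample critical value of 1.96. The function names and the synthetic series are illustrative only.</p>
        <preformat>
```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2.

    Returns the coefficients and the inverse of X'X, which is needed for
    the standard errors of the coefficients.
    """
    rows = [[1.0, x, x * x] for x in xs]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    # Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        head = aug[col][col]
        aug[col] = [v / head for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    inverse = [row[k:] for row in aug]
    beta = [sum(inverse[i][j] * xty[j] for j in range(k)) for i in range(k)]
    return beta, inverse


def classify_production_function(weeks, totals, t_crit=1.96):
    """Classify an accumulated series as linear, accelerating or decelerating.

    t_crit=1.96 is an assumed large-sample threshold (two-sided, 5% level).
    """
    scale = float(max(weeks))
    xs = [w / scale for w in weeks]  # rescale weeks to [0, 1] for stability
    beta, inverse = fit_quadratic(xs, totals)
    residuals = [y - (beta[0] + beta[1] * x + beta[2] * x * x)
                 for x, y in zip(xs, totals)]
    dof = len(weeks) - 3
    sigma2 = sum(e * e for e in residuals) / dof
    t_stat = beta[2] / math.sqrt(sigma2 * inverse[2][2])
    if abs(t_stat) > t_crit:
        return "accelerating" if beta[2] > 0 else "decelerating"
    return "linear"


# Synthetic weekly series with a small deterministic disturbance.
weeks = list(range(50))
noise = [((7 * w) % 5) - 2 for w in weeks]
linear_totals = [3 * w + n for w, n in zip(weeks, noise)]
accelerating_totals = [0.5 * w * w + n for w, n in zip(weeks, noise)]
decelerating_totals = [60 * w - 0.5 * w * w + n for w, n in zip(weeks, noise)]
```
        </preformat>
        <p>Rescaling the weeks does not change the sign of the squared coefficient or its t-statistic, so the classification is unaffected; it only keeps the normal equations well conditioned.</p>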
        <p>For each dependent variable, we fit three negative binomial
regression models, each using input data (activity and number of users) from
inception up to different points in time (1, 6 and 12 months). The data points are
collected every week and represent the independent regression model variables
with their corresponding interaction terms. The dependent variables that we
want to predict are (i) the accumulated number of changes after two years and (ii)
the accumulated number of users after two years for each Semantic MediaWiki
instance.</p>
        <sec id="sec-4-2-1">
          <title>Evolution of Activity−Based Production Functions</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>Evolution of User−Based Production Functions</title>
          <p>[Figure 3: number of Wikis per production function type (linear, accelerating, decelerating) at the different points in time, for activity-based and user-based production functions.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Results</title>
      <p>In this section we present the results of the different analyses described in
Section 3.</p>
      <sec id="sec-5-1">
        <title>4.1 Critical Mass Theory Results</title>
        <p>The number of Wikis classified according to the corresponding production
functions can be seen in Figure 3. The further a Wiki progresses, the less likely it is
to be classified with a linear growth function, both for user diversity and activity,
evident in the decreasing linear lines in Figure 3. For the investigated Semantic
MediaWikis the (significant) production functions for activity and number of
users exhibit a Pearson correlation coefficient of 0.75, indicating that the two
production functions are correlated for each Wiki. This observation is also
evident in Figure 4, showing that the majority of Wikis, after two years, exhibit
either negative (lower left quadrant) or positive (upper right quadrant) user and
activity growth. As can be seen in the histograms, the values of the growth
coefficients are equally scattered around positive and negative values. The larger
the growth coefficient, the steeper the slope of the resulting production
function. We calculated a Pearson correlation coefficient between (significant) user
and activity-based growth coefficients of 0.75, indicating that critical mass for
Semantic MediaWiki communities is constituted by an immanent correlation of
the number of users and activity.</p>
        <p>The median R<sup>2</sup> values for the fitted functions of activity and number of users
at the different points in time range from 0.83 and 0.78 for the activity and
user-based production functions in the first month to an R<sup>2</sup> of 0.95 for both after 2
years. These observed R<sup>2</sup> values represent a (rather) good fit, which also becomes
more evident when looking at the sample fits in Figure 2 and the median R<sup>2</sup>
values with input data from inception until year one.</p>
        <p>To further characterize our investigated Semantic MediaWiki instances we
have plotted the user diversity and activity growth coefficients extracted from
the previously fitted production function models, using the accumulated
number of users and changes from inception until the second year, for each Wiki
individually. Figure 4 shows the different growth coefficients for all 79
Semantic MediaWiki instances, including information about the "intensity" of
the observed slopes. Circles represent Semantic MediaWiki instances where both
production functions were significantly different from a linear function. Triangles
depict Semantic MediaWiki instances with significant activity-based production
functions and linear user-based production functions. Crosses, analogously,
depict instances with significant user-based and linear activity-based production
functions. This means that circles in the top right quadrant are
Semantic MediaWiki communities that have an accelerating activity and user
diversity production function. We can also see that Semantic MediaWikis have
a tendency to exhibit the same production function for activity and user
diversity, evident in the number of circles in the upper right and lower left quadrant
of Figure 4. To strengthen our observation we calculated a Pearson correlation
coefficient of 0.75 for the different (significant) growth coefficient distributions.
Thus, critical mass might be a mixture of the number of users and activity.</p>
        <p>[Figure 4: user-based versus activity-based growth coefficients for all 79 Semantic MediaWiki instances (circles, triangles and crosses as described in the text), with histograms of the coefficient frequencies.]</p>
        <p>[Figure 5: transition matrices between production functions (from/to: linear, accelerating, decelerating) for (a) activity-based and (b) user-based production functions.]</p>
        <p>We have trained a (first-order) Markov chain model, using the chronologically
ordered sequences of extracted production functions after 1, 6, 12 and 24 months as
input, to analyze whether Semantic MediaWiki communities frequently switch
between production functions. For the user-based transition matrix (Figure 5(b))
accelerating and decelerating production functions tend to stay accelerating and
decelerating. Linear production functions have a higher tendency to either switch
to accelerating or stay linear than to become decelerating. The activity-based
production functions (Figure 5(a)) exhibit very strong tendencies to stay in the
same state (accelerating and decelerating). If a linear production function was
determined for a Wiki, it is similarly likely to continue to exhibit a linear
activity production function or switch to an accelerating production function, and
is most likely to switch to a decelerating production function. In general,
Semantic MediaWikis exhibit a high tendency to stick with their decelerating and
accelerating production functions.</p>
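        <p>For illustration, a first-order transition matrix of this kind can be estimated by counting consecutive state pairs and normalizing each row. The sketch below is not the implementation used for the paper; the function name and the two example sequences are hypothetical.</p>
        <preformat>
```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities.

    Each sequence lists the production function of one Wiki at consecutive
    measurement points (e.g. after 1, 6, 12 and 24 months).
    """
    counts = defaultdict(Counter)
    for sequence in sequences:
        # Count every pair of consecutive states.
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        # Normalize counts so that each row sums to one.
        matrix[state] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Hypothetical sequences for two Wikis.
sequences = [
    ["linear", "linear", "accelerating", "accelerating"],
    ["linear", "decelerating", "decelerating", "decelerating"],
]
matrix = transition_matrix(sequences)
```
        </preformat>
        <p>A row with a large diagonal entry corresponds to a production function that Wikis tend to keep, which is the pattern reported above for accelerating and decelerating functions.</p>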
        <p>For managers of Semantic MediaWikis, this would mean that they would
have to monitor both production functions and take action as soon as one of
them shows first signs of deceleration.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2 Factors that drive activity and user diversity</title>
        <p>Given the observations made with critical mass theory in Section 4.1 we fitted six
negative binomial regression models to predict the number of users and activity
after two years, using the gathered input data from 1, 6 and 12 months. This
method allows us to analyze whether activity (and the number of users) after 2 years can
best be explained by activity and/or the number of users at preceding points in
time. The models are described in more detail in Tables 2 and 3. The goodness
of fit for both models is described by the Akaike Information Criterion (AIC)
and allows for relative comparisons between the different models. The closer the
data that was used for fitting the models is to the target prediction time of two
years, the better the model fits the data, evident in (minimally) decreasing AIC
values.</p>
        <p>When using negative binomial regression to predict the amount of activity
after two years in Semantic MediaWiki communities, the models show
statistically significant effects for activity in all three models (1, 6 and 12 months)
on the amount of activity after two years, when holding the number of users
constant. When using the model fitted with data after 12 months to predict the
activity in a Semantic MediaWiki community (see Table 2) with 500 and 600
users, with an activity of 10,000 changes, we would expect to have 12,412 and
12,342 changes after two years, respectively. The fitted model clearly shows
that more users, in the case of our observed Semantic MediaWiki communities,
do not automatically mean an increase in activity after two years, which
contradicts our intuition after looking at the growth coefficients from the
critical mass theory results.</p>
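        <p>To make the role of the interaction term concrete, the following sketch writes out one plausible form of the fitted model. It assumes a log link and untransformed predictors, neither of which is stated in the text; the symbols A (accumulated activity at the input time t), U (accumulated users at t) and the coefficient names are introduced here for illustration only.</p>
        <preformat>
```latex
\[
\log \mathrm{E}[Y_{24}] = \beta_0 + \beta_A A_t + \beta_U U_t + \beta_{AU} A_t U_t
\]
\[
\frac{\mathrm{E}[Y_{24} \mid A_t,\, U_t + \Delta]}{\mathrm{E}[Y_{24} \mid A_t,\, U_t]}
  = \exp\bigl( (\beta_U + \beta_{AU} A_t)\, \Delta \bigr)
\]
```
        </preformat>
        <p>Under this assumed form, adding users while holding activity fixed multiplies the expected outcome by a constant factor. For the example above (100 additional users at 10,000 changes) that factor is 12,342 / 12,412, or about 0.994, so the combined user effect is slightly negative at that activity level.</p>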
        <p>Analogously, when holding activity at a constant level and predicting the
number of unique users (or user diversity) after two years in Semantic
MediaWikis (see Table 3), the number of users already present after 1, 6 and 12
months shows statistically significant effects on the number of users after
two years. After 12 months we can determine statistical significance for activity
and the (negative) interaction term as well. Similarly, when predicting the
number of users in our Semantic MediaWiki communities after two years, using the
fitted model after 12 months with 10,000 and 11,000 performed changes and 50
users, we would expect to have 99 and 101 users after two years. In contrast to
the previous prediction we can observe the positive (and statistically significant
for p &lt; 0.05) influence of activity on the number of users after 2 years.</p>
        <p>This means that, with a general model for Semantic MediaWiki
communities, activity after two years can be predicted by looking at the activity
after 1, 6 and 12 months. The number of users is not significant and, at least in
our fitted model, has a negative impact on activity. This would mean (according
to our model) that administrators and managers of Semantic MediaWikis should
try to get as much content as possible, as soon as possible, into their Wikis to
ensure later activity. Critical mass for activity at later stages in a Semantic
MediaWiki solely depends on activity in the beginning of a Wiki.</p>
        <p>To predict the number of users after 2 years, the number of users after 1,
6 and 12 months is a significant factor. From month 1 to month 12 we can
also observe significance for the interaction term, which further increases in
significance until activity becomes significant for the prediction at month 12.
For increasing the number of users in a Semantic MediaWiki community, both
the number of users and activity (after a year) have to exhibit a positive (and
significant) influence.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusions &amp; Future Work</title>
      <p>The main contribution of this work is the characterization of activity and number
of users using approaches of critical mass theory to gauge the viability of
Semantic MediaWiki communities. We have studied 79 Semantic MediaWiki projects
and their respective production functions over time. In addition, we have fitted
negative binomial regression models to predict activity and the number of users
after two years. Our approach is not specific to the projects under
investigation but can be applied to other (Semantic) MediaWiki projects or collaborative
online production systems at scale. In summary, we have found the following:</p>
      <p>Semantic MediaWikis exhibit a wide range of evolving production
functions: We have shown that the majority of observed Semantic MediaWikis
start off with linearly growing activity and numbers of users. This changes within
the first 6 to 12 months, which apparently also marks the timeframe in which
"something" determines whether a Wiki will exhibit accelerating, decelerating or linear
production functions after two years. At this point we leave it up to future work
to further investigate, analyze and determine these influential factors.</p>
      <p>Semantic MediaWikis suffer decaying information system
lifecycles: The results obtained from the critical mass analysis, as well as the
prediction experiment, suggest that Semantic MediaWikis are prone to suffer from
the vicious circles of decaying information systems. That is, Semantic
MediaWiki instances that exhibit a decelerating production function (user and/or
activity-based) are very likely to keep this decelerating production function,
resulting in either fewer active users or less activity, which in turn again triggers
less activity or fewer active users.</p>
      <p>Successful Semantic MediaWiki communities start small: Our
analysis suggests that the more content is produced by as few users as early as
possible, the likelier it is for (our observed) Semantic MediaWikis to reach
critical mass and exhibit the highest amount of activity after two years. This also
means that the higher the number of users that contribute to a Wiki early on,
the lower the amount of activity after two years is going to be. Surprisingly, after
12 months, the amount of activity becomes (positively) significant for the total
number of users after two years. This indicates that, after a certain amount of time
(12 months), high activity in a Semantic MediaWiki has a positive effect on
attracting more users.</p>
      <p>One hypothesis to explain our observations is that small groups around
structured data projects are usually much more focused and devoted, as they
need more background knowledge to contribute. This could imply that
they do not necessarily need to reach critical mass in terms of the number of users,
but only in terms of activity, as their interest in creating a structured
knowledge base already outweighs the effort of contributing.</p>
      <p>In summary, we believe that the work presented in this paper represents
an important first step towards a better understanding of the factors that drive
Semantic MediaWiki communities and their evolution. While our analysis has
so far been performed on 79 Semantic MediaWikis and has been limited to
user growth and activity, our method can be applied on a wider scale. Future
work might focus on investigating additional instances, semantic properties, the
evolution of the underlying knowledge base, different kinds of communities and
types of Semantic MediaWikis with different motivations and interests,
structural properties or additional dimensions of activity, such as passive usage logs
(where visits are studied in addition to edits) or different kinds of activities and
specific non-trivial phenomena, such as "edit wars", as well as other log data to
expand our understanding of social and community dynamics in such systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietzold</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Riechert</surname>
          </string-name>
          .
          <article-title>OntoWiki – A Tool for Social, Semantic Collaboration</article-title>
          .
          <source>In Proceedings of the 5th International Semantic Web Conference (ISWC</source>
          <year>2006</year>
          ),
          <source>volume LNCS 4273</source>
          , Athens, GA,
          <year>2006</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Justin Cheng and Michael S Bernstein.
          <article-title>Catalyst: Triggering collective action with thresholds</article-title>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Sean M.</given-names>
            <surname>Falconer</surname>
          </string-name>
          , Tania Tudorache, and Natalya Fridman Noy.
          <article-title>An analysis of collaborative patterns in large-scale ontology development projects</article-title>
          .
          <source>In Mark A. Musen and Óscar Corcho</source>
          , editors,
          <source>K-CAP</source>
          , pages
          <fpage>25</fpage>
          –
          <lpage>32</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Ghidini</surname>
          </string-name>
          , Barbara Kump, Stefanie Lindstaedt, Nahid Mahbub, Viktoria Pammer, Marco Rospocher, and
          <string-name>
            <given-names>Luciano</given-names>
            <surname>Serafini</surname>
          </string-name>
          .
          <article-title>MoKi: The Enterprise Modelling Wiki</article-title>
          . In Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren,
          <string-name>
            <given-names>Marta</given-names>
            <surname>Sabou</surname>
          </string-name>
          , and Elena Paslaru Bontas Simperl, editors,
          <source>Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications</source>
          <year>2009</year>
          , pages
          <fpage>831</fpage>
          –
          <lpage>835</lpage>
          , Berlin, Heidelberg,
          <year>2009</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Yolanda</given-names>
            <surname>Gil</surname>
          </string-name>
          , Angela Knight, Kevin Zhang, Larry Zhang, and
          <string-name>
            <given-names>Ricky J.</given-names>
            <surname>Sethi</surname>
          </string-name>
          .
          <article-title>An initial analysis of semantic wikis</article-title>
          . In Jihie Kim, Jeffrey Nichols, and Pedro A. Szekely, editors,
          <source>IUI Companion</source>
          , pages
          <fpage>109</fpage>
          –
          <lpage>110</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Yolanda</given-names>
            <surname>Gil</surname>
          </string-name>
          and
          <string-name>
            <given-names>Varun</given-names>
            <surname>Ratnakar</surname>
          </string-name>
          .
          <article-title>Knowledge capture in the wild: a perspective from semantic wiki communities</article-title>
          . In V. Richard Benjamins, Mathieu d'Aquin, and Andrew Gordon, editors,
          <source>K-CAP</source>
          , pages
          <fpage>49</fpage>
          –
          <lpage>56</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          , Denny Vrandečić, and Max Völkel.
          <article-title>Semantic MediaWiki</article-title>
          .
          <source>In Proceedings of the 5th International Semantic Web Conference 2006 (ISWC</source>
          <year>2006</year>
          ), pages
          <fpage>935</fpage>
          –
          <lpage>942</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Gerald</given-names>
            <surname>Marwell</surname>
          </string-name>
          , Pamela E Oliver, and
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Prahl</surname>
          </string-name>
          .
          <article-title>Social networks and collective action: A theory of the critical mass. III</article-title>
          .
          <source>American Journal of Sociology</source>
          ,
          <volume>94</volume>
          (
          <issue>3</issue>
          ):
          <fpage>502</fpage>
          –
          <lpage>534</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Pamela</given-names>
            <surname>Oliver</surname>
          </string-name>
          , Gerald Marwell, and
          <string-name>
            <given-names>Ruy</given-names>
            <surname>Teixeira</surname>
          </string-name>
          .
          <article-title>A theory of the critical mass. I. Interdependence, group heterogeneity, and the production of collective action</article-title>
          .
          <source>American Journal of Sociology</source>
          , pages
          <fpage>522</fpage>
          –
          <lpage>556</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Pamela E</given-names>
            <surname>Oliver</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerald</given-names>
            <surname>Marwell</surname>
          </string-name>
          .
          <article-title>The paradox of group size in collective action: A theory of the critical mass. II</article-title>
          .
          <source>American Sociological Review</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Catia</given-names>
            <surname>Pesquita</surname>
          </string-name>
          and Francisco M. Couto.
          <article-title>Predicting the extension of biomedical ontologies</article-title>
          .
          <source>PLoS Comput Biol</source>
          ,
          <volume>8</volume>
          (
          <issue>9</issue>
          ):e1002630,
          <year>09 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Jan Poschko, Markus Strohmaier, Tania Tudorache, and
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>Pragmatic analysis of crowd-based knowledge production systems with iCAT Analytics: Visualizing changes to the ICD-11 ontology</article-title>
          .
          <source>In Proceedings of the AAAI Spring Symposium 2012: Wisdom of the Crowd</source>
          ,
          <year>2012</year>
          .
          <comment>Accepted for publication</comment>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Daphne R.</given-names>
            <surname>Raban</surname>
          </string-name>
          , Mihai Moldovan, and
          <string-name>
            <given-names>Quentin</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>An empirical study of critical mass and online community survival</article-title>
          .
          <source>In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW '10</source>
          , pages
          <fpage>71</fpage>
          –
          <lpage>80</lpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          .
          <article-title>Modeling and predicting the growth and death of membership-based websites</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web, WWW '14</source>
          , pages
          <fpage>653</fpage>
          –
          <lpage>664</lpage>
          , Republic and Canton of Geneva, Switzerland,
          <year>2014</year>
          . International World Wide Web Conferences Steering Committee.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Solomon</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rick</given-names>
            <surname>Wash</surname>
          </string-name>
          .
          <article-title>Critical mass of what? Exploring community growth in WikiProjects</article-title>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Markus</given-names>
            <surname>Strohmaier</surname>
          </string-name>
          , Simon Walk, Jan Pöschko, Daniel Lamprecht, Tania Tudorache, Csongor Nyulas,
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Natalya F.</given-names>
            <surname>Noy</surname>
          </string-name>
          .
          <article-title>How ontologies are made: Studying the hidden social dynamics behind collaborative ontology engineering projects</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>20</volume>
          (
          <issue>0</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>T.</given-names>
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Falconer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. I.</given-names>
            <surname>Nyulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>Will Semantic Web technologies work for the development of ICD-11?</article-title>
          <source>In Proceedings of the 9th International Semantic Web Conference (ISWC</source>
          <year>2010</year>
          ),
          <source>ISWC (In-Use)</source>
          , Shanghai, China,
          <year>2010</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Tania</given-names>
            <surname>Tudorache</surname>
          </string-name>
          , Csongor Nyulas,
          <string-name>
            <given-names>Natalya F.</given-names>
            <surname>Noy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>WebProtégé: A Distributed Ontology Editor and Knowledge Acquisition Tool for the Web</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          /
          <year>2013</year>
          ):
          <fpage>89</fpage>
          –
          <lpage>99</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Simon</given-names>
            <surname>Walk</surname>
          </string-name>
          , Jan Pöschko, Markus Strohmaier, Keith Andrews, Tania Tudorache, Csongor Nyulas,
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Natalya F.</given-names>
            <surname>Noy</surname>
          </string-name>
          .
          <article-title>PragmatiX: An Interactive Tool for Visualizing the Creation Process Behind Collaboratively Engineered Ontologies</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Simon</given-names>
            <surname>Walk</surname>
          </string-name>
          , Philipp Singer, Markus Strohmaier, Tania Tudorache, Mark A Musen, and Natalya F Noy.
          <article-title>Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <year>January 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Hao</given-names>
            <surname>Wang</surname>
          </string-name>
          , Tania Tudorache, Dejing Dou, Natalya F Noy, and Mark A Musen.
          <article-title>Analysis of user editing patterns in ontology development projects</article-title>
          .
          <source>In On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          , pages
          <fpage>470</fpage>
          –
          <lpage>487</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>