<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Variable Attention and Variable Noise: Forecasting User Activity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cesar Ojeda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kostadin Cvejoski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafet Sifa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bauckhage</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The study of collective attention is of growing interest in an age where mass and social media generate massive amounts of often short-lived information. That is, the problem of understanding how particular ideas, news items, or memes grow and decline in popularity has become a central problem of the information age. Recent research efforts in this regard have mainly addressed methods and models which quantify the success of such memes and track their behavior over time. Surprisingly, however, the aggregate behavior of users over the various news and social media platforms where this content originates has largely been ignored, even though the success of memes and messages is linked to the way users interact with web platforms. In this paper, we therefore present a novel framework that allows for studying the shifts of attention of whole populations related to websites or blogs. The framework is an extension of the Gaussian process methodology, into which we incorporate regularization methods that improve prediction and model input-dependent noise. We provide comparisons with the traditional Gaussian process and show improved results. Our study on a real-world data set uncovers hidden patterns of user behavior.</p>
      </abstract>
      <kwd-group>
        <kwd>Gaussian process</kwd>
        <kwd>Regularization Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Over the last couple of years, so-called "question answering" (QA) sites have
gained considerable popularity. These are internet platforms where users pose
questions to a general population. Yahoo Answers, Quora, and the Stack
Exchange family establish internet communities which provide natural and seamless
ways of organizing and providing knowledge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. So far, dynamical aspects of
such question answering sites have been studied in different contexts. Previous
work in this area includes studying causality aspects through quasi-experimental
designs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], user churn analysis through classification algorithms such as support
vector machines or random forests [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and predictions of the future value of
question-answer pairs according to the initial activity of the question post [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
In contrast to previous work where the long-term activity of users is predicted,
our focus in this paper is time series analysis related to user-defined tags. This
approach allows for a detailed daily analysis of the behavior of users, and we
concentrate on the QA site Stackoverflow. This platform has an established reputation
on the web and boasts a community of over 5 million distinct active users who,
so far, have provided more than 18 million answers to more than 11 million questions.
Thanks to the sheer size of the corresponding data set as well as the
regular activity of the user base, we are able to mine temporal data in order to
uncover defining aspects of the dynamics of the user behavior.
      </p>
      <p>
        Due to the complexity of user-system interaction (millions of people discuss
thousands of topics), exible and accurate models are required in order to
guarantee reliable forecasting. In recent years the Bayesian setting and the Gaussian
Process (GP) framework [
        <xref ref-type="bibr" rid="ref11 ref5">11, 5</xref>
        ] have been shown to provide an accurate and flexible
tool for time series analysis. In particular, the possibility of incorporating error
ranges, as well as of expressing different models through the selection of different kernels,
permits interpretability of the results. In this work, we model changes in attention as
variability in the fluctuation of the time series of occurrences of user-defined
tags, which can be categorized as a special case of heteroscedasticity, or
input-dependent noise. We provide an extension of sparse input Gaussian Processes [
        <xref ref-type="bibr" rid="ref14 ref15">15,
14</xref>
        ] which allows us to model functional dependence in the time variation of the
fluctuations. In practical experiments, we study the top 10 different tags of the
Stackoverflow data set over different years, spanning a data set of over 2.9 million
questions. We find that our models outperform predictions made by the simple
GP model under variable noise. In particular, we uncover weekly and seasonal
periodicity patterns as well as random behavior in monthly trends. All in all, we
are able to forecast the number of questions within a 5 percent error 20 days into
the future.
      </p>
      <p>In the next section, we formally introduce the Gaussian Process framework and
provide details regarding our extensions towards variable noise models. We then
show an analysis of the periodicity of the time series of tag activity as
apparent from the Stackoverflow data set. Next, we compare our prediction results
with those of other models and discuss the advantages of introducing functional
dependencies on noise terms. Finally, we provide conclusions and directions for
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>A Model for Time Series Analysis</title>
      <p>
        In this section, we propose a Gaussian process (GP) model for regression that
extends the sparse pseudo-input Gaussian process (SPGP) for regression [
        <xref ref-type="bibr" rid="ref14">14</xref>
          ].
Our model deals with the problem of overfitting that hampers the SPGP model
and makes it possible to analyze the function of the uncertainty added to
every pseudo-input. By analyzing this uncertainty function, we indirectly analyze the
effects of heteroscedastic noise.
      </p>
      <p>
        A GP is a Bayesian model that is commonly used for regression tasks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The
main advantages of this method are its non-parametric nature, the possibility
of interpreting the model through flexible kernel selection, and the confidence
intervals (error bars) obtained for every prediction. The non-parametric nature
of this method has a drawback, though: the computational cost of training
is $O(N^3)$, where $N$ is the number of training points. There are many sparse
approximations of the full GP that try to lower the computational cost
of training to $O(M^2 N)$, where $M$ is the size of the subset of the training
points used for the approximation (i.e., the active set) and typically $M \ll N$
[
        <xref ref-type="bibr" rid="ref12 ref13">13, 12</xref>
        ]. The $M$ points for the approximation are chosen according to various
information criteria. This leads to difficulties w.r.t. learning the kernel
hyperparameters by maximizing the marginal likelihood of the GP using gradient
ascent: the re-selection of the active set causes non-smooth fluctuations in the
gradients of the marginal likelihood, which likely results in convergence to
suboptimal local maxima [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Gaussian Process for Regression</title>
        <p>
          Next, we first briefly review the GP model for regression; for a detailed
discussion, we refer to [
          <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
          ].
        </p>
        <p>
          Consider a data set $D$ of size $N$ containing input vectors $X = \{x_n\}_{n=1}^N$
and real-valued targets $\mathbf{y} = \{y_n\}_{n=1}^N$. In order to apply the GP model to
regression problems, we need to account for noise in the available target values,
which are thus expressed as
        </p>
        <p>
          $$y_n = f_n + \epsilon_n,$$
where $f_n = f(x_n)$ and $\epsilon_n$ is a random noise variable which is drawn independently
for each $n$. We shall consider a noise process following a Gaussian
distribution defined as
$$p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma^2 I),$$
where $\mathcal{N}(\mathbf{y} \mid \mathbf{m}, C)$ is a Gaussian distribution with mean $\mathbf{m}$ and covariance $C$.
The marginal distribution of $\mathbf{f}$ is then given by another Gaussian distribution,
namely $p(\mathbf{f} \mid X, \theta) = \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K_N)$. The covariance function that determines
$K_N$ is chosen such that, if points $x_n$ and $x_m$ are similar, the
value $[K_N]_{nm}$ expresses this similarity. Usually, this property of the
covariance function is controlled by a small number of hyperparameters $\theta$.
        </p>
        <p>
          Integrating out over $\mathbf{f}$, we obtain the marginal likelihood as
$$p(\mathbf{y} \mid X, \theta) = \int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid X, \theta)\, d\mathbf{f} = \mathcal{N}(\mathbf{y} \mid \mathbf{0},\, K_N + \sigma^2 I).$$
For a new input $x_*$, the resulting predictive distribution is Gaussian with mean and variance
$$\mu_* = \mathbf{k}_*^\top (K_N + \sigma^2 I)^{-1} \mathbf{y}, \qquad \sigma_*^2 = K_{**} - \mathbf{k}_*^\top (K_N + \sigma^2 I)^{-1} \mathbf{k}_* + \sigma^2, \quad (5)$$
where $[\mathbf{k}_*]_n = K(x_n, x_*)$ and $K_{**} = K(x_*, x_*)$. In order to predict with the GP model,
we need to have all the training data available at run-time, which is why
the GP for regression is referred to as a non-parametric model.
        </p>
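        <p>To make Eq. (5) concrete, here is a minimal NumPy sketch of exact GP prediction; the RBF kernel, its hyperparameters, and the toy data are our own illustrative assumptions, not the experimental setup of this paper:</p>
        <preformat>
import numpy as np

def rbf(A, B, length_scale=1.0, variance=1.0):
    # squared-exponential kernel: variance * exp(-|a - b|^2 / (2 l^2))
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, y, X_star, noise=0.1):
    # exact GP predictive mean and variance, Eq. (5); O(N^3) in the training size
    K = rbf(X, X) + noise ** 2 * np.eye(len(X))
    k_star = rbf(X, X_star)                 # [k_*]_n = K(x_n, x_*)
    alpha = np.linalg.solve(K, y)           # (K_N + sigma^2 I)^{-1} y
    mu = k_star.T @ alpha                   # predictive mean
    v = np.linalg.solve(K, k_star)
    var = rbf(X_star, X_star).diagonal() - np.sum(k_star * v, axis=0) + noise ** 2
    return mu, var

# toy usage: a noisy 5-day cycle observed for 50 days, predicted 20 days ahead
X = np.linspace(0.0, 49.0, 50)
y = np.sin(2 * np.pi * X / 5.0) + 0.1 * np.random.randn(50)
mu, var = gp_predict(X, y, np.linspace(50.0, 69.0, 20))
        </preformat>
        <p>
          The $O(N^3)$ linear solves against $K_N + \sigma^2 I$ in this sketch are exactly the cost that the sparse approximations discussed next try to avoid.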
An approximation of the full GP model for regression is presented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], in which
the authors propose the sparse pseudo-input Gaussian process (SPGP) regression
model that enables a search for the kernel hyperparameters and the active set
in a single joint optimization process. This is possible because the
active set (the $M$ pseudo-inputs) is allowed to take any position in the data space,
not only to be a subset of the training data. Parameterizing the covariance function
of the GP by the pseudo-inputs makes it possible to learn the
pseudo-inputs using gradient ascent. This is a major advantage, because it improves the
model fit by fine-tuning the locations of the pseudo-inputs. Let $\bar{X} = \{\bar{x}_m\}_{m=1}^M$
be the pseudo-inputs and $\bar{\mathbf{f}} = \{\bar{f}_m\}_{m=1}^M$ the pseudo-targets;
the predictive distribution of the model for a new input $x_*$ is then given by
        </p>
        <p>
          $$p(y_* \mid x_*, D, \bar{X}) = \int p(y_* \mid x_*, \bar{X}, \bar{\mathbf{f}})\, p(\bar{\mathbf{f}} \mid D, \bar{X})\, d\bar{\mathbf{f}}, \quad (6)$$
where $K_N$ denotes the covariance matrix of the training data, $K_M$ the covariance
matrix of the pseudo-inputs, and $\sigma^2$ the noise variance; $Q_M$ and $\Lambda$ are defined as
$$Q_M = K_M + K_{MN} (\Lambda + \sigma^2 I)^{-1} K_{NM} \quad (7)$$
and
$$\Lambda = \mathrm{diag}\!\left(K_N - K_{NM} K_M^{-1} K_{MN}\right).$$
Finding the pseudo-input locations $\bar{X}$ and the hyperparameters (kernel
parameters and noise) $\theta = \{\theta_k, \sigma^2\}$ can be done by maximizing the marginal likelihood
(8) with respect to the parameters $\{\bar{X}, \theta\}$:
        </p>
        <p>
          $$p(\mathbf{y} \mid X, \bar{X}, \theta) = \int p(\mathbf{y} \mid X, \bar{X}, \bar{\mathbf{f}})\, p(\bar{\mathbf{f}} \mid \bar{X})\, d\bar{\mathbf{f}} = \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{0},\; K_{NM} K_M^{-1} K_{MN} + \Lambda + \sigma^2 I\right). \quad (8)$$
        </p>
        <p>
          One positive effect of the sparsity of the SPGP model is its capability of
learning data sets that have variable noise, where the term variable noise refers to
noise which depends on the input. However, it is important to note that this
capability is limited; an improvement of the SPGP model is presented in
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Introducing an additional uncertainty parameter $h_m$ for every pseudo-input
point makes the model more flexible and allows for improved representations of
heteroscedastic data sets. The covariance matrix of the pseudo-inputs is redefined
as
        </p>
        <p>
          $$K_M \rightarrow K_M + \mathrm{diag}(\mathbf{h}), \quad (9)$$
where $\mathbf{h}$ is a positive vector of uncertainties that needs to be learned and $\mathrm{diag}(\mathbf{h})$
is a diagonal matrix whose elements are those of the vector $\mathbf{h}$. This
extension makes it possible to gradually modulate the influence of the pseudo-inputs:
if the uncertainty $h_m = 0$, then pseudo-input $m$ behaves as in the
standard SPGP, yet as $h_m$ grows, that particular pseudo-input has less influence
on the predictive distribution. This possibility of partially turning off the pseudo-inputs
allows a larger noise variance in the prediction. The authors of [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] refer
to this heteroscedastic (input-dependent noise) extension as SPGP+HS.
        </p>
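        <p>As a compact sketch of how Eqs. (8) and (9) fit together, consider the following NumPy implementation of the SPGP+HS negative log marginal likelihood; this is our illustration (a practical implementation would use the matrix inversion lemma to keep training at $O(NM^2)$ instead of forming the full $N \times N$ covariance):</p>
        <preformat>
import numpy as np

def spgp_hs_nlml(X, y, X_bar, h, noise, kern):
    # kern(A, B): covariance matrix between two sets of inputs
    # h: per-pseudo-input uncertainty vector; h = 0 recovers the standard SPGP
    KM = kern(X_bar, X_bar) + np.diag(h)            # Eq. (9): K_M + diag(h)
    KMN = kern(X_bar, X)
    V = np.linalg.solve(KM, KMN)                    # K_M^{-1} K_MN
    # Lambda = diag(K_N - K_NM K_M^{-1} K_MN)
    lam = np.diag(kern(X, X)) - np.sum(KMN * V, axis=0)
    # covariance of Eq. (8): K_NM K_M^{-1} K_MN + Lambda + sigma^2 I
    C = KMN.T @ V + np.diag(lam) + noise ** 2 * np.eye(len(X))
    _, logdet = np.linalg.slogdet(C)
    alpha = np.linalg.solve(C, y)
    return 0.5 * (logdet + y @ alpha + len(X) * np.log(2.0 * np.pi))
        </preformat>
        <p>Minimizing this quantity w.r.t. the pseudo-inputs, $\mathbf{h}$, and the kernel hyperparameters (e.g., with a gradient-based optimizer) performs the joint optimization described above.</p>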
      </sec>
      <sec id="sec-2-2">
        <title>SPGP+FUNC-HS</title>
        <p>Introducing the heteroscedastic extension to the SPGP empowers the model
to learn from data sets with varying noise. However, making the model this
flexible may cause overfitting. Moreover, using the SPGP+HS to predict
user and website activities does not allow us to interpret the behavior of the
noise, because the noise is represented as a positive vector $\mathbf{h}$ of uncertainties, and
attempts at interpreting these values do not yield meaningful information about
the behavior of the noise.</p>
        <p>One way of solving the problems of overfitting and lack of interpretability would
be to place a prior distribution over the vector $\mathbf{h}$ of uncertainties. However, taking
this approach leads to computationally intractable integrals.</p>
        <p>The solution which we propose for these problems is to make use of an
uncertainty function that depends on the pseudo-inputs. Our covariance function of
the pseudo-inputs is defined as</p>
        <p>
          $$K_M \rightarrow K_M + \mathrm{diag}\!\left(f_h(\bar{x}_m)\right), \quad (10)$$
where $f_h$ is the uncertainty function and $\bar{x}_m$ is a pseudo-input. By defining the
heteroscedastic extension in this way, the parameters of the
uncertainty function can be learned by gradient-based maximum likelihood
approaches. Hence, later on, we are able to interpret the parameters of the
heteroscedastic noise function as the parameters that govern the noise in the model.
Another advantage of having a heteroscedastic function is that it restricts the
parameter search space when learning the model. This restriction can be
beneficial because it removes unnecessary local maxima,
which results in much faster convergence when learning the model and also in
improved chances of reducing overfitting. In the following, we refer to our
new heteroscedastic function model as SPGP+FUNC-HS.</p>
        <p>For modeling the Stackoverflow data set, we introduce two heteroscedastic noise
functions. In general, we may use any function that can describe the noise of the
given data set. The first heteroscedastic noise function which we consider is the
simple sine function defined by
$$f_h(\bar{x}_m) = a \sin(2\pi \omega \bar{x}_m + \varphi), \quad (11)$$
where $a$ is the amplitude, $\omega$ is the frequency, and $\varphi$ is the phase. We refer to
this model as SPGP+SIN-HS. The second heteroscedastic noise function we
investigate is the product of a sine function and an RBF kernel, namely
$$f_h(\bar{x}_m; h_m) = c^2\, e^{-\frac{(\bar{x}_m - h_m)^2}{2 l^2}} \sin(2\pi \omega \bar{x}_m + \varphi), \quad (12)$$
where $c$ is the variance, $h_m$ is a mean associated with every pseudo-input $\bar{x}_m$ in
the RBF kernel, and $l$ is the length scale of the RBF kernel. The mean of the RBF
kernel can be initialized at random or set by the user if the user has corresponding
prior knowledge. Setting a mean for every pseudo-input point divides the whole
input space into regions where, in each region, we have a function governing
the uncertainty associated with every pseudo-input. An uncertainty function
defined like this behaves like a mixture of experts, and we refer to this model
as the SPGP+RBFSIN-HS model.</p>
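        <p>Both noise functions translate directly into code. A small sketch (our illustration; the parameter values are placeholders, and keeping the added diagonal non-negative is an implementation detail not discussed above):</p>
        <preformat>
import numpy as np

def fh_sin(x_bar, a, omega, phi):
    # Eq. (11): sinusoidal uncertainty over the pseudo-inputs (SPGP+SIN-HS)
    return a * np.sin(2 * np.pi * omega * x_bar + phi)

def fh_rbfsin(x_bar, h, c, ell, omega, phi):
    # Eq. (12): sine windowed by an RBF centered at the per-pseudo-input
    # means h (SPGP+RBFSIN-HS); each region of the input space gets its
    # own "expert" governing the uncertainty
    window = np.exp(-((x_bar - h) ** 2) / (2.0 * ell ** 2))
    return c ** 2 * window * np.sin(2 * np.pi * omega * x_bar + phi)

# Eq. (10): the uncertainty function enters the pseudo-input covariance as
#   K_M  ->  K_M + np.diag(fh_sin(x_bar, a, omega, phi))
# with (a, omega, phi) learned jointly with the other hyperparameters.
        </preformat>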
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>[Fig. 1. Periodogram of a tag time series: power spectral density, PSD [V**2/Hz], on a logarithmic scale versus time period in log(days).]</p>
      <p>In the previous section, we presented the Gaussian process method and two
extensions of this method, the SPGP+HS and the SPGP+FUNC-HS. In this
section, we present the results we obtained when using these models on our
Stackoverflow data set.</p>
      <p>In order to test our models, we used publicly available data dumps of
Stackoverflow (downloadable at www.archive.org/details/stackexchange). The data set
contains the number of questions and answers of postings
classified by tag for every business day. The models are trained on a data set
containing information about daily postings in the period from 01.02.2014 to
31.08.2014. The evaluation of the models is done on a test set containing postings
for the first 21 working days of September 2014.</p>
      <p>
        The performance of the presented models depends on the choice of the kernels
used for the covariance matrix. When working with GPs, an additional analysis
is required to select proper kernels for the covariance matrix. Because we work
with a data set that reflects user behavior, we hypothesized that it may show a form
of periodicity in the behavior of the users. Accordingly, we performed a spectral
density estimation analysis [
        <xref ref-type="bibr" rid="ref4 ref6 ref7">6, 4, 7</xref>
        ] of the time series using a periodogram
[
        <xref ref-type="bibr" rid="ref4 ref6 ref7">6, 4, 7</xref>
        ]. This analysis shows the power (amplitude) of the time series as a function
of frequency, and in this way we are able to verify whether there are indeed periodicities
and, if so, at what frequencies they occur.
      </p>
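      <p>This analysis is straightforward to reproduce. A sketch using SciPy (the input file is hypothetical; only the periodogram call matters):</p>
      <preformat>
import numpy as np
from scipy.signal import periodogram

# daily question counts for one tag, one value per business day
counts = np.loadtxt("tag_daily_counts.txt")      # hypothetical input file
freqs, power = periodogram(counts, fs=1.0)       # fs = 1 sample per day
periods = 1.0 / freqs[1:]                        # skip the zero frequency
dominant = periods[np.argsort(power[1:])[::-1][:3]]
print("dominant periods (days):", dominant)      # expect peaks near 2.5 and 5
      </preformat>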
      <p>A periodogram of the time series data that we are analyzing is shown in Figure
1. Since all our tag-related time series have almost the same periodogram, we
only show one of them. For better interpretability, we converted the frequencies
into periods in order to observe after how many days the patterns repeat. There are two
apparent peaks, the first occurring at two and a half days and the second at five
days. The period of five days appears to be an echo of the two-and-a-half-day
period; we therefore dismiss the second period and only take into
account the first one. Additional characteristics of this data set are minor
irregularities and a long-term rising trend in the overall time series.
Given these observations, the models that show the best performance use a sum
of four kernels as their covariance function:
$$k(x, x') = k_1(x, x') + k_2(x, x') + k_3(x, x') + k_4(x, x'). \quad (13)$$
The question of how to choose these kernels and the particular role of each kernel
in the learned model will be discussed in the next section.</p>
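      <p>For illustration, Eq. (13) can be instantiated with off-the-shelf kernels; the concrete kernel types and initial hyperparameters below are our assumptions based on the patterns described here, not the paper's exact configuration:</p>
      <preformat>
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    ExpSineSquared, RationalQuadratic, WhiteKernel)

# k1: slow monthly trend, k2: 2.5-day weekly pattern,
# k3: seasonal variation, k4: day-to-day noise
kernel = (RationalQuadratic(length_scale=30.0, alpha=1.0)
          + ExpSineSquared(length_scale=1.0, periodicity=2.5)
          + ExpSineSquared(length_scale=10.0, periodicity=180.0)
          + WhiteKernel(noise_level=1.0))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
# usage: gp.fit(days[:, None], counts); gp.predict(future_days[:, None])
      </preformat>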
      <p>
        Next, we present the results achieved for the top ten tags according to the number
of posted questions and answers in the 2014 Stackoverflow data set. Table 1
presents the results of the different models for the posted-questions time series and
Table 2 presents the results of the different models for the posted-answers time
series. In order to compare the prediction models, we considered the following
measures:
– Mean Squared Error (MSE), which accounts for the accuracy of the prediction
at an unseen data point;
– Negative Log Predictive Distribution (NLPD), which provides a confidence
for the values predicted at an unseen data point;
– Negative Log Marginal Likelihood (NLML), which accounts for how well
the model fits the training data.
For the MSE and the NLPD measures, smaller values are better, and for the
NLML larger values are better. The best model for each tag has been chosen
using the Akaike information criterion (AIC) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We observe that models with
functional noise perform better on nine of the ten tags in the answer time series,
and on eight of the ten tags in the question time series. The superior performance
of the SPGP+FUNC-HS over the full GP can be attributed to the fact that the
data set contains variable noise. Note that for this data set, SPGP+FUNC-HS
performs better because of the sparsity of the model and the additional
functional noise that is added to the pseudo-inputs. SPGP+HS performs worse than
the best models because adding only a positive vector of uncertainty increases
the flexibility of the covariance function, which in the end can lead to
overfitting and convergence to bad local maxima. Using a functional noise constraint,
the optimization space shrinks, which implicitly prunes bad local maxima. The
drawback is that the function of the noise should follow the distribution of the
noise in the data set; otherwise the model will perform poorly. This is probably
the reason why the SPGP+FUNC-HS performs worse on one tag for the answers
and on two tags for the questions.
      </p>
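      <p>For reference, the selection criterion trades goodness of fit against the number of learned parameters; its standard definition (not restated in the original) is
$$\mathrm{AIC} = 2k - 2\ln\hat{L},$$
where $k$ is the number of model parameters and $\hat{L}$ is the maximized likelihood, the model with the smallest AIC being preferred.</p>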
      <p>In Fig. 2, we present two learned models, one for the tag "Java" (Fig. 2a) and
one for the tag "iOS" (Fig. 2b). We observe that the model for the Java
tag strives to predict the test points using the mean. In contrast, the model for
the iOS tag predicts the test points in terms of noise.</p>
      <p>[Fig. 2. Learned models for the tags "Java" (a) and "iOS" (b), showing training data, test data, and the predictive mean; the vertical axes give the number of visits per day.]</p>
      <p>
        The different kernels in Eq. (13) allow us to dissect the dynamical behavior of
the population w.r.t. different scales and patterns. In order to portray these
behaviors, we calculated the mean function and variance of Eq. (5) by generating
the vector $\mathbf{k}_*^\top$ from the individual kernels. We present the values of each
kernel on the "android" question data set in Fig. 3.
– Mean trends (Fig. 3a) characterize the behavior of the population of users
over scales measured in months and represent the global mean behavior of the
population. We hypothesize that they are driven by the sheer size of the user
base: the more people interested in the tag are visiting the site, the higher
the average number of questions per month. Furthermore, this overall trend might
represent changes in the dominance of this particular tag
in the data set. Because the tag refers to a programming language, trends like
this indicate changes in attention to the various languages. Such dynamics are
modeled using the rational quadratic kernel
$$k_1(x, x') = \sigma_1^2 \left(1 + \frac{(x - x')^2}{2 \alpha \ell^2}\right)^{-\alpha}.$$
      </p>
      <p>[Fig. 3. Kernel-wise decomposition of the model learned for the "android" question data set, showing training data and the per-kernel mean; the vertical axis gives the number of visits per day, starting in February.]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we addressed the problem of forecasting the daily posting
behavior of users of the Stackoverflow question answering web platform. In order to
accomplish this task, we extended the variable-noise pseudo-input Gaussian
Process framework by introducing a functional noise variant. The idea of using
functional descriptions of noise allowed us to study periodic patterns in collective
attention shifts and was found to act as a regularizer in model training.
Our extended Gaussian Process framework with functional representations of
various kinds of noise provides the added advantage of increased interpretability
of results, as the different kernels defined for this purpose can uncover different
kinds of dynamics. In particular, our kernels revealed major distinct
characteristics of the question answering behavior of users. First, there are major
trends on time scales of about six months showing growing and declining
interest in particular topics or corresponding tags. Second, these major trends
are perturbed by seasonal behavior; for example, overall activities usually drop
during the summer season. Third, on a fine-grained scale, there are weekly
patterns characterized by periods of 2.5 days. Fourth, there are noisy
fluctuations in activities on daily scales.</p>
      <p>Given the models and results presented in this paper, there are various directions
for future work. First and foremost, we are currently working on implementing
a distributed Gaussian Process framework in order to extend our approach
towards massive amounts of behavioral data (use of tags, comments, and likes)
that can be retrieved from similar social media platforms such as Twitter or
Facebook.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Adamic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bakshy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Ackerman</surname>
          </string-name>
          .
          <article-title>Knowledge Sharing and Yahoo Answers: Everyone Knows Something</article-title>
          .
          <source>In Proc. of ACM WWW</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huttenlocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <article-title>Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow</article-title>
          .
          <source>In Proc. of ACM KDD</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <source>Pattern Recognition and Machine Learning</source>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          .
          <source>Time Series Analysis</source>
          . Princeton University Press,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Plagemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pfaff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Burgard</surname>
          </string-name>
          .
          <article-title>Most Likely Heteroscedastic Gaussian Process Regression</article-title>
          .
          <source>In Proceedings of the 24th international conference on Machine learning</source>
          , pages
          <fpage>393</fpage>
          -
          <lpage>400</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Manolakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Ingle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Kogon</surname>
          </string-name>
          .
          <source>Statistical and Adaptive Signal Processing: Spectral Estimation, Signal Modeling, Adaptive Filtering, and Array Processing</source>
          . Artech House, Norwood,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C.</given-names>
            <surname>Ojeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sifa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bauckhage</surname>
          </string-name>
          .
          <article-title>Investigating and Forecasting User Activities in Newsblogs: A Study of Seasonality, Volatility and Attention Burst</article-title>
          . Work in Progress,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H.</given-names>
            <surname>Oktay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>Causal Discovery in Social Media Using Quasi-experimental Designs</article-title>
          .
          <source>In Proc. of ACM Workshop on Social Media Analytics</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Pudipeddi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Akoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          .
          <article-title>User Churn in Focused Question Answering Sites: Characterizations and Prediction</article-title>
          .
          <source>In Proc. of ACM WWW</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          .
          <article-title>Evaluation of Gaussian Processes and Other Methods for Nonlinear Regression</article-title>
          .
          <source>PhD thesis</source>
          , University of Toronto,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. K. I.</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>Gaussian Processes for Machine Learning</article-title>
          . MIT Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Seeger</surname>
          </string-name>
          .
          <article-title>PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>3</volume>
          :
          <fpage>233</fpage>
          -
          <lpage>269</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>M.</given-names>
            <surname>Seeger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          .
          <article-title>Fast Forward Selection to Speed Up Sparse Gaussian Process Regression</article-title>
          .
          <source>In Proc. of Workshop on Artificial Intelligence and Statistics</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>E.</given-names>
            <surname>Snelson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          .
          <article-title>Sparse Gaussian Processes Using Pseudo-inputs</article-title>
          .
          <source>In Proc. of NIPS</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>E.</given-names>
            <surname>Snelson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          .
          <article-title>Variable Noise and Dimensionality Reduction for Sparse Gaussian Processes</article-title>
          .
          <source>arXiv preprint arXiv:1206.6873</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>