<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Personalized Affect Response Model for Online News Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>K. Atarashi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Moriyama</string-name>
          <email>astg@complex.ist.hokudai.ac.jp</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oyama</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kurihara</string-name>
          <email>kuriharag@ist.hokudai.ac.jp</email>
        </contrib>
        <aff>Hokkaido University</aff>
        <aff>RIKEN AIP</aff>
      </contrib-group>
      <fpage>5</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>With the spread of smartphones and the diversification of Web services, individuals, companies, and organizations can send and receive various types of information to and from anywhere. Although information is provided by many types of media, articles and posts are the most common type: news articles, blog posts, social media posts, etc. Such articles and posts can create unexpected emotional responses in readers and, in the worst case, can spark a flame war. To avoid the risk of a flame war and to inform the recipient as intended, methods for personalized affect analysis have attracted attention. We present a model based on latent Dirichlet allocation for performing personalized affect analysis of news articles. Each article is assumed to have a distribution of topics, and each reader is assumed to have latent features that represent the strength of the effect of each topic on the reader. A reader responds to an article on the basis of the distribution of the topics it contains and the reader's latent features. Furthermore, the model leverages articles to which no readers have responded for training. The effectiveness of the proposed model was demonstrated using readers' responses to several online news articles, as collected through crowdsourcing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the spread of smartphones and the diversification of Web
services, individuals, companies, and organizations can send
and receive various types of information to and from
anywhere. Although information is provided by many types of
media, articles and posts are the most common type: news
articles, blog posts, social media posts, etc. Such articles and
posts can create unexpected emotional responses in readers
and, in the worst case, can spark a flame war.</p>
      <p>To avoid the risk of a flame war and to
inform the recipient as intended, methods for personalized
affect analysis, which predict the responses of a reader to an
article or post, have attracted attention. By modeling the
personal affect responses of a reader, one can recommend
articles that are beneficial for that reader, avoid
recommending ones that might make the reader feel bad, and avoid a flame
war by predicting the ratio of people who will feel bad
after reading the article. Although there are several methods
for performing affect analysis, many of them predict only
one affect for a given article or word [Maas et al., 2011;
Ptaszynski et al., 2013; Liu et al., 2003]. However, different
readers may have different responses to an article, and hence
an article that makes one reader feel bad may be beneficial to
another reader. (Author names are listed in alphabetic order; all
authors contributed equally.)</p>
      <p>We have developed a probabilistic generative model for
personalized affect analysis that combines the latent features
(factors) of news articles and their readers. It is an
extension of the latent Dirichlet allocation (LDA) model [Blei et
al., 2003]. Each article and its readers have their own latent
features. An article’s latent features can be interpreted as the
topics it includes. A reader’s latent features can be regarded
as parameters representing the strength of each topic’s effect
on the reader. Unlike other personal response models [Dawid
and Skene, 1979; Kajino et al., 2012; Koren et al., 2009;
Duan et al., 2014], our model uses the articles to which no
readers respond for training (i.e., for inferring the posterior
distributions) since the proposed model includes the
generating process of articles. The model’s effectiveness was
demonstrated on the task of predicting the affect responses, which
were collected by crowdsourcing, of readers of Japanese news
articles.</p>
      <p>In Section 2, we introduce the notation used. In Section 3, we
review the LDA model. In Section 4, we present our proposed
model for personalized affect analysis. We discuss related
work and the differences between the proposed model and
previous models in Section 5. We present and discuss our
experimental results in Section 6 and conclude in Section 7 with
a brief summary.</p>
    </sec>
    <sec id="sec-2">
      <title>Notation</title>
      <p>We denote the set of articles as D. We use N_d for the
number of words in the d-th article and w_{d,n} for the
n-th word in the d-th article. We denote the set of readers
as H and the set of affect responses as E. We
represent the affect responses of the h-th reader to the d-th article
as l_{h,d} ∈ {0, 1}^{|E|}. For e ∈ [|E|], l_{h,d,e} = 1 means that
the h-th reader feels the e-th affect for the d-th article, and
l_{h,d,e} = 0 means that the h-th reader does not. Note that we
assume that readers can feel multiple affects for an article;
i.e., Σ_{e=1}^{|E|} l_{h,d,e} can be higher than 1. Our goal is to develop
a model for personalized affect analysis, i.e., a model for
predicting responses l_{h,d,e} that are not observed (unknown).
In our scenario, therefore, not all readers read all the
articles. We represent the set of indices of readers that read and
respond to the d-th article by H_d and the set of observations
of responses {l_{h,d,e} | d ∈ [|D|], h ∈ H_d, e ∈ [|E|]} by L.</p>
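      <p>To make the notation above concrete, the following minimal Python sketch builds the objects just defined; the articles, readers, and responses here are hypothetical toy values, not the data used in the paper.</p>
      <preformat><![CDATA[
```python
# Toy illustration of the notation: articles D, affects E, responses
# l[(h, d)] as |E|-dimensional binary vectors, and responded-to sets H_d.
E = ["anger", "sadness", "joy"]              # set of affect responses
D = [["game", "win"], ["storm", "damage"]]   # two articles as word lists
N = [len(d) for d in D]                      # N_d: number of words in article d

# l[(h, d)][e] = 1 means reader h feels the e-th affect for article d.
# A reader can feel multiple affects, so a vector may sum to more than 1.
l = {
    (0, 0): [0, 0, 1],   # reader 0 feels joy for article 0
    (1, 1): [1, 1, 0],   # reader 1 feels anger and sadness for article 1
}

# H_d: indices of readers who responded to article d
# (not all readers read all articles).
H = {d: set() for d in range(len(D))}
for (h, d) in l:
    H[d].add(h)

multi = sum(l[(1, 1)])   # sum over e of l_{1,1,e}; can exceed 1
```
]]></preformat>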
    </sec>
    <sec id="sec-3">
      <title>Latent Dirichlet Allocation</title>
      <p>The LDA model [Blei et al., 2003] is a statistical model for
analyzing documents. Each document is assumed to have a
distribution of topics; i.e., each document can be represented
as a mixture of topics, and each topic has a distribution of
words. LDA is based on the assumption that each topic has a
word distribution and that the words in a document are
generated on the basis of this distribution and on the mixture-ratio
of topics in the document. The joint distribution of the LDA
is given by</p>
      <p>p(D, {φ_k}, {z_{d,n}}, {θ_d}; α, β) :=
∏_{k=1}^{K} p(φ_k; β)
∏_{d=1}^{|D|} [ p(θ_d; α) ∏_{n=1}^{N_d} p(z_{d,n}; θ_d) p(w_{d,n}; φ_{z_{d,n}}) ], (1)
where K is the number of topics, which is specified by the
user, α and β are the parameters of the Dirichlet distributions
p(θ_d; α) and p(φ_k; β), and {φ_k}, {z_{d,n}}, and {θ_d} are latent
variables. The process for generating D is as follows:</p>
      <p>1. For k = 1, ..., K: (a) Generate the word distribution for the
k-th topic: φ_k ~ Dir(φ_k; β).</p>
      <p>2. For d = 1, ..., |D|: (a) Generate the topic distribution for the
d-th document: θ_d ~ Dir(θ_d; α). (b) For n = 1, ..., N_d: generate a
topic z_{d,n} ~ Mult(θ_d) and a word w_{d,n} ~ Mult(φ_{z_{d,n}}).</p>
      <p>Our proposed model extends this process: in addition to the
documents, it generates the latent features of readers and the
readers' affect responses on the basis of those features and the
topic distributions of the articles.</p>
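      <p>The generating process above can be simulated directly; the following sketch uses NumPy with small assumed values for K, the vocabulary size, and the hyperparameters α and β, and is only an illustration of the process, not the implementation used in the paper.</p>
      <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)
K, V, n_docs, N_d = 3, 10, 2, 20   # topics, vocabulary, documents, words/doc
alpha, beta = 0.5, 0.5             # Dirichlet hyperparameters (assumed values)

# phi_k ~ Dir(beta): a word distribution for each of the K topics.
phi = rng.dirichlet(np.full(V, beta), size=K)

docs = []
for d in range(n_docs):
    theta = rng.dirichlet(np.full(K, alpha))   # theta_d ~ Dir(alpha)
    words = []
    for n in range(N_d):
        z = rng.choice(K, p=theta)             # z_{d,n} ~ Mult(theta_d)
        words.append(rng.choice(V, p=phi[z]))  # w_{d,n} ~ Mult(phi_{z_{d,n}})
    docs.append(words)

row_sums = phi.sum(axis=1)   # each topic's word distribution sums to 1
```
]]></preformat>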
    </sec>
    <sec id="sec-3a">
      <title>Proposed Model</title>
      <sec id="sec-3-2">
        <title>Generating Process of Documents and Affect Responses in Proposed Model</title>
        <p>We first explain our assumptions for personalized affect
analysis.</p>
        <p>Because the LDA model is simply a basic model for
analyzing documents, we extended it for personalized affect
analysis. Since the e-th affect response of reader h to the d-th
news article, l_{h,d,e}, is in {0, 1}, we assume that l_{h,d,e}
follows a Bernoulli distribution, Ber(l_{h,d,e}; p_{h,d,e}), where
p_{h,d,e} is the probability that l_{h,d,e} = 1. Because p_{h,d,e}
clearly depends on both the reader and the article, and each article
has a topic distribution θ_d in the LDA model, we assume that the
h-th reader has his or her own latent features for the e-th
affect response, η_{h,e} ∈ R^K, which represent the
characteristics of the reader, and that l_{h,d,e} is generated
from θ_d and those characteristics: p_{h,d,e} = ⟨θ_d, η_{h,e}⟩.
Because p_{h,d,e} must be greater than 0 and less than 1, we assume
that each η_{h,e,k} satisfies the same constraint and generate it as
η_{h,e,k} ~ Beta(η_{h,e,k}; γ_k). Then p_{h,d,e} = ⟨θ_d, η_{h,e}⟩
clearly satisfies the constraint because it is a convex combination,
with weights θ_d, of values in (0, 1).</p>
        <p>The joint distribution of the proposed model, which reflects
these assumptions, is given by
p(D, L, {φ_k}, {z_{d,n}}, {θ_d}, {η_{h,e}}; α, β, {γ_k}) :=
∏_{k=1}^{K} p(φ_k; β)
∏_{h=1}^{|H|} ∏_{e=1}^{|E|} ∏_{k=1}^{K} p(η_{h,e,k}; γ_k)
∏_{d=1}^{|D|} [ p(θ_d; α) ∏_{n=1}^{N_d} p(z_{d,n}; θ_d) p(w_{d,n}; φ_{z_{d,n}})
∏_{h∈H_d} ∏_{e=1}^{|E|} p(l_{h,d,e}; ⟨θ_d, η_{h,e}⟩) ]. (2)</p>
        <p>The process for generating D and {l_{h,d,e}} is as follows:</p>
        <p>1. For k = 1, ..., K: (a) Generate the word distribution for the
k-th topic: φ_k ~ Dir(φ_k; β). (b) For h = 1, ..., |H|: i. For
e = 1, ..., |E|: generate the k-th reader feature
η_{h,e,k} ~ Beta(η_{h,e,k}; γ_k).</p>
        <p>2. For d = 1, ..., |D|: (a) Generate the topic distribution for the
d-th document: θ_d ~ Dir(θ_d; α). (b) For n = 1, ..., N_d: generate a
topic z_{d,n} ~ Mult(θ_d) and a word w_{d,n} ~ Mult(φ_{z_{d,n}}).
(c) For each h ∈ H_d and e = 1, ..., |E|: generate the response
l_{h,d,e} ~ Ber(l_{h,d,e}; ⟨θ_d, η_{h,e}⟩).</p>
        <p>This is a natural extension of the LDA
model: the distribution of reader responses p(l_{h,d,e}) is
introduced naturally. As described in the next section, many of
the existing models for personal responses cannot use articles to
which no readers responded, i.e., unresponded-to articles,
for learning (i.e., inferring posterior distributions)
because they model only the generation of
reader responses p(l_{h,d,e}). On the other hand, our model can
leverage these unresponded-to articles for learning since it
includes the generating process of articles.</p>
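        <p>The additional steps of the proposed model can be sketched in the same way; the hyperparameter values below are assumed for illustration. The sketch also checks the point made above: because θ_d is a probability vector and every η_{h,e,k} lies in (0, 1), the inner product p_{h,d,e} = ⟨θ_d, η_{h,e}⟩ lies in (0, 1) and can parameterize a Bernoulli distribution.</p>
        <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(1)
K, n_readers, n_affects = 3, 4, 2
gamma = np.full(K, 2.0)   # Beta hyperparameters, one per topic (assumed values)

# eta[h, e, k] ~ Beta(gamma_k, gamma_k): reader h's latent feature for
# affect e and topic k; every entry lies strictly in (0, 1).
eta = rng.beta(gamma, gamma, size=(n_readers, n_affects, K))

theta_d = rng.dirichlet(np.full(K, 0.5))   # topic distribution of one article

# p_{h,d,e} = <theta_d, eta_{h,e}> is a convex combination of values in
# (0, 1), so it lies in (0, 1) and can parameterize a Bernoulli.
p = eta @ theta_d                 # shape (n_readers, n_affects)
l = rng.binomial(1, p)            # l_{h,d,e} ~ Ber(p_{h,d,e})

in_unit_interval = bool(np.all((0 < p) & (p < 1)))
```
]]></preformat>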
      </sec>
      <sec id="sec-3-3">
        <title>Inference of Posterior Distributions</title>
        <p>Given articles D and responses L, the goal of a user using
the proposed model is to obtain the posterior distributions of
topics p( d j D; L), words p( k j D; L), and reader
parameters p( h;e j D; L) for all d 2 [jDj], h 2 [jHj], e 2 [jEj],
and k 2 [K]. After inferring the posterior distributions, the
proposed model can predict unknown responses lh;d;e that are
not included in L; i.e., lh;d;e 2 Lu = flh;d;e j d 2 [jDj]; e 2
[jEj]; h 2 [jHj] n Hdg. Since the proposed model is an
extension of the LDA model, it can infer the posterior distributions
by using a variational Bayesian (VB) or Markov chain Monte Carlo (MCMC) method in a manner similar to that
of the LDA model. Generally, VB-based methods are faster
than MCMC-based ones but require model-specific
derivation and implementation of the algorithm, which are more
complex than those of MCMC-based methods. In our
experiments, we used the automatic differentiation variational
inference (ADVI) [Kucukelbir et al., 2017] method for inferring
the posterior distributions. The ADVI method enables the
inference of approximated (variational) posterior distributions
without model-specific complex derivation and
implementation of an algorithm. The approximated posterior
distribution is represented as a Gaussian distribution with conversion
of some parameters and is optimized by maximizing the
evidence lower bound using stochastic gradient ascent with the
reparameterization trick [Kingma and Welling, 2014]. After
the approximated posterior distributions have been inferred,
the user can easily compute the distribution of topics of
unknown articles.
</p>
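        <p>The reparameterization trick mentioned above can be illustrated on a toy problem: writing a Gaussian sample as z = μ + σε with ε ~ N(0, 1) makes a Monte Carlo objective differentiable with respect to μ. The sketch below minimizes E[z²] by stochastic gradient descent (rather than maximizing an ELBO) and is a generic illustration, not the ADVI procedure used in the experiments.</p>
        <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: minimize E_{z ~ N(mu, sigma^2)}[z**2] over mu.
# Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1), so the
# gradient with respect to mu can be estimated from the samples (2 * z).
mu, sigma, lr = 3.0, 0.5, 0.1
for step in range(200):
    eps = rng.standard_normal(64)     # noise samples
    z = mu + sigma * eps              # reparameterized samples
    grad_mu = np.mean(2.0 * z)        # Monte Carlo gradient estimate
    mu -= lr * grad_mu                # stochastic gradient step
```
]]></preformat>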
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>Various methods have been reported for affect analysis. Some
have been rule based. Liu et al. presented an affect
analysis model based on the Open Mind Common Sense database,
which is a generic common sense database [Liu et al., 2003].</p>
      <p>Ptaszynski et al. presented an affect analysis system for
Japanese narratives based on ML-Ask, which is an automatic
affect annotation tool using the Emotive Expression
Dictionary [Ptaszynski et al., 2013].</p>
      <sec id="sec-4-2">
        <title>Machine Learning Based Methods</title>
        <p>Some methods have been machine learning based. Tokuhisa et al. presented one that uses an
automatically obtained labeled dataset [Tokuhisa et al., 2008].
Maas et al. proposed a method for learning word
representations (word latent features) and a classifier for the affect
analysis of documents. These methods are not for
personalized affect analysis. Our proposed method does not introduce
latent features of words; it assumes that the affect responses
of readers depend only on the latent features of documents
and readers. Hence, our future work includes
extending the proposed model by introducing latent features
of words, as in [Maas et al., 2011].</p>
        <p>Probabilistic models that have been proposed in the
crowdsourcing field can be used to predict personal
responses [Dawid and Skene, 1979; Kajino et al., 2012; Kajino
et al., 2013; Welinder et al., 2010; Duan et al., 2014;
Whitehill et al., 2009].</p>
      </sec>
      <sec id="sec-4-3">
        <title>Learning from Crowdsourced Labels</title>
        <p>Crowdsourcing services have been used to collect labeled
data for supervised learning: a machine-learning user first
collects unlabeled data and then asks crowdsourcing workers
to label the data. Because crowdsourcing workers are not
professionals, the quality of their
work is generally low. Hence, many methods have been
developed for inferring true labels and worker ability from a
set of noisy item labels given by multiple workers [Dawid
and Skene, 1979; Kajino et al., 2012; Kajino et al., 2013;
Welinder et al., 2010; Duan et al., 2014; Whitehill et al.,
2009]. In addition to inferring unknown item true labels,
these methods can also be used to predict personal responses.</p>
        <p>The model presented by Dawid and Skene for aggregating
diagnoses from multiple doctors [Dawid and Skene, 1979]
has also been used for inferring true labels from a set of noisy
labels given by crowdsourcing workers.</p>
        <p>We call it the DS
model. It is based on the assumption that each doctor has his
or her own confusion matrix for diagnosis and that the doctor's
diagnoses (responses) depend on that confusion matrix
and the unknown true diseases of the patients. In the task of
aggregating noisy crowdsourced responses, doctors, patients,
and diagnoses correspond to workers, items (data), and
responses, respectively. There are three main differences
between the proposed model and the DS model. (i) The DS
model is based on the assumption that each item (document)
has one true label (true affect response) while the proposed
model does not (it is not reasonable to assume the existence of
a true affect response). (ii) The DS model does not consider
the characteristics of each item because responses depend
on only the true labels and worker characteristics. (iii) The
proposed model introduces the generation of articles (items)
while the DS model does not, so the proposed model can
leverage unresponded-to articles for learning while the DS
model can neither leverage nor predict the responses of
workers to unresponded-to items.</p>
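        <p>The DS model's response mechanism described above can be sketched as sampling from a per-worker confusion matrix; the matrix values below are illustrative. The response depends only on the item's true label and the worker's confusion matrix, with no item features, which is difference (ii) above.</p>
        <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)

# Each worker has a confusion matrix: row = true label, column = reported
# label. The probabilities here are illustrative, not estimated values.
confusion = np.array([[0.9, 0.1],    # P(report | true label = 0)
                      [0.2, 0.8]])   # P(report | true label = 1)

def ds_response(true_label, confusion, rng):
    """A worker's response depends only on the item's true label and the
    worker's confusion matrix; item features play no role."""
    return int(rng.choice(2, p=confusion[true_label]))

# Simulate many responses to items whose true label is 1.
responses = [ds_response(1, confusion, rng) for _ in range(1000)]
accuracy = sum(responses) / len(responses)   # fraction reporting label 1
```
]]></preformat>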
        <p>Kajino et al. proposed the personal classifier (PC) model
for learning a classifier directly from noisy crowdsourced
labels [Kajino et al., 2012]. The PC model is based on the
assumption that (i) each worker has his or her own
classifier, (ii) given an item, each worker inputs the feature vector
of the item to his or her own personal classifier and labels
the item in accordance with the output of the classifier, and
(iii) there is a base classifier and the parameters of the personal
classifiers are noisy versions of the base classifier (noises
represent the worker characteristics). The advantages of the PC
model are that (i) it produces not only (inferred) true labels
but also a classifier, (ii) the optimization problem is convex
and thus easy to solve, and (iii) it can predict the responses
of workers to unresponded-to items, unlike the DS model.
However, unlike the proposed model, the PC model cannot
use unresponded-to items for learning.</p>
        <p>Methods for modeling personal responses have also been
presented for recommender systems [Koren et al., 2009].
Matrix factorization (MF) [Koren et al., 2009] is a commonly
used method for recommender systems. It is based on the
assumption that each item and each user has its own latent
features (multi-dimensional vector) and models the responses of
the h-th user to the d-th item as the dot product of the
latent features of the user and those of the item. The random
variables θ_d and η_{h,e} in our model correspond to the latent
features of the item and reader, respectively. A similar model was
proposed in the crowdsourcing area [Welinder et al., 2010]. Unlike the
proposed model, the MF method can neither leverage unresponded-to
items for learning nor predict the responses of workers to them.</p>
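        <p>The MF prediction rule described above is simply an inner product of latent vectors; a minimal sketch with hypothetical two-dimensional factors:</p>
        <preformat><![CDATA[
```python
import numpy as np

# Hypothetical learned latent features (2-dimensional for illustration):
# one row per item (article) and per user (reader).
item_factors = np.array([[0.9, 0.1],
                         [0.2, 0.8]])
user_factors = np.array([[1.0, 0.0],
                         [0.3, 0.7]])

# MF predicts user h's response to item d as the dot product of the
# user's and the item's latent feature vectors.
pred = user_factors @ item_factors.T   # shape (n_users, n_items)
score_00 = float(pred[0, 0])           # user 0, item 0: 1.0*0.9 + 0.0*0.1
```
]]></preformat>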
        <p>Duan et al. presented probabilistic models for
estimating multiple labels for emotions generated by
crowdsourcing workers [Duan et al., 2014]. Their models are extensions
of the DS model. Because their models, like the DS model,
do not use item (document) information (unlike the
proposed model), they can neither leverage unresponded-to items
for learning nor predict the responses of workers to them.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>We used 770 articles taken from the livedoor news
corpus dataset. To collect reader responses to these
articles, we used the Lancers crowdsourcing service. There
were 95 readers, and each one responded to at least
ten articles. We defined the set of affect responses as E =
{anger, sadness, joy, displeasure, surprise, fear}. We asked
the readers to label each article they read with at least one
affect response. There were 220 articles with responses (i.e.,
|{d | d ∈ [|D|], H_d ≠ ∅}| = 220). We used 30 responses
for each article (|H_d| = 30 for such d). The mean,
median, and mode of the number of responses per reader were
69, 30, and 10, respectively. The number of positive responses (i.e.,
|{l_{h,d,e} = 1 | d ∈ [|D|], h ∈ H_d}|) for each affect for the 220
articles is shown in Table 1.</p>
      <p>Since we wanted to evaluate the prediction performances
of different methods, we split the 220 responded-to articles
into 200 training articles and 20 testing articles. The
remaining 550 articles were used as unresponded-to articles to train
the proposed model.</p>
      <p>We experimentally evaluated the effectiveness of our
proposed model for predicting personal affect responses. We
compared the following methods.</p>
      <p>PC: the personal classifier model [Kajino et al., 2012]. We
used logistic regressions as the personal classifiers and the
base classifier, similarly to [Kajino et al., 2012]. Each
worker had |E| personal classifiers, one for each affect
response.</p>
      <p>IPC: the independent personal classifier model. Each reader
has his or her own personal classifier, as in the PC model,
but there is no base classifier, unlike in the PC model.
That is, in the IPC model, the personal classifiers are learned
independently, so the IPC model serves as a baseline.</p>
      <p>Proposed: the proposed model described in Section 4. We
set the number of topics K to 5. Since the
proposed model can use unresponded-to articles for
learning, we compared its performance with and without the
unresponded-to articles, i.e., the remaining 550 articles.
We call the proposed model with unresponded-to articles
Proposed-Unresponded-to and without them Proposed. In real-world
applications, the articles for which reader responses are to be
predicted can be obtained in advance, and the proposed model
can leverage such test articles for training, similarly to
methods for transductive learning [Vapnik, 1998].</p>
      <sec id="sec-5-1">
        <title>Compared Methods</title>
        <p>The livedoor news corpus is available at
https://www.rondhuit.com/download.html#ldcc, and the Lancers
crowdsourcing service at http://www.lancers.jp/.</p>
        <p>We call the proposed model with test article information
Proposed-Transductive and without test article information
Proposed. Similarly, we call the proposed model with both
unresponded-to articles and test articles
Proposed-Unresponded-to-Transductive.</p>
        <sec id="sec-5-1-2">
          <title>Evaluation Procedure</title>
          <p>
            The DS model [Dawid and Skene, 1979] and the MF model
[Koren et al., 2009]
            <xref ref-type="bibr" rid="ref15">(and similar models [Welinder et al., 2010])</xref>
            were not compared because they cannot predict the responses
of readers to unresponded-to articles. We evaluated the methods
by using the area under the receiver operating characteristic
curve (ROC-AUC) on the 20 test articles.
          </p>
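          <p>ROC-AUC, the evaluation measure used here, equals the probability that a randomly chosen positive response receives a higher predicted score than a randomly chosen negative one (the Mann-Whitney U statistic, with ties counted as one half). A minimal sketch with made-up scores rather than the models' actual predictions:</p>
          <preformat><![CDATA[
```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative, with ties counted as 0.5 (Mann-Whitney U / total)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for one affect on held-out articles.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.8, 0.6, 0.5]
auc = roc_auc(labels, scores)   # 7 of 9 positive-negative pairs ordered correctly
```
]]></preformat>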
        </sec>
        <sec id="sec-5-1-3">
          <title>Results and Discussion</title>
          <p>The results are shown in Table 2. The average ROC-AUC
values among affect labels for all versions of the proposed
model were higher than those for the PC and IPC models.
Unfortunately, using unresponded-to articles and/or test
articles was not effective, possibly due to the small numbers
of unresponded-to and test articles. Future work includes
investigating the effects of using larger numbers of each.
Furthermore, the mode number of responses per reader was
10, which is insufficient for learning the latent features of a
reader. As with the clustering PC model [Kajino et al., 2013],
clustering readers on the basis of their latent features is a
promising way to efficiently learn the latent features of
readers. Future work also includes extending our model to include
reader clustering.
</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Summary</title>
      <p>Our proposed model for personalized affect analysis is a
natural extension of the LDA model. Personal affect responses
are obtained from the latent features of articles and readers,
which are easy to interpret. Testing demonstrated that the
proposed model outperforms existing models on the task of
predicting personal responses.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by JSPS KAKENHI Grant
Numbers JP15H02782 and JP18H03337, the
Telecommunications Advancement Foundation, and Global Station for Big
Data and Cybersecurity, a project of Global Institution for
Collaborative Research and Education at Hokkaido
University.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Blei et al.,
          <year>2003</year>
          ]
          <string-name>
            <surname>David M Blei</surname>
          </string-name>
          , Andrew Y Ng, and
          <string-name>
            <given-names>Michael I</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          (Jan):
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Dawid and Skene</source>
          , 1979]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Dawid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Skene</surname>
          </string-name>
          .
          <article-title>Maximum likelihood estimation of observer error-rates using the em algorithm</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          . Series C (Applied Statistics),
          <volume>28</volume>
          (
          <issue>1</issue>
          ):
          <fpage>20</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Duan et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Lei</given-names>
            <surname>Duan</surname>
          </string-name>
          , Satoshi Oyama,
          <string-name>
            <given-names>Haruhiko</given-names>
            <surname>Sato</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Masahito</given-names>
            <surname>Kurihara</surname>
          </string-name>
          .
          <article-title>Separate or joint? estimation of multiple labels from crowdsourced annotations</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>41</volume>
          (
          <issue>13</issue>
          ):
          <fpage>5723</fpage>
          -
          <lpage>5732</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Kajino et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Hiroshi</given-names>
            <surname>Kajino</surname>
          </string-name>
          , Yuta Tsuboi, and
          <string-name>
            <given-names>Hisashi</given-names>
            <surname>Kashima</surname>
          </string-name>
          .
          <article-title>A convex formulation for learning from crowds</article-title>
          .
          <source>In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI</source>
          <year>2012</year>
          ), pages
          <fpage>73</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Kajino et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Hiroshi</given-names>
            <surname>Kajino</surname>
          </string-name>
          , Yuta Tsuboi, and
          <string-name>
            <given-names>Hisashi</given-names>
            <surname>Kashima</surname>
          </string-name>
          .
          <article-title>Clustering crowds</article-title>
          .
          <source>In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI</source>
          <year>2013</year>
          ), pages
          <fpage>1120</fpage>
          -
          <lpage>1127</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Kingma and Welling</source>
          , 2014]
          <string-name><given-names>Diederik P</given-names> <surname>Kingma</surname></string-name> and <string-name><given-names>Max</given-names> <surname>Welling</surname></string-name>
          .
          <article-title>Auto-encoding variational bayes</article-title>
          .
          <source>In Proceedings of the International Conference on Learning Representations</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Koren et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          , Robert Bell, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          .
          <source>Computer</source>
          ,
          <volume>42</volume>
          (
          <issue>8</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Kucukelbir et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Alp</given-names>
            <surname>Kucukelbir</surname>
          </string-name>
          , Dustin Tran, Rajesh Ranganath,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Gelman</surname>
          </string-name>
          , and
          <string-name>
            <surname>David M Blei.</surname>
          </string-name>
          <article-title>Automatic differentiation variational inference</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>430</fpage>
          -
          <lpage>474</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Liu et al.,
          <year>2003</year>
          ] Hugo Liu, Henry Lieberman, and
          <string-name>
            <given-names>Ted</given-names>
            <surname>Selker</surname>
          </string-name>
          .
          <article-title>A model of textual affect sensing using real-world knowledge</article-title>
          .
          <source>In Proceedings of the 8th international conference on Intelligent user interfaces</source>
          , pages
          <fpage>125</fpage>
          -
          <lpage>132</lpage>
          . ACM,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Maas et al.,
          <year>2011</year>
          ] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang,
          <string-name>
            <given-names>Andrew Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Potts</surname>
          </string-name>
          .
          <article-title>Learning word vectors for sentiment analysis</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11</source>
          , pages
          <fpage>142</fpage>
          -
          <lpage>150</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Porteous et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Porteous</surname>
          </string-name>
          , David Newman,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Ihler</surname>
          </string-name>
          , Arthur Asuncion, Padhraic Smyth, and
          <string-name>
            <given-names>Max</given-names>
            <surname>Welling</surname>
          </string-name>
          .
          <article-title>Fast collapsed gibbs sampling for latent dirichlet allocation</article-title>
          .
          <source>In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>569</fpage>
          -
          <lpage>577</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Ptaszynski et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Michal</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          , Hiroaki Dokoshi, Satoshi Oyama, Rafal Rzepka, Masahito Kurihara, Kenji Araki, and
          <string-name>
            <given-names>Yoshio</given-names>
            <surname>Momouchi</surname>
          </string-name>
          .
          <article-title>Affect analysis in context of characters in narratives</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>168</fpage>
          -
          <lpage>176</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Tokuhisa et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Ryoko</given-names>
            <surname>Tokuhisa</surname>
          </string-name>
          , Kentaro Inui, and
          <string-name>
            <given-names>Yuji</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          .
          <article-title>Emotion classification using massive examples extracted from the web</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on Computational LinguisticsVolume 1</source>
          , pages
          <fpage>881</fpage>
          -
          <lpage>888</lpage>
          . Association for Computational Linguistics,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Vapnik</source>
          , 1998]
          <string-name>
            <given-names>Vladimir N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Statistical Learning Theory</article-title>
          . Wiley-Interscience,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Welinder et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Welinder</surname>
          </string-name>
          , Steve Branson, Pietro Perona, and
          <string-name>
            <surname>Serge J Belongie.</surname>
          </string-name>
          <article-title>The multidimensional wisdom of crowds</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>2424</fpage>
          -
          <lpage>2432</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Whitehill et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Whitehill</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ting-fan Wu</surname>
          </string-name>
          , Jacob Bergsma,
          <string-name>
            <surname>Javier R Movellan</surname>
          </string-name>
          , and Paul L Ruvolo.
          <article-title>Whose vote should count more: Optimal integration of labels from labelers of unknown expertise</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>2035</fpage>
          -
          <lpage>2043</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>