<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>New York City, USA, July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SMILE: Twitter Emotion Classification using Domain Adaptation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bo Wang</string-name>
          <email>bo.wang@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Jensen</string-name>
          <email>e.jensen@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
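        <contrib contrib-type="author">
          <string-name>Maria Liakata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkaitz Zubiaga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob Procter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Warwick, Coventry</institution>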
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>10</volume>
      <issue>2016</issue>
      <fpage>15</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Despite the widespread research interest in social media sentiment analysis, sentiment and emotion classification across different domains and on Twitter data remains a challenging task. Here we set out to find an effective approach for tackling a cross-domain emotion classification task on a set of Twitter data involving social media discourse around arts and cultural experiences, in the context of museums. While most existing work in domain adaptation has focused on feature-based and/or instance-based adaptation methods, in this work we study a model-based adaptive SVM approach, as we believe its flexibility and efficiency are more suitable for the task at hand. We conduct a series of experiments and compare our system with a set of baseline methods. Our results not only show superior performance in terms of accuracy and computational efficiency compared to the baselines, but also shed light on how different ratios of labelled target-domain data used for adaptation can affect classification performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>With the advent and growth of social media as a ubiquitous
platform, people increasingly discuss and express opinions
and emotions towards all kinds of topics and targets. One
of the topics that has been relatively unexplored in the
scientific community is that of emotions expressed towards arts
and cultural experiences. A survey conducted in 2012 by the
British TATE Art Galleries found that 26 percent of the
respondents had posted some kind of content online, such as
blog posts, tweets or photos, about their experience in the art
galleries during or after their visit [Villaespesa, 2013]. When
cultural tourists share information about their experience in
social media, this real-time communication and spontaneous
engagement with art and culture not only broadens its target
audience but also provides a new space where valuable
insight shared by its customers can be garnered. As a result,
museums, galleries and other cultural venues have embraced
social media such as Twitter, and actively use it to
promote their exhibitions, organise participatory projects and/or
create initiatives to engage with visitors, collecting valuable
opinions and feedback (e.g. museum tweetups). This gold
mine of user opinions has sparked increasing research
interest in the interdisciplinary field of social media and
museum studies [Fletcher and Lee, 2012; Villaespesa, 2013;
Drotner and Schrøder, 2014].</p>
      <p>We have also seen a surge of research in sentiment
analysis with over 7,000 articles written on the topic [Feldman,
2013], for applications ranging from analyses of movie
reviews [Pang and Lee, 2008] and stock market trends [Bollen
et al., 2011] to forecasting election results [Tumasjan et al.,
2010]. Supervised learning algorithms that require labelled
training data have been successfully used for in-domain
sentiment classification. However, cross-domain sentiment
analysis has been explored to a much lesser extent. For instance,
the phrase “light-weight” carries positive sentiment when
describing a laptop but quite the opposite when it is used to
refer to politicians. In such cases, a classifier trained on
one domain may not work well on other domains. A widely
adopted solution to this problem is domain adaptation, which
allows building models from a fixed set of source domains
and deploying them in a different target domain. Recent
developments in sentiment analysis using domain adaptation are
mostly based on feature-representation adaptation [Blitzer et
al., 2007; Pan et al., 2010; Bollegala et al., 2011],
instance-weight adaptation [Jiang and Zhai, 2007; Xia et al., 2014;
Tsakalidis et al., 2014] or combinations of both [Xia et
al., 2013; Liu et al., 2013]. Despite its recent increase
in popularity, the use of domain adaptation for sentiment
and emotion classification across topics on Twitter is still
largely unexplored [Liu et al., 2013; Tsakalidis et al., 2014;
Townsend et al., 2014].</p>
      <p>In this work we set out to find an effective approach
for tackling the cross-domain emotion classification task on
Twitter, while also furthering research in the interdisciplinary
study of social media discourse around arts and cultural
experiences<sup>1</sup>. We investigate a model-based adaptive-SVM
approach that was previously used for video concept
detection [Yang et al., 2007] and compare it with a set of
domain-dependent and domain-independent strategies. Such a
model-based approach allows us to directly adapt existing models
to the new target-domain data without having to generate
domain-dependent features or adjust weights for each of
the training instances. We conduct a series of experiments and
evaluate the proposed system<sup>2</sup> on a set of Twitter data about
museums, annotated by three annotators from the social
sciences. The aim is to maximise the use of the base
classifiers that were trained on a general-domain corpus and,
through domain adaptation, minimise the classification error
rate across 5 emotion categories: anger, disgust, happiness,
surprise and sadness. Our results show that adapted SVM
classifiers achieve significantly better performance than
out-of-domain classifiers and also suggest a competitive
performance compared to in-domain classifiers. To the best of our
knowledge this is the first attempt at cross-domain emotion
classification for Twitter data.</p>
      <p><sup>1</sup> SMILE project: http://www.culturesmile.org/</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Most existing approaches can be classified into two
categories: feature-based adaptation and instance-based
adaptation. The former seeks to construct new adaptive feature
representations that reduce the difference between domains, while
the latter aims to sample and re-weight source-domain
training data for use in classification within the target domain.</p>
      <p>With respect to feature-based adaptation, [Blitzer et al.,
2007] applied the structural correspondence learning (SCL)
algorithm to cross-domain sentiment classification. SCL chooses
a set of pivot features with the highest mutual information with
the domain labels, and uses these pivot features to align
other features by training N linear predictors. Finally it
computes a singular value decomposition (SVD) to construct
low-dimensional features that improve classification
performance. A small amount of labelled target-domain data
is used to learn to deal with misaligned features from SCL.
[Townsend et al., 2014] found that SCL did not work well for
cross-domain adaptation of sentiment on Twitter due to the
lack of mutual information across the Twitter domains, and
used subjective proportions as a back-off adaptation approach.
[Pan et al., 2010] proposed constructing a bipartite graph from
a co-occurrence matrix between domain-independent and
domain-specific features to reduce the gap between different
domains, and using spectral clustering for feature alignment.
The resulting clusters are used to represent data examples and
train sentiment classifiers. They used mutual information
between features and domains to separate domain-independent
from domain-specific features, but in practice this also
introduces misclassification errors. [Bollegala et al., 2011]
describe a cross-domain sentiment classification approach
using an automatically created sentiment-sensitive thesaurus.
Such a thesaurus is constructed by computing the point-wise
mutual information between a lexical element u and a
feature, as well as the relatedness between two lexical elements. The
problem with these feature adaptation approaches is that they
try to connect domain-dependent features to known or
common features under the assumption that parallel sentiment
words exist in different domains, which is not necessarily
applicable to various topics in tweets [Liu et al., 2013].
[Glorot et al., 2011] propose a deep learning system to extract
features that are highly beneficial for the domain adaptation
of sentiment classifiers, under the intuition that deep
learning algorithms learn intermediate concepts (between raw
input and target) and these intermediate concepts could yield
better transfer across domains.</p>
      <p><sup>2</sup> The code can be found at http://bit.ly/1WHup4b</p>
      <p>When it comes to instance-based adaptation, [Jiang and Zhai,
2007] propose an instance weighting framework that prunes
“misleading” instances and approximates the distribution of
instances in the target domain. Their experiments show that
adding some labelled target-domain instances and
assigning higher weights to them performs better than either
removing “misleading” source-domain instances using a small
number of labelled target-domain instances or bootstrapping unlabelled
target instances. [Xia et al., 2014] adapt the source-domain
training data to the target domain based on a logistic
approximation. [Tsakalidis et al., 2014] learn different classifiers
on different sets of features and combine them in an
ensemble model. Such an ensemble model is then applied to part
of the target-domain test data to create new training data (i.e.
documents for which different classifiers had the same
predictions). We include this ensemble method as one of our
baseline approaches for evaluation and comparison.</p>
      <p>In contrast with most work on cross-domain sentiment classification,
we use a model-based approach proposed in [Yang et
al., 2007], which directly adapts existing classifiers trained
on general-domain corpora. We believe this is more efficient
and flexible [Yang and Hauptmann, 2008] for our task. We
evaluate on a set of manually annotated tweets about cultural
experiences in museums and conduct a finer-grained
classification of the emotions conveyed (i.e. anger, disgust, happiness,
surprise and sadness).</p>
    </sec>
    <sec id="sec-3">
      <title>3 Datasets</title>
      <p>We use two datasets, a source-domain dataset and a
target-domain dataset, which enables us to experiment on domain
adaptation. The source-domain dataset we adopted is the
general-domain Twitter corpus created by [Purver and
Battersby, 2012], which was generated through distant
supervision using hashtags and emoticons associated with 6
emotions: anger, disgust, fear, happiness, surprise and sadness.</p>
      <p>Our target-domain dataset, which allows us to perform
experiments on emotions associated with cultural experiences,
consists of a set of tweets pertaining to museums. A
collection of tweets mentioning one of the following
Twitter handles associated with British museums was gathered
between May 2013 and June 2015: @camunivmuseums,
@fitzmuseum_uk, @kettlesyard, @maacambridge,
@iciabath, @thelmahulbert, @rammuseum, @plymouthmuseum,
@tateliverpool, @tate_stives, @nationalgallery,
@britishmuseum, @_thewhitechapel. These are all museums associated
with the SMILES project. A subset of 3,759 tweets was
sampled from this collection for manual annotation. We
developed a tool for manual annotation of the emotion expressed
in each of these tweets. The options for the annotation of
each tweet included five different emotions: the six Ekman
emotions as in [Purver and Battersby, 2012], with the exception of
‘fear’, as it never featured in the context of tweets about
museums. Two extra annotation options were included:
<italic>no code</italic>, indicating that a tweet was
not conveying any emotions, and <italic>not relevant</italic>, for tweets that did not
refer to any aspects related to the museum in question. The
annotator could choose more than one emotion for a tweet,
except when <italic>no code</italic> or <italic>not relevant</italic> was selected, in which
case no additional options could be picked. The annotation of
all the tweets was performed independently by three
sociology PhD students. Out of the 3,759 tweets that were released
for annotation, at least 2 of the annotators agreed in 3,085
cases (82.1%). We use the collection resulting from these
3,085 tweets as our target-domain dataset for classifier
adaptation and evaluation. Note that tweets labelled as <italic>no code</italic> or
<italic>not relevant</italic> are included in our dataset to reflect a more
realistic data distribution on Twitter, while our source-domain
data does not contain any <italic>no code</italic> or <italic>not relevant</italic> tweets.</p>
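      <p>As a rough illustration of the agreement criterion described above, the following sketch keeps, for each tweet, the labels chosen by at least two of the three annotators. The function name agreed_labels, the label strings and the toy annotations are assumptions made for this example; the exact agreement rule used to build the dataset may differ.</p>
      <preformat>
```python
from collections import Counter


def agreed_labels(annotations, min_votes=2):
    """Return the labels chosen by at least `min_votes` annotators for one tweet."""
    # Each annotator contributes at most one vote per label.
    votes = Counter(label for labels in annotations for label in set(labels))
    return sorted(label for label, n in votes.items() if n >= min_votes)


# Hypothetical annotations: one set of labels per annotator, three annotators per tweet.
tweets = {
    "t1": [{"happy"}, {"happy", "surprise"}, {"happy"}],
    "t2": [{"no code"}, {"not relevant"}, {"sad"}],
    "t3": [{"no code"}, {"no code"}, {"happy"}],
}

# Keep only the tweets on which at least two annotators agree on some label.
dataset = {}
for tweet_id, annotations in tweets.items():
    labels = agreed_labels(annotations)
    if labels:
        dataset[tweet_id] = labels

print(dataset)
```
      </preformat>
      <p>In this toy example the second tweet is dropped because no label receives two votes, which mirrors how tweets without agreement are excluded from the target-domain dataset.</p>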
      <p>The distribution of emotion annotations in Table 2 shows a
remarkable class imbalance, where happy accounts for 30.2%
of the tweets, while the other emotions are seldom observed
in the museum dataset. There is also a large number of tweets
with no emotion associated (41.8%). One intuitive
explanation is that Twitter users tend to express positive and
appreciative emotions regarding their museum experiences and
shy away from making negative comments. This can also be
demonstrated by comparing the museum data emotion
distribution to our general-domain source data as seen in Figure 1,
where the sample ratio of positive instances is shown for each
emotion category.</p>
      <p>To quantify the difference between two text datasets,
Kullback-Leibler (KL) divergence has commonly been used
[Dai et al., 2007]. Here we use the KL-divergence
method proposed by [Bigi, 2003], as it suggests a back-off
smoothing method that deals with the data sparseness
problem. This back-off method keeps the probability
distributions summing to 1 and allows operating on the entire
vocabulary, by introducing a normalisation coefficient and a
very small threshold probability for all the terms that are
not in the given vocabulary. Since our source-domain data
contains many more tweets than the target-domain data, we
randomly sub-sampled the former and made sure the
two datasets have a similar vocabulary size in order to avoid
biases. We removed stop words, user mentions, URL links
and re-tweet symbols prior to computing the KL-divergence.
Finally we randomly split each dataset into 10 folds and
computed the in-domain and cross-domain symmetric
KL-divergence (KLD) value between every pair of folds.
Table 1 shows the computed KL-divergence averages. It can
be seen that the KL-divergence between the two datasets (i.e.
<inline-formula><tex-math>KLD(D_{src} \| D_{tar})</tex-math></inline-formula>) is roughly twice as large as the in-domain
KL-divergence values. This suggests a significant difference
between the data distributions in the two domains and thus justifies
our need for domain adaptation.</p>
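      <p>The back-off smoothing and symmetric KL-divergence described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the value of epsilon and the toy token lists are assumptions, and both inputs are assumed to be non-empty lists of already pre-processed tokens.</p>
      <preformat>
```python
import math
from collections import Counter


def backoff_distribution(tokens, vocabulary, epsilon=1e-6):
    """Term distribution over `vocabulary` with back-off smoothing.

    Terms absent from `tokens` receive the small probability `epsilon`;
    observed terms are scaled by a normalisation coefficient `beta`
    so that the distribution still sums to 1.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    unseen = sum(1 for term in vocabulary if term not in counts)
    beta = 1.0 - unseen * epsilon
    return {
        term: beta * counts[term] / total if term in counts else epsilon
        for term in vocabulary
    }


def symmetric_kld(tokens_p, tokens_q, epsilon=1e-6):
    """Symmetric KL-divergence between two token collections."""
    vocabulary = set(tokens_p) | set(tokens_q)
    p = backoff_distribution(tokens_p, vocabulary, epsilon)
    q = backoff_distribution(tokens_q, vocabulary, epsilon)
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocabulary)


# Hypothetical folds, already stripped of stop words, mentions and URLs.
source_fold = "great exhibition loved new gallery".split()
target_fold = "long queue loved gallery cafe".split()

print(symmetric_kld(source_fold, source_fold))
print(symmetric_kld(source_fold, target_fold))
```
      </preformat>
      <p>Comparing a fold with itself gives zero, and the value grows as the two vocabularies diverge, which is the behaviour the in-domain versus cross-domain comparison relies on.</p>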
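      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Data domain</th><th>Averaged KLD value</th></tr>
          </thead>
          <tbody>
            <tr><td><inline-formula><tex-math>KLD(D_{src} \| D_{src})</tex-math></inline-formula></td><td>2.391</td></tr>
            <tr><td><inline-formula><tex-math>KLD(D_{tar} \| D_{tar})</tex-math></inline-formula></td><td>2.165</td></tr>
            <tr><td><inline-formula><tex-math>KLD(D_{src} \| D_{tar})</tex-math></inline-formula></td><td>4.818</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        We are given a set of labelled source-domain examples
<inline-formula><tex-math>\{(x_i^k, y_i^k)\}_{i=1}^{N^k_{src}}</tex-math></inline-formula> in <inline-formula><tex-math>D_{src}</tex-math></inline-formula>, where <inline-formula><tex-math>x_i^k</tex-math></inline-formula> is the ith feature vector
with each element as the value of the corresponding feature
and <inline-formula><tex-math>y_i^k</tex-math></inline-formula> are the emotion categories that the ith instance
belongs to. Suppose we have some classifiers <inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula> that have
been trained on the source-domain data
        <xref ref-type="bibr" rid="ref2 ref25 ref6">(named as the
auxiliary classifiers in [Yang et al., 2007])</xref>
        and a small set of
labelled target-domain data <inline-formula><tex-math>D^l_{tar}</tex-math></inline-formula>, where <inline-formula><tex-math>D_{tar} = D^l_{tar} \cup D^u_{tar}</tex-math></inline-formula>.
Our goal is to adapt <inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula> to a new classifier <inline-formula><tex-math>f_{tar}(x)</tex-math></inline-formula>
based on the small set of labelled examples in <inline-formula><tex-math>D^l_{tar}</tex-math></inline-formula>, so that it can
be used to accurately predict the emotion class of unseen data
from <inline-formula><tex-math>D^u_{tar}</tex-math></inline-formula>.
      </p>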
      <sec id="sec-3-1">
        <title>4.1 Base Classifiers</title>
        <p>Our base classifiers are the classifiers that have been trained
on the source-domain data <inline-formula><tex-math>\{(x_i, y_i)\}_{i=1}^{N_{src}}</tex-math></inline-formula>, where <inline-formula><tex-math>y_i \in \{1, \ldots, K\}</tex-math></inline-formula>
with K referring to the number of emotion
categories. In our work, we use Support Vector Machines (SVMs)
in a “one-versus-all” setting, which trains K binary
classifiers, each separating one class from the rest. We chose this
as a better way of dealing with class imbalance in a
multi-class scenario.</p>
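        <p>To make the “one-versus-all” setting concrete, the sketch below turns multi-class labels into K binary target vectors, one per emotion. The helper names and the toy labels are assumptions for illustration; the negative-to-positive ratio is shown only as one possible way to gauge imbalance and is not a statement about how the class weight was set in the experiments.</p>
        <preformat>
```python
def one_vs_all_targets(labels, classes):
    """Build one +1/-1 target vector per class from multi-class labels."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def negative_to_positive_ratio(target):
    """Ratio of negative to positive examples in one binary problem."""
    positives = target.count(1)
    negatives = target.count(-1)
    return negatives / positives if positives else None


classes = ["anger", "disgust", "fear", "happiness", "surprise", "sadness"]
labels = ["happiness", "anger", "happiness", "sadness", "happiness"]

targets = one_vs_all_targets(labels, classes)
print(targets["happiness"])                          # [1, -1, 1, -1, 1]
print(negative_to_positive_ratio(targets["anger"]))  # 4.0
```
        </preformat>
        <p>Each of the K target vectors would then be used to train one binary SVM, so a rare emotion can be given its own class weight without affecting the other classifiers.</p>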
      </sec>
      <sec id="sec-3-2">
        <title>Features</title>
        <p>The base classifiers are trained on 3 sets of features
generated from the source-domain data: (i) n-grams, (ii) lexicon
features, (iii) word embedding features.</p>
        <p>N-gram models have long been used in NLP for various
tasks. As our n-gram features we use unigrams, bigrams and trigrams,
extracted after filtering out all the stop words. We construct 32 lexicon
features from 9 Twitter-specific and general-purpose lexica.
Each lexicon provides either a numeric sentiment score, or
categories where a category could correspond to a particular
emotion or a strong/weak positive/negative sentiment.</p>
        <p>The use of word embedding features to represent the
context of words and concepts has been shown to be very
effective in boosting the performance of sentiment
classification. In this work we use a set of word embeddings learnt
using the sentiment-specific method of [Tang et al., 2014] and
another set of general word embeddings trained on 5 million
tweets by [Vo and Zhang, 2015]. An additional set of embeddings
that we trained ourselves on 3 million tweets did not increase
performance. Pooling functions are essential and particularly
effective for feature selection from dense embedding feature
vectors. [Tang et al., 2014] applied the max, min and mean
pooling functions and found them to be highly useful. We
tested and evaluated six pooling functions, namely sum, max,
min, mean, std (i.e. standard deviation) and product, and
selected sum, max and mean as they led to the best performance.</p>
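        <p>The selected pooling functions can be illustrated with a small sketch that maps the word vectors of one tweet to a fixed-length feature vector. Concatenating the sum-, max- and mean-pooled vectors is an assumption of this example, as are the function name and the two-dimensional toy embeddings.</p>
        <preformat>
```python
def pool_embeddings(vectors):
    """Concatenate sum-, max- and mean-pooled word vectors into one tweet vector."""
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    summed = [sum(col) for col in columns]
    maxed = [max(col) for col in columns]
    mean = [s / len(vectors) for s in summed]
    return summed + maxed + mean


# Hypothetical 2-dimensional embeddings for a three-word tweet.
tweet_vectors = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
print(pool_embeddings(tweet_vectors))  # [6.0, 0.0, 3.0, 2.0, 2.0, 0.0]
```
        </preformat>
        <p>Whatever the tweet length, the output has three times the embedding dimension, which is what allows it to be fed to an SVM alongside the n-gram and lexicon features.</p>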
      </sec>
      <sec id="sec-3-3">
        <title>4.2 Classifier Adaptation</title>
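        <p>[Yang et al., 2007] propose a many-to-one SVM adaptation
model, which directly modifies the decision function of an
ensemble of existing classifiers <inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula>, trained with one or k
sets of labelled source-domain data in <inline-formula><tex-math>D_{src}</tex-math></inline-formula>, and thus creates
a new adapted classifier <inline-formula><tex-math>f_{tar}(x)</tex-math></inline-formula> for the target domain <inline-formula><tex-math>D_{tar}</tex-math></inline-formula>.
The adapted classifier has the following form:</p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>f_{tar}(x) = \sum_{k=1}^{M} \tau_k f^k_{src}(x) + \Delta f(x)</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\tau_k \in (0, 1)</tex-math></inline-formula> is the weight of each base classifier
<inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula>. <inline-formula><tex-math>\Delta f(x)</tex-math></inline-formula> is the perturbation function that is learnt from
a small set of labelled target-domain data in <inline-formula><tex-math>D^l_{tar}</tex-math></inline-formula>. As shown
in [Yang et al., 2007] it has the form:</p>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>\Delta f(x) = w^T \phi(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x)</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i)</tex-math></inline-formula> are the model parameters to be
estimated from the labelled examples in <inline-formula><tex-math>D^l_{tar}</tex-math></inline-formula> and <inline-formula><tex-math>\alpha_i</tex-math></inline-formula> is the
feature coefficient of the ith labelled target-domain instance.
Furthermore <inline-formula><tex-math>K(\cdot, \cdot) \equiv \phi(\cdot)^T \phi(\cdot)</tex-math></inline-formula> is the kernel function
induced from the nonlinear feature mapping. <inline-formula><tex-math>\Delta f(x)</tex-math></inline-formula> is learnt
in a framework that aims to minimise the regularised
empirical risk [Yang, 2009]. The adapted classifier <inline-formula><tex-math>f_{tar}(x)</tex-math></inline-formula> learnt
under this framework tries to minimise the classification error
on the labelled target-domain examples and the distance from
the base classifiers <inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula>, to achieve a better bias-variance
trade-off.</p>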
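        <p>The form of the adapted classifier in Eq. (1) and Eq. (2) can be sketched as below. This only evaluates the decision function; it does not learn the coefficients. The base classifiers, their weights, the coefficients and the gamma value are all made-up values for illustration, and the function names are assumptions of this example.</p>
        <preformat>
```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def perturbation(x, support, kernel=rbf_kernel):
    """Eq. (2): sum of alpha_i * y_i * K(x_i, x) over labelled target examples."""
    return sum(alpha * y * kernel(x_i, x) for x_i, y, alpha in support)


def adapted_decision(x, base_classifiers, taus, support, kernel=rbf_kernel):
    """Eq. (1): weighted base-classifier scores plus the perturbation term."""
    base = sum(tau * f(x) for tau, f in zip(taus, base_classifiers))
    return base + perturbation(x, support, kernel)


# Hypothetical base classifiers (decision functions) and their weights.
base_classifiers = [lambda x: x[0] - 0.5, lambda x: 0.2 - x[1]]
taus = [0.6, 0.3]

# (x_i, y_i, alpha_i) triples for labelled target-domain examples.
support = [((1.0, 0.0), 1, 0.8), ((0.0, 1.0), -1, 0.5)]

score = adapted_decision((1.0, 0.0), base_classifiers, taus, support)
print("positive" if score > 0 else "negative")
```
        </preformat>
        <p>The sign of the score gives the predicted label for one binary emotion classifier; the base classifiers contribute prior knowledge from the source domain, while the kernel term corrects them using the labelled target-domain examples.</p>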
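        <p>In this work we use the extended multi-classifier
adaptation framework proposed by [Yang and Hauptmann, 2008],
which allows the weights <inline-formula><tex-math>\{\tau_k\}_{k=1}^{M}</tex-math></inline-formula> of the base
classifiers <inline-formula><tex-math>f^k_{src}(x)</tex-math></inline-formula> to be learnt automatically based on their
classification performance on the small set of labelled target-domain
examples. To achieve this, [Yang and Hauptmann, 2008]
add another regulariser to the regularised loss minimisation
framework, with the objective function for training the
adaptive classifier now written as:</p>
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math><![CDATA[\begin{aligned}
\min_{w, \tau, \xi} \quad & \frac{1}{2} w^T w + \frac{1}{2} B \tau^T \tau + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.} \quad & y_i \sum_{k=1}^{M} \tau_k f^k_{src}(x_i) + y_i w^T \phi(x_i) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad \forall (x_i, y_i) \in D^l_{tar}
\end{aligned}]]></tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\frac{1}{2} \tau^T \tau</tex-math></inline-formula> measures the overall contribution of the base
classifiers. Thus this objective function seeks to avoid over-reliance
on the base classifiers and also an over-complex <inline-formula><tex-math>\Delta f(\cdot)</tex-math></inline-formula>.
The two goals are balanced by the parameter B. By rewriting
this objective function as a minimisation problem of a
Lagrange (primal) function and setting its derivatives with respect to
<inline-formula><tex-math>w</tex-math></inline-formula>, <inline-formula><tex-math>\tau</tex-math></inline-formula> and <inline-formula><tex-math>\xi</tex-math></inline-formula> to zero, we have:</p>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i), \qquad \tau_k = \frac{1}{B} \sum_{i=1}^{N} \alpha_i y_i f^k_{src}(x_i)</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\tau_k</tex-math></inline-formula> is a weighted sum of <inline-formula><tex-math>y_i f^k_{src}(x_i)</tex-math></inline-formula> and
indicates the classification performance of <inline-formula><tex-math>f^k_{src}</tex-math></inline-formula> on the
target domain. Base classifiers are therefore assigned a
larger weight if they classify the labelled target-domain data
well. Now given (1), (2) and (4), the new decision function
can be formulated as:</p>
        <disp-formula id="eq5">
          <label>(5)</label>
          <tex-math><![CDATA[\begin{aligned}
f_{tar}(x) &= \frac{1}{B} \sum_{k=1}^{M} \sum_{i=1}^{N} \alpha_i y_i f^k_{src}(x_i) f^k_{src}(x) + \Delta f(x) \\
&= \sum_{i=1}^{N} \alpha_i y_i \Big( K(x_i, x) + \frac{1}{B} \sum_{k=1}^{M} f^k_{src}(x_i) f^k_{src}(x) \Big)
\end{aligned}]]></tex-math>
        </disp-formula>
        <p>The baseline methods and our proposed system are the
following:</p>
        <list list-type="bullet">
          <list-item><p>BASE: the base classifiers use either one set of features
or all three feature sets (i.e. BASE-all). As an example,
the BASE-embedding classifier is trained and tuned with
all source-domain data using only word-embedding
features, then tested on 30% of our target-domain data. We
use the LIBSVM implementation [Chang and Lin, 2011]
of SVM for building the base classifiers.</p></list-item>
          <list-item><p>TARG: trained and tuned with 70% of the labelled
target-domain data. Since this model is trained entirely on
the target domain, it can be considered as the
performance upper bound that is very hard to beat.</p></list-item>
          <list-item><p>AGGR: an aggregate model trained on all
source-domain data and 70% of the labelled target-domain data.</p></list-item>
          <list-item><p>ENSEMBLE: combines the base classifiers in an
ensemble model, then performs classification on 30% of
the target-domain data to generate new training data, as
described in Section 2.</p></list-item>
          <list-item><p>ADAPT: our domain-adapted models using either one
base classifier trained with all feature sets (i.e.
ADAPT-1-model) or an ensemble of three standalone base
classifiers, each trained with one set of features (i.e.
ADAPT-3-model). We use 30% of the labelled
target-domain data for the classifier adaptation and parameter
tuning described in Section 4.2.</p></list-item>
        </list>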
        <p>The above methods are all tested on the same 30% of the labelled
target-domain data in order to make their results
comparable. In addition we perform in-domain cross-validation and
evaluation only on our source-domain data using all feature
sets; this model is named SRC-all. We use an RBF kernel
function with the default setting of the gamma parameter in all the
methods, as it outperforms the linear kernel; the polynomial kernel
gives similar performance but requires more parameter
tuning. For the cost factor C and the class weight parameter
(except for the SRC-all model) we conduct a cross-validated
grid search over the same set of parameter values for all the
methods. This makes sure our ADAPT
models are comparable with BASE, TARG, ENSEMBLE and
AGGR. For ADAPT-3-model we also optimise the base
classifier weight parameters, denoted as <inline-formula><tex-math>\tau_k</tex-math></inline-formula> in Eq. (1), as described
in Section 4.2.</p>
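        <p>For intuition about the base classifier weights mentioned above, the sketch below computes each weight as (1/B) times the sum of alpha_i * y_i * f_k(x_i) over the labelled target-domain examples, which is the closed form implied by the decision function in Section 4.2. The two base classifiers, the coefficients and the value of B are invented for this example and are not taken from the experiments.</p>
        <preformat>
```python
def base_classifier_weights(base_classifiers, support, B):
    """tau_k = (1/B) * sum_i alpha_i * y_i * f_k(x_i) for each base classifier."""
    return [
        sum(alpha * y * f(x_i) for x_i, y, alpha in support) / B
        for f in base_classifiers
    ]


# Hypothetical base classifiers scoring a one-dimensional input.
strong = lambda x: x[0]       # confident and correct on the examples below
weak = lambda x: 0.1 * x[0]   # correct sign, but much smaller margin

# (x_i, y_i, alpha_i) triples; the alpha values are made up for illustration.
support = [((1.0,), 1, 0.5), ((-2.0,), -1, 0.25)]

taus = base_classifier_weights([strong, weak], support, B=2.0)
print(taus)
```
        </preformat>
        <p>The classifier that separates the labelled target-domain examples with a larger margin receives the larger weight, and increasing B shrinks all weights, reducing reliance on the base classifiers.</p>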
      </sec>
      <sec id="sec-3-4">
        <title>5.2 Experimental Results</title>
        <p>We report the experimental results in Table 3, with three
categories of models: 1) in-domain methods with no adaptation, i.e.
the BASE and TARG models, TARG being the upper bound for
performance evaluation; 2) the domain adaptation baselines,
i.e. AGGR and ENSEMBLE; and 3) our adaptation systems
(ADAPT models). As can be seen, the classification
performances reported for emotions other than “happy” are below
50 in terms of F1 score, with some results being as low as
0.00. This is caused by the class imbalance issue within these
emotions as shown in Table 2 and Figure 1, especially for
the emotion “disgust”, which has only 16 tweets. We tried to
mitigate this issue using a class weight parameter, but it is still
very challenging to overcome without acquiring more
labelled data than we currently have. It especially affects our
domain adaptation, as the parameters in Eq. (3) cannot be
properly optimised.</p>
        <p>Since there are very few tweets annotated as “disgust”, we
do not consider the “disgust” emotion as part of our
experimental evaluation here. As seen in Table 3, BASE
models are outperformed significantly by all other methods
(except ENSEMBLE, which performs only slightly better than
the BASE models), underlining the importance of domain
adaptation. With the exception of the ADAPT-3-model for “Anger”,
our ADAPT models consistently outperform AGGR-all and
ENSEMBLE while showing competitive performance
compared to the upper-bound baseline, TARG-all. We also
observe that the aggregation model AGGR-all is outperformed
by TARG-all, indicating that domain knowledge cannot be
transferred effectively to a different domain by simply
training on aggregated data from both domains. In
comparison, our ADAPT models are able to leverage the large
and balanced source-domain data (as base classifiers), unlike
TARG, while adjusting the contribution of each base
classifier, unlike AGGR.</p>
        <p>When comparing our ADAPT models, we find that in most
cases models adapted from multiple base classifiers beat the
ones adapted from a single base classifier, even though the
same features are used in both scenarios. This shows the
benefit of the multi-classifier adaptation approach, which aims to
maximise the utility of each base classifier. Two additional
models, namely ADAPT-1-modelx and ADAPT-3-modelx,
are replicates of ADAPT-1/3-model except that they also use
40% of the target-domain data for tuning the model parameters. On
average their results are only slightly better than those of
ADAPT-1/3-model, which use 30% of the target-domain data for both
training and parameter optimisation. This is especially prominent
for “happiness”, where we have sufficient target-domain
instances and less of a class imbalance issue. This shows that our
ADAPT models are able to transfer knowledge
effectively across different domains with a small amount of
labelled target-domain data. More analysis of the impact of
adaptation sample ratios is given in Section 5.3.</p>
        <p>We can also evaluate the performance of each model by
comparing its efficiency in terms of computation time. Here
we report the total computation time taken by all the above
methods except BASE, for the emotion “happiness”. This
computation process consists of adaptation training,
grid search over the same set of parameter values and final testing.
As seen in Table 4, compared to other out-of-domain
strategies the proposed ADAPT models are more efficient to train,
especially in comparison with AGGR, which is an order of
magnitude more costly due to the inclusion of source-domain
data. Within the ADAPT models, ADAPT-1-model requires
less time to train since it only has one base classifier for
adaptation.</p>
      </sec>
      <sec id="sec-3-5">
        <title>5.3 Effect of Adaptation Training Sample Ratios</title>
        <p>
          Here we evaluate the effect of different ratios of the
labelled target-domain data on the overall classification
performance for the emotion “happiness”. Figure 2 shows the
normalised F1 scores and computation time of each ADAPT
model across different adaptation training sample sizes
ranging from 10% to 70% of the total target-domain data (with the
same 30% held out as test data) and with the cost factor C =
1, 3 and 10
          <xref ref-type="bibr" rid="ref2 ref25 ref6">(the same choices of C as used in [Yang et al.,
2007] for their experiments)</xref>
          . We observe a
logarithmic growth in the F1 scores obtained from every model,
against a linear growth in computation time. Thus even
though there is a reasonable increase in classification
performance when increasing the adaptation sample size from 50%
to 70%, it becomes much less efficient to train such
models and we require more data, which may not be available.
Since there is a trade-off between model effectiveness and
efficiency here, it is appropriate to use 30% of our labelled
target-domain data for classifier adaptation, as we have done
in ADAPT-1-model and ADAPT-3-model. One should
select the adaptation training sample size based on
the test data at hand, but empirically we think 1,000 labelled
target-domain tweets would be enough for an effective
adaptation to classify 3,000-4,000 test tweets.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6 Conclusion</title>
      <p>In this work we study a model-based multi-class
adaptive-SVM approach to cross-domain emotion recognition and
compare it against a set of domain-dependent and
domain-independent strategies. We conduct a series of experiments
and evaluate our proposed system on a set of newly
annotated Twitter data about museums. We find that our adapted
SVM model outperforms the out-of-domain base models and
domain adaptation baselines while also showing
competitive performance against the in-domain upper-bound model.
Moreover, in comparison to other adaptation strategies our
approach is computationally more efficient, especially
compared to the classifier trained on aggregated source and
target data. Finally, we shed light on how different ratios of
labelled target-domain data used for adaptation can affect
classification performance. We show there is a trade-off between
model effectiveness and efficiency when selecting the adaptation
sample size. Our code and data<sup>4</sup> are publicly available,
enabling further research and comparison with our approach.</p>
      <p>In the future we would like to investigate a feature-based
deep learning approach for cross-topic emotion classification
on Twitter while examining the possibility of making it as
efficient and flexible as the model adaptation based approaches.
Another future direction is to study how to best resolve the
remarkable class imbalance issue in social media emotion
analysis when some emotions are rarely expressed.</p>
      <p><sup>4</sup> http://bit.ly/1SddvIw</p>
      <p>This work has been funded by the AHRC SMILES project.
We would like to thank Liz Walker, Matt Jeffryes and Michael
Clapham for their contribution to earlier versions of the
emotion classifiers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bigi, 2003]
          <string-name>
            <given-names>Brigitte</given-names>
            <surname>Bigi</surname>
          </string-name>
          .
          <article-title>Using Kullback-Leibler distance for text categorization</article-title>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
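          [Blitzer et al., 2007]
          <string-name>
            <given-names>John</given-names>
            <surname>Blitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Pereira</surname>
          </string-name>
          , et al.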
          <article-title>Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification</article-title>
          .
          <source>In ACL</source>
          , volume
          <volume>7</volume>
          , pages
          <fpage>440</fpage>
          -
          <lpage>447</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Bollegala et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Danushka</given-names>
            <surname>Bollegala</surname>
          </string-name>
          , David Weir, and
          <string-name>
            <given-names>John</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <article-title>Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification</article-title>
          .
          <source>In NAACL HLT</source>
          , pages
          <fpage>132</fpage>
          -
          <lpage>141</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Bollen et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Johan</given-names>
            <surname>Bollen</surname>
          </string-name>
          , Huina Mao, and
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Zeng</surname>
          </string-name>
          .
          <article-title>Twitter mood predicts the stock market</article-title>
          .
          <source>Journal of Computational Science</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Chang and Lin,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          ,
          <volume>2</volume>
          :
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dai et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Wenyuan</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gui-Rong</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yong</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Co-clustering based classification for out-of-domain documents</article-title>
          .
          <source>In SIGKDD</source>
          , pages
          <fpage>210</fpage>
          -
          <lpage>219</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Drotner and Schrøder,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Kirsten</given-names>
            <surname>Drotner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kim Christian</given-names>
            <surname>Schrøder</surname>
          </string-name>
          .
          <source>Museum communication and social media: The connected museum</source>
          .
          <publisher-name>Routledge</publisher-name>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Feldman,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Ronen</given-names>
            <surname>Feldman</surname>
          </string-name>
          .
          <article-title>Techniques and applications for sentiment analysis</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>56</volume>
          (
          <issue>4</issue>
          ):
          <fpage>82</fpage>
          -
          <lpage>89</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Fletcher and Lee,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Adrienne</given-names>
            <surname>Fletcher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Moon J</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Current social media uses and evaluations in American museums</article-title>
          .
          <source>Museum Management and Curatorship</source>
          ,
          <volume>27</volume>
          (
          <issue>5</issue>
          ):
          <fpage>505</fpage>
          -
          <lpage>521</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Glorot et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Glorot</surname>
          </string-name>
          , Antoine Bordes, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Domain adaptation for large-scale sentiment classification: A deep learning approach</article-title>
          .
          <source>In ICML</source>
          , pages
          <fpage>513</fpage>
          -
          <lpage>520</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Jiang and Zhai,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>Instance weighting for domain adaptation in NLP</article-title>
          .
          <source>In ACL</source>
          , pages
          <fpage>264</fpage>
          -
          <lpage>271</lpage>
          . Association for Computational Linguistics,
          <month>June</month>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Liu et al.,
          <year>2013</year>
          ] Shenghua Liu,
          <string-name>
            <given-names>Fuxin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fangtao</given-names>
            <surname>Li</surname>
          </string-name>
          , Xueqi Cheng, and
          <string-name>
            <given-names>Huawei</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <article-title>Adaptive co-training SVM for sentiment classification on tweets</article-title>
          .
          <source>In CIKM</source>
          , pages
          <fpage>2079</fpage>
          -
          <lpage>2088</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Pan et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Sinno Jialin</given-names>
            <surname>Pan</surname>
          </string-name>
          , Xiaochuan Ni,
          <string-name>
            <given-names>Jian-Tao</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and Zheng Chen.
          <article-title>Cross-domain sentiment classification via spectral feature alignment</article-title>
          .
          <source>In WWW</source>
          , pages
          <fpage>751</fpage>
          -
          <lpage>760</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Pang and Lee,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and trends in information retrieval</source>
          ,
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Purver and Battersby,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Purver</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Battersby</surname>
          </string-name>
          .
          <article-title>Experimenting with distant supervision for emotion classification</article-title>
          .
          <source>In EACL</source>
          , pages
          <fpage>482</fpage>
          -
          <lpage>491</lpage>
          . Association for Computational Linguistics,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Tang et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Duyu</given-names>
            <surname>Tang</surname>
          </string-name>
          , Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Qin</surname>
          </string-name>
          .
          <article-title>Learning sentiment-specific word embedding for Twitter sentiment classification</article-title>
          .
          <source>In ACL</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>1555</fpage>
          -
          <lpage>1565</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Townsend et al.,
          <year>2014</year>
          ] Richard Townsend, Aaron Kalair, Ojas Kulkarni, Rob Procter, and Maria Liakata.
          <article-title>University of Warwick: Sentiadaptron-a domain adaptable sentiment analyser for tweets-meets SemEval</article-title>
          .
          <source>SemEval 2014</source>
          , page
          <fpage>768</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Tsakalidis et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Adam</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          , Symeon Papadopoulos, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>An ensemble model for cross-domain polarity classification on Twitter</article-title>
          .
          <source>In WISE</source>
          , pages
          <fpage>168</fpage>
          -
          <lpage>177</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Tumasjan et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Andranik</given-names>
            <surname>Tumasjan</surname>
          </string-name>
          , Timm Oliver Sprenger, Philipp G Sandner, and Isabell M Welpe.
          <article-title>Predicting elections with Twitter: What 140 characters reveal about political sentiment</article-title>
          .
          <source>ICWSM</source>
          ,
          <volume>10</volume>
          :
          <fpage>178</fpage>
          -
          <lpage>185</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Villaespesa,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Elena</given-names>
            <surname>Villaespesa</surname>
          </string-name>
          .
          <article-title>Diving into the museums social media stream: Analysis of the visitor experience in 140 characters</article-title>
          .
          <source>In Museums and the Web</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
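          [Vo and Zhang,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Duy-Tin</given-names>
            <surname>Vo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yue</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Target-dependent Twitter sentiment classification with rich automatic features</article-title>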
          .
          <source>In IJCAI</source>
          , pages
          <fpage>1347</fpage>
          -
          <lpage>1353</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Xia et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Xia</surname>
          </string-name>
          , Chengqing Zong, Xuelei Hu, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Cambria</surname>
          </string-name>
          .
          <article-title>Feature ensemble plus sample selection: domain adaptation for sentiment classification</article-title>
          .
          <source>Intelligent Systems</source>
          , IEEE,
          <volume>28</volume>
          (
          <issue>3</issue>
          ):
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Xia et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Xia</surname>
          </string-name>
          , Jianfei Yu, Feng Xu, and
          <string-name>
            <given-names>Shumei</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Instance-based domain adaptation in NLP via in-target-domain logistic approximation</article-title>
          .
          <source>In AAAI</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Yang and Hauptmann,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander G</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          .
          <article-title>A framework for classifier adaptation and its applications in concept detection</article-title>
          .
          <source>In MIR</source>
          , pages
          <fpage>467</fpage>
          -
          <lpage>474</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Yang et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rong</given-names>
            <surname>Yan</surname>
          </string-name>
          , and Alexander G Hauptmann.
          <article-title>Cross-domain video concept detection using adaptive SVMs</article-title>
          .
          <source>In Proceedings of the 15th international conference on Multimedia</source>
          , pages
          <fpage>188</fpage>
          -
          <lpage>197</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Yang,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>A general framework for classifier adaptation and its applications in multimedia</article-title>
          .
          <source>PhD thesis</source>
          , Columbia University,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>