<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>[CL-AFF Shared Task] Multi-label Text Classification Using an Emotion Embedding Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiwung Hyun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Byung-Chull Bae</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yun-Gyung Cheong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computing, Sungkyunkwan University</institution>
          ,
          <addr-line>Suwon-si</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Games, Hongik University</institution>
          ,
          <addr-line>Sejong-si</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose a deep learning model-based approach that combines a language embedding model and an emotion embedding model for the classification of text in the CL-AFF Shared Task 2020. The task aims to predict the disclosure and supportiveness labels of the comments (to the posts) in the OffMyChest dataset, which consists of a small labeled dataset and a large unlabeled dataset. We investigate the effectiveness of the BERT, Glove, and Emotional Glove embedding models for representing the text for label prediction. We also propose to use the original posts in the dataset as contextual information. We evaluate our approach and report the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Semi-supervised learning</kwd>
        <kwd>Emotion embedding</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes our approach to solving the CL-AFF (Computational
Linguistics - Affect Understanding) Shared Task 2020. In the CL-AFF 2020 task,
the OffMyChest conversation dataset is introduced to help understand the role
of emotion in conversations (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for details). The dataset consists of top
posts and the comments to these posts, collected from the CasualConversations
and the OffMyChest communities on Reddit. A small portion of the comments
are labeled as informational disclosure, emotional disclosure, and
supportiveness, where supportiveness is further characterized as general, informational,
and emotional. As a result, a comment is annotated with a total of 6 labels,
and a single comment can carry multiple labels. The task is therefore a
multi-label classification problem.
      </p>
      <p>Text classification in natural language processing has traditionally employed
machine learning classification algorithms such as support vector machines,
Bayesian classifiers, and decision trees. These classifiers take a variety of text
features as input, and the choice of features is crucial to traditional text
classification algorithms. For the representation of text, word frequency-based
approaches (e.g., bag-of-words features) or sequence-based approaches (e.g.,
N-gram features) have been commonly used.</p>
      <p>
        Text classification using neural models comprises embedding models
and classification models. Along with the success of word embedding models
such as Word2vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Glove [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in text classification, advances in various
deep learning algorithms have led to more complex embedding models, such
as the contextual language models ELMo [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Deep
neural networks are used not only to extract features from text but also to
construct classifiers. Kim [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for example, achieved strong performance in text
classification by applying 1D CNNs to sentence classification problems.
      </p>
      <p>
        In this paper, focusing on emotional words in the data, we propose a deep
learning model-based text classification approach using an embedding model
that combines a language model with an emotion embedding model. We
particularly focus on emotional words from the OffMyChest dataset to
improve the learning of the emotional labels (emotional disclosure, emotional support). We
combine labeled and unlabeled comment data in a semi-supervised method
that assists the supervised learning. The sentence features extracted from the
embedding model are then given to TextCNN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as input for text classification.
      </p>
      <p>
        Furthermore, to improve the classification performance, we apply EDA (Easy
Data Augmentation) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to the small labeled data in the training step. This paper
presents our baseline and the variables in our experiment, as well as the evaluation of
models for the binary classification of disclosure and supportiveness (Task 1).
      </p>
    </sec>
    <sec id="sec-2">
      <title>The OffMyChest Conversation Dataset</title>
      <p>The OffMyChest conversation dataset is comprised of three sets: a labeled training
set, an unlabeled training set, and a test set. The unlabeled training set includes
unlabeled posts and comments; the labeled training set and the test set include
about 10,000 labeled sentences and 3,000 unlabeled sentences, respectively, from
the most commented posts. Table 1 lists several excerpts sampled from the labeled
data.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Excerpts sampled from the labeled data (id: post id; ED: emotional disclosure, ID: informational disclosure, S: support, GS: general support, IS: informational support, ES: emotional support).</p></caption>
        <table>
          <thead>
            <tr><th>Text</th><th>id</th><th>ED</th><th>ID</th><th>S</th><th>GS</th><th>IS</th><th>ES</th></tr>
          </thead>
          <tbody>
            <tr><td>Hope you have a nice day</td><td>91px39</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>
            <tr><td>My wife came in when I was around half way through this and asked why I was all choked up and watery eyed, so we read it together and now we're both crying.</td><td>91px39</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>I am crying a lot of happy tears right now.</td><td>91px39</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>He's my father in every sense of the word but name, I still call him by his first name but only because we are both used to it and he doesn't mind a bit.</td><td>946qw9</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>Stepdad will be the one walking me down the aisle when I get married.</td><td>946qw9</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>That's wonderful. :) My step-dad has been around for 30 yrs now.</td><td>946qw9</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tbl2">
        <label>Table 2</label>
        <caption><p>Number of instances labeled 1 in each category.</p></caption>
        <table>
          <thead>
            <tr><th>Category</th><th>Number of label 1</th><th>Percentage in the category</th></tr>
          </thead>
          <tbody>
            <tr><td>Emotional Disclosure</td><td>3,948</td><td>31%</td></tr>
            <tr><td>Informational Disclosure</td><td>4,891</td><td>38%</td></tr>
            <tr><td>Support</td><td>3,226</td><td>25%</td></tr>
            <tr><td>General Support</td><td>680</td><td>5%</td></tr>
            <tr><td>Informational Support</td><td>1,250</td><td>10%</td></tr>
            <tr><td>Emotional Support</td><td>1,006</td><td>8%</td></tr>
            <tr><td>None (all labels are 0)</td><td>4,157</td><td>32%</td></tr>
            <tr><td>All</td><td>12,860</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To understand the data, we examine the number of labeled data in each
category (Table 2). As seen in the table, the number of instances labeled as class
1 in the `Support' category group (support, general support, informational
support, emotional support) is far smaller than the number labeled as class
0. For instance, the instances labeled as 1 in the `General Support' category make up
only 5% of all the data. This means that the data exhibits a class imbalance
problem, most severely for the `general support', `informational support', and
`emotional support' labels. Although these categories seem to be sub-types of the
`Support' category, we treat them independently for classification because
the sum of their label-1 instances does not match the number of label-1
instances in the `Support' category (N = 3,226).</p>
      <p>Furthermore, we investigate the word usage in the emotional categories. First,
we extract the top 100 most frequently used words in each emotional category.
Then, we remove the words that also appear frequently in the non-emotional
categories (e.g., informational disclosure, support, general support, informational
support). Finally, we normalize the raw count by the number of labels of the
corresponding category and sum up the two numbers. Table 3 shows the top 10
words ranked by the normalized frequencies. These words can therefore be used
to characterize the emotional categories.</p>
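      <p>A minimal sketch of this frequency analysis, using Python's collections.Counter; the helper names (top_words, normalized_keywords) and the sentence lists are our own illustrations, not the paper's code:</p>
      <preformat>
from collections import Counter

def top_words(sentences, k=100):
    # Count word occurrences over all sentences labeled with a category.
    counts = Counter(w for s in sentences for w in s.lower().split())
    return counts.most_common(k)

def normalized_keywords(emo_sentences, non_emo_sentences, k=100, n=10):
    # Frequent words in the emotional category...
    emo = dict(top_words(emo_sentences, k))
    # ...minus words that are also frequent in the non-emotional categories...
    non_emo = {w for w, _ in top_words(non_emo_sentences, k)}
    # ...with raw counts normalized by the number of labeled sentences.
    kept = {w: c / len(emo_sentences) for w, c in emo.items() if w not in non_emo}
    return sorted(kept.items(), key=lambda kv: -kv[1])[:n]
      </preformat>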
      <p>Finally, Figure 1 shows the word cloud for each category, visualizing the
frequently used words in the categories. While word clouds are generally built
from nouns only, we also include adjectives, as they can represent emotion. The
word clouds show that many words (e.g., life, time, good) overlap across different
categories. Some words are unique to each group: for example, `way' in the
informational disclosure category, and `sorry' in the emotional support category.</p>
      <fig id="fig1">
        <label>Figure 1</label>
        <caption><p>Word clouds per category: (a) emotional disclosure, (b) informational disclosure, (c) emotional support, (d) informational support, (e) general support, (f) no label.</p></caption>
      </fig>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>
        This section details our approach, which employs pre-trained embedding models
to generate the vectors representing the text. These vectors serve as the input
to the TextCNN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] model for multi-label classification. We also investigate whether
the use of the post as contextual information can enhance the label prediction
performance. Figure 2 illustrates the overall architecture of our system.
      </p>
      <sec id="sec-3-1">
        <title>Word Embedding Models</title>
        <p>
          As our word embedding models, we utilize pre-trained BERT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and an
emotional embedding scheme [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The original posts of the comments are generally
lengthy, so we summarize them into 3-5 sentences using LexRank [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
before text processing. As the first text pre-processing step, the sentences are
tokenized using the BERT tokenizer. We set the maximum sequence length
to 64 tokens, because the average sequence length of the comment data is 19 and only 1%
of the total training data exceeds 64 tokens. As a result, the feature vector of
a single comment consists of the 64 tokens representing the comment itself and
the 64 tokens representing its corresponding post.
        </p>
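        <p>As an illustration, a minimal tokenization sketch; the HuggingFace transformers package and the bert-base-uncased checkpoint are our assumptions, as the text only says "the BERT tokenizer":</p>
        <preformat>
from transformers import BertTokenizer

# Hypothetical checkpoint; the paper does not name one.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

comment = "Hope you have a nice day"
encoded = tokenizer(
    comment,
    max_length=64,         # cap chosen because only ~1% of comments exceed 64 tokens
    padding="max_length",  # pad shorter comments up to the cap
    truncation=True,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 64])
        </preformat>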
        <p>
          Additionally, we use an emotional embedding model that incorporates
emotional information into a word embedding model such as Word2vec [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] or Glove
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Emotional embedding produces a new vector space by fitting emotional
information into pre-trained word vectors. A constraint set constructed from all
word/emotion relations is used for training, and the model learns to pull the
vectors of positively related word pairs closer together. We use the pre-trained emotional
embedding Glove [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. As the vocabulary of BERT differs from that of Glove,
we use a zero embedding when a token produced by the BERT tokenizer is not present in the
Glove vocabulary. The features from pre-trained BERT and emotional
embedding Glove are concatenated. As a result, the feature vector for a comment has
shape (62, 1068), representing 62 tokens of 1068 dimensions (768 BERT
dimensions + 300 emotional Glove dimensions). When the post is used as context,
the input to our CNN model has shape (124, 1068).
        </p>
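        <p>A sketch of the concatenation step under these assumptions; bert_vectors, glove, and combine_embeddings are our own illustrative names:</p>
        <preformat>
import numpy as np

def combine_embeddings(tokens, bert_vectors, glove, glove_dim=300):
    # tokens: wordpiece strings from the BERT tokenizer.
    # bert_vectors: array of shape (len(tokens), 768) from the BERT encoder.
    # glove: dict mapping a word to its 300-d emotional Glove vector;
    # tokens absent from the Glove vocabulary get a zero vector.
    glove_part = np.stack(
        [glove.get(tok, np.zeros(glove_dim)) for tok in tokens]
    )
    return np.concatenate([bert_vectors, glove_part], axis=1)  # (len(tokens), 1068)
        </preformat>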
      </sec>
      <sec id="sec-3-2">
        <title>TextCNN</title>
        <p>Our TextCNN model uses one-dimensional convolutions with a filter size of 256
and kernel sizes of 3, 4, and 5. After max pooling over the convolution results, the
features are concatenated and flattened into a fully-connected
layer, which serves as the final layer for classification. The output dimension is 6, and
the sigmoid function is used as the activation function. The output represents the
prediction probability of each class; a probability greater than or equal to
0.5 is labeled as 1, and otherwise as 0. We adopt the Adam optimizer
with learning rate 1e-4 and epsilon 1e-8 in this study. The binary cross-entropy
loss function is used to train our model.</p>
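        <p>A sketch of this classifier in PyTorch (our choice of framework; the paper does not name one):</p>
        <preformat>
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    # 1D convolutions with 256 filters and kernel sizes 3, 4, 5 over the
    # (seq_len, 1068) embedding matrix, max-pooled, concatenated, and fed
    # to a 6-way sigmoid output layer, as described above.
    def __init__(self, embed_dim=1068, num_filters=256,
                 kernel_sizes=(3, 4, 5), num_labels=6):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim); Conv1d expects (batch, channels, seq_len).
        x = x.transpose(1, 2)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.sigmoid(self.fc(torch.cat(pooled, dim=1)))

model = TextCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-8)
loss_fn = nn.BCELoss()  # binary cross entropy on the sigmoid outputs
        </preformat>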
      </sec>
      <sec id="sec-3-3">
        <title>Data Augmentation for Training</title>
        <p>
          Through informal experimentation, we discovered that the prediction performance
for `General support' is very low. We attribute this to the small number of
label-1 instances. Similarly, the `Info support' and the `Emo support' labels
have 1,250 and 1,006 label-1 instances, respectively, and show half
the performance of the remaining labels. To address this class imbalance issue, we
apply EDA (Easy Data Augmentation) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] to the general support, informational
support, and emotional support categories. EDA uses synonym replacement,
random insertion, random swap, and random deletion for augmentation. We
generate 9 augmented sentences for each sentence of the support group categories during
the training stage in the system run.
        </p>
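        <p>As a toy illustration, our own sketch of two of the four EDA operations (random swap and random deletion); this is not the reference implementation:</p>
        <preformat>
import random

def random_swap(words, n_swaps=1):
    # Swap two randomly chosen positions, n_swaps times (needs 2+ words).
    words = words[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # Drop each word with probability p, keeping at least one word.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

sentence = "Stepdad will be the one walking me down the aisle".split()
augmented = [random_swap(sentence), random_deletion(sentence)]
        </preformat>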
        <p>
          The OffMyChest dataset also provides unlabeled comment data, which contain
over 420,000 sentences. To make use of these data, we assign pseudo-labels to the
unlabeled comments using our best classification model. Then we re-train
the model with the labeled data along with the pseudo-labeled data to improve
the classification performance, following the semi-supervised method in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>During training, we want the model to learn more from the labeled data than
from the pseudo-labels, as the pseudo-labels can be incorrect. Thus, we build the
initial model by training on the augmented labeled data for 10 epochs. Then, we
re-train the model on the pseudo-labeled data for 3 epochs. While we applied
semi-supervised learning for the system run submission, the experiment
results presented below were obtained without the semi-supervised learning scheme,
because our limited computing facility could not accommodate it within the 10-fold cross validation.</p>
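        <p>A sketch of the pseudo-labeling loop under these assumptions; model, train, and the loader names are ours, for illustration:</p>
        <preformat>
import torch

def pseudo_label(model, unlabeled_loader, threshold=0.5):
    # Predict on unlabeled comments and binarize at 0.5 to get pseudo-labels.
    model.eval()
    pseudo = []
    with torch.no_grad():
        for features in unlabeled_loader:  # assumes the loader yields tensors
            probs = model(features)        # (batch, 6) sigmoid outputs
            pseudo.append((features, (probs >= threshold).float()))
    return pseudo

# Hypothetical schedule mirroring the text: 10 epochs on the augmented
# labeled data, then 3 epochs on the pseudo-labeled data.
# train(model, labeled_loader, epochs=10)
# train(model, pseudo_loader, epochs=3)
        </preformat>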
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>To evaluate the proposed approach, we use 10-fold cross validation
on the labeled training data. In this experiment, EDA and semi-supervised
learning were not used, due to the limited computing resources available for 10-fold cross
validation. The different conditions examined in our experiments are described
below.</p>
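      <p>A self-contained sketch of such an evaluation loop, using scikit-learn's KFold; the dummy data and the simple linear stand-in for our classifier are ours, for illustration only:</p>
      <preformat>
import numpy as np
from sklearn.model_selection import KFold
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Dummy stand-ins for the real feature/label matrices.
X = np.random.rand(200, 20)
y = np.random.randint(0, 2, size=(200, 6))

fold_scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = MultiOutputClassifier(LogisticRegression(max_iter=200))
    clf.fit(X[train_idx], y[train_idx])
    preds = clf.predict(X[test_idx])
    # average=None yields one F1 score per label, matching the 6 categories.
    fold_scores.append(f1_score(y[test_idx], preds, average=None, zero_division=0))
print(np.mean(fold_scores, axis=0))  # per-label mean F1 over the 10 folds
      </preformat>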
      <p>
        Glove [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: use of the pre-trained Glove model trained on Twitter data. This
was our first approach to the problem, and it serves only as the baseline in this
experiment. Since the data used in this study were collected from the Internet
community Reddit, we expected that this pre-trained Glove model might perform
well. The pre-trained model uses 200-dimensional vectors trained on 27B
tokens.
      </p>
      <p>
        Emotional Glove [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: an emotional embedding model built by
combining emotional information with Glove embeddings. This model uses
300-dimensional vectors trained on 6B tokens.
      </p>
      <p>
        BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: using the state-of-the-art language model to generate the feature
vectors. The embedding dimension is 768, and the maximum sequence length is 64.
      </p>
      <p>Context: utilizing the posts as contextual information to test whether it can
enhance the prediction performance.</p>
      <sec id="sec-4-1">
        <title>Results</title>
        <p>Table 4 reports the accuracy of label prediction. Overall, the BERT
embedding model without the post shows the best accuracy in
three categories. The combination of BERT with Emotional Glove yields the
best performance in the emotional disclosure category (without context) and
the general support category (with context). The combination of BERT with
Glove outperforms the other models in the informational support category.
The accuracy seems promising in the support group categories,
ranging from 0.814 (support) to 0.938 (general support). Yet, their F1 scores show
different findings.</p>
        <p>Table 5 shows the F1 scores of our model in each category. Overall, the
`BERT + Emotional Glove' model and the `BERT + Glove' model show the best
performance. The performance on the support subgroups (i.e., General Support,
Informational Support, and Emotional Support) is poor, as low as 0.05 (for the
general support label prediction), which is not sufficient for practical use.
Meanwhile, the corresponding accuracy is 0.934. This means that accuracy is not
a good metric when the classes are imbalanced. The precision and recall
of the selected model are reported in Table 6.</p>
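        <p>A toy example of ours makes this gap concrete: a degenerate classifier that always predicts 0 for a rare label scores high accuracy but zero F1.</p>
        <preformat>
from sklearn.metrics import accuracy_score, f1_score

# 1,000 instances, 66 positives (~6.6%); the classifier predicts all zeros.
y_true = [1] * 66 + [0] * 934
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))            # 0.934: looks strong
print(f1_score(y_true, y_pred, zero_division=0)) # 0.0: no positive is ever found
        </preformat>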
        <p>[Table 5 compares the methods: Glove (baseline), Emotional Glove, BERT, BERT + context, BERT + Glove, BERT + Emotional Glove, and BERT + Emotional Glove + context.]</p>
        <p>Our main findings are as follows:</p>
        <list list-type="bullet">
          <list-item><p>As for the evaluation measures, accuracy is a poor metric for evaluating the proposed approach due to the class imbalance problem. Therefore, we propose to use the F1 score, the harmonic mean of precision and recall, instead.</p></list-item>
          <list-item><p>For the word embedding model, the BERT models perform better than the Glove models.</p></list-item>
          <list-item><p>The combination of BERT and Glove enhances the classification performance.</p></list-item>
          <list-item><p>The use of emotional Glove improves the classification performance for the categories where classes are imbalanced (i.e., support, informational support, emotional support).</p></list-item>
          <list-item><p>Contrary to our initial assumption, the use of the original post as context did not contribute to enhancing the prediction performance. We postulate that this is because our method of concatenating one comment to one context (i.e., the post associated with the comment) fails to give proper weight to contexts, as the comments were presumably labeled without considering their contexts.</p></list-item>
        </list>
        <p>For the CL-Aff Shared Task 2020 competition, we used the `BERT + Emotional
Glove' model as the system run model, as it reports the best F1 scores without
considering context. In the system run we applied EDA (Easy Data
Augmentation) and semi-supervised learning as described in Section 3.3. Our
submission contains the labels generated using 4 different settings from the
combinations of [`with context', `without context'] and [`with pseudo-label',
`without pseudo-label'].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper describes our approach, which combines word and emotion
embedding models to predict `Disclosure' and `Supportiveness' in the OffMyChest dataset.
Three language embedding models - BERT, Glove, and Emotional Glove - were
compared, and BERT performed about 10% better than Glove at label
prediction. Our evaluation results also indicate that combining
embedding models can improve the performance. It is particularly notable that
Emotional Glove represents the text better than Glove when the classes are
imbalanced. We adopted the original posts along with their associated comments
to increase the prediction performance; however, the results show that using
the post makes no contribution to the model's classification
performance. In the future, we plan to investigate how to convey the context of a
post more effectively than by simple concatenation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This research was partially supported by the Basic Science Research Program through
the National Research Foundation of Korea (NRF) funded by the Ministry of
Education (2016R1D1A1B03933002). This work was partially supported by the
National Research Foundation of Korea (NRF) grant funded by the Korea
government (MEST) (No. 2019R1A2C1006316). This work was also partially supported
by the Basic Science Research Program through the National Research Foundation of
Korea (NRF) funded by the Ministry of Science and ICT (2017R1A2B4010499).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Erkan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          : Lexrank:
          <article-title>Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>J. Artif. Int. Res</source>
          .
          <volume>22</volume>
          (
          <issue>1</issue>
          ),
          <volume>457</volume>
          {479 (Dec
          <year>2004</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>1622487</volume>
          .
          <fpage>1622501</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiahui</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chhaya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A report of the CL-A O MyChest Shared Task at</article-title>
          A ective Content Workshop @ AAAI.
          <source>In: Proceedings of the 3rd Workshop on A ective Content Analysis @ AAAI (A Con2020)</source>
          . New York, New York (
          <year>February 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classi cation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1746</volume>
          {
          <fpage>1751</fpage>
          . Association for Computational Linguistics, Doha, Qatar (Oct
          <year>2014</year>
          ). https://doi.org/10.3115/v1/
          <fpage>D14</fpage>
          -1181, https://www.aclweb.org/anthology/D14-1181
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          :
          <article-title>Pseudo-label : The simple and e cient semi-supervised learning method for deep neural networks</article-title>
          .
          <source>ICML 2013 Workshop : Challenges in Representation Learning (WREPL)</source>
          (
          <year>07 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2</source>
          . pp.
          <volume>3111</volume>
          {
          <fpage>3119</fpage>
          . NIPS'
          <volume>13</volume>
          , Curran Associates Inc.,
          <source>USA</source>
          (
          <year>2013</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>2999792</volume>
          .
          <fpage>2999959</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.: Glove:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In: Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1532</volume>
          {
          <issue>1543</issue>
          (
          <year>2014</year>
          ), http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proc. of NAACL</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Seyeditabari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tabari</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gholizadeh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zadrozny</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Emotional embeddings: Re ning word embeddings to capture emotional content of words</article-title>
          . CoRR abs/
          <year>1906</year>
          .00112 (
          <year>2019</year>
          ), http://arxiv.org/abs/
          <year>1906</year>
          .00112
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>EDA: easy data augmentation techniques for boosting performance on text classi cation tasks</article-title>
          . CoRR abs/
          <year>1901</year>
          .11196 (
          <year>2019</year>
          ), http://arxiv.org/abs/
          <year>1901</year>
          .11196
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>