<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiaao Chen</string-name>
          <email>jiaaochen@gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Yuwei Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diyi Yang</string-name>
          <email>diyi.yang@cc.gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>Atlanta GA 30318</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Shanghai Jiao Tong University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We present Semi-Supervised Models via Data Augmentation (SMDA), a semi-supervised text classification system for classifying interactive affective responses. SMDA utilizes recent transformer-based models to encode each sentence and employs back-translation techniques to paraphrase given sentences as augmented data. For labeled sentences, we performed data augmentation to make the label distributions more uniform and computed a supervised loss during training. For unlabeled sentences, we explored self-training by regarding low-entropy predictions over unlabeled sentences as pseudo labels, treating high-confidence predictions as labeled data for training. We further introduced consistency regularization as an unsupervised loss after data augmentation on unlabeled data, based on the assumption that the model should predict similar class distributions with an original unlabeled sentence as input and its augmented version as input. Through a set of experiments, we demonstrated that our system outperformed baseline models in terms of F1-score and accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Semi-Supervised Learning</kwd>
        <kwd>Data Augmentation</kwd>
<kwd>Deep Learning</kwd>
        <kwd>Social Support</kwd>
        <kwd>Self-disclosure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Affect refers to emotion, sentiment, mood, and attitudes, including subjective
evaluations, opinions, and speculations [
        <xref ref-type="bibr" rid="ref23">23</xref>
]. Psychological models of affect have been utilized in extensive computational research to operationalize and
measure users' opinions, intentions, and expressions. Understanding affective
responses within conversations is an important first step for studying affect
and has attracted a growing amount of research attention recently [
        <xref ref-type="bibr" rid="ref19 ref20 ref4">20, 4, 19</xref>
]. The affective understanding of conversations focuses on the problem of how
speakers use emotions to react to a situation and to each other, which can
help us better understand human behaviors and build better
human-computer interaction systems.
      </p>
      <p>
However, modeling affective responses within conversations is relatively
challenging, since affectiveness is hard to quantify [
        <xref ref-type="bibr" rid="ref16">16</xref>
] and there is no
large-scale labeled dataset of affective levels in responses. To facilitate
research in modeling interactive affective responses, [
        <xref ref-type="bibr" rid="ref8">8</xref>
] introduced OffMyChest, a
conversation dataset built from Reddit, and proposed two tasks: (1) a
semi-supervised learning task: predict labels for Disclosure and Supportiveness
in sentences based on a small amount of labeled and a large amount of unlabeled training
data; (2) an unsupervised task: design new characterizations and insights to model
conversation dynamics. The current work focuses on the first task.
      </p>
<p>Given limited labeled data and a large amount of unlabeled data,
and to alleviate the dependence on labeled data, we combine recent advances in
language modeling, semi-supervised learning on text, and data augmentation on
text to form Semi-Supervised Models via Data Augmentation (SMDA). SMDA
consists of two parts: supervised learning over labeled data (Section 4.1) and
unsupervised learning over unlabeled data (Section 4.2). Both parts utilize data
augmentation to enhance the learning procedure. Our contributions in this
work can be summarized in three parts: we analyze the OffMyChest dataset
in Section 3, propose a semi-supervised text classification system for classifying
interactive affective responses in Section 4, and describe the
experimental details and results in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
Transformer-based Models: With transformer-based pre-trained models
becoming increasingly widely used, the pre-training and fine-tuning framework [
        <xref ref-type="bibr" rid="ref7">7</xref>
] with
large pre-trained language models has been applied to many NLP applications
and has achieved state-of-the-art performance [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Language models [
        <xref ref-type="bibr" rid="ref15 ref21 ref7">15, 7, 21</xref>
        ] or
masked language models [
        <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
] are pre-trained over large amounts of text from
Wikipedia and then fine-tuned on specific tasks like text classification. We
built our SMDA system on this framework.
      </p>
      <p>
Data Augmentation on Text: When the amount of labeled data is limited,
one common technique for handling the shortage is to augment the given
data to generate more "augmented" training data. Previous work has utilized
simple operations like synonym replacement, random insertion, random swap,
and random deletion for text data augmentation [
        <xref ref-type="bibr" rid="ref17">17</xref>
]. Another line of research
applied neural models to augment text by generating paraphrases via back
translation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and monotone submodular function maximization [
        <xref ref-type="bibr" rid="ref9">9</xref>
]. Building
on this prior work, we utilized back translation as our augmentation method on
both labeled and unlabeled sentences.
      </p>
      <p>
Semi-Supervised Learning for Text Classification: An alternative way to deal with
the lack of labeled data is to utilize unlabeled data in the learning process, which
is known as Semi-Supervised Learning (SSL), since unlabeled data is usually
easier to obtain than labeled data. Researchers have made use of variational
autoencoders (VAEs) [
        <xref ref-type="bibr" rid="ref2 ref22 ref6">2, 22, 6</xref>
        ], self-training [
        <xref ref-type="bibr" rid="ref11 ref12 ref5">11, 5, 12</xref>
        ], consistency regularization
[
        <xref ref-type="bibr" rid="ref13 ref14 ref18">14, 13, 18</xref>
] to introduce extra loss functions over unlabeled data that help the
learning of labeled samples. VAEs utilize latent variables to reconstruct input labeled
and unlabeled sentences and predict sentence labels from these latent variables;
self-training adds unlabeled data with high-confidence predictions as
pseudo-labeled data during training; and consistency regularization forces the model to
output consistent predictions after adversarial noise is added or data
augmentation is performed on the input. We combined self-training, entropy minimization,
and consistency regularization in our system for unlabeled sentences.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data Analysis and Pre-processing</title>
      <p>
Researching how humans initiate and hold conversations has attracted increasing
attention in recent years, as it can help us better understand how humans behave in
conversations and build better AI systems, such as social chatbots, to communicate
with people. In this section, we take a closer look at the conversation dataset,
OffMyChest [
        <xref ref-type="bibr" rid="ref8">8</xref>
], to better understand and model interactive affective
responses. Specifically, we describe certain characteristics of this dataset and our
pre-processing steps.
      </p>
      <sec id="sec-3-1">
<title>Label Definition</title>
        <p>
          For each comment of a post on Reddit, [
          <xref ref-type="bibr" rid="ref8">8</xref>
] annotated each comment with 6 labels:
Information disclosure, representing some degree of personal information in
the comment; Emotional disclosure, representing comments containing certain positive
or negative emotions; Support, referring to comments offering social support like
advice; General support, representing comments offering general support
through quotes and catch phrases; Information support, offering specific
information like practical advice; and Emotional support, offering sympathy, caring,
or encouragement. Each comment can belong to multiple categories.
In the OffMyChest corpus, there are 12,860 labeled sentences and over 420k
unlabeled sentences for training, and 5,000 unlabeled sentences for test. The label
distributions of the labeled sentences are shown in Fig. 1.
        </p>
<table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Examples of original sentences and their back-translation paraphrases.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Original</th><th>Augmented</th></tr>
            </thead>
            <tbody>
              <tr><td>I'm crying a lot of tears of joy right now.</td><td>Right now I'm crying a lot of happy tears.</td></tr>
              <tr><td>Stepdad will be the one walking me down the aisle when I get married.</td><td>It will be my stepfather walking me down the aisle when I get married.</td></tr>
              <tr><td>Hope you have a nice day.</td><td>I hope you have a good day.</td></tr>
              <tr><td>Your best effort, both of you.</td><td>Both of you are giving it your best shot.</td></tr>
              <tr><td>Plan your transition back to working outside of the home.</td><td>Plan your move back to a job outside your own home.</td></tr>
              <tr><td>I am so freaking happy for you!</td><td>I'm so excited for you!</td></tr>
            </tbody>
          </table>
        </table-wrap>
<p>To train and evaluate our systems, we randomly split the given labeled sentence set into train, development,
and test sets. The data statistics are shown in Table 1. We tuned hyper-parameters
and chose the best models based on performance on the dev set, and report the model's
performance on the test set.
We utilized the XLNet cased tokenizer (https://huggingface.co/transformers/model_doc/xlnet.html#xlnettokenizer) to split each sentence into tokens.
As shown by the cumulative sentence length distribution in Fig. 2, 95% of comments
have fewer than 64 tokens. We therefore set the maximum sentence length to 64,
and retained the first 64 tokens of sentences that exceed this limit. For data
augmentation, we made use of back translation with German as the intermediate language
to generate paraphrases of given sentences. Specifically, we loaded a translation
model from Fairseq (https://github.com/pytorch/fairseq), translated given sentences from English to German, and
then translated them back to English. To increase the diversity of the generated
paraphrases, we employed random sampling with a tunable temperature (0.8)
instead of beam search for generation. Examples are shown in Table 2; a code sketch follows.</p>
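<p>As an illustration of this augmentation step, the following is a minimal back-translation sketch. It assumes fairseq's publicly released WMT'19 English-German models loaded via torch.hub; the paper does not specify which checkpoints were used, so these names are our assumption.</p>
        <preformat preformat-type="code">
# Minimal back-translation sketch (assumed checkpoints: fairseq's public
# WMT'19 English-German single models; the paper does not name them).
import torch

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model',
                       tokenizer='moses', bpe='fastbpe')
de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en.single_model',
                       tokenizer='moses', bpe='fastbpe')

def back_translate(sentence, temperature=0.8):
    # Random sampling with a temperature (instead of beam search)
    # increases the diversity of the generated paraphrases.
    german = en2de.translate(sentence, sampling=True, temperature=temperature)
    return de2en.translate(german, sampling=True, temperature=temperature)

print(back_translate("Hope you have a nice day."))
        </preformat>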
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Method</title>
<p>We convert this 6-class affective response classification task into 6 binary
classification tasks, namely whether each sentence belongs to each category or not
(labeled with 1 or 0). For each binary classification task, given a set of labeled
sentences consisting of n samples S = {s_1, ..., s_n} with labels L = {l_1, ..., l_n},
where l_i ∈ {0, 1}, and a set of unlabeled sentences S^u = {s^u_1, ..., s^u_m}, our goal is
to learn the classifier f(l̂ | s, i), i ∈ [1, 6]. Our SMDA model contains several
components: Supervised Learning (Section 4.1) for labeled sentences, Unsupervised
Learning (Section 4.2) for unlabeled sentences, and a Semi-Supervised Objective
Function (Section 4.3) combining labeled and unlabeled sentences.</p>
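<p>As a concrete illustration of this decomposition, the following is a minimal sketch; the label names follow Section 3.1, while the data-handling details are our assumptions rather than the authors' released code.</p>
      <preformat preformat-type="code">
# Sketch: recasting the 6-way multi-label annotation as six independent
# binary classification datasets (one per category).
LABELS = ["Information_disclosure", "Emotional_disclosure", "Support",
          "General_support", "Info_support", "Emo_support"]

def make_binary_datasets(sentences, category_sets):
    """category_sets[i] is the set of categories sentence i belongs to."""
    datasets = {}
    for label in LABELS:
        datasets[label] = [(s, int(label in cats))
                           for s, cats in zip(sentences, category_sets)]
    return datasets  # one (sentence, 0/1) dataset per binary classifier
      </preformat>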
      <sec id="sec-4-1">
<title>Generating a Balanced Labeled Training Set</title>
<p>As shown in Fig. 1, the distribution is very unbalanced with respect to General support, Info support, and
Emo support. In order to obtain more training sentences with these three types of
support and make these three binary classification sub-tasks learnable with a
more balanced training set, we performed data augmentation over sentences
with these three labels. Specifically, we paraphrased each such sentence 4 times
via back translation and assigned the augmented sentences the same
labels as the original sentences. The distributions before and after augmentation are compared in Fig. 3; a sketch of this oversampling step is given below.</p>
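<p>A minimal sketch of this balancing step, reusing the back_translate helper from Section 3.1 (the dataset representation is our assumption):</p>
        <preformat preformat-type="code">
# Sketch: balancing the three rare support categories by adding four
# back-translated paraphrases per positive sentence.
RARE = ["General_support", "Info_support", "Emo_support"]

def augment_rare(dataset, label, n_aug=4):
    """dataset is a list of (sentence, 0/1) pairs for one binary task."""
    augmented = list(dataset)
    if label in RARE:
        for sentence, y in dataset:
            if y == 1:  # paraphrases inherit the original label
                augmented += [(back_translate(sentence), 1) for _ in range(n_aug)]
    return augmented
        </preformat>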
      </sec>
      <sec id="sec-4-2">
<title>Supervised Learning for Labeled Sentences</title>
        <p>
For each input labeled sentence s_i, we used XLNet [
          <xref ref-type="bibr" rid="ref21">21</xref>
] g(·) to encode it into a hidden representation h_i =
g(s_i), and then passed h_i through a 2-layer MLP to predict the class
distribution l̂_i = f(h_i). Since these sentences have specific labels, we optimize the cross-entropy
loss as the supervised loss term:
        </p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>L_S(s_i, l_i) = -\sum l_i \log f(g(s_i))</tex-math>
        </disp-formula>
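<p>A minimal PyTorch sketch of this supervised component, assuming the HuggingFace transformers XLNet implementation; the pooling strategy, hidden size, and MLP activation are our assumptions, since the paper does not specify them.</p>
        <preformat preformat-type="code">
# Sketch of the classifier: XLNet encoder g(.) with a 2-layer MLP head f(.).
import torch
import torch.nn as nn
from transformers import XLNetModel

class SMDAClassifier(nn.Module):
    def __init__(self, n_classes=2, hidden=768):
        super().__init__()
        self.encoder = XLNetModel.from_pretrained("xlnet-base-cased")   # g(.)
        self.head = nn.Sequential(                                      # f(.)
            nn.Linear(hidden, hidden), nn.Tanh(), nn.Linear(hidden, n_classes))

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask)[0]
        h = h[:, -1]  # summarize with the last token, as in XLNet fine-tuning
        return torch.softmax(self.head(h), dim=-1)

def supervised_loss(pred, onehot):
    # Cross entropy over labeled sentences (Eq. 1).
    return -(onehot * torch.log(pred + 1e-8)).sum(dim=-1).mean()
        </preformat>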
<fig id="fig3">
          <label>Fig. 3</label>
          <caption>
            <p>Label distributions (a) before augmentation and (b) after augmentation.</p>
          </caption>
        </fig>
        <p>Paraphrasing Unlabeled Sentences: We first performed back translation
once for each unlabeled sentence s^u_i ∈ S^u to generate the augmented sentence
set S^{u,a} = {s^{u,a}_1, ..., s^{u,a}_m}, in the same manner described before.</p>
        <p>Guessing Labels for Unlabeled Sentences: For an unlabeled sentence s^u_i,
we utilized g(·) and f(·) from Section 4.1 to predict the class distribution:</p>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>\hat{l}^u_i = f(g(s^u_i))</tex-math>
        </disp-formula>
        <p>
          To avoid the prediction being too close to the uniform distribution, we generate
low-entropy guessed labels via a sharpening function [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]:
        </p>
<disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>\tilde{l}^u_i = \frac{(\hat{l}^u_i)^{1/T}}{\lVert (\hat{l}^u_i)^{1/T} \rVert_1}</tex-math>
        </disp-formula>
        <p>where ‖·‖_1 is the l1-norm of the vector. As T → 0, the guessed label approaches a
one-hot vector.</p>
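<p>The sharpening step in Eqs. (2)-(3) is a few lines of PyTorch; this sketch assumes the class distributions are stored as rows of a tensor.</p>
        <preformat preformat-type="code">
import torch

def sharpen(p, T=0.5):
    # Raise the guessed distribution to the power 1/T and renormalize
    # by its l1-norm (Eq. 3); smaller T gives lower-entropy labels.
    p = p.pow(1.0 / T)
    return p / p.sum(dim=-1, keepdim=True)
        </preformat>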
<p>Self-training for Original Sentences: Inspired by self-training, in which the model is
also trained over unlabeled data with its high-confidence predictions as labels,
in SMDA we added each original unlabeled sentence s^u_i together with its guessed label
into training by minimizing the KL divergence
between the model's prediction and the guessed label:</p>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>L_s(s^u_i) = \mathrm{KL}\big(f(g(s^u_i)) \,\Vert\, \tilde{l}^u_i\big)</tex-math>
        </disp-formula>
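<p>A sketch of this self-training loss; treating the sharpened guess as a fixed target (detached from the computation graph) is our assumption, in line with common self-training practice.</p>
        <preformat preformat-type="code">
import torch

def self_training_loss(pred, guessed):
    # KL divergence between the model's prediction on the original
    # unlabeled sentence and its sharpened guessed label (Eq. 4).
    guessed = guessed.detach()  # pseudo label: no gradient through the target
    return (pred * (torch.log(pred + 1e-8)
                    - torch.log(guessed + 1e-8))).sum(dim=-1).mean()
        </preformat>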
      </sec>
      <sec id="sec-4-3">
<title>Entropy Minimization for Original Sentences</title>
        <p>
One common assumption in many semi-supervised learning methods is that a classifier's decision boundary
should not pass through high-density regions of the marginal data distribution
[
          <xref ref-type="bibr" rid="ref5">5</xref>
]. Thus, for an original unlabeled sentence s^u_i, we added another loss term to
minimize the entropy of the model's output:
        </p>
        <disp-formula id="eq5">
          <label>(5)</label>
          <tex-math>L_e(s^u_i) = -\sum f(g(s^u_i)) \log f(g(s^u_i))</tex-math>
        </disp-formula>
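<p>The entropy term is a one-line loss; a minimal sketch:</p>
        <preformat preformat-type="code">
import torch

def entropy_loss(pred):
    # Entropy of the prediction on an original unlabeled sentence (Eq. 5);
    # minimizing it pushes predictions away from the uniform distribution.
    return -(pred * torch.log(pred + 1e-8)).sum(dim=-1).mean()
        </preformat>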
      </sec>
      <sec id="sec-4-4">
<title>Consistency Regularization for Augmented Sentences</title>
<p>With the assumption that the model should predict similar distributions for input sentences
before and after augmentation, we minimized the KL divergence between the output
with the original sentence s^u_i as input and the output with the augmented sentence s^{u,a}_i as input:</p>
        <disp-formula id="eq6">
          <label>(6)</label>
          <tex-math>L_c(s^u_i) = \mathrm{KL}\big(\hat{l}^u_i \,\Vert\, f(g(s^{u,a}_i))\big)</tex-math>
        </disp-formula>
        <p>Combining all the loss terms for unlabeled sentences, we defined our
unsupervised loss term as (a combined sketch follows):</p>
        <disp-formula id="eq7">
          <label>(7)</label>
          <tex-math>L_U(s^u_i) = L_s(s^u_i) + L_e(s^u_i) + L_c(s^u_i)</tex-math>
        </disp-formula>
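<p>The combined sketch below assembles Eqs. (4)-(7), reusing sharpen, self_training_loss, and entropy_loss from above; the batch representation and stopping gradients through the targets are our assumptions.</p>
        <preformat preformat-type="code">
import torch

def unsupervised_loss(model, batch_u, batch_u_aug, T=0.5):
    """batch_u / batch_u_aug: (input_ids, attention_mask) for the original
    unlabeled sentences and their back-translated paraphrases."""
    pred_u = model(*batch_u)        # f(g(s_u)) on original sentences
    pred_a = model(*batch_u_aug)    # f(g(s_u_a)) on augmented sentences
    guessed = sharpen(pred_u, T)    # low-entropy guessed labels (Eqs. 2-3)
    l_s = self_training_loss(pred_u, guessed)                         # Eq. 4
    l_e = entropy_loss(pred_u)                                        # Eq. 5
    target = pred_u.detach()        # consistency target is the prediction l_hat
    l_c = (target * (torch.log(target + 1e-8)
                     - torch.log(pred_a + 1e-8))).sum(dim=-1).mean()  # Eq. 6
    return l_s + l_e + l_c                                            # Eq. 7
        </preformat>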
      </sec>
      <sec id="sec-4-5">
        <title>Semi-Supervised Objective Function</title>
<p>We combined the supervised and unsupervised learning described above to form
our overall semi-supervised objective function:</p>
        <disp-formula id="eq8">
          <label>(8)</label>
          <tex-math>L = \mathbb{E}_{(s_i, l_i) \in (S, L)}\, L_S(s_i, l_i) + \gamma\, \mathbb{E}_{s^u_i \in S^u}\, L_U(s^u_i)</tex-math>
        </disp-formula>
        <p>where γ is the balancing weight between the supervised and unsupervised loss terms.</p>
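<p>A minimal sketch of the overall objective (Eq. 8) with the ramp-up schedule described in Section 5.1; the linear schedule and step count are our assumptions, as only the start-small, grow-to-1 behavior is stated.</p>
        <preformat preformat-type="code">
def total_loss(l_sup, l_unsup, step, ramp_steps=10000):
    # Overall semi-supervised objective (Eq. 8): the unsupervised weight
    # gamma grows from near 0 to 1 over the course of training.
    gamma = min(1.0, step / float(ramp_steps))
    return l_sup + gamma * l_unsup
        </preformat>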
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Model Setup</title>
<p>In SMDA (the code and data splits will be released later), we used a single model for each task, without joint training or
parameter sharing. That is, we trained six separate classifiers for these tasks.</p>
<p>Inspired by the recent success of pre-trained language models, we utilized the
pre-trained weights of XLNet and followed the same fine-tuning procedure as XLNet.
We set the initial learning rate for the XLNet encoder to 1e-5 and for the other linear layers
to 1e-3. The batch size was selected from {32, 64, 128, 256}. The maximum number
of epochs was set to 20. Hyper-parameters were selected using the performance
on the development set. The sharpening temperature T was selected from {0.3, 0.5, 0.8},
depending on the task. The balancing weight γ between the supervised
and unsupervised loss terms started from a small number and grew
to 1 over the course of training. A sketch of the two-learning-rate optimizer setup is given below.</p>
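<p>The two learning rates can be realized with optimizer parameter groups, reusing the SMDAClassifier sketched in Section 4.1 (the optimizer choice is our assumption, as the paper does not name one):</p>
        <preformat preformat-type="code">
import torch

model = SMDAClassifier()
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": 1e-5},  # XLNet encoder
    {"params": model.head.parameters(), "lr": 1e-3},     # linear layers
])
        </preformat>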
      </sec>
      <sec id="sec-5-2">
        <title>Results</title>
<p>Our experimental results are shown in Table 3. We compared our proposed
SMDA with BERT and XLNet in terms of accuracy (%) and macro F1 score.
BERT and XLNet achieved similar performance, since they both follow the
pre-training and fine-tuning paradigm. When combined with augmented and more
balanced labeled data and massive unlabeled data, our SMDA achieved the best
performance across the six binary classification tasks. We submitted the classification
results on the given unlabeled test set.</p>
        <p>In this work, we focused on identifying disclosure and supportiveness in
conversation responses based on small labeled and large unlabeled training data via
our proposed semi-supervised text classification system: Semi-Supervised
Models via Data Augmentation (SMDA). SMDA utilized supervised learning over
labeled data and conducted self-training, entropy minimization, and consistency
regularization over unlabeled data. Experimental results demonstrated that our
system outperformed baseline models significantly.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berthelot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carlini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliver</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raffel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Mixmatch: A holistic approach to semi-supervised learning</article-title>
          .
          <source>CoRR abs/1905</source>
          .02249 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Livescu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Variational sequential labelers for semi-supervised learning</article-title>
          .
          <source>In: Proc. of EMNLP</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>4171</volume>
–
          <fpage>4186</fpage>
          . Association for Computational Linguistics, Minneapolis,
          <source>Minnesota (Jun</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ernala</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizvi</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birnbaum</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kane</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Linguistic markers indicating therapeutic outcomes of social media disclosures of schizophrenia</article-title>
          .
          <source>Proceedings of the ACM on Human-Computer Interaction 1(CSCW)</source>
          ,
          <volume>43</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Grandvalet</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised learning by entropy minimization</article-title>
          .
          <source>In: Proceedings of the 17th International Conference on Neural Information Processing Systems</source>
          . pp.
          <volume>529</volume>
–
          <fpage>536</fpage>
          . NIPS'04, MIT Press, Cambridge, MA, USA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gururangan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Card</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
<article-title>Variational pretraining for semi-supervised text classification</article-title>
          . CoRR abs/
          <year>1906</year>
          .02242 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
<article-title>Universal language model fine-tuning for text classification</article-title>
          .
          <source>In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <volume>328</volume>
–
          <fpage>339</fpage>
          . Association for Computational Linguistics, Melbourne,
          <source>Australia (Jul</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiahui</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chhaya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
<article-title>A report of the CL-Aff OffMyChest Shared Task at the Affective Content Workshop @ AAAI</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Affective Content Analysis @ AAAI (AffCon2020)</source>
          . New York, New York (
          <year>February 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattamishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talukdar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
<article-title>Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>3609</volume>
–
          <fpage>3619</fpage>
          . Association for Computational Linguistics, Minneapolis,
          <source>Minnesota (Jun</source>
          <year>2019</year>
). https://doi.org/10.18653/v1/N19-1363, https://www.aclweb.org/anthology/N19-1363
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Cross-lingual language model pretraining</article-title>
          . CoRR abs/
          <year>1901</year>
          .07291 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          :
<article-title>Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks</article-title>
          .
          <source>ICML 2013 Workshop : Challenges in Representation Learning (WREPL)</source>
          (
          <year>07 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Zhang, C.,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          .:
<article-title>Weakly-supervised neural text classification</article-title>
          .
          <source>In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <volume>983</volume>
–
          <fpage>992</fpage>
          . CIKM '18,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2018</year>
). https://doi.org/10.1145/3269206.3271737
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Miyato</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
<article-title>Adversarial training methods for semi-supervised text classification</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Miyato</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishii</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Virtual adversarial training: A regularization method for supervised and semi-supervised learning</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>41</volume>
          (
          <issue>8</issue>
          ),
          <year>1979</year>
–
          <year>1993</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
          <volume>2227</volume>
–
          <fpage>2237</fpage>
          . Association for Computational Linguistics, New Orleans,
          <source>Louisiana (Jun</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Warriner</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shore</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imbault</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuperman</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
<article-title>Sliding into happiness: A new tool for measuring affective responses to words</article-title>
          .
          <source>Canadian Journal of Experimental Psychology/Revue canadienne de psychologie experimentale 71(1)</source>
          ,
          <volume>71</volume>
          (
          <year>2017</year>
). https://doi.org/10.1037/cep0000112
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
<article-title>EDA: easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          . CoRR abs/
          <year>1901</year>
          .11196 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Unsupervised data augmentation for consistency training</article-title>
          . arXiv preprint arXiv:
          <year>1904</year>
          .
          <volume>12848</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraut</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
, Mayfield, E.,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Seekers, providers, welcomers, and
          <article-title>storytellers: Modeling social roles in online health communities</article-title>
          .
          <source>In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          . p.
          <fpage>344</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seering</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraut</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>The channel matters: Self-disclosure, reciprocity and social support in online cancer support groups</article-title>
          .
          <source>In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          . p.
          <fpage>31</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Carbonell, J.G.,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Xlnet: Generalized autoregressive pretraining for language understanding</article-title>
          . CoRR abs/
          <year>1906</year>
          .08237 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg-Kirkpatrick</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Improved variational autoencoders for text modeling using dilated convolutions</article-title>
          .
          <source>CoRR abs/1702</source>
          .08139 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , P.:
<article-title>The affective response model: A theoretical framework of affective concepts and their relationships in the ICT context</article-title>
          .
          <source>Management Information Systems Quarterly (MISQ) 37</source>
          ,
          <fpage>247</fpage>
–
          <volume>274</volume>
          (03
          <year>2013</year>
          ). https://doi.org/10.25300/MISQ/
          <year>2013</year>
          /37.1.
          <fpage>11</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>