<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Vito at HASOC 2019: Detecting Hate Speech and Offensive Content through Ensembles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Nina-Alcocer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Systems and Computation, Universitat Politècnica de València, Camí de Vera</institution>
          <addr-line>s/n 46022 Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our participation in the shared task "Hate Speech and Offensive Content Identification in Indo-European Languages" (HASOC) at the Forum for Information Retrieval Evaluation (FIRE) 2019. This work studies the detection of hate or offensive content in English posts published on Facebook or Twitter. For a fine-grained study of the task, we analyzed two different approaches. The first one concerns the design of two architectures using convolutional and recurrent neural networks. The second approach examines a range of paradigms based on classical machine learning algorithms, neural networks, and transformers.</p>
      </abstract>
      <kwd-group>
        <kwd>Facebook</kwd>
        <kwd>Twitter</kwd>
        <kwd>Hate Speech</kwd>
        <kwd>Offensive Content</kwd>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Recurrent Neural Networks</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        HASOC[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] at FIRE[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] aims to identify hate and offensive content in social
media, taking into account tweets and Facebook posts in Indo-European languages.
To accomplish this goal, the organizers propose three sub-tasks that tackle this issue in
three languages: English, German, and Hindi.
      </p>
      <p>
        As in our previous research [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13–15</xref>
        ], we address these kinds of challenges using Machine Learning (ML) and Natural Language
Processing (NLP). In this paper, we work in the same fields but dig a little
deeper into Deep Learning (DL) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Neural Networks (NN) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and architectures
based on transformers [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>We organize this paper into four sections. The first section provides an
introduction to this paper, followed by descriptions of the proposed approaches
and their respective systems. The third section presents the
experiments and the results achieved. The final section concludes our work.</p>
    </sec>
    <sec id="sec-2">
      <title>Systems Description</title>
      <p>This section presents the main approaches that have been taken into account in
this research.</p>
      <p>
        The first approach considers two architectures. The first one is DL-NN, which
uses Convolutional Neural Networks (CNNs) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Recurrent Neural
Networks (RNNs). It has three embedding layers as inputs. The first layer (I1) takes
the embedding of a pre-processed post. The other two layers
consider the embedding of its Part-of-Speech (POS) tagging [
        <xref ref-type="bibr" rid="ref16 ref3">3, 16</xref>
        ] (I2) as well as
the existence of positive or negative words, according to a pre-defined lexicon [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
(I3). The first embedding input then feeds a CNN layer, which is followed
by an LSTM layer (O1). The second and third input layers feed a CNN (O2) and
a dense layer (O3), respectively. The outputs of these three layers (O1, O2, O3)
are concatenated and fed to a dense layer composed of two nodes.
Softmax is used to get the predictions.
      </p>
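      <p>The final merge step of DL-NN can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in our experiments: the function names (softmax, dense, classify), the toy branch outputs, and the weights are all assumptions introduced for the example.</p>
      <preformat>
```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def classify(o1, o2, o3, weights, biases):
    """Concatenate the three branch outputs and return two class probabilities."""
    merged = o1 + o2 + o3  # list concatenation stands in for the merge layer
    return softmax(dense(merged, weights, biases))

# Toy values, invented for illustration only (not taken from the experiments).
o1, o2, o3 = [0.2, -0.1], [0.5], [0.0, 0.3]
weights = [[0.1, 0.2, 0.3, 0.4, 0.5], [-0.1, -0.2, -0.3, -0.4, -0.5]]
biases = [0.0, 0.0]
probs = classify(o1, o2, o3, weights, biases)
```
      </preformat>
      <p>Here probs holds one probability per output node, and the predicted class is the node with the larger value.</p>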
      <p>
        Our second architecture is NN-AA. It implements Long short-term memory
(LSTM) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] layers, a common type of RNN. It takes the embedding
representation of the pre-processed post, together with another feature
input extracted from the post's POS tagging. It is essentially an embedding
layer followed by an LSTM layer. The model also incorporates an attention
layer [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to focus on critical words in the sentence. We concatenate the
output of this attention layer with the POS tagging vector representation. This
vector is calculated as the sum of the frequencies of each word's POS tag divided
by the total number of words in the sentence. We feed the concatenated vector into
a dense layer that applies the softmax function and generates the predictions.
      </p>
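      <p>One possible reading of the POS vector described above is a normalized tag-frequency vector. The sketch below assumes this reading; the tag inventory TAGSET and the function names are hypothetical, since the exact tag set and implementation are not given in this paper.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical tag inventory; the tagger's real tag set is not given in the paper.
TAGSET = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "OTHER"]

def pos_frequency_vector(tags, tagset=TAGSET):
    """Return, for each tag in tagset, its count divided by the sentence length."""
    if not tags:
        return [0.0] * len(tagset)
    counts = Counter(tag if tag in tagset else "OTHER" for tag in tags)
    return [counts[tag] / len(tags) for tag in tagset]

def concatenate(attention_output, pos_vector):
    """Join the attention output and the POS vector into one feature vector."""
    return list(attention_output) + list(pos_vector)

tags = ["PRON", "VERB", "ADJ", "NOUN"]  # invented tags for a four-word post
vector = pos_frequency_vector(tags)
features = concatenate([0.7, 0.1, 0.2], vector)  # toy attention output
```
      </preformat>
      <p>The resulting features list is what would be passed to the final dense layer in this sketch.</p>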
      <p>
        We considered a second approach, called MA-MO. This approach
examines a variety of models, ranging from traditional Machine Learning models, i.e.,
Support Vector Machine [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Logistic Regression [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and Naive Bayes [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
to some Deep Learning models, i.e., a simple CNN, a simple Dense layer (SDL),
or a Multi-Layer Perceptron (MLP) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We also considered fine-tuning
some architectures based on slightly modified transformers. We combined the
results of our systems to feed an ensemble classifier.
      </p>
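      <p>The idea of combining system outputs can be illustrated with soft voting, i.e., averaging the class probabilities of several systems. This is only a sketch under stated assumptions: the text above does not specify the ensemble classifier, so soft voting is a stand-in, and the function name soft_vote and the example probabilities are invented for illustration.</p>
      <preformat>
```python
def soft_vote(system_probs, labels=("HOF", "NOT")):
    """Average per-class probabilities across systems and pick the best label."""
    n_systems = len(system_probs)
    averaged = [sum(probs[i] for probs in system_probs) / n_systems
                for i in range(len(labels))]
    best = max(range(len(labels)), key=lambda i: averaged[i])
    return labels[best], averaged

# Invented outputs of three systems for one post: [P(HOF), P(NOT)].
outputs = [[0.8, 0.2], [0.4, 0.6], [0.6, 0.4]]
label, averaged = soft_vote(outputs)
```
      </preformat>
      <p>In this toy case two of the three systems favor HOF, and the averaged probabilities lead to the same decision.</p>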
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>In this section, we discuss the experiments and the results achieved.
HASOC provides a training set of approximately 6000 posts written in English.
Table 1 shows that sub-task A has 5852 posts, labeled as HOF
(Hate and Offensive) or NOT (Non Hate-Offensive). Sub-task B and sub-task
C each have 2261 posts. The labels for sub-task B are HATE (Hate speech), OFFN (Offensive), and PRFN
(Profane); those for sub-task C are TIN (Targeted insult) and UNT (Untargeted).</p>
      <p>nodes. Finally, a layer with two nodes and softmax is used to get predictions.
The dimension of the word embeddings is 200. We used the Adam optimizer with a
learning rate of 0.01 and trained for 50 epochs.</p>
      <p>In contrast to DL-NN, the NN-AA architecture implements an attention mechanism,
which focuses on the important parts (words) of the sentence
(post). For the attention layer, we experimented with GRU, BiLSTM, and LSTM
structures, and LSTM outperformed the others. Another feature that boosted a few
of our results was the addition of a POS tagging vector representation. To get
this feature, we took the main combinations of tags, i.e., verbs or adjectives
followed by a noun. These combinations are some of the most repeated patterns
in the training dataset. We concatenated the pattern vector representation with
the output of the attention layer. We also tried a dropout rate of 0.2, but the results
did not improve. Finally, we used softmax to make the predictions.</p>
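      <p>The tag combinations mentioned above (a verb or an adjective followed by a noun) can be counted with a few lines of Python. The sketch below is illustrative only: the tag names, the function count_pos_patterns, and the example sequence are assumptions, not the code or tag set used in our experiments.</p>
      <preformat>
```python
def count_pos_patterns(tags, patterns=(("VERB", "NOUN"), ("ADJ", "NOUN"))):
    """Count how often each two-tag pattern occurs in consecutive positions."""
    counts = {pattern: 0 for pattern in patterns}
    for pair in zip(tags, tags[1:]):
        if pair in counts:
            counts[pair] += 1
    return [counts[pattern] for pattern in patterns]

# Invented tag sequence for one post.
tags = ["ADJ", "NOUN", "VERB", "ADJ", "NOUN", "VERB", "NOUN"]
pattern_vector = count_pos_patterns(tags)
```
      </preformat>
      <p>In this sketch, pattern_vector has one entry per pattern and can be concatenated with the output of the attention layer.</p>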
      <p>We carried out many experiments with various setups for the second approach
(MA-MO). The same features as in the other two architectures, together with
GloVe word embeddings, were tested on conventional ML models. Most models
were run with default parameter settings. For the models that use
a CNN (with 128 filters and a kernel size of 2), an SDL (with 128 nodes), or an MLP
(256 nodes), we fed them only the embeddings mentioned above (with a
dimension of 200). The fine-tuned models, in contrast, used raw tokenized posts as
inputs. All of these models have two nodes in the output layer and use
the softmax function to generate predictions.</p>
      <p>Table 2 summarizes the results achieved by each of the
proposed approaches. We can observe that DL-NN has the worst
performance. However, when we incorporate the attention layer (NN-AA) and the
POS vector representation, the performance increases. Furthermore,
performance improves significantly with the ensemble models (MA-MO). We
combined the first two architectures with MA-MO to get the ENSEM model,
which clearly outperforms the others.</p>
      <sec id="sec-3-1">
        <title>Official Results</title>
        <p>As mentioned in Section 3, HASOC provides a test set of approximately
1000 unlabeled posts to evaluate our systems. We show the results of this
evaluation in Table 3. For each of the sub-tasks, three runs were submitted.
NN-AA, MA-MO, and ENSEM were submitted as run1, run3, and run2,
respectively. We emphasize that we used the same systems for all three
sub-tasks. The proposed systems exceeded our expectations, and it was the ensemble
classifier that boosted us in the ranking.</p>
      </sec>
    </sec>
    <sec id="sec-conclusions">
      <title>Conclusions</title>
      <p>In this paper, we proposed two approaches to address this shared task.
We observe that, for the first approach, the combination of CNNs and RNNs
for handling n-grams and long-term dependencies cannot generate satisfactory
results. This issue is resolved when we incorporate attention layers and a
POS vector representation. We believe this improvement results from the
weight given to important words and to the most repeated POS patterns
inside sentences. For the second approach, the proposed systems perform robustly
enough. We attribute this to the fact that each system manages to capture
specific patterns that the other systems ignore, whereas the ensemble can combine all
these patterns. It is important to mention that for sub-tasks B and C, data
augmentation improved the performance.</p>
      <p>As future work, we believe that more in-depth studies on DL, taking
into account the current state of the art, can help us improve our results.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>I would like to express my gratitude to Gong, Zheng, whose suggestions were
important in writing this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Trans. Intell. Syst. Technol.</source>
          <volume>2</volume>
          (
          <issue>3</issue>
          ),
          <fpage>27:1</fpage>
          –
          <lpage>27:27</lpage>
          (May
          <year>2011</year>
          ). https://doi.org/10.1145/1961189.1961199, http://doi.acm.org/10.1145/1961189.1961199
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Forum for Information Retrieval Evaluation: FIRE</article-title>
          <year>2019</year>
          (
          <year>2019</year>
          ), http://fire.irsi.res.in/fire/2019/home
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yogatama</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flanigan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Part-of-speech tagging for Twitter: Annotation, features, and experiments</article-title>
          .
          <source>In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2</source>
          . pp.
          <fpage>42</fpage>
          –
          <lpage>47</lpage>
          . HLT '11, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2011</year>
          ), http://dl.acm.org/citation.cfm?id=2002736.2002747
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          –
          <lpage>1780</lpage>
          (Nov
          <year>1997</year>
          ). https://doi.org/10.1162/neco.1997.9.8.1735, http://dx.doi.org/10.1162/neco.1997.9.8.1735
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>An introductory survey on attention mechanisms in NLP problems</article-title>
          . In: Bi, Y., Bhatia, R., Kapoor, S. (eds.)
          <source>Intelligent Systems and Applications</source>
          . pp.
          <fpage>432</fpage>
          –
          <lpage>448</lpage>
          . Springer International Publishing, Cham (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>An empirical exploration of recurrent network architectures</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37</source>
          . pp.
          <fpage>2342</fpage>
          –
          <lpage>2350</lpage>
          . ICML'15, JMLR.org (
          <year>2015</year>
          ), http://dl.acm.org/citation.cfm?id=3045118.3045367
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Khoo</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnkhan</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons</article-title>
          .
          <source>Journal of Information Science</source>
          <volume>44</volume>
          (
          <issue>4</issue>
          ),
          <fpage>491</fpage>
          –
          <lpage>511</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1177/0165551517703514
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR</source>
          abs/1408.5882 (
          <year>2014</year>
          ), http://arxiv.org/abs/1408.5882
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kurkova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Kolmogorov's theorem and multilayer neural networks</article-title>
          .
          <source>Neural Netw.</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>501</fpage>
          –
          <lpage>506</lpage>
          (Mar
          <year>1992</year>
          ). https://doi.org/10.1016/0893-6080(92)90012-8, http://dx.doi.org/10.1016/0893-6080(92)90012-8
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          <volume>521</volume>
          (
          <issue>7553</issue>
          ),
          <fpage>436</fpage>
          –
          <lpage>444</lpage>
          (May
          <year>2015</year>
          ). https://doi.org/10.1038/nature14539
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kombrink</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deoras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cernocky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>RNNLM - recurrent neural network language modeling toolkit</article-title>
          .
          <source>In: Proceedings of ASRU 2011</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>4</lpage>
          . IEEE Signal Processing Society (
          <year>2011</year>
          ), https://www.fit.vut.cz/research/publication/10087
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Modha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          . In:
          <source>Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          (December
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nina-Alcocer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>AMI at IberEval2018 automatic misogyny identification in Spanish and English tweets</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018), Sevilla, Spain, September 18th, 2018</source>
          . pp.
          <fpage>274</fpage>
          –
          <lpage>279</lpage>
          (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2150/AMI_paper8.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Nina-Alcocer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>HATERecognizer at SemEval-2019 Task 5: Using features and neural networks to face hate recognition</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2019, Minneapolis, MN, USA, June 6-7, 2019</source>
          . pp.
          <fpage>409</fpage>
          –
          <lpage>415</lpage>
          (
          <year>2019</year>
          ), https://www.aclweb.org/anthology/S19-2072/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nina-Alcocer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurtado</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pla</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Aggressiveness detection through deep learning approaches</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019, Bilbao, Spain, September 24th, 2019</source>
          . pp.
          <fpage>544</fpage>
          –
          <lpage>549</lpage>
          (
          <year>2019</year>
          ), http://ceur-ws.org/Vol-2421/MEX-A3T_paper_9.pdf
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Padro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanilovsky</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>FreeLing 3.0: Towards wider multilinguality</article-title>
          .
          <source>In: Proceedings of the Language Resources and Evaluation Conference (LREC 2012)</source>
          . ELRA, Istanbul, Turkey (May
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          ), https://nlp.stanford.edu/projects/glove/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Rish</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>An empirical study of the naive Bayes classifier</article-title>
          .
          <source>In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence</source>
          . vol.
          <volume>3</volume>
          , pp.
          <fpage>41</fpage>
          –
          <lpage>46</lpage>
          . IBM New York (
          <year>2001</year>
          )
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          . In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>5998</fpage>
          –
          <lpage>6008</lpage>
          . Curran Associates, Inc. (
          <year>2017</year>
          ), http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          :
          <article-title>Logistic regression</article-title>
          . In:
          <source>Reading and understanding multivariate statistics</source>
          , pp.
          <fpage>217</fpage>
          –
          <lpage>244</lpage>
          . American Psychological Association, Washington, DC, US (
          <year>1995</year>
          )
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schutze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Comparative study of CNN and RNN for natural language processing</article-title>
          .
          <source>CoRR</source>
          abs/1702.01923 (
          <year>2017</year>
          ), http://arxiv.org/abs/1702.01923
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>