<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>3Idiots at HASOC 2019: Fine-tuning Transformer Neural Networks for Hate Speech Identification in Indo-European Languages</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IIT Kanpur UP 208016</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>iSchool, University of Illinois at Urbana-Champaign</institution>
          ,
          <addr-line>Champaign IL 61820</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>We describe our team 3Idiots's approach for participating in the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages. Our approach relies on fine-tuning pre-trained monolingual and multilingual transformer (BERT) based neural network models. Furthermore, we also investigate an approach based on labels joined from all sub-tasks. This resulted in good performance on the test set. Among the eight shared tasks, our solution won first place for English sub-tasks A and B, and Hindi sub-task B. Additionally, it was within the top 5 for 7 of the 8 tasks, being within 1% of the best solution for 5 out of the 8 sub-tasks. We open source our approach at https://github.com/socialmediaie/HASOC2019.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech Identification</kwd>
        <kwd>Offensive Content Identification</kwd>
        <kwd>Neural Networks</kwd>
        <kwd>BERT</kwd>
        <kwd>Transformers</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Information extraction from social media data is an important topic. In the past
we have used it for identifying sentiment in tweets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], enthusiastic and passive
tweets and users [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and extracting named entities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The hate speech
and offensive content (HASOC) shared task of 2019, which focused on Indo-European
languages, gave us an opportunity to try out BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for this shared task.
BERT-based pre-trained transformer neural network models are publicly
available in multiple languages, and the model supports fine-tuning for specific
tasks. We also tried a joint-label approach called sub-task D, which
alleviates data sparsity issues for some shared tasks while achieving competitive
performance in the final leaderboard evaluation.
      </p>
      <p>
        [Figure 1: Our fine-tuned BERT model. Elements shown: input tokens (tok1 to tok6), bert, Label.]
      </p>
      <p>
The data supplied by the organizing team consisted of posts taken from Twitter
and Facebook. The posts were in the following three languages:
English (EN), German (DE), and Hindi (HI). The competition had three sub-tasks
for English data, two sub-tasks for German data, and three sub-tasks for Hindi
data. Sub-Task A consisted of labeling a post as (HOF) Hate
and Offensive if the post contained any hate speech, profane content, or
offensive content; otherwise the label should be (NOT) Non Hate-Offensive. Next,
Sub-Task B was more fine-grained and specified identification of (HATE)
Hate Speech, (OFFN) Offensive content, and (PRFN) Profane
content. Finally, Sub-Task C focused on identifying, via the label (TIN) Targeted
Insult, whether the hate speech, offensive, or profane content (collectively referred to as
insult) was targeted towards an individual, group, or other. If the content was
non-targeted, the label should be (UNT) Untargeted. For details about the task,
we refer the reader to the shared task publication [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The organizers released
teaser data, which we utilized as a dev dataset for selecting the hyperparameters
of our models. The distribution of the number of samples for each sub-task in
each language is tabulated in Table 1. It can be observed that the dataset size
for each task is quite small.
[Table 1: Distribution of data in each language under each sub-task.]
      </p>
      <p>[Table 2: Internal evaluation of the models. Columns: task (A, B, C), lang, model, run id, macro F1 (dev, train), weighted F1 (dev, train). Models listed for DE: bert-base-german-cased, bert-base-multilingual-cased; for EN: bert-base-cased, bert-base-uncased; for HI: bert-base-multilingual-cased, bert-base-multilingual-uncased. Models marked (D) correspond to sub-task D.]</p>
      <p>
Each sub-task can be modelled as a text classification problem. Our submission
models are derived by fine-tuning the pre-trained language model on the shared
task data. We used BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as our pre-trained language model because of its
recent success as well as its public availability in multiple languages. We utilize
the BERT implementation present in the pytorch-transformers library
(https://github.com/huggingface/pytorch-transformers). In order to
predict on the HI and DE language datasets, we used the bert-multilingual as well as
bert-german pre-trained models. Our fine-tuned model is illustrated in Figure 1.
1. English Language Task (EN) - For the English language task we
experimented with the bert-base-cased and bert-base-uncased models. We
experimented on all three sub-tasks using the above models.
2. German Language Task (DE) - For the German language task we
experimented with the bert-base-german-cased and bert-base-multilingual-cased
models. We experimented on sub-tasks A and B using the above models.
3. Hindi Language Task (HI) - For the Hindi language task we experimented
with the bert-base-multilingual-cased and bert-base-multilingual-uncased
models. We experimented on all three sub-tasks using the above models.
      </p>
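      <p>The per-language setup listed above can be written down as a small experiment grid. The following is only an illustrative sketch: the names EXPERIMENTS and experiment_grid are hypothetical and do not come from the released code, while the model identifiers and sub-tasks are those named in the text.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Candidate pre-trained models and sub-tasks per language, as listed in the text.
# The dictionary layout itself is an assumption made for illustration.
EXPERIMENTS = {
    "EN": {
        "models": ["bert-base-cased", "bert-base-uncased"],
        "subtasks": ["A", "B", "C"],
    },
    "DE": {
        "models": ["bert-base-german-cased", "bert-base-multilingual-cased"],
        "subtasks": ["A", "B"],
    },
    "HI": {
        "models": ["bert-base-multilingual-cased", "bert-base-multilingual-uncased"],
        "subtasks": ["A", "B", "C"],
    },
}


def experiment_grid(experiments=EXPERIMENTS):
    """Yield one (language, sub-task, model) triple per fine-tuning run."""
    for lang, spec in experiments.items():
        for subtask, model in product(spec["subtasks"], spec["models"]):
            yield lang, subtask, model


if __name__ == "__main__":
    for run in experiment_grid():
        print(run)
```
]]></preformat>
      <p>Enumerating the grid this way makes it explicit that German has no sub-task C, so only sub-tasks A and B are paired with the German models.</p>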
    </sec>
    <sec id="sec-2">
      <title>Training</title>
      <p>Our models were trained using the Adam optimizer (with ε = 1e-8) for five
epochs, with a training/eval batch size of 32. Each sequence is truncated
to a maximum allowed sequence length of 28 characters. We use a learning rate of 5e-5,
a weight decay of 0.0, and a maximum gradient norm of 1.0.</p>
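      <p>To show how the reported optimizer settings interact, the sketch below applies gradient-norm clipping followed by a single Adam update in plain Python. It is a simplified illustration and not the training code, which relied on the pytorch-transformers library. The TrainConfig field names, the helper names, and the beta1 and beta2 values (commonly used defaults) are assumptions that do not appear in the text.</p>
      <preformat><![CDATA[
```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameter values reported in the text; field names are hypothetical."""
    epochs: int = 5
    batch_size: int = 32
    learning_rate: float = 5e-5
    adam_epsilon: float = 1e-8
    weight_decay: float = 0.0  # zero, so no decay term is applied below
    max_grad_norm: float = 1.0


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so that their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def adam_step(params, grads, m, v, t, cfg, beta1=0.9, beta2=0.999):
    """One Adam update at step t (starting from 1).

    beta1 and beta2 are assumed defaults and are not taken from the text.
    """
    grads = clip_by_global_norm(grads, cfg.max_grad_norm)
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)               # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        p = p - cfg.learning_rate * m_hat / (math.sqrt(v_hat) + cfg.adam_epsilon)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```
]]></preformat>
      <p>Clipping is applied before the moment estimates are updated, which mirrors the usual order of clipping gradients and then calling the optimizer step.</p>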
      <p>Training via joint labels - Sub-Task D</p>
      <p>In order to alleviate the data sparsity issue, we utilize an approach which we call
sub-task D. Herein, the labels of each sub-task are combined to form a unified
multi-label task. All possible class combinations across all sub-tasks are utilized
to create new classes. The motivation behind this approach is to share
information between tasks via their label combinations, training a single model for this
task, followed by post-processing to identify labels for sub-tasks A, B, and C.
The final set of classes is NOT-NONE-NONE, HOF-HATE-TIN,
HOF-HATE-UNT, HOF-OFFN-TIN, HOF-OFFN-UNT, HOF-PRFN-TIN, and
HOF-PRFN-UNT. Furthermore, the approach also addresses the data sparsity
issue, as we use the full training data to solve all tasks.</p>
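      <p>The joint-label construction and the post-processing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function names join_labels and split_label are hypothetical, and the sketch covers only the three-sub-task case whose seven classes are listed in the text.</p>
      <preformat><![CDATA[
```python
# The seven joint classes named in the text, in the order A-B-C.
# "NONE" marks a sub-task label that does not apply (for example, for NOT posts).
JOINT_CLASSES = (
    "NOT-NONE-NONE",
    "HOF-HATE-TIN",
    "HOF-HATE-UNT",
    "HOF-OFFN-TIN",
    "HOF-OFFN-UNT",
    "HOF-PRFN-TIN",
    "HOF-PRFN-UNT",
)


def join_labels(label_a, label_b, label_c):
    """Combine the sub-task A, B, and C labels of one post into a joint class."""
    joint = "-".join((label_a, label_b, label_c))
    if joint not in JOINT_CLASSES:
        raise ValueError(f"unexpected label combination: {joint}")
    return joint


def split_label(joint):
    """Post-process a predicted joint class back into per-sub-task labels."""
    if joint not in JOINT_CLASSES:
        raise ValueError(f"unknown joint class: {joint}")
    label_a, label_b, label_c = joint.split("-")
    return {"A": label_a, "B": label_b, "C": label_c}


if __name__ == "__main__":
    joint = join_labels("HOF", "OFFN", "UNT")
    print(joint, split_label(joint))
```
]]></preformat>
      <p>A single classifier trained on these joint classes yields one prediction per post, from which the labels for sub-tasks A, B, and C are read off by splitting on the hyphen.</p>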
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Internal evaluation of model training</p>
      <p>Since we did not have test labels, we evaluated our models on both the training
and dev sets (as described above). Similar to the shared task evaluation
protocol, our evaluation utilized macro F1 and weighted F1 scores. Our
evaluation is presented in Table 2. We selected the best models from each evaluation
as our submission for the respective sub-task.</p>
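      <p>For reference, the two metrics named above can be computed as in the following plain-Python sketch. The function names are hypothetical, and this is a standard formulation of macro and support-weighted F1 rather than the organizers' evaluation script.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by true-label support."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


if __name__ == "__main__":
    y_true = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
    y_pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "NOT"]
    print(macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```
]]></preformat>
      <p>In the toy example, the per-class scores are 2/3 for HOF and 8/9 for NOT, giving a macro F1 of 7/9 and a weighted F1 of 22/27. Macro F1 treats both classes equally, so it is lower than the weighted score whenever the minority class is the harder one.</p>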
      <p>Evaluation on test data</p>
      <p>To identify our model performance on the test data, we utilized the leaderboard
rankings released by the organizers based on all the shared task submissions
(see Table 3). Among the eight shared tasks, our solutions won first place for
English sub-tasks A and B, and Hindi sub-task B. Furthermore, our solutions were within
the top 5 for 7 of the 8 tasks, and within 1% of the best solution for 5 of the
8 sub-tasks. For English sub-task B, our submissions took all of the top three
ranks. Our submissions also came a close second on sub-task C for both English
and Hindi.</p>
      <p>[Table 3: Leaderboard results on the test data. Columns: task, lang, run id, model, macro F1, weighted F1, rank.]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We have presented our team 3Idiots's approach based on fine-tuning
monolingual and multilingual transformer networks to classify social media posts in
three different languages for hate speech and offensive content. We open source
our approach at: https://github.com/socialmediaie/HASOC2019</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>4171</fpage>
          –
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          ). https://doi.org/10.18653/v1/N19-1423
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets</article-title>
          .
          <source>In: Proceedings of the 30th ACM Conference on Hypertext and Social Media - HT '19</source>
          . pp.
          <fpage>283</fpage>
          –
          <lpage>284</lpage>
          . ACM Press, New York, New York, USA (
          <year>2019</year>
          ). https://doi.org/10.1145/3342220.3344929, http://dl.acm.org/citation.cfm?doid=3342220.3344929
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Phelps</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picco</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Enthusiasm and support: alternative sentiment classification for social movements on social media</article-title>
          .
          <source>In: Proceedings of the 2014 ACM conference on Web science - WebSci '14</source>
          . pp.
          <fpage>261</fpage>
          –
          <lpage>262</lpage>
          . ACM Press, Bloomington, Indiana, USA (Jun
          <year>2014</year>
          ). https://doi.org/10.1145/2615569.2615667, http://dl.acm.org/citation.cfm?doid=2615569.2615667
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised Named Entity Recognition in noisy-text</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)</source>
          . pp.
          <fpage>203</fpage>
          –
          <lpage>212</lpage>
          . The COLING 2016 Organizing Committee, Osaka, Japan (
          <year>2016</year>
          ), https://aclweb.org/anthology/papers/W/W16/W16-3927/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora</article-title>
          .
          <source>In: Proceedings of the 29th on Hypertext and Social Media - HT '18</source>
          . pp.
          <fpage>2</fpage>
          –
          <lpage>10</lpage>
          . ACM Press, New York, New York, USA (
          <year>2018</year>
          ). https://doi.org/10.1145/3209542.3209562, http://dl.acm.org/citation.cfm?doid=3209542.3209562
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Capturing Signals of Enthusiasm and Support Towards Social Issues from Twitter</article-title>
          .
          <source>In: Proceedings of the 5th International Workshop on Social Media World Sensors - SIdEWayS'19</source>
          . pp.
          <fpage>19</fpage>
          –
          <lpage>24</lpage>
          . ACM Press, New York, New York, USA (
          <year>2019</year>
          ). https://doi.org/10.1145/3345645.3351104, http://dl.acm.org/citation.cfm?doid=3345645.3351104
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Byrne</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surbeck</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization</article-title>
          .
          <source>In: Proceedings of the 26th ACM Conference on Hypertext &amp; Social Media - HT '15</source>
          . pp.
          <fpage>323</fpage>
          –
          <lpage>325</lpage>
          . ACM Press, New York, New York, USA (
          <year>2015</year>
          ). https://doi.org/10.1145/2700171.2791022, http://doi.acm.org/10.1145/2700171.2791022, http://dl.acm.org/citation.cfm?doid=2700171.2791022
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Modha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</source>
          (December
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>