<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Dataset for Detecting Irony in Hindi-English Code-Mixed Social Media Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Deepanshu Vijay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aditya Bohra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vinay Singh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Syed S. Akthar</string-name>
          <email>syed.akhtarg@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manish Shrivastava</string-name>
          <email>m.shrivastava@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Language Technology Research Centre, International Institute of Information Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
        </aff>
      </contrib-group>
      <fpage>38</fpage>
      <lpage>46</lpage>
      <abstract>
        <p>Irony is one of many forms of figurative language. Irony detection is crucial for Natural Language Processing (NLP) tasks such as sentiment analysis and opinion mining. From a cognitive point of view, it is challenging to study how humans use irony as a communication tool. While relevant research has been done independently on code-mixed social media texts and on irony detection, our work is the first attempt at detecting irony in Hindi-English code-mixed social media text. In this paper, we study the problem of automatic irony detection as a classification problem and present a Hindi-English code-mixed dataset consisting of tweets posted on Twitter. The tweets are annotated with the language at the word level and the class they belong to (Ironic or Non-Ironic). We also propose a supervised classification system for detecting irony in the text, using various character-level, word-level, and structural features.</p>
      </abstract>
      <kwd-group>
        <kwd>code-mixing</kwd>
        <kwd>language detection</kwd>
        <kwd>linguistics</kwd>
        <kwd>svm</kwd>
        <kwd>random forest</kwd>
        <kwd>hate-speech</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Irony is a subtle form of humor, where there is a gap between the intended
meaning and the literal meaning. Even though it is a widely studied linguistic
phenomenon, no clear definition seems to exist [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Irony detection is a difficult
task, as irony often has ambiguous interpretations. Apart from its importance
in sentiment analysis and opinion mining, irony detection is also vital in the
areas of medical care and security [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Previous research related to this task
has mainly been focused on monolingual texts [
        <xref ref-type="bibr" rid="ref18 ref2 ref5 ref8">18, 2, 8, 5</xref>
        ] due to the availability
of large-scale monolingual resources. The popularity of opinion-rich online resources
such as review forums and microblogging sites has encouraged users to express and
convey their thoughts across the world in real time. In multilingual societies
like India, users often interchange between two or more languages while
communicating online.
      </p>
      <p>* These authors contributed equally to this work.
Code-Mixing (CM) is a natural phenomenon of embedding linguistic units such
as phrases, words or morphemes of one language into an utterance of another
[13-15, 4]. English and Hindi are two of the most widely used languages in the
world, and to the best of our knowledge there are currently no online
Hindi-English code-mixed resources available for detecting irony.</p>
      <p>Following are some instances of Hindi-English code-mixed tweets. It can be
observed that T1 and T2 contain irony, while T3 is a non-ironic tweet.
T1 : "Wo ek teacher hai tab bhi life ke test mein fail ho gaya! Hahaha such
irony :D"
Translation : "He is a teacher, yet he failed in the test of life! Hahaha such irony
:D."
T2 : "The kahawat 'old is gold' purani hogaee. Aaj kal ki nasal kehti hai 'gold
is old', but the old kahawat only makes sense. #MindF #Irony."
Translation : "The saying 'old is gold' is old. Today's generation thinks 'gold
is old', but only the old one makes sense. #MindF #Irony."
T3 : "mere single hone ke bawzood mujhe ye nahi pata tha aaj rose day he
#irony."
Translation : "In spite of being single, I didn't know today is rose day.
#irony."
The structure of the paper is as follows. In Section 2, we review related research in
the areas of code-mixing and irony detection. In Section 3, we describe the corpus
creation and annotation scheme. In Section 4, we present our system architecture,
which includes the pre-processing steps and classification features. In Section 5,
we present the results of experiments conducted using various character-level,
word-level and structural features. In the last section, we conclude our paper,
followed by future work and references.</p>
      <p>
        Background and Related Work
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] performed an analysis of data from Facebook posts generated by English-Hindi
bilingual users. The analysis showed that a significant amount of code-mixing was
present in the posts. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] formalized the problem, created a POS tag
annotated Hindi-English code-mixed corpus and reported the challenges and
problems in the Hindi-English code-mixed text. They also performed experiments
on language identification, transliteration, normalization and POS tagging of
the dataset. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] addressed the problem of shallow parsing of Hindi-English
code-mixed social media text and developed a system that can
identify the language of the words, normalize them to their
standard forms, assign them their POS tags and segment them into chunks. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] addressed
the problem of language identification in Bengali-Hindi-English Facebook
comments. They annotated a corpus and achieved an accuracy of 95.76% using
statistical models with monolingual dictionaries. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] developed a Question
Classification system for Hindi-English code-mixed language using word-level resources
such as language identification, transliteration, and lexical translation. [
        <xref ref-type="bibr" rid="ref1 ref16">1, 16</xref>
        ]
performed Sentiment Identification in code-mixed social media text.
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] proposed an algorithm for separating ironic from non-ironic similes in
English, detecting common terms used in these ironic comparisons. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presented a
corpus of Italian tweets consisting of 25,450 tweets, among which 12.5% were
ironic and 87.5% were non-ironic. They evaluated their dataset using
two systems. The first system relies on lexical and semantic features
characterising each word of a tweet. The second system exploits word occurrences (a
bag-of-words approach) as features to train a Decision Tree. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a model to
detect irony in English tweets, pointing out that skip-grams, which capture word
sequences that contain (or skip over) arbitrary gaps, are the most informative
features. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] presented a corpus generated from review pairs on Amazon that can
be used to identify sarcasm and irony in text. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] collected and annotated a
set of ironic examples from a collective Italian blog.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Corpus Creation and Annotation</title>
      <p>In this section, we explain the scheme used for corpus creation and annotation.</p>
      <sec id="sec-2-1">
        <title>Corpus Creation</title>
        <p>We constructed the Hindi-English code-mixed corpus from tweets posted
online since 2010. Tweets were scraped from Twitter using a Twitter Python
API which uses the advanced search option of Twitter. We mined the tweets
using #irony, the keywords `irony' and `ironic', and various hashtags from politics,
sports and entertainment. The latter three topics mostly, but not necessarily,
yield non-ironic tweets. As is evident from example T3 in Section 1, not all
tweets containing irony keywords and hashtags are actually ironic.
We retrieved 119,885 tweets from Twitter in JSON format, which
includes information such as timestamp, URL, text, user, re-tweets, replies,
full name, id and likes. Extensive semi-automated processing was carried out
to remove all the noisy tweets, i.e., those which comprise only
hashtags or URLs. Tweets in which a language other than Hindi or English
is used were also considered noisy and hence removed from the corpus.
Furthermore, all tweets written in either pure English or pure
Hindi were removed, thus keeping only the code-mixed tweets. As
a result, a dataset of 3055 code-mixed tweets was created. The newly created corpus
and code are available online on GitHub.1
1
https://github.com/deepanshu1995/Irony-Detection-Hindi-English-Code-Mixed</p>
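        <p>The language-based filtering step can be sketched as follows. This is a minimal illustration assuming simple wordlist lookups; the paper's actual processing was semi-automated, and the vocabularies and example tokens below are hypothetical:</p>

```python
def is_code_mixed(tokens, hindi_vocab, english_vocab):
    """Keep a tweet only if it mixes romanized Hindi and English words
    (a sketch; the paper's pipeline was semi-automated with manual checks)."""
    has_hin = any(t.lower() in hindi_vocab for t in tokens)
    has_eng = any(t.lower() in english_vocab for t in tokens)
    return has_hin and has_eng

# Hypothetical toy vocabularies for illustration only
hindi_vocab = {"sapna", "hakikat", "nahi", "hai"}
english_vocab = {"teacher", "life", "test", "irony"}

kept = is_code_mixed("Wo ek teacher hai".split(), hindi_vocab, english_vocab)  # True: mixes both
```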
      </sec>
      <sec id="sec-2-2">
        <title>Annotation</title>
        <p>Annotation of the corpus was carried out as follows:
Language at Word Level : For each word, a tag was assigned for its source
language. Three kinds of tags, namely `eng', `hin' and `other', were assigned to the
words by bilingual speakers. The `eng' tag was assigned to words present in the
English vocabulary, such as "Amazing", "Death", etc. The `hin' tag was assigned to
Hindi words such as "sapna" (Dream), "hakikat" (Reality). The tag `other' was
given to symbols, emoticons, punctuation, named entities, acronyms, and URLs.
Ironic or Non-Ironic : An instance of annotation is illustrated in Figure
1. Each tweet is enclosed within &lt;tweet&gt;&lt;/tweet&gt; tags. The first line of every
annotation consists of the tweet id. Language tags are added before every token of the
tweet, enclosed within &lt;word&gt;&lt;/word&gt; tags. Each tweet is annotated with one
of two tags (Ironic or Non-Ironic). Irony is detected in 782 tweets; the remaining
2273 code-mixed tweets do not contain irony. The annotated dataset
(consisting of tweet ids and annotated tags), together with the classification system, will be made
available online later.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Inter-Annotator Agreement</title>
        <p>Annotation of the dataset for the presence of irony was carried out by two
human annotators with a linguistic background and proficiency in both Hindi
and English. A sample annotation set consisting of 50 tweets (25 ironic and 25
non-ironic), selected randomly from across the corpus, was provided to both
annotators as a reference baseline for differentiating between
ironic and non-ironic text. In order to validate the quality of annotation, we
calculated the inter-annotator agreement (IAA) for irony annotation between the
two annotation sets of 3055 code-mixed tweets using Cohen's kappa coefficient.
The kappa score is 0.832, which indicates that the quality of the annotation is high and
the presented schema is productive.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>System Architecture</title>
      <p>In this section, we present our machine learning model for detecting irony in the
code-mixed dataset described in the previous sections.
</p>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
        <p>Pre-processing of the code-mixed tweets is carried out as follows. All links
and URLs are replaced with "URL". Tweets often contain mentions which are
directed towards certain users; we replaced all such mentions with "USER". All
hashtags in the dataset are removed. All emoticons used in the tweets
are first stored, to be used as a feature, and then replaced with "Emoticon". All
punctuation marks in a tweet are removed; however, before removing them
we store the count of each punctuation mark, since we use these counts as one of the
features in classification.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Classification Features</title>
        <p>
          In our work, we have used the following feature vectors to train our supervised
machine learning model.
1. Character N-Grams : Character n-grams are language independent and
have proven to be very efficient for classifying text. They are also useful
when text suffers from misspellings [
          <xref ref-type="bibr" rid="ref10 ref17 ref20">10, 17, 20</xref>
          ]. Groups of
characters can help in capturing semantic meaning, especially in
code-mixed language, where there is an informal use of words which vary
significantly from standard Hindi and English words. We use character n-grams
as one of the features, with n varying from 1 to 3.
2. Word N-Grams : The bag-of-words feature is vital for capturing the content of
the text. Thus we use word n-grams, with n varying from 1 to 3, as a feature
to train our classification models.
3. Laugh Words and Emoticons : Instead of using many exclamation marks,
internet users may use sequences like `lmao' (laughing my ass off) or `lol'
(laughing out loud), or type "hahaha". So we use a feature called laugh
words, which is the sum of all the internet laughs, such as `haha', `lol', `lmao',
`rofl', `lel', `hehehe'. We also use emoticons as a feature for irony detection,
since they often represent textual portrayals of a writer's emotion in the
form of symbols. We took a list of Western emoticons from Wikipedia.2
4. Punctuation : Users often use exclamation marks when they want to
express strong feelings. We count the occurrences of each punctuation mark
in a sentence and use these counts as a feature.
5. Intensifiers : Users often use intensifiers to lay emphasis on
their feelings. A list of intensifiers was taken from Wikipedia. We count the
number of intensifiers in a tweet and use the count as a feature.
6. Negation Words : A list of negation words was taken from Christopher
Potts's sentiment tutorial.3 We count the number of negations in a tweet and
use the count as a feature.
7. Structure : Ironic tweets in our dataset are often longer than other
tweets. To capture this, we use a group of structural features: (i) the number
of characters in the tweet, (ii) the number of words in the tweet, and (iii) the
average word length in the tweet.
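The n-gram and laugh-word features above can be sketched in a few lines of pure Python; the whitespace tokenization and laugh-word list handling here are simplifying assumptions for illustration:

```python
from collections import Counter

def char_ngrams(text, n_max=3):
    """Character n-grams with n from 1 to n_max, as counts."""
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams

def word_ngrams(text, n_max=3):
    """Word n-grams with n from 1 to n_max over whitespace tokens."""
    toks = text.split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(toks) - n + 1):
            grams[" ".join(toks[i:i + n])] += 1
    return grams

LAUGHS = {"haha", "lol", "lmao", "rofl", "lel", "hehehe"}

def laugh_count(text):
    """'Laugh words' feature: total internet-laugh tokens in the tweet."""
    return sum(tok.lower() in LAUGHS for tok in text.split())
```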
We performed experiments with two different classifiers, namely Support Vector
Machines with a radial basis function kernel and a Random Forest classifier. Since
the size of the feature vectors formed is very large, we applied a chi-square feature
selection algorithm, which reduces the size of our feature vector to 1400.4
2 https://en.wikipedia.org/wiki/List_of_emoticons
3 http://sentiment.christopherpotts.net/lingstruc.html
        </p>
        <sec id="sec-3-2-1">
          <title>Results</title>
          <p>
            For
training our classifiers, we used scikit-learn [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. In all the
experiments, we carried out 10-fold cross-validation. Table 1 and Table 2 report the
F1 score of each feature, along with the F1 score when all features are used, for
the Support Vector Machine and the Random Forest classifier respectively.
The Support Vector Machine performs better than the Random Forest classifier, giving
the highest F1 score of 0.77 when all features are used. Character n-grams proved
to be the most effective feature with the SVM, while word n-grams and character n-grams
both yielded the best F1 score in the case of the Random Forest classifier.
          </p>
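            <p>A minimal scikit-learn sketch of this training setup. The vectorizer choice, the small k, and the toy data are our assumptions made so the example runs; the paper states character 1-3-grams, chi-square selection, an RBF-kernel SVM, and 10-fold cross-validation:</p>

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Pipeline sketch: char n-gram counts -> chi-square selection -> RBF SVM.
# k is kept tiny here only so the toy example runs; the paper tuned it empirically.
pipeline = Pipeline([
    ("ngrams", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("select", SelectKBest(chi2, k=10)),
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical toy data standing in for the annotated tweets
X = ["such irony haha", "aaj rose day he", "life ke test mein fail lol", "match jeet gaye"]
y = [1, 0, 1, 0]  # 1 = Ironic, 0 = Non-Ironic
pipeline.fit(X, y)
```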
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we present an annotated corpus of Hindi-English code-mixed text,
consisting of tweet ids and the corresponding annotations, which will be made
freely available online later. We also present a supervised system for
detecting irony in code-mixed text. The corpus consists of 3055 code-mixed tweets
annotated as ironic or non-ironic. The features used in our classification system
are character n-grams, word n-grams, emoticons, laugh words, punctuation,
intensifiers and structural features. The best F1 score of 0.77 is achieved when all the
features are incorporated in the feature vector, using SVM as the classification
system.</p>
      <p>As part of future work, the corpus can be annotated with part-of-speech tags
at the word level, which could yield better results. Moreover, the annotations and
experiments described in this paper can also be carried out for code-mixed texts
containing more than two languages from multilingual societies.</p>
      <p>4 The size of the feature vector was decided after empirical fine-tuning.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Joshi</surname>
          </string-name>
          , Ameya Prabhu, Manish Shrivastava, and Vasudeva Varma:
          <article-title>Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text</article-title>
          .
          <source>In Proceedings of COLING</source>
          <year>2016</year>
          ,
          <source>the 26th International Conference on Computational Linguistics: Technical Papers</source>
          , pp.
          <fpage>2482</fpage>
          -
          <lpage>2491</lpage>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Antonio Reyes, Paolo Rosso, and
          <article-title>Tony Veale: A multidimensional approach for detecting irony in twitter</article-title>
          .
          <source>Language resources and evaluation 47</source>
          , no.
          <issue>1</issue>
          (
          <year>2013</year>
          ):
          <fpage>239</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Arnav</given-names>
            <surname>Sharma</surname>
          </string-name>
          , Sakshi Gupta, Raveesh Motlani, Piyush Bansal, Manish Srivastava, Radhika Mamidi, and
          <string-name>
            <surname>Dipti</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Sharma: Shallow parsing pipeline for hindi-english code-mixed social media text</article-title>
          .
          <source>arXiv preprint arXiv:1604.03136</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carol</surname>
          </string-name>
          Myers-Scotton:
          <article-title>Dueling Languages: Grammatical Structure in Code-Switching</article-title>
          . Clarendon. (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Elena</given-names>
            <surname>Filatova</surname>
          </string-name>
          :
          <article-title>Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing</article-title>
          .
          <source>In LREC</source>
          , pp.
          <fpage>392</fpage>
          -
          <lpage>398</lpage>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Erik</given-names>
            <surname>Forslid</surname>
          </string-name>
          and
          <string-name>
            <given-names>Niklas</given-names>
            <surname>Wiken</surname>
          </string-name>
          .
          <article-title>Automatic irony-and sarcasm detection in Social media</article-title>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          , Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel et al:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of machine learning research 12</source>
          , no.
          <source>Oct</source>
          (
          <year>2011</year>
          ):
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Francesco Ronzano, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <article-title>"Italian irony detection in Twitter: a first approach." In The First Italian Conference on Computational Linguistics CLiC-it</article-title>
          , p.
          <fpage>28</fpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Gianti</given-names>
            <surname>Andrea</surname>
          </string-name>
          , Bosco Cristina, Bolioli Andrea, and Luigi Di Caro.
          <article-title>"Annotating irony in a novel italian corpus for sentiment analysis</article-title>
          .
          <source>" In 4th International Workshop on Corpora for Research on EMOTION SENTIMENT &amp; SOCIAL SIGNALS ES</source>
          <year>2012</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . ELRA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Huma</surname>
            <given-names>Lodhi</given-names>
          </string-name>
          , Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins:
          <article-title>Text classification using string kernels</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>2</volume>
          , no.
          <source>Feb</source>
          (
          <year>2002</year>
          ):
          <fpage>419</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kalika</surname>
            <given-names>Bali</given-names>
          </string-name>
          , Jatin Sharma, Monojit Choudhury, and Yogarshi Vyas:
          <article-title>\I am borrowing ya mixing?" An Analysis of English-Hindi Code Mixing in Facebook</article-title>
          .
          <source>In Proceedings of the First Workshop on Computational Approaches</source>
          to Code Switching, pp.
          <fpage>116</fpage>
          -
          <lpage>126</lpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Khyathi Chandu Raghavi, Manoj Kumar Chinnakotla, and Manish Shrivastava:
          <article-title>Answer ka type kya he?: Learning to classify questions in code-mixed language</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web</source>
          , pp.
          <fpage>853</fpage>
          -
          <lpage>858</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Luisa Duran:
          <article-title>Toward a better understanding of code switching and interlanguage in bilinguality: Implications for bilingual instruction</article-title>
          .
          <source>The Journal of Educational Issues of Language Minority Students</source>
          <volume>14</volume>
          , no.
          <issue>2</issue>
          (
          <year>1994</year>
          ):
          <fpage>69</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Marjolein</surname>
          </string-name>
          <article-title>Gysels: French in urban Lubumbashi Swahili: Codeswitching, borrowing, or both?</article-title>
          .
          <source>Journal of Multilingual &amp; Multicultural Development</source>
          <volume>13</volume>
          , no.
          <issue>1-2</issue>
          (
          <year>1992</year>
          ):
          <fpage>41</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Pieter Muysken:
          <article-title>Bilingual speech: A typology of code-mixing</article-title>
          . Vol.
          <volume>11</volume>
          . Cambridge University Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Souvick</surname>
            <given-names>Ghosh</given-names>
          </string-name>
          , Satanu Ghosh, and
          <string-name>
            <surname>Dipankar Das</surname>
          </string-name>
          :
          <article-title>Sentiment Identification in Code-Mixed Social Media Text</article-title>
          .
          <source>arXiv preprint arXiv:1707.01184</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Stephen</surname>
          </string-name>
          <article-title>Huffman</article-title>
          . Acquaintance:
          <article-title>Language-independent document categorization by n-grams.</article-title>
          <string-name>
            <surname>DEPARTMENT OF DEFENSE FORT GEORGE G MEADE</surname>
            <given-names>MD</given-names>
          </string-name>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <article-title>Tony Veale and Yanfen Hao: Detecting Ironic Intent in Creative Comparisons</article-title>
          .
          <source>In ECAI</source>
          , vol.
          <volume>215</volume>
          , pp.
          <fpage>765</fpage>
          -
          <lpage>770</lpage>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Utsab</surname>
            <given-names>Barman</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amitava Das</surname>
          </string-name>
          ,
          <string-name>
            <surname>Joachim Wagner</surname>
          </string-name>
          , and Jennifer Foster:
          <article-title>Code mixing: A challenge for language identification in the language of social media</article-title>
          .
          <source>In Proceedings of the first workshop on computational approaches to code switching</source>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>William</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Cavnar</surname>
            ,
            <given-names>and John M.</given-names>
          </string-name>
          <article-title>Trenkle: N-gram-based text categorization</article-title>
          . Ann Arbor, MI
          <volume>48113</volume>
          , no.
          <issue>2</issue>
          (
          <year>1994</year>
          ):
          <fpage>161</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Yogarshi</surname>
            <given-names>Vyas</given-names>
          </string-name>
          , Spandana Gella, Jatin Sharma, Kalika Bali, and Monojit Choudhury:
          <article-title>POS tagging of English-Hindi code-mixed social media content</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pp.
          <fpage>974</fpage>
          -
          <lpage>979</lpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>