<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the GermEval 2020 Shared Task on Swiss German Language Identification</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Applied Information Technology, Zurich University of Applied Sciences</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present the findings of the Shared Task on Swiss German Language Identification organised as part of the 7th edition of GermEval, co-located with SwissText and KONVENS 2020.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Language Identification is the task of determining
which language(s) a given piece of text is
written in. It is an important step in many modern
language processing pipelines, especially when
working with online data sources as well as for
tasks where downstream processing is
language-dependent. While it has previously been
proclaimed “a solved problem”
        <xref ref-type="bibr" rid="ref11">(McNamee, 2005)</xref>
        ,
there are still several open challenges: handling
short, noisy, user-generated text from social
media is much harder than working with carefully
composed and edited documents, such as news
articles. Similarly, while some languages are easy to
distinguish from each other, the more fine-grained
the distinction we want to make, the harder it is
to train systems to do so automatically. For
instance, while it may be relatively easy to
distinguish Arabic from English, it is difficult to
distinguish different variations of Arabic from each
other
        <xref ref-type="bibr" rid="ref19 ref9">(Zampieri et al., 2018)</xref>
        .
      </p>
      <p>
        In this shared task, we are specifically
interested in identifying Swiss German. While
Standard German is one of the official languages of
Switzerland (the others are French, Italian and
Romansh), people in the German-speaking part
of Switzerland speak a variety called Swiss
German. It is composed of a range of local dialects,
none of which have a standardized writing system.
Nonetheless, the advent of the internet and social
media has led to an increase in the written usage
of Swiss German
        <xref ref-type="bibr" rid="ref16">(Siebenhaar, 2003)</xref>
        .
      </p>
      <p>Since its written usage has only picked up in
recent years, and there are relatively few native
speakers to begin with, Swiss German can be
considered a low-resource language. As such, it is not
supported by most modern language identification
tools.</p>
      <p>In this task we are interested in identifying
Swiss German as it is written on social media. We
propose a binary classification task of
distinguishing Swiss German from any other language. To
that end we create a new data set from messages
from the social media platform Twitter (https://twitter.com).</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Jauhiainen et al. (2018) have recently summarized
the long history of language identification and the
various approaches that have been explored over
the years.</p>
      <p>
        Recent editions of the VarDial workshop
included many different language identification
tasks
        <xref ref-type="bibr" rid="ref18 ref19 ref20 ref9">(Zampieri et al., 2019, 2018, 2017)</xref>
        . The
tasks usually revolve around distinguishing
similar languages, such as dialects of Arabic. Most
importantly, they also included tasks on German
Dialect Identification, which challenged participants
to distinguish four regional dialects of Swiss
German. The task data was taken from the ArchiMob
corpus of Spoken Swiss German
        <xref ref-type="bibr" rid="ref15 ref20">(Scherrer et al.,
2019)</xref>
        , which consists of interviews transcribed
following the “Schwyzertütschi Dialäktschrift” by
Dieth (1986).
      </p>
      <p>
        Linder et al. (2019) gathered a corpus of Swiss
German from web resources. To build their
corpus, they developed a language identification
system based on the Leipzig text corpora
        <xref ref-type="bibr" rid="ref6">(Goldhahn et al., 2012)</xref>
        , reporting an accuracy of 99.58%
using a fine-tuned BERT model (Devlin et al.,
2019). Previously, von Däniken and Cieliebak
(2018) built a simple binary SVM classifier based
on character n-grams and trained it on data from
the SB-CH corpus
        <xref ref-type="bibr" rid="ref8">(Grubenmann et al., 2018)</xref>
        .
      </p>
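      <p>As a rough illustration of the character n-gram features mentioned above, the following minimal sketch counts character n-grams in a short text. The function name, the lowercasing step, and the n-gram range are assumptions made for this example; the cited work may use different settings, and the SVM itself is omitted.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of length n_min..n_max in a lowercased text.

    The default range 1..3 is an illustrative assumption.
    """
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: bigram counts for a short Swiss German greeting.
features = char_ngrams("Grüezi mitenand", n_min=2, n_max=2)
print(features.most_common(5))
```
]]></preformat>
      <p>Such sparse count vectors could then be passed to any linear classifier; which weighting scheme works best is left open here.</p>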
      <sec id="sec-2-1">
        <title>Corpora</title>
        <p>
          NOAH. NOAH’s Corpus of Swiss German
Dialects
          <xref ref-type="bibr" rid="ref1">(Aepli et al., 2018)</xref>
          is a compilation of Swiss
German texts from various sources and domains.
It contains newspaper articles, blog posts, articles
from the Alemannic Wikipedia, novels by Viktor
Schobinger, and the Swatch Annual Business
Report. Its 115’000 tokens have been annotated with
Part-of-Speech tags.
        </p>
        <p>
          Swiss SMS Corpus. The Swiss SMS Corpus
          <xref ref-type="bibr" rid="ref17">(Stark et al., 2009-2014)</xref>
          contains 25’947 SMS
sent by the Swiss public in 2009 and 2010, of
which around 41% are written in Swiss German.
        </p>
        <p>
          ArchiMob. The previously mentioned
ArchiMob corpus
          <xref ref-type="bibr" rid="ref15 ref20">(Scherrer et al., 2019)</xref>
          contains
interview transcriptions. The latest release includes 43
transcripts with an average length of 15’000
tokens. The transcription script
          <xref ref-type="bibr" rid="ref3">(Dieth, 1986)</xref>
          aims
at a close phonetic representation of the
pronunciation and is unfortunately not representative of
how Swiss German is written on social media. For
this reason, the corpus is not as useful for our
purposes.
        </p>
        <p>SB-CH. Grubenmann et al. (2018) extended
NOAH and the Swiss SMS Corpus with two new
sources. The first consists of 87’892 comments crawled
from a Facebook page dedicated to Swiss
German, and the second of 115’350 messages
gathered from the online chat platform “Chatmania”.
They provide sentiment annotations for parts of
their corpus.</p>
        <p>SwissCrawl. Recently, Linder et al. (2019) built
a large corpus of 562’521 Swiss German sentences
from web resources.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>We propose a binary classification task of deciding
whether a given Tweet is written in Swiss German
(GSW) or any other language (NOT GSW). The
provided data comes from Twitter, which is a
notoriously noisy data source. For training we only
provided Tweets from the positive class (GSW),
forcing participants to seek out a diverse set of
additional resources to build robust systems, as the
goal is to build a system that can generalize
beyond Twitter.</p>
      <p>Evaluation. Participants were asked to submit
predicted labels, as well as classifier scores, such
as confidences, distances to the decision boundary, or
similar. We evaluate Precision, Recall, and
F1-score of the predicted labels and rank systems
according to their F1-score for the GSW class.
Additionally, we use the classifier scores to plot the
Receiver Operating Characteristic (ROC) and
Precision-Recall curves. We compute the Area
Under the ROC curve (AUROC) and Average
Precision (AP) as secondary criteria to rank the
submissions. While it is standard practice to use
F1-score to evaluate text classification systems,
we were also interested in the specific
precision-recall trade-offs of the different submissions. We
are particularly interested in applying insights from
the submitted systems to collect further Swiss
German samples, and for that it is useful to be able to
adapt the classification threshold to limit false
positives in practice.</p>
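      <p>The evaluation measures described above can be sketched in plain Python as follows. This is a minimal illustration, not the official scoring script: the label strings "GSW" and "NOT_GSW", the function names, and the handling of tied scores are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
def precision_recall_f1(y_true, y_pred, positive="GSW"):
    """Precision, recall and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(y_true, scores, positive="GSW"):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of examples,
    which is acceptable for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores, positive="GSW"):
    """Mean precision at the rank of each positive example.

    Tied scores are ranked in an arbitrary order (a simplification).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy example with made-up scores (higher means "more likely GSW").
y_true = ["GSW", "GSW", "NOT_GSW", "GSW", "NOT_GSW"]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
y_pred = ["GSW" if s >= 0.5 else "NOT_GSW" for s in scores]
print(precision_recall_f1(y_true, y_pred))
print(auroc(y_true, scores), average_precision(y_true, scores))
```
]]></preformat>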
    </sec>
    <sec id="sec-4">
      <title>Data</title>
      <p>
        Instead of sampling data from Twitter directly, we
chose to rely on the Swiss Twitter Corpus
        <xref ref-type="bibr" rid="ref12">(Nalmpantis et al., 2018)</xref>
        . It contains Tweets from 2017
and 2018 related to Switzerland, based on
geolocation data, keywords related to Switzerland, and
other criteria. The corpus contains a substantial
subset of Tweets written in Swiss German, as well
as a variety of other languages.
      </p>
      <p>To build our data set, we sampled one million
entries from the Swiss Twitter Corpus, and ranked
them according to the SVM scores of von Däniken
and Cieliebak (2018). We selected the top 10000
Tweets according to this score for manual
annotation.</p>
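      <p>The sampling and ranking step can be pictured with the following minimal sketch. The function <monospace>select_top_k</monospace> and the scoring function passed to it are hypothetical stand-ins introduced for this example; in the procedure described above the score comes from the SVM classifier, which is not reproduced here.</p>
      <preformat><![CDATA[
```python
import heapq
import random


def select_top_k(tweets, score_fn, sample_size, k, seed=0):
    """Sample up to `sample_size` tweets, then return the k highest-scoring ones.

    `score_fn` is a stand-in for a classifier score (higher = more likely GSW).
    """
    rng = random.Random(seed)
    sample = rng.sample(tweets, min(sample_size, len(tweets)))
    return heapq.nlargest(k, sample, key=score_fn)


# Toy usage: text length serves as a dummy score, purely for illustration.
toy_tweets = ["a", "bbb", "cc", "dddd"]
print(select_top_k(toy_tweets, score_fn=len, sample_size=10, k=2))
```
]]></preformat>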
      <p>Every Tweet was annotated by one native
speaker of Swiss German into one of four
categories: The labels GSW and NOT GSW were
used for Tweets that are unambiguously written
in Swiss German (GSW) or any other language
(NOT GSW). The label INDIST (short for
indistinguishable) was used for Tweets where a
distinction between GSW and NOT GSW is not possible.
This is for instance the case for short utterances
consisting entirely of loanwords (Merci!, Hallo)
or utterances where all tokens have the same
surface form as another language but slightly
different pronunciation in Swiss German (e.g. Viel
Spass!). Finally, the label OTHER was used for
Tweets that seemed to be nonsensical or spammy.
A summary of the raw annotations is shown in
Table 1.</p>
      <sec id="sec-4-1">
        <p>Table 1: Summary of the raw annotations per class (GSW, NOT GSW, INDIST, OTHER).</p>
        <p>For the released shared task data we excluded
the categories INDIST and OTHER, since we
deemed them not useful to evaluate language
identification due to their nature and low occurrence
rate (see Table 1). Since we only published Tweet
IDs and their labels, in accordance with Twitter’s
Terms of Service, we also excluded Tweets which
were not available anymore at the time of
publication. We also manually removed a few duplicate
entries before publication. The composition of the
final released data set (available at https://github.zhaw.ch/vode/gswid2020/) can be seen in Table 2.</p>
        <p>Table 2: Composition of the released data set. The training set contains 2001 GSW Tweets and 0 NOT GSW Tweets (2001 in total).</p>
        <p>
          Models. All three teams employed very
different models and input representations. Team
jj-cl-uzh trained a bi-directional GRU on
character sequences
          <xref ref-type="bibr" rid="ref7">(Goldzycher and Schaber, 2020)</xref>
          .
Team IDIAP applied an auto-encoder architecture
to character n-gram BoW representations
          <xref ref-type="bibr" rid="ref14">(Parida et al., 2020)</xref>
          . Finally, team Mohammadreza
Banaei (MB) employed a fine-tuned BERT model
followed by a FastText classifier
          <xref ref-type="bibr" rid="ref2">(Banaei, 2020)</xref>
          .
        </p>
        <p>
          Additional Corpora Used. Table 3 shows the
additional corpora that the participants used. The
following sources of Swiss German data were
used: SwissCrawl, NOAH, the Chatmania
subcorpus from SB-CH, and the Swiss SMS
Corpus. Similarly, the following corpora were used
for NOT GSW data: the Leipzig Corpora
Collection
          <xref ref-type="bibr" rid="ref6">(Goldhahn et al., 2012)</xref>
          , the Hamburg
Dependency Treebank
          <xref ref-type="bibr" rid="ref5">(Foth et al., 2014)</xref>
          , the data for the
second DSL shared task (DSLCCv2)
          <xref ref-type="bibr" rid="ref21">(Zampieri
et al., 2015)</xref>
          , and the Ling10 corpus
          <xref ref-type="bibr" rid="ref1 ref13 ref4">(Olafenwa and
Olafenwa, 2018)</xref>
          .
        </p>
        <sec id="sec-4-1-1">
          <title>Fine Grained Classification</title>
          <p>The two leading teams (see Section 6) noticed an
improvement in performance when splitting the
NOT GSW class into sub-classes and training
their classifiers on the fine-grained labels.</p>
          <p>Data Augmentation. Since the provided Tweets
are substantially noisier than most of the other data
sets, Team jj-cl-uzh chose to inject character- and
token-level noise into samples during training.</p>
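          <p>To make the idea of noise injection concrete, the sketch below perturbs a text at the character and token level. It is not the augmentation scheme of Team jj-cl-uzh: the specific operations (delete, duplicate, swap, drop, insert filler), the filler tokens, and the noise rates are all assumptions chosen for illustration.</p>
          <preformat><![CDATA[
```python
import random


def add_char_noise(text, rate, rng):
    """With probability `rate` per character, delete, duplicate, or swap it."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(("delete", "duplicate", "swap"))
            if op == "delete":
                i += 1
                continue
            if op == "duplicate":
                out.extend([chars[i], chars[i]])
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])
                i += 2
                continue
        # No noise applied (or swap requested on the last character).
        out.append(chars[i])
        i += 1
    return "".join(out)


def add_token_noise(text, rate, rng, fillers=("lol", "#swiss", "@user")):
    """Drop a token or append a filler token, each with probability rate / 2."""
    out = []
    for token in text.split():
        r = rng.random()
        if r < rate / 2:
            continue  # drop this token
        out.append(token)
        if r > 1 - rate / 2:
            out.append(rng.choice(fillers))  # insert a Twitter-like filler
    return " ".join(out)


rng = random.Random(42)
clean = "hoi zäme wie gahts"
print(add_token_noise(add_char_noise(clean, 0.1, rng), 0.2, rng))
```
]]></preformat>
          <p>Applying such functions on the fly during training, rather than once up front, exposes a model to a different noisy variant of each sample in every epoch.</p>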
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <sec id="sec-5-3">
        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>System</th><th>Precision</th><th>Recall</th></tr>
            </thead>
            <tbody>
              <tr><td>MB</td><td>0.984</td><td>0.979</td></tr>
              <tr><td>jj-cl-uzh</td><td>0.945</td><td>0.993</td></tr>
              <tr><td>IDIAP</td><td>0.775</td><td>0.998</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The full evaluation results can be seen in Table
4 and Figure 1. Overall, all teams achieve good
scores, with the two top teams ranking closely
together and solving the task almost perfectly.
Especially notable are the PR and ROC curves,
showing that one can achieve near-perfect precision
(recall) without sacrificing too much recall
(precision).</p>
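        <p>The precision-recall trade-off can be turned into a concrete operating point by choosing a score threshold. The following minimal sketch finds the lowest threshold that still reaches a target precision, which also maximises recall among the thresholds that reach it. The function name, the label strings, and the convention that a Tweet is labelled GSW when its score is at or above the threshold are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
def threshold_for_precision(y_true, scores, target, positive="GSW"):
    """Return (threshold, precision, recall) for the lowest threshold whose
    precision is at least `target`, or None if no threshold reaches it.

    An item is predicted positive when its score is >= threshold.
    """
    n_pos = sum(1 for t in y_true if t == positive)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    best = None
    tp = 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        # Only evaluate once a whole group of tied scores has been consumed.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = tp / (i + 1)
        if precision >= target:
            recall = tp / n_pos if n_pos else 0.0
            best = (score, precision, recall)
    return best


# Toy example with made-up scores.
labels = ["GSW", "GSW", "NOT_GSW", "GSW", "NOT_GSW"]
confidences = [0.9, 0.8, 0.7, 0.6, 0.1]
print(threshold_for_precision(labels, confidences, target=1.0))
print(threshold_for_precision(labels, confidences, target=0.75))
```
]]></preformat>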
        <p>System Design. Given that there were only
three participating systems, it is hard to draw
any general conclusions about the effectiveness
of different systems and features. Nevertheless,
given that both top-performing systems
applied fine-grained classification by sub-dividing
the NOT GSW class, this seems a good principle
for other one-versus-all style language
identification tasks.</p>
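        <p>One simple way to use a fine-grained classifier for a binary task is to predict the best fine-grained class and then collapse it to the two task labels. The sketch below illustrates this; the sub-class names "DE" and "EN", the label string "NOT_GSW", and the function names are hypothetical, and the participating systems may combine their sub-classes differently.</p>
        <preformat><![CDATA[
```python
def to_binary(fine_label, positive="GSW"):
    """Collapse a fine-grained label into the task's two labels."""
    return positive if fine_label == positive else "NOT_GSW"


def predict_binary(class_scores, positive="GSW"):
    """Pick the highest-scoring fine-grained class, then collapse it."""
    best = max(class_scores, key=class_scores.get)
    return to_binary(best, positive)


# Hypothetical per-class scores for two Tweets.
print(predict_binary({"GSW": 0.40, "DE": 0.35, "EN": 0.25}))
print(predict_binary({"GSW": 0.30, "DE": 0.50, "EN": 0.20}))
```
]]></preformat>
        <p>An alternative design would sum the scores of all non-GSW sub-classes before comparing; in the first example above that would flip the decision, so the choice of collapsing rule matters.</p>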
        <p>Task and Data. Overall we can conclude that the
task of identifying Swiss German is indeed
solvable to a high degree of fidelity, even when facing
short and noisy user-generated utterances.</p>
        <p>Figure 1: (a) Receiver Operating Characteristic curve for all submissions and their respective Area Under Curve; (b) Precision-Recall curve for all submissions and their respective Average Precision.</p>
        <p>Future Work. We see several important
directions for future work. First of all, we have to
show that the results of this evaluation hold up on
larger data sets from a wider range of domains.</p>
        <p>One source of noise in this task’s data set is the
propensity of users to code-switch to English and
other languages. Therefore it would be interesting
to generalize the current task to token-level
language identification. Finally, good language
identification enables us to gather larger high-quality
corpora of Swiss German texts. This has already
been achieved to an extent by Linder et al. (2019).</p>
        <p>Once enough Swiss German texts are available,
the community can shift its efforts to extending
the annotations of these corpora (cf. Section 2)
and building up a collection of standard Natural
Language Processing tools for Swiss German.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We described the findings of the Shared Task on
Swiss German Language Identification which was
part of GermEval 2020. The three participating
teams achieved high evaluation scores, with the
best system reaching an F1-score of 0.982 on the
Swiss German class (evaluated on 5374 Tweets).
This indicates that Swiss German language
identification is feasible with high fidelity even for short,
noisy, user-generated text.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          Noëmi Aepli, Nora Hollenstein, and
          <string-name>
            <given-names>Simon</given-names>
            <surname>Clematide</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>NOAH 3.0: Recent Improvements in a Part-of-Speech Tagged Corpus for Swiss German Dialects</article-title>
          .
          <source>In Proceedings of the 3rd Swiss Text Analytics Conference - SwissText 2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Mohammadreza</given-names>
            <surname>Banaei</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Spoken dialect identification in Twitter using a multi-filter architecture</article-title>
          .
          <source>In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) &amp; 16th Conference on Natural Language Processing (KONVENS).</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Eugen</given-names>
            <surname>Dieth</surname>
          </string-name>
          .
          <year>1986</year>
          .
          <article-title>Schwyzertütschi Dialäktschrift, 2nd Edition</article-title>
          . Sauerländer.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pius</given-names>
            <surname>von Däniken</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Cieliebak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Swiss German Language Detection in Online Resources</article-title>
          .
          <source>In Proceedings of the 3rd Swiss Text Analytics Conference - SwissText 2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Kilian A.</given-names>
            <surname>Foth</surname>
          </string-name>
          , Arne Köhn, Niels Beuck, and
          <string-name>
            <given-names>Wolfgang</given-names>
            <surname>Menzel</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Because Size Does Matter: The Hamburg Dependency Treebank</article-title>
          .
          <source>In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          , Reykjavik, Iceland.
          European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Dirk</given-names>
            <surname>Goldhahn</surname>
          </string-name>
          , Thomas Eckart, and
          <string-name>
            <given-names>Uwe</given-names>
            <surname>Quasthoff</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12).</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Janis</given-names>
            <surname>Goldzycher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Schaber</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>“Hold up, was zur höu isch ds?” Detecting Noisy Swiss German Web Text Using RNN- and Rule-Based Techniques</article-title>
          .
          <source>In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) &amp; 16th Conference on Natural Language Processing (KONVENS).</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Grubenmann</surname>
          </string-name>
          , Don Tuggener, Pius von Däniken, Jan Deriu, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Cieliebak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SB-CH: A Swiss German Corpus with Sentiment Annotations</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          , Miyazaki, Japan. European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Tommi</given-names>
            <surname>Jauhiainen</surname>
          </string-name>
          , Marco Lui, Marcos Zampieri, Timothy Baldwin, and Krister Lindén.
          <year>2018</year>
          .
          <article-title>Automatic Language Identification in Texts: A Survey</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>65</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Linder</surname>
          </string-name>
          , Michael Jungo, Jean Hennebert, Claudiu Musat, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Fischer</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>McNamee</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Language identification: A solved problem suitable for undergraduate instruction</article-title>
          .
          <source>Journal of Computing Sciences in Colleges</source>
          ,
          <volume>20</volume>
          :
          <fpage>94</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Christoforos</given-names>
            <surname>Nalmpantis</surname>
          </string-name>
          , Fernando Benites, Michaela Hnizda, Daniel Kriech, Pius von Däniken, Ralf Grubenmann, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Cieliebak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Swiss Twitter Corpus</article-title>
          .
          <source>In Proceedings of the 3rd Swiss Text Analytics Conference - SwissText 2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>John</given-names>
            <surname>Olafenwa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Moses</given-names>
            <surname>Olafenwa</surname>
          </string-name>
          .
          <year>2018</year>
          . Ling10.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Shantipriya</given-names>
            <surname>Parida</surname>
          </string-name>
          , Esaú Villatoro-Tello, Qingran Zhan, Petr Motlicek, and
          <string-name>
            <given-names>Sajit</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Idiap Submission to Swiss German Language Detection Shared Task</article-title>
          .
          <source>In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) &amp; 16th Conference on Natural Language Processing (KONVENS).</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Yves</given-names>
            <surname>Scherrer</surname>
          </string-name>
          , Tanja Samardžić, and
          <string-name>
            <given-names>Elvira</given-names>
            <surname>Glaser</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>ArchiMob: Ein multidialektales Korpus schweizerdeutscher Spontansprache</article-title>
          .
          <source>Linguistik Online</source>
          ,
          <volume>98</volume>
          (
          <issue>5</issue>
          ):
          <fpage>425</fpage>
          -
          <lpage>454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Beat</given-names>
            <surname>Siebenhaar</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Sprachgeographische Aspekte der Morphologie und Verschriftung in schweizerdeutschen Chats</article-title>
          .
          <source>Linguistik online</source>
          ,
          <volume>15</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>Stark</surname>
          </string-name>
          , Simone Ueberwasser, and
          <string-name>
            <given-names>Beni</given-names>
            <surname>Ruef</surname>
          </string-name>
          .
          <year>2009-2014</year>
          . Swiss SMS Corpus. University of Zurich. https://sms.linguistik.uzh.ch.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, and Noëmi Aepli.
          <year>2017</year>
          .
          <article-title>Findings of the VarDial evaluation campaign 2017</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , Valencia, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Antal van den Bosch, Ritesh Kumar, Bornini Lahiri, and
          <string-name>
            <given-names>Mayank</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation campaign</article-title>
          .
          <source>In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          , Santa Fe, USA.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, and Tommi Jauhiainen
          .
          <year>2019</year>
          .
          <article-title>A report on the third VarDial evaluation campaign</article-title>
          .
          <source>In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          , Ann Arbor, Michigan. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Liling Tan, Nikola Ljubešić, Jörg Tiedemann, and
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Overview of the DSL Shared Task 2015</article-title>
          .
          <source>In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , Hissar, Bulgaria. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>