<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Idiap Submission to Swiss-German Language Detection Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shantipriya Parida</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esaú Villatoro-Tello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sajit Kumar</string-name>
          <email>kumar.sajit.sk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Motlicek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qingran Zhan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre of Excellence in AI, Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Idiap Research Institute</institution>
          ,
          <addr-line>Rue Marconi 19, 1920 Martigny</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Autónoma Metropolitana, Unidad Cuajimalpa</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <fpage>927</fpage>
      <lpage>936</lpage>
      <abstract>
        <p>Language detection is a key part of the NLP pipeline for text processing. The task of automatically detecting languages belonging to disjoint groups is relatively easy, but it is considerably more challenging to detect languages that have similar origins or are dialects of one another. This paper describes Idiap's submission to the 2020 GermEval evaluation campaign on Swiss-German language detection. In this work, high-dimensional features generated from the text data are given as input to a supervised autoencoder for detecting languages with dialectal variation. A Bayesian optimizer was used to fine-tune the hyper-parameters of the supervised autoencoder. To the best of our knowledge, we are the first to apply a supervised autoencoder to the language detection task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The increased usage of smartphones, social media, and the internet has led to rapid growth in the generation of short linguistic texts. Identifying the language is therefore a key component in building various NLP resources (Kocmi and Bojar, 2017). Language detection is the task of determining the language of a given text. Although the field has progressed substantially, several challenges still exist: (1) distinguishing among similar languages, (2) detecting languages when content in multiple languages exists within a single document, and (3) identifying the language of very short texts
        <xref ref-type="bibr" rid="ref1 ref12">(Balazevic et al., 2016; Lui et al., 2014; Williams and Dagli, 2017)</xref>
        .
      </p>
      <p>
        It is difficult to discriminate between very close languages or dialects (for example, German dialect identification or Indo-Aryan language identification)
        <xref ref-type="bibr" rid="ref4">(Jauhiainen et al., 2019a)</xref>
        . Although dialect identification is commonly based on the distributions of letters or letter n-grams, for some languages it may not be possible to distinguish related dialects with very similar phoneme and grapheme inventories
        <xref ref-type="bibr" rid="ref6">(Scherrer and Rambow, 2010)</xref>
        .
      </p>
      <p>
        Many authors have proposed traditional machine learning approaches for language detection, such as Naive Bayes, SVMs, word and character n-grams, graph-based n-grams, prediction by partial matching (PPM), linear interpolation with post-independent weight optimization, and majority voting for combining multiple classifiers
        <xref ref-type="bibr" rid="ref5">(Jauhiainen et al., 2019b)</xref>
        .
      </p>
      <p>More recently, deep learning techniques have shown strong performance in many NLP tasks, including language detection <xref ref-type="bibr" rid="ref15">(Oro et al., 2018)</xref>. In this context, many papers have demonstrated the capability of semi-supervised autoencoders to solve different tasks, indicating that autoencoders allow learning a representation when trained with unlabeled data <xref ref-type="bibr" rid="ref16 ref17">(Ranzato and Szummer, 2008; Rasmus et al., 2015)</xref>. However, as per our literature survey, no recent research has applied autoencoders to the language detection task. In this paper, we propose a supervised configuration of the autoencoder, which utilizes labels for learning the representation. To the best of our knowledge, this is the first time this technique is evaluated in the context of the language detection task.</p>
      <sec id="sec-1-1">
        <title>1.1 Supervised Autoencoder</title>
        <p>
          An autoencoder (AE) is a neural network that learns a representation (encoding) of input data and then learns to reconstruct the original input from the learned representation. The autoencoder is mainly used for dimensionality reduction or feature extraction
          <xref ref-type="bibr" rid="ref13">(Zhu and Zhang, 2019)</xref>
          . Normally, it is used in an unsupervised fashion, i.e. the neural network is leveraged for the task of representation learning. By learning to reconstruct the input, the AE extracts underlying abstract attributes that facilitate accurate prediction of the input.
        </p>
        <p>A supervised autoencoder (SAE) is thus an autoencoder with an additional supervised loss on the representation layer. For a single hidden layer, the supervised loss is added at the output of that layer; for a deeper autoencoder, the supervised loss is added to the innermost (smallest) bottleneck layer, which is usually connected to the supervised output layer after training the autoencoder.</p>
        <p>In supervised learning, the goal is to learn a function from a vector of inputs $x \in \mathbb{R}^d$ to predict a vector of targets $y \in \mathbb{R}^m$. Consider an SAE with a single hidden layer of size $k$, where the weights of the first layer are $F \in \mathbb{R}^{k \times d}$. The function is trained on a finite batch of independent and identically distributed (i.i.d.) data, $(x_1, y_1), \ldots, (x_t, y_t)$, with the goal of accurate prediction on new samples generated from the same distribution. The output layer consists of weights $W_p \in \mathbb{R}^{m \times k}$ to predict $y$ and $W_r \in \mathbb{R}^{d \times k}$ to reconstruct $x$. Let $L_p$ be the supervised loss and $L_r$ the loss for the reconstruction error. In the case of regression, both losses might be represented by a squared error, resulting in the objective:

$$\frac{1}{t} \sum_{i=1}^{t} \left[ L_p(W_p F x_i, y_i) + L_r(W_r F x_i, x_i) \right] = \frac{1}{2t} \sum_{i=1}^{t} \left[ \lVert W_p F x_i - y_i \rVert_2^2 + \lVert W_r F x_i - x_i \rVert_2^2 \right] \qquad (1)$$</p>
        <p>The addition of the supervised loss to the autoencoder loss function acts as a regularizer and, as shown in equation 1, results in learning a better representation for the desired task (Le et al., 2018).</p>
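        <p>To make equation 1 concrete, below is a minimal sketch of an SAE in PyTorch; it is not the authors' released implementation, and the layer sizes, the ReLU nonlinearity, and the equal weighting of the two losses are illustrative assumptions.</p>
        <preformat>
# Minimal supervised autoencoder (SAE) sketch; sizes are placeholders.
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, d, k, m):
        super().__init__()
        self.encoder = nn.Linear(d, k)     # F in equation 1
        self.decoder = nn.Linear(k, d)     # W_r: reconstructs x
        self.classifier = nn.Linear(k, m)  # W_p: predicts y

    def forward(self, x):
        h = torch.relu(self.encoder(x))    # bottleneck representation
        return self.classifier(h), self.decoder(h)

model = SAE(d=5000, k=128, m=2)            # e.g. 5000 n-gram features, 2 classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

def train_step(x, y):
    optimizer.zero_grad()
    y_hat, x_hat = model(x)
    loss = ce(y_hat, y) + mse(x_hat, x)    # supervised loss + reconstruction loss
    loss.backward()
    optimizer.step()
    return loss.item()
        </preformat>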
      </sec>
      <sec id="sec-1-2">
        <title>1.2 Bayesian Optimizer</title>
        <p>In the case of an SAE, there are many hyperparameters, related to (a) model construction and (b) optimization. Hence, training an SAE without any hyperparameter tuning usually results in poor performance, due to dependencies among the hyperparameters that may result in simultaneous over- or under-fitting.</p>
        <p>
          Global optimization is the challenging problem of finding the globally best solution of (possibly nonlinear) models, in the possible or known presence of multiple local optima. Bayesian optimization (BO) has been shown to outperform other state-of-the-art global optimization algorithms on several challenging optimization benchmark functions
          <xref ref-type="bibr" rid="ref7 ref2">(Snoek et al., 2012; Bergstra and Bengio, 2012)</xref>
          . BO provides a principled technique, based on Bayes' theorem, to direct the search in a global optimization problem efficiently and effectively. It works by building a probabilistic model of the objective function, called the surrogate function, which is then searched efficiently with an acquisition function before candidate samples are chosen for evaluation on the real objective function. It tries to solve the minimization problem:
        </p>
        <p>
          $$x^{\star} = \arg\min_{x \in \mathcal{X}} f(x) \qquad (2)$$

where $\mathcal{X}$ is considered to be a compact subset of $\mathbb{R}^k$
          <xref ref-type="bibr" rid="ref8">(Snoek et al., 2015)</xref>
          .
        </p>
        </p>
        <p>Thus, we employed BO for hyperparameter optimization, where the objective is to find the hyperparameters of a given machine learning algorithm that yield the best performance as measured on a validation set.</p>
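        <p>As an illustration of how BO can tune SAE hyperparameters, here is a hedged sketch using scikit-optimize; the search ranges are placeholders rather than the values of Table 3, and build_sae and train_and_validate are hypothetical helpers.</p>
        <preformat>
# Bayesian optimization of SAE hyperparameters (sketch, scikit-optimize).
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [Integer(32, 512, name="hidden_size"),
         Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
         Real(0.0, 0.5, name="dropout")]

def objective(params):
    hidden_size, learning_rate, dropout = params
    model = build_sae(hidden_size, dropout)              # hypothetical helper
    accuracy = train_and_validate(model, learning_rate)  # accuracy on the dev set
    return -accuracy                                     # BO minimizes the objective

result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best hyperparameters:", result.x)
        </preformat>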
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Proposed Method</title>
      <p>
        The architecture of the proposed model is shown in Figure 1: character n-grams are extracted from the input text and used as features. In comparison to word n-grams, which only capture the identity of a word and its possible neighbors, character n-grams offer an excellent trade-off between sparseness and the word's identity, while at the same time combining different types of information: punctuation, the morphological makeup of a word, the lexicon, and even context
        <xref ref-type="bibr" rid="ref11 ref14">(Wei et al., 2009; Kulmizev et al., 2017; Sánchez-Vega et al., 2019)</xref>
        . The extracted n-gram features are input to the deep SAE, which contains multiple hidden layers, as shown in Figure 1. We used BO for selecting the optimal parameters.
      </p>
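      <p>For illustration, character n-gram features of the kind described above can be extracted as follows; the (1, 5) range and the word-boundary-aware analyzer are assumptions, not the exact configuration used in the paper.</p>
      <preformat>
# Character n-gram feature extraction (sketch, scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 5))
X = vectorizer.fit_transform(["Grüezi mitenand!", "Hello everyone!"])
print(X.shape)  # (2, number of distinct n-grams): high-dimensional SAE input
      </preformat>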
    </sec>
    <sec id="sec-3">
      <title>3 Experimental Setup and Datasets</title>
      <p>
        The training dataset was provided by the organizers of the shared task. It consists of 2,000 tweets in the Swiss-German language (although 2,000 Twitter ids were provided, we were not able to retrieve them all, resulting in 1,976 training instances). The participants were allowed to use any additional resources as training datasets. As part of the additional resources recommended by the organizers, the following Swiss-German datasets were suggested: NOAH (https://noe-eva.github.io/NOAH-Corpus/)
        <xref ref-type="bibr" rid="ref3">(Hollenstein and Aepli, 2015)</xref>
        and SwissCrawl (https://icosys.ch/swisscrawl) (Linder et al., 2019), which we used in our experiments.
      </p>
      <p>The test data released by the organizers consists of 5,374 tweets (a mix of different languages) to be classified as Swiss-German versus not Swiss-German.</p>
      <p>
        The training dataset provided by the organizers did not contain any non-Swiss-German text. Therefore, in addition to the recommended Swiss-German datasets, we used other, non-Swiss-German datasets (DSL, available at http://ttg.uni-saarland.de/resources/DSLCC/
        <xref ref-type="bibr" rid="ref9">(Tan et al., 2014a)</xref>
        , and Ling10, available at https://github.com/johnolafenwa/Ling10) for training our models.
      </p>
      <p>
        DSL Dataset: The data obtained from the "Discriminating between Similar Languages (DSL) Shared Task 2015" contains 13 different languages, as shown in Table 1. The DSL corpus collection has different versions based on different language groups, providing datasets for researchers to test their systems
        <xref ref-type="bibr" rid="ref9">(Tan et al., 2014a)</xref>
        . We selected DSLCC version 2.0 (https://github.com/Simdiva/DSL-Task/tree/master/data/DSLCC-v2.0) in our experiments
        <xref ref-type="bibr" rid="ref10">(Tan et al., 2014b)</xref>
        .
      </p>
      <p>Ling10 Dataset: The Ling10 dataset contains 190,000 sentences categorized into 10 languages (English, French, Portuguese, Chinese Mandarin, Russian, Hebrew, Polish, Japanese, Italian, Dutch), mainly used for language detection and for benchmarking NLP algorithms. We considered "Ling10-trainlarge" (one of the three variants of the Ling10 dataset) in our experiment.</p>
      <p>Table 1: The 13 languages of the DSL corpus collection.
Group Name | Id | Language
South Eastern Slavic | bg | Bulgarian
South Eastern Slavic | mk | Macedonian
South Western Slavic | bs | Bosnian
South Western Slavic | hr | Croatian
South Western Slavic | sr | Serbian
West-Slavic | cz | Czech
West-Slavic | sk | Slovak
IberoRomance (Spanish) | es-ES | Peninsular Spanish
IberoRomance (Spanish) | es-AR | Argentinian Spanish
IberoRomance (Portuguese) | pt-BR | Brazilian Portuguese
IberoRomance (Portuguese) | pt-PT | European Portuguese
Austronesian | id | Indonesian
Austronesian | my | Malay</p>
      <p>As the task is a binary classification of Swiss-German versus not Swiss-German, we split our whole collection of datasets, including the training set provided by the organizers, into two categories as follows:</p>
      <p>Swiss-German (NOAH, SwissCrawl, Swiss-German training tweets).</p>
      <p>not Swiss-German (DSL, Ling10).</p>
      <p>Accordingly, we labeled the target class of all the Swiss-German text as "gsw" (Swiss-German) and the target class of all other language text as "not gsw".</p>
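      <p>A minimal sketch of this relabeling is given below; the file names are hypothetical placeholders for the prepared datasets.</p>
      <preformat>
# Binary relabeling of the combined corpora (sketch; paths are placeholders).
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

gsw = (load_lines("noah.txt") + load_lines("swisscrawl.txt")
       + load_lines("gsw_tweets.txt"))                    # Swiss-German sources
other = load_lines("dsl.txt") + load_lines("ling10.txt")  # all other languages

texts = gsw + other
labels = ["gsw"] * len(gsw) + ["not gsw"] * len(other)    # binary target classes
      </preformat>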
      <p>We prepared three settings (S1, S2, and S3)
combining the above datasets in different proportions
of Swiss-German versus not Swiss-German
languages for training the model. The statistics of the
datasets for the settings are shown in Table 2.</p>
      <p>We mixed the Swiss-German and other-language datasets and split them into different ratios for training and development as per the settings. In each setting, the training and development sets differ in the number of sentences selected from each dataset. We used the test set provided by the shared task organizers. As the test set includes Twitter text, during preprocessing we removed emojis and other unnecessary symbols.</p>
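      <p>One possible form of this cleaning step is sketched below; the exact symbol ranges removed in our pipeline are not specified here, so the regular expression is an assumption.</p>
      <preformat>
# Tweet preprocessing sketch: strip emojis and collapse whitespace.
import re

def clean_tweet(text):
    # Remove common emoji and symbol blocks (illustrative ranges only).
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("Grüezi 😀🎉 mitenand!"))  # prints "Grüezi mitenand!"
      </preformat>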
      <p>The range of values for the hyperparameter search space is shown in Table 3. During training, BO chooses the best hyperparameters from this range. The overall configuration of the SAE model is shown in Table 4.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Results and Discussion</title>
      <p>We report the development set performance as well as the test set evaluation performed by the shared task organizers. The development set performance is given in Section 4.1 and the test set performance in Section 4.2.</p>
      <p>Our evaluation consists of calculating classification accuracy by comparing the predicted label with the actual label. The organizers calculated precision, average precision, recall, and F1 score for each of the submissions. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations; recall (or sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual positive class; and the F1 score is the harmonic mean of precision and recall.</p>
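      <p>These metrics can be reproduced with scikit-learn as in the following sketch; the label vectors are toy examples, not our actual predictions.</p>
      <preformat>
# Computing accuracy, precision, recall, and F1 (sketch, scikit-learn).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["gsw", "gsw", "not gsw", "gsw", "not gsw"]
y_pred = ["gsw", "not gsw", "not gsw", "gsw", "gsw"]

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label="gsw", average="binary")
print(accuracy_score(y_true, y_pred), prec, rec, f1)
      </preformat>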
      <p>The organizers also generated the Receiver Operating Characteristic (ROC) curve, the Area Under the ROC Curve (AUC), and Precision-Recall (PR) curves. The AUC-ROC curve is a performance measurement at various threshold settings: the ROC is a probability curve, and the AUC represents the degree or measure of separability, i.e. how well a trained model is capable of distinguishing between classes; the higher the AUC, the better the model performance. Finally, PR curves summarize the trade-off between the true positive rate (recall) and the positive predictive value (precision) for a predictive model using different probability thresholds; hence, a good model is represented by a curve that bows towards (1,1).</p>
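      <p>The ROC/AUC and PR computations described above follow standard practice; a sketch with scikit-learn on toy scores is shown below (the scores are illustrative, not model outputs).</p>
      <preformat>
# ROC, AUC, and PR curve computation (sketch, scikit-learn).
from sklearn.metrics import auc, precision_recall_curve, roc_curve

y_true = [1, 1, 0, 1, 0, 0]               # 1 = "gsw", 0 = "not gsw"
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.6]   # predicted probability of "gsw"

fpr, tpr, _ = roc_curve(y_true, scores)
print("AUC:", auc(fpr, tpr))              # higher AUC means better separability

precision, recall, _ = precision_recall_curve(y_true, scores)
# A good model's PR curve bows towards the (1, 1) corner.
      </preformat>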
      <p>The SAE model performance for the three settings (S1, S2, and S3) on the development set is shown in Table 5, and the corresponding confusion matrices are shown in Figure 2 (Figure 2: confusion matrices for settings S1, S2, and S3 on the dev set). A confusion matrix shows the correct and incorrect predictions, with count values broken down by each class, i.e. "gsw" (Swiss-German) or "not gsw" (not Swiss-German). Our submitted system (setting S3) obtained the precision (prec), recall (rec), and F1 score shown in Table 6. The detailed performance of each of our settings is shown in Table 7.</p>
      <p>Table 6 (excerpt): Precision of the submitted systems.
Team | Precision
IDIAP | 0.775
jj-cl-uzh | 0.945
Mohammadreza Banaei | 0.984</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In this paper, we have shown the pertinence of an SAE with a Bayesian optimizer for the language detection task. The obtained results are encouraging, and the SAE was found to be effective for discriminating between very close languages or dialects. The proposed model can be extended by creating a host of features, such as character n-grams, word n-grams, and word counts, and then passing them through the autoencoder to choose the best features. In future work, we plan to (i) verify our model (SAE with BO) on other language detection datasets, and (ii) include more short texts, particularly Twitter data, in the training set and verify the performance of our model under a more balanced data-type scenario.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work was supported by an innovation project (under an InnoSuisse grant) oriented to improve the automatic speech recognition and natural language understanding technologies for German (title: "SM2: Extracting Semantic Meaning from Spoken Material", funding application no. 29814.1 IP-ICT), and by the EU H2020 project "Real-time network, text, and speaker analytics for combating organized crime" (ROXANNE), grant agreement 833635. The second author, Esaú Villatoro-Tello, was partially supported by Idiap, UAM-C Mexico, and SNI-CONACyT Mexico during the elaboration of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Ivana</given-names>
            <surname>Balazevic</surname>
          </string-name>
          , Mikio Braun, and
          <string-name>
            <given-names>Klaus-Robert</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Language detection for short text messages in social media</article-title>
          .
          <source>arXiv preprint arXiv:1608.08515</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>James</given-names>
            <surname>Bergstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Random search for hyper-parameter optimization</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>13</volume>
          (Feb):
          <fpage>281</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Nora</given-names>
            <surname>Hollenstein</surname>
          </string-name>
          and Noëmi Aepli.
          <year>2015</year>
          .
          <article-title>A resource for natural language processing of Swiss German dialects</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Tommi</given-names>
            <surname>Jauhiainen</surname>
          </string-name>
          , Krister Lindén, and Heidi Jauhiainen.
          <year>2019a</year>
          .
          <article-title>Language model adaptation for language and dialect identification of text</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>25</volume>
          (
          <issue>5</issue>
          ):
          <fpage>561</fpage>
          -
          <lpage>583</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Tommi Sakari</given-names>
            <surname>Jauhiainen</surname>
          </string-name>
          , Marco Lui, Marcos Zampieri, Timothy Baldwin, and Krister Lindén. 2019b.
          <article-title>Automatic language identification in texts: A survey</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>65</volume>
          :
          <fpage>675</fpage>
          -
          <lpage>782</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ermelinda</given-names>
            <surname>Oro</surname>
          </string-name>
          , Massimo Ruffolo, and Mostafa Sheikhalishahi.
          <year>2018</year>
          .
          <article-title>Language identification of similar languages using recurrent neural networks</article-title>
          .
          <source>In ICAART</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Marc'Aurelio</given-names>
            <surname>Ranzato</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Szummer</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Semi-supervised learning of compact document representations with deep networks</article-title>
          .
          <source>In Proceedings of the 25th International Conference on Machine Learning</source>
          , pages
          <fpage>792</fpage>
          -
          <lpage>799</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Antti</given-names>
            <surname>Rasmus</surname>
          </string-name>
          , Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko.
          <year>2015</year>
          .
          <article-title>Semi-supervised learning with ladder networks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>3546</fpage>
          -
          <lpage>3554</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Sánchez-Vega</surname>
          </string-name>
          , Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Efstathios Stamatatos, and Luis Villaseñor-Pineda.
          <year>2019</year>
          .
          <article-title>Paraphrase plagiarism identification with character-level features</article-title>
          .
          <source>Pattern Analysis and Applications</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
          <fpage>669</fpage>
          -
          <lpage>681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Yves</given-names>
            <surname>Scherrer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Owen</given-names>
            <surname>Rambow</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Natural language processing for the Swiss German dialect area</article-title>
          .
          <source>In Semantic Approaches in Natural Language Processing-Proceedings of the Conference on Natural Language Processing 2010 (KONVENS)</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          . Universaar.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jasper</given-names>
            <surname>Snoek</surname>
          </string-name>
          , Hugo Larochelle, and Ryan P Adams.
          <year>2012</year>
          .
          <article-title>Practical Bayesian optimization of machine learning algorithms</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>2951</fpage>
          -
          <lpage>2959</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Jasper</given-names>
            <surname>Snoek</surname>
          </string-name>
          , Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and
          <string-name>
            <given-names>Ryan P</given-names>
            <surname>Adams</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Scalable Bayesian optimization using deep neural networks</article-title>
          .
          <source>In International conference on machine learning</source>
          , pages
          <fpage>2171</fpage>
          -
          <lpage>2180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Liling</given-names>
            <surname>Tan</surname>
          </string-name>
          , Marcos Zampieri, Nikola Ljubešić, and Jörg Tiedemann. 2014a.
          <article-title>Merging comparable data sources for the discrimination of similar languages: The DSL corpus collection</article-title>
          .
          <source>In Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Liling</given-names>
            <surname>Tan</surname>
          </string-name>
          , Marcos Zampieri, Nikola Ljubešić, and Jörg Tiedemann. 2014b.
          <article-title>Merging comparable data sources for the discrimination of similar languages: The DSL corpus collection</article-title>
          .
          <source>In Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
          , Reykjavik, Iceland.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Zhihua</given-names>
            <surname>Wei</surname>
          </string-name>
          , Duoqian Miao,
          <string-name>
            <given-names>Jean-Hugues</given-names>
            <surname>Chauchat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rui</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wen</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>N-grams based feature selection and text representation for Chinese text classification</article-title>
          .
          <source>International Journal of Computational Intelligence Systems</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <fpage>365</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>Charlie</given-names>
            <surname>Dagli</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Twitter language identification of similar languages and dialects without ground truth</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Qiuyu</given-names>
            <surname>Zhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ruixin</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A classification supervised auto-encoder based on predefined evenly-distributed class centroids</article-title>
          .
          <source>arXiv preprint arXiv:1902.00220</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>