<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>INFOTEC-MCDI at MentalRiskES: Gambling Risk Early Prediction in Social Media Users Through Bag of Word and BERT Ensembles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Herrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric S. Tellez</string-name>
          <email>eric.tellez@infotec.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Constructor, Ciudad de México</institution>
          ,
          <addr-line>03940</addr-line>
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>INFOTEC Centro Público de Investigación en Tecnologías de la Información y Comunicación</institution>
          ,
          <addr-line>112 Circuito Tecnopolo Sur</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Parque Industrial Tecnopolo 2</institution>
          ,
          <addr-line>Aguascalientes, 20326</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Secretaría de Ciencia</institution>
          ,
          <addr-line>Humanidades, Tecnología e Innovación (SECIHTI), 1582 Insurgentes Sur 1582, Crédito</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Excess gambling can contribute to mental health disorders, including financial instability, interpersonal conflicts, increased risk of anxiety and depression, and impaired occupational or academic functioning. This report presents a computational model to early detect possible gambling disorders, particularly tackles the MentalRiskES challenge Task 2 running at IberLEF 2025 forum. The task asks to identify four gambling disorders in Twitch and Telegram text posts in Spanish. Our approach comprises a bag-of-words model and a BERT-based model achieving competitive performance with respect to quality and speed.</p>
      </abstract>
      <kwd-group>
        <kwd>gambling disorders</kwd>
        <kwd>early mental risk identification</kwd>
        <kwd>author profiling</kwd>
        <kwd>text classification models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ISSN1613-0073</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Gambling issues could produce significant mental health disorders; they serve as a catalyst for
addiction, initiating a domino efect of devastating consequences that ripple through individuals,
families, and entire communities. These repercussions manifest themselves in the form of
profound financial instability, the unraveling of personal relationships, and an increasing risk
of mental health disorders, including anxiety and depression. The impacts extend further to
the erosion of professional or academic performance and, in the most harrowing cases, lead
to homelessness or even suicidal ideation. The insidious reach of gambling, particularly with
the explosive growth of online platforms, has only magnified these hazards, underscoring the
urgent need for a thorough and nuanced understanding of its multifaceted impact to inform
robust prevention and intervention strategies.</p>
      <p>
        This manuscript presents our solution to the MentalRiskES task 2 challenge at IberLEF
2025 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This task consists of identifying a subject’s possible gaming addiction based on text
messages written by Spanish speakers on Telegram and Twitch, particularly, with an early
identification approach. The challenge also requires determining the degree level going binary,
that is, low and high.
      </p>
      <p>The categories of gambling activities identified by the challenge are:
Betting: Placing wagers on the outcomes of sporting events to obtain potential monetary
gains.</p>
      <p>Online gaming: Participation in traditional chance-based games like roulette, blackjack, and
slots over the Internet.</p>
      <p>Trading and crypto: Engaging in high-risk investments, especially in cryptocurrencies, with
uncertain financial outcomes.</p>
      <p>Lootboxes: A gambling-like mechanic in video games where players purchase virtual
containers with randomized contents.</p>
      <p>
        The MentalRiskES challenge is defined as an early identification text classification problem,
that is, predictions are made with diferent numbers of messages, controlled by the organizers,
and released incrementally in stages called rounds. The model then should refine the predictions
as it access more complete profiles, but it is important to give a precise prediction using the
least possible messages per profile, since the underlying problem should be attended as soon as
possible to reduce the undesired efects. The complete rules and evaluation methodology are
described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>1.1. Text classification</title>
        <p>In a supervised learning algorithm, like a classification model, it receives a dataset of examples
( ,  ) , where  =  1,  2, … ,   and  =  1,  2, … ,   . Each   is a vector belonging to ℝ , with
 as its dimension. In contrast, each   denotes a category representing a legitimate class. In
this context, supervised learning-based profiling approaches require the conversion of input
messages into real-valued vectors, where  is typically generated by human annotators who
label each example.</p>
        <p>
          A text classification pipeline consists of a model that transforms text inputs into
highdimensional vectors that can be used as input to a classification model to learn to predict the
label. Some approaches like the bag-of-words approaches compute the vocabulary of a corpus
and use it to map each text to very-high-dimensional vectors where each component corresponds
with some word in the vocabulary. The precise approaches can be quite sophisticated [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ], yet
the general pipelines for training and predictions are as follows:
training stage: Preprocess text messages, tokenize, compute vocabulary, compute a weight
to each word in the vocabulary, compute a high-dimensional vector for each document,
learn a classification model using the computed vectors and their associated labels.
prediction stage: Each message to be classified needs to be pre-processed, tokenized to create
a high-dimensional vector using the vocabulary and associated weights, then the vector
should be given to the classification model to obtain the prediction.
        </p>
        <p>Conversely, deep learning methods require identical input and output formats, but these
models are trained to carry out all essential processes within a single framework, facilitated by
a suitable architecture. These models perform exceptionally well and benefit from the extensive
knowledge corpus on which they were pretrained.</p>
        <sec id="sec-2-1-1">
          <title>Author profiling</title>
          <p>
            The author profiling problem is a forensic text classification problem that predicts specific
characteristics of an author through the analysis of his/her documents. Typical characteristics
of AP consist of the authors’ writing style, such as filler words and phrases, as well as the
message’s direct content and intended meaning. These traits can be a possible determination of
his/her mental health, gender, age, personality, native language, among others [
            <xref ref-type="bibr" rid="ref5">5, 6</xref>
            ]. Examples
of this are the 2024 and 2023 MentalRiskEs competitions, where the tasks included determining
psychological problems such as anxiety, depression, suicidal thoughts [7], eating disorders,
or depression [8]. We can find that the profiling problem is not limited solely to identifying
patterns from text; for example, in the sixth edition of the author profiling task at PAN 2018,
this problem associated with gender identification was addressed from a multimodal approach,
using a combination of text and images [9].
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Our contribution</title>
        <p>This article describes our approach to MentalRisk2025 Task 2, which involves early detection of
gambling addiction among Spanish-speaking users on Telegram and Twitch. For the
gamblingtype subtask, we implement a bag-of-words method, MicroTC, while the RoBERTuito fine-tuned
model is employed for the intensity subtask; both models used to solve one aspect of task 2,
together, in each of the runs. We describe both of our approaches and outline our methodology
for formulating our solutions.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Roadmap</title>
        <p>The rest of the paper is organized as follows. Section 2 presents a brief review of work on the
author profile and identification of mental risk. Section 3 describes our approach to solve the
problem and the system we have implemented. Section 4 details our experimental methodology
and lists our results. Finally, conclusions are given in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Related work</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduce the  TC framework,1 a text classifier based on bag-of-words representations that
is optimized in a large graph of possible configurations through combinatorial optimization.
This method has shown competitive results in various tasks, especially in author profiling [ 10].
      </p>
      <p>More recently, [11] introduced the Transformer architecture, a deep learning framework
developed for Natural Language Processing (NLP), which has shown exceptional eficacy in a
wide range of tasks. Transformers use attention mechanisms to highlight important information
1https://github.com/INGEOTEC/microtc
and understand long-range word dependencies within a sentence. Although this approach
can considerably enhance performance over other methods, it tends to be computationally
demanding. Therefore, people are typically limited to finetune a list of pretrained models to
solve tasks. The Bidirectional Encoder Representations from Transformers (BERT) is an
encoderbased transformer designed for pretraining models for various NLP tasks. BERT operates in
two phases: pretraining for language understanding and finetuning for specific tasks ([ 12]).
RoBERTuito is a language model based on the RoBERTa variant of BERT [13], trained on over
500 million Spanish tweets [14]. Its extensive training corpus and pretraining decisions make it
a strong model for text classification.</p>
      <p>In previous editions of MentalRiskEs [8, 7] we found that participants have mainly used
pre-trained transformers to solve the tasks, among which we can find BETO, RoBERTa or
RoBERTuito which was used in this work, however, we can notice that, particularly in the 2023
edition, it is expressed that not all participants used state-of-the-art models, belonging to deep
learning, but rather opted for more traditional models such as Random Forest, Naïve Bayes or
Support Vector Machines. Another situation that stands out from that edition is that there were
contestants who used intensive feature selection methods that managed to become the closest
to end-to-end solutions, within a set of classical algorithms. However, one of the conclusions we
can draw from this edition is that even if the problems fall within the field of natural language
processing, and even more so if they fall within similar areas, such as psychology, the way of
expressing oneself, including the words used, can cause a more traditional model to perform
better than a model compared to other models.</p>
      <p>Author profiling. We can find that there have been diferent approaches to solve the
proifling problem, for example, for the Author Profiling Task at PAN 2018, for age and gender
identification, the model that obtained the best score consisted of a support vector machine
and a combination of stylistic and second order features, while for gender identification the
best model was one based on logistic regression with a mix of stylometric, lexical and n-gram
features, on the other hand, the best model for age and gender profiling was one based on a
support vector machine trained from stylometric, n-grams and second order features [15].</p>
    </sec>
    <sec id="sec-4">
      <title>3. System description</title>
      <p>We approached the challenge as a user profiling problem, messages were gathered around a
particular user, defining its behavior; the challenge asks for early detection capabilities that
were achieved by feeding each profile incrementally as messages were retrieved. Task 2 asks
for prediction of the type of gambling (four classes listed in §1) and the level of risk of the
individual, i.e., low and high; we created a solution employing two NLP techniques, a bag of
words to predict types of gambling and a RoBERTuito-based model to identify risk levels.</p>
      <p>The oficial training set comes from the PRECOM-SM Corpus: Gambling in Spanish Social
Media prepared by [16]. This corpus comprises 350 individuals each associated with a gambling
disorder. Each subject has multiple messages, and we compile each individual’s profile by
merging all their messages. Due to the limited size of the oficial evaluation dataset, we address
this by dividing the oficial training set into an internal training and a test partition, allocating
75% for training and the remaining 25% for testing. Table 1 provides the final distribution details
of our internal partition.</p>
      <p>As mentioned above, we select to probe our MicroTC framework and RoBERTuito to solve
the task. Even when transformer-based models are unbeatable in many NLP tasks, the educated
guess is that the core of our framework has proved to be very competitive in author profiling
tasks [10], yet early identification was never tested. We selected RoBERTuito as the neural
model because it has demonstrated flexibility and power [ 14]. We fine-tuned the model using
the training set to predict the type of addiction and took an additional heuristic to predict the
risk level.</p>
      <p>The MicroTC framework produces a high-dimensional vector space from a broad
vocabulary by utilizing diferent preprocessing functions and tokenizers. Hyperparameters for
MicroTC include num_option=’group’ to consolidate all numbers into a single token,
usr_option=’group’ for grouping user mentions, url_option=’group’ to merge all URLs into one
token, and emo_option=’group’ to replace emoticons with a single token. We also remove
duplicate characters, punctuation, and diacritics. After forming the vector space, LinearSVC
[17] was trained with default hyperparameters. Both the MicroTC and RoBERTuito models
underwent training using K-Fold Cross-Validation with  = 10 . Figure 1 shows the prediction
process, further detailed in the subsequent paragraphs.</p>
      <sec id="sec-4-1">
        <title>Predicting the risk of having a gambling disorder. We use the RoBERTuito model for</title>
        <p>this problem. First, incoming messages are tokenized by the pysentimento library into character
strings. We use prediction logit values for each category (Betting, Online gaming, Trading and
crypto, Lootboxes). If there are at least two positive predictions, a high risk of betting disorder is
indicated as 1, otherwise, a low risk is indicated as 0.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Predicting gambling disorder type by subject. We use MicroTC model predictions directly.</title>
        <p>Tables 2 and 3 present the results obtained during the experimentation phase and across
our data partitions. Table 2 shows results similar to those reported by [7] in task 1; This task
consisted of classifying the illness that the subjects could have, from three options, namely
depression, anxiety, or none of them. As we can see, thanks to a comparison between the
MicroTc and RoBERTuito models, the former turns out to have better performance in most of
the metrics used in experimentation. On the other hand, Table 3 shows better results compared
to [7] in task 3, which addresses the problem of identifying the possibility that the subject has
suicidal thoughts or ideas. Since the problems mentioned in a previous edition of MentalRiskES
are within the same field and are approached from a similar perspective, this being a multiclass
classification problem and a binary classification problem, we decided to take the results obtained
in these problems as a reference point in the selection of the models that make up our proposal.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <p>Our experiments were conducted in Google Colab2 utilizing an L4 GPU system an Intel (R)
Xeon (R) CPU @ 2.20GHz with two cores. We employed the Huggingface framework with a
Pytorch back-end.</p>
      <p>Table 4 illustrates the performance of our MicroTC model. Our analysis indicates that the
model attained an accuracy rate of 0.594, alongside other performance metrics that enable us
to assess class balance and the model’s generalization capabilities in a multiclass scenario. It
is evident that the model demonstrates relatively balanced outcomes in diferent classes. In
contrast, we also report metrics related to the delay in identifying positive instances [18], our
2Google Colab site https://colab.research.google.com/.
of 0.594, surpassing the chance level of 0.25 for four equally likely classes, suggests the model
has captured certain discriminatory patterns in the texts. The small gap between macro and
micro averaged metrics indicates a data imbalance naturally present in the database.</p>
      <p>Performance limitations could be attributed to the significant linguistic ambiguity on informal
platforms like Twitch and Telegram, where language often appears colloquial, abbreviated, or
full of slang. Additionally, shared vocabulary across platforms and various potential gambling
disorders may also play a role.
Prediction performance of the risk level, i.e., modeled as binary classification with Low and High classes.</p>
      <p>Run
1,2,3</p>
      <p>Accuracy</p>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This paper describes our solution for the early detection of gambling problems and the prediction
of mental risk levels, using Spanish-language text messages collected from Telegram and Twitch,
as part of the MentalRiskEs@IberLEF2025 challenge. Our approach involves a MicroTC model,
based on a bag-of-words methodology, with LinearSVC serving as the classifier for the early
identification of gambling disorder types. In addition, a fine-tuned RoBERTuito model is used
to predict mental risk levels.</p>
      <p>According to the oficial results, our solution demonstrated strong performance in
detecting both gambling disorders and mental risk levels, particularly excelling in the former and
achieving the top rankings in various metrics. When it comes to assessing the risk level of
developing a gambling problem, the solution also boasted low latency and high speed, making it
a competitive option to address this problem. While putting these models in practical situations
requires additional investigation, our method ofers alternative routes to mainstream strategies
that predominantly use transformer-based models, demonstrating its competitive quality and
relatively low computational demands.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Google translate in order to: translate
from spanish to english. After using this tool/service, the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the publication’s content.
[6] M. Potthast, J. Kiesel, K. Reinartz, J. Bevendorf, B. Stein, A stylometric inquiry into
hyperpartisan and fake news, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 231–240. URL:
https://aclanthology.org/P18-1022/. doi:10.18653/v1/P18-1022.
[7] A. M. Mármol-Romero, A. Moreno-Muñoz, F. M. Plaza-Del-Arco, M. D. Molina-González,
M. T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Overview of mentalriskes
at iberlef 2024: Early detection of mental disorders risk in spanish, Procesamiento del
Lenguaje Natural (2024). URL: https://www.who.int/news/item/17-06-2021-. doi:10.26342/
2024-73-33.
[8] A. M. Mármol-Romero, A. Moreno-Muñoz, F. M. Plaza-Del-Arco, M. D. Molina-González,
M. T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Overview of mentalriskes
at iberlef 2023: Early detection of mental disorders risk in spanish, Procesamiento del
Lenguaje Natural (2023) 329–350. doi:10.26342/2023-71-26.
[9] F. Rangel, P. Rosso, M. Montes-Y-Gómez, M. Potthast, B. Stein, Overview of the 6th Author
Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter, Technical Report,
2018. URL: http://pan.webis.de.
[10] M. Graf, D. Moctezuma, E. S. Téllez, Bag-of-word approach is not dead: A
performance analysis on a myriad of text classification challenges, Natural Language
Processing Journal 11 (2025) 100154. URL: https://www.sciencedirect.com/science/article/pii/
S2949719125000305. doi:https://doi.org/10.1016/j.nlp.2025.100154.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I.
Polosukhin, Attention is all you need (2017). URL: http://arxiv.org/abs/1706.03762.
[12] F. A. Acheampong, H. Nunoo-Mensah, W. Chen, Transformer models for text-based
emotion detection: a review of bert-based approaches, Artificial Intelligence Review 54
(2021) 5789–5829. doi:10.1007/s10462-021-09958-2.
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, 2019. URL: https:
//arxiv.org/abs/1907.11692. arXiv:1907.11692.
[14] J. M. Pérez, D. A. Furman, L. Alonso Alemany, F. M. Luque, RoBERTuito: a pre-trained
language model for social media text in Spanish, in: Proceedings of the Thirteenth
Language Resources and Evaluation Conference, European Language Resources Association,
Marseille, France, 2022, pp. 7235–7243. URL: https://aclanthology.org/2022.lrec-1.785.
[15] F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, B. Stein, Overview of the
4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations, Technical Report, 2016.</p>
      <p>URL: http://pan.webis.de.
[16] P. Álvarez-Ojeda, M. V. Cantero-Romero, A. Semikozova, A. Montejo-Ráez, The
precomsm corpus: Gambling in spanish social media, in: Proceedings of the 31st International
Conference on Computational Linguistics, 2025, pp. 17–28.
[17] Scikit-learn, Linearsvc, 2025. URL: https://scikit-learn.org/stable/modules/generated/
sklearn.svm.LinearSVC.html.
[18] D. E. Losada, F. Crestani, A test collection for research on depression and language use,
Lecture Notes in Computer Science (2016) 28–39. doi:10.1007/978-3-319-44564-9_3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Á</surname>
          </string-name>
          .
          <string-name>
            <surname>González-Barba</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEURWS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Álvarez-Ojeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. P.</given-names>
            <surname>del Arco</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. D.</surname>
            Molina-González, M.-
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Martín-Valdivia</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          <string-name>
            <surname>Ureña-López</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Montejo-Ráez</surname>
          </string-name>
          , Overview of mentalriskes at iberlef 2025:
          <article-title>Early detection of mental disorders risk in spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schutze</surname>
          </string-name>
          ,
          <article-title>Foundations of statistical natural language processing</article-title>
          , MIT press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Tellez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moctezuma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miranda-Jiménez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Graf</surname>
          </string-name>
          ,
          <article-title>An automated text categorization framework based on hyperparameter optimization</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>149</volume>
          (
          <year>2018</year>
          )
          <fpage>110</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the 5th author profiling task at pan 2017: Gender and language variety identification in twitter, Working notes papers of the CLEF (</article-title>
          <year>2017</year>
          )
          <fpage>1613</fpage>
          -
          <lpage>0073</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>