<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>EVALITA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>INFOTEC-LaBD at PoliticIT: Political Ideology Detection in Italian Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiram Cabrera-Pineda</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Sadit Téllez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabino Miranda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CICESE</institution>
          ,
          <addr-line>Ensenada, Baja California</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CONAHCYT</institution>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>INFOTEC Aguascalientes</institution>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>UPIITA-IPN</institution>
          ,
          <addr-line>Ciudad de México</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>8</volume>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>These working notes present our approach to profiling clusters of texts in the PoliticIT 2023 challenge, which uses a non-linear, low-dimensional representation built on term distribution entropy. Our algorithm learns a three-dimensional model of the text data that captures the features needed for accurate profiling. It also supports cluster analysis and visualization, which help reveal the underlying patterns in the data. The method uses a bag-of-words representation with a weighting scheme based on each term’s distribution entropy, which lets us extract meaningful information about the profiled clusters of texts. We evaluated the algorithm on the PoliticIT 2023 dataset, which comprises three tasks: gender identification, and binary and multiclass political ideology classification. The results show that our solution is competitive across all three tasks. A notable advantage of our algorithm is its explainability: it offers insight into the reasoning behind its predictions, allowing users to understand the factors that characterize each cluster of texts. This transparency makes the algorithm a practical tool for profiling clusters of texts and analyzing their behavior.</p>
      </abstract>
      <kwd-group>
        <kwd>explainable user-profiling</kwd>
        <kwd>low-dimensional representations</kwd>
        <kwd>political parties identification</kwd>
        <kwd>term's distribution entropy weighting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>User profiling is a powerful technique used to extract
valuable information about individuals, including their
interests, demographics, behavioral patterns, and even their
political preferences. Cluster profiling aims to identify
patterns, trends, and similarities within groups of texts
written by different users with similar traits or
characteristics based on specific criteria. By carefully analyzing
data related to users, we can gain deep insights into their
unique characteristics and preferences. These insights
have far-reaching applications in various domains,
including enhancing user experiences, personalized advertising,
detecting and preventing malicious activities, and gaining
a deeper understanding of societal dynamics and
preferences. In this manuscript, we focus specifically
on political preferences, as part of our participation in the PoliticIT
challenge of the EVALITA 2023 forum.</p>
      <p>
        The task consists of predicting user traits, such as
gender and political ideology, from 36,240 Twitter
messages in Italian. These tweets cover various
topics, from news and current events to personal thoughts
and experiences. Twitter is a valuable data source for user
profiling, offering insights into users’ interests,
demographics, and behaviors. The practical applications
of these tasks span various domains, including
personalized marketing, content recommendation, and social
science research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Automatic user profiling gives us insight into a
population’s characteristics, preferences, and ideological
orientations. Examples include the PAN@CLEF and FIRE evaluation
series, which cover a wide range of objectives, such as age, gender,
language variety identification, and personality recognition in
different languages and genres, like blogs, reviews, social media,
and Twitter [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2, 3, 4, 5, 6</xref>
        ]. Forums like MEX-A3T [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
address the identification of occupation types and places of
residence, while the IberLEF@SEPLN forum has focused on gender,
profession, and political ideology profiling; see [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. All of these efforts have contributed significantly to
the advancement of user profiling techniques.
      </p>
      <p>This paper provides an overview of our participation in the author
profiling task, focusing on its relevance to the political domain,
specifically in the context of the PoliticIT challenge. Our
approach is described in Section 2. In Section 3, we delve into
the methodology and model analysis; Section 4 presents the
experimental results, and Section 5 concludes. Our primary objective
is to develop robust, interpretable representations that achieve
high performance and allow in-depth analysis of the model’s label
predictions. Instead of relying on a predefined set of features,
we generate 3D maps that capture the underlying
similarity structure of the original high-dimensional
representation. This spatial representation lets us explore similar
clusters of texts by directly examining their messages and
analyzing their shared vocabulary.</p>
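      <p>The similarity structure that the 3D maps preserve starts from a k nearest neighbor graph built over the high-dimensional weighted vectors (see Section 3). The stdlib-only Python sketch below builds such a graph by brute force. It is illustrative only: our experiments used the SimSearchManifoldLearning.jl Julia package, and the use of cosine similarity, the dictionary-based sparse vectors, and the example tokens are assumptions made for this sketch.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Dict, List, Tuple

# Assumption: a cluster of texts is a sparse vector mapping token -> weight.
SparseVec = Dict[str, float]


def cosine(u: SparseVec, v: SparseVec) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty or all zeros."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector for the dot product
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_graph(vectors: List[SparseVec], k: int) -> List[List[Tuple[int, float]]]:
    """Brute-force k nearest neighbor graph: for each vector, its k most similar other vectors."""
    graph = []
    for i, u in enumerate(vectors):
        sims = [(j, cosine(u, v)) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[1], pair[0]))  # most similar first; ties by index
        graph.append(sims[:k])
    return graph


# Usage with hypothetical weighted vectors for four clusters of texts.
docs = [
    {"governo": 1.0, "tasse": 0.5},
    {"governo": 0.9, "tasse": 0.6},
    {"ambiente": 1.0, "clima": 0.8},
    {"clima": 1.0, "ambiente": 0.7},
]
neighbors = knn_graph(docs, k=1)
print([[j for j, _ in row] for row in neighbors])  # [[1], [0], [3], [2]]
```
      </preformat>
      <p>Because the brute-force search is quadratic in the number of clusters of texts, it is only practical for small collections; index-based or approximate neighbor search is the usual choice at scale.</p>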
      <p>
        In a previous publication [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we introduced our
overall approach. For the PoliticIT 2023 challenge,
we streamlined it by reducing the number
of hyperparameters. Our methodology for profiling clusters of texts
comprises three main modules (vector space models, non-linear
dimensional reduction, and classification), yielding a more
concise and focused framework.
      </p>
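      <p>As an illustration of the first module, the following Python sketch applies the preprocessing rules, token types, entropy weighting, and minimum-frequency filter described in Section 3 to a few toy clusters of texts. It is a hedged approximation, not the TextSearch.jl implementation we actually used: it assumes that p_{c,w} is estimated from raw token counts (the fraction of token w’s occurrences that fall in class c), that each cluster of texts is treated as a single document, and that every number other than 1-9 (including multi-digit numbers) is mapped to 0. The function names and toy texts are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import re
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

SMALL_NUMBERS = set("123456789")  # numbers kept verbatim; any other number becomes "0"
Token = Tuple[str, ...]  # ("u", word), ("b", word1, word2), or ("q", 4-gram)


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, map numbers other than 1-9 to 0, collapse blank spaces."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    text = re.sub(r"\d+", lambda m: m.group(0) if m.group(0) in SMALL_NUMBERS else "0", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str, q: int = 4) -> List[Token]:
    """Word unigrams, word bigrams, and character q-grams, tagged so the types never collide."""
    words = text.split(" ") if text else []
    unigrams = [("u", w) for w in words]
    bigrams = [("b", w1, w2) for w1, w2 in zip(words, words[1:])]
    qgrams = [("q", text[i:i + q]) for i in range(len(text) - q + 1)]
    return unigrams + bigrams + qgrams


def entropy_weights(docs: List[List[Token]], labels: List[str], min_clusters: int) -> Dict[Token, float]:
    """entweight(w) = 1 + sum_c p_{c,w} log p_{c,w} / log |C|, for tokens in at least min_clusters docs."""
    classes = set(labels)
    if 2 > len(classes):
        raise ValueError("entropy weighting needs at least two classes")
    doc_freq: Counter = Counter()
    per_class: Dict[Token, Counter] = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        doc_freq.update(set(tokens))  # count each token once per cluster of texts
        for tok in tokens:
            per_class[tok][label] += 1
    log_c = math.log(len(classes))
    weights = {}
    for tok, counts in per_class.items():
        if doc_freq[tok] >= min_clusters:  # drop rare tokens instead of smoothing
            total = sum(counts.values())
            # Classes where the token never occurs have p = 0 and contribute nothing.
            plogp = sum((n / total) * math.log(n / total) for n in counts.values())
            weights[tok] = 1.0 + plogp / log_c
    return weights


# Usage with hypothetical clusters of texts (one document per cluster).
texts = ["Il governo taglia le tasse", "Il governo e le tasse",
         "Il clima e la città", "Il clima cambia"]
labels = ["right", "right", "left", "left"]
docs = [tokenize(normalize(t)) for t in texts]
weights = entropy_weights(docs, labels, min_clusters=2)
print(weights[("u", "governo")])  # 1.0: occurs only in "right" clusters
print(("u", "taglia") in weights)  # False: occurs in a single cluster, below min_clusters=2
```
      </preformat>
      <p>Dropping tokens that occur in fewer than min_clusters clusters of texts plays the role of the removed smoothing constants, which is why no pseudo-counts appear in entropy_weights.</p>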
      <p>
        Classification. The supervised learning stage uses the
low-dimensional vectors to train a classifier that predicts
the political ideology or gender of each cluster of texts. We use SVM
classifiers with linear and non-linear kernels from the scikit-learn
library [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. We perform a model selection procedure for
tuning each classifier.
      </p>
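      <p>The model selection loop (a grid search scored by macro-F1 under stratified five-fold cross-validation, as described in Section 3.1) can be sketched without dependencies as follows. In scikit-learn, the same procedure would typically be expressed with GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=StratifiedKFold(n_splits=5)); here, to keep the example self-contained, a small k-nearest-neighbor classifier stands in for the SVM. The stand-in classifier, its n_neighbors grid, the helper names, and the toy 3D points are hypothetical and are not our submitted models.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
FitPredict = Callable[[List[Point], List[str], List[Point]], List[str]]


def stratified_folds(labels: Sequence[str], n_folds: int = 5) -> List[List[int]]:
    """Deal sample indices round-robin per class, so each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    pos = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[pos % n_folds].append(i)
            pos += 1
    return folds


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_val_macro_f1(X: List[Point], y: List[str], fit_predict: FitPredict, n_folds: int = 5) -> float:
    """Mean macro-F1 over stratified folds."""
    scores = []
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        scores.append(macro_f1([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def knn_classifier(n_neighbors: int) -> FitPredict:
    """Stand-in for the SVM: majority vote among the nearest training points (squared Euclidean)."""
    def fit_predict(train_X, train_y, test_X):
        preds = []
        for x in test_X:
            ranked = sorted((sum((a - b) ** 2 for a, b in zip(x, t)), label)
                            for t, label in zip(train_X, train_y))
            votes = Counter(label for _, label in ranked[:n_neighbors])
            preds.append(votes.most_common(1)[0][0])
        return preds
    return fit_predict


# Usage: hypothetical 3D coordinates for 20 clusters of texts from two classes.
X = [(0.1 * i, 0.0, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 0.0, 0.0) for i in range(10)]
y = ["F"] * 10 + ["M"] * 10
grid = [1, 3, 5]
scores = {k: cross_val_macro_f1(X, y, knn_classifier(k)) for k in grid}
best_k = max(grid, key=lambda k: scores[k])
print(scores, best_k)  # {1: 1.0, 3: 1.0, 5: 1.0} 1
```
      </preformat>
      <p>In the setting described in Section 3.1, each candidate representation (a given pair of k and M, or the 9-dimensional concatenation of the three task maps) would be evaluated with the same loop, so that representation and classifier hyperparameters are selected under a single criterion.</p>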
    </sec>
    <sec id="sec-2">
      <title>3. Methodology and model analysis</title>
      <p>
        This manuscript addresses the challenge of profiling
political preferences within the context of the PoliticIT
challenge held in the EVALITA 2023 forum. The task involves
predicting user gender and political ideology based on a
dataset comprising more than 36 thousand Twitter
messages written in Italian. To address privacy and ethical
concerns, an automated clustering approach was used to group
tweet messages from different users who
shared all the evaluated traits [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        The provided training corpus consists of 103,840
messages collected from Twitter, and the goal is to infer the
authors’ traits from Italian texts. The task covers a demographic
trait, gender, and a psychographic trait, political ideology. See [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for more details.
      </p>
      <p>The training dataset includes tweets from 1,298 clusters of texts, each contributing 80 tweets. However, the dataset exhibits varying degrees of class imbalance across the different categories. For the gender task, the proportion is 63.4% vs. 37.6% for males and females, respectively. For multiclass political ideology, the proportions are 12.5%, 43%, 10.1%, and 34.4% for left, moderate left, moderate right, and right, respectively. For binary ideology, the proportion is 55.4% vs. 44.5% for the left and right categories, respectively, i.e., a slight imbalance. We are asked to create models that predict these labels for a test dataset, i.e., a list of 453 clusters (80 messages per cluster) whose label distribution is unknown.</p>
      <sec id="sec-2-1">
        <p>Vector space models. We implemented several preprocessing steps to prepare the data: we converted all messages to lowercase, normalized runs of blank spaces to a single space, and removed diacritic marks.</p>
        <p>
          The numbers 1-9 were preserved as tokens to capture the information carried by small numbers, while all other numbers were replaced by 0 to reduce dimensionality. We employed three types of tokens: word unigrams, word bigrams, and character q-grams of size four. Each token is modeled as a distribution over classes, and we compute the token’s weight from the entropy of that distribution using the formulation of [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]; that is, for each token <italic>w</italic>,
        </p>
        <disp-formula>
          <tex-math><![CDATA[\mathrm{entweight}(w) = 1 + \frac{\sum_{c \in C} p_{c,w} \log p_{c,w}}{\log |C|},]]></tex-math>
        </disp-formula>
        <p>where <italic>C</italic> is the set of class labels and <italic>p</italic><sub>c,w</sub> is the probability of token <italic>w</italic> in class <italic>c</italic>, so that, for each token, these probabilities form a distribution over <italic>C</italic>. Since these probabilities lie in [0, 1], the logarithms in the numerator are non-positive; therefore, the weight is bounded between 0 (tokens with low discriminative power, spread evenly across classes) and 1 (highly discriminative tokens, concentrated in a single class). This formulation omits the smoothing constants required by our original formulation. Instead, we remove from the vocabulary any token that does not occur in at least <italic>M</italic> clusters of texts. This change reduces the memory required by our models and speeds up computations.</p>
        <p>
          Non-linear dimensional reduction. The non-linear dimensional reduction module uses Uniform Manifold Approximation and Projection (UMAP) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to produce a low-dimensional (three-dimensional) representation of the weighted vectors.
        </p>
        <p>As explained in Section 2, we create low-dimensional projections that concisely represent the dataset and its associated labels. These visualizations serve to uncover group properties and reveal any underlying cluster structure. Our method relies on a <italic>k</italic> nearest neighbor graph constructed from the vector space generated by the entropy weighting model. To ensure meaningful results, we exclude low-frequency tokens, specifically those that appear in fewer than <italic>M</italic> clusters. Hence, our primary hyperparameters are <italic>k</italic> and <italic>M</italic>.</p>
        <p>We experimented with different parameter values to create the UMAP low-dimensional projections, varying <italic>k</italic> from 10 to 50 and <italic>M</italic> from 3 to 35. We generated three-dimensional embeddings, initialized them with a spectral layout, and optimized them for 100 epochs. We also used three negative samples per point (user vector).</p>
        <p>Based on these combinations of <italic>k</italic> and <italic>M</italic>, we obtained 35 unique UMAP projections for each task in the challenge: gender, binary ideology, and multiclass ideology. These projections provide visual representations that capture the data’s underlying structure and relationships.</p>
        <p>
          Figure 1 presents an overview of different parameter combinations (<italic>k</italic> and <italic>M</italic>) and their impact on the data representation. Although our models use three-dimensional projections, the figures are two-dimensional for easier visualization. The figures display the centers obtained by applying the Affinity Propagation (AP) clustering algorithm [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to the low-dimensional embeddings. These results reveal patterns, separations, and overlaps within the data, aiding analysis and interpretation. Note that the centers represent groups of nearby clusters of texts, not individual classes.
        </p>
        <p>Figure panels (data: df_train; embedding file: final_emb_train_proj_train_m5_k50): estimated clusters = 33 (gender), 31 (ideology_binary), and 24 (ideology_multiclass).</p>
        <p>
          We used the homogeneity score [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] to evaluate the extent to which the clusters are homogeneous, meaning that each group predominantly contains samples from a single ground-truth class. A high homogeneity score indicates that the clusters formed by the AP algorithm are internally consistent with respect to the labels. However, the homogeneity score does not measure how well the predicted clusters match the ground-truth partition as a whole: it does not penalize splitting a class across many clusters (i.e., it ignores completeness). Instead, it assesses the purity of each group.
        </p>
        <p>Figure 2 shows homogeneity heatmaps, where color intensity represents the homogeneity score for each combination of <italic>k</italic> and <italic>M</italic>; darker colors indicate higher homogeneity. Examining these heatmaps gives insight into clustering quality and into how the parameters affect it, which supports better-informed choices of <italic>k</italic> and <italic>M</italic>.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.1. Model Selection</title>
        <p>Our participation in the PoliticIT challenge involved developing multiple models on the provided training data. To select the most suitable models, we used a grid search and performed model selection by maximizing the macro-F1 score under five-fold stratified cross-validation.</p>
        <p>For our classification models, we used SVM classifiers with both linear and non-linear kernels. Through hyperparameter optimization and 5-fold cross-validation, we identified the best-performing models.</p>
        <p>Each model corresponds to a distinct representation of the clusters of texts, defined by the parameters <italic>k</italic> and <italic>M</italic>. We considered the three-dimensional representations as well as the concatenation of the three 3D maps from all tasks, which yields a 9-dimensional vector space.</p>
        <p>Through this evaluation, we assessed the performance of the various models and determined the most effective approach for each task. The final models were selected based on the parameter combination that achieved the highest accuracy during cross-validation.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental results</title>
      <sec id="sec-3-1">
        <p>In this section, we present the experimental results of our approach for the EVALITA 2023 PoliticIT task. The source code is available in the GitHub repository https://github.com/hiramcp/PoliticIT2023.</p>
        <p>
          Our experiments were run under Windows 10 on a four-core laptop with 32 GB of RAM and an Intel Core i7-1165G7 @ 2.80GHz processor. To compute the vector space, including preprocessing, tokenization, and entropy-based weighting, we used the TextSearch.jl Julia package (https://github.com/sadit/TextSearch.jl). For the UMAP projections, we used the SimSearchManifoldLearning.jl Julia package (https://github.com/sadit/SimSearchManifoldLearning.jl). Finally, model selection and classification were carried out with the Python scikit-learn package [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>We explored different values of <italic>k</italic> and <italic>M</italic> to find the right balance between preserving local and global structure. By analyzing homogeneity scores and heatmaps, we identified combinations of <italic>k</italic> and <italic>M</italic> that yielded reliable and competitive projections for each task. We aimed to ensure that the data representations used for training and classification showed clear and meaningful clusters, increasing the likelihood of achieving competitive results in the training phase.</p>
        <p>Table 1 shows the best values of <italic>k</italic> and <italic>M</italic> for each task. We determined these combinations by evaluating the projections with homogeneity scores and affinity propagation.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Best values of <italic>k</italic> and <italic>M</italic> for each task.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Task</th><th><italic>k</italic></th><th><italic>M</italic></th></tr>
            </thead>
            <tbody>
              <tr><td>Gender</td><td>30</td><td>7</td></tr>
              <tr><td>Ideology (binary)</td><td>50</td><td>3</td></tr>
              <tr><td>Ideology (multiclass)</td><td>50</td><td>5</td></tr>
            </tbody>
          </table>
        </table-wrap>
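        <p>Because the choice of k and M relies on homogeneity scores, the following sketch shows how that score can be computed from the true labels and the Affinity Propagation cluster assignments, using the standard definition h = 1 - H(C|K)/H(C), with h = 1 when there is a single class. It should agree with scikit-learn’s sklearn.metrics.homogeneity_score; the function names and toy labels are ours, for illustration only.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (natural log) of a distribution given by raw counts."""
    counts = [n for n in counts if n]
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts)


def homogeneity(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """h = 1 - H(C|K) / H(C): 1.0 when every cluster holds samples of a single class."""
    n = len(labels_true)
    h_c = entropy(Counter(labels_true).values())
    if h_c == 0.0:
        return 1.0  # a single class: any clustering is trivially homogeneous
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_pred, labels_true))
    # H(C|K) = -sum over (cluster k, class c) of (n_kc / n) * log(n_kc / n_k)
    h_c_given_k = -sum((n_kc / n) * math.log(n_kc / cluster_sizes[k])
                       for (k, _), n_kc in joint.items())
    return 1.0 - h_c_given_k / h_c


# Usage with hypothetical gender labels for four clusters of texts.
truth = ["M", "M", "F", "F"]
print(homogeneity(truth, [0, 0, 1, 1]))  # 1.0: each group is pure
print(homogeneity(truth, [0, 1, 2, 3]))  # 1.0: singleton groups are pure too
print(homogeneity(truth, [0, 0, 0, 0]))  # 0.0: one group mixes both classes
```
        </preformat>
        <p>The singleton case shows why homogeneity alone cannot rank parameter settings: it rewards many small, pure groups, so it is most informative when read together with the number of estimated clusters or a completeness measure.</p>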
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Our framework combines our entropy-based weighting
scheme and non-linear dimensional reduction techniques
to achieve a practical tradeoff between the model’s
introspection and high-quality predictions. By experimenting
with various combinations of the parameters <italic>k</italic> and <italic>M</italic>, we
gained valuable insights into how the tasks cluster and
separate. The results emphasized the importance of the
neighborhood size (<italic>k</italic>) and the minimum number of
documents (<italic>M</italic>) for identifying and distinguishing
gender and ideology groups.</p>
      <p>Future studies could explore other parameter
configurations and their effects. A comprehensive error
analysis could also reveal the limitations of the models and
identify areas for improvement: by identifying specific
challenges or patterns in misclassifications, refined models
and targeted strategies can be developed to address
these issues.</p>
      <p>It’s also worth exploring other advanced
dimensionality reduction techniques to enhance classification
performance. Finally, our current approach is limited to lexical
features; other representations (e.g., transformer-based
ones) can help improve our approach in distinct scenarios.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. I.</given-names>
            <surname>Eke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shuib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Nweke</surname>
          </string-name>
          ,
          <article-title>A survey of user profiling: State-of-the-art, challenges, and solutions</article-title>
          ,
          <source>IEEE Access</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>144907</fpage>
          -
          <lpage>144924</lpage>
          . doi: 10.1109/ACCESS.2019.2944243.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the PAN/CLEF 2015 evaluation lab</article-title>
          , in:
          <source>Proceedings of the 6th International Conference on Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Volume
          <volume>9283</volume>
          , CLEF'15, Springer-Verlag, Berlin, Heidelberg,
          <year>2015</year>
          , pp.
          <fpage>518</fpage>
          -
          <lpage>538</lpage>
          . URL: https://doi.org/10.1007/978-3-319-24027-5_49. doi: 10.1007/978-3-319-24027-5_49.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F. M. R.</given-names>
            <surname>Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the 6th author profiling task at PAN 2018: Multimodal gender identification in Twitter</article-title>
          , in: CLEF,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <article-title>Author profiling tracks at FIRE</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <elocation-id>72</elocation-id>
          . URL: https://doi.org/10.1007/s42979-020-0073-1. doi: 10.1007/s42979-020-0073-1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L. D. L. P.</given-names>
            <surname>Sarracén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          , I. Markov,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolska</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection</article-title>
          , in:
          <source>12th International Conference of the CLEF Association (CLEF 2021)</source>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L. D. L. P.</given-names>
            <surname>Sarracén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          , E. Fersini,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <source>Profiling Hate Speech Spreaders on Twitter Task at PAN</source>
          <year>2021</year>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          (Eds.),
          <article-title>CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS</article-title>
          .org,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Guzmán-Falcón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Reyes-Meza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rico-Sulayes</surname>
          </string-name>
          ,
          <article-title>Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>
          ,
          <source>in: Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL)</source>
          , Seville, Spain, September,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          , R. Valencia-García,
          <article-title>Overview of PoliticES 2022: Spanish Author Profiling for Political Ideology</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          , R. Valencia-García,
          <article-title>Overview of PoliticES 2023 at IberLEF: Political Ideology Detection in Spanish Texts</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cabrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Téllez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <article-title>INFOTEC-LaBD at PoliticES 2022: Low-dimensional stacking model for political ideology profiling</article-title>
          , in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2022)</source>
          , CEUR Workshop Proceedings, CEUR-WS, A Coruña, Spain,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>McInnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Healy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Melville</surname>
          </string-name>
          ,
          <article-title>UMAP: Uniform manifold approximation and projection for dimension reduction</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1802.03426. doi:10.48550/ARXIV.1802.03426.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <article-title>On spectral clustering: Analysis and an algorithm</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>14</volume>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          ,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,
          <article-title>EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dueck</surname>
          </string-name>
          ,
          <article-title>Clustering by passing messages between data points</article-title>
          ,
          <source>Science</source>
          <volume>315</volume>
          (
          <year>2007</year>
          )
          <fpage>972</fpage>
          -
          <lpage>976</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          ,
          <article-title>V-measure: A conditional entropy-based external cluster evaluation measure</article-title>
          , in:
          <source>Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>420</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>