Isolated Profile Style Representation

Carlos A. Rodríguez-Losada¹, Daniel Castro-Castro²
¹ Computer Science Department, University of Oriente "Antonio Maceo", Santiago de Cuba, Cuba
² Information Retrieval Lab, Computer Science Department, University of La Coruña, Spain

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
carlosarl1999@gmail.com (C. A. Rodríguez-Losada); daniel.castro3@udc.es (D. Castro-Castro)
ORCID: 0000-0001-9102-7601 (D. Castro-Castro)

Abstract
This work presents the results obtained by the UO-UDC team at the Profiling Irony and Stereotype Spreaders on Twitter shared task hosted by PAN 2022. We propose a hybrid model that combines BERT-like embeddings with a lexical representation of tweets. Our experimental results expose the interactions among combinations of representations and their impact on the final accuracy score. We show that it is not enough to represent a profile considering only independent features along with its corresponding class.

Keywords
irony profiling, stereotype, tweet representation

1. Introduction

Large flows of information are currently managed on the Internet, and content creation is accompanied by ethical questions about what is socially acceptable. Irony is defined as a clever way of expressing one idea when, in fact, another is intended. When accompanied by stereotyped ideas about, for example, the sexual orientation of others, women's rights, or the LGBTQI+ community, it can generate controversy on social networks and, in some cases, hate speech [1, 2, 3].

Interesting approaches have been proposed over the years, and one of the best-known venues for this line of work is the PAN shared tasks (https://pan.webis.de/shared-tasks.html). Fersini et al. [4] proposed an approach based on stylometry, personality, emotions, and feed embeddings to train a Support Vector Machine (SVM) classifier. Espinosa et al. [5] extracted character and word n-gram features from text, likewise to train an SVM classifier. Duan et al. [6] addressed the binary classification problem of detecting fake news spreaders, extracting linguistic and sentiment features from users' tweet feeds using the torch.nn library (https://pytorch.org/docs/master/generated/torch.nn.GRU.html). Carracedo et al. [7] proposed several emotion prototypes to map user messages to an emotion space, finally testing every prototype with Naïve Bayes, k-nearest neighbors, SVM, Logistic Regression, and Gradient Boosting, among others. Bagdon et al. [8] combined the results of an n-gram-based logistic regression classifier with a transformer model based on RoBERTa [9] via an SVM meta-classifier.

In the Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) competition [10] organized at PAN 2022 [11], the main goal is to classify Twitter profiles as irony spreaders or non-irony spreaders. The main difference with respect to previous tasks is that this year's task proposes developing systems capable of identifying Twitter profiles that spread irony and stereotypes, given the tweet history of each profile. As usual, the task organizers provide the TIRA platform [12] to perform all the heavy computations of the competitors' models.
In our work, we propose a tweet profile representation that combines deep learning techniques with lexical analysis, motivated by the state of the art for such models [13, 14].

The paper is structured as follows. In Section 2, we describe the system submitted to the competition. In Section 3, we describe the experiments we developed. Finally, in Section 4, we present the conclusions drawn from this work and suggest a line of future research.

2. Our proposal

2.1. Task specifications

The task consists of the following: given a Twitter user's feed written in English, composed of 200 tweets, discriminate whether the given user should be labeled as an Irony and Stereotype Spreader (ISS) or not. As competition baselines, PAN22 uses character/word n-grams combined with SVM and Logistic Regression classifiers, among others.

2.2. Model overview

Given a user timeline, our model builds its representation from a Semantic Representation (SR), a Punctuation Marks Representation (PMR), and an Auxiliary Words Representation (AWR). The first representation aims to capture what is being said in a text document. To do so, we employ a fine-tuned sentence-transformers model named r2d2/stsb-bertweet-base-v0, which maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. This model belongs to the family of Bidirectional Encoder Representations from Transformers (BERT) models [15]. The second representation seeks to capture the author's writing style through the use of punctuation marks and related symbols such as emojis, emphasis signs, and special characters. The last representation is intended to capture the writing style through the author's use of discourse markers.

2.3. Model Stages

2.3.1. Semantic, Punctuation and Auxiliary Words Representations

Figure 1 shows all the profile representations our model builds. First, the SR is built as a set of ten 768-dimensional vectors. Each of the ten vectors holds the accumulated sum of the embeddings of 20 of the user's tweets, encoded with the r2d2/stsb-bertweet-base-v0 sentence-transformers model. Together, these vectors are intended to capture the lexical and semantic style of the profile.

Subsequently, a punctuation-mark term frequency vector is computed, holding the number of occurrences of each punctuation mark in the author's profile. The possible terms of this vector are precomputed by extracting the punctuation marks found in the dataset vocabulary. This vector, which constitutes the PMR in our model, gives the model the ability to quantify the similarity in punctuation style between different authors.

Lastly, an auxiliary-word term frequency vector is calculated in the same way as the PMR vector. This final vector constitutes the AWR.

Figure 1: Author profile representations

2.3.2. Similarity metric

Our proposal needs a measure that quantifies how similar any two profiles are. We define this measure as the mean of the similarities of the individual representations. The SR similarity is computed by taking the mean of the highest-similarity embedding pairs (vᵢ, vⱼ), such that each vector participates in at most one pair, where vᵢ is the ith dense vector associated with a tweet from a profile A and vⱼ is the jth dense vector associated with a tweet from a profile B. This is illustrated in Figure 2. We calculate the similarity between vectors using the usual cosine similarity:

    sim(v₁, v₂) = cos(θ) = (v₁ · v₂) / (‖v₁‖ ‖v₂‖)        (1)

where v₁ and v₂ are the vector embeddings associated with two tweets.
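To make these stages concrete, the sketch below builds the three representations for a single profile and implements Equation 1. It is a minimal illustration under the description above, not our exact implementation: the function names are hypothetical, punct_vocab and aux_vocab stand for the precomputed term lists extracted from the dataset vocabulary, and the naive substring counting stands in for proper tokenization.

```python
# Minimal sketch (not the exact implementation) of the SR, PMR, and AWR
# for one profile. Assumes the sentence-transformers and numpy packages;
# `punct_vocab` and `aux_vocab` are hypothetical precomputed term lists
# extracted from the dataset vocabulary.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("r2d2/stsb-bertweet-base-v0")

def semantic_representation(tweets, chunk_size=20):
    """SR: ten 768-dimensional vectors, each the accumulated sum of the
    embeddings of a chunk of 20 tweets (200 tweets / 20 = 10 vectors)."""
    chunks = [tweets[i:i + chunk_size] for i in range(0, len(tweets), chunk_size)]
    return np.stack([encoder.encode(chunk).sum(axis=0) for chunk in chunks])

def term_frequency_vector(tweets, vocab):
    """PMR/AWR: occurrences of each precomputed term in the profile.
    Naive substring counting; a real tokenizer would be used in practice."""
    text = " ".join(tweets)
    return np.array([text.count(term) for term in vocab], dtype=float)

def cosine_similarity(v1, v2):
    """Equation 1: sim(v1, v2) = (v1 . v2) / (||v1|| ||v2||)."""
    return float(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Usage for one author profile (a list of 200 tweet strings):
# sr  = semantic_representation(tweets)             # shape (10, 768)
# pmr = term_frequency_vector(tweets, punct_vocab)  # PMR
# awr = term_frequency_vector(tweets, aux_vocab)    # AWR
```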
Figure 2: Semantic Representation similarity calculation method. The numbers on the lines connecting two tweets represent the greatest cosine similarities between the embeddings of the tweets of profile A and the tweets of profile B. The overall similarity of profiles A and B is the average of these similarities (i.e., 0.85).

To compute the semantic similarity between pairs of tweets, the same methodology is applied, considering the sentences of those tweets. This approach seeks to avoid the influence on the comparison of documents (paragraphs) with different numbers of sentences, and to build the similarity from the most similar tweets.

Since the PMR and the AWR produce vectors of the same length for every profile, we take advantage of this fact and compute their corresponding similarities with the same cosine similarity shown in Equation 1.

2.3.3. Core basis of the profiling method

When the model attempts to classify an unlabeled profile into one of the two classes (ISS or non-ISS), we state that it should belong to the class whose k most similar profiles accumulate the largest sum of similarities to the unknown profile. This is illustrated in Figure 3 for k = 1 (i.e., the unlabeled profile is assigned to the class containing the single most similar profile). For k > 1, the model computes an accumulated sum per class; a code sketch of this matching and decision procedure is given after Table 2 below.

Figure 3: Irony spreader detection strategy: a 1-NN example classification

3. Experimental results

3.1. Parameter fitting

We take k, the number of most similar profiles over which the accumulated sum for an unlabeled profile is computed, as our model's tunable parameter. Our parameter fitting is focused on finding the best k, i.e., the one that maximizes prediction accuracy. We also estimate the interactions between representations by testing some combinations of representations and dropping the remaining ones.

3.2. Dataset Specification

A training corpus was released to train and validate the competitors' models (https://pan.webis.de/clef22/pan22-web/author-profiling.html). This dataset is composed of 420 English Twitter user timelines, each containing 200 tweets, along with a ground-truth file holding the classification of the given profiles. A test dataset was also released for testing the models; the test corpus is composed of 180 Twitter timelines. As the organizers state in the competition description, the whole dataset is balanced (i.e., the number of ISS profiles and non-ISS profiles released is the same). The quality metric proposed to evaluate the competitors' results in this task is accuracy.

3.3. Developed experiments

First, we encode the whole training dataset into the proposed representations: SR, PMR, and AWR. To estimate the best k and combination of representations, we run a 5-fold cross-validation over the training corpus, testing in each fold all possible combinations of representations and values of k.

Table 1
5-fold cross-validation over the training corpus: validation dataset mean accuracy (× marks the representations used in each row)

SR   PMR  AWR   K=1   K=7   K=13  K=19  K=25  K=31  K=37  K=41
×               0.59  0.59  0.60  0.63  0.66  0.68  0.72  0.79
     ×          0.53  0.55  0.56  0.59  0.60  0.60  0.60  0.64
          ×     0.55  0.57  0.58  0.59  0.59  0.59  0.59  0.59
×    ×          0.54  0.57  0.55  0.60  0.61  0.61  0.64  0.69
×         ×     0.56  0.59  0.59  0.60  0.60  0.61  0.63  0.65
     ×    ×     0.52  0.57  0.58  0.58  0.58  0.58  0.58  0.59
×    ×    ×     0.55  0.57  0.58  0.59  0.60  0.60  0.61  0.64

Table 2
5-fold cross-validation over the training dataset with the models Single Representation and Majority vote

Model                   Cross 1  Cross 2  Cross 3  Cross 4  Cross 5  Mean  Std. Dev.
Single Representation   0.50     0.67     0.84     0.83     0.83     0.73  0.14
Majority vote           0.51     0.60     0.73     0.75     0.75     0.67  0.10
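As referenced in Section 2.3.3, the following sketch illustrates the SR matching procedure of Section 2.3.2 and the per-class decision rule of Section 2.3.3. It is a simplified sketch, not our exact implementation: it matches SR vectors only, whereas our model also averages in the PMR and AWR cosine similarities, and it reads "pairs (vᵢ, vⱼ) such that each vector participates in at most one pair" as a greedy matching. All names are illustrative.

```python
# Sketch of the SR pair matching and the per-class accumulated-sum
# decision rule. `labeled` is a hypothetical list of (sr_matrix, label)
# pairs; `unknown` is the SR matrix of the profile to classify.
import numpy as np

def sr_similarity(sr_a, sr_b):
    """Mean of the highest cosine similarities over disjoint vector
    pairs: each vector of either profile is matched at most once."""
    pairs = sorted(
        ((float(va @ vb) / (np.linalg.norm(va) * np.linalg.norm(vb)), i, j)
         for i, va in enumerate(sr_a) for j, vb in enumerate(sr_b)),
        reverse=True,
    )
    used_a, used_b, kept = set(), set(), []
    for sim, i, j in pairs:        # greedily keep the best disjoint pairs
        if i not in used_a and j not in used_b:
            kept.append(sim)
            used_a.add(i)
            used_b.add(j)
    return float(np.mean(kept))

def classify(unknown, labeled, k):
    """Assign the class whose k most similar labeled profiles accumulate
    the largest sum of similarities (1-NN when k = 1)."""
    scores = {}
    for sr, label in labeled:
        scores.setdefault(label, []).append(sr_similarity(unknown, sr))
    return max(scores, key=lambda c: sum(sorted(scores[c], reverse=True)[:k]))
```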
We illustrate in Table 1 the results obtained for each tested value of k and the corresponding mean accuracy of the model for every combination of representations.

We also tested a slightly different approach in which, instead of a single combination of representations, three different combinations (the ones with the best results) are used, each emitting a vote to determine the predicted class of the unknown profile. In this version, an unknown profile is assigned to the class voted for by the majority of the representations.

3.4. Results analysis

From the developed experiments, we derived two models to test: a single-representation model and a majority vote model. We ran a 5-fold cross-validation over the training data to compare the two models; the results are shown in Table 2.

On the test set, we evaluated the majority vote approach, obtaining an accuracy of 0.63, consistent with the validation set results.

4. Conclusion and future work

Building a system able to extract author features such as age, gender, or political orientation is a challenging and interesting task in the research community. This work shows the results obtained by the UO-UDC team at the Profiling Irony and Stereotype Spreaders on Twitter shared task hosted by PAN. We strongly believe that adding other lexical-semantic representations to our model could improve its accuracy. For future work, we consider adding a phonetic word representation that could capture irony conveyed through the likely pronunciation of words, given the informal nature of tweets. We also consider it appropriate to evaluate an approach in which each representation contributes in a different, weighted way to the overall similarity measure.

Acknowledgments

This work was supported by projects PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU) and RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación). The second author also thanks the financial support supplied by the Consellería de Cultura, Educación e Universidade (GPC ED431B 2022/33).

References

[1] F. Balouchzahi, H. Shashirekha, LAS for HASOC: Learning approaches for hate speech and offensive content identification, in: FIRE (Working Notes), 2020, pp. 145–151.
[2] F. Rangel, G. L. De la Peña Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling hate speech spreaders on Twitter task at PAN 2021, in: CLEF (Working Notes), 2021, pp. 1772–1789.
[3] A. Reyes, P. Rosso, D. Buscaldi, From humor recognition to irony detection: The figurative language of social media, Data & Knowledge Engineering 74 (2012) 1–12. URL: https://www.sciencedirect.com/science/article/pii/S0169023X12000237. doi:10.1016/j.datak.2012.02.005.
[4] E. Fersini, J. Armanini, M. D'Intorni, Profiling Fake News Spreaders: Stylometry, Personality, Emotions and Embeddings—Notebook for PAN at CLEF 2020, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), CLEF 2020 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/.
[5] D. Espinosa, H. Gómez-Adorno, G. Sidorov, Profiling Fake News Spreaders using Characters and Words N-grams—Notebook for PAN at CLEF 2020, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), CLEF 2020 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/.
[6] X. Duan, E. Naghizade, D. Spina, X. Zhang, RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter—Notebook for PAN at CLEF 2020, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), CLEF 2020 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/.
[7] Á. Carracedo, R. J. Mondéjar, Profiling Hate Speech Spreaders on Twitter—Notebook for PAN at CLEF 2021, in: G. Faggioli, N. Ferro, A. Joly, M. Maistro, F. Piroi (Eds.), CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2936/paper-152.pdf.
[8] C. Bagdon, Profiling Spreaders of Hate Speech with N-grams and RoBERTa—Notebook for PAN at CLEF 2021, in: G. Faggioli, N. Ferro, A. Joly, M. Maistro, F. Piroi (Eds.), CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2936/paper-155.pdf.
[9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[10] R. Ortega-Bueno, B. Chulvi, F. Rangel, P. Rosso, E. Fersini, Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) at PAN 2022, in: CLEF 2022 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2022.
[11] J. Bevendorff, B. Chulvi, E. Fersini, A. Heini, M. Kestemont, K. Kredens, M. Mayerl, R. Ortega-Bueno, P. Pezik, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska, E. Zangerle, Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, and Style Change Detection, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), volume 13390 of Lecture Notes in Computer Science, Springer, 2022.
[12] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture, in: N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019. doi:10.1007/978-3-030-22948-1_5.
[13] H. Wu, Y. Liu, J. Wang, Review of text classification methods on deep learning, Computers, Materials & Continua 63 (2020) 1309–1321.
[14] S. Hashida, K. Tamura, T. Sakai, Classifying tweets using convolutional neural networks with multi-channel distributed representation, IAENG International Journal of Computer Science 46 (2019) 68–75.
[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).