ghostwriter19 @ SardiStance: Generating new Tweets to Classify SardiStance EVALITA 2020 Political Tweets

Mauro Bennici
You Are My Guide, Torino
mauro@youaremyguide.com

Abstract

English. Understanding the events and the dominant thought is of great help to convey the desired message to our potential audience, be it marketing or political propaganda. Succeeding while the event is still ongoing is of vital importance to prepare alerts that require immediate action. A micro-message platform like Twitter is the ideal place to read a large amount of data linked to a theme and self-categorized by its users through hashtags and mentions. In this research, I will show how a simple translator can be used to bring to a common factor the styles, vocabulary, grammar, and other characteristics that make each of us unique in the way we express ourselves.

Italiano. Understanding the events and the dominant thought is of great help in conveying the desired message to our potential audience, whether it is marketing or political propaganda. Succeeding while the event is still ongoing is of vital importance in order to set up alerts that require immediate action. A micro-messaging platform like Twitter is the ideal place to read a large quantity of data linked to a theme, often self-categorized by its own users through hashtags and mentions. In this research I will show how a simple translator can be used to bring to a common factor the styles, vocabulary, grammar, and other characteristics that make each of us unique in our way of expressing ourselves.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Each of us has a unique way of writing. However, the fewer options we have to express our ideas, the more the necessary synthesis leads to the loss of information that is precious for accurately assessing our real intentions.

Furthermore, the more a subject is debated, the more the style and tone change: the conversation becomes full of irony or aggressiveness. Extrapolating a single line without its context is dangerous. The same sentence can have different interpretations depending on the moment in which it is pronounced, the audience it is intended for, the place where it is written, and the historical period in which it was composed.

My hypothesis is that we can translate all these different styles into a single "language style" that fully expresses the real intentions of the writer. The challenge is to understand when a user has expressed a comment in favor of, against, or neutral towards the Sardines, an Italian political movement. The research was carried out for the SardiStance task (Cignarella et al., 2020) at EVALITA 2020 (Basile et al., 2020). Two models were created for Task A, but they also performed well on Task B.

2 Description of the system

The two tasks are similar. In Task A, it is necessary to classify the stance of a tweet based only on its text. Task A is divided into two subtasks:

- Constrained: it is allowed to use additional resources such as a lexicon, but no other resources (such as labeled tweets) to help the training process.
- Unconstrained: each resource used must be reported in the final report.

In Task B, the context information provided by the post author can also be used. The additional information refers to:

- post statistics (favorites, retweets, replies, source);
- the author's profile (number of posts, number of followers, emoji in the bio);
- the author's circle of relationships (friends, replies, retweets, and quotes).

This research focuses on Task A Constrained. Given the constraints of Task A, it is not possible to access any information other than the text of the tweet, so I concentrated on understanding how to clean it up.

The training dataset contains:

- the tweet ID;
- the user ID;
- the text;
- the label.

The label options are:

- Against;
- Favor;
- Neutral / None.

To be sure not to use any data except the text, the user ID, which is useful for Task B, was discarded.

In order to validate my hypothesis, I used the AlBERTo model, created from tweets (Polignano et al., 2019), together with Ktrain (https://github.com/amaiya/ktrain), an automated training framework that wraps TensorFlow (https://www.tensorflow.org/), to classify the tweets. To avoid manual errors and any involuntary optimization, I used the autofit option.
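As an illustration of how little manual tuning this setup requires, the following minimal sketch reproduces a comparable pipeline with Ktrain's Transformer wrapper and autofit. The Hugging Face identifier used for AlBERTo, the file and column names, the label set, the batch size, and the learning rate are my assumptions, not values reported in the paper.

    # Minimal sketch of the classification setup; model id, file/column names, and
    # hyperparameters are assumptions, not values reported in the paper.
    import pandas as pd
    import ktrain
    from ktrain import text
    from sklearn.model_selection import train_test_split

    MODEL_NAME = "m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0"  # assumed AlBERTo checkpoint
    CLASSES = ["AGAINST", "FAVOR", "NONE"]

    df = pd.read_csv("sardistance_train.csv")  # assumed file with 'text' and 'label' columns
    x_tr, x_val, y_tr, y_val = train_test_split(
        df["text"].tolist(), df["label"].tolist(), test_size=0.1, random_state=42)

    t = text.Transformer(MODEL_NAME, maxlen=128, class_names=CLASSES)
    trn = t.preprocess_train(x_tr, y_tr)
    val = t.preprocess_test(x_val, y_val)

    learner = ktrain.get_learner(t.get_classifier(), train_data=trn, val_data=val, batch_size=16)

    # autofit picks a triangular learning-rate schedule and stops on plateau,
    # so no manual hyper-parameter search is involved.
    learner.autofit(2e-5)

    predictor = ktrain.get_predictor(learner.model, preproc=t)
    print(predictor.predict("nessuno tocchi le sardine"))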
First, I wrote a series of algorithms to make the texts to be compared homogeneous. The first one breaks up composed hashtags into sentences and words, using capital letters as separators. For example:

- #IoStoConLeSardine becomes "io sto con le sardine" ["I'm with the sardines"];
- #NessunoTocchiLeSardine becomes "nessuno tocchi le sardine" ["nobody touches the sardines"].

As a second step, I removed repeated vowels within a word, for example:

- "Svegliaaaa" becomes "Sveglia" ["Wake up!"].

I also replaced the word "sardine" with "PartitoPoliticoS" ["PoliticalPartyS"] to prevent the entity from being mistaken for the fish that is the movement's symbol. I did not remove any stop words, because they are useful for building the translation system.

At this point, I made a copy of the dataset in order to translate it. I used the spaCy (https://spacy.io/api/annotation) functions for POS tagging, dependency parsing, and entity recognition to obtain all the essential components of my translator.

The translator is a simple text representation: it rewrites each sentence following the scheme

- subject adjectives;
- subjects;
- verb in the infinitive form;
- object adjectives;
- objects;
- exclamations / other words.

At this stage, the words are not modified to make the sentence grammatically correct. Words only exchange places; only the verbs are changed, to the infinitive form. Entities of type person [PER] take precedence over the others.

The translator concentrates its attention on the aspects inside the sentences, to be sure not to remove words carrying a valid sentiment polarity (Barbieri et al., 2016) and to avoid losing them, as can happen in a round-trip translation through external translation services (Marivate & Sefara, 2020). The attempt to represent the text in a form that is more recognizable and identifiable for an algorithm rests on the fact that the algorithm can still recognize the entities described and the polarity expressed for each of them. For this purpose, the translator makes several attempts to fit words into their suggested positions.
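The sketch below illustrates one possible implementation of these cleaning steps and of the reordering scheme with spaCy. The regular expressions, the it_core_news_sm pipeline, the dependency labels used to locate subjects, objects, and their adjectives, and the use of the lemma as a stand-in for the infinitive are assumptions of mine; the paper does not publish its code.

    # Illustrative sketch of the cleaning steps and of the reordering "translator";
    # regexes, pipeline name, and dependency labels are assumptions, not the paper's code.
    import re
    import spacy

    nlp = spacy.load("it_core_news_sm")  # assumed Italian pipeline with tagger, parser, and NER

    def split_hashtag(tag: str) -> str:
        # "#IoStoConLeSardine" -> "io sto con le sardine"
        words = re.findall(r"[A-ZÀ-Ý]?[a-zà-ÿ]+|\d+", tag)
        return " ".join(w.lower() for w in words)

    def clean(text: str) -> str:
        text = re.sub(r"#\w+", lambda m: split_hashtag(m.group()), text)        # expand hashtags in place
        text = re.sub(r"([aeiouàèéìòù])\1+", r"\1", text, flags=re.IGNORECASE)  # "Svegliaaaa" -> "Sveglia"
        text = re.sub(r"\bsardin\w*\b", "PartitoPoliticoS", text, flags=re.IGNORECASE)
        return text

    def translate(text: str) -> str:
        # Rewrite every sentence as: subject adjectives, subjects, verbs (lemma, i.e. the
        # infinitive in Italian), object adjectives, objects, everything else.
        doc = nlp(clean(text))
        rewritten = []
        for sent in doc.sents:
            subjects = [t for t in sent if t.dep_ in ("nsubj", "nsubj:pass")]
            objects = [t for t in sent if t.dep_ in ("obj", "iobj")]
            verbs = [t.lemma_ for t in sent if t.pos_ == "VERB"]
            subj_adj = [c.text for t in subjects for c in t.children if c.dep_ == "amod"]
            obj_adj = [c.text for t in objects for c in t.children if c.dep_ == "amod"]
            used = set(subjects) | set(objects) | \
                   {c for t in subjects + objects for c in t.children if c.dep_ == "amod"}
            rest = [t.text for t in sent
                    if t not in used and t.pos_ != "VERB" and not t.is_punct]
            rewritten.append(" ".join(subj_adj + [t.text for t in subjects] + verbs
                                      + obj_adj + [t.text for t in objects] + rest))
        return " ".join(rewritten)

    print(translate("#NessunoTocchiLeSardine le sardine pacifiche hanno riempito la piazza!"))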
Finally, I trained two models with the Ktrain framework. Model 1, which uses the translated tweets, was submitted as ghostwriter19_Task_A_1_c. Model 2, trained only on the cleaned tweets, was submitted as ghostwriter19_Task_A_2_c.

2.1 First results

The models are evaluated with the F1-score. The main score is the average of the F1-score on the Favor tweets and the F1-score on the Against tweets.

When comparing the two models, the first result is that the translated tweets performed worse, albeit by a few percentage points (table 1).

Model                       F1-Score
ghostwriter19_Task_A_1_c    0.5613
ghostwriter19_Task_A_2_c    0.6004
Estimated Baseline          0.5386

Table 1: First results

Analyzing the results of both models in detail (tables 2 and 3), we have that:

ghostwriter19_Task_A_1_c    F1-Score
Against                     0.69
Favor                       0.43
Neutral                     0.42

Table 2: F1-score details of model 1

ghostwriter19_Task_A_2_c    F1-Score
Against                     0.70
Favor                       0.50
Neutral                     0.32

Table 3: F1-score details of model 2

The problem is evident: model 1 has a harder time distinguishing the Favor tweets from the Neutral ones. The good news is that both models overcame the estimated baseline.

2.2 Hashtags and Mentions

Considering that on Twitter hashtags are also used for classification purposes, the operation that replaces them was modified: the hashtags are now appended at the end of the new tweets. Mentions are also taken into account and processed in the same way as hashtags (table 4).
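Before looking at the numbers, the sketch below shows one way this variant could be implemented: hashtags and mentions are collected from the original tweet and re-attached, expanded, at the end of the generated text. The helper names and the regular expressions are mine, not the paper's.

    # Illustrative sketch: expand hashtags and mentions and append them at the end
    # of the generated tweet instead of replacing them in place (helper names are mine).
    import re

    TAG = re.compile(r"[#@]\w+")

    def expand_tag(tag: str) -> str:
        # "#IoStoConLeSardine" -> "io sto con le sardine"; "@UnUtente" -> "un utente"
        return " ".join(w.lower() for w in re.findall(r"[A-ZÀ-Ý]?[a-zà-ÿ]+|\d+", tag))

    def append_tags(original_tweet: str, translated_body: str) -> str:
        tags = TAG.findall(original_tweet)
        return (translated_body + " " + " ".join(expand_tag(t) for t in tags)).strip()

    # translated_body would be the output of the reordering step, run on the tweet
    # with hashtags and mentions stripped out.
    print(append_tags("le piazze sono piene #IoStoConLeSardine @UnUtente",
                      "piene piazze essere"))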
Model                       F1-Score
ghostwriter19_Task_A_1_c    0.5822
ghostwriter19_Task_A_2_c    0.6004
Estimated Baseline          0.5386

Table 4: Model 1 with hashtags and mentions in the translated tweets

Analyzing the results in detail (table 5), we can see that:

ghostwriter19_Task_A_1_c    F1-Score
Against                     0.71
Favor                       0.45
Neutral                     0.41

Table 5: F1-score details of model 1 with hashtags and mentions in the translated tweets

The model gained two percentage points on both Against and Favor, against a one-point loss on Neutral. Unfortunately, it still remains two points below model 2, trained on the cleaned tweets only.

2.3 Passive verbs

Analyzing the newly generated texts, I noticed that essential information was lost by putting all the verbs in the infinitive: if a verb was in the passive form, the subject and the object of the sentence ended up reversed. At the same time, I noticed that very long tweets contained more than one sentence.

I therefore modified the translator to distinguish passive from active verbs, swapping the sentence's subject and object when necessary. Since a tweet can contain several sentences, the hashtags are now appended only once, at the end of the whole generated tweet (table 6).
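A sketch of how the passive/active distinction could be added to the reordering step is shown below. The dependency labels used to detect the passive construction and the agent (nsubj:pass, aux:pass, obl with a "da" case marker) and the swap logic are assumptions based on spaCy's Italian models, not code from the paper.

    # Illustrative sketch of the passive/active handling; the dependency labels
    # and the swap logic are assumptions, not the paper's code.
    import spacy

    nlp = spacy.load("it_core_news_sm")  # assumed Italian pipeline

    def subjects_and_objects(sent):
        """Return (subjects, objects), swapped when the sentence is in the passive voice."""
        passive = any(t.dep_ in ("nsubj:pass", "aux:pass") for t in sent)
        subjects = [t for t in sent if t.dep_ in ("nsubj", "nsubj:pass")]
        objects = [t for t in sent if t.dep_ in ("obj", "iobj")]
        # In an Italian passive the semantic subject is usually the "da ..." complement.
        agents = [t for t in sent
                  if t.dep_.startswith("obl")
                  and any(c.dep_ == "case" and c.lower_.startswith("da") for c in t.children)]
        if passive:
            return (agents or objects), subjects
        return subjects, objects

    doc = nlp("La piazza è stata riempita dalle sardine.")
    for sent in doc.sents:
        subj, obj = subjects_and_objects(sent)
        print([t.text for t in subj], [t.text for t in obj])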
Model                       F1-Score
ghostwriter19_Task_A_1_c    0.6306
ghostwriter19_Task_A_2_c    0.6004
Estimated Baseline          0.5386

Table 6: Model 1 with hashtags and mentions in the translated tweets, plus active/passive verbs

Analyzing the results in detail (table 7), we can see that:

ghostwriter19_Task_A_1_c    F1-Score
Against                     0.76
Favor                       0.50
Neutral                     0.40

Table 7: F1-score details of model 1 with hashtags and mentions in the translated tweets, plus active/passive verbs

The model gained five percentage points on the Against and Favor tweets, against a further one-point loss on the Neutral ones. The translation model is now the best model.

3 Results

Model 1 was ultimately 3 percentage points better than Model 2 on the training dataset. The better performance of the model was also confirmed on the test dataset, with an advantage of 2.5 percentage points.

3.1 Results for Task A

The final results on the test dataset are:

Model                       F1-score
ghostwriter19_Task_A_1_c    0.6257
ghostwriter19_Task_A_2_c    0.6004
Baseline                    0.5784

Table 8: Test dataset results for Task A

Model 1 is about 7.5% better than the baseline (table 8). It should be remembered that both models were trained with the autofit option, i.e. without any specific tuning, in order to verify whether a "translation" of the original text could bring evident advantages.

3.2 Results for Task B

Although no context information was used, I still submitted the Task A predictions to Task B. The final results on the test dataset are:

Model                       F1-score
ghostwriter19_Task_A_1_c    0.6257
ghostwriter19_Task_A_2_c    0.6004
Baseline                    0.6284

Table 9: Test dataset results for Task B

Even if model 1 was not able to reach the proposed baseline, the difference between the two systems is 0.4% (table 9). The detailed results of the models are shown in tables 10 and 11.

3.3 Detailed results for Task A

model     f-avg   prec_a  prec_f  prec_n  recall_a  recall_f  recall_n  f_a     f_f     f_n
1_c       0.6257  0.8106  0.4709  0.3226  0.6981    0.5357    0.4651    0.7502  0.5012  0.3810
2_c       0.6004  0.8094  0.4772  0.2921  0.6523    0.4796    0.5349    0.7224  0.4784  0.3778
baseline  0.5784  0.7549  0.3975  0.2589  0.6806    0.4949    0.2965    0.7158  0.4409  0.2764

Table 10: Task A detailed results of the proposed models compared to the baseline model

3.4 Detailed results for Task B

model     f-avg   prec_a  prec_f  prec_n  recall_a  recall_f  recall_n  f_a     f_f     f_n
1_c       0.6257  0.8106  0.4709  0.3226  0.6981    0.5357    0.4651    0.7502  0.5012  0.3810
2_c       0.6004  0.8094  0.4772  0.2921  0.6523    0.4796    0.5349    0.7224  0.4784  0.3778
baseline  0.6284  0.7845  0.4506  0.3054  0.7507    0.5357    0.2965    0.7672  0.4895  0.3009

Table 11: Task B detailed results of the proposed models compared to the baseline model

4 Conclusion

In a preliminary way, the final results demonstrate that it is possible to obtain an improvement in the predictions by reducing the differences of expression to a predetermined structure.

The system is, however, already more efficient, in terms of training times and final scores, than the Bi-LSTM ensembles that were used successfully up to two years ago (Bennici & Portocarrero, 2018).

The next step is to optimize the model's training in order to ascertain whether the performance gain is maintained, and to what extent. At the same time, the translator can be improved by switching to a sequence-to-sequence system for a meaningful and efficient text representation that will include, among other things, changing the form of every word according to the grammar and the original intention of the writer (Lewis et al., 2019).

References

Barbieri, F., Basile, V., Croce, D., Nissim, M., Novielli, N., & Patti, V. (2016). Overview of the Evalita 2016 SENTIment POLarity Classification Task. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Naples, Italy. CEUR-WS.org.

Basile, V., Croce, D., Di Maro, M., & Passaro, L. (2020). EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020). CEUR-WS.org.

Bennici, M., & Portocarrero, X. S. (2018). Ensemble for aspect-based sentiment analysis. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA'18), Turin, Italy. CEUR-WS.org.

Cignarella, A. T., Lai, M., Bosco, C., Patti, V., & Rosso, P. (2020). SardiStance@EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets. In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). CEUR-WS.org.

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., … Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. https://arxiv.org/abs/1910.13461

Marivate, V., & Sefara, T. (2020). Improving Short Text Classification Through Global Augmentation Methods. In Lecture Notes in Computer Science: Machine Learning and Knowledge Extraction, 385-399. doi:10.1007/978-3-030-57321-8_21

Polignano, M., Basile, P., de Gemmis, M., Semeraro, G., & Basile, V. (2019). AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets. In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019). CEUR-WS.org.