=Paper=
{{Paper
|id=Vol-2738/paper9
|storemode=property
|title=Native Sentiment Analysis Tools vs. Translation Services - Comparing GerVADER and VADER
|pdfUrl=https://ceur-ws.org/Vol-2738/LWDA2020_paper_9.pdf
|volume=Vol-2738
|authors=Karsten Tymann,Louis Steinkamp,Oxana Zhurakovskaya,Carsten Gips
|dblpUrl=https://dblp.org/rec/conf/lwa/TymannSZG20
}}
==Native Sentiment Analysis Tools vs. Translation Services - Comparing GerVADER and VADER==
Karsten Michael Tymann, Louis Steinkamp, Oxana Zhurakovskaya, and Carsten Gips

FH Bielefeld University of Applied Sciences, Minden, Germany

ktymann@fh-bielefeld.de, louis.steinkamp@fh-bielefeld.de, oxana.zhurakovskaya@fh-bielefeld.de, carsten.gips@fh-bielefeld.de

https://www.fh-bielefeld.de

Abstract. VADER is a rule-based sentiment analysis tool for English texts with a focus on social media. GerVADER is a German adaptation of VADER that was developed by following the steps of VADER's development process. VADER showed high F1 scores, especially in the social media domain, whereas the German adaptation achieved much lower results within the same domain, albeit on different test data. In this work we examine whether these differences are language-specific. To this end, we apply an improved version of GerVADER to German texts and compare the results with those of VADER applied to the same texts after automatic translation into English. The benchmarks show that translation combined with VADER achieves up to 5% higher F1 scores in all test cases, which can be explained by the translation tools automatically fixing flawed sentences. However, native-language tools can still be viable, since they save time and costs and do not introduce an additional dependency on a third-party service.

Keywords: VADER · GerVADER · sentiment analysis · translation

===1 Introduction===
Sentiment analysis describes the process of automatically rating texts or sentences with a sentiment value. The sentiment value ranges from negative through neutral to positive and can be expressed either as a numeric value or as a classification into one of these three sentiment categories. Besides machine-learning-based approaches, classification can also be done by rule-based algorithms, which have the advantage that they do not require any training data.
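To make the link between a numeric sentiment value and the three categories concrete, here is a minimal sketch assuming VADER's published conventions: the raw sum of word valences is squashed into [-1, 1] with x / sqrt(x² + α), α = 15, and a compound score of at least 0.05 counts as positive, at most -0.05 as negative, and anything in between as neutral. The function names are hypothetical, and the paper does not state which thresholds its benchmark used.

```python
import math

ALPHA = 15        # VADER's default normalization constant (assumed to apply here)
THRESHOLD = 0.05  # cut-off recommended in VADER's documentation


def normalize(raw_score: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded sum of word valences into the range [-1, 1]."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, norm))


def classify(compound: float, threshold: float = THRESHOLD) -> str:
    """Map a compound score onto one of the three sentiment categories."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for raw in (3.1, 0.0, -2.5):
        score = normalize(raw)
        print(f"raw={raw:+.1f} compound={score:+.4f} label={classify(score)}")
```

No training step appears anywhere: the only inputs are the valences and two constants, which is exactly the advantage (and the limitation) of the rule-based approach.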
However, developing a rule-based tool requires linguistic knowledge and is significantly more dependent on the target language. Hence, one cannot simply transfer the features of one language, e.g. German, to another language, e.g. English. Languages can differ in their grammar and overall sentence structure, which makes it very error-prone to simply transfer the algorithm to another language. How can negation be detected, or what is the meaning of a specific punctuation mark? Are there words that have different meanings in different contexts, and how can those meanings be derived? Machine learning approaches simply train on large amounts of annotated data and extract features with techniques such as word embeddings, whereas for a rule-based approach the developer has to design the detection of word patterns themselves. While a rule-based tool does not need training data, it still needs some form of lexicon of individual words or phrases from which to derive sentiments. These lexicons are usually created by humans and are often rated by several people to obtain an average sentiment value.

Copyright © 2020 by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this work, which was part of a student project at Bielefeld University of Applied Sciences, we first discuss some improvements to our rule-based sentiment analysis tool GerVADER [5]. Second, we examine whether the effort of developing native-language adaptations such as GerVADER is worthwhile at all, or whether similarly good results could be achieved by translating the corpora under examination into English and subsequently analyzing them with VADER [2]. To investigate this, we translated our test corpora into English with third-party tools and benchmarked the translated data with the English tool VADER.
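The crowd-rated lexicons described above can be pictured as follows: each entry's valence is the mean of several human ratings, and the simplest rule-based score just sums the valences of the known words in a text. This is a minimal sketch with made-up German ratings on VADER's -4 (extremely negative) to +4 (extremely positive) scale; the words, ratings, and function names are hypothetical and are not taken from GerVADER's or SentiWS's actual lexicon.

```python
from statistics import mean

# Hypothetical crowd ratings on a -4 (extremely negative) to +4 (extremely positive) scale.
RATINGS = {
    "gut": [2, 2, 3, 2, 1],
    "schlecht": [-2, -3, -2, -2, -3],
    "okay": [1, 0, 1, 0, 1],
}


def build_lexicon(ratings):
    """Average each word's human ratings into a single valence."""
    return {word: mean(values) for word, values in ratings.items()}


def raw_score(text, lexicon):
    """Sum the valences of lexicon words in a lower-cased, whitespace-tokenized text."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return sum(lexicon[t] for t in tokens if t in lexicon)


if __name__ == "__main__":
    lex = build_lexicon(RATINGS)
    print(lex)
    print(raw_score("Das Essen war gut", lex))
```

Everything language-specific lives in the lexicon and the tokenization, which is why a port to another language needs new ratings rather than new training data.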
===2 Background===
====2.1 VADER & GerVADER====
VADER is an abbreviation for "Valence Aware Dictionary and sEntiment Reasoner" and is free to use. It is a rule-based tool for analyzing the sentiment of English sentences. VADER outperformed other lexicon-based approaches as well as machine learning models. Especially in the social media domain, VADER achieved high F1 scores of up to 96% [2].

GerVADER is an adaptation of VADER for the German language. Some steps of VADER's development process were replicated, such as the crowd rating of the lexicon, which for GerVADER is based on the SentiWS lexicon [3], while others, such as the heuristics, were simply transferred to German. GerVADER is also free to use [5].

====2.2 Benchmarking corpora====
SCARE is a corpus of Google Play Store reviews. The reviews are categorized by their star ratings (1 to 5) and split into 11 app categories, with more than 800,000 user reviews in total [4].

The SB10k corpus consists of almost 10,000 German tweets that were manually labeled with one of three sentiment categories: positive, neutral, or negative [1]. Both corpora are used for benchmarking.

====2.3 Translation tools====
For translating our test data, we mostly relied on Google's translation service. One of the datasets was additionally translated with MyMemory.

Google's translation service is based on the Google Neural Machine Translation (GNMT) system. Its hybrid model consists of a Transformer [6] encoder and an RNN decoder. Training follows the sequence-to-sequence neural network paradigm, and the model mixes character-delimited and word-delimited representations [7].

MyMemory¹ is a large collection of translation memories that are collected and provided by humans and organizations. Translations are stored as words or sequences in databases, where they can be matched against the user's input. At the time of writing there are over 4 billion human contributions.
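The matching step of a translation memory such as MyMemory can be illustrated with a toy example: stored source segments are compared with the user's input, and the stored translation of the most similar segment is returned if the similarity reaches a cut-off. This sketch uses Python's `difflib.SequenceMatcher` ratio purely as a stand-in similarity measure; the memory contents, the 0.75 cut-off, and the `lookup` function are hypothetical and do not reflect MyMemory's actual matching algorithm or API.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: German source segments -> English target segments.
MEMORY = {
    "das ist super": "that is great",
    "das essen war schlecht": "the food was bad",
    "ich bin nicht zufrieden": "i am not satisfied",
}


def lookup(segment, memory=MEMORY, cutoff=0.75):
    """Return (translation, similarity) of the closest stored segment,
    or (None, best_similarity) if nothing reaches the cut-off."""
    query = segment.lower().strip()
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, query, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= cutoff:
        return memory[best_source], best_score
    return None, best_score


if __name__ == "__main__":
    print(lookup("Das Essen war schlect"))  # misspelled input still finds a match
    print(lookup("Heute regnet es"))
```

Because matching is approximate, a misspelled input can still retrieve a correctly spelled translation, which is one way a translation step can silently repair flawed input text.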
===3 Process===
This section is divided into two subsections. In the first, we analyze the flaws of GerVADER and describe how we improved the algorithm. In the second, we give insight into the benchmarking itself.

====3.1 Flaws in GerVADER====
The analysis of the initial version of GerVADER [5] revealed three flaws that promised room for improvement. Firstly, the negation detection is inaccurate and cannot be converted to German by simply translating the negation keywords (e.g. 'not'). Secondly, booster words (e.g. 'super', 'very') are sometimes the only sentiment-bearing words in a sentence, but they go unnoticed, since they only serve to boost the valence of the following words. Such sentences therefore receive a neutral rating, although the booster word itself might carry sentiment (e.g. 'super'). Thirdly, misspelled words are not recognized, since words have to be written exactly as in the lexicon. For each problem case we developed test corpora so that we could tell whether our changes improved the overall rating. Details on the changes can be found on GitHub².

====3.2 VADER vs. GerVADER====
Each of the two tools covers its own language. The question arises, however, whether it even makes sense to adapt a sentiment analysis tool to one's native language. To investigate this, we translated our test corpora with Google Translate and MyMemory.

For MyMemory only the SB10k corpus was used, whereas for Google Translate we tested multiple corpora (SB10k, SCARE). Additionally, we constructed a SCARE Balanced corpus consisting of 400 positive, 400 negative, and 400 neutral reviews from each SCARE corpus file. The result is a file balanced across all three sentiments with 13,200 entries (11 files × 1,200 reviews).

¹ MyMemory by translated LABS: https://mymemory.translated.net/

² GitHub - GerVADER: https://github.com/KarstenAMF/GerVADER

Table 1. Benchmarking results for GerVADER and VADER (F1-3: mean of the positive, negative, and neutral F1 scores)
{| class="wikitable"
! Tool !! Corpus !! F1 !! F1-3
|-
| GerVADER2.0 || SB10k || 39.63% || 38.97%
|-
| VADER (Google) || SB10k || 42.91% || 44.20%
|-
| VADER (MyMemory) || SB10k || 42.95% || 44.44%
|-
| GerVADER2.0 || SportNews (SCARE) || 70.26% || 50.98%
|-
| VADER (Google) || SportNews (SCARE) || 71.62% || 51.27%
|-
| GerVADER2.0 || SCARE Balanced (SCARE) || 55.37% || 44.06%
|-
| VADER (Google) || SCARE Balanced (SCARE) || 58.52% || 45.69%
|}

===4 Results===
GerVADER improved in all three areas on our test corpora, both through adjustments to the original algorithm's rules and through new features such as fuzzy matching. As a result, the overall classification score for German texts increased.

Comparing VADER with GerVADER, Table 1 shows that VADER outperforms GerVADER by up to 5%. Even with the improvements, GerVADER is still outmatched. We attribute this to the fact that translation tools do not just translate sentences word by word, but also include features such as fuzzy matching and entity or POS (part-of-speech) tagging, some of which we have also integrated into GerVADER. Google's service is pre-trained with deep learning methods on millions of examples and operates on word and phrase sequences rather than performing a simple word-to-word translation. As a result, spelling errors are corrected and the overall structure of the sentences is adapted to the target language. The translation therefore also acts as a text correction, which may explain VADER's better results in the benchmark.

===5 Conclusion & Future work===
While GerVADER has been improved, the translation comparison raises the question of whether GerVADER serves any purpose. Developing a tool adapted to a native language is challenging and carries considerable potential for introducing new flaws. It requires linguistic knowledge of the target language, but this allows language-specific characteristics to be addressed more appropriately than with a translation.
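Two of the GerVADER fixes from Section 3.1, which the results credit with part of the improvement, can be sketched as follows: fuzzy matching, so that a misspelled word still finds its lexicon entry, and a booster fallback, so that a booster such as 'super' contributes its own sentiment when no lexicon word follows it. This is a hypothetical illustration, not GerVADER's code: the lexicon, the booster table and its fallback valences, and the 0.8 cut-off are invented for the example, `difflib.get_close_matches` stands in for whatever fuzzy matcher GerVADER actually uses, and 0.293 is borrowed from VADER's booster increment.

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon (valences on VADER's -4..+4 scale).
LEXICON = {"gut": 1.9, "schlecht": -2.5, "langweilig": -1.3}
# Hypothetical boosters: (increment for the next sentiment word, own fallback valence).
BOOSTERS = {"super": (0.293, 2.0), "sehr": (0.293, 0.0)}


def find_word(token, lexicon=LEXICON, cutoff=0.8):
    """Exact lexicon lookup first; otherwise accept the closest spelling above the cut-off."""
    if token in lexicon:
        return token
    matches = get_close_matches(token, lexicon.keys(), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def score_sentence(text):
    """Sum boosted lexicon valences; fall back to booster valences if no lexicon word occurs."""
    total, boost, fallback, found_sentiment = 0.0, 0.0, 0.0, False
    for raw in text.lower().split():
        token = raw.strip(".,!?")
        if not token:
            continue
        if token in BOOSTERS:
            increment, own_valence = BOOSTERS[token]
            boost += increment
            fallback += own_valence
            continue
        word = find_word(token)
        if word is None:
            continue
        valence = LEXICON[word]
        # The boost pushes the valence further away from zero in its own direction.
        valence += boost if valence > 0 else -boost
        total += valence
        found_sentiment = True
        boost = 0.0
    # Booster fallback: without any lexicon word, the boosters' own sentiment counts.
    return total if found_sentiment else fallback


if __name__ == "__main__":
    print(score_sentence("Super!"))                 # booster alone is no longer neutral
    print(score_sentence("Das war sehr schlect."))  # misspelling matched to 'schlecht'
```

Applying the fallback only when no lexicon word is found keeps the original booster behaviour intact for sentences like "super gut", where 'super' should only amplify 'gut'.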
Also, the translation service is an additional dependency, which can be problematic in terms of cost, time, and complexity. With the current version, it might not be worth trading GerVADER for VADER for an F1 score improvement of at most 5%. However, translation tools handle the linguistic features for the developer and are therefore an interesting research topic for VADER and similar tools that are available in several languages.

===References===
[1] Mark Cieliebak, Jan Deriu, Dominic Egger, and Fatih Uzdilli. "A Twitter Corpus and Benchmark Resources for German Sentiment Analysis." In: Proceedings of the 4th International Workshop on Natural Language Processing for Social Media (SocialNLP 2017), Valencia, Spain, 2017. doi: 10.18653/v1/W17-1106.

[2] C. Hutto and Eric Gilbert. "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text." 2014. url: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8109.

[3] R. Remus, U. Quasthoff, and G. Heyer. "SentiWS - a Publicly Available German-language Resource for Sentiment Analysis." In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10), pp. 1168-1171, 2010.

[4] Mario Sänger, Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger. "SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016. isbn: 978-2-9517408-9-1. url: https://www.aclweb.org/anthology/L16-1178/.

[5] Karsten Michael Tymann, Matthias Lutz, Patrick Palsbröker, and Carsten Gips. "GerVADER - A German adaptation of the VADER sentiment analysis tool for social media texts." In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA 2019), Berlin, Germany, September 30 - October 2, 2019. url: http://ceur-ws.org/Vol-2454/paper_14.pdf.
[6] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. "Attention Is All You Need." In: CoRR abs/1706.03762 (2017). arXiv: 1706.03762. url: http://arxiv.org/abs/1706.03762.

[7] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." In: CoRR abs/1609.08144 (2016). arXiv: 1609.08144. url: http://arxiv.org/abs/1609.08144.