<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GerVADER - A German adaptation of the VADER sentiment analysis tool for social media texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karsten Michael Tymann</string-name>
          <email>ktymann@fh-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Lutz</string-name>
          <email>matthias.lutz@fh-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Palsbroker</string-name>
          <email>patrick.palsbroeker@fh-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Gips</string-name>
          <email>carsten.gips@fh-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FH Bielefeld University of Applied Sciences</institution>
          ,
          <addr-line>Minden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>For the English language, sentiment analysis tools are fairly popular. One of them is called VADER [1], which offers a rather simple process for sentiment classification. Due to its lexicon-based approach with a design focus on social media texts, no additional training data is required. In this paper the process of creating VADER is applied to build a German adaptation called GerVADER. The paper presents the concept of VADER and how a German version can be built within reasonable time. GerVADER uses SentiWS as a starting point for the lexicon, combines it with language-independent parts of the VADER lexicon, and copies the process of having users rate the words' intensity and polarity. The next step comprises the algorithmic changes needed due to the natural differences between German and English. GerVADER is then compared to the classification results on the SB10k [2] corpus, which contains more than 9000 human-labeled tweets. Finally, GerVADER is tested on parts of the SCARE [3] dataset, which contains reviews for mobile apps. The results show that GerVADER still needs additional work to increase its classification accuracy, but it promises better results considering how well the original performed.</p>
      </abstract>
      <kwd-group>
<kwd>VADER</kwd>
        <kwd>German sentiment analysis</kwd>
        <kwd>SB10k</kwd>
        <kwd>SCARE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Sentiment analysis is often based on machine learning, which requires large amounts of
data and sometimes even additional human work, e.g. for labeling the data
beforehand. For the German language, collecting reasonable amounts of data for
machine learning is quite difficult, since not much work has been done in the
field yet. This was the motivation of this work to build an own corpus and label a
reasonable amount of it for training purposes. Especially for the domain of social
media and microblogging, the internet lacks up-to-date German corpora to
bootstrap one's sentiment analysis tool. (Copyright © 2019 for this paper by its
authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).) While there exist corpora like SB10k or
DAI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], those are not available to the public and can be difficult to obtain in
their entirety, given how companies like Twitter handle their data policy. Even if
one obtains the corpora, it still requires a lot of research and additional work to
get a running sentiment analysis tool for the German language.
      </p>
      <p>
        Another crucial factor for a sentiment analysis tool is the lexicon. For the
German language there are multiple lexicons free to use. SentiWS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for example,
is a German lexicon with polarity and intensity ratings. For every word, multiple
grammatical forms are listed, e.g. the plural form of the word.
      </p>
      <p>
        GermanPolarityClues (GPC) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is another resource for a German sentiment dictionary;
however, the project seems abandoned and superseded by SentiWS. Nevertheless,
it is stated that in tests GPC reached an F1 score of up to 0.88.
      </p>
      <p>Neither of the mentioned lexicons is adapted to the social media domain,
and both lack multiple linguistic features that are common in that domain.</p>
      <p>This paper will show how GerVADER builds upon the good results of VADER
with its own lexicon. VADER reached great classification accuracy on
microblogging platforms (up to F1 = 0.96) and in some cases scored better than human
raters. VADER is free to use, requires no knowledge of machine
learning, and can be easily executed and extended with Python or one of its
adaptations in other programming languages.</p>
      <p>It will be shown how GerVADER's lexicon was built from the SentiWS
lexicon and parts of VADER's lexicon. In the next step the lexicon was rated
by a crowd and then cleaned of ambiguous data. Afterwards the grammatical
and lexical heuristics used in VADER (e.g. negation words) were manually
adjusted to the German language. Then GerVADER was tested on parts
of the SB10k corpus as well as on a subset of app reviews from the SCARE corpus.
GerVADER scores mediocre ratings (F1 = 0.36-0.70) depending on the test
corpus and allows future tools to compete against it on the mentioned datasets. The
scores hint that GerVADER still has unexploited potential and that the
German language might need additional grammatical rules for an improved VADER
adaptation. Especially the correct classification of negative sentiments is lacking,
as is further testing of GerVADER with different corpora and of its usefulness
in other domains.</p>
      <p>
        GerVADER was developed as part of a student project and is free to
download and to use, like VADER [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>In this section the original VADER tool for the English language will be
described. Furthermore, the SentiWS lexicon, which was used as the basis for
GerVADER's lexicon, as well as the corpora SB10k and SCARE, which were used
for evaluating GerVADER, will be introduced.</p>
      <sec id="sec-2-1">
        <title>VADER</title>
        <p>VADER is short for "Valence Aware Dictionary and sEntiment Reasoner" and
is available under the MIT License. The tool was published in 2014 and is
especially focused on social media texts. It uses a lexicon-driven approach as well as
additional heuristics for rating the input. Since VADER is not a machine
learning approach, it offers consistent ratings and requires no training data. VADER
achieved some remarkable scores for multiple domains such as tweets, movie or
product reviews. The development of VADER can be split into seven steps:</p>
      </sec>
      <sec id="sec-2-2">
        <title>Gather lexical features of established sentiment lexicons:</title>
        <p>The creators of VADER first researched existing sentiment lexicons like LIWC,
ANEW and GI. They took parts (words) of them and integrated them into their
own VADER lexicon.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Gather lexical features characteristic for microblogging domains:</title>
        <p>Texts on social media and other microblogging platforms have their own unique
characteristics. The creators gathered emoticons, domain-specific words and
other abbreviations from these platforms and integrated them into the lexicon.</p>
        <p>
          Rate lexical feature candidates: In this step the creators gathered a
crowd in order to rate the words individually for intensity and polarity.
Every rater received batches of 25 words, into which five words were intentionally
integrated that function as a gold-standard validation. If a user rates three or
more of the five gold-standard validation words wrong, the whole batch is
discarded. The gold-standard words were set manually and do not seem to
be available for download. In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] it is stated that for good results the
participants received financial compensation. Also, the raters were carefully
selected: every rater had to pass a reading comprehension test and took part
in an online sentiment training. At the end of the rating process VADER had
more than 9,000 words rated with 10 individual ratings each on a scale from very
negative (-4) through neutral (0) to very positive (+4).
        </p>
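<p>The gold-standard check described above can be sketched in a few lines. The word list, the deviation threshold and all names are illustrative assumptions; only the "discard on three or more wrong gold words" rule is taken from the paper.</p>

```python
# Illustrative gold-standard batch check (words and thresholds are invented;
# only the "discard on 3+ wrong gold words" rule is taken from the paper).
GOLD_RATINGS = {"hell": -3.0, "death": -3.0, "great": 3.0, "awful": -2.5, "love": 3.0}

def batch_is_discarded(user_ratings, max_deviation=1.5):
    """Discard a 25-word batch if three or more of the five embedded
    gold-standard words deviate strongly from their reference rating."""
    misses = 0
    for word, reference in GOLD_RATINGS.items():
        deviation = abs(user_ratings.get(word, 0.0) - reference)
        if deviation > max_deviation:
            misses += 1
    return misses >= 3  # the user is not notified; the ratings are not saved
```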
        <p>Filtering: In this step the lexicon was cleaned of inconclusive words.
These words were either rated neutral overall by the crowd, or the crowd was
divided over the polarity and intensity of the word, meaning that the standard
deviation of the word's ratings was 2.5 or higher, resulting in a value that cannot
be trusted for a sentiment classification. After the filtering, the lexicon contained
more than 7,500 words.</p>
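<p>The filtering step can be illustrated with toy crowd ratings; the words and numbers here are invented, only the 2.5 standard-deviation cutoff comes from the paper.</p>

```python
from statistics import mean, stdev

# Illustrative filtering of inconclusive words (toy crowd ratings).
crowd_ratings = {
    "super": [3, 4, 3, 3, 4],    # clearly positive: kept
    "Bank":  [0, 0, 1, -1, 0],   # neutral on average: removed
    "krass": [4, -4, 3, -3, 4],  # raters divided, high deviation: removed
}

def build_lexicon(ratings, max_stdev=2.5):
    lexicon = {}
    for word, scores in ratings.items():
        if mean(scores) == 0:
            continue  # inconclusive: rated neutral in total
        if stdev(scores) >= max_stdev:
            continue  # inconclusive: crowd divided over the word
        lexicon[word] = mean(scores)
    return lexicon
```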
        <p>Building human heuristics: VADER contains five heuristics that can shift
or boost the sentiment of a sentence. These heuristics include punctuation marks,
capitalization (words in all caps), booster words (negative and positive, e.g.
words like "amazingly"), contrastive conjunctions and words that negate a
sentence (e.g. "not", "won't"). When a sentence is rated, these keywords are
identified and can shift or impact the rating.</p>
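<p>Two of these heuristics can be sketched in a few lines. The lexicon entries below are invented, and the boost constants are only loosely based on VADER's published source; this is an illustration of the idea, not a reimplementation.</p>

```python
# Toy sketch of the capitalization and punctuation heuristics (invented lexicon;
# boost constants loosely based on VADER's source code).
LEXICON = {"gut": 1.9, "schlecht": -2.2}
CAPS_BOOST = 0.733         # extra emphasis for ALL-CAPS words
EXCLAMATION_BOOST = 0.292  # added per "!"; VADER caps this at four

def score(sentence):
    total = 0.0
    for token in sentence.rstrip("!?.").split():
        valence = LEXICON.get(token.lower(), 0.0)
        if valence and token.isupper():
            # an all-caps word is boosted in the direction of its valence
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        total += valence
    if total:
        exclamations = min(sentence.count("!"), 4)
        total += exclamations * EXCLAMATION_BOOST * (1 if total > 0 else -1)
    return round(total, 3)
```

<p>Here "Das ist GUT!" scores higher than "Das ist gut" because both the capitalization and the exclamation mark amplify the already positive sentiment.</p>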
        <p>Evaluate heuristics: In order to evaluate how much the gathered heuristics
can influence a sentiment, the authors conducted a controlled experiment
with 30 tweets that were manually modified into different versions which
include the features explained in subsection 2.1. Those tweets were mixed
into other tweets and were again rated by a crowd. As a result, the authors
were able to analyze how much a lexical or grammatical feature can impact the
sentiment of a tweet. The findings were integrated into VADER's heuristics.</p>
        <p>Evaluation and results: In the last step VADER was tested in four different
domains against other established lexicon-based approaches. The domains
included social media texts, movie and product reviews as well as newspaper
articles. The results have shown that VADER outperforms every other lexicon in
every domain and is even able to outperform human raters in the domain of
social media texts. Against machine learning models (NB, ME, SVM) VADER
was ranked first in three of the four domains. Only in the movie domain were the
Naive Bayes and Maximum Entropy methods (both trained on movie corpora)
able to reach better results than VADER (F1 = 0.75 vs. F1 = 0.61). In
summary, VADER achieves good scores in the social media domain. Another
benefit of the approach is that VADER does not require any training and rates
consistently. On the downside, VADER does not detect irony, and longer,
more complex sentences might be rated wrongly, since the heuristics only apply
to small ranges of words. Additionally, VADER does not detect phrases but
rates every single word individually, which can lead to wrong conclusions.</p>
      </sec>
      <sec id="sec-2-4">
        <title>SentiWS</title>
        <p>
          SentiWS is a German lexicon for sentiment analysis. It offers 1,644 positive and
1,827 negative words with a polarity and intensity range from -1 to 1. All words
are given in their base form and additionally in variations like the plural form for
nouns or tenses for verbs (see Table 1). Therefore, while only the base form is
rated, one can easily transfer the rating to its grammatical variations. SentiWS
was last updated at the end of 2018 and is therefore an up-to-date resource
for a German sentiment lexicon.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>SB10k</title>
        <p>
          SB10k is a German Twitter corpus with almost 10,000 tweets. It consists of
human-labeled tweets that can be used for machine learning algorithms for
sentiment analysis. In their paper the authors compared how two different classifiers
(SVM and CNN) performed on the SB10k corpus as well as on two
additional German corpora (DAI and MGS [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). Their results showed that the best
result for the SB10k corpus was an F1 score of 0.65. The corpus is freely
available for download (as a collection of the relevant Twitter IDs) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. SB10k's
corpus and classification results will serve as a benchmark in section 4.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>SCARE</title>
        <p>SCARE is a corpus of around 800,000 German app reviews from different
categories of the Google Play Store. Each review has an id, a star rating from
1 to 5, a review headline and a review text. Some categories and reviews will
be used for evaluating GerVADER's performance in the app review domain.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Process</title>
      <p>The SCARE corpus of Google Play Store app reviews (with star ratings from
1 to 5) and the SB10k Twitter corpus will be used for evaluating GerVADER's
performance. While VADER started from scratch, for GerVADER some steps
could be copied while other steps had to be replicated accordingly for the German
language. Just like VADER, the development of GerVADER starts with the
creation of the lexicon, followed by the ratings of the crowd, filtering of
ambiguous words, an adaptation of the booster and negation words (including some
smaller changes to the source code) and lastly the classification tests.</p>
      <sec id="sec-3-1">
        <title>Constructing the initial lexicon</title>
        <p>The initial lexicon is based on the SentiWS dataset. Only the base forms of the
words have been taken into consideration, resulting in a total of 3,471 words
(1,644 positive, 1,827 negative). Additionally, unique German terms that are
commonly used in slang expressions and on social media platforms have been
added to the lexicon (see Fig. 1). Note, however, that only single words have been
taken into account, since phrases cannot be part of the lexicon. Two sources were
considered for the additional words.</p>
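<p>As an illustration, reading only the base forms from SentiWS-style entries might look as follows. The two sample lines follow the tab-separated format of the SentiWS distribution ("Lemma|POS", weight, comma-separated inflections); the weights here are made up.</p>

```python
# Sketch: extract only the base forms from SentiWS-style lines
# (format assumed from the SentiWS distribution; weights here are made up).
sample_lines = [
    "Aufschwung|NN\t0.34\tAufschwunges,Aufschwungs",
    "scheitern|VVINF\t-0.56\tscheitert,scheiterte,gescheitert",
]

def base_forms(lines):
    lexicon = {}
    for line in lines:
        head, weight = line.split("\t")[:2]  # ignore the inflections for now
        lemma = head.split("|")[0]           # strip the POS tag
        lexicon[lemma] = float(weight)
    return lexicon
```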
        <p>
          One of them is Langenscheidt, a German publisher of monolingual and
bilingual dictionaries. Langenscheidt [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is in Germany
also known for its annual ranking of teenager slang words ("Jugendwörter des
Jahres" [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). Every year Langenscheidt collects words that are commonly and
exclusively used by teenagers and young adults. This contest results in the
publication of the Top 10 "Jugendwörter des Jahres". The Top 3 words of the years
2008-2017 have been added to the lexicon, as well as the Top 10 words of the year
2018 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Only single words have been taken into consideration.
        </p>
        <p>
          The second source is a website that collects German slang words [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Several
single words have been manually selected and were added to the lexicon.
        </p>
        <p>Both sources together contributed 80 words to the initial lexicon,
resulting in an overall size of 3,546 words. All words are
contained in the same lexicon without any initial polarity rating. In the next step
a crowd contributed their individual ratings to the lexicon (see Fig. 1).</p>
        <p>For the validation lexicon that is used for validating a user-rated batch, a
manually written lexicon was created. It consists of more than 100 words: the
most positive and most negative words in VADER (e.g. words like "death" or
"hell") were collected and manually translated by the author. The gold-standard
words did not block any lexicon words, so that a word like "Tod" (i.e. "death")
could still be rated by the users.</p>
        <p>For GerVADER the crowd consisted of fellow students and friends. The crowd
was introduced to the project but did not have to pass any tests or trainings.
The list of participants was not shared, so that no rater knew who the other
raters were, in order to prevent any communication within the crowd.</p>
        <p>For the rating platform a custom-made application has been developed. The
raters were given access to the website, with everyone receiving a username
and a password. The site had two main sections.</p>
        <p>The first section was a tutorial section where the functionality of the site was
briefly explained.</p>
        <p>The second section was the component for rating the words, in which the user
was presented with a randomly generated batch of 25 words. The server
kept track of every rater's progress and returned within each batch 20 words that
the rater had not yet rated plus 5 gold-standard words. The user could then
rate the words on the -4 to +4 scale. After rating a batch, the user
could submit it. The server then checked whether the ratings were valid
and reviewed whether the gold-standard words had been rated correctly. If three
or more of the gold-standard words were rated significantly differently, the
batch was dropped without notifying the user. The ratings were then not
saved, so that the user was still able to rate the words in another batch.</p>
        <p>Since no participant was financially compensated, motivation was a huge
factor. In order to tackle this problem, each rater was linked with an animal
image. On the main page the number of already rated words was shown, as
well as the animal pictures of raters who had rated one or more batches on
that day. Thus, feelings of competition and cooperation were invoked.
This update was made one week after the release and increased the participation
rate of the crowd significantly. Furthermore, a graphic showing the number of
words rated by each rater was created periodically and sent to every rater via
email. Both steps were necessary since participation was overall not enough
to create a lexicon with every word having 10 individual ratings. Within one
month all words received roughly 7 individual ratings.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Generating the final lexicon</title>
        <p>Similar to VADER, words with a neutral overall rating were filtered out for the
final lexicon. Also, words with a standard deviation of 2.5 or higher were
removed. For the finalization of the lexicon there were two additional steps.</p>
        <p>The first step involved the VADER lexicon. Since most emoticons and English
abbreviations are also common in German social media texts, more than 800
entries of this type have been added to GerVADER with their original intensity.
Therefore, the users did not have to rate common terms like "lol" or ":)" (see Fig. 1).</p>
        <p>The second step took into consideration that VADER does not do
any kind of pre-processing, so the words from the lexicon are directly compared
with the words of the sentence being analyzed. However, SentiWS offers multiple
grammatical forms for every word. Therefore, those grammatical forms have been
added to the lexicon, meaning that every single one of them represents its own
entry. They received the same rating as the base form. With the
expansion of the lexicon by grammatical forms, the size of the lexicon increased
to more than 34,000 words.</p>
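<p>This expansion can be sketched as follows. The entries and the helper name are illustrative; only the rule that every grammatical form becomes its own entry with the base form's rating is taken from the text.</p>

```python
# Sketch of the lexicon expansion by grammatical forms (toy entries).
base_lexicon = {
    "Aufschwung": (0.34, ["Aufschwunges", "Aufschwungs"]),
    "scheitern":  (-0.56, ["scheitert", "scheiterte", "gescheitert"]),
}

def expand(base):
    """Give every grammatical form its own entry with the base form's rating."""
    lexicon = {}
    for lemma, (rating, forms) in base.items():
        lexicon[lemma] = rating
        for form in forms:
            lexicon[form] = rating
    return lexicon
```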
      </sec>
      <sec id="sec-3-3">
        <title>Adapting the VADER heuristics</title>
        <p>The heuristics are part of the source code and are in some parts not exclusive to
the English language. Characteristics like the capitalization of words or punctuation
marks convey the same meaning in both German and English.</p>
        <p>Only 3 of the 5 heuristics had to be adapted for the German language. These
heuristics revolve around booster, negation and contrastive conjunction words. For
GerVADER the English words have simply been translated to German.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Evaluation and additional steps</title>
        <p>Before the evaluation, another adjustment was made to the algorithm. When
comparing the words of the text to the lexicon words, the inspected word is
transformed to all lower case. However, since the lexicon contains words that are
identical apart from their case and differ only in their POS tag
(e.g. "Anstieg", noun, and "anstieg", verb), the lowercase transformation had to
be adjusted. In the lexicon, such words can have a different intensity depending on
whether they are, for example, a noun (capitalized in German) or a verb. However,
users usually do not care about the correct capitalization of words. Thus, the
following adjustment was made:</p>
        <sec id="sec-3-4-1">
          <title>1. Check if the currently inspected word can be found in the lexicon</title>
          <p>2. If not, transform the word to all lower case and recheck the lexicon.</p>
          <p>3. If not, capitalize only the first letter of the word and recheck the lexicon.</p>
          <p>If the word has not been found in the lexicon in any of these steps, the next
word is inspected. This adjustment allows matching more words without relying
on the user to capitalize the word correctly. Apart from this, no further
adjustments have been made to the VADER algorithm.</p>
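<p>The three-step lookup above can be sketched as follows; the toy lexicon and the function name are illustrative.</p>

```python
# Sketch of the three-step case handling during lexicon lookup (toy lexicon).
LEXICON = {"Anstieg": 0.3, "anstieg": 0.1, "toll": 2.1}

def lookup(word):
    """Try 1. the exact word, 2. all lower case, 3. first letter capitalized."""
    for candidate in (word, word.lower(), word.capitalize()):
        if candidate in LEXICON:
            return LEXICON[candidate]
    return None  # not found in any form: move on to the next word
```

<p>Note that "ANSTIEG" matches the verb entry in step 2 before the noun entry would be tried in step 3, reflecting the fixed order of the three checks.</p>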
          <p>
            For the performance evaluation of GerVADER, the SB10k corpus and parts of the
SCARE corpus have been used. Since tweets can be deleted, only 7,000+ tweets
(about 70%) of the original SB10k corpus could be collected. Therefore,
comparing the results to the original results of the authors is not fully reliable.
Additionally, in [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] it is left open which 10% of the corpus the authors
tested on. To overcome this problem, GerVADER ships with the collected corpus
as well as another dataset containing only 10% of the SB10k corpus. This
will allow comparisons in future work.
          </p>
          <p>For the SCARE corpus a selection of review categories was made. Since
the classification labels of the user reviews are given as stars (1-5), the star
ratings first have to be translated to positive, negative or neutral before the data
can be classified with GerVADER. To do this as simply as possible, 1- and 2-star
ratings are interpreted as negative, 3-star ratings as neutral, and 4- and 5-star
ratings as positive. The headline is merged into the comment: if the headline has
no punctuation mark at its end, a dot is put between the headline and the review
text. The idea is to prevent words from the headline from influencing the
sentiment of the review comment. In some cases this might be a problem, if the user
started a sentence in the headline and continued it in the text, but we assume
that these cases are negligibly rare. Only two app review categories will be
tested.</p>
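<p>The described pre-processing can be sketched in a few lines; the function names are assumptions, while the star mapping and the dot rule come from the text above.</p>

```python
# Sketch of the SCARE pre-processing: star mapping plus headline merging.
def star_to_label(stars):
    if stars >= 4:
        return "positive"   # 4- and 5-star reviews
    if stars == 3:
        return "neutral"
    return "negative"       # 1- and 2-star reviews

def merge_review(headline, text):
    """Append a dot to the headline if needed, then join headline and text."""
    if headline and headline[-1] not in ".!?":
        headline += "."
    return (headline + " " + text).strip()
```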
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>For testing the performance of GerVADER, the obtained SB10k corpus, 10%
of the SB10k corpus, as well as parts of the SCARE corpus were taken into
consideration (see Table 2). For every test the precision, recall and F1 score
for every label (pos, neg, neu) are measured. The total F1 score, which is the
deciding factor in how well GerVADER and the other classifiers perform, is
calculated only from the positive and negative F1 scores. Additionally, the mean of
F1pos, F1neg and F1neu is calculated and called F1_3.</p>
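<p>Assuming the total score is the unweighted mean of the positive and negative F1 (which reproduces the reported 39.42% from 43.54% and 35.30% below), the summary scores can be computed as:</p>

```python
# Per-label F1 and the two summary scores used in the evaluation
# (assuming unweighted means, which match the reported numbers).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

def total_f1(f1_pos, f1_neg):
    return (f1_pos + f1_neg) / 2           # neutral is excluded

def f1_3(f1_pos, f1_neg, f1_neu):
    return (f1_pos + f1_neg + f1_neu) / 3  # all three labels
```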
      <p>Starting with the SB10k corpus: the corpus is already human-labeled and
requires no additional work. The results for 7,476 tweets (pos, neg, neu) show an
overall F1 score of 39.42% (see Table 2, No. 1). The positive F1 score is 43.54%,
while the negative F1 score is 35.30%. Overall, the results show a good recall for
positive tweets and a good precision for neutral tweets; however, the numbers
for the other criteria and labels are below 50% (see Fig. 2, No. 1). These
low numbers decrease the F1 scores significantly. The numbers show that while
most positive tweets are classified correctly, the numbers for the negative and
neutral tweets are less accurate. Especially among the neutral tweets, many
have been wrongly classified as positive. The negative tweets, however, have been
distributed almost evenly over all three labels, which hints at a problem with
detecting negation words. GerVADER achieves better results when the neutral
statements are filtered out beforehand. In these test cases the F1 score is circa 64%
(see Table 2, No. 2+4). Note, however, that GerVADER can still rate
tweets as neutral; this classification option remains enabled. Only the neutral
statements have been filtered out of the test data; the process itself has been kept the same.</p>
      <p>Comparing the results with the results of the SB10k authors, one can see
that GerVADER does not outperform any classifier (see Table 2, No. 5-8). We
have to take into consideration, however, that the original test corpus is not
publicly available, so the reported results cannot be verified. Nonetheless, classifiers
that have not been trained on the SB10k corpus still reach results that are 20% better.</p>
      <p>In another test GerVADER has been evaluated on some parts of the SCARE
corpus (see Table 2, No. 9-14). Tests have been run with reviews referring to news
and sport news apps. Since the reviews come with a comment as well as with
a star rating, the structure of the reviews had to be altered: 1- and 2-star reviews are
interpreted as negative, 3-star reviews as neutral and 4- and 5-star reviews as positive.</p>
      <p>The results for the sport news apps show that GerVADER classifies 70%
correctly. Especially the F1 score for positive labels is very good at more than
85% (see Table 2, No. 9). If we sort out the neutral reviews before classifying,
meaning that we only classify 1-, 2-, 4- or 5-star ratings, i.e. either negatively or
positively labeled reviews, the F1 score rises to 72% (see Table 2, No. 10). Again,
GerVADER still rates some reviews as neutral. If we now assume that neutral reviews
are always positive, since one can argue that a user would rather review an
app that he likes than one he dislikes, we can merge the neutral numbers into
the positive ratings in both labeling cases. We then achieve an F1 score of 74.25%
with an F1pos score of 90.72% (see Table 2, No. 11).</p>
      <p>For the news apps, almost equal results were achieved with the same three
classification tests (see Table 2, No. 12-14).</p>
      <p>In summary, one can see that the classification results differ from domain to
domain. Compared with the original VADER classification, the F1 scores are
significantly lower, for multiple reasons that will be discussed in the next chapter.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and future work</title>
      <p>Although GerVADER has been tested in domains in which VADER achieved its
best results, the tests show rather bad scores in some regards. The reason why the
overall F1 scores are that low is the classification of neutral and negative texts.</p>
      <p>Concerning the negative texts, one has to ask why so much of the data is
labeled as positive. Actual negative texts are almost as often rated positive as
they are rated negative. Additionally, the neutral prediction is in many
tests almost as high as both the negative and the positive classification. Therefore
the F1neg score for negative texts is in all tests much lower than the F1pos score.
One reason is that in German sentences the negation word
"nicht" often occurs at the end of the sentence. It is very common for the
negation to come at the end of the sentence (after the verb), while in English the
negation word is always paired with the verb(s).</p>
      <p>For example, GerVADER does not detect the difference between the following
sentences:</p>
      <sec id="sec-5-1">
        <title>1. Ich mag das. (meaning: "I like it")</title>
        <p>2. Ich mag das nicht. (meaning: "I don't like it"; literally "I like it not")</p>
        <p>In both cases GerVADER detects the word "mag" as a positive word and
therefore calculates the overall sentiment as positive. While the negation word
"nicht" is detected, the sentiment is not shifted. The reason is that only the
sentiment ratings after the negation word are influenced. So, if the negation
word appears at the end of the sentence, it has no impact on the overall sentiment.</p>
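<p>The flaw can be reproduced with a toy forward-only negation scope; the lexicon, the window size and all names are invented for the illustration and are not GerVADER's actual code.</p>

```python
# Toy forward-only negation scope, reproducing the flaw described above.
LEXICON = {"mag": 1.7}
NEGATIONS = {"nicht", "kein"}

def score(sentence):
    words = sentence.rstrip(".!?").lower().split()
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        # only a negation BEFORE the sentiment word can flip its valence
        preceding = set(words[max(0, i - 3):i])
        if valence and preceding.intersection(NEGATIONS):
            valence = -valence
        total += valence
    return total
```

<p>Both "Ich mag das." and "Ich mag das nicht." receive the same positive score, because the sentence-final "nicht" comes after "mag" and is never applied; a German-aware variant would also have to look at a window after the sentiment word.</p>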
        <p>This is a flaw in the current state of GerVADER that needs to be addressed.
Such an adjustment, however, requires an overhaul of the algorithm in order to
make it more suitable for the German language. Because English does not have such
negated sentences, such logic is not implemented in VADER. If GerVADER
detected negation words at the end of sentences, the classification scores
would most likely increase for negated sentences, and with them the overall
F1 score. Therefore, one can conclude that the biggest flaw of GerVADER in its
current state is the detection of negative sentences.</p>
        <p>Moreover, VADER, and therefore GerVADER in its current state, has
problems with the negation of longer sentences, even if the negation word is at the
beginning of the sentence. For example, the following sentence is wrongly rated:
1. Ich finde nicht, dass diese Menschen wirklich freundlich sind. (rated
positive, should be negative)</p>
        <p>While the negation word "nicht" is recognized, the only following word
with a sentiment rating is the word "freundlich" (rated positive). But
the occurrence of this word is too far away from the negation word for its
sentiment to be shifted. So, for longer sentences the words following the negation
word are not shifted if there are too many words in between. In combination with
the already mentioned problem with negation words, this gives more insight into why
many negative texts are falsely rated positive.</p>
        <p>Other than that, GerVADER also needs some further improvements. The
booster and negation words are only translated from the English original; a real
adaptation to the language is missing. Additionally, no phrases are
covered. Therefore, phrases like "Alles in Butter" are not detected: if you split up
the phrase, none of the words has a sentiment rating, but read as a phrase it
has a positive meaning ("everything is fine").</p>
        <p>Furthermore, the lexicon might need more words. Especially German slang
words and words that are commonly used in social media texts might be missing.
Moreover, the rating process for the current lexicon could be continued, since a
larger crowd promises more trustworthy results.</p>
        <p>Lastly, a wider benchmark might show better how useful GerVADER in its
present state really is.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        This paper has shown that GerVADER has the potential to become another useful
tool for sentiment analysis of the German language. While the results
compared to VADER are somewhat underwhelming, GerVADER offers many areas
for improvement. Given how little the algorithm has actually been altered, the
results for the app reviews show that even in its current form it already
achieves remarkable results. Just like other lexicon-based approaches, the lexicon can
be easily expanded. Since no machine learning is needed, GerVADER can
basically be used plug-and-play and produces results much faster, without needing
any sort of training. This also results in consistent ratings, meaning that a text
will not be rated differently over time, whereas in machine learning approaches the same
text might be classified differently depending on the training data. If the
proposed adjustments are made, GerVADER might become one of the most viable tools
for classifying German sentiments. In any case, its speed and consistency are two of its
biggest strengths, and the lexicon can be useful for other research. Given how
well the role model performs, there is no reason to doubt that GerVADER can
achieve comparable results in the future. GerVADER is publicly available and
will be worked on further [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hutto</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          :
          <article-title>VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text</article-title>
          .
          <source>Eighth International Conference on Weblogs and Social Media (ICWSM-14)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deriu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Uzdilli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Twitter Corpus and Benchmark Resources for German Sentiment Analysis</article-title>
          .
          <source>Social NLP @ EACL</source>
          . https://doi.org/10.18653/v1/W17-1106
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sänger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger:
          <article-title>SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)</source>
          , Portorož, Slovenia, May
          <year>2016</year>
          . European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Remus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Quasthoff</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>G.</given-names>
            <surname>Heyer</surname>
          </string-name>
          :
          <article-title>SentiWS - a Publicly Available German-language Resource for Sentiment Analysis</article-title>
          .
          <source>In Proceedings of the 7th International Language Resources and Evaluation (LREC'10)</source>
          , pp.
          <fpage>1168</fpage>
          -
          <lpage>1171</lpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Ulli</given-names>
            <surname>Waltinger</surname>
          </string-name>
          :
          <article-title>Sentiment Analysis Reloaded: A Comparative Study On Sentiment Polarity Identification Combining Machine Learning And Subjectivity Features</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST '10)</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bütow</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lommatzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ploch</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Creation of a German Corpus for Internet News Sentiment Analysis</article-title>
          .
          <source>Project report</source>
          , Berlin Institute of Technology, AOT (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Igor</given-names>
            <surname>Mozetič</surname>
          </string-name>
          , Miha Grčar, and Jasmina Smailović:
          <article-title>Multilingual Twitter Sentiment Classification: The Role of Human Annotators</article-title>
          .
          <source>PloS one</source>
          ,
          <volume>11</volume>
          (
          <issue>5</issue>
          ):
          <fpage>e0155036</fpage>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Langenscheidt. https://www.langenscheidt.com/ Last accessed 22 Jan 2019
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Wikipedia, Jugendwort des Jahres. https://de.wikipedia.org/wiki/Jugendwort_des_Jahres_(Deutschland) Last accessed 22 Jan 2019
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Langenscheidt, Jugendwort des Jahres
          <year>2018</year>
          . https://www.langenscheidt.com/jugendwort-des-jahres Last accessed 22 Jan 2019
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. CoolSlang, German Slang Dictionary. https://www.coolslang.com/ Last accessed 22 Jan 2019
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tymann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : GerVADER. https://github.com/KarstenAMF/GerVADER
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>