<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Brand Perception Through Exploratory Sentiment Analysis in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Cichonczyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Gips</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bielefeld University of Applied Sciences</institution>
          ,
          <addr-line>Minden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The presented student project outlines a natural language processing pipeline for brand metric comparison in the Twitter ecosystem. Sentiment calculation for an unlabeled data set is demonstrated and calibrated using the statistical Central Limit Theorem as guidance to anchor the sentiment indicator in a homogeneous market. The process is evaluated by comparing the sentimental market performance of three leading German logistics companies. The results support the value of sentiment analysis for automated real-time customer feedback analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The objective of this student project is to compare brand perception through
exploratory sentiment analysis in the Twitter ecosystem. To achieve this objective, the hashtag space of
leading German logistics companies was analyzed and collected as an exemplary
use case. An analytics pipeline was constructed and applied to the dataset to
better understand the customer base and approximate overall consumer
sentiment towards the logistics brands, linking a scalar metric to consensus opinion
as outlined in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This approach can show the value of sentiment analysis in
social media analysis for customer satisfaction research of single companies, and
it also allows for contextualization with regard to competitors. As social media
is essentially a means of open many-to-many communication, the same pipeline
can be applied to feedback aimed at other brands, which then allows a
more empirical comparison.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        The main contribution of this work aims to show how current ideas in natural
language processing may be applied to take advantage of automated social media
brand perception analysis. We would like to outline our course of action and the
thought process used in this applied research project as it may inspire other
research towards the stated problem case. For this purpose, two typical marketing
query questions were chosen as examples to demonstrate the approach:
1. How do leading German consumer logistics brands rank in Twitter user
approval?
2. How can Twitter meta information augment user approval analysis?
Related works predominantly make use of machine learning on labeled data with
the primary goal of method testing and evaluation [28][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ][26]. These approaches
perform well regarding categorization in sentiment classes, but they cannot be
directly transferred to the investigated problem without manual data annotation
as a pre-step for model training. Real-world data is unlabeled and seed datasets
do not yet exist for the highly specific domain of German logistics, hence
seemingly producing an unsupervised clustering problem with feature engineering
and selection in focus.
      </p>
      <p>The specific characteristics of brand perception analysis allow for a third
option. Munoz &amp; Kumar [23] point out that perceptional metrics help to gauge the
effectiveness of brand-building activities across points of customer interaction.
Brand profiling can be achieved by fixing an indicator within a metric of a
market and then comparing this indicator to a company's brand and its competitors
[23]. Therefore, the aim of the presented approach is to acquire such a polarity
indicator. Tools, models and methods from both supervised and unsupervised
natural language classification and processing become available with this
modified problem definition. The constructed data pipeline adheres to this premise
and is presented in detail in the following paragraphs.</p>
      <sec id="sec-2-1">
        <title>Collection and Transformation</title>
        <p>
          Data acquisition was implemented using the Tweepy1 library for the Python
programming language to interact with the commercially available Twitter API2
for developers. Tweepy was chosen for its native ability to handle rate-limited free
access tokens and dynamic adjustment of traffic bandwidth. To stay within these
limits and to focus on customer opinion, only the top three logistics companies
for courier, express and parcel services in Germany were selected, which as
of 2018 are DHL, Hermes and DPD [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Therefore, the Tweepy filter
"#dhl OR #hermes OR #dpd OR dhl OR hermes OR dpd"
was constructed and 10594 tweets were collected over 3 weeks in the
winter of 2018, limited to tweets with a set "German" language flag. The Twitter
API returns tweets in JSON format, containing a large number of partly
redundant data fields. All JSON data was parsed and attributes of interest to
the research question were selected and named accordingly. The selected and
renamed fields were "usr id", "tweet id", "usr followers", "timestamp",
"favorites", "retweets", "client", "hashtags" and the "text" itself. After
transformation, the tweets were stacked as rows to form a 10594x9 matrix M.
        </p>
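        <p>The field selection step can be sketched as follows. The mapping from raw API attributes to the nine columns is an assumption based on the classic Twitter v1.1 status format, the sample status is mock data, and column names use underscores instead of spaces to be valid Python identifiers.</p>
```python
# Sketch: flattening one raw Twitter API status (a parsed JSON dict) into the
# nine columns of M. The sample status below is mock data for illustration.
def flatten_status(status: dict) -> dict:
    """Select and rename the attributes of interest from a raw status."""
    return {
        "usr_id": status["user"]["id"],
        "tweet_id": status["id"],
        "usr_followers": status["user"]["followers_count"],
        "timestamp": status["created_at"],
        "favorites": status["favorite_count"],
        "retweets": status["retweet_count"],
        "client": status["source"],
        "hashtags": [h["text"] for h in status["entities"]["hashtags"]],
        "text": status["text"],
    }

sample = {
    "id": 1,
    "text": "#dhl paket ist da!",
    "created_at": "Wed Dec 05 09:00:00 +0000 2018",
    "favorite_count": 2,
    "retweet_count": 0,
    "source": "Twitter for Android",
    "user": {"id": 42, "followers_count": 310},
    "entities": {"hashtags": [{"text": "dhl"}]},
}
row = flatten_status(sample)
```
        <p>Stacking one such row per collected tweet yields the 10594x9 matrix M.</p>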
      </sec>
      <sec id="sec-2-2">
        <title>Data Exploration, Filtering and Feature Construction</title>
        <p>First exploration and inspection of the dataset showed that the Tweepy filter
was not sufficient to separate the communications concerning the three selected
logistics brands in a satisfactory manner. The data contained tweets which did
not directly relate to the domain of interest. It became clear that a more in-depth
analysis of the communication topics was necessary to further sanitize the results.
Tweet topic modeling through utilization of keyword annotations - better known
as "hashtags" - was identified to hold valuable advantages for the solution of this
problem [31][29]. Based on the work presented by Wang, Wei &amp; Zhang [30], a
graph model was defined by the set of all hashtags H = {h0, h1, ..., hm} contained
within the dataset, where each hashtag hi represents a node weighted by its
global occurrence count and is associated with the set of tweets Tk = {t0, t1, ..., tn}
in which it occurs. The set of edges E consists of a link between two hashtags if
they co-occur in the same tweet. The weight of an edge eij between hi and hj is
incremented for each co-occurrence of hi and hj. The graph model HG = {H, E}
was used to isolate the logistics subgraph of interest, which reduced T, and
thereby the rows of M, to only hold tweets relevant to the research.
The "hashtags" column of M was transformed to HG and stored in
GEXF format3. This step made it possible to import the hashtag graph into existing
graph analysis software4, providing access to pre-optimized methods.</p>
        <sec id="sec-2-2-1">
          <title>Graph-Based Topic Filtering</title>
          <p>1 https://github.com/tweepy/tweepy
2 https://developer.twitter.com/
3 https://networkx.github.io/documentation/stable/reference/readwrite/gexf.html
4 https://gephi.org/</p>
          <p>
            The Yifan Hu multilevel layout algorithm was chosen for its speed and good quality on
large graphs [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] to achieve topic modeling. With this technique, all node and
edge weights are used as force simulations of attraction and repulsion, creating a
dispersed planar graph projection. Clusters of hashtags in frequent co-occurrence
form topic subgraphs while unrelated topics drift apart. After the embedding
step, topics weakly connected to logistics can be visually identified.
The collective topics "dhl", "dpd" and "hermes" were found to form a distinct
subgraph with minor outliers regarding "jobs" topics. Stronger interrelations
to unrelated themes existed for the single topic "hermes". This effect can be
attributed to the ambiguity of the term. Subgraphs concerning "fashion" and
"export politics" were intertwined with tweets about the logistics company.
The set of hashtags identifying those outlying tweets was added as a filter to
remove non-logistics rows from M, resulting in an on-topic dataset.
Tweets which did not resemble a consumer opinion were filtered out; e.g.
advertisements and news were undesirable data as they would distort the sentiment
analysis. Therefore, the column "client" was investigated. Under the
assumption that consumer opinion is predominantly voiced through consumer client
software, other agents can be ignored. All user agents of the "client" column
were manually mapped to the categories "Android", "iOS", "Desktop", "Other"
and "Professional". "Other" represents multi-platform clients which cannot
directly be associated with a single platform. The "Professional" label identifies all
user agents known to be developed for commercial usage, e.g. tweet automation
or social media management software. All "Professional" rows were removed.
Commercial users who rely on consumer client software are exempt from this
pruning step. These were further analyzed by the count of their followers. For all
rows in M, the unique "usr id" fields were collected and then ranked by their
"usr followers" value. Since the Twitter follower count adheres to the power law
[22], manually inspecting the top followed profiles was sufficient to mark
opinion-bearing commercial accounts for removal and neutralize their influence on
global sentiment. For professional tweets which may have remained in the
data after all filters, it was assumed that their opinion influx was significantly
outweighed by the now dominant consumer-class tweets.
          </p>
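          <p>The construction of the hashtag co-occurrence graph HG described earlier in this section can be sketched with the standard library (the project itself exported the graph to GEXF via networkx and laid it out in Gephi); the toy tweets below are illustrative.</p>
```python
# Sketch of the hashtag graph HG = {H, E}: nodes are hashtags weighted by
# global occurrence count, edges are hashtag pairs weighted by the number of
# tweets in which they co-occur.
from collections import Counter
from itertools import combinations

def build_hashtag_graph(tweets):
    """tweets: list of hashtag lists, one list per tweet."""
    node_weights = Counter()
    edge_weights = Counter()
    for tags in tweets:
        node_weights.update(tags)
        # each unordered pair of distinct hashtags in one tweet is one co-occurrence
        for a, b in combinations(sorted(set(tags)), 2):
            edge_weights[(a, b)] += 1
    return node_weights, edge_weights

tweets = [["dhl", "paket"], ["dhl", "jobs"], ["dhl", "paket", "versand"]]
nodes, edges = build_hashtag_graph(tweets)
```
          <p>Sorting each pair canonically ensures that co-occurrences of the same two hashtags always increment the same edge weight.</p>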
          <p>To finally conclude filtering and feature construction, M was extended by the
columns "is dhl", "is dpd", "is hermes", "hour" and "weekday". With these
extra columns, searching and querying the dataset for the following steps is faster
and more convenient. Finding tweets associated with specific brands is done by
simple boolean masking, resulting in desired subsets of M without having to
repeatedly parse and search tweet content for each query. Aggregation over time
windows is sped up by pre-splitting the complex timestamp format of the
Twitter API.</p>
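          <p>A minimal sketch of this feature construction, assuming the classic Twitter API timestamp format and simple substring matching for the brand flags (the actual assignment rule is not spelled out in the text):</p>
```python
# Sketch: brand flag columns via substring matching and hour/weekday columns
# split out of the Twitter timestamp format "%a %b %d %H:%M:%S %z %Y".
from datetime import datetime

def add_features(row):
    text = row["text"].lower()
    ts = datetime.strptime(row["timestamp"], "%a %b %d %H:%M:%S %z %Y")
    row.update({
        "is_dhl": "dhl" in text,
        "is_dpd": "dpd" in text,
        "is_hermes": "hermes" in text,
        "hour": ts.hour,            # enables fast aggregation over time windows
        "weekday": ts.strftime("%A"),
    })
    return row

row = add_features({"text": "#dhl Paket angekommen",
                    "timestamp": "Fri Dec 07 17:30:00 +0000 2018"})
```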
          <p>After the outlined data sanitation treatment, actual text analysis was possible.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Text Preprocessing</title>
        <p>
          The Natural Language Toolkit (NLTK) offers the basic functionalities
necessary for natural language processing [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. As such, it was used to pre-process the
tweet content formulated by Twitter users. The "text" column was
tokenized and all single tokens of interest were added as a new "tokens" column,
containing a list of tokens for each tweet. All tokens were stripped of non-textual
information, URLs removed, umlauts converted and otherwise undesirable
information filtered out. The sanitized tokens represent all German words of a tweet,
but not all words hold analytic information. NLTK provides a list of stop words
for several languages, including German. Accordingly, all tokens were checked
against the German NLTK stop word list and removed if they matched.
Afterwards, the term frequency of all tokens was calculated to identify
other possible stop words. Several high-ranking tokens were found to lack value
for analysis:
"dhl, dpd, hermes, paket, mal, dass, schon, kommt, immer, seit, fuer"
These were added to the stop word list and also removed. Brand name tokens
are redundant as their information is already stored in M (see Paragraph 2.2).
With the token list constructed, semantic polarity estimation can be examined
on a word level.
        </p>
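        <p>A minimal stand-in for this NLTK-based preprocessing; the stop word set below is a tiny illustrative subset of the extended German list, not NLTK's actual data.</p>
```python
# Sketch: tokenize, strip URLs and non-letters, transliterate umlauts, and
# drop (extended) German stop words including the redundant brand names.
import re

UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
STOP_WORDS = {"der", "die", "das", "und", "ist", "dhl", "dpd", "hermes", "paket"}

def preprocess(text):
    text = re.sub(r"https?://\S+", " ", text.lower())   # remove URLs
    text = text.translate(UMLAUTS)                      # convert umlauts
    tokens = re.findall(r"[a-z]+", text)                # keep textual tokens only
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Das #DHL Päckchen ist spät https://t.co/xyz")
```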
      </sec>
      <sec id="sec-2-4">
        <title>Token Polarity</title>
        <p>
          Since algorithms do not by default have any understanding of the emotional
impact of word semantics, sentiment analysis relies on human consensus opinion.
Databases with annotated word polarities between [-1, 1] for negative and
positive sentiment respectively are used to look up a scalar value for a given token.
For the German language, such a dictionary exists in the form of the SentiWS
[27] project. As a first, naive approach, the pipeline's sentiment resolver tries to
annotate the tokens of M directly through a query to SentiWS. If the word is
present in the dictionary, no further search is required and its sentiment can be
returned. SentiWS contains about 3500 basic forms and 34000 inflections.
Despite this size, less than 15% of all tweet tokens were found in SentiWS. This
result is not surprising given the combination of language syntax complexity, a
raw tweet token count of more than 80000 and the effort involved in dictionary
building. It was expected that only a low number of entries would directly match.
Therefore, each token that could not be resolved in SentiWS is forwarded to the
NLTK German stemmer. Stemming is the process of reducing a complex word,
which might be an inflection, to its basic form [19]. This step can be understood
as a simplification of the token, or in a more technical sense even a functional
projection. The goal is to project the token space sample in such a way that its
transformation aligns with the SentiWS target space. Thereby, a morphological
alternative is potentially found and might allow for a sentiment lookup.
Increasing the number of token morphemes using this method also increased polarity
coverage.
        </p>
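        <p>The two-stage lookup can be sketched as follows; SENTIWS is a mock dictionary standing in for the SentiWS data files, and stem() is a crude suffix stripper standing in for the NLTK German stemmer.</p>
```python
# Sketch: direct SentiWS query first, then a retry on the stemmed token.
SENTIWS = {"gut": 0.37, "verspaetung": -0.48}   # mock polarity entries

def stem(token):
    """Naive suffix stripping, for illustration only."""
    for suffix in ("ungen", "ung", "en", "er", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def lookup(token):
    if token in SENTIWS:          # stage 1: direct dictionary hit
        return SENTIWS[token]
    return SENTIWS.get(stem(token))   # stage 2: stemmed retry, None if unresolved

value = lookup("gute")   # inflection resolved via its basic form "gut"
```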
        <p>
          If the stemmer was not able to find a morpheme in SentiWS, syntactical
alternative search is exhausted. In that case, the actual meaning of the token can
be used to find synonyms whose sentiment is known. Liebeck [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] found the
introduction of semantic equivalence to be of advantageous value for more
thorough sentiment analysis and referred to synonym search in synset databases
like GermaNet [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Mohtarami et al. [21] explored and alternatively suggested
the use of vector-based approaches for the same purpose. They observed that
WordNet [20] - and therefore GermaNet as a descendant - performs satisfactorily
when semantic synonyms are searched but lacks accuracy when sentimental
equivalence is the primary metric. To accommodate this problem, they introduced
emotional features of words to construct their vectors. The key insight for the
presented work is the necessity of a more general human-like language
intelligence to identify word alternatives beyond pure semantic synonyms. Webber [32]
came to the same conclusion and presented a proof-of-concept disambiguation
and language analysis system trained on half a million Wikipedia articles instead
of a domain-specific corpus. The results suggested superior context-based
general language processing capabilities. Therefore, a similar approach was chosen.
Instead of synonym search in statically assembled synsets, a Word2Vec model
trained on Wikipedia articles was used to find alternatives for tokens which
neither themselves nor their stemmed variants could be resolved through SentiWS.
Word2Vec was chosen due to its proven performance [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] regarding this purpose
and because a large (650 million words), pre-trained, general Wikipedia
knowledge model already exists5 for the gensim6 Word2Vec implementation. Training
a similarly large model would not have been possible for the purposes of this
student project.
        </p>
        <p>
          The lookup process was constructed as follows. Gensim is used to retrieve a
vector space embedding for the token, e.g. the word "house". As stated above, this
vector representation shares a topological vicinity with its contextual synonyms,
e.g. "building" or "home". The aim is to find a spatially close synonym which can
be associated with an entry in SentiWS. Starting from the "house" embedding,
its neighbouring entries are probed in order of increasing distance. For
example, "building" may emerge as the closest approximation to "house". The word
"building" is therefore chosen as an alternative candidate. This candidate is then
tested against SentiWS and if a polarity value can be retrieved, the candidate
is selected. In this case, "building" may not be a valid alternative, is
rejected, and the process continues with the next neighbour in increasing distance,
which is "home". The new candidate is tested in the same manner and can be
successfully matched with a sentiment, leading to its selection as a valid
alternative to "house". The algorithm encapsulates the same process a human would
employ in thinking about other phrasings of the same utterance. Figuratively
speaking, the amount of imagination necessary to come up with another phrase
is continually incremented until a suitable rephrasing is found. This can of
course lead to misleading levels of synonymity if done ad infinitum. Therefore,
it was decided to penalize the retrieved sentiment by the distance between the
original token vector and its alternative, similarly to Kim &amp; Shin [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
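        <p>The distance-penalized fallback can be illustrated with toy two-dimensional vectors standing in for the pre-trained gensim Word2Vec model; the vocabulary, vectors and mock SentiWS entry are assumptions for illustration, while the division by the distance D(s, t) mirrors the penalization described above.</p>
```python
# Sketch: probe embedding neighbours of an unresolved token in order of
# increasing distance until one resolves in SentiWS, then penalize the
# retrieved polarity by that distance (SentiWS(s) / D(s, t)).
import math

VECTORS = {"haus": (1.0, 0.0), "gebaeude": (0.9, 0.1), "heim": (0.8, 0.3)}
SENTIWS = {"heim": 0.2}   # only "heim" carries a known polarity here

def distance(a, b):
    return math.dist(VECTORS[a], VECTORS[b])

def resolve_via_neighbours(token):
    neighbours = sorted((w for w in VECTORS if w != token),
                        key=lambda w: distance(token, w))
    for candidate in neighbours:            # closest candidate first
        if candidate in SENTIWS:
            return SENTIWS[candidate] / distance(token, candidate)
    return None

value = resolve_via_neighbours("haus")   # "gebaeude" rejected, "heim" matched
```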
        <sec id="sec-2-4-1">
          <title>Sentiment Resolution</title>
          <p>5 https://github.com/devmount/GermanWordEmbeddings
6 https://radimrehurek.com/gensim/models/word2vec.html</p>
          <p>The complete algorithm to resolve the sentiment value for a token vector t can
be defined as follows:</p>
          <p>sentiment(t) = max over s ∈ SentiWS ∩ Word2Vec of SentiWS(s) / D(s, t) (1)</p>
          <p>Note that this algorithm implicitly neutralizes alternatives if they are too broadly
associated synonyms:</p>
          <p>sentiment(t) → 0 as D(s, t) → ∞ (2)</p>
          <p>The end behaviour of sentiment(t) ensures the absence of polarity distortion in
synonym search.</p>
          <p>All tokens in the data were labeled using this process.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>Sentiment Weighting and Analysis</title>
        <p>
          After the tokens of M were given an emotional weight as outlined in Section 2.4
on a word level, further analysis on a sentence level can proceed. Fang &amp; Zhan
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] summarize that every word of a sentence has its syntactic role which defines
how the word is used. These roles, also known as parts of speech, have significant
impact on the importance of their underlying sentiment for the polarity of the
complete sentence. For example, words like pronouns usually do not contain any
sentiment and are therefore neutral. In contrast, verbs or adjectives can hold
different weights respectively [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Part-of-speech taggers are used to classify words
according to their syntactic role. The NLTK tagger class has been extended
for the German language and trained on the TIGER [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] corpus in a different
project7, achieving an accuracy of 98% as stated by the authors. This
tagger was chosen for the presented work due to its good performance,
generalization capabilities and fast integration in NLTK. After POS processing, all tokens
were labeled according to the STTS [33] tag system. Nichols &amp; Song [25] have
examined the relationship between scalar sentiment, part of speech and overall
sentence polarity. They empirically compared the influence of POS strengths
on classifier performance and approximated an optimal solution. Their
exhaustive search over the set POS = {noun, verb, adjective, adverb} and the strength
weights str(POSi) ∈ {1, 2, 3, 4, 5} has shown that the best performance for
purposes of sentiment analysis was achieved with the following scalar weight vector:
(str(noun) = 2, str(verb) = 3, str(adv) = 4, str(adj) = 5) (3)
A mapping from the STTS tags to the categories utilized by Nichols &amp; Song was
introduced to ensure compatibility between the German tagger and their
weighting approach. Afterwards, all tokens were accentuated according to their
syntactic sentence function, resulting in increased sentiment variance and therefore
more expressive overall tweet polarity.
        </p>
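      <p>Applying the weight vector from (3) can be sketched as follows; the STTS-to-category map shown is a small illustrative subset, not the full mapping used in the project.</p>
```python
# Sketch: scale each token's polarity by the Nichols &amp; Song strength of its
# POS class, after mapping STTS tags to the four weighted categories.
STTS_TO_CLASS = {"NN": "noun", "VVFIN": "verb", "ADJD": "adj", "ADV": "adv"}
STRENGTH = {"noun": 2, "verb": 3, "adv": 4, "adj": 5}

def weight_token(polarity, stts_tag):
    """Scale a token polarity by its POS strength (1 if the tag is unweighted)."""
    cls = STTS_TO_CLASS.get(stts_tag)
    return polarity * STRENGTH.get(cls, 1)

weighted = weight_token(-0.3, "ADJD")   # adjective: strength 5
```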
      <p>7 https://github.com/ptnplanet/NLTK-Contributions/tree/master/ClassifierBasedGermanTagger</p>
      <p>
        As a last step before the culminating conflation of all individual token
polarities per tweet, negations need to be handled as they significantly influence the
calculated emotion by their valency scopes [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Two primary ways for negation
handling were tested: syntax scope analysis and a heuristic approach. Carrillo
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] et al. proposed that superior performance is achieved if the negation scope
is determined by examining the valence subtree of the negation token based on
part-of-speech association. After successfully labeling each word with a POS tag,
the grammatical syntax reveals the subtree which is supposed to be negated and
therefore inversely influential on sentiment. While this approach would grant
realistic language sentiment, it presupposes that the syntax tree is
immaculate. Especially for Twitter, this is rarely the case. Gui [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] et al. found that
the Twitter culture of mutual communication inherently comprises
non-standard orthography, and reconstructing an approximately valid syntax tree
requires substantial effort. Their findings were confirmable and therefore
argued against the syntactical negation handling as proposed by Carrillo [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] et al. for
practical application in the presented project. For this reason, the more widely [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
used heuristic solution was employed. The German tagger was able to reliably
identify the negation token itself (e.g. "nicht") and labeled it accordingly with
the fitting STTS tag. This label gave an anchor to which a rule-based negation
heuristic could be expediently attached. Inspired by the syntactical solution, the
heuristic successively searches for the next token with a sentiment that has been
weighted by its tag (see the beginning of this section). The rationale behind
the algorithm is that the sentiment-bearing successor is assumed to be
the most likely target for negation. Samples suggested that this heuristic rule
performs sufficiently well in relation to the goals of analysis.
        </p>
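        <p>A sketch of the heuristic, assuming tokens arrive as (token, STTS tag, weighted sentiment) triples; PTKNEG is the STTS tag for negation particles such as "nicht". The final summation per tweet follows the Kumar &amp; Sebastian style aggregation described below.</p>
```python
# Sketch: after a negation particle, flip the sign of the next
# sentiment-bearing token; tweet polarity is the sum of all token scores.
def tweet_polarity(tagged_tokens):
    """tagged_tokens: list of (token, stts_tag, weighted_sentiment_or_None)."""
    total, negate_next = 0.0, False
    for token, tag, sentiment in tagged_tokens:
        if tag == "PTKNEG":              # negation particle, e.g. "nicht"
            negate_next = True
            continue
        if sentiment is not None:
            total += -sentiment if negate_next else sentiment
            negate_next = False          # heuristic targets only the next bearer
    return total

polarity = tweet_polarity([
    ("paket", "NN", None),
    ("nicht", "PTKNEG", None),
    ("puenktlich", "ADJD", 1.0),   # positive token, negated by "nicht"
])
```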
        <p>
          After all negations were handled, it was finally possible to propose an estimated
polarity per tweet. Similarly to the work of Kumar &amp; Sebastian [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], sentiment
was calculated by summing the weighted and - if necessary - negated token
polarity scalars. The resulting values were added as a new column to M.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>Scale Calibration</title>
        <p>Having calculated a value which one might consider "sentiment" is not enough
for actual market analysis, for two reasons:
1. The scale - while argumentatively coherent and grounded in the outlined
rationale - can be understood as a valid indicator, but it is still arbitrarily defined.
Its definition is sound, but given that the scale is supposed to measure
levels of human emotion, it needs validation. Such a test would require human
evaluation, altering the problem to a supervised interpretation.
2. As Section 2 stated, Munoz &amp; Kumar [23] emphasize indicator fixation and
anchoring within the metric of a market to achieve brand profiling. Only then
is empirical comparison to the calibrated indicator, and thereby competitive
brands, feasible.</p>
        <p>
          These two problems seemingly demand further research and evaluation.
In contrast, it is argued that their combination allows for a use-case
specific solution if the fundamental nature of the underlying data is exploited by
utilizing the broad scope of opinions being uttered on Twitter. This
characteristic permits the introduction of the established statistical Central Limit Theorem
(CLT). The CLT describes the convergence behaviour of the
probability distributions of an increasing number of one- or multi-dimensional random
variables towards a normal distribution [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. For a public opinion surveying purpose as
presented, the CLT leads to a beneficial conclusion: if a sufficiently large number
of unrelated, random sample opinions is gathered from a sample population,
the overall sample mean will be normally distributed around the population
mean. Furthermore, if the opinion distribution is limited to the interval [-1, 1]
by the pre-conceived sentiment constraint and, additionally, baseline polarity is
assumed to be neutral, all essential properties of the expected opinion
distribution are known in advance without human intervention for validation.
To exploit this reasoning for the calibration of the proposed polarity estimation
process, a second Twitter dataset GT (for Ground Truth) was collected using
the Tweepy filter
        </p>
        <p>"#2018 OR #2019 OR #december OR #january"
and processed through the same pipeline as M, leading to a broad, dispersed
set of tweets unrelated to any specific topic. As these tweets cover a wide range
of independent themes and conversational domains, it can be reasoned that the
global population sentiment characteristics behave according to the CLT. This
theory provides the missing link between the "arbitrarily" constructed polarity
estimation pipeline and actual market sentiment, resulting in the desired
indicator described by Munoz &amp; Kumar. Establishing the connection mathematically
is possible in a multitude of ways, as long as the link adheres to the following
formalism. Since the global sentiment distribution of the general dataset GT
should at best follow the listed constraints, its histogram (GTpolarity) should
resemble the normal distribution as closely as possible. As such, the aim is to find
a projection of GTpolarity which minimizes the error between the histogram and
the normal distribution. If such a projection is found, it acts as the calibration
metric for the analytics pipeline. Thereupon, the calibrated projection can be
used on the actual logistics dataset to infer class labels in relation to the opinion
of the general population. If the distribution is discretized at the interquartile
range (IQR) markers, half of all opinions fall in the central area. These are considered
"neutral" and make up the overall majority. A quarter of all opinions fall left
of the first IQR marker and are considered negatively extreme. Their class
label is "negative". Lastly, the remaining datapoints fall beyond the third
IQR marker and are hence labeled "positive" as they express positively extreme
sentiment. This baseline polarity serves as the ground truth reference and M can
be labeled in the same way, using the absolute sentiment boundaries dictated
by GT. Subsequently, sentiment analysis and classification are concluded and
evaluation of logistics opinions is finally possible.</p>
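        <p>The quartile-based discretization can be sketched as follows, assuming calibrated polarity values; statistics.quantiles stands in for whatever quartile routine the pipeline actually used, and the sample values are illustrative.</p>
```python
# Sketch: fit the "negative"/"neutral"/"positive" boundaries on the ground
# truth distribution GT, then apply them to label the logistics dataset M.
from statistics import quantiles

def fit_boundaries(gt_polarities):
    q1, _, q3 = quantiles(gt_polarities, n=4)   # first and third quartiles
    return q1, q3

def label(polarity, q1, q3):
    if q1 > polarity:
        return "negative"   # below the first quartile: negatively extreme
    if polarity > q3:
        return "positive"   # above the third quartile: positively extreme
    return "neutral"        # central half of all opinions

gt = [-0.8, -0.4, -0.1, 0.0, 0.0, 0.1, 0.3, 0.7]
q1, q3 = fit_boundaries(gt)
labels = [label(p, q1, q3) for p in (-0.9, 0.0, 0.9)]
```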
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Analysis</title>
      <p>For evaluation, the questions put forward in Section 2 were answered using the
constructed pipeline.</p>
      <p>After discretization, the Twitter class label distribution is normally distributed
and zero centered. The IQR markers force the calibration into the CLT
assumption. Therefore, specialized tweet topics can be compared by calculating the
relative distance between the class label tendencies.</p>
      <sec id="sec-3-1">
        <title>"How do leading German consumer logistics brands rank in Twitter user approval?"</title>
        <p>For the complete logistics dataset M, the class labels deviate from the
Twitter baseline GT. Neutral sentiment is 11.92% less present in tweets relating
to logistics while positive and negative sentiment are 4.58% and 7.34% above
baseline respectively. This observation of increased variance shows that users
communicate with higher emotional tendencies towards the topic. It can be
concluded that opinions regarding the logistics domain are mostly stated more
vigorously. To reduce selection bias, the logistics brands must therefore be
compared exclusively within their domain. Otherwise, their relative ranking would
be distorted by the overall preconceived notions of opinion. Appendix Figure 1
(top) visualizes this distortion. At first glance, all three brands perform with
high emotional response, skewed towards negative feedback. This issue is the
result of the distributional relationship to GTpolarity. Drawing the
conclusion that the three specific brands perform badly on Twitter is not precise, as the
entire domain generally provokes the shown response. Due to this implication,
a more accurate baseline indicator for performance ranking is the sentiment
histogram of the logistics domain. All brands react differently and more
truthfully to this metric, and better conclusions can be drawn, as presented in
Appendix Figure 1 (middle). Therein, it can be seen that the relative
ranking is now more expressive. DHL performs better than its competitors
within the domain, having fewer negative class labels and more positive class
labels than average. DPD and HERMES are negatively skewed beyond
average expectation, performing worse than DHL. HERMES alone falls behind
in both negative (more than average) and positive (less than average)
opinions.</p>
        <p>"How can Twitter meta information augment user approval analysis?"</p>
        <p>The presented comparison solely relies on single tweet content and thus
individual sentiment. The Twitter API grants access to information beyond pure textual
data. Correlating the metadata to polarity can yield clarified insight. For
example, one of the defining functionalities of Twitter is the ability to like and/or share
("retweet") opinions of other users. The implications for sentiment statistics are
pivotal. Nagarajan, Purohit &amp; Sheth [24] observed di erent levels of endorsement
by peer users depending on tweet positivity. Hence, weighting tweet sentiment by
the amount of shared and approved peer a rmation links argumentative
popularity to brand performance indication. Furthermore, if a large enough dataset is
gathered, sentiment can be followed along the chain of retweets, forming an
interesting graph traversal problem. It could be mapped out how positive and negative
sentiment propagate through the Twitter ecosystem and how these
multiplicative patterns di er for brands. For the presented project, the dataset is not large
enough to reconstruct such patterns. Nonetheless, each entry of M does contain
the integer counts of likes and retweets and these values hold analytic value.
Reasoned by the arguments above, the two integers were summed per row and added
as a new column labeled "Propagation". The new value resembles the amount of
persons sharing the view being expressed in the tweet and can be used as a weight
vector for sentiment aggregation. The results of Nagarajan, Purohit &amp; Sheth were
observable afterwards aswell. Our observations indicate that tweets in the
logistics domain are generally shared less often than baseline, but if they are shared,
they are more likely negative in contrast to GT. Due to this nding, the class
distributions change signi cantly if polarity propagation is factored into relational
brand metrics as shown in Appendix Figure 1 (bottom). User approval leads
to considerable ampli cation of disparity. DPD and HERMES shift their
distributions to pronounced neutrality. They both reduce negative sentiment slightly
but also exceedingly decrease in positive sentiment by a large factor. In contrast,
DHL overwhelmingly pro ts from the shared user opinions. Negative and neutral
sentiment labels are shifted to an 11.04% increase in positive sentiment.</p>
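A minimal sketch of the "Propagation" weighting described above; the row fields and sample values are hypothetical, not the project's data:

```python
# Hypothetical excerpt of the tweet matrix M.
tweets = [
    {"brand": "DHL",    "label": "positive", "likes": 4, "retweets": 1},
    {"brand": "DHL",    "label": "negative", "likes": 0, "retweets": 0},
    {"brand": "DPD",    "label": "neutral",  "likes": 1, "retweets": 0},
    {"brand": "HERMES", "label": "negative", "likes": 2, "retweets": 3},
]

# Propagation = likes + retweets: the number of peers endorsing the view.
# Note that under this scheme a tweet nobody shared carries zero weight.
for t in tweets:
    t["propagation"] = t["likes"] + t["retweets"]

def weighted_distribution(rows):
    """Per-brand class shares where each tweet votes with its propagation count."""
    totals, brand_sums = {}, {}
    for t in rows:
        key = (t["brand"], t["label"])
        totals[key] = totals.get(key, 0) + t["propagation"]
        brand_sums[t["brand"]] = brand_sums.get(t["brand"], 0) + t["propagation"]
    return {k: w / brand_sums[k[0]] for k, w in totals.items() if brand_sums[k[0]] > 0}

dist = weighted_distribution(tweets)
```

In this toy sample, DHL's unshared negative tweet vanishes from the weighted aggregate, illustrating how peer approval can amplify the disparity between brands.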
        <p>In addition to likes and retweets, the data also contains the timestamp at
which a tweet was written. Especially in the logistics business, time plays an
important role and should therefore be targeted analytically. In Appendix Figure
2, a heatmap of aggregated class labels per day and hour, averaged in mean rows
and columns, is shown. For aggregation, the labels were interpreted as {-1, 0, 1}
for {negative, neutral, positive}, respectively. In GT, it can be observed that
negative sentiment correlates with business hours. Furthermore, negativity peaks
towards the end of the business week. Intuitively, non-delivery at the beginning
of the weekend may induce disappointment, as the customer would have to wait
past the work-free days until the next business day. Such a hypothesis could be
further evaluated if domain knowledge is introduced.</p>
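The day/hour aggregation behind the heatmap can be sketched as follows; the timestamps and labels are illustrative, while the {-1, 0, 1} scoring follows the text:

```python
from collections import defaultdict

# Label scoring as described in the text.
SCORE = {"negative": -1, "neutral": 0, "positive": 1}

# Hypothetical (weekday, hour, label) triples derived from tweet timestamps.
tweets = [
    ("Fri", 17, "negative"),
    ("Fri", 17, "negative"),
    ("Mon", 9,  "positive"),
    ("Mon", 9,  "neutral"),
]

cells = defaultdict(list)
for day, hour, label in tweets:
    cells[(day, hour)].append(SCORE[label])

# Mean sentiment per (day, hour) cell, i.e. one entry of the heatmap matrix;
# the marginal row/column means are computed the same way over days or hours.
heatmap = {cell: sum(scores) / len(scores) for cell, scores in cells.items()}
```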
        <p>For the individual brands, different observations stand out:
1. DHL: Generally, DHL performs above average on Mondays. Sentiment then
declines as the week progresses. Negativity peaks at the weekend.
2. HERMES: Best performance is expected on Tuesdays. Daily peak negativity
shifts with weekly progression. Customers tend to express negative feedback
incrementally in later hours, peaking at 20:00 hours. The relation may be
connected to the longer business hours employed by HERMES.
3. DPD: No obvious pattern is present except a sentiment low on Mondays at
14:00 hours. Otherwise, the data correlates with the baseline.</p>
        <p>The underlying tweet texts and their themes were investigated at the
emphasized days and hours but did not reveal any apparent common denominator
explaining their occurrence beyond their mere temporal correlation. These
time distributions can serve more appropriate analyses for research with added
knowledge of the internal structures of the different companies. Then, more
precise assumptions can be made about the cause of the observed patterns.
The different client agents were also investigated in relation to sentiment but
did not show any discernible correlation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The outlined process has shown how powerful Twitter can be for sentiment
analysis in brand perception polling. Especially when metadata is introduced,
insights far beyond classical polling techniques become apparent. The different
logistics brands have shown distinguishable approval performance, and the
inclusion of Twitter metadata was beneficial for inquiring into these variations.
Acquiring professional API access may be costly at first, but the increase in
data volume and quality can add more capabilities, such as instant metrics or
even precise geolocation, a variable directly linkable to geographical key
performance indicators in logistics. As such, the real-time data pipeline could be
integrated into business intelligence and monitoring systems for anomaly
detection and cause-effect analysis.</p>
      <p>Furthermore, natural language processing methods were employed to
demonstrate their value for modern-day feedback evaluation. In-depth studies into the
many different ways to approach language processing problems may highlight
even more fruitful pipeline steps. Building a domain-specific language corpus
should be the first obstacle to take in that direction. The current lack thereof
discouraged the usage of many techniques such as supervised machine learning. The
same holds for the limited size of the tested data set. If more comprehensive
collection were possible, access to other ways of statistical description and model
building would open up. Especially in regard to finer recognition of irony and sarcasm,
further improvement of the pipeline is required. In its current form, sarcasm was
not adequately recognized. Relative brand metric comparison is still valid, as all
brands suffered from this deficit in the same way. Sarcasm seems to be inherently
linked to strongly opinionated utterances, which is a reason why the proposed
workflow did not rely on emoticon recognition as other works suggest for label
inference. Data exploration has shown that emoticons played a predominant role
in sarcasm emphasis. This observation may hold value for further research but
discouraged their use for the current student project.</p>
      <p>Summarizing, the work supports the assumption that modern social media can
make a vital contribution to the fast refinement of a brand's marketing mix for
strategic realignment in real time. This is a benefit for ensuring positive reception of
corporate philosophies directly as a reaction to automated analyses of
innovative, digital consumer feedback channels, using contemporary research in natural
language processing.</p>
      <p>Fig. 2. Daily logistics sentiment.
18. Löffler, R., Wittern, H.: Markenwahrnehmung und Marken-Differenzierung im
Zeitalter des Web 2.0. In: Markendifferenzierung, pp. 359–375. Springer (2011)
19. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval,
p. 32. Cambridge University Press, New York, NY, USA (2008)
20. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM
38(11), 39–41 (1995)
21. Mohtarami, M., Amiri, H., Lan, M., Tran, T.P., Tan, C.L.: Sense sentiment
similarity: An analysis. In: Proceedings of the Twenty-Sixth AAAI
Conference on Artificial Intelligence. pp. 1706–1712. AAAI'12, AAAI Press (2012),
http://dl.acm.org/citation.cfm?id=2900929.2900970
22. Mueller, J., Stumme, G.: Predicting rising follower counts on twitter using profile
information. CoRR abs/1705.03214 (2017), http://arxiv.org/abs/1705.03214
23. Munoz, T., Kumar, S.: Brand metrics: Gauging and linking brands with business
performance. Journal of Brand Management 11(5), 381–387 (2004)
24. Nagarajan, M., Purohit, H., Sheth, A.P.: A qualitative examination of topical tweet
and retweet practices. In: ICWSM (2010)
25. Nicholls, C., Song, F.: Improving sentiment analysis with part-of-speech weighting.</p>
      <p>vol. 3, pp. 1592–1597 (08 2009)
26. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion
mining. In: LREC (2010)
27. Remus, R., Quasthoff, U., Heyer, G.: SentiWS - a publicly available
German-language resource for sentiment analysis. In: LREC (2010)
28. Schweidel, D.A., Moe, W.W., Boudreaux, C.: Social media intelligence: Measuring
brand sentiment from online conversations (2011)
29. Steinskog, A., Therkelsen, J., Gambäck, B.: Twitter topic modeling by tweet
aggregation. In: Proceedings of the 21st Nordic Conference on
Computational Linguistics. pp. 77–86. Association for Computational Linguistics (2017),
http://aclweb.org/anthology/W17-0210
30. Wang, X., Wei, F., Liu, X., Zhou, M., Zhang, M.: Topic sentiment analysis in
twitter: a graph-based hashtag sentiment classification approach. In: CIKM (2011)
31. Wang, Y., Liu, J., Qu, J., Huang, Y., Chen, J., Feng, X.: Hashtag graph based
topic model for tweet mining. In: 2014 IEEE International Conference on Data
Mining. pp. 1025–1030 (Dec 2014). https://doi.org/10.1109/ICDM.2014.60
32. Webber, F.D.S.: Semantic folding theory and its application in semantic
fingerprinting. CoRR abs/1511.08855 (2015), http://arxiv.org/abs/1511.08855
33. Westpfahl, S., Schmidt, T., Jonietz, J., Borlinghaus, A.: STTS 2.0. Guidelines fuer
die Annotation von POS-Tags fuer Transkripte gesprochener Sprache in Anlehnung an
das Stuttgart Tuebingen Tagset (STTS) (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Carrillo de Albornoz, J.,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ucm-i: A rule-based syntactic approach for resolving the scope of negation</article-title>
          .
          <source>In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval</source>
          <year>2012</year>
          ). pp.
          <volume>282</volume>
          –
          <fpage>287</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2012</year>
          ), http://aclweb.org/anthology/S12- 1037
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
          </string-name>
          , E.:
          <article-title>Nltk: The natural language toolkit</article-title>
          .
          <source>In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions. ACLdemo '04</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2004</year>
          ). https://doi.org/10.3115/1219044.1219075
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Culotta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cutler</surname>
          </string-name>
          , J.:
          <article-title>Mining brand perceptions from twitter social networks</article-title>
          .
          <source>Marketing Science</source>
          <volume>35</volume>
          (
          <issue>3</issue>
          ),
          <volume>343</volume>
          –
          <fpage>362</fpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1287/mksc.
          <year>2015</year>
          .0968
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dipper</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Kubler, S.: German Treebanks: TIGER and
          <string-name>
            <surname>Tu</surname>
          </string-name>
          Ba-D/Z, pp.
          <volume>595</volume>
          –
          <fpage>639</fpage>
          . Springer Netherlands, Dordrecht (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhan</surname>
          </string-name>
          , J.:
          <article-title>Sentiment analysis using product review data</article-title>
          .
          <source>Journal of Big Data</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ), 5 (Jun
          <year>2015</year>
          ). https://doi.org/10.1186/s40537-015-0015-2
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Farooq</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansoor</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nongaillard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ouzrout</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qadir</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Negation handling in sentiment analysis at sentence level</article-title>
          .
          <source>Journal of Computers</source>
          <volume>12</volume>
          ,
          <issue>470</issue>
          –
          <volume>478</volume>
          (01
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fischer</surname>
          </string-name>
          , H.:
          <article-title>A history of the central limit theorem: From classical to modern probability theory</article-title>
          . Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.:</given-names>
          </string-name>
          <article-title>word2vec explained: deriving mikolov et al.'s negative-sampling word-embedding method</article-title>
          .
          <source>CoRR abs/1402.3722</source>
          (
          <year>2014</year>
          ), http://arxiv.org/abs/1402.3722
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gui</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Part-of-speech tagging for twitter with adversarial neural networks</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>2411</volume>
          –
          <issue>2420</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hamp</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feldweg</surname>
          </string-name>
          , H.:
          <article-title>Germanet-a lexical-semantic net for german</article-title>
          .
          <source>Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications</source>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hu</surname>
          </string-name>
          , Y.:
          <article-title>Efficient and high quality force-directed graph drawing</article-title>
          .
          <source>Mathematica Journal</source>
          <volume>10</volume>
          ,
          <volume>37</volume>
          –
          <volume>71</volume>
          (01
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Sobel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Twitter power: Tweets as electronic word of mouth</article-title>
          .
          <source>JASIST 60</source>
          ,
          <issue>2169</issue>
          –
          <fpage>2188</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Sobel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Umsatzverteilung im KEP-Endkundenmarkt in Deutschland nach Anbietern im Geschäftsjahr 2017/18</article-title>
          . Handelsblatt 134,
          <issue>16</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shin</surname>
          </string-name>
          , H.:
          <article-title>A new approach for measuring sentiment orientation based on multi-dimensional vector space</article-title>
          . CoRR abs/1801.00254 (
          <year>2018</year>
          ), http://arxiv.org/abs/1801.00254
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kouloumpis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.D.</surname>
          </string-name>
          :
          <article-title>Twitter sentiment analysis: The good the bad and the omg!</article-title>
          <source>Icwsm</source>
          <volume>11</volume>
          (
          <fpage>538</fpage>
          -
          <lpage>541</lpage>
          ),
          <volume>164</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastian</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis on twitter</article-title>
          .
          <source>International Journal of Computer Science Issues</source>
          <volume>9</volume>
          ,
          <issue>372</issue>
          –
          <volume>378</volume>
          (07
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Liebeck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Aspekte einer automatischen meinungsbildungsanalyse von onlinediskussionen</article-title>
          . In: Ritter,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Henrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Lehner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Thor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Wingerath</surname>
          </string-name>
          , W. (eds.) Datenbanksysteme fur Business,
          <source>Technologie und Web (BTW</source>
          <year>2015</year>
          )
          <article-title>- Workshopband</article-title>
          . pp.
          <volume>203</volume>
          –
          <fpage>212</fpage>
          . Gesellschaft fur Informatik e.V.,
          <string-name>
            <surname>Bonn</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>