<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Evaluation of Employee Satisfaction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Piersanti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Brandetti</string-name>
        </contrib>
        <aff>Data Modeling and Analysis - Enel Italia S.R.L., Rome, Italy</aff>
        <email>{name}.{surname}@enel.com</email>
      </contrib-group>
      <abstract>
        <p>English. Human Resources are one of the most important assets in modern organizations. Their ability to address employees' needs is critical to an effective and efficient company, where people are at the center of all business processes. This work focuses on developing new techniques that, leveraging a data-driven approach, can help Human Resources categorize employee satisfaction more precisely, identify possible issues easily and act proactively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Italian abstract, translated. Human Resources is one
of the most important functions in modern
companies. Its ability to address
employees' needs is fundamental
to an efficient company, where
people are at the center of all business
processes. This work focuses
on developing new techniques that,
leveraging a data-driven approach,
can help Human Resources
categorize employee satisfaction
more precisely, identify possible
shared problems more easily
and act proactively.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>Every modern organization has a dedicated
function that takes care of its employees, commonly
called Human Resources (HR). HR duties relate
to creating value through
people, ensuring that everyone can express their
own potential and has a productive and
comfortable office environment.</p>
      <p>Nowadays, HR can rely on data to create a new
paradigm based on a data-driven approach, in which
analysts leverage data to reach more
complete, detailed and well-supported decisions.</p>
      <p>
        Being able to monitor employees’ engagement
and satisfaction is critical in order to maintain a
positive and constructive office environment. The
benefit for the company is in the capability of
retaining the best employees and keeping the overall
workforce strong and motivated. Furthermore,
recent surveys
        <xref ref-type="bibr" rid="ref14">(Globoforce, 2015)</xref>
        highlight the issues
that companies face when they try to retain
employees or improve engagement.
      </p>
      <p>This paper is organized as follows. Section 2
presents a literature review on both themes of HR
Management and text mining, Section 3
summarizes the motivations that drove the present study,
Sections 4 and 5 discuss data and methodology,
respectively, and Section 6 presents the results.
Finally, Section 7 discusses the implications of the
findings and possible further developments.</p>
    </sec>
    <sec id="sec-3">
      <title>2 Related Works</title>
      <p>Despite the great interest that is arising around the
application of Data Science methods and Natural
Language Processing (NLP) to HR problems, very
few studies exist on the topic.</p>
      <p>The entire field of corporate HR Management
has been revolutionized by the pioneering work
done by People Operations at Google (well
described in Bock (2015)), which first put a spotlight
on the benefits of a more scientific and
rigorous approach to areas that have
traditionally been reluctant to adopt change.</p>
      <p>
        Employee satisfaction has been linked to
long-run stock returns
        <xref ref-type="bibr" rid="ref13">(Edmans, 2011)</xref>
        , consistent
with human relations theories, which argue that
employee satisfaction brings stronger
corporate performance through improved recruitment,
retention, and motivation. Furthermore, Moniz
and Jong (2014) followed an interesting approach
to link employee satisfaction and firm earnings,
based on sentiment analysis of employees’
reviews from the career community website
www.glassdoor.com.
      </p>
      <p>
        Text clustering, and more generally text
classification, is a well established topic in the NLP
research area
        <xref ref-type="bibr" rid="ref1 ref15 ref16 ref26">(Sebastiani, 2002; Aggarwal and
Zhai, 2012; Kadhim et al., 2014)</xref>
        . The automated
categorization of texts, although dating back to
the early ’60s
        <xref ref-type="bibr" rid="ref5 ref9">(Maron, 1961; Borko and Bernick,
1963)</xref>
        , has attracted booming interest over the last
twenty years, due to the explosion in the number
of documents available in digital form and the
pressing need to organize them. Nowadays text
classification is used in many applications,
ranging from automatic document indexing and
automated metadata generation, to document filtering
(e.g., spam filters
        <xref ref-type="bibr" rid="ref11">(Drucker et al., 1999)</xref>
        ), word
sense disambiguation
        <xref ref-type="bibr" rid="ref20">(Navigli, 2009)</xref>
        , population
of hierarchical catalogs of Web resources
        <xref ref-type="bibr" rid="ref10 ref12">(Dumais
and Chen, 2000)</xref>
        , and in general any application
requiring document understanding.
      </p>
      <p>
        Having flourished over the last decade, sentiment
analysis aims to classify the polarity of a given text, i.e.,
whether the opinion expressed in a document or
a sentence is positive, negative, or neutral
        <xref ref-type="bibr" rid="ref16 ref17 ref21 ref22 ref3">(Pang
et al., 2002; Pang and Lee, 2008; Baccianella et
al., 2010; Liu, 2012)</xref>
        . The growing interest in the
subject is reflected in the success of the
sentiment analysis tasks on Twitter data run at SemEval since
2013
        <xref ref-type="bibr" rid="ref19 ref23 ref24">(Rosenthal et al., 2014; Rosenthal et al.,
2015; Nakov, 2016)</xref>
        . Although the driving language
for most of these techniques is English, an
increasing trend can also be seen in Italy
        <xref ref-type="bibr" rid="ref4 ref6 ref7">(Basile and
Nissim, 2013; Basile et al., 2014; Basile et al.,
2015)</xref>
        , confirming the great interest of the Italian
NLP community in sentiment analysis techniques.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Task Description</title>
      <p>The job of Enel HR Business Partners (HR-BPs) consists
of monitoring employees’ well-being and acting when
necessary to solve issues. To do so, they
periodically interview employees and record
information about their satisfaction, motivation, work-life
balance and other personal issues in textual notes.</p>
      <p>Currently, employees are manually classified by
HR-BPs into three main categories: Demotivated,
Neutral and Motivated. Unfortunately, employee
motivation is not a very reliable indicator of
employee well-being, since it may mask an
underlying dissatisfaction or, more generally, the
presence of issues that the HR department should act on.
Indeed, one can face several problems in
everyday office life but still be motivated. We
therefore chose to consider the sentiment, as it emerges
from the interviews, as a proxy of employee
satisfaction.</p>
      <p>With the present study, we aim to categorize
employee satisfaction in a more detailed and
automatic way, identifying common trends among
employees and clustering them into groups that share
similar problems. The goal is to give HR-BPs
an overall view of their resources’ mood
and help them make effective adjustments in critical
situations. It will also help when
new HR-BPs take over a group of already
interviewed resources, giving them a clearer
understanding of the employees and their
critical issues without having to read all the interviews.</p>
      <p>For all the aforementioned reasons, we
classified the interviews by
sentiment (Section 5.1) before passing them
to the text clustering algorithm (Section 5.2). In
the present study, we chose to focus only on
negative moods, since they include the biggest issues
HR should monitor. Nevertheless, the practical
usage of this system involves the whole set of
sentiment classes, since HR is interested in monitoring
the well-being of the entire workforce and in following
its evolution over time.</p>
      <p>In choosing methods, we had to balance
scientific rigor against the need for
ease of interpretation and communication to all
actors involved in the process. We therefore chose to
use well-understood and controllable techniques,
such as sentiment analysis and k-means clustering.</p>
    </sec>
    <sec id="sec-5">
      <title>4 Experiments and Data</title>
      <sec id="sec-5-1">
        <title>4.1 Data Description</title>
        <p>The HR System Integration department provided the interview data:
a file containing 53k textual notes, in more than 5
languages, taken by HR-BPs during interviews.
The interviews spanned approximately one year, from June
2015 to July 2016, and were performed by
142 different HR-BPs.</p>
        <p>For the present study, we focused only on
Italian interviews (25k interviews) and selected a
single interview for each employee (23k interviews),
since in the few cases of repeated interviews the texts
were not relevant (e.g., “See previous interview”).</p>
        <p>Notes shorter than 5 words (the 5th percentile
of the distribution of the number of words per
note) were considered irrelevant. As a result, in
the present study we considered a dataset of 22k
interviews.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2 Data Preprocessing</title>
        <p>
          Data preparation includes removing punctuation,
numbers and stop words (we removed 300
common Italian stop words, including some
peculiar words that are not relevant in this context,
like “Enel”, “colloquio”, etc.), changing letters
to lower case and lemmatization
          <xref ref-type="bibr" rid="ref25">(Schmid, 1994)</xref>
          .
We assumed all unrecognized words to be
typos, and we corrected them using a
dictionary composed of 110k Italian words and 650
English words commonly used in everyday business
life<sup>1</sup>. To make the correction effective,
we used the Optimal String Alignment distance
          <xref ref-type="bibr" rid="ref10 ref12">(Brill
and Moore, 2000)</xref>
          (OSA distance), an extension of the
Levenshtein distance that, in addition to insertion,
deletion and substitution, includes transposition of adjacent characters
among its allowable operations.
        </p>
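        <p>As an illustration of the typo-correction step, the following is a minimal Python sketch of the OSA distance and of a nearest-dictionary-word lookup. It is not the code used in this study: the function names <monospace>osa_distance</monospace> and <monospace>correct</monospace>, the tie-breaking rule and the toy word list are assumptions made for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def correct(word, dictionary):
    """Return the dictionary word closest to `word` (hypothetical helper).

    Ties are broken by the order of `dictionary`; this is an assumption,
    the study does not state how ties were handled.
    """
    return min(dictionary, key=lambda entry: osa_distance(word, entry))


# Toy dictionary, for illustration only.
toy_dictionary = ["azienda", "agenda", "aziendale"]
suggestion = correct("azeinda", toy_dictionary)
```
]]></preformat>
        <p>In this sketch the swapped letters in "azeinda" cost a single transposition, whereas the plain Levenshtein distance would count two substitutions.</p>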
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Model Description</title>
      <sec id="sec-6-1">
        <title>5.1 Sentiment Analysis</title>
        <p>We performed sentiment classification of the texts by
customizing and improving a publicly available
lexicon<sup>2</sup>. In total, we used 3428 labeled Italian
unigrams and 10451 bigrams, categorized as positive
(4736), neutral (4367) or negative (4776) based on
their polarity.</p>
        <p>The sentiment classification model proposed in
this paper is based on a score φ<sub>sent</sub> that weights
unigrams and bigrams differently through a factor α:
<disp-formula><tex-math><![CDATA[\varphi_{sent} = (1 - \alpha)\,\varphi_{uni} + \alpha\,\varphi_{bi}]]></tex-math></disp-formula>
where 0 ≤ α ≤ 1, φ<sub>uni</sub> is the difference between
the number of positive and negative unigrams,
normalized by the number of words in the text, and φ<sub>bi</sub>
is the difference between the number of positive
and negative bigrams, normalized by the number
of bigrams in the text. The final sentiment was then
calculated according to the formula
<disp-formula><tex-math><![CDATA[Sent = \begin{cases} +1 & \text{if } \varphi_{sent} > \tau \\ -1 & \text{if } \varphi_{sent} < -\tau \\ 0 & \text{otherwise.} \end{cases}]]></tex-math></disp-formula></p>
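        <p>The scoring rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name <monospace>sentiment</monospace>, the parameter names <monospace>alpha</monospace> (bigram weight) and <monospace>tau</monospace> (threshold), the symmetric threshold, and the toy lexicon entries are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def sentiment(tokens, lexicon_uni, lexicon_bi, alpha=0.7, tau=0.0004):
    """Return (score, label) for a preprocessed list of tokens.

    lexicon_uni maps a word to its polarity (+1, 0 or -1);
    lexicon_bi maps a (word, word) pair to its polarity.
    Summing polarities gives (#positive - #negative).
    """
    pairs = list(zip(tokens, tokens[1:]))
    phi_uni = 0.0
    if tokens:
        phi_uni = sum(lexicon_uni.get(t, 0) for t in tokens) / len(tokens)
    phi_bi = 0.0
    if pairs:
        phi_bi = sum(lexicon_bi.get(p, 0) for p in pairs) / len(pairs)
    score = (1 - alpha) * phi_uni + alpha * phi_bi
    if score > tau:
        label = 1
    elif score < -tau:  # assumption: the threshold is symmetric around zero
        label = -1
    else:
        label = 0
    return score, label


# Toy lexicon entries, invented for the example.
toy_uni = {"scontento": -1, "positivo": 1}
toy_bi = {("molto", "scontento"): -1}

neg_score, neg_label = sentiment(["molto", "scontento", "non", "credere"], toy_uni, toy_bi)
pos_score, pos_label = sentiment(["clima", "positivo"], toy_uni, toy_bi)
neu_score, neu_label = sentiment(["vedere", "capo"], toy_uni, toy_bi)
```
]]></preformat>
        <p>With these toy entries the first note is labeled negative, the second positive, and the third, which contains no lexicon entry, neutral.</p>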
        <p>Model calibration (i.e., the choice of the weight α
and the threshold τ) was performed by comparing model
results with the ones produced by manually
annotating a subset of 200 randomly chosen texts
(training set): two judges classified the texts independently
and a third one resolved the cases where there was no
agreement. Agreement between the two
independent judges was measured by calculating Cohen’s
Kappa (κ = 0.6).</p>
        <p><sup>1</sup> https://github.com/napolux/paroleitaliane</p>
        <p><sup>2</sup> https://github.com/opener-project/public-sentiment-lexicons</p>
        <p>We chose a bigram weight α = 0.7 and a threshold τ = 0.0004 so that
accuracy, recall and precision of the sentiment model
were maximized. Although we could have
optimized the parameters to maximize the recognition of
negative texts, we chose to consider the
overall accuracy on the three classes, because from
a business perspective it is more valuable to
monitor the satisfaction of the entire workforce and to follow
its evolution over time. For α we manually tried
different settings, weighting bigrams more
than unigrams; for τ we used the ROC curve and
the area under it, picking the value with the maximal
sum of true-positive and false-negative values.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2 Text Clustering</title>
        <p>For the clustering of notes, we focused only on those
classified as negative by the sentiment model
(Section 5.1).</p>
        <p>
          Since we did not have a target variable to
model (unsupervised classification), we chose to
adopt the k-means clustering algorithm, using
the k-means++ technique to seed the initial cluster
centers
          <xref ref-type="bibr" rid="ref2">(Arthur and Vassilvitskii, 2007)</xref>
          .
        </p>
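        <p>The k-means++ seeding rule can be illustrated with the short Python sketch below. It is an assumption-laden example rather than the code used in the study: it uses squared Euclidean distance as in the original k-means++ formulation, and the names <monospace>squared_distance</monospace> and <monospace>kmeanspp_seeds</monospace> and the toy points are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeanspp_seeds(points, k, rng):
    """Choose k initial centers with the k-means++ rule."""
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to its weight.
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
seeds = kmeanspp_seeds(points, 2, random.Random(0))
```
]]></preformat>
        <p>Because an already chosen point has weight zero, it cannot be drawn again while other points remain, so distant points are favored as new centers.</p>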
        <p>The clustering model was applied to the
TF-IDF matrix, built with bigrams appearing in at
least 2 documents. In this way, we reduced the
dimensionality from the initial 37k bigrams to 5k.
To calculate proximity among documents, we used
cosine similarity.</p>
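        <p>A minimal Python sketch of this representation is given below. The TF-IDF variant is not specified in the study, so the weighting used here (raw term frequency times 1 + log(N/df)) is an assumption, as are the function names and the toy notes.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent token pairs of a preprocessed note."""
    return list(zip(tokens, tokens[1:]))


def tfidf_bigram_rows(docs, min_df=2):
    """Return (vocabulary, rows); each row maps a kept bigram to its weight."""
    counts = [Counter(bigrams(doc)) for doc in docs]
    # Document frequency: number of notes in which each bigram occurs.
    df = Counter(b for c in counts for b in c)
    vocabulary = sorted(b for b, n in df.items() if n >= min_df)
    kept = set(vocabulary)
    n_docs = len(docs)
    rows = [
        {b: tf * (1.0 + math.log(n_docs / df[b])) for b, tf in c.items() if b in kept}
        for c in counts
    ]
    return vocabulary, rows


def cosine_similarity(u, v):
    """Cosine similarity between two sparse rows stored as dicts."""
    dot = sum(w * v.get(b, 0.0) for b, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy preprocessed notes, for illustration only.
docs = [
    ["non", "valorizzare", "poco", "riconoscimento"],
    ["non", "valorizzare", "scarso", "impegno"],
    ["scarso", "impegno", "non", "proattivo"],
]
vocabulary, rows = tfidf_bigram_rows(docs)
```
]]></preformat>
        <p>In this toy example only the two bigrams that occur in at least two notes are kept, which mirrors how the document-frequency filter shrinks the feature space.</p>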
        <p>Additionally, the Silhouette score was
used to select the best number of clusters:
different models were computed by varying the number
of clusters between 2 and 30 and the respective
Silhouette scores were compared, fixing the
number of clusters at 12 (corresponding to the highest
score).</p>
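        <p>The selection of the number of clusters can be sketched as follows. This is a simplified illustration, not the code used in the study: the function names <monospace>silhouette_score</monospace> and <monospace>best_k</monospace> are hypothetical, the distance function is passed in by the caller, and the cluster labelings are assumed to come from separate clustering runs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def silhouette_score(points, labels, dist):
    """Mean Silhouette value of a labeling under the distance `dist`."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the Silhouette score needs at least two clusters")
    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        values.append((b - a) / largest if largest > 0 else 0.0)
    return sum(values) / len(values)


def best_k(points, labelings, dist):
    """Pick the k whose labeling has the highest Silhouette score.

    `labelings` maps k to the list of labels produced with k clusters.
    """
    return max(labelings, key=lambda k: silhouette_score(points, labelings[k], dist))


def absolute_difference(x, y):
    return abs(x - y)


# Toy one-dimensional data with two obvious groups.
toy_points = [0.0, 1.0, 10.0, 11.0]
toy_labelings = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
chosen_k = best_k(toy_points, toy_labelings, absolute_difference)
```
]]></preformat>
        <p>For document vectors, the distance argument could be one minus the cosine similarity, in line with the proximity measure used for clustering.</p>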
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Results</title>
      <p>The application of this sentiment model (Section
5.1) classified the interviews into 3655 negatives, 956
neutrals and 17297 positives. As we can see in
Table 1, sentiment classification is more clearly
related to employee satisfaction than the motivation
classes provided by HR-BPs, although the two are sometimes
aligned.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Text (after preprocessing)</th>
              <th>HR-BP Motivation</th>
              <th>Sentiment</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>risorsa brillante neodirigente clima positivo ansioso molto positivo (brilliant resource new executive positive mood anxious very positive)</td>
              <td>Motivated</td>
              <td>+1</td>
            </tr>
            <tr>
              <td>assenteista risorsa molto critico non riuscire nulla (absentee very critical resource don’t succeed in anything)</td>
              <td>Demotivated</td>
              <td>-1</td>
            </tr>
            <tr>
              <td>non valorizzare poco riconoscimento non potere rimanere (don’t valorize inadequate recognition can’t stay)</td>
              <td>Motivated</td>
              <td>-1</td>
            </tr>
            <tr>
              <td>molto scontento non credere azienda reale meritocrazia interessare piano esodo (very unhappy don’t believe company real meritocracy interest retirement plan)</td>
              <td>Motivated</td>
              <td>-1</td>
            </tr>
            <tr>
              <td>stabile routinario non proattivo scarso impegno (stable routine not proactive scarce effort)</td>
              <td>Neutral</td>
              <td>-1</td>
            </tr>
            <tr>
              <td>assumere direttamente assistente seguire particolare sicurezza vedere capo (hire directly assistant follow particular safety see boss)</td>
              <td>Neutral</td>
              <td>0</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>A different subset of 200 manually labeled texts
(test set), labeled with the same methodology
described in Section 5.1, was used to
evaluate model performance. Accuracy and recall were
both 64%, while precision was 70%. For more
details about the sentiment classification
performance, see the confusion matrix in Table 2.</p>
      <p>The clustering algorithm was applied only to
the 2392 negative interviews and it identified 8
clusters that we were able to label precisely, while
for the remaining 4 clusters labeling was
unfeasible (see Table 3). Labels were assigned by manually
inspecting the most frequent bigrams within each
cluster, trying to identify common significant topics.</p>
      <p>The most frequently identified issues preventing
employee satisfaction were health problems, the
desire to change activity, compensation and high
workload. The most frequent bigrams for clusters
0–3 were not specific enough to lead to a precise
labeling, since they refer to work activity and the job
in general and do not focus on clear issues.</p>
      <p>
        In Figure 1, we represented clustering results
by means of t-SNE, a popular method for
exploring high-dimensional data
        <xref ref-type="bibr" rid="ref17 ref21">(Maaten and
Hinton, 2008)</xref>
        . In this way, we reduced the
high-dimensional space of bigrams to an artificial
two-dimensional space (since the dimensions here
do not have a real meaning, we excluded them from
the plot). For the sake of clarity, we chose not to
show unlabeled clusters; the resulting plot shows
that clusters are well separated and on average
quite dense.
      </p>
      <p>The proposed approach could be a powerful tool
for HR-BPs to better understand the main issues
related to the lack of employee satisfaction.
Furthermore, it could help HR analysts to quickly
decide on the best actions to solve those
issues, analyzing whether a complaint is isolated or
shared by a group and whether it is trivial or urgent, and
acting accordingly. As an example, HR Departments
could test different actions on a group of
unsatisfied employees, in order to understand which one
is the most effective for a given issue.
      </p>
      <p>The very same model could also be used on
neutral and positive subjects, so that HR could check
whether the quality of life at work of these
employees could be improved, and
understand which factors are essential for
employees’ well-being.</p>
      <p>From a technical point of view, one possible
improvement to strengthen the
present approach would be to manually annotate a
subset of (anonymized) texts, developing a gold
standard of HR interview clusters to be used as
a test set for techniques like the one presented in
this study. This gold standard could be made
available company-wide, in order to encourage
collaboration and to foster the creation of a data science
community, helping to bring a data-driven way of
thinking even to those areas which have been
traditionally more reluctant to adopt a rigorous digital
transformation.</p>
      <p>This is a first step toward improving how HR
Departments operate today. We strongly believe that
the introduction of a data-driven approach can
support critical HR decision-making processes and improve
companies’ productivity, without
sacrificing each individual’s quality of life.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This research was supported by Enel. We thank
our colleagues from the HR System Integration department,
who provided the data analyzed in this study.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Charu C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>A survey of text clustering algorithms</article-title>
          .
          <source>In Mining text data</source>
          , pages
          <fpage>77</fpage>
          -
          <lpage>128</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Arthur</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergei</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>k-means++: The advantages of careful seeding</article-title>
          .
          <source>In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms</source>
          , pages
          <fpage>1027</fpage>
          -
          <lpage>1035</lpage>
          . Society for Industrial and Applied Mathematics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Baccianella</surname>
          </string-name>
          , Andrea Esuli, and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</article-title>
          .
          <source>In LREC</source>
          , volume
          <volume>10</volume>
          , pages
          <fpage>2200</fpage>
          -
          <lpage>2204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sentiment analysis on Italian tweets</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Melvin Earl</given-names>
            <surname>Maron</surname>
          </string-name>
          .
          <year>1961</year>
          .
          <article-title>Automatic indexing: an experimental inquiry</article-title>
          .
          <source>Journal of the ACM (JACM)</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>404</fpage>
          -
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Andrea Bolioli, Malvina Nissim, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overview of the EVALITA 2014 sentiment polarity classification task</article-title>
          .
          <source>In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'14).</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Valerio Basile, Malvina Nissim, and
          <string-name>
            <given-names>Nicole</given-names>
            <surname>Novielli</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deep tweets: from entity linking to sentiment analysis</article-title>
          .
          <source>In Proceedings of the Italian Computational Linguistics Conference (CLiC-it 2015)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Laszlo</given-names>
            <surname>Bock</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Work rules!: Insights from inside Google that will transform how you live and lead</article-title>
          .
          <source>Hachette UK.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Harold</given-names>
            <surname>Borko</surname>
          </string-name>
          and
          <string-name>
            <given-names>Myrna</given-names>
            <surname>Bernick</surname>
          </string-name>
          .
          <year>1963</year>
          .
          <article-title>Automatic document classification</article-title>
          .
          <source>Journal of the ACM (JACM)</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>151</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Eric</given-names>
            <surname>Brill</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>An improved error model for noisy channel spelling correction</article-title>
          .
          <source>In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL '00</source>
          , pages
          <fpage>286</fpage>
          -
          <lpage>293</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Harris</given-names>
            <surname>Drucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Donghui</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vladimir N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Support vector machines for spam categorization</article-title>
          .
          <source>IEEE Transactions on Neural networks</source>
          ,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1048</fpage>
          -
          <lpage>1054</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Susan</given-names>
            <surname>Dumais</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hao</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Hierarchical classification of web content</article-title>
          .
          <source>In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>256</fpage>
          -
          <lpage>263</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Alex</given-names>
            <surname>Edmans</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Does the stock market fully value intangibles? Employee satisfaction and equity prices</article-title>
          .
          <source>Journal of Financial Economics</source>
          ,
          <volume>101</volume>
          (
          <issue>3</issue>
          ):
          <fpage>621</fpage>
          -
          <lpage>640</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Globoforce</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>2015 employee recognition report - culture as a competitive differentiator</article-title>
          .
          <source>Technical report.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ammar Ismael</given-names>
            <surname>Kadhim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu-N</given-names>
            <surname>Cheah</surname>
          </string-name>
          , and Nurul Hashimah Ahamed.
          <year>2014</year>
          .
          <article-title>Text document preprocessing and dimension reduction techniques for text document clustering</article-title>
          .
          <source>In 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Sentiment analysis and opinion mining</article-title>
          .
          <source>Synthesis lectures on human language technologies</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Laurens</given-names>
            <surname>van der Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          (Nov):
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Andy</given-names>
            <surname>Moniz</surname>
          </string-name>
          and Franciska Jong.
          <year>2014</year>
          .
          <article-title>Sentiment analysis and the impact of employee satisfaction on firm earnings</article-title>
          .
          <source>In Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval</source>
          - Volume
          <volume>8416</volume>
          , ECIR 2014, pages
          <fpage>519</fpage>
          -
          <lpage>527</lpage>
          , New York, NY, USA. Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Sentiment analysis in Twitter: A SemEval perspective</article-title>
          .
          <source>In Proceedings of NAACL-HLT</source>
          , pages
          <fpage>171</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Word sense disambiguation: A survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shivakumar</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing</source>
          - Volume
          <volume>10</volume>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Rosenthal</surname>
          </string-name>
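          ,
          <string-name>
            <given-names>Alan</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          , and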
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>SemEval-2014 task 9: Sentiment analysis in Twitter</article-title>
          .
          <source>In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>80</lpage>
          . Dublin, Ireland.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Rosenthal</surname>
          </string-name>
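          ,
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Svetlana</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Saif M.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alan</given-names>
            <surname>Ritter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>SemEval-2015 task 10: Sentiment analysis in Twitter</article-title>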
          .
          <source>In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          , pages
          <fpage>451</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Probabilistic part-of-speech tagging using decision trees</article-title>
          .
          <source>In Proceedings of International Conference on New Methods in Language Processing</source>
          , pages
          <fpage>154</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Machine learning in automated text categorization</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>