Automatic Evaluation of Employee Satisfaction

Marco Piersanti, Giulia Brandetti, Pierluigi Failla
Data Modeling and Analysis – Enel Italia S.R.L., Rome, Italy
{name}.{surname}@enel.com

Abstract

English. Human Resources are one of the most important assets in modern organizations. Their capability of facing employees' needs is critical in order to have an effective and efficient company, where people are the center of all business processes. This work is focused on developing new techniques that, leveraging a data driven approach, can help Human Resources to find a more precise employee satisfaction categorization, to easily identify possible issues and to act in a proactive fashion.

Italiano. Le Risorse Umane sono una delle funzioni più importanti nelle aziende moderne. La loro capacità di affrontare le necessità dei dipendenti è fondamentale per avere un'azienda efficiente, dove le persone sono al centro di tutti i processi di business. Il presente lavoro è focalizzato sullo sviluppo di nuove tecniche che, facendo leva su un approccio data driven, possano aiutare le Risorse Umane a dare una categorizzazione della soddisfazione dei dipendenti più precisa, ad identificare più facilmente possibili problemi condivisi e ad agire in maniera proattiva.

1 Introduction

Every modern organization has a dedicated function which takes care of its employees, commonly called Human Resources (HR). HR duties are related to the capability of creating value through people, ensuring that everyone can express his own potential and has a productive and comfortable office environment.

Nowadays, HR can rely on data to create a new paradigm based on a data driven approach, where analysts can leverage data in order to get more complete, detailed and data-supported decisions.

Being able to monitor employees' engagement and satisfaction is critical in order to maintain a positive and constructive office environment. The benefit for the company is in the capability of retaining the best employees and keeping the overall workforce strong and motivated. Furthermore, recent surveys (Globoforce, 2015) show the issues that companies face when they try to improve retention or engagement.

This paper is organized as follows. Section 2 presents a literature review on both themes of HR Management and text mining, Section 3 summarizes the motivations that drove the present study, Sections 4 and 5 discuss data and methodology, respectively, and Section 6 presents the results. Finally, Section 7 discusses the implications of the findings and further possible developments.

2 Related Works

Despite the great interest that is arising around the application of Data Science methods and Natural Language Processing (NLP) to HR problems, very few studies exist on the topic.

The entire field of corporate HR Management has been revolutionized by the pioneering work done by People Operations at Google (well described in Bock (2015)), which first put a spotlight on the benefits of having a more scientific and rigorous approach to these areas, which have been traditionally more reluctant to adopt change.

Employee satisfaction has been linked to long-run stock returns (Edmans, 2011), consistently with human relations theories which argue that employee satisfaction brings a stronger corporate performance through improved recruitment, retention, and motivation. Furthermore, Moniz and Jong (2014) followed an interesting approach to link employee satisfaction and firm earnings, based on sentiment analysis of employees' reviews from the career community website www.glassdoor.com.

Text clustering, and more generally text classification, is a well established topic in the NLP research area (Sebastiani, 2002; Aggarwal and Zhai, 2012; Kadhim et al., 2014). The automated categorization of texts, although dating back to the early '60s (Maron, 1961; Borko and Bernick, 1963), went through a booming interest in the last twenty years, due to the explosion of the amount of documents available in digital form and the impelling need to organize them. Nowadays text classification is used in many applications, ranging from automatic document indexing and automated metadata generation, to document filtering (e.g., spam filters (Drucker et al., 1999)), word sense disambiguation (Navigli, 2009), population of hierarchical catalogs of Web resources (Dumais and Chen, 2000), and in general any application requiring document understanding.

Having flourished in the last decade, sentiment analysis aims to classify the polarity of a given text – whether the opinion expressed in a document or a sentence is positive, negative, or neutral (Pang et al., 2002; Pang and Lee, 2008; Baccianella et al., 2010; Liu, 2012). The growing interest in the subject is reflected in the success of the sentiment analysis on Twitter tasks at SemEval since 2013 (Rosenthal et al., 2014; Rosenthal et al., 2015; Nakov, 2016). Even if the driving language for most of these techniques is English, an increasing trend can also be seen in Italy (Basile and Nissim, 2013; Basile et al., 2014; Basile et al., 2015), confirming the great interest of the Italian NLP community in sentiment analysis techniques.
3 Task Description

The job of Enel HR Business Partners (HR-BPs) consists of monitoring employees' well-being and acting, when necessary, to solve issues. In doing so, they periodically interview employees and register information about their satisfaction, motivation, work-life balance and other personal issues in textual notes. Currently, employees are manually classified by HR-BPs into three main categories: Demotivated, Neutral and Motivated. Unfortunately, employee motivation is not a very reliable indicator of employee well-being, since it may mask an underlying dissatisfaction or, more generally, the presence of issues that the HR department should act on. Indeed, one can face several problems in everyday office life and still be motivated. We therefore chose to consider the sentiment, as it shows through interviews, as a proxy of employee satisfaction.

With the present study, we aim to categorize employee satisfaction in a more detailed and automatic way, identifying common trends among employees and clustering them into groups that share similar problems. The goal is to help HR-BPs get an overall view of their resources' mood and make effective adjustments in critical situations. It will also help when new HR-BPs take over a group of already interviewed resources, allowing them to gain a clearer understanding of the employees and their criticalities without having to read all the interviews.

For all the aforementioned reasons, we performed a classification of the interviews based on their sentiment (Section 5.1) prior to sending them into the text clustering algorithm (Section 5.2). In the present study, we chose to focus only on negative moods, since they include the biggest issues HR should monitor. Nevertheless, the practical usage of this system involves the whole set of sentiment classes, since HR is interested in monitoring the well-being of the entire workforce and in following its evolution over time.

In choosing methods, we had to balance scientific rigor with the need for ease of interpretation and communication to all actors involved in the process. We therefore chose to use well understood and controllable techniques, like sentiment analysis and k-means clustering.

4 Experiments and Data

4.1 Data Description

HR System Integration provided the interview data: a file containing 53k textual notes, in more than 5 languages, taken by HR-BPs during interviews. Interviews spanned approximately one year, from June 2015 to July 2016, and were performed by 142 different HR-BPs.

For the present study, we focused only on Italian interviews (25k interviews) and selected a single interview for each employee (23k interviews), since in the few cases of repeated interviews the texts were not relevant (e.g., "See previous interview"). Notes shorter than 5 words (the 5th percentile of the distribution of the number of words per note) were considered irrelevant. As a result, the present study considers a dataset of 22k interviews.
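These selection steps can be reproduced with a few dataframe operations. The sketch below is a minimal Python/pandas illustration under an assumed schema (columns employee_id, language, date, text); it is not the actual pipeline or export format used in the study.

import pandas as pd

# Hypothetical schema: the real export from HR System Integration is not public.
notes = pd.read_csv("interviews.csv", parse_dates=["date"])

# Keep Italian interviews only.
italian = notes[notes["language"] == "it"]

# One interview per employee; the paper does not state which duplicate was
# kept, so here the earliest note is retained as an illustrative choice.
single = italian.sort_values("date").drop_duplicates("employee_id", keep="first")

# Discard notes shorter than 5 words (the 5th percentile of note length).
word_count = single["text"].str.split().str.len()
dataset = single[word_count >= 5]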
4.2 Data Preprocessing

Data preparation includes removing punctuation, numbers and stop words (we removed 300 common Italian stop words, including some peculiar words that are not relevant in this context, like "Enel", "colloquio", etc.), changing letters to lower case, and lemmatization (Schmid, 1994). We assumed all unrecognized words to be typos, and we corrected them by using a dictionary composed of 110k Italian words and 650 English words commonly used in daily business life (https://github.com/napolux/paroleitaliane). In order to have an effective correction, we used the Optimal String Alignment distance (Brill and Moore, 2000) (OSA distance), an extension of the Levenshtein distance that, together with insertion, deletion and substitution, includes transpositions among its allowable operations.
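For reference, the OSA distance is the restricted Damerau-Levenshtein distance and can be implemented directly with dynamic programming. The Python sketch below is self-contained; the correction policy (accept the nearest dictionary word, and only within a distance of 2) is an illustrative assumption rather than the exact rule applied in this study.

def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance: Levenshtein operations
    (insertion, deletion, substitution) plus transposition of adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def correct(word: str, dictionary: set, max_dist: int = 2) -> str:
    """Return the nearest dictionary word (illustrative correction policy)."""
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda w: osa_distance(word, w))
    return best if osa_distance(word, best) <= max_dist else word

For instance, correct("colega", {"collega", "capo"}) returns "collega" (one insertion away), while a token far from every dictionary entry is left unchanged.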
5 Model Description

5.1 Sentiment Analysis

We performed sentiment classification of the texts by customizing and improving a publicly available lexicon (https://github.com/opener-project/public-sentiment-lexicons). In total, we used 3428 labeled Italian unigrams and 10451 bigrams, categorized as positive (4736), neutral (4367) or negative (4776) based on their polarity.

The sentiment classification model proposed in this paper is based on a score ϕ_sent that weights unigrams and bigrams differently through a factor α:

ϕ_sent = (1 − α) · ϕ_uni + α · ϕ_bi

where 0 ≤ α ≤ 1, ϕ_uni is the difference between the number of positive and negative unigrams, normalized by the number of words in the text, and ϕ_bi is the difference between the number of positive and negative bigrams, normalized by the number of bigrams in the text. The final sentiment was then calculated according to the rule

Sent = +1 if ϕ_sent > θ,  −1 if ϕ_sent < −θ,  0 otherwise.

Model calibration (i.e. the choice of the parameters α and θ) was performed by comparing the model results with those produced by manually annotating a subset of 200 randomly chosen texts (training set): two judges classified the texts independently and a third one resolved the cases where there was no agreement. Agreement between the two independent judges was measured by calculating Cohen's Kappa (κ = 0.6).

We chose α = 0.7 and θ = 0.0004 so that accuracy, recall and precision of the sentiment model were maximized. Although we could have chosen to optimize the parameters in order to maximize the recognition of negative texts, we considered the overall accuracy on the three classes, because from a business perspective it is more valuable to monitor the satisfaction of the entire workforce and to follow its evolution over time. While for α we manually tried different settings, weighting bigrams more than unigrams, for θ we used the ROC curve and the area under it, picking the value with the maximal sum of true-positive and false-negative values.
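The scoring rule above translates almost directly into code. The following Python sketch assumes the lexicons are represented as dictionaries mapping unigrams and bigrams to +1 (positive) or −1 (negative); the customized lexicon itself is not reproduced here, and the defaults correspond to the calibrated values α = 0.7 and θ = 0.0004.

def sentiment(tokens, uni_lex, bi_lex, alpha=0.7, theta=0.0004):
    """Apply phi_sent = (1 - alpha) * phi_uni + alpha * phi_bi to a
    preprocessed note. uni_lex and bi_lex map unigrams and bigrams to
    +1 (positive) or -1 (negative); neutral entries can simply be omitted."""
    bigrams = list(zip(tokens, tokens[1:]))

    # Difference between positive and negative matches, normalized by length.
    phi_uni = sum(uni_lex.get(t, 0) for t in tokens) / max(len(tokens), 1)
    phi_bi = sum(bi_lex.get(b, 0) for b in bigrams) / max(len(bigrams), 1)

    phi_sent = (1 - alpha) * phi_uni + alpha * phi_bi
    if phi_sent > theta:
        return +1
    if phi_sent < -theta:
        return -1
    return 0

With α = 0.7, bigram matches dominate the score, consistent with the weighting described above.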
5.2 Text Clustering

For the clustering of the notes, we focused only on those classified as negative by the sentiment model (Section 5.1).

Since we did not have a target variable to model (unsupervised classification), we chose to adopt the k-means clustering algorithm, using the k-means++ technique to seed the initial cluster centers (Arthur and Vassilvitskii, 2007).

The clustering model was applied on the TF-IDF matrix, built with bigrams appearing in at least 2 documents. In this way, we reduced the dimensionality from the initial 37k bigrams to 5k. To calculate the proximity among documents, we used cosine similarity.

Additionally, the Silhouette score was used to select the best number of clusters: different models were computed by varying the number of clusters between 2 and 30 and the respective Silhouette scores were compared, fixing the number of clusters at 12 (corresponding to the highest score).
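This pipeline can be reproduced, for example, with scikit-learn. The sketch below mirrors the setup described above (bigram TF-IDF with a minimum document frequency of 2, k-means++ seeding, Silhouette scores on cosine distance for k between 2 and 30), but it is an assumed implementation rather than the code actually used in the study; negative_notes stands for the list of preprocessed negative interviews.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# `negative_notes` is the list of preprocessed negative interviews (strings).
vectorizer = TfidfVectorizer(ngram_range=(2, 2), min_df=2)  # bigrams in >= 2 docs
X = vectorizer.fit_transform(negative_notes)

# Compare Silhouette scores (on cosine distance) for k between 2 and 30;
# k-means++ seeding is used for the initial centers.
scores = {}
for k in range(2, 31):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric="cosine")

best_k = max(scores, key=scores.get)  # 12 in the present study
model = KMeans(n_clusters=best_k, init="k-means++", n_init=10,
               random_state=0).fit(X)

Note that standard k-means minimizes Euclidean distances; since scikit-learn's TF-IDF rows are L2-normalized by default, this behaves very similarly to clustering on cosine similarity.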
6 Results

The application of this sentiment model (Section 5.1) classified the interviews into 3655 negatives, 956 neutrals and 17297 positives. As we can see in Table 1, the sentiment classification is more clearly related to employee satisfaction than the motivation classes provided by HR-BPs, although the two are sometimes aligned.

Table 1: Examples of sentiment classification and comparison with HR-BP motivation classes.

Text (after preprocessing) | HR-BP Motivation | Sentiment
risorsa brillante neodirigente clima positivo ansioso molto positivo (brilliant resource new executive positive mood anxious very positive) | Motivated | +1
assenteista risorsa molto critico non riuscire nulla (absentee very critical resource don't succeed in anything) | Demotivated | -1
non valorizzare poco riconoscimento non potere rimanere (don't valorize inadequate recognition can't stay) | Motivated | -1
molto scontento non credere azienda reale meritocrazia interessare piano esodo (very unhappy don't believe company real meritocracy interest retirement plan) | Motivated | -1
stabile routinario non proattivo scarso impegno (stable routine not proactive scarce effort) | Neutral | -1
assumere direttamente assistente seguire particolare sicurezza vedere capo (hire directly assistant follow particular safety see boss) | Neutral | 0

A different subset of 200 manually labeled texts (test set), labeled with the same methodology described in Section 5.1, was used for evaluating model performance. Accuracy and recall were both 64%, while precision was 70%. For more details about the sentiment classification performance, see the confusion matrix in Table 2.

Table 2: Confusion matrix. True values here represent manually labeled texts.

True\Predicted | -1 | 0 | 1 | All
-1 | 12 | 11 | 3 | 26
0 | 3 | 20 | 18 | 41
1 | 1 | 37 | 95 | 133
All | 16 | 68 | 116 | 200

The clustering algorithm was applied only on the 2392 negative interviews and it identified 8 clusters that we were able to label precisely, while for the remaining 4 clusters labeling was unfeasible (see Table 3). Labels were applied by manually looking at the most frequent bigrams within the clusters, trying to identify common significant topics. The most frequently identified issues preventing employee satisfaction were health problems, the will to change activity, compensation and high workload. The most frequent bigrams for clusters 0–3 were not specific enough to lead to a precise labeling, since they refer to work activity and the job in general and do not point to clear issues.

Table 3: Clustering results. Cluster id, number of documents within clusters, cluster labels and most frequent bigrams inside clusters are shown. Labels were applied by manually looking at the most frequent bigrams within clusters.

Cluster id | Docs # | Label | Most frequent bigrams
0 | 382 | (NA) | lavoro svolgere (do work)
1 | 76 | (NA) | persona supporto (support person), supporto dipendente (employee support), carico lavoro (workload)
2 | 1985 | (NA) | lavoro piacere (enjoy work)
3 | 33 | (NA) | attività poco (activity low), solo attività (only activity), attività dovere (activity must)
4 | 149 | Workload | carico lavoro (workload), eccessivo carico (exaggerated load), lamentare eccessivo (complain about exaggerated)
5 | 297 | Health issues | problema salute (health issue), grave problema (difficult problem), serio problema (serious problem)
6 | 206 | Change activity | cambiare attività (change activity), volere cambiare (want to change)
7 | 81 | Low productivity | poco produttivo (low productivity)
8 | 67 | Not productive | rispetto compito (compliance with task), compito non produttivo (not productive task)
9 | 173 | Compensation | mancato riconoscimento (lacking recognition), lamentare mancato (complain about lacking)
10 | 134 | Don't change activity | svolgere attività (do activity), volere continuare (want to go on), continuare svolgere (keep doing)
11 | 72 | Change job | cambio attività (activity change), cambiare lavoro (change job)

In Figure 1, we represent the clustering results by means of t-SNE, a popular method for exploring high-dimensional data (Maaten and Hinton, 2008). In this way, we reduced the high-dimensional space of bigrams to an artificial two-dimensional space (since the dimensions here do not have a real meaning, we excluded them from the plot). For the sake of clarity, we chose not to show the unlabeled clusters; the resulting plot shows that the clusters are well separated and on average quite dense.

Figure 1: Clustering results represented with t-SNE. Only labeled clusters are shown.
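A plot along the lines of Figure 1 can be obtained, for instance, with scikit-learn's t-SNE implementation. The sketch below reuses the TF-IDF matrix X and the fitted k-means model from the clustering sketch above, and restricts the view to the labeled clusters (ids 4–11 in Table 3); it is an illustration, not the code used to produce the original figure.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Project the bigram TF-IDF space onto two artificial dimensions.
coords = TSNE(n_components=2, metric="cosine", init="random",
              random_state=0).fit_transform(X.toarray())

# Show only the labeled clusters (ids 4-11 in Table 3).
mask = np.isin(model.labels_, list(range(4, 12)))
plt.scatter(coords[mask, 0], coords[mask, 1],
            c=model.labels_[mask], s=5, cmap="tab10")
plt.xticks([]); plt.yticks([])  # t-SNE axes carry no intrinsic meaning
plt.show()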
7 Conclusions

The proposed approach could be a powerful tool for HR-BPs to better understand the main issues behind the lack of employee satisfaction. Furthermore, it could help HR analysts quickly decide which are the best actions to solve those issues, analyzing whether a complaint is isolated or shared by a group, whether it is trivial or urgent, and act accordingly. As an example, HR Departments could test different actions on a group of unsatisfied employees, in order to understand which one is the most effective for a given issue.

The very same model could also be used on neutral and positive subjects, so that HR could check whether the quality of life at work of these employees could be somehow improved, and understand which are the essential key factors for the employees' well-being.

From a technical point of view, one possible improvement to strengthen the solidity of the present approach could be to manually annotate a subset of (anonymized) texts, developing a gold standard of HR interview clusters to be used as a test set for techniques like the one presented in this study. This gold standard could be made available company-wide, in order to encourage collaboration and to foster the creation of a data science community, helping to bring a data driven way of thinking even to those areas which have been traditionally more reluctant to adopt a rigorous digital transformation.

This is a first step towards improving how HR Departments operate nowadays. We strongly believe that the introduction of a data driven approach can support critical HR decisional processes and improve companies' productivity, without having to sacrifice each individual's quality of life.

Acknowledgements

This research was supported by Enel. We thank our colleagues from the HR System Integration department who provided the data analyzed in this study.

References

Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text clustering algorithms. In Mining Text Data, pages 77–128. Springer.

David Arthur and Sergei Vassilvitskii. 2007. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the EVALITA 2014 sentiment polarity classification task. In Proceedings of the 4th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2014).

Pierpaolo Basile, Valerio Basile, Malvina Nissim, and Nicole Novielli. 2015. Deep tweets: from entity linking to sentiment analysis. In Proceedings of the Italian Computational Linguistics Conference (CLiC-it 2015).

Laszlo Bock. 2015. Work Rules!: Insights from Inside Google That Will Transform How You Live and Lead. Hachette UK.

Harold Borko and Myrna Bernick. 1963. Automatic document classification. Journal of the ACM (JACM), 10(2):151–162.

Eric Brill and Robert C. Moore. 2000. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL '00, pages 286–293. Association for Computational Linguistics.

Harris Drucker, Donghui Wu, and Vladimir N. Vapnik. 1999. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5):1048–1054.

Susan Dumais and Hao Chen. 2000. Hierarchical classification of Web content. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256–263. ACM.

Alex Edmans. 2011. Does the stock market fully value intangibles? Employee satisfaction and equity prices. Journal of Financial Economics, 101(3):621–640.

Globoforce. 2015. 2015 Employee Recognition Report – Culture as a Competitive Differentiator. Technical report.

Ammar Ismael Kadhim, Yu-N Cheah, and Nurul Hashimah Ahamed. 2014. Text document preprocessing and dimension reduction techniques for text document clustering. In 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, pages 69–73.

Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

Melvin Earl Maron. 1961. Automatic indexing: an experimental inquiry. Journal of the ACM (JACM), 8(3):404–417.

Andy Moniz and Franciska Jong. 2014. Sentiment analysis and the impact of employee satisfaction on firm earnings. In Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval – Volume 8416, ECIR 2014, pages 519–527, New York, NY, USA. Springer-Verlag New York, Inc.

Preslav Nakov. 2016. Sentiment analysis in Twitter: A SemEval perspective. In Proceedings of NAACL-HLT, pages 171–172.

Roberto Navigli. 2009. Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2):10.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, pages 79–86. Association for Computational Linguistics.

Sara Rosenthal, Alan Ritter, Preslav Nakov, and Veselin Stoyanov. 2014. SemEval-2014 Task 9: Sentiment analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 73–80, Dublin, Ireland.

Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif M. Mohammad, Alan Ritter, and Veselin Stoyanov. 2015. SemEval-2015 Task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 451–463.

Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 154–164.

Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.