Automatic Evaluation of Employee Satisfaction

Marco Piersanti, Giulia Brandetti, Pierluigi Failla
Data Modeling and Analysis – Enel Italia S.R.L., Rome, Italy
{name}.{surname}@enel.com

Abstract

English. Human Resources are one of the most important assets in modern organizations. Their capability of facing employees' needs is critical in order to have an effective and efficient company, where people are the center of all business processes. This work is focused on developing new techniques that, leveraging a data driven approach, can help Human Resources to find a more precise employee satisfaction categorization, to easily identify possible issues and to act in a proactive fashion.

Italiano. Le Risorse Umane sono una delle funzioni più importanti nelle aziende moderne. La loro capacità di affrontare le necessità dei dipendenti è fondamentale per avere un'azienda efficiente, dove le persone sono al centro di tutti i processi di business. Il presente lavoro è focalizzato sullo sviluppo di nuove tecniche che, facendo leva su un approccio data driven, possano aiutare le Risorse Umane a dare una categorizzazione della soddisfazione dei dipendenti più precisa, ad identificare più facilmente possibili problemi condivisi e ad agire in maniera proattiva.

1 Introduction

Every modern organization has a dedicated function which takes care of its employees, commonly called Human Resources (HR). HR duties are related to the capability of creating value through people, ensuring that everyone can express his own potential and has a productive and comfortable office environment.

Nowadays, HR can rely on data to create a new paradigm based on a data driven approach, where analysts can leverage data in order to get more complete, detailed and data-supported decisions.

Being able to monitor employees' engagement and satisfaction is critical in order to maintain a positive and constructive office environment. The benefit for the company is in the capability of retaining the best employees and keeping the overall workforce strong and motivated. Furthermore, recent surveys (Globoforce, 2015) show the issues that companies face when they try to improve retention or engagement.

This paper is organized as follows. Section 2 presents a literature review on both themes of HR Management and text mining, Section 3 summarizes the motivations that drove the present study, Sections 4 and 5 discuss data and methodology, respectively, and Section 6 presents the results. Finally, Section 7 discusses the implications of the findings and further possible developments.

2 Related Works

Despite the great interest that is arising around the application of Data Science methods and Natural Language Processing (NLP) to HR problems, very few studies exist on the topic.

The entire field of corporate HR Management has been revolutionized by the pioneering work done by People Operations at Google (well described in Bock (2015)), which first put a spotlight on the benefits of having a more scientific and rigorous approach to these areas, which have been traditionally more reluctant to adopt change.

Employee satisfaction has been linked to long-run stock returns (Edmans, 2011), consistently with human relations theories which argue that employee satisfaction brings a stronger corporate performance through improved recruitment, retention, and motivation. Furthermore, Moniz and Jong (2014) followed an interesting approach to link employee satisfaction and firm earnings, based on sentiment analysis of employees' reviews from the career community website www.glassdoor.com.

Text clustering, and more generally text classification, is a well established topic in the NLP research area (Sebastiani, 2002; Aggarwal and Zhai, 2012; Kadhim et al., 2014). The automated categorization of texts, although dating back to the early '60s (Maron, 1961; Borko and Bernick, 1963), went through a booming interest in the last twenty years, due to the explosion of the amount of documents available in digital form and the impelling need to organize them. Nowadays text classification is used in many applications, ranging from automatic document indexing and automated metadata generation, to document filtering (e.g., spam filters (Drucker et al., 1999)), word sense disambiguation (Navigli, 2009), population of hierarchical catalogs of Web resources (Dumais and Chen, 2000), and in general any application requiring document understanding.

Having flourished in the last decade, sentiment analysis aims to classify the polarity of a given text – whether the opinion expressed in a document or a sentence is positive, negative, or neutral (Pang et al., 2002; Pang and Lee, 2008; Baccianella et al., 2010; Liu, 2012). The growing interest in the subject is reflected in the success of the sentiment analysis on Twitter tasks at SemEval since 2013 (Rosenthal et al., 2014; Rosenthal et al., 2015; Nakov, 2016). Even if the driving language for most of these techniques is English, an increasing trend can also be seen in Italy (Basile and Nissim, 2013; Basile et al., 2014; Basile et al., 2015), confirming the great interest of the Italian NLP community in sentiment analysis techniques.
3 Task Description

The job of Enel HR Business Partners (HR-BPs) consists of monitoring employees' well-being and acting, when necessary, to solve issues. In doing so, they periodically interview employees and register information about their satisfaction, motivation, work-life balance and other personal issues in textual notes. Currently, employees are manually classified by HR-BPs into three main categories: Demotivated, Neutral and Motivated. Unfortunately, employee motivation is not a very reliable indicator of employee well-being, since it may mask an underlying dissatisfaction or, more generally, the presence of issues that the HR department should act on. Indeed, one can face several problems in everyday office life and still be motivated. We therefore chose to consider the sentiment, as it shows through interviews, as a proxy of employee satisfaction.

With the present study, we aim to categorize employee satisfaction in a more detailed and automatic way, identifying common trends among employees and clustering them into groups that share similar problems. The goal is to help HR-BPs get an overall view of their resources' mood and make effective adjustments in critical situations. It will also help when new HR-BPs take over a group of already interviewed resources, allowing them to gain a clearer understanding of the employees and their criticalities without having to read all the interviews.

For all the aforementioned reasons, we performed a classification of the interviews based on their sentiment (Section 5.1) prior to sending them into the text clustering algorithm (Section 5.2). In the present study, we chose to focus only on negative moods, since they include the biggest issues HR should monitor. Nevertheless, the practical usage of this system involves the whole set of sentiment classes, since HR is interested in monitoring the well-being of the entire workforce and in following its evolution over time.

In choosing methods, we had to balance scientific rigor with the need for ease of interpretation and communication to all actors involved in the process. We therefore chose to use well understood and controllable techniques, like sentiment analysis and k-means clustering.

4 Experiments and Data

4.1 Data Description

HR System Integration provided the interview data: a file containing 53k textual notes, in more than 5 languages, taken by HR-BPs during interviews. Interviews spanned approximately one year, from June 2015 to July 2016, and were performed by 142 different HR-BPs.

For the present study, we focused only on Italian interviews (25k interviews) and selected a single interview for each employee (23k interviews), since in the few cases of repeated interviews the texts were not relevant (e.g., "See previous interview"). Notes shorter than 5 words (the 5th percentile of the distribution of the number of words per note) were considered irrelevant. As a result, the present study considers a dataset of 22k interviews.
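These selection steps can be reproduced with a few dataframe operations. The sketch below is a minimal Python/pandas illustration under an assumed schema (columns employee_id, language, date, text); it is not the actual pipeline or export format used in the study.

import pandas as pd

# Hypothetical schema: the real export from HR System Integration is not public.
notes = pd.read_csv("interviews.csv", parse_dates=["date"])

# Keep Italian interviews only.
italian = notes[notes["language"] == "it"]

# One interview per employee; the paper does not state which duplicate was
# kept, so here the earliest note is retained as an illustrative choice.
single = italian.sort_values("date").drop_duplicates("employee_id", keep="first")

# Discard notes shorter than 5 words (the 5th percentile of note length).
word_count = single["text"].str.split().str.len()
dataset = single[word_count >= 5]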
4.2 Data Preprocessing

Data preparation includes removing punctuation, numbers and stop words (we removed 300 common Italian stop words, including some peculiar words that are not relevant in this context, like "Enel", "colloquio", etc.), changing letters to lower case, and lemmatization (Schmid, 1994). We assumed all unrecognized words to be typos, and we corrected them by using a dictionary composed of 110k Italian words and 650 English words commonly used in daily business life (https://github.com/napolux/paroleitaliane). In order to have an effective correction, we used the Optimal String Alignment distance (Brill and Moore, 2000) (OSA distance), an extension of the Levenshtein distance that, together with insertion, deletion and substitution, includes transpositions among its allowable operations.
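For reference, the OSA distance is the restricted Damerau-Levenshtein distance and can be implemented directly with dynamic programming. The Python sketch below is self-contained; the correction policy (accept the nearest dictionary word, and only within a distance of 2) is an illustrative assumption rather than the exact rule applied in this study.

def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance: Levenshtein operations
    (insertion, deletion, substitution) plus transposition of adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def correct(word: str, dictionary: set, max_dist: int = 2) -> str:
    """Return the nearest dictionary word (illustrative correction policy)."""
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda w: osa_distance(word, w))
    return best if osa_distance(word, best) <= max_dist else word

For instance, correct("colega", {"collega", "capo"}) returns "collega" (one insertion away), while a token far from every dictionary entry is left unchanged.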
5 Model Description

5.1 Sentiment Analysis

We performed sentiment classification of the texts by customizing and improving a publicly available lexicon (https://github.com/opener-project/public-sentiment-lexicons). In total, we used 3428 labeled Italian unigrams and 10451 bigrams, categorized as positive (4736), neutral (4367) or negative (4776) based on their polarity.

The sentiment classification model proposed in this paper is based on a score ϕ_sent that weights unigrams and bigrams differently through a factor α:

ϕ_sent = (1 − α) · ϕ_uni + α · ϕ_bi

where 0 ≤ α ≤ 1, ϕ_uni is the difference between the number of positive and negative unigrams, normalized by the number of words in the text, and ϕ_bi is the difference between the number of positive and negative bigrams, normalized by the number of bigrams in the text. The final sentiment was then calculated according to the rule

Sent = +1 if ϕ_sent > θ,  −1 if ϕ_sent < −θ,  0 otherwise.

Model calibration (i.e. the choice of the parameters α and θ) was performed by comparing the model results with those produced by manually annotating a subset of 200 randomly chosen texts (training set): two judges classified the texts independently and a third one resolved the cases where there was no agreement. Agreement between the two independent judges was measured by calculating Cohen's Kappa (κ = 0.6).

We chose α = 0.7 and θ = 0.0004 so that accuracy, recall and precision of the sentiment model were maximized. Although we could have chosen to optimize the parameters in order to maximize the recognition of negative texts, we considered the overall accuracy on the three classes, because from a business perspective it is more valuable to monitor the satisfaction of the entire workforce and to follow its evolution over time. While for α we manually tried different settings, weighting bigrams more than unigrams, for θ we used the ROC curve and the area under it, picking the value with the maximal sum of true-positive and false-negative values.
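The scoring rule above translates almost directly into code. The following Python sketch assumes the lexicons are represented as dictionaries mapping unigrams and bigrams to +1 (positive) or −1 (negative); the customized lexicon itself is not reproduced here, and the defaults correspond to the calibrated values α = 0.7 and θ = 0.0004.

def sentiment(tokens, uni_lex, bi_lex, alpha=0.7, theta=0.0004):
    """Apply phi_sent = (1 - alpha) * phi_uni + alpha * phi_bi to a
    preprocessed note. uni_lex and bi_lex map unigrams and bigrams to
    +1 (positive) or -1 (negative); neutral entries can simply be omitted."""
    bigrams = list(zip(tokens, tokens[1:]))

    # Difference between positive and negative matches, normalized by length.
    phi_uni = sum(uni_lex.get(t, 0) for t in tokens) / max(len(tokens), 1)
    phi_bi = sum(bi_lex.get(b, 0) for b in bigrams) / max(len(bigrams), 1)

    phi_sent = (1 - alpha) * phi_uni + alpha * phi_bi
    if phi_sent > theta:
        return +1
    if phi_sent < -theta:
        return -1
    return 0

With α = 0.7, bigram matches dominate the score, consistent with the weighting described above.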
5.2 Text Clustering

For the clustering of the notes, we focused only on those classified as negative by the sentiment model (Section 5.1).

Since we did not have a target variable to model (unsupervised classification), we chose to adopt the k-means clustering algorithm, using the k-means++ technique to seed the initial cluster centers (Arthur and Vassilvitskii, 2007).

The clustering model was applied on the TF-IDF matrix, built with bigrams appearing in at least 2 documents. In this way, we reduced the dimensionality from the initial 37k bigrams to 5k. To calculate the proximity among documents, we used cosine similarity.

Additionally, the Silhouette score was used to select the best number of clusters: different models were computed by varying the number of clusters between 2 and 30 and the respective Silhouette scores were compared, fixing the number of clusters at 12 (corresponding to the highest score).
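This pipeline can be reproduced, for example, with scikit-learn. The sketch below mirrors the setup described above (bigram TF-IDF with a minimum document frequency of 2, k-means++ seeding, Silhouette scores on cosine distance for k between 2 and 30), but it is an assumed implementation rather than the code actually used in the study; negative_notes stands for the list of preprocessed negative interviews.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# `negative_notes` is the list of preprocessed negative interviews (strings).
vectorizer = TfidfVectorizer(ngram_range=(2, 2), min_df=2)  # bigrams in >= 2 docs
X = vectorizer.fit_transform(negative_notes)

# Compare Silhouette scores (on cosine distance) for k between 2 and 30;
# k-means++ seeding is used for the initial centers.
scores = {}
for k in range(2, 31):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric="cosine")

best_k = max(scores, key=scores.get)  # 12 in the present study
model = KMeans(n_clusters=best_k, init="k-means++", n_init=10,
               random_state=0).fit(X)

Note that standard k-means minimizes Euclidean distances; since scikit-learn's TF-IDF rows are L2-normalized by default, this behaves very similarly to clustering on cosine similarity.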
6 Results

The application of this sentiment model (Section 5.1) classified the interviews into 3655 negatives, 956 neutrals and 17297 positives. As we can see in Table 1, the sentiment classification is more clearly related to employee satisfaction than the motivation classes provided by HR-BPs, although the two are sometimes aligned.

Table 1: Examples of sentiment classification and comparison with HR-BP motivation classes.

Text (after preprocessing) | HR-BP Motivation | Sentiment
risorsa brillante neodirigente clima positivo ansioso molto positivo (brilliant resource new executive positive mood anxious very positive) | Motivated | +1
assenteista risorsa molto critico non riuscire nulla (absentee very critical resource don't succeed in anything) | Demotivated | -1
non valorizzare poco riconoscimento non potere rimanere (don't valorize inadequate recognition can't stay) | Motivated | -1
molto scontento non credere azienda reale meritocrazia interessare piano esodo (very unhappy don't believe company real meritocracy interest retirement plan) | Motivated | -1
stabile routinario non proattivo scarso impegno (stable routine not proactive scarce effort) | Neutral | -1
assumere direttamente assistente seguire particolare sicurezza vedere capo (hire directly assistant follow particular safety see boss) | Neutral | 0

A different subset of 200 manually labeled texts (test set), labeled with the same methodology described in Section 5.1, was used for evaluating model performance. Accuracy and recall were both 64%, while precision was 70%. For more details about the sentiment classification performance, see the confusion matrix in Table 2.

Table 2: Confusion matrix. True values here represent manually labeled texts.

True\Predicted | -1 | 0 | 1 | All
-1 | 12 | 11 | 3 | 26
0 | 3 | 20 | 18 | 41
1 | 1 | 37 | 95 | 133
All | 16 | 68 | 116 | 200

The clustering algorithm was applied only on the 2392 negative interviews and it identified 8 clusters that we were able to label precisely, while for the remaining 4 clusters labeling was unfeasible (see Table 3). Labels were applied by manually looking at the most frequent bigrams within the clusters, trying to identify common significant topics. The most frequently identified issues preventing employee satisfaction were health problems, the will to change activity, compensation and high workload. The most frequent bigrams for clusters 0–3 were not specific enough to lead to a precise labeling, since they refer to work activity and the job in general and do not point to clear issues.

Table 3: Clustering results. Cluster id, number of documents within clusters, cluster labels and most frequent bigrams inside clusters are shown. Labels were applied by manually looking at the most frequent bigrams within clusters.

Cluster id | Docs # | Label | Most frequent bigrams
0 | 382 | (NA) | lavoro svolgere (do work)
1 | 76 | (NA) | persona supporto (support person), supporto dipendente (employee support), carico lavoro (workload)
2 | 1985 | (NA) | lavoro piacere (enjoy work)
3 | 33 | (NA) | attività poco (activity low), solo attività (only activity), attività dovere (activity must)
4 | 149 | Workload | carico lavoro (workload), eccessivo carico (exaggerated load), lamentare eccessivo (complain about exaggerated)
5 | 297 | Health issues | problema salute (health issue), grave problema (difficult problem), serio problema (serious problem)
6 | 206 | Change activity | cambiare attività (change activity), volere cambiare (want to change)
7 | 81 | Low productivity | poco produttivo (low productivity)
8 | 67 | Not productive | rispetto compito (compliance with task), compito non produttivo (not productive task)
9 | 173 | Compensation | mancato riconoscimento (lacking recognition), lamentare mancato (complain about lacking)
10 | 134 | Don't change activity | svolgere attività (do activity), volere continuare (want to go on), continuare svolgere (keep doing)
11 | 72 | Change job | cambio attività (activity change), cambiare lavoro (change job)

In Figure 1, we represent the clustering results by means of t-SNE, a popular method for exploring high-dimensional data (Maaten and Hinton, 2008). In this way, we reduced the high-dimensional space of bigrams to an artificial two-dimensional space (since the dimensions here do not have a real meaning, we excluded them from the plot). For the sake of clarity, we chose not to show the unlabeled clusters; the resulting plot shows that the clusters are well separated and on average quite dense.

Figure 1: Clustering results represented with t-SNE. Only labeled clusters are shown.
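A plot along the lines of Figure 1 can be obtained, for instance, with scikit-learn's t-SNE implementation. The sketch below reuses the TF-IDF matrix X and the fitted k-means model from the clustering sketch above, and restricts the view to the labeled clusters (ids 4–11 in Table 3); it is an illustration, not the code used to produce the original figure.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Project the bigram TF-IDF space onto two artificial dimensions.
coords = TSNE(n_components=2, metric="cosine", init="random",
              random_state=0).fit_transform(X.toarray())

# Show only the labeled clusters (ids 4-11 in Table 3).
mask = np.isin(model.labels_, list(range(4, 12)))
plt.scatter(coords[mask, 0], coords[mask, 1],
            c=model.labels_[mask], s=5, cmap="tab10")
plt.xticks([]); plt.yticks([])  # t-SNE axes carry no intrinsic meaning
plt.show()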
7 Conclusions

The proposed approach could be a powerful tool for HR-BPs to better understand the main issues behind the lack of employee satisfaction. Furthermore, it could help HR analysts quickly decide which are the best actions to solve those issues, analyzing whether a complaint is isolated or shared by a group, whether it is trivial or urgent, and act accordingly. As an example, HR Departments could test different actions on a group of unsatisfied employees, in order to understand which one is the most effective for a given issue.

The very same model could also be used on neutral and positive subjects, so that HR could check whether the quality of life at work of these employees could be somehow improved, and understand which are the essential key factors for the employees' well-being.

From a technical point of view, one possible improvement to strengthen the solidity of the present approach could be to manually annotate a subset of (anonymized) texts, developing a gold standard of HR interview clusters to be used as a test set for techniques like the one presented in this study. This gold standard could be made available company-wide, in order to encourage collaboration and to foster the creation of a data science community, helping to bring a data driven way of thinking even to those areas which have been traditionally more reluctant to adopt a rigorous digital transformation.

This is a first step towards improving how HR Departments operate nowadays. We strongly believe that the introduction of a data driven approach can support critical HR decisional processes and improve companies' productivity, without having to sacrifice each individual's quality of life.

Acknowledgements

This research was supported by Enel. We thank our colleagues from the HR System Integration department who provided the data analyzed in this study.

References

Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text clustering algorithms. In Mining Text Data, pages 77–128. Springer.

David Arthur and Sergei Vassilvitskii. 2007. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the EVALITA 2014 sentiment polarity classification task. In Proceedings of the 4th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2014).

Pierpaolo Basile, Valerio Basile, Malvina Nissim, and Nicole Novielli. 2015. Deep tweets: from entity linking to sentiment analysis. In Proceedings of the Italian Computational Linguistics Conference (CLiC-it 2015).

Laszlo Bock. 2015. Work Rules!: Insights from Inside Google That Will Transform How You Live and Lead. Hachette UK.

Harold Borko and Myrna Bernick. 1963. Automatic document classification. Journal of the ACM (JACM), 10(2):151–162.

Eric Brill and Robert C. Moore. 2000. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL '00, pages 286–293. Association for Computational Linguistics.

Harris Drucker, Donghui Wu, and Vladimir N. Vapnik. 1999. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5):1048–1054.

Susan Dumais and Hao Chen. 2000. Hierarchical classification of Web content. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256–263. ACM.

Alex Edmans. 2011. Does the stock market fully value intangibles? Employee satisfaction and equity prices. Journal of Financial Economics, 101(3):621–640.

Globoforce. 2015. 2015 Employee Recognition Report – Culture as a Competitive Differentiator. Technical report.

Ammar Ismael Kadhim, Yu-N Cheah, and Nurul Hashimah Ahamed. 2014. Text document preprocessing and dimension reduction techniques for text document clustering. In 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, pages 69–73.

Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

Melvin Earl Maron. 1961. Automatic indexing: an experimental inquiry. Journal of the ACM (JACM), 8(3):404–417.

Andy Moniz and Franciska Jong. 2014. Sentiment analysis and the impact of employee satisfaction on firm earnings. In Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval – Volume 8416, ECIR 2014, pages 519–527, New York, NY, USA. Springer-Verlag New York, Inc.

Preslav Nakov. 2016. Sentiment analysis in Twitter: A SemEval perspective. In Proceedings of NAACL-HLT, pages 171–172.

Roberto Navigli. 2009. Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2):10.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, pages 79–86. Association for Computational Linguistics.

Sara Rosenthal, Alan Ritter, Preslav Nakov, and Veselin Stoyanov. 2014. SemEval-2014 Task 9: Sentiment analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 73–80, Dublin, Ireland.

Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif M. Mohammad, Alan Ritter, and Veselin Stoyanov. 2015. SemEval-2015 Task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 451–463.

Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 154–164.

Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.