Gender Differences in Early Career Performance Reviews: a Text Mining Study

Shivangi Chopra¹, Lukasz Golab¹
¹ University of Waterloo, Canada

Abstract
It is well known that fewer women than men earn STEM degrees and persist in STEM careers. Since early career experiences affect career attrition, we investigate gender differences in early career performance reviews. Our analysis is enabled by a unique dataset, with nearly 6,000 performance reviews of undergraduate engineering students participating in co-operative internships. Text mining of workplace supervisor comments included in the reviews reveals several gender differences. Male students are more likely to be described as eager, efficient, and independent, whereas female students are perceived as thorough and collaborative. Moreover, male students are more likely to be asked to improve their interpersonal skills, whereas female students are more likely to receive suggestions to improve their business knowledge. Our results thus suggest that men and women are perceived differently in the STEM workplace from the beginning of their careers.

Keywords
gender gap in STEM, co-operative internships, text mining

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK
s9chopra@uwaterloo.ca (S. Chopra); lgolab@uwaterloo.ca (L. Golab)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

The gender gap in Science, Technology, Engineering, and Mathematics (STEM) is well-documented: studies show that fewer women apply to STEM programs [1], obtain STEM degrees [2], and continue with STEM careers [2, 3]. Workplace experiences, especially early career experiences, are known to drive career attrition [4, 3]. We therefore ask the following research question: Are there gender differences in early career performance reviews?

To answer this question, we analyze workplace performance reviews of students from a large North American university participating in co-operative (co-op) internships. Co-op programs in STEM fields have become popular worldwide, and allow students to alternate between academic study terms and work internships. For many students, co-op internships are the first career experiences in the engineering workplace.

The dataset we analyze consists of nearly 6,000 performance reviews from the 2015/2016 academic year given to undergraduate engineering students. Each review contains two comments: 1) supervisor’s feedback on the student’s performance, and 2) supervisor’s recommendations for the student’s future development. Additionally, each review includes the student’s gender, academic program, and academic level, and we are also given the gender composition of each engineering program at the university. We parse out the words used in these comments and we run statistical tests to identify words with significant frequency differences between the reviews received by male and female students.

We find that male and female students are perceived differently by their co-op employers. Male students are more likely to be described as eager, efficient, and independent, whereas female students are more likely to be described as thorough, dedicated, and collaborative. In addition, male students receive recommendations to improve interpersonal skills and female students are asked to improve their business knowledge. Furthermore, the gender composition of the programs seemed to affect the feedback and recommendations received by the students. The majority gender was more likely to receive technical feedback and recommendations, whereas the minority gender was asked to work on their confidence and ask more questions.

Our results suggest that men and women are perceived differently in the STEM workplace from the beginning of their careers. Whether these gender differences are due to employer perceptions or differences in competencies cannot be determined directly from our data. However, regardless of the underlying reasons, we argue that universities offering co-operative programs should communicate with participating employers to emphasize the importance of unbiased feedback in talent recruitment and retention.

The remainder of this paper is organized as follows. Section 2 summarizes prior work on gender differences in performance reviews. Section 3 describes our dataset and the methodology used to analyze it. Section 4 presents the results. Section 5 summarizes the findings, offers possible explanations for the findings, and presents actionable insights. Finally, Section 6 concludes the paper with directions for future work.

2. Related Work

We are not aware of any previous work on gender differences in supervisor comments included in early career performance reviews. However, there has been work on gender differences in numeric performance scores given to student interns [5, 6, 7]. The results, however, are inconclusive. A study where technology professionals rated hypothetical interns on competence, intelligence, and potential field issues found that men are rated more highly than women [7]. Another study that analyzed evaluations from co-op employers found that female students are rated more highly (than male students) on overall performance as well as on specific criteria including communication, teamwork, and quality of work [5, 6].

More broadly, in the context of postgraduate employment, gender differences in employee (or peer) evaluations have been studied in various fields, including technology, the military, politics, law, sports, and medicine [8, 9, 10, 11, 12, 13, 14, 15, 5, 16]. The evaluations under study were either numeric (ratings), categorical (tags chosen from a predefined list of attributes), or textual. The reported findings are consistent across industries: men receive more actionable and task-oriented feedback and women receive more critical and personality-related feedback.

Among studies that analyzed gender differences in written performance reviews, we found only one that used text mining methods (topic modeling) [8]. This paper studied gender differences in the leadership representation of 146 political leaders by analyzing 1,057 comments they received from their colleagues. Other studies that analyzed comments in performance reviews either conducted a qualitative analysis or manually coded the language of the reviews [11, 12, 10, 13, 14, 15, 16]. In those studies, researchers read the comments and coded them according to various parameters, including tone, valence, and skills discussed (technical, communal, agentic, and others). However, one drawback of these studies is the small data size (under 300 performance reviews).

Lastly, we discuss research on gender differences in academic performance reviews. An analysis of 1,224 recommendation letters for postdoctoral fellows in geoscience found that female applicants were only half as likely to receive excellent versus good letters compared to male applicants [17]. These recommendation letters were manually coded in terms of the letter tone and length.

We found no studies in elementary, primary, or secondary education that analyzed gender differences in written performance feedback. However, some studies analyzed gender differences in teacher-student interaction (i.e., verbal feedback) [18, 19, 20, 21, 22]. Studies in STEM classrooms found that teachers tend to attribute boys’ success in STEM to ability and boys’ failures in STEM to lack of effort, while the opposite is believed to be true for girls [23]. In physical education, male students tend to receive more attention and technical feedback than female students [18, 19]. Studies also found that female students were more likely than male students to internalize the feedback they receive [19]. This internalization of feedback lowered their self-efficacy beliefs and performance [18, 24]. Our study analyzes early career experiences of STEM students to understand if similar differences in feedback persist.

3. Data and Methods

3.1. Data

We analyze three semesters of work performance evaluations, from September 2015 to August 2016, collected by a large North American university. The dataset consists of 5,708 workplace performance reviews of students enrolled in undergraduate engineering co-operative programs. Each review was completed at the end of a four-month internship (in the remainder of this paper, we use the terms ‘internship’ and ‘work term’ interchangeably).

As part of the evaluation, students receive an overall performance rating that indicates whether the student exceeded, matched, or did not meet the employer’s expectations. Hence, we divide students into three categories: above-average, average, and below-average. Along with this overall evaluation rating, the student’s supervisor was required to submit short free-text responses to the following questions:

1. Feedback: Please comment on the student’s overall job performance in terms of their behavioral and developmental performance and expectations with respect to output, quality standards, delivery of goals and assignments.
2. Recommendations: Please provide your recommendations for the student’s personal and professional development (optional). 42% of the performance reviews have a non-blank recommendation.

Along with this end-of-term performance review, our dataset contains the following information about each student:

1. Gender: male or female.
2. Academic program: one of the 13 engineering programs listed in Table 1, which also shows the gender distribution of each program, sorted by percentage of male students.
3. Seniority: measured in terms of the number of work terms completed: junior students are those who have completed zero or one work terms, and senior students are those who have completed at least four work terms (out of a maximum of six).

The dataset does not include information about the job (for example, job title, company, and location) or the evaluator (for example, position or gender).

Table 1
Gender breakdown by program

Program          %Male  %Female
Computer         88%    12%
Mechanical       87%    13%
Mechatronics     86%    14%
Electrical       83%    17%
Software         82%    18%
Nanotechnology   75%    25%
Geological       70%    30%
Civil            67%    33%
System Design    67%    33%
Chemical         60%    40%
Management       58%    42%
Environmental    41%    59%
Biomedical       41%    59%
Total            77%    23%

We report results for two groups of students: those from programs with less than 40% female students (the first nine in Table 1), and those from programs with greater than or equal to 40% female students (the last four in Table 1).¹ Table 2 shows the proportions of students in programs with < 40% and ≥ 40% female students and the proportions of students within each group evaluated as below-average, average, and above-average. The table also shows the proportion of male and female students within each group.

Table 2
Groups based on performance evaluation level

Group           Programs with < 40% female students   Programs with ≥ 40% female students
All             86% (82%M, 18%F)                      14% (56%M, 44%F)
Above-average   32% (82%M, 18%F)                      28% (61%M, 39%F)
Average         47% (80%M, 20%F)                      49% (49%M, 51%F)
Below-average   21% (86%M, 14%F)                      23% (61%M, 39%F)

3.2. Methods

The goal of this paper is to understand gender differences in written reviews received by student interns. Since these comments have a free-text format, we implemented a parser in Python to convert each comment to a set of standardized word forms (referred to as “words”, “tokens”, or “terms” in the remainder of the paper). The parser consists of the following standard text mining steps [25]:

1. The text is converted to lower case.
2. Stopwords, which are words that serve a grammatical purpose but do not contain any meaningful information, such as “and”, “the” and “is”, are removed. Words common in the co-op internship context, including “workterm”, “university”, and “co-op”, are also removed.
3. Various forms of certain words and phrases are converted to a common form using regular expression matching² (e.g., variants such as “inter-personal” and “interpersonal” are converted to “interpersonal”, and “hard work” and “hardwork” are converted to “hardwork”).
4. Special characters, digits, and punctuation are replaced by white space.
5. Finally, the text is tokenized by white space and stemmed using the NLTK snowball stemmer³. Stemming converts words with common meanings but different endings to a common stem. For example, the words “efficient”, “efficiently”, and “efficiency” are converted to “effici”, and “expect”, “expected”, and “expectation” are converted to “expect”.

Then, for each supervisor comment (feedback and recommendations), we conduct a term frequency analysis to identify words that are more frequently used for male students than for female students, and vice versa. We report differences that are statistically significant at a p-value of 0.05 (when using a two-tailed two-proportion z-test) and have a statistical power greater than 80%. In addition, for each difference, we report the odds ratio (OR), calculated according to the formula below. The OR indicates the strength (or size) of the difference and can be interpreted as follows. Suppose the odds ratio of token W is 1.5. This means that the odds of token W occurring in a review are 1.5 times higher in Group A (for example, male students) than in Group B (for example, female students).

$$
\text{Odds ratio for Token } W \text{ in Group } A \text{ versus } B =
\frac{\left(\#\text{ of reviews in Group } A \text{ that mention } W\right) \,/\, \left(\#\text{ of reviews in Group } A \text{ that do not mention } W\right)}
     {\left(\#\text{ of reviews in Group } B \text{ that mention } W\right) \,/\, \left(\#\text{ of reviews in Group } B \text{ that do not mention } W\right)}
$$

We separately report significant gender differences in the feedback and recommendations received by students from programs with < 40% female students and programs with ≥ 40% female students. The analysis is repeated for students with different overall performance ratings (above-average, average, and below-average) and seniority levels. To avoid overfitting, we ensure that each group has more than 100 non-blank comments. Common English words with significant differences are excluded from the report for brevity.

¹ We also analyzed the comments received by students from each program separately. However, we observed that the comments received by students in the two groups mentioned above displayed similar trends. Thus, we omit per-program results for brevity.
² https://docs.python.org/3/library/re.html
³ https://www.nltk.org/_modules/nltk/stem/snowball.html

4. Results

We now describe the results, treating students in programs with less than 40% female students and those in programs with greater than or equal to 40% female students separately, as mentioned in Section 3.1. Section 4.1 presents word frequency differences in the feedback and recommendations received by male and female students enrolled in programs with < 40% female students. Section 4.2 presents gender differences in word frequencies in feedback and recommendations received by students enrolled in programs with ≥ 40% female students.

4.1. Gender Differences in Programs with < 40% Female Students

4.1.1. Feedback

Table 3 shows the differences in token frequencies in the feedback received by male and female students. On the left, Table 3 shows tokens that are mentioned statistically significantly more frequently in the feedback received by male students. On the right, Table 3 shows the tokens mentioned significantly more frequently in the feedback received by female students.

The lists are sorted by the difference in frequencies, abbreviated Δ, computed as the percentage of male (or female) students whose feedback mentioned a token minus the percentage of female (or male) students whose feedback mentioned this token. For example, feedback received by male students contained “code” 4 percentage points more often than feedback received by female students. Asterisks indicate the strength of the statistical significance of the difference, with all reported differences having a p-value of at most 0.05. In addition, Table 3 mentions the odds ratio for each difference. For example, employers are 1.6 times more likely to describe female students as “thorough” than male students.

Even though some of the differences shown in Table 3 appear small in magnitude, they are statistically significant at a p-value of 0.05, have statistical power greater than 80%, and have an odds ratio greater than one (as described in Section 3.2).

Table 3
Word frequency differences in feedback received by male and female students enrolled in programs with < 40% female students

Token              Male  Female  Δ       OR      Token              Female  Male  Δ       OR
code               14%   10%     4%***   1.51    help               25%     20%   5%***   1.36
tool               7%    4%      3%**    1.62    dedic              9%      5%    4%***   1.84
fulltim            7%    5%      2%*     1.5     attentiontodetail  7%      4%    3%***   1.94
eager              2%    0%      2%*     2.88    collabor           5%      3%    2%**    1.86
written            3%    1%      2%*     2.55    thorough           6%      4%    2%**    1.63
prioriti           3%    1%      2%**    2.54    enthusiast         5%      3%    2%**    1.79
effici             2%    0%      2%*     6.29    addition           5%      3%    2%**    1.75
hardwar            2%    1%      1%*     2.55    profici            3%      1%    2%***   2.2
machin             2%    1%      1%*     3.07    delight            3%      1%    2%**    2.09
analyz             1%    0%      1%*     4.23    demand             2%      1%    1%**    2.17
expert             1%    0%      1%*     4.16    timemanag          2%      1%    1%***   3.3
deadlin            1%    0%      1%*     6.74    wonder             2%      1%    1%***   2.86
iter               1%    0%      1%*     6.74    adapt              1%      0%    1%***   3.68
ecoop              1%    0%      1%*     inf     joy                1%      0%    1%**    3.01
tackl              3%    2%      1%*     1.85    potenti            1%      0%    1%***   inf

Note. ***: p < .001; **: p < .01; *: p < .05

Feedback received by male students contains more technical terms. Table 3 shows that words relating to technical tasks, including “code”, “tool”, “written”, “hardwar”, “machin”, and “analyz”, are more frequent in the feedback received by male students. Supervisors of male students are four times more likely to refer to them as an “expert”. On the other hand, feedback received by female students mentions their general ability (“profici” in Table 3). This gender difference in the amount of technical feedback received exists in all groups with < 40% female students, irrespective of program, overall evaluation rating, or seniority.

Feedback received by male students contains more mentions of the word “eager”. Manual inspection of the comments containing the token “eager” revealed that these students suggest new ideas and take the initiative to start new tasks. In addition, male students receive feedback on their efficiency and planning (indicated by words such as “effici”, “prioriti”, “deadlin”, “iter”, and “tackl”).

Table 3 shows that the words “fulltim” and “ecoop” occur more frequently in the feedback received by male students. The token “fulltim” indicates that the employer has extended a full-time job offer to the student. The token “ecoop” refers to a program established by the university under study to allow students to work in their own company (i.e., their start-up) for a co-op work term. Table 3 shows that the token “ecoop” is mentioned in the feedback for 1% of male students and no female students.

Feedback received by female students contains more references to their teamwork and interpersonal skills (indicated by words such as “help”, “collabor”, “delight”, “wonder”, and “joy” in Table 3). In addition, female students receive more feedback on their thoroughness (indicated by words such as “attentiontodetail” and “thorough” in Table 3), dedication (“dedic”, “enthusiast”), and adaptability (the token “adapt” is mentioned in the feedback received by female students 3.7 times more often than in the feedback received by male students).

Some tokens in Table 3 indicate that male and female students are referred to differently by their employers. Manual inspection of the comments containing the word “addition” indicates that female students are referred to as a “good addition to the team/company”. Manual inspection of the comments containing the word “potenti” indicates that the word is generally used in the context of “has a lot of potential”, and the word “demand” is used to describe a student’s ability to cope with a demanding work environment. These tokens are found more often in the feedback received by female students.

Gender differences in the feedback received by students with different overall performance ratings and seniority levels follow the same trends as above. We omit the details for brevity.

4.1.2. Recommendations

Table 4 follows the same format as Table 3 and shows the differences in token frequencies in the recommendations received by male and female students. Again, gender differences in the recommendations received by students with different overall performance ratings and different seniority levels showed similar trends and are not shown for brevity.

Table 4
Word frequency differences in recommendations received by male and female students enrolled in programs with < 40% female students

Token              Male  Female  Δ       OR      Token              Female  Male  Δ       OR
solut              8%    4%      4%**    2.15    allow              8%      3%    5%***   2.88
seek               4%    1%      3%**    3.91    express            4%      0%    4%**    inf
system             4%    1%      3%**    3.21    network            4%      0%    4%***   inf
read               3%    1%      2%*     3.27    oper               5%      1%    4%**    3.13
architectur        3%    1%      2%*     4.75    encourag           7%      5%    4%**    1.55
maintain           3%    1%      2%*     4.41    challeng           9%      5%    4%**    1.89
mistak             2%    0%      2%*     inf     askquestion        9%      5%    4%**    1.93
attent             3%    1%      2%*     3.91    general            4%      1%    3%**    3.53
web                1%    0%      1%*     inf     varieti            3%      0%    3%***   inf
algorithm          1%    0%      1%*     inf     afraid             3%      0%    3%**    3.01
help               1%    0%      1%*     inf     shi                3%      0%    3%***   17.62
cooperat           1%    0%      1%*     inf     explor             4%      1%    3%*     4.05
opinion            1%    0%      1%*     inf     market             3%      1%    2%***   3.34
hear               1%    0%      1%*     inf     tell               1%      0%    1%***   7.2
distract           1%    0%      1%*     inf     comfortzon         1%      0%    1%***   3.62

Note. ***: p < .001; **: p < .01; *: p < .05

Tokens in Table 4 suggest that male students receive more recommendations related to technical skills. This is suggested by words such as “solut” (stem of the word “solution”), “system”, “read”, “architectur”, “maintain”, “web”, and “algorithm”. In addition, male students are recommended to be more attentive to mistakes (indicated by the tokens “attent” and “mistak” in Table 4) and improve their teamwork and interpersonal skills (indicated by “help”, “cooperat”, “opinion”, and “hear”).

On the other hand, female students are recommended to “express” themselves, to “network”, to not be “afraid” or “shy”, and to ask more questions (see Table 4). The recommendations received by female students contain more mentions of the tokens “oper”, “general”, “varieti”, “explor”, and “market” (see Table 4). Manual inspection of comments containing these tokens reveals that female students receive more recommendations to explore and increase their variety of knowledge, especially about business operations.

Table 4 indicates that recommendations received by female students contained more occurrences of the words “allow”, “encourag”, “challeng”, and “comfortzon”. Manual inspection of comments containing these tokens suggests that female students were encouraged to challenge themselves and leave their comfort zones more often than male students.

4.2. Gender Differences in Programs with ≥ 40% Female Students

Tables 5 and 6 list the differences in word frequencies in the feedback and recommendations, respectively, received by students enrolled in programs with ≥ 40% female students. These tables follow the same format as Tables 3 and 4. Again, we omit gender differences in groups based on overall performance ratings and seniority, which show similar trends.

4.2.1. Feedback

Table 5
Word frequency differences in feedback received by male and female students enrolled in programs with ≥ 40% female students

Token              Male  Female  Δ       OR      Token              Female  Male  Δ       OR
abil               22%   14%     8%**    1.71    hardwork           13%     6%    7%**    2.25
understand         20%   12%     8%**    1.81    team               7%      3%    4%**    2.86
littlesupervis     9%    3%      6%***   3.01    applic             6%      2%    4%**    2.89
effici             11%   6%      5%*     2.04    execut             3%      0%    3%*     inf
initi              7%    2%      5%**    3.6     user               3%      0%    3%*     inf
pictur             4%    0%      4%*     inf     technic            3%      0%    3%**    7.09
surpris            5%    1%      4%**    4.35    comprehens         2%      0%    2%**    inf
devic              3%    0%      3%*     inf     writtencomm        2%      0%    2%*     inf
matur              3%    0%      3%*     inf     expertis           2%      0%    2%*     8.93
prioriti           3%    0%      3%*     inf     smart              1%      0%    1%*     inf
newtask            3%    0%      3%**    10.67   stack              1%      0%    1%*     inf
growth             2%    0%      2%*     inf     legaci             1%      0%    1%*     inf
difficulti         1%    0%      1%*     inf     style              1%      0%    1%*     inf
persist            1%    0%      1%*     inf     joy                1%      0%    1%*     inf
ecoop              1%    0%      1%*     inf     read               1%      0%    1%*     inf

Note. ***: p < .001; **: p < .01; *: p < .05

Table 5 indicates that comments received by female students are more related to technical performance (suggested by tokens such as “applic”, “execut”, “user”, “technic”, “writtencomm”, “stack”, and “read”). In addition, tokens such as “expertis” and “legaci” are found more frequently in the feedback received by female students. On the other hand, feedback received by male students references their “ability”. This is in contrast to the results presented in Section 4.1, where male students received more technical feedback than female students.

Nevertheless, some of the feedback received by male students is similar to the feedback received by male students from programs with < 40% female students (Section 4.1). Male students are more likely to receive feedback on their eagerness to start new tasks (suggested by the tokens “newtask” and “initi” in Table 5, where “initi” is the word stem for “initiate” and “initiative”). They are also more likely to receive feedback on their planning and efficiency (“effici”, “pictur”, “prioriti”). The token “littlesupervis” in Table 5 indicates that supervisors find male students to be more independent than female students.

Table 5 indicates that female students received more feedback on their hard work, thoroughness (“comprehens”, which is the word stem for “comprehensive”), teamwork, and interpersonal skills. Female students from programs with < 40% female students received similar feedback from their employers (see Section 4.1).

Feedback given to male students contains more mentions of the words “surpris”, “growth”, “persist”, “difficulti”, and “matur” (see Table 5). Manual inspection of comments containing these terms revealed that these employers were pleasantly surprised to see the students’ growth, persistence, and maturity.

Finally, similar to programs with < 40% female students (Section 4.1), the token “ecoop” is mentioned for 1% of male students and no female students.

4.2.2. Recommendations

Table 6
Word frequency differences in recommendations received by male and female students enrolled in programs with ≥ 40% female students

Token              Male  Female  Δ       OR      Token              Female  Male  Δ       OR
say                4%    0%      4%*     inf     oper               5%      1%    4%*     8.81
mistak             3%    0%      3%*     inf     creativ            5%      1%    4%*     8.81
reserv             3%    0%      3%*     inf     surround           4%      0%    4%*     inf
team               3%    0%      3%*     inf     knowledg           4%      0%    4%*     inf
public             3%    0%      3%*     inf     instinct           3%      0%    3%*     inf
speak              3%    0%      3%*     inf     quick              3%      0%    3%*     inf
open               3%    0%      3%*     inf     generat            3%      0%    3%*     inf
expect             2%    0%      2%*     inf     difficult          3%      0%    3%*     inf
distract           2%    0%      2%*     inf     system             3%      0%    3%*     inf
error              2%    0%      2%*     inf     learn              3%      1%    2%***   19.33
topic              2%    0%      2%*     inf     document           2%      0%    2%*     inf
softskil           2%    0%      2%*     inf     explor             2%      0%    2%*     inf
listen             2%    0%      2%*     inf     interest           1%      0%    1%*     inf
respect            2%    0%      2%*     inf     compani            1%      0%    1%**    2.02
complex            2%    0%      2%**    inf     deal               1%      0%    1%***   4.97

Note. ***: p < .001; **: p < .01; *: p < .05

Table 6 indicates that male students are referred to as “reserved” and are recommended to “speak” (suggested by tokens such as “reserv”, “say”, “public”, “speak”, and “open”). This is in contrast to the results reported in Section 4.1, where female students were recommended to ask more questions.

Table 6 also indicates that female students receive more technical recommendations than male students. Tokens such as “creativ”, “knowledg”, “generat”, “system”, “interest”, “document”, and “learn” are more common in the recommendations received by female students. On the other hand, recommendations received by male students contain more occurrences of the tokens “topic” and “complex”. Again, this is in contrast to the results shown in Section 4.1, where male students received more technical recommendations.

Nevertheless, some recommendations given to students in programs with ≥ 40% female students are similar to those given to students in programs with < 40% female students (Section 4.1). For example, similar to male students from programs with < 40% female students (Section 4.1), male students from programs with ≥ 40% female students are also recommended to keep an eye out for mistakes (indicated by “mistak”, “distract”, “error” in Table 6) and improve their teamwork and interpersonal skills (“team”, “softskil”, “listen”, “respect”). Female students from programs with ≥ 40% female students are recommended to gain operational knowledge (indicated by “oper”, “surround”, “explor”, and “compani” in Table 6 and confirmed by manual inspection of the comments containing these tokens). The same recommendations were received by female students from programs with < 40% female students.

5. Discussion

The main findings of this study and their significance are as follows.

Observation #1: We found the following gender differences in all groups of students, irrespective of the overall performance rating, seniority, and the gender composition of their academic programs.

1. Female students are more likely than male students to be appreciated for their thoroughness, dedication, enthusiasm, hard work, adaptability, teamwork, and interpersonal skills.
2. Male students are more likely than female students to be appreciated for their eagerness, planning, efficiency, and independence.
3. Female students are recommended to increase their business knowledge, including general information about the market and company operations.
4. Male students are recommended to keep an eye out for mistakes and improve their teamwork and interpersonal skills.

These gender differences in feedback and recommendations may be due to gender differences in (a) how employers perceive their students’ competencies, (b) opportunity, or (c) students’ abilities.

Gender differences in perceived competencies: The gender differences we found are consistent with past studies that examined feedback in education and in the workplace. For example, studies examining professionals in technology, military, politics, and law found that women were appreciated for their communal qualities (e.g., those related to social relationships) and men were appreciated for their agentic qualities (e.g., those related to goal achievement) [8, 9, 10, 11, 12, 13, 14]. In addition, women were more often tagged as “enthusiastic”, “organized”, and “unaware” and men as “analytical”, “dependable”, and “irresponsible” [9]. Studies in STEM classrooms indicate that teachers attribute male students’ achievements to their ability, and female students’ achievements to their hard work [23]. Social scientists and psychologists confirm the existence of stereotypes of men and women [26, 27]. Therefore, a possible reason behind the gender differences we found may be the unconscious gender bias of the evaluator (i.e., the work term supervisor).

Studies suggest that positive and negative gender stereotypes found in evaluations affect students’ self-image and career choices [26, 28, 29, 24, 19]. Additionally, experiments found that gendered language in performance evaluations may affect hiring and promotion decisions [14, 9]. For example, when conducting a blind review of candidates for promotion, participants chose candidates described as “good at taking initiative”. Since these (agentic) characteristics occur in the performance evaluations of men more often than women, this may lead to fewer promotion opportunities for women. Additionally, participants considered collaborative skills, and thus, female profiles, less suitable for leadership roles [14]. Overall, since task-oriented qualities are more valuable to an organization than social-oriented qualities [30], the gender stereotypes in performance evaluations may give men a better chance to be hired, promoted, and more highly paid.

More female than male students leave STEM programs and careers [31, 2]. Potential reasons for this include sexism in teams, the masculine culture in STEM education and the workplace, and dissatisfaction over pay and promotion opportunities [3]. Therefore, eliminating gender bias from early career performance reviews can help plug the “leaky” pipeline. In particular, universities offering co-op programs should communicate with participating co-op employers to emphasize the importance of unbiased feedback. One problem with implicit bias is that many people are not aware that they are biased, emphasizing the importance of diversity training for workplace supervisors.

Gender differences in opportunity: We found that female students were appreciated for their adaptability more often than male students, indicating that perhaps female students were initially perceived to be more incompatible with the company culture. Past studies suggest that the masculine work and after-work culture of male-dominated professions make women uncomfortable [32]. This masculine culture may cause female students to consciously or unconsciously limit their workplace interactions (with peers and supervisors), limiting their access to operational knowledge. Given fewer female supervisors [2], female students may have found it difficult to communicate within a male-dominated hierarchy.

Gender differences in ability: Biological or society-driven differences in ability may have led to the gender differences in performance evaluations reported in this study. Past studies found that females were more likely to possess both high mathematical and verbal abilities and males were more likely to demonstrate higher mathematical abilities relative to their verbal abilities [28]. In addition, studies found that female students preferred people-oriented roles [33], displayed more altruistic tendencies [1], scored higher on teamwork and interpersonal communication [5, 6], and outperformed male students at collaborative problem solving tasks [34].

Observation #2: There appears to be a relationship between the gender composition of academic programs and the comments received by students in those programs. This is particularly noteworthy because it occurs in a field with (traditionally) pro-male ability beliefs. We found that in programs with < 40% female students, a higher proportion of male students received feedback on their technical performance in comparison to female students. The recommendations received by male students also contained more technical directions for improvement. On the other hand, female students were recommended to participate, be less shy, and ask more questions. For programs with ≥ 40% female students, the opposite is true. In these programs, female students receive more technical feedback and recommendations, and male students are recommended to be less reserved and speak more openly. This trend exists across all groups of students, irrespective of overall performance scores and seniority.

Gender differences in technical evaluation: The above observation is consistent with past observational studies that analyzed gender differences in teacher-student interaction and the feedback received by secondary school students. Some studies found that male students received more attention and feedback, particularly praise, criticism, and technical information, irrespective of the subject being taught (sports, modern languages, mathematics, science, and humanities) [19, 18, 20]. However, this was reversed in classes that contained as many or more female students [20]. Since feedback and recommendations on technical and behavioral skills are important for co-op students [30], universities may want to ensure that co-op evaluation forms include explicit requests to comment on students’ technical skills.

Studies that analyzed the performance reviews of men and women in (a) technology and professional-services firms [14], (b) a leadership development program [8], and (c) navy academy students [9], found more mentions of technical words in the feedback received by men than women. These gender differences in technical feedback were attributed to the pro-male ability bias that exists in these fields. However, since all of these studies investigated samples containing less than 25% women, our results suggest the need for further investigation.

Gender differences in participation: A study conducted in a secondary school reported that both male and female students participated more when their own gender was the majority gender in the classroom [20]. This was found irrespective of the subject being taught.

zone”, are more common in the comments received by female students from programs with < 40% female students. On the other hand, phrases including “surprised by performance” and “mature” are more common in the comments received by male students from programs with ≥ 40% female students.

Studies of tokenism support the above observation and suggest that bias against a group occurs when said group is a minority in any given field [37]. Related work on minority groups (in terms of race and gender) presents
conflicting reports on whether the feedback provided to Similarly, a study where engineering students were ran- those groups is more lenient or harsh [11, 12, 38]. How- domly assigned to teams (or “micro-environments”) with ever, most studies that report gender differences in feed- varying gender composition reported similar conclusions. back note that the same trait is described more positively This study found that when female students were the mi- for men than for women [8, 11, 12, 13, 15]. Note that all nority in a team (less than 25%), they spoke less, were these studies were conducted in male-dominated profes- less involved in teamwork, and felt less confident than sions. female students assigned to teams where they were in the majority (75% or more) [35]. This was true regardless of the students’ academic seniority. Moreover, female 6. Conclusions students from male-majority teams reported lowered en- In this paper, we analyzed gender differences in early gineering career aspirations after the team interaction career workplace performance reviews. To do so, we [35]. used a unique dataset corresponding to work term evalu- Past studies attribute the reason behind this difference ations of students enrolled in engineering co-operative in participation to isolation (or social-belongingness con- programs. We used text mining methods to analyze word cerns) and stereotype threat (the concern that one will be frequency differences in employer feedback and recom- judged in terms of a stereotype) [20, 35]. Female students mendations for professional development. were more affected by the gender composition in a class- We found that male students were appreciated for tak- room, leading to recommendations to create single-sex or ing initiative more often than female students. They gender-parity micro-environments (e.g., in-class teams were described as efficient and independent and were or study groups) [35, 20]. 
Researchers experimenting recommended to improve their interpersonal and team- with varying proportions of male and female students work skills. On the other hand, female students were in engineering teams found that gender-balanced micro- appreciated for being thorough, hardworking, social, and environments are particularly important for first-year collaborative. They were advised to gain business knowl- students, to ensure these students do not lose confidence edge more often than male students. We also found dif- and drop out of STEM fields [35]. Gender-balanced micro- ferences in the comments received by students in male environments helped students focus on learning, partici- versus female-dominated programs. We found that in pate more freely, and in turn, gain the confidence to per- both groups of engineering programs, the majority gen- sist in gender-imbalanced environments. Another study der received more technical feedback and recommenda- found that participation in social-belonging interventions tions, and the minority gender was advised to ask more during student orientation programs improved female questions and be more confident. students’ social attitude and academic performance in Our main takeaway message is that men and women male-dominated STEM programs [36]. appear to be perceived differently in the STEM workplace Our results similarly suggest that co-op students work- from the beginning of their careers. Since reiteration of ing in environments where they are not the majority gendered feedback leads to career dissatisfaction and at- gender participate less in team activities and may need trition [24, 14, 3], our results emphasize the importance additional encouragement. As suggested by past studies, of unbiased feedback in early career settings such as gender imbalanced classrooms and workplaces may ex- co-operative internships. 
Moreover, since our results periment with social-belonging interventions and gender- suggest a possible link between the gender composition parity micro-environments and note their effect on stu- of the programs and the feedback received by the major- dent confidence. ity and minority gender, special attention should be paid Observation #3: Different words were used to de- to encourage minority groups. scribe the minority and the majority gender. Phrases The results presented in this paper should be inter- including “has a lot of potential”, “challenge yourself”, preted carefully since they are based on data from a sin- “allow yourself to grow”, and “come out of your comfort gle North American institution. Nevertheless, we believe [11] P. Cecchi-Dimeglio, How gender bias corrupts per- that our data-driven study is a useful starting point for formance reviews, and what to do about it, Harvard further analysis. For example, an interesting direction Business Review 12 (2017). for future work is to interview STEM alumni to deter- [12] K. Snyder, The abrasiveness trap: High-achieving mine if their co-op experiences affected their career paths. men and women are described differently in re- Furthermore, it may be useful to investigate the effect views, Fortune Magazine 26 (2014) 08–14. of the workplace supervisor’s gender on performance [13] S. J. Correll, K. R. Weisshaar, A. T. Wynn, J. D. reviews (we were unable to do this analysis because our Wehner, Inside the black box of organizational life: dataset did not include any information about workplace The gendered language of performance assessment, supervisors). American Sociological Review 85 (2020) 1022–1050. [14] R. Silverman, Gender bias at work turns up in feedback, 2015. URL: https://www.wsj.com/articles/ References gender-bias-at-work-turns-up-in-feedback-1443600759. [15] K. Brucker, N. Whitaker, Z. S. Morgan, K. Pettit, [1] S. Chopra, H. Gautreau, A. Khan, M. Mirsafian, E. Thinnes, A. M. Banta, M. M. 
Palmer, Exploring L. Golab, Gender differences in undergraduate en- gender bias in nursing evaluations of emergency gineering applicants: A text mining approach, in: medicine residents, Academic Emergency Medicine Proceedings of the 11th International Conference 26 (2019) 1266–1272. on Educational Data Mining, EDM 2018, Buffalo, [16] A. S. Mueller, T. M. Jenkins, M. Osborne, A. Dayal, NY, USA, July 15-18, 2018, 2018, pp. 44–54. D. M. O’Connor, V. M. Arora, Gender differences [2] A. Perreault, Analysis of the distribu- in attending physicians’ feedback to residents: a tion of gender in stem fields in canada, qualitative analysis, Journal of Graduate Medical http://wiseatlantic.ca/wp-content/uploads/ Education 9 (2017) 577–585. 2018/03/WISEReport2017_final.pdf, ???? Accessed: [17] K. Dutt, D. L. Pfaff, A. F. Bernstein, J. S. Dillard, 20th March, 2019. C. J. Block, Gender differences in recommendation [3] J. Hunt, Why do women leave science and engi- letters for postdoctoral fellowships in geoscience, neering?, ILR Review 69 (2016) 199–226. Nature Geoscience 9 (2016) 805. [4] A. Kauhanen, S. Napari, Gender differences in ca- [18] V. Nicaise, G. Cogérino, J. Bois, A. J. Amorose, Stu- reers, Annals of Economics and Statistics (2015) dents’ perceptions of teacher feedback and physical 61–88. competence in physical education classes: Gender [5] S. Chopra, A. Khan, M. Mirsafian, L. Golab, Gen- effects, Journal of teaching in Physical Education der differences in work-integrated learning assess- 25 (2006) 36–57. ments., in: Proceedings of the International Con- [19] V. Nicaise, J. E. Bois, S. J. Fairclough, A. J. Amorose, ference on Educational Data Mining (EDM), 2019, G. Cogérino, Girls’ and boys’ perceptions of phys- pp. 524–527. ical education teachers’ feedback: Effects on per- [6] S. Chopra, A. Khan, M. Mirsafian, L. Golab, Gender formance and psychological responses, Journal of differences in work-integrated learning experiences sports sciences 25 (2007) 915–926. 
of stem students: From applications to evaluations, [20] S. Drudy, M. Ú. Chatháin, Gender effects in class- International Journal of Work-Integrated Learning room interaction: Data collection, self-analysis and 21 (2020) 253–274. reflection, Evaluation & Research in Education 16 [7] E. D. Reilly, K. R. Rackley, G. H. Awad, Perceptions (2002) 34–50. of male and female stem aptitude: The moderating [21] P. C. Burnett, Teacher praise and feedback and stu- effect of benevolent and hostile sexism, Journal of dents’ perceptions of the classroom environment, Career Development 44 (2017) 159–173. Educational psychology 22 (2002) 5–16. [8] E. Doldor, M. Wyatt, J. Silvester, Statesmen or cheer- [22] M. G. Jones, J. Wheatley, Gender differences in leaders? using topic modeling to examine gendered teacher-student interactions in science classrooms, messages in narrative developmental feedback for Journal of research in Science Teaching 27 (1990) leaders, The Leadership Quarterly 30 (2019) 101308. 861–874. [9] D. G. Smith, J. E. Rosenstein, M. C. Nikolov, D. A. [23] J. Tiedemann, Gender-related beliefs of teachers Chaney, The power of language: Gender, status, in elementary school mathematics, Educational and agency in performance evaluations., Sex Roles studies in Mathematics 41 (2000) 191–207. 80 (2019). [24] M. Mayo, M. Kakarika, J. C. Pastor, S. Brutus, Align- [10] L. H. Keith, Visibility invisibility: Feedback bias ing or inflating your leadership self-image? a lon- in the legal profession, J. Gender Race & Just. 23 gitudinal study of responses to peer feedback in (2020) 315. mba teams, Academy of Management Learning & Education 11 (2012) 631–652. a positive bias., Journal of personality and social [25] W. B. Croft, D. Metzler, T. Strohman, Search en- psychology 74 (1998) 622. gines: Information retrieval in practice, volume 520, Addison-Wesley Reading, 2010. [26] M. E. Heilman, Gender stereotypes and workplace bias, Research in organizational Behavior 32 (2012) 113–135. 
[27] J. Lorber, S. A. Farrell, et al., The social construction of gender, Sage Newbury Park, CA, 1991. [28] M.-T. Wang, J. L. Degol, Gender gap in sci- ence, technology, engineering, and mathematics (stem): Current knowledge, implications for prac- tice, policy, and future directions, Educational Psy- chology Review 29 (2017) 119–140. URL: https:// doi.org/10.1007/s10648-015-9355-x. doi:10.1007/ s10648-015-9355-x. [29] N. Dasgupta, J. G. Stout, Girls and women in sci- ence, technology, engineering, and mathematics: Steming the tide and broadening participation in stem careers, Policy Insights from the Behavioral and Brain Sciences 1 (2014) 21–29. [30] R. K. Coll, K. E. Zegwaard, Perceptions of desirable graduate competencies for science and technology new graduates, Research in Science & Technologi- cal Education 24 (2006) 29–58. [31] D. Hango, Gender differences in science, technol- ogy, engineering, mathematics, and computer sci- ence (STEM) programs at university, Insights on Canadian Society (2013). [32] C. Seron, S. S. Silbey, E. Cech, B. Rubineau, Persis- tence is cultural: Professional socialization and the reproduction of sex segregation, Work and Occu- pations 43 (2016) 178–214. [33] R. Su, J. Rounds, P. I. Armstrong, Men and things, women and people: a meta-analysis of sex differ- ences in interests., Psychological bulletin 135 (2009) 859. [34] OECD, Collaborative problem solving (2017). URL: https://www.oecd-ilibrary.org/content/paper/ cdae6d2e-en. doi:https://doi.org/https:// doi.org/10.1787/cdae6d2e-en. [35] N. Dasgupta, M. M. Scircle, M. Hunsinger, Female peers in small work groups enhance women’s mo- tivation, verbal participation, and career aspira- tions in engineering, Proceedings of the National Academy of Sciences 112 (2015) 4988–4993. [36] G. M. Walton, C. Logel, J. M. Peach, S. J. Spencer, M. P. 
Zanna, Two brief interventions to mitigate a “chilly climate” transform women’s experience, rela- tionships, and achievement in engineering., Journal of Educational Psychology 107 (2015) 468. [37] R. M. Kanter, Some effects of proportions on group life, in: The gender gap in psychotherapy, Springer, 1977, pp. 53–78. [38] K. D. Harber, Feedback to minorities: Evidence of