=Paper= {{Paper |id=Vol-3135/darliap_paper3 |storemode=property |title=Gender Differences in Early Career Performance Reviews: a Text Mining Study |pdfUrl=https://ceur-ws.org/Vol-3135/darliap_paper3.pdf |volume=Vol-3135 |authors=Shivangi Chopra,Lukasz Golab |dblpUrl=https://dblp.org/rec/conf/edbt/ChopraG22 }} ==Gender Differences in Early Career Performance Reviews: a Text Mining Study == https://ceur-ws.org/Vol-3135/darliap_paper3.pdf
Gender Differences in Early Career Performance Reviews: a Text Mining Study
Shivangi Chopra¹, Lukasz Golab¹
¹ University of Waterloo, Canada


Abstract
It is well known that fewer women than men earn STEM degrees and persist in STEM careers. Since early career experiences affect career attrition, we investigate gender differences in early career performance reviews. Our analysis is enabled by a unique dataset, with nearly 6,000 performance reviews of undergraduate engineering students participating in co-operative internships. Text mining of workplace supervisor comments included in the reviews reveals several gender differences. Male students are more likely to be described as eager, efficient, and independent, whereas female students are perceived as thorough and collaborative. Moreover, male students are more likely to be asked to improve their interpersonal skills, whereas female students are more likely to receive suggestions to improve their business knowledge. Our results thus suggest that men and women are perceived differently in the STEM workplace from the beginning of their careers.

Keywords
gender gap in STEM, co-operative internships, text mining



1. Introduction

The gender gap in Science, Technology, Engineering, and Mathematics (STEM) is well-documented: studies show that fewer women apply to STEM programs [1], obtain STEM degrees [2], and continue with STEM careers [2, 3]. Workplace experiences, especially early career experiences, are known to drive career attrition [4, 3]. We therefore ask the following research question: Are there gender differences in early career performance reviews?

To answer this question, we analyze workplace performance reviews of students from a large North American university participating in co-operative (co-op) internships. Co-op programs in STEM fields have become popular worldwide; they allow students to alternate between academic study terms and work internships. For many students, co-op internships are their first career experiences in the engineering workplace.

The dataset we analyze consists of nearly 6,000 performance reviews given to undergraduate engineering students in the 2015/2016 academic year. Each review contains two comments: 1) the supervisor's feedback on the student's performance, and 2) the supervisor's recommendations for the student's future development. Additionally, each review includes the student's gender, academic program, and academic level, and we are also given the gender composition of each engineering program at the university. We parse out the words used in these comments and run statistical tests to identify words with significant frequency differences between the reviews received by male and female students.

We find that male and female students are perceived differently by their co-op employers. Male students are more likely to be described as eager, efficient, and independent, whereas female students are more likely to be described as thorough, dedicated, and collaborative. In addition, male students receive recommendations to improve their interpersonal skills, whereas female students are asked to improve their business knowledge. Furthermore, the gender composition of the programs appears to affect the feedback and recommendations received by the students: the majority gender was more likely to receive technical feedback and recommendations, whereas the minority gender was asked to work on their confidence and to ask more questions.

Our results suggest that men and women are perceived differently in the STEM workplace from the beginning of their careers. Whether these gender differences are due to employer perceptions or differences in competencies cannot be determined directly from our data. However, regardless of the underlying reasons, we argue that universities offering co-operative programs should communicate with participating employers to emphasize the importance of unbiased feedback in talent recruitment and retention.

The remainder of this paper is organized as follows. Section 2 summarizes prior work on gender differences in performance reviews. Section 3 describes our dataset and the methodology used to analyze it. Section 4 presents the results. Section 5 summarizes the findings, offers possible explanations for them, and presents actionable insights. Finally, Section 6 concludes the paper with directions for future work.

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK
s9chopra@uwaterloo.ca (S. Chopra); lgolab@uwaterloo.ca (L. Golab)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
2. Related Work

We are not aware of any previous work on gender differences in supervisor comments included in early career performance reviews. However, there has been work on gender differences in numeric performance scores given to student interns [5, 6, 7], with inconclusive results. A study where technology professionals rated hypothetical interns on competence, intelligence, and potential field issues found that men are rated more highly than women [7]. Another study that analyzed evaluations from co-op employers found that female students are rated more highly than male students on overall performance as well as on specific criteria including communication, teamwork, and quality of work [5, 6].

More broadly, in the context of post-graduation employment, gender differences in employee (or peer) evaluations have been studied in various fields, including technology, the military, politics, law, sports, and medicine [8, 9, 10, 11, 12, 13, 14, 15, 5, 16]. The evaluations under study were either numeric (ratings), categorical (tags chosen from a predefined list of attributes), or textual. The reported findings are consistent across industries: men receive more actionable and task-oriented feedback, and women receive more critical and personality-related feedback.

Among studies that analyzed gender differences in written performance reviews, we found only one that used text mining methods (topic modeling) [8]. This paper studied gender differences in the leadership representation of 146 political leaders by analyzing 1,057 comments they received from their colleagues. Other studies that analyzed comments in performance reviews either conducted a qualitative analysis or manually coded the language of the reviews [11, 12, 10, 13, 14, 15, 16]. In those studies, researchers read the comments and coded them according to various parameters, including tone, valence, and skills discussed (technical, communal, agentic, and others). However, one drawback of these studies is their small sample size (under 300 performance reviews).

Lastly, we discuss research on gender differences in academic performance reviews. An analysis of 1,224 recommendation letters for postdoctoral fellows in geoscience found that female applicants were only half as likely as male applicants to receive excellent (rather than good) letters [17]. These recommendation letters were manually coded in terms of the letter tone and length.

We found no studies in primary (elementary) or secondary education that analyzed gender differences in written performance feedback. However, some studies analyzed gender differences in teacher-student interaction (i.e., verbal feedback) [18, 19, 20, 21, 22]. Studies in STEM classrooms found that teachers tend to attribute boys' success in STEM to ability and boys' failures in STEM to lack of effort, while the opposite (success attributed to effort, failure to lack of ability) is believed to be true for girls [23]. In physical education, male students tend to receive more attention and technical feedback than female students [18, 19]. Studies also found that female students were more likely than male students to internalize the feedback they receive [19]. This internalization of feedback lowered their self-efficacy beliefs and performance [18, 24]. Our study analyzes early career experiences of STEM students to understand whether similar differences in feedback persist.

3. Data and Methods

3.1. Data

We analyze three semesters of work performance evaluations, from September 2015 to August 2016, collected by a large North American university. The dataset consists of 5,708 workplace performance reviews of students enrolled in undergraduate engineering co-operative programs. Each review was completed at the end of a four-month internship (in the remainder of this paper, we use the terms 'internship' and 'work term' interchangeably). As part of the evaluation, students receive an overall performance rating that indicates whether the student exceeded, matched, or did not meet the employer's expectations. Accordingly, we divide students into three categories: above-average, average, and below-average. Along with this overall evaluation rating, the student's supervisor was required to submit short free-text responses to the following questions:

   1. Feedback: Please comment on the student's overall job performance in terms of their behavioral and developmental performance and expectations with respect to output, quality standards, delivery of goals and assignments.
   2. Recommendations: Please provide your recommendations for the student's personal and professional development (optional). 42% of the performance reviews have a non-blank recommendation.

Along with this end-of-term performance review, our dataset contains the following information about each student:

   1. Gender: male or female.
   2. Academic program: one of the 13 engineering programs listed in Table 1, which also shows the gender distribution of each program, sorted by percentage of male students.
   3. Seniority: measured in terms of the number of work terms completed. Junior students are those who have completed zero or one work terms, and senior students are those who have completed at least four work terms (out of a maximum of six).
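The per-student attributes and category rules listed above can be captured in a small record type. The Python sketch below is illustrative only: the names `Rating`, `seniority`, `Review`, and all field names are hypothetical (the dataset's actual schema is not described), and the mapping from the employer's three-level judgement to the three rating categories is our reading of the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MAX_WORK_TERMS = 6  # students complete at most six work terms


class Rating(Enum):
    """Employer's overall rating, mapped to the paper's three categories."""
    EXCEEDED = "above-average"   # exceeded the employer's expectations
    MATCHED = "average"          # matched the employer's expectations
    NOT_MET = "below-average"    # did not meet the employer's expectations


def seniority(work_terms_completed: int) -> Optional[str]:
    """Return "junior", "senior", or None when a student is in neither group."""
    if not 0 <= work_terms_completed <= MAX_WORK_TERMS:
        raise ValueError(f"expected 0..{MAX_WORK_TERMS} completed work terms")
    if work_terms_completed <= 1:
        return "junior"          # zero or one completed work terms
    if work_terms_completed >= 4:
        return "senior"          # at least four completed work terms
    return None                  # two or three: neither junior nor senior


@dataclass
class Review:
    """One end-of-term review; all field names are hypothetical."""
    gender: str                  # "male" or "female"
    program: str                 # one of the 13 programs in Table 1
    work_terms_completed: int
    rating: Rating
    feedback: str                # answer to the Feedback question
    recommendation: Optional[str] = None  # optional question, often blank

    @property
    def seniority_level(self) -> Optional[str]:
        return seniority(self.work_terms_completed)
```

Under these definitions, students with two or three completed work terms belong to neither seniority group, so a seniority comparison only uses the junior and senior subsets.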
The dataset does not include information about the job (for example, job title, company, and location) or the evaluator (for example, position or gender).

We report results for two groups of students: those from programs with less than 40% female students (the first nine in Table 1), and those from programs with greater than or equal to 40% female students (the last four in Table 1)¹. Table 2 shows the proportions of students in programs with < 40% and ≥ 40% female students and the proportions of students within each group evaluated as below-average, average, and above-average. The table also shows the proportion of male and female students within each group.

Table 1
Gender breakdown by program

Program           %Male   %Female
Computer            88%       12%
Mechanical          87%       13%
Mechatronics        86%       14%
Electrical          83%       17%
Software            82%       18%
Nanotechnology      75%       25%
Geological          70%       30%
Civil               67%       33%
System Design       67%       33%
Chemical            60%       40%
Management          58%       42%
Environmental       41%       59%
Biomedical          41%       59%
Total               77%       23%

3.2. Methods

The goal of this paper is to understand gender differences in written reviews received by student interns. Since these comments have a free-text format, we implemented a parser in Python to convert each comment to a set of standardized word forms (referred to as "words", "tokens", or "terms" in the remainder of the paper). The parser consists of the following standard text mining steps [25]:

   1. The text is converted to lower case.
   2. Stopwords, which are words that serve a grammatical purpose but carry little meaningful information, such as "and", "the", and "is", are removed. Words common in the co-op internship context, including "workterm", "university", and "co-op", are also removed.
   3. Various forms of certain words and phrases are converted to a common form using regular expression matching² (e.g., spelling variants of "interpersonal", such as "inter-personal", are converted to "interpersonal", and "hard work" and "hardwork" are converted to "hardwork").
   4. Special characters, digits, and punctuation are replaced by white space.
   5. Finally, the text is tokenized by white space and stemmed using the NLTK Snowball stemmer³. Stemming converts words with common meanings but different endings to a common stem. For example, the words "efficient", "efficiently", and "efficiency" are converted to "effici", and "expect", "expected", and "expectation" are converted to "expect".

Then, for each supervisor comment (feedback and recommendations), we conduct a term frequency analysis to identify words that are more frequently used for male students than for female students, and vice versa. We report differences that are statistically significant at a p-value of 0.05 (using a two-tailed two-proportion z-test) and have a statistical power greater than 80%. In addition, for each difference, we report the odds ratio (OR), calculated according to the formula below. The OR indicates the strength (or size) of the difference and can be interpreted as follows. Suppose the odds ratio of token W is 1.5. This means that the odds of W occurring in a review from Group A (for example, male students) are 1.5 times the odds of W occurring in a review from Group B (for example, female students); for rare tokens, this is approximately the same as W being 1.5 times more likely to occur in Group A than in Group B.

$$
\text{OR}_{A\,\text{vs.}\,B}(W) = \frac{\dfrac{\#\,\text{reviews in Group } A \text{ that mention } W}{\#\,\text{reviews in Group } A \text{ that do not mention } W}}{\dfrac{\#\,\text{reviews in Group } B \text{ that mention } W}{\#\,\text{reviews in Group } B \text{ that do not mention } W}}
$$

We separately report significant gender differences in the feedback and recommendations received by students from programs with < 40% female students and from programs with ≥ 40% female students. The analysis is repeated for students with different overall performance ratings (above-average, average, and below-average) and seniority levels. To avoid drawing conclusions from small samples, we ensure that each group has more than 100 non-blank comments. Common English words with significant differences are excluded from the report for brevity.

¹ We also analyzed the comments received by students from each program separately and observed that the per-program results followed similar trends to those of the two groups described above. Thus, we omit per-program results for brevity.
² https://docs.python.org/3/library/re.html
³ https://www.nltk.org/_modules/nltk/stem/snowball.html
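The five parsing steps of Section 3.2 can be sketched in Python as follows. This is a minimal sketch, not the authors' parser: the stopword list and the normalization patterns are small hypothetical examples (the paper does not publish its lists), the function names `tokenize`, `load_snowball_stemmer`, and `preprocess` are our own, and stemming uses NLTK's `SnowballStemmer` only when NLTK is installed.

```python
import re
from typing import Callable, List, Optional

# Hypothetical, abbreviated stopword list (the paper's full list is not given).
STOPWORDS = {"a", "an", "and", "at", "he", "her", "his", "in", "is",
             "of", "on", "she", "the", "was"}
# Context words the paper says are removed alongside stopwords.
CONTEXT_WORDS = {"workterm", "university", "co-op"}

# Step 2: one regex that deletes any listed word when it appears as a whole word.
_REMOVE_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(STOPWORDS | CONTEXT_WORDS,
                                            key=len, reverse=True))
    + r")\b"
)

# Step 3: map spelling variants to one form (the two examples from the paper).
NORMALIZATIONS = [
    (re.compile(r"\binter[\s-]*personal\b"), "interpersonal"),
    (re.compile(r"\bhard[\s-]*work\b"), "hardwork"),
]


def tokenize(comment: str) -> List[str]:
    """Apply steps 1-4, then split on white space (first half of step 5)."""
    text = comment.lower()                        # step 1: lower case
    text = _REMOVE_RE.sub(" ", text)              # step 2: drop stop/context words
    for pattern, replacement in NORMALIZATIONS:   # step 3: normalize variants
        text = pattern.sub(replacement, text)
    text = re.sub(r"[^a-z\s]", " ", text)         # step 4: non-letters -> space
    return text.split()                           # step 5a: whitespace tokens


def load_snowball_stemmer() -> Optional[Callable[[str], str]]:
    """Return NLTK's English Snowball stemmer, or None if NLTK is missing."""
    try:
        from nltk.stem.snowball import SnowballStemmer
    except ImportError:
        return None
    return SnowballStemmer("english").stem


def preprocess(comment: str,
               stem: Optional[Callable[[str], str]] = None) -> List[str]:
    """Tokenize a comment and, if a stemmer is given, stem each token (step 5b)."""
    tokens = tokenize(comment)
    return [stem(t) for t in tokens] if stem else tokens
```

For instance, `tokenize("She showed great Inter-personal skills and hard work at the university!")` keeps only `showed`, `great`, `interpersonal`, `skills`, and `hardwork`; passing `stem=load_snowball_stemmer()` to `preprocess` then reduces the tokens to Snowball stems. Keeping the stemmer pluggable makes the normalization steps testable without NLTK.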
Table 2
Groups based on performance evaluation level

                 Programs with < 40%    Programs with ≥ 40%
                 female students        female students
All              86% (82%M, 18%F)       14% (56%M, 44%F)
Above-average    32% (82%M, 18%F)       28% (61%M, 39%F)
Average          47% (80%M, 20%F)       49% (49%M, 51%F)
Below-average    21% (86%M, 14%F)       23% (61%M, 39%F)

Note. The "All" row gives each group's share of all students; the remaining rows give the share of students within each group receiving each rating. Parentheses give the proportions of male (M) and female (F) students.

4. Results

We now describe the results, treating students in programs with less than 40% female students and those in programs with greater than or equal to 40% female students separately, as mentioned in Section 3.1. Section 4.1 presents word frequency differences in the feedback and recommendations received by male and female students enrolled in programs with < 40% female students. Section 4.2 presents gender differences in word frequencies in feedback and recommendations received by students enrolled in programs with ≥ 40% female students.

4.1. Gender Differences in Programs with < 40% Female Students

4.1.1. Feedback

Table 3 shows the differences in token frequencies in the feedback received by male and female students. On the left, Table 3 shows tokens that are mentioned significantly more frequently in the feedback received by male students. On the right, Table 3 shows the tokens mentioned significantly more frequently in the feedback received by female students.

The lists are sorted by the difference in frequencies, abbreviated Δ, computed as the percentage of male (or female) students whose feedback mentioned a token minus the percentage of female (or male) students whose feedback mentioned this token. For example, "code" appeared in the feedback of 14% of male students and 10% of female students, a difference of 4 percentage points. Asterisks indicate the level of statistical significance of the difference; all reported differences have a p-value below 0.05. In addition, Table 3 lists the odds ratio for each difference. For example, the odds of employers describing female students as "thorough" are 1.6 times those for male students.

Even though some of the differences shown in Table 3 appear small in magnitude, they are statistically significant at a p-value of 0.05, have statistical power greater than 80%, and have an odds ratio greater than one (see Section 3.2).

Feedback received by male students contains more technical terms. Table 3 shows that words relating to technical tasks, including "code", "tool", "written", "hardwar", "machin", and "analyz", are more frequent in the feedback received by male students. Supervisors of male students are four times more likely to refer to them as an "expert". On the other hand, feedback received by female students mentions their general ability ("profici" in Table 3). This gender difference in the amount of technical feedback received exists in all groups with < 40% female students, irrespective of program, overall evaluation rating, or seniority.

Feedback received by male students also contains more mentions of the word "eager". Manual inspection of the comments containing the token "eager" revealed that it typically describes students who suggest new ideas and take the initiative to start new tasks. In addition, male students receive more feedback on their efficiency and planning (indicated by words such as "effici", "prioriti", "deadlin", "iter", and "tackl").

Table 3 shows that the words "fulltim" and "ecoop" occur more frequently in the feedback received by male students. The token "fulltim" indicates that the employer has extended a full-time job offer to the student. The token "ecoop" refers to a program established by the university under study to allow students to work in their own company (i.e., their start-up) for a co-op work term. Table 3 shows that the token "ecoop" is mentioned in the feedback for 1% of male students and no female students.

Feedback received by female students contains more references to their teamwork and interpersonal skills (indicated by words such as "help", "collabor", "delight", "wonder", and "joy" in Table 3). In addition, female students receive more feedback on their thoroughness (indicated by words such as "attentiontodetail" and "thorough" in Table 3), dedication ("dedic", "enthusiast"), and adaptability (the token "adapt" is mentioned in the feedback received by female students 3.7 times more often than in the feedback received by male students).

Some tokens in Table 3 indicate that male and female students are referred to differently by their employers. Manual inspection of the comments containing the word "addition" indicates that female students are referred to as a "good addition to the team/company". Manual inspection of the comments containing the word "potenti" indicates that the word is generally used in the context of "has a lot of potential", and the word "demand" is used to describe a student's ability to cope with a demanding work environment. These tokens are found more often in the feedback received by female students.

Gender differences in the feedback received by students with different overall performance ratings and seniority levels follow the same trends as above. We omit the details for brevity.
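The reporting criteria used for Tables 3 and 4 (a two-tailed two-proportion z-test at p < 0.05, power above 80%, and an odds ratio above one) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it uses the pooled-variance z statistic and a common normal-approximation formula for post-hoc power, either of which may differ from what the authors computed, and the names `compare_token` and `TokenComparison` are hypothetical. Counts are numbers of reviews that mention a token, matching the odds-ratio definition in Section 3.2.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


@dataclass
class TokenComparison:
    delta: float       # percentage points: %A mentioning W minus %B mentioning W
    p_value: float     # two-tailed p-value of the pooled two-proportion z-test
    power: float       # approximate power of that test at the chosen alpha
    odds_ratio: float  # (a / b) / (c / d); inf when no Group B review mentions W

    def reported(self, alpha: float = 0.05, min_power: float = 0.8) -> bool:
        """Apply the three reporting criteria described in the text."""
        return (self.p_value < alpha and self.power > min_power
                and self.odds_ratio > 1)


def compare_token(a: int, n_a: int, c: int, n_b: int,
                  alpha: float = 0.05) -> TokenComparison:
    """Compare a of n_a Group A reviews vs. c of n_b Group B reviews mentioning W."""
    p_a, p_b = a / n_a, c / n_b

    # Pooled two-proportion z-test (null hypothesis: p_a == p_b).
    pooled = (a + c) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se_null if se_null > 0 else 0.0
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))

    # Normal-approximation power, treating the observed proportions as the
    # true ones (the far tail of the two-sided test is ignored).
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_alt = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se_alt > 0:
        power = STD_NORMAL.cdf((abs(p_a - p_b) - z_crit * se_null) / se_alt)
    else:
        power = 1.0 if p_a != p_b else 0.0

    # Odds ratio from the 2x2 table of mention / no-mention counts.
    b, d = n_a - a, n_b - c
    if b * c == 0:
        odds_ratio = math.inf if a * d > 0 else math.nan
    else:
        odds_ratio = (a * d) / (b * c)

    return TokenComparison(100 * (p_a - p_b), p_value, power, odds_ratio)
```

As a hypothetical illustration (these are not the study's group sizes), `compare_token(140, 1000, 100, 1000)` mirrors the 14% versus 10% split of "code" in Table 3: Δ is 4 percentage points and the p-value is below 0.01, but the approximate power is only about 0.79, so at this sample size the token would fail the 80% power criterion. The power requirement can therefore exclude differences that are nominally significant.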
Table 3
Word frequency differences in feedback received by male and female students enrolled in programs with < 40% female students

More frequent for male students             | More frequent for female students
Token        Male  Female  Δ        OR      | Token              Female  Male  Δ        OR
code         14%   10%     4%***    1.51    | help               25%     20%   5%***    1.36
tool         7%    4%      3%**     1.62    | dedic              9%      5%    4%***    1.84
fulltim      7%    5%      2%*      1.5     | attentiontodetail  7%      4%    3%***    1.94
eager        2%    0%      2%*      2.88    | collabor           5%      3%    2%**     1.86
written      3%    1%      2%*      2.55    | thorough           6%      4%    2%**     1.63
prioriti     3%    1%      2%**     2.54    | enthusiast         5%      3%    2%**     1.79
effici       2%    0%      2%*      6.29    | addition           5%      3%    2%**     1.75
hardwar      2%    1%      1%*      2.55    | profici            3%      1%    2%***    2.2
machin       2%    1%      1%*      3.07    | delight            3%      1%    2%**     2.09
analyz       1%    0%      1%*      4.23    | demand             2%      1%    1%**     2.17
expert       1%    0%      1%*      4.16    | timemanag          2%      1%    1%***    3.3
deadlin      1%    0%      1%*      6.74    | wonder             2%      1%    1%***    2.86
iter         1%    0%      1%*      6.74    | adapt              1%      0%    1%***    3.68
ecoop        1%    0%      1%*      inf     | joy                1%      0%    1%**     3.01
tackl        3%    2%      1%*      1.85    | potenti            1%      0%    1%***    inf

Note. ***: p < .001; **: p < .01; *: p < .05

Table 4
Word frequency differences in recommendations received by male and female students enrolled in programs with < 40% female students

More frequent for male students                | More frequent for female students
Token           Male   Female  Δ        OR     | Token           Female  Male   Δ        OR
solut           8%     4%      4%**     2.15   | allow           8%      3%     5%***    2.88
seek            4%     1%      3%**     3.91   | express         4%      0%     4%**     inf
system          4%     1%      3%**     3.21   | network         4%      0%     4%***    inf
read            3%     1%      2%*      3.27   | oper            5%      1%     4%**     3.13
architectur     3%     1%      2%*      4.75   | encourag        7%      5%     4%**     1.55
maintain        3%     1%      2%*      4.41   | challeng        9%      5%     4%**     1.89
mistak          2%     0%      2%*      inf    | askquestion     9%      5%     4%**     1.93
attent          3%     1%      2%*      3.91   | general         4%      1%     3%**     3.53
web             1%     0%      1%*      inf    | varieti         3%      0%     3%***    inf
algorithm       1%     0%      1%*      inf    | afraid          3%      0%     3%**     3.01
help            1%     0%      1%*      inf    | shi             3%      0%     3%***    17.62
cooperat        1%     0%      1%*      inf    | explor          4%      1%     3%*      4.05
opinion         1%     0%      1%*      inf    | market          3%      1%     2%***    3.34
hear            1%     0%      1%*      inf    | tell            1%      0%     1%***    7.2
distract        1%     0%      1%*      inf    | comfortzon      1%      0%     1%***    3.62

Note. ***: p < .001; **: p < .01; *: p < .05
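Each row of Table 4 (and of Tables 5 and 6) compares the share of male and female students whose comments contain a stemmed token. The sketch below shows how the Δ, OR, and significance columns could be derived from raw counts for a single token. It is an illustration under stated assumptions, not the authors' code: the excerpt does not name the significance test, so a two-sided Fisher's exact test is assumed; Δ is taken as the difference in percentage points; and the odds ratio is computed as the odds for the group listed first over the odds for the group listed second, which is unbounded (`inf`) when the second group never uses the token. The function names and the counts in the usage line are hypothetical.

```python
from math import comb, inf, nan


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are "token present" / "token absent".
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability that group 1 has x "present" cells, with margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance keeps tables tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def token_row(k1: int, n1: int, k2: int, n2: int):
    """Statistics for a token used in k1 of n1 group-1 evaluations and
    k2 of n2 group-2 evaluations (hypothetical helper).

    Returns (pct1, pct2, delta_pp, odds_ratio, p_value).
    """
    a, b, c, d = k1, n1 - k1, k2, n2 - k2
    pct1, pct2 = 100 * k1 / n1, 100 * k2 / n2
    if b * c == 0:
        # e.g. group 2 never uses the token: the odds ratio is unbounded.
        odds_ratio = inf if a * d > 0 else nan
    else:
        odds_ratio = (a * d) / (b * c)
    return pct1, pct2, pct1 - pct2, odds_ratio, fisher_exact_two_sided(a, b, c, d)


def stars(p: float) -> str:
    """Significance markers as in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical counts: 40 of 500 female and 15 of 500 male evaluations.
pct_f, pct_m, delta, odds, p = token_row(40, 500, 15, 500)
print(f"{pct_f:.0f}% {pct_m:.0f}% {delta:.0f}%{stars(p)} {odds:.2f}")
```

A library routine such as SciPy's `scipy.stats.fisher_exact` could replace the hand-written test; the standard-library version simply keeps the sketch dependency-free.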


4.1.2. Recommendations

Table 4 follows the same format as Table 3 and shows the differences in token frequencies in the recommendations received by male and female students. Again, gender differences in the recommendations received by students with different overall performance ratings and different seniority levels showed similar trends and are not shown for brevity.
   Tokens in Table 4 suggest that male students receive more recommendations related to technical skills, as indicated by words such as “solut” (the stem of “solution”), “system”, “read”, “architectur”, “maintain”, “web”, and “algorithm”. In addition, male students are recommended to be more attentive to mistakes (indicated by the tokens “attent” and “mistak” in Table 4) and to improve their teamwork and interpersonal skills (indicated by “help”, “cooperat”, “opinion”, and “hear”).
   On the other hand, female students are recommended to “express” themselves, to “network”, to not be “afraid” or “shy”, and to ask more questions (see Table 4). The recommendations received by female students contain more mentions of the tokens “oper”, “general”, “varieti”, “explor”, and “market” (see Table 4). Manual inspection of comments containing these tokens reveals that female students receive more recommendations to explore and broaden their knowledge, especially about business operations.
   Table 4 also indicates that recommendations received by female students contained more occurrences of the words “allow”, “encourag”, “challeng”, and “comfortzon”. Manual inspection of comments containing these tokens suggests that female students were encouraged to challenge themselves and leave their comfort zones more often than male students.

4.2. Gender Differences in Programs with ≥ 40% Female Students
Tables 5 and 6 list the differences in word frequencies in the feedback and recommendations, respectively, received by students enrolled in programs with ≥ 40% female students. These tables follow the same format as Tables 3 and 4. Again, we omit gender differences in groups based on overall performance ratings and seniority, which show similar trends.

Table 5
Word frequency differences in feedback received by male and female students enrolled in programs with ≥ 40% female students

More frequent for male students                | More frequent for female students
Token           Male   Female  Δ        OR     | Token           Female  Male   Δ        OR
abil            22%    14%     8%**     1.71   | hardwork        13%     6%     7%**     2.25
understand      20%    12%     8%**     1.81   | team            7%      3%     4%**     2.86
littlesupervis  9%     3%      6%***    3.01   | applic          6%      2%     4%**     2.89
effici          11%    6%      5%*      2.04   | execut          3%      0%     3%*      inf
initi           7%     2%      5%**     3.6    | user            3%      0%     3%*      inf
pictur          4%     0%      4%*      inf    | technic         3%      0%     3%**     7.09
surpris         5%     1%      4%**     4.35   | comprehens      2%      0%     2%**     inf
devic           3%     0%      3%*      inf    | writtencomm     2%      0%     2%*      inf
matur           3%     0%      3%*      inf    | expertis        2%      0%     2%*      8.93
prioriti        3%     0%      3%*      inf    | smart           1%      0%     1%*      inf
newtask         3%     0%      3%**     10.67  | stack           1%      0%     1%*      inf
growth          2%     0%      2%*      inf    | legaci          1%      0%     1%*      inf
difficulti      1%     0%      1%*      inf    | style           1%      0%     1%*      inf
persist         1%     0%      1%*      inf    | joy             1%      0%     1%*      inf
ecoop           1%     0%      1%*      inf    | read            1%      0%     1%*      inf

Note. ***: p < .001; **: p < .01; *: p < .05

4.2.1. Feedback

Table 5 indicates that comments received by female students are more related to technical performance (suggested by tokens such as “applic”, “execut”, “user”, “technic”, “writtencomm”, “stack”, and “read”). In addition, tokens such as “expertis” and “legaci” are found more frequently in the feedback received by female students. On the other hand, feedback received by male students more often references their “ability” (“abil”). This is in contrast to the results presented in Section 4.1, where male students received more technical feedback than female students.
   Nevertheless, some of the feedback received by male students is similar to the feedback received by male students from programs with < 40% female students (Section 4.1). Male students are more likely to receive feedback on their eagerness to start new tasks (suggested by the tokens “newtask” and “initi” in Table 5, where “initi” is the word stem for “initiate” and “initiative”). They are also more likely to receive feedback on their planning and efficiency (“effici”, “pictur”, “prioriti”). The token “littlesupervis” in Table 5 indicates that supervisors find male students to be more independent than female students.
   Table 5 indicates that female students received more feedback on their hard work, thoroughness (“comprehens”, which is the word stem for “comprehensive”), teamwork, and interpersonal skills. Female students from programs with < 40% female students received similar feedback from their employers (see Section 4.1).
   Feedback given to male students contains more mentions of the words “surpris”, “growth”, “persist”, “difficulti”, and “matur” (see Table 5). Manual inspection of comments containing these terms revealed that employers were pleasantly surprised to see the students’ growth, persistence, and maturity.
   Finally, similar to programs with < 40% female students (Section 4.1), the token “ecoop” is mentioned for 1% of male students and no female students.
Table 6
Word frequency differences in recommendations received by male and female students enrolled in programs with ≥ 40% female students

More frequent for male students                | More frequent for female students
Token           Male   Female  Δ        OR     | Token           Female  Male   Δ        OR
say             4%     0%      4%*      inf    | oper            5%      1%     4%*      8.81
mistak          3%     0%      3%*      inf    | creativ         5%      1%     4%*      8.81
reserv          3%     0%      3%*      inf    | surround        4%      0%     4%*      inf
team            3%     0%      3%*      inf    | knowledg        4%      0%     4%*      inf
public          3%     0%      3%*      inf    | instinct        3%      0%     3%*      inf
speak           3%     0%      3%*      inf    | quick           3%      0%     3%*      inf
open            3%     0%      3%*      inf    | generat         3%      0%     3%*      inf
expect          2%     0%      2%*      inf    | difficult       3%      0%     3%*      inf
distract        2%     0%      2%*      inf    | system          3%      0%     3%*      inf
error           2%     0%      2%*      inf    | learn           3%      1%     2%***    19.33
topic           2%     0%      2%*      inf    | document        2%      0%     2%*      inf
softskil        2%     0%      2%*      inf    | explor          2%      0%     2%*      inf
listen          2%     0%      2%*      inf    | interest        1%      0%     1%*      inf
respect         2%     0%      2%*      inf    | compani         1%      0%     1%**     2.02
complex         2%     0%      2%**     inf    | deal            1%      0%     1%***    4.97

Note. ***: p < .001; **: p < .01; *: p < .05
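The tables also hint at how the token lists were assembled: each side lists 15 tokens that are more frequent for one gender, ordered by Δ from largest to smallest, and several tokens (“comfortzon”, “littlesupervis”, “writtencomm”, “softskil”) appear to be stems of multi-word phrases joined into one token. The following sketch counts, per gender, how many evaluations mention each token and ranks the tokens on each side. It is a hypothetical reconstruction, not the authors' pipeline: the phrase list, the function names, and the example comments are invented for illustration; stemming defaults to the identity function (a real stemmer such as NLTK's `PorterStemmer().stem` could be passed as `stem`, though it is not confirmed to reproduce the paper's stems); and, unlike the published tables, no significance filter is applied.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical phrase list: multi-word expressions are joined into a single
# token before stemming, which would explain tokens such as "comfortzon".
PHRASES = {
    "comfort zone": "comfortzone",
    "little supervision": "littlesupervision",
    "written communication": "writtencommunication",
    "soft skills": "softskills",
}


def tokenize(comment: str, stem: Callable[[str], str] = lambda w: w) -> set:
    """Return the set of (optionally stemmed) tokens in one comment."""
    text = comment.lower()
    for phrase, joined in PHRASES.items():
        text = text.replace(phrase, joined)
    return {stem(word) for word in re.findall(r"[a-z]+", text)}


def document_frequencies(
    evaluations: Iterable[Tuple[str, str]], stem: Callable[[str], str] = lambda w: w
) -> Tuple[Dict[str, Counter], Counter]:
    """Count, per gender ("M" or "F"), how many evaluations mention each token.

    A token is counted at most once per evaluation, so counts / totals is
    the share of students whose comments contain it.
    """
    counts: Dict[str, Counter] = {"M": Counter(), "F": Counter()}
    totals: Counter = Counter()
    for gender, comment in evaluations:
        totals[gender] += 1
        counts[gender].update(tokenize(comment, stem))
    return counts, totals


def rank_tokens(
    counts: Dict[str, Counter], totals: Counter, top_k: int = 15
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens by the gender that uses them more, sort each side by the
    difference in shares (descending, ties alphabetical), and keep the top k.
    No significance filter is applied here."""
    male_side, female_side = [], []
    for token in set(counts["M"]) | set(counts["F"]):
        share_m = counts["M"][token] / totals["M"]
        share_f = counts["F"][token] / totals["F"]
        if share_m > share_f:
            male_side.append((token, share_m - share_f))
        elif share_f > share_m:
            female_side.append((token, share_f - share_m))

    def by_delta(item: Tuple[str, float]) -> Tuple[float, str]:
        return (-item[1], item[0])

    return sorted(male_side, key=by_delta)[:top_k], sorted(female_side, key=by_delta)[:top_k]


# Hypothetical evaluations: (gender, supervisor comment).
evaluations = [
    ("M", "Works with little supervision."),
    ("M", "Great initiative."),
    ("F", "Step out of your comfort zone."),
    ("F", "Thorough, and works with little supervision."),
]
counts, totals = document_frequencies(evaluations)
male_side, female_side = rank_tokens(counts, totals)
```

Counting each token at most once per evaluation matches the way the text reports results (for example, a token "mentioned for 1% of male students"), rather than raw token counts that long comments would inflate.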


4.2.2. Recommendations

Table 6 indicates that male students are referred to as “reserved” and are recommended to speak up (suggested by tokens such as “reserv”, “say”, “public”, “speak”, and “open”). This is in contrast to the results reported in Section 4.1, where female students were recommended to ask more questions.
   Table 6 also indicates that female students receive more technical recommendations than male students. Tokens such as “creativ”, “knowledg”, “generat”, “system”, “interest”, “document”, and “learn” are more common in the recommendations received by female students. On the other hand, recommendations received by male students contain more occurrences of the tokens “topic” and “complex”. Again, this is in contrast to the results shown in Section 4.1, where male students received more technical recommendations.
   Nevertheless, some recommendations given to students in programs with ≥ 40% female students are similar to those given to students in programs with < 40% female students (Section 4.1). For example, like male students from programs with < 40% female students, male students from programs with ≥ 40% female students are also recommended to watch out for mistakes (indicated by “mistak”, “distract”, and “error” in Table 6) and to improve their teamwork and interpersonal skills (“team”, “softskil”, “listen”, “respect”). Female students from programs with ≥ 40% female students are recommended to gain operational knowledge (indicated by “oper”, “surround”, “explor”, and “compani” in Table 6 and confirmed by manual inspection of the comments containing these tokens). The same recommendations were received by female students from programs with < 40% female students.

5. Discussion

The main findings of this study and their significance are as follows.
   Observation #1: We found the following gender differences in all groups of students, irrespective of the overall performance rating, seniority, and the gender composition of their academic programs.
      1. Female students are more likely than male students to be appreciated for their thoroughness, dedication, enthusiasm, hard work, adaptability, teamwork, and interpersonal skills.
      2. Male students are more likely than female students to be appreciated for their eagerness, planning, efficiency, and independence.
      3. Female students are recommended to increase their business knowledge, including general information about the market and company operations.
      4. Male students are recommended to keep an eye out for mistakes and improve their teamwork and interpersonal skills.
   These gender differences in feedback and recommendations may be due to gender differences in (a) how employers perceive their students’ competencies, (b) opportunity, or (c) students’ abilities.
   Gender differences in perceived competencies: The gender differences we found are consistent with past
studies that examined feedback in education and in the workplace. For example, studies examining professionals in technology, the military, politics, and law found that women were appreciated for their communal qualities (e.g., those related to social relationships) and men were appreciated for their agentic qualities (e.g., those related to goal achievement) [8, 9, 10, 11, 12, 13, 14]. In addition, women were more often tagged as “enthusiastic”, “organized”, and “unaware” and men as “analytical”, “dependable”, and “irresponsible” [9]. Studies in STEM classrooms indicate that teachers attribute male students’ achievements to their ability, and female students’ achievements to their hard work [23]. Social scientists and psychologists confirm the existence of stereotypes of men and women [26, 27]. Therefore, a possible reason behind the gender differences we found may be the unconscious gender bias of the evaluator (i.e., the work term supervisor).
   Studies suggest that positive and negative gender stereotypes found in evaluations affect students’ self-image and career choices [26, 28, 29, 24, 19]. Additionally, experiments found that gendered language in performance evaluations may affect hiring and promotion decisions [14, 9]. For example, when conducting a blind review of candidates for promotion, participants chose candidates described as “good at taking initiative”. Since these (agentic) characteristics occur in the performance evaluations of men more often than in those of women, this may lead to fewer promotion opportunities for women. Additionally, participants considered collaborative skills, and thus female profiles, less suitable for leadership roles [14]. Overall, since task-oriented qualities are more valuable to an organization than social-oriented qualities [30], the gender stereotypes in performance evaluations may give men a better chance to be hired, promoted, and more highly paid.
   More female than male students leave STEM programs and careers [31, 2]. Potential reasons for this include sexism in teams, the masculine culture in STEM education and the workplace, and dissatisfaction over pay and promotion opportunities [3]. Therefore, eliminating gender bias from early career performance reviews can help plug the “leaky” pipeline. In particular, universities offering co-op programs should communicate with participating co-op employers to emphasize the importance of unbiased feedback. One problem with implicit bias is that many people are not aware that they are biased, which underscores the importance of diversity training for workplace supervisors.
   Gender differences in opportunity: We found that female students were appreciated for their adaptability more often than male students, indicating that female students were perhaps initially perceived to be less compatible with the company culture. Past studies suggest that the masculine work and after-work culture of male-dominated professions makes women uncomfortable [32]. This masculine culture may cause female students to consciously or unconsciously limit their workplace interactions (with peers and supervisors), limiting their access to operational knowledge. Given that there are fewer female supervisors [2], female students may have found it difficult to communicate within a male-dominated hierarchy.
   Gender differences in ability: Biological or society-driven differences in ability may have led to the gender differences in performance evaluations reported in this study. Past studies found that females were more likely to possess both high mathematical and high verbal abilities, and males were more likely to demonstrate higher mathematical abilities relative to their verbal abilities [28]. In addition, studies found that female students preferred people-oriented roles [33], displayed more altruistic tendencies [1], scored higher on teamwork and interpersonal communication [5, 6], and outperformed male students at collaborative problem-solving tasks [34].
   Observation #2: There appears to be a relationship between the gender composition of academic programs and the comments received by students in those programs. This is particularly noteworthy because it occurs in a field with (traditionally) pro-male ability beliefs. We found that in programs with < 40% female students, a higher proportion of male students than female students received feedback on their technical performance. The recommendations received by male students also contained more technical directions for improvement. On the other hand, female students were recommended to participate, be less shy, and ask more questions. For programs with ≥ 40% female students, the opposite is true. In these programs, female students receive more technical feedback and recommendations, and male students are recommended to be less reserved and speak more openly. This trend exists across all groups of students, irrespective of overall performance scores and seniority.
   Gender differences in technical evaluation: The above observation is consistent with past observational studies that analyzed gender differences in teacher-student interaction and the feedback received by secondary school students. Some studies found that male students received more attention and feedback, particularly praise, criticism, and technical information, irrespective of the subject being taught (sports, modern languages, mathematics, science, and humanities) [19, 18, 20]. However, this was reversed in classes that contained as many or more female students [20]. Since feedback and recommendations on technical and behavioral skills are important for co-op students [30], universities may want to ensure that co-op evaluation forms include explicit requests to comment on students’ technical skills.
   Studies that analyzed the performance reviews of men and women in (a) technology and professional-services firms [14], (b) a leadership development program [8], and (c) a navy academy [9] found more mentions of
technical words in the feedback received by men than in that received by women. These gender differences in technical feedback were attributed to the pro-male ability bias that exists in these fields. However, since all of these studies investigated samples containing fewer than 25% women, our results suggest the need for further investigation.
   Gender differences in participation: A study conducted in a secondary school reported that both male and female students participated more when their own gender was the majority gender in the classroom [20]. This was found irrespective of the subject being taught. Similarly, a study in which engineering students were randomly assigned to teams (or “micro-environments”) with varying gender composition reported similar conclusions. This study found that when female students were the minority in a team (less than 25%), they spoke less, were less involved in teamwork, and felt less confident than female students assigned to teams where they were in the majority (75% or more) [35]. This was true regardless of the students’ academic seniority. Moreover, female students from male-majority teams reported lowered engineering career aspirations after the team interaction [35].
   Past studies attribute this difference in participation to isolation (or social-belongingness concerns) and stereotype threat (the concern that one will be judged in terms of a stereotype) [20, 35]. Female students were more affected by the gender composition of a classroom, leading to recommendations to create single-sex or gender-parity micro-environments (e.g., in-class teams or study groups) [35, 20]. Researchers experimenting with varying proportions of male and female students in engineering teams found that gender-balanced micro-environments are particularly important for first-year students, to ensure these students do not lose confidence and drop out of STEM fields [35]. Gender-balanced micro-environments helped students focus on learning, participate more freely, and, in turn, gain the confidence to persist in gender-imbalanced environments. Another study found that participation in social-belonging interventions during student orientation programs improved female students’ social attitudes and academic performance in male-dominated STEM programs [36].
   Our results similarly suggest that co-op students working in environments where they are not the majority gender participate less in team activities and may need additional encouragement. As suggested by past studies, gender-imbalanced classrooms and workplaces may experiment with social-belonging interventions and gender-parity micro-environments and note their effect on student confidence.
   Observation #3: Different words were used to describe the minority and the majority gender. Phrases including “has a lot of potential”, “challenge yourself”, “allow yourself to grow”, and “come out of your comfort zone” are more common in the comments received by female students from programs with < 40% female students. On the other hand, phrases including “surprised by performance” and “mature” are more common in the comments received by male students from programs with ≥ 40% female students.
   Studies of tokenism support the above observation and suggest that bias against a group occurs when that group is a minority in a given field [37]. Related work on minority groups (in terms of race and gender) presents conflicting reports on whether the feedback provided to those groups is more lenient or harsher [11, 12, 38]. However, most studies that report gender differences in feedback note that the same trait is described more positively for men than for women [8, 11, 12, 13, 15]. Note that all these studies were conducted in male-dominated professions.

6. Conclusions

In this paper, we analyzed gender differences in early career workplace performance reviews. To do so, we used a unique dataset corresponding to work term evaluations of students enrolled in engineering co-operative programs. We used text mining methods to analyze word frequency differences in employer feedback and recommendations for professional development.
   We found that male students were appreciated for taking initiative more often than female students. They were described as efficient and independent and were recommended to improve their interpersonal and teamwork skills. On the other hand, female students were appreciated for being thorough, hardworking, social, and collaborative. They were advised to gain business knowledge more often than male students. We also found differences in the comments received by students in male- versus female-dominated programs. In both groups of engineering programs, the majority gender received more technical feedback and recommendations, and the minority gender was advised to ask more questions and be more confident.
   Our main takeaway message is that men and women appear to be perceived differently in the STEM workplace from the beginning of their careers. Since reiteration of gendered feedback leads to career dissatisfaction and attrition [24, 14, 3], our results emphasize the importance of unbiased feedback in early career settings such as co-operative internships. Moreover, since our results suggest a possible link between the gender composition of the programs and the feedback received by the majority and minority gender, special attention should be paid to encouraging minority groups.
   The results presented in this paper should be interpreted carefully since they are based on data from a single North American institution. Nevertheless, we believe that our data-driven study is a useful starting point for further analysis. For example, an interesting direction for future work is to interview STEM alumni to determine whether their co-op experiences affected their career paths. Furthermore, it may be useful to investigate the effect of the workplace supervisor’s gender on performance reviews (we were unable to do this analysis because our dataset did not include any information about workplace supervisors).

References

 [1] S. Chopra, H. Gautreau, A. Khan, M. Mirsafian, L. Golab, Gender differences in undergraduate engineering applicants: A text mining approach, in: Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018, Buffalo, NY, USA, July 15-18, 2018, 2018, pp. 44–54.
 [2] A. Perreault, Analysis of the distribution of gender in STEM fields in Canada, http://wiseatlantic.ca/wp-content/uploads/2018/03/WISEReport2017_final.pdf. Accessed: 20th March, 2019.
 [3] J. Hunt, Why do women leave science and engineering?, ILR Review 69 (2016) 199–226.
 [4] A. Kauhanen, S. Napari, Gender differences in careers, Annals of Economics and Statistics (2015) 61–88.
 [5] S. Chopra, A. Khan, M. Mirsafian, L. Golab, Gender differences in work-integrated learning assessments, in: Proceedings of the International Conference on Educational Data Mining (EDM), 2019, pp. 524–527.
 [6] S. Chopra, A. Khan, M. Mirsafian, L. Golab, Gender differences in work-integrated learning experiences of STEM students: From applications to evaluations, International Journal of Work-Integrated Learning 21 (2020) 253–274.
 [11] P. Cecchi-Dimeglio, How gender bias corrupts performance reviews, and what to do about it, Harvard Business Review 12 (2017).
 [12] K. Snyder, The abrasiveness trap: High-achieving men and women are described differently in reviews, Fortune Magazine 26 (2014) 08–14.
 [13] S. J. Correll, K. R. Weisshaar, A. T. Wynn, J. D. Wehner, Inside the black box of organizational life: The gendered language of performance assessment, American Sociological Review 85 (2020) 1022–1050.
 [14] R. Silverman, Gender bias at work turns up in feedback, 2015. URL: https://www.wsj.com/articles/gender-bias-at-work-turns-up-in-feedback-1443600759.
 [15] K. Brucker, N. Whitaker, Z. S. Morgan, K. Pettit, E. Thinnes, A. M. Banta, M. M. Palmer, Exploring gender bias in nursing evaluations of emergency medicine residents, Academic Emergency Medicine 26 (2019) 1266–1272.
 [16] A. S. Mueller, T. M. Jenkins, M. Osborne, A. Dayal, D. M. O’Connor, V. M. Arora, Gender differences in attending physicians’ feedback to residents: a qualitative analysis, Journal of Graduate Medical Education 9 (2017) 577–585.
 [17] K. Dutt, D. L. Pfaff, A. F. Bernstein, J. S. Dillard, C. J. Block, Gender differences in recommendation letters for postdoctoral fellowships in geoscience, Nature Geoscience 9 (2016) 805.
 [18] V. Nicaise, G. Cogérino, J. Bois, A. J. Amorose, Students’ perceptions of teacher feedback and physical competence in physical education classes: Gender effects, Journal of Teaching in Physical Education 25 (2006) 36–57.
 [19] V. Nicaise, J. E. Bois, S. J. Fairclough, A. J. Amorose, G. Cogérino, Girls’ and boys’ perceptions of physical education teachers’ feedback: Effects on performance and psychological responses, Journal of Sports Sciences 25 (2007) 915–926.
 [20] S. Drudy, M. Ú. Chatháin, Gender effects in classroom interaction: Data collection, self-analysis and reflection, Evaluation & Research in Education 16 (2002) 34–50.
 [7] E. D. Reilly, K. R. Rackley, G. H. Awad, Perceptions
      of male and female stem aptitude: The moderating
                                                             [21] P. C. Burnett, Teacher praise and feedback and stu-
      effect of benevolent and hostile sexism, Journal of
                                                                  dents’ perceptions of the classroom environment,
      Career Development 44 (2017) 159–173.
                                                                  Educational psychology 22 (2002) 5–16.
 [8] E. Doldor, M. Wyatt, J. Silvester, Statesmen or cheer-
                                                             [22] M. G. Jones, J. Wheatley, Gender differences in
      leaders? using topic modeling to examine gendered
                                                                  teacher-student interactions in science classrooms,
      messages in narrative developmental feedback for
                                                                  Journal of research in Science Teaching 27 (1990)
      leaders, The Leadership Quarterly 30 (2019) 101308.
                                                                  861–874.
 [9] D. G. Smith, J. E. Rosenstein, M. C. Nikolov, D. A.
                                                             [23] J. Tiedemann, Gender-related beliefs of teachers
      Chaney, The power of language: Gender, status,
                                                                  in elementary school mathematics, Educational
      and agency in performance evaluations., Sex Roles
                                                                  studies in Mathematics 41 (2000) 191–207.
      80 (2019).
                                                             [24] M. Mayo, M. Kakarika, J. C. Pastor, S. Brutus, Align-
[10] L. H. Keith, Visibility invisibility: Feedback bias
                                                                  ing or inflating your leadership self-image? a lon-
      in the legal profession, J. Gender Race & Just. 23
                                                                  gitudinal study of responses to peer feedback in
      (2020) 315.
                                                                  mba teams, Academy of Management Learning &
     Education 11 (2012) 631–652.                                a positive bias., Journal of personality and social
[25] W. B. Croft, D. Metzler, T. Strohman, Search en-            psychology 74 (1998) 622.
     gines: Information retrieval in practice, volume 520,
     Addison-Wesley Reading, 2010.
[26] M. E. Heilman, Gender stereotypes and workplace
     bias, Research in organizational Behavior 32 (2012)
     113–135.
[27] J. Lorber, S. A. Farrell, et al., The social construction
     of gender, Sage Newbury Park, CA, 1991.
[28] M.-T. Wang, J. L. Degol, Gender gap in sci-
     ence, technology, engineering, and mathematics
     (stem): Current knowledge, implications for prac-
     tice, policy, and future directions, Educational Psy-
     chology Review 29 (2017) 119–140. URL: https://
     doi.org/10.1007/s10648-015-9355-x. doi:10.1007/
     s10648-015-9355-x.
[29] N. Dasgupta, J. G. Stout, Girls and women in sci-
     ence, technology, engineering, and mathematics:
     Steming the tide and broadening participation in
     stem careers, Policy Insights from the Behavioral
     and Brain Sciences 1 (2014) 21–29.
[30] R. K. Coll, K. E. Zegwaard, Perceptions of desirable
     graduate competencies for science and technology
     new graduates, Research in Science & Technologi-
     cal Education 24 (2006) 29–58.
[31] D. Hango, Gender differences in science, technol-
     ogy, engineering, mathematics, and computer sci-
     ence (STEM) programs at university, Insights on
     Canadian Society (2013).
[32] C. Seron, S. S. Silbey, E. Cech, B. Rubineau, Persis-
     tence is cultural: Professional socialization and the
     reproduction of sex segregation, Work and Occu-
     pations 43 (2016) 178–214.
[33] R. Su, J. Rounds, P. I. Armstrong, Men and things,
     women and people: a meta-analysis of sex differ-
     ences in interests., Psychological bulletin 135 (2009)
     859.
[34] OECD, Collaborative problem solving (2017).
     URL: https://www.oecd-ilibrary.org/content/paper/
     cdae6d2e-en. doi:https://doi.org/https://
     doi.org/10.1787/cdae6d2e-en.
[35] N. Dasgupta, M. M. Scircle, M. Hunsinger, Female
     peers in small work groups enhance women’s mo-
     tivation, verbal participation, and career aspira-
     tions in engineering, Proceedings of the National
     Academy of Sciences 112 (2015) 4988–4993.
[36] G. M. Walton, C. Logel, J. M. Peach, S. J. Spencer,
     M. P. Zanna, Two brief interventions to mitigate a
     “chilly climate” transform women’s experience, rela-
     tionships, and achievement in engineering., Journal
     of Educational Psychology 107 (2015) 468.
[37] R. M. Kanter, Some effects of proportions on group
     life, in: The gender gap in psychotherapy, Springer,
     1977, pp. 53–78.
[38] K. D. Harber, Feedback to minorities: Evidence of