Gender Stereotypes in Film Language: A Corpus-Assisted Analysis

Lucia Busso§, CoLingLab, Università di Pisa* (l.busso0@fileli.unipi.it)
Gianmarco Vignozzi§, Università di Pisa** (gianmarco.vignozzi@fileli.unipi.it)

§ The research and the writing were carried out by both authors equally. G.V. is responsible for sections 1, 2, 3.1 and 3.2, L.B. for sections 3.3, 4 and 5.
* on leave at the Department of English and Applied Linguistics, University of Birmingham
** on leave at the Department of Linguistics, University of Sydney

Abstract

English: The present study concentrates on the representation and the reception of gender stereotypes. The analysis was first carried out on an ad hoc corpus of cult romantic comedies and dramedies from contemporary Anglo-American pop culture, and then with a perception test. Both the corpus-driven analysis and the test results provide useful insights into the representation, recognition and entrenchment of gender stereotypes in language and in western culture. The preliminary findings generally confirm and validate the scientific literature, while also showing some notable new elements.

Italian (translated): This work focuses on the representation and perception of gender stereotypes. The research was first conducted on an ad hoc corpus of cult films from contemporary Anglo-American pop culture belonging to the romantic comedy and dramedy genres, and then with a perception test. The twofold approach sheds light on the representation, recognition and entrenchment of gender stereotypes in western language and culture. The results are in line with the literature, although they show some new elements.

1 Introduction

In the era of digital revolution and screen proliferation, movies have undoubtedly acquired a pivotal role in shaping our worldviews. Popular films have the power to sway our collective imagination and influence our attitudes on crucial issues related to race, class, gender, etc. Characters in films reflect and perpetuate the status and options of the groups they portray in today's society and culture, and thus play an active part in creating symbolic role models (Kord and Krimmer 2005, Bednarek 2015). Accordingly, it is interesting to examine the ways in which both females and males are represented on celluloid, to better understand the ideologies they bear and how gender identities are idealized. There seems to be wide agreement that characterization in filmic discourse relies heavily on archetypes and simplification (Culpeper 2001; Bednarek 2010). This is especially true of gender representation, as stereotypical roles simplify characterization in a way that is easier for the viewing audience to receive. This, however, often results in an extreme polarization of gender roles. Film dialogues are therefore an ideal ground on which to study gender stereotypes and their linguistic representation and reception. Hence, this paper aims to fathom the discursive representation and the perception of well-established gender stereotypes in the dialogues of a sample of cult British and American romantic comedies, by integrating the tools of discourse analysis, corpus linguistics and perception analysis.

2 Films, language and gender

The nature of film language is still an object of debate. Movie scripts can be classified as texts that are “written-to-be-spoken-as-if-not-written” (Gregory & Carroll 1978: 42). Dialogues, in fact, portray a sort of “prefabricated orality”, in that they are carefully written to be performed and to sound natural to the audience, which longs for authenticity (Chaume 2012: 81). Corpus-based studies have shown that spontaneous conversation and scripted dialogues are very similar in nature, sharing almost the same array of lexico-grammatical features (Quaglio 2009, Bednarek 2010, Forchini 2012, Baker 2014, amongst others), but, due to the evident need for clarity and speed in audio-visual texts, the frequency of these features may differ. In fact, film scripts sometimes tend to over-use features of spontaneous conversation (e.g. greetings and leave-takings; Bruti & Vignozzi 2016), both for dramatic reasons and to render the speech of characters as natural-sounding as possible.

Starting from the premises that gender is socially constructed (Cameron 2010) and that a large part of its perception relies on the observation of pre-established models, television and films provide the perfect field for examining generalized western social representations of accepted human behaviour (Shrum 2008). In this vein, verbal language becomes one of the pivotal means to create, reinforce and, most importantly, perpetuate stereotypical representations. Canonical research on language and gender has shown that traits such as hedges, empty adjectives, excessively polite forms, intensifiers, troubles talk etc. are more typical of women (Lakoff 1975; Tannen 1994; Coates 1993), whereas males are associated with substandard and diatopically marked registers (Trudgill 1972; Tannen 1991) and with a use of language aimed at retaining status and attention. However, many of these ideas have now been partially rejected and framed as stereotypical norms around femininity and masculinity, which leave no space for diversity (Cameron 2010; Mullany 2007; Bednarek 2015). In recent times, corpus linguistics and computational linguistics have shown interest in analysing differences in language between genders (Argamon et al. 2003, Baker 2006, Herring & Paolillo 2006, McEnery et al. 2006, Monroe et al. 2008, amongst others). This body of literature is the backbone of our work, which aims to bring together “corpus linguistics and gender analysis: two strands of linguistic research that do not go together frequently” (Kreyer 2014: 570).

3 Data and corpus-driven analysis

The corpus. We compiled a corpus from the orthographic transcriptions of eight English and American romantic comedies, using the web software SketchEngine (Kilgarriff et al. 2004, 2014). The films were chosen not only for their themes, but also for chronological coherence, as they cover approximately the first decade of the 21st century (Table 1).

Title | Year | Nation
Sliding Doors | 1998 | UK
Billy Elliot | 2000 | UK
Bridget Jones's Diary | 2001 | UK/USA
Bend It Like Beckham | 2002 | UK
The Devil Wears Prada | 2006 | USA
Juno | 2007 | USA
Eat, Pray, Love | 2010 | USA
Letters to Juliet | 2010 | USA
Table 1: corpus rationale

The resulting corpus is therefore a synchronic ad hoc corpus of 95,036 tokens. We further subdivided it into two subcorpora consisting of the turns of female and male characters, respectively 55,766 (58.7%) and 39,270 (41.3%) tokens (henceforth F and M). We chose to gather a new corpus, instead of relying on existing ones, to obtain greater control over the data. Moreover, popular romantic comedies are the perfect humus for a polarized representation of gender roles, because of their content and intrinsic structure. As will be seen, however, our results are comparable with those extracted from the much larger Cornell Movie-Dialogs Corpus.¹

Keywords and semantic domains clouds analysis. We used the online text analysis software WMatrix (Rayson 2003, 2004) to compare M and F both against each other and against a reference corpus, the BNC-spoken. WMatrix performs automatic semantic analysis of (English) texts. The analysis starts with a POS tagging phase; the output is then semantically tagged using a set of 21 predefined semantic fields, further subdivided into 232 category labels for more fine-grained classification. From the comparative analyses of the male and female subcorpora we thus obtained keyword and semantic domain clouds (calculated with the log-likelihood statistic). Items with LL values of about 7 or more are statistically significant, since 6.63 is the cut-off for 99% confidence of significance. The automatically obtained clouds were manually analysed to filter out possible errors and to select the more significant semantic domains associated with our sub-corpora. From the comparisons of the two sub-corpora against each other and against the BNC Spoken, we selected the most relevant semantic domains and keywords (i.e. those with the highest LL values) for a more qualitative evaluation. Tables 2 and 3 report the domains and the keywords that we selected.

Sem. domains F | Sem. domains M
Business: Selling | Industry
Evaluation: Authentic | Evaluation_Inaccurate
Clothes and Personal Belongings | Sports
Time: New and Young | Money_Generally
Judgments of Appearance | Greedy
People: Female | People: Male
Kin | Foolish
Informal/Friendly | Able:Intelligent
Anatomy and Physiology | Anatomy and Physiology
Intimacy and Sex | Intimacy and Sex

Keywords F | Keywords M
Feelings (in_love, love) | Friendship (lads, man, mate)
God, oh God, my God | Swearing (fuck, fuck off, fucking)
Swearing and Euphemisms (Shit, Shagging) | Right, all_right
Mom | Dad
Politeness (Thank You, Sorry) | sorry
People (Me, My, You) |
Tables 2 and 3: WMatrix semantic domains and keywords used in the test

As can be seen, in our corpus women tend to speak about shopping, cleaning, personal care and family, whereas men appear to discuss money, sports, work and male friendship. Table 2 also contains semantic domains which were relevant for both M and F speech, i.e. “Anatomy and Physiology” and “Intimacy and Sex” (the last two rows). These two domains may emerge as strongly relevant for corpus-specific reasons: romantic comedies are most often centred around romantic and quite physical relationships. However, what we find of interest in the overlap between the female and male semantic domains is the different wording. Women and men refer to their bodies and their relationships in different ways, which are consistent with a polarization of gender roles (e.g. breasts vs. boobs). Keywords are also worth mentioning. Their evaluation showed that women make greater use of politeness forms, while men resort to more swearwords and to interjections such as “right, all right”.

Interestingly, the tendencies that emerged from our small corpus are in line with Schofield and Mehr's (2016) analysis of the Cornell Movie-Dialogs Corpus (Danescu-Niculescu-Mizil et al. 2012a), a vast corpus of more than 600 films of different genres. The similarity of the results gave us confidence in using the stereotypical representations of gendered speech to investigate its reception by means of a test.

The test. With the aim of testing the reception and entrenchment of gender stereotypes in speakers, we developed a perception test based on the results of our corpus-driven analysis. We manually extracted 18 lines per subcorpus², each containing one or more of the stereotypical semantic domains and keywords that emerged from the previous WMatrix analysis. The resulting 36 lines were used as stimuli in the perception test³. The choice of such a limited number of sentences was determined by two reasons. The first, theoretically motivated, was not to repeat the same keywords and stereotypes too many times; such repetition, in our opinion, could have influenced or biased the participants. The second, of a more practical nature, was to construct a reasonably sized test in order to maintain participants' attention and avoid fatigue, which could have influenced the responses. We extracted film lines containing a variable concentration of stereotypes, ranging from sentences referring to only one stereotypical domain to sentences referring to several. The selection was done manually, based on the rather obvious hypothesis that more “stereotypically dense” sentences would be recognised more easily. The stimulus sentences were also chosen to be as free of context as possible, in order not to give any clue about the film of origin. Proper names were omitted and, when this was not possible, substituted with the string [XXX]. For example, in (1) the name of the male romantic partner was obscured so that the only clue to the gender of the speaker would be the linguistic stereotypes (shopping, mitigated swearwords, weaving).

1) When [XXX] and I broke up for two weeks, I bought a loom, a frigging loom

The test was presented to 22 native, bilingual or highly proficient speakers of English, 15 women and 7 men (mean age: 39.5). The task was to decide whether a given sentence had been uttered by a man or a woman. In order not to force participants into a necessarily binary choice, the option “I don't know” was also included. We additionally asked speakers to specify the words, expressions or general concepts that influenced their answers. This provided us with interesting insights into participants' process of thinking and categorizing.

¹ The fact that F is bigger than M should not come as a surprise. The film genre of romantic comedy is generally addressed to women and therefore has more female leading characters.
² The stimulus sentences were chosen to be as representative as possible of the entire corpus: they are evenly distributed among all the films of the corpus, with two or three instances from each film for each subcorpus.
³ For reasons of space we do not include the complete list of the sentences extracted and used for the test. Several examples are reported in the text and in the following footnotes.

4 Results

Several interesting considerations arise from the analysis of the data. Firstly, it appears that overall the stereotypes were correctly spotted and categorized.

[Chart 1: Percentage of recognised stereotypes (in red)]

However, it also emerges that female stereotypes were more unambiguously recognisable, with fewer answers assigned to the other gender or to the “I don't know” category (Chart 1).

A closer examination of the results allows the data to be subdivided so as to account for these differences: recognised (more than 50% correct), ambiguous (between 25% and 50% correct) and completely misunderstood (less than 25% correct) stereotypes. Table 4 illustrates the distribution of answers across the three frequency bands.

 | > 50% | 25-50% | < 25%
F LINES | 61.1% | 27.8% | 11.1%
M LINES | 33.3% | 38.9% | 27.8%
Table 4: distribution of participants' answers

As initially hypothesized, sentences with a higher “density” of stereotypical keywords or semantic domains were usually the ones that speakers recognised best. Stimuli in the first group, therefore, consist of clear-cut and easily recognisable clusters of linguistic and conceptual stereotypes⁴. The second group is instead formed by stereotypes that were recognised by a substantial part of the informants, but not by the majority. This, in our opinion, may be due to several factors: some concepts, for example, could be perceived as less prototypical than others. In addition, some linguistic features (e.g. discourse markers) were not fully recognised as stereotypical because our test was limited to the written dimension. Prosody, contextual information and multimodality are in fact fundamental aspects of language that were inevitably excluded from our experimental design⁵. Finally, the last group consists of stereotypes that were not perceived as such by speakers (e.g. family as a typical topic of women's speech), and of what we called reverse stereotypes, that is, utterances that conceptually represented ambiguous events or anti-prototypical situations: a woman swearing, a man talking about his feelings.⁶ As predicted, these stereotypes were not recognised at all by participants, who tended to assign them to the opposite gender. It is interesting to note that some male-produced sentences were also not recognised by our informants, perhaps due to the composition of our corpus. Several predominant keywords and domains in M may in fact be strictly related to the chosen film genre. For example, the massive presence of the WMatrix domain Evaluation_inaccurate (i.e. apologies) reflects the archetypical situation in romantic comedies of men apologizing to women for their mistakes. Because this domain is so context-related, however, speakers were not able to correctly attribute sentences containing expressions from it.⁷

Another aspect that was taken into consideration in our analysis was the gender of the informants, to see whether it was related to the data. The answers to the test differed significantly according to the gender of the participant (H(2) = 9.2388, p-value = 0.0024; Kruskal-Wallis test with Wilcoxon post-hoc and Bonferroni p-value correction).

A chi-square test of independence was also performed to examine the relation between the gender of the informant and the responses given. The relation between these variables was significant (χ2 = 10.298, p-value = 0.0058).

[Chart 2: mosaic plot of the results divided by gender]

Chart 2 shows the difference between male and female informants' answers. The numbers of the variable “responses” indicate the three possible answers of the test: “male” (1), “female” (2), “I don't know” (3). As can be seen, men overall assigned more utterances to the “I don't know” option than to either of the two genders. Women, instead, show a fairly even distribution of responses among the three conditions. Furthermore, both men and women assigned more utterances to female characters than to male ones (see Table 5). This result is in line with the fact that female stereotypes were better recognised overall, in the sense that fewer answers were assigned to the other gender.

 | MEN | WOMEN
m | 23% | 30%
f | 29% | 34%
idk | 48% | 36%
Table 5: distribution of informants' answers divided by gender of the informant

Other useful insights into the data came from the words our informants identified as relevant to their decision. Two tendencies emerged: speakers either indicated specific words, collocations or phrases, or answered with abstract concepts and pragmatic inferences based on the utterances. Interestingly, the words and expressions exactly replicated the keywords, while the general and abstract concepts reflected the semantic domains that emerged in the corpus analysis. In addition, several speakers performed actual pragmatic inferences based on the stereotypical concepts contained in the sentences. For example, subjects reacted to (2) either with a specific word, as in a), or with a more general consideration, as in b).

2) Ooh, you must feel like you're about to find your long-lost soul mate!
a) "soul mate"
b) talking about feelings in general

⁴ E.g.: “Give me the bag! I've got to get some proper shoes for the wedding now” (71%) (f); “What are you doing, eh? You're me best mate!” (82%) (m).
⁵ E.g.: “God! My mum had a fit when she saw the boots!” (47%) (f); “He's a kid. He's just a fucking little kid.” (47%) (m).
⁶ The reverse stereotype utterances are the following.
I. Oh, shit! I stubbed my foot on the side of the shagging bath! (f)
II. This is the first time in 18 years I'm going to be able to call the shots in my own life! (m)
⁷ - I made a mistake, such a big, BIG mistake and I'm sorry. I'm truly, truly sorry.
- We accept that we fight a lot, and we hardly have sex anymore, but we don't wanna live without each other.

5 Conclusions

The present paper proposes an original take on investigating gender stereotypes in language. The novelty of our approach lies in a hybrid methodology that falls neither in the tradition of the literature on “gendered discourse” nor in the more recent field of corpus linguistics, but combines the two and adds insights from psycholinguistics as well. This kind of integrated analysis provided us with preliminary results that help identify archetypical gender roles, behaviours and linguistic representations in modern western culture. It is interesting to note that the gender representations coming to light from our corpus of pop-culture films are based on features that are now dismissed as clichéd and stereotypical by the literature (see Cameron 2005, 2010; Bexter 2006), but which nonetheless seem to be entrenched in our interpretation of reality.

The archetypical depiction of characters is particularly evident in popular comedies, which do not examine characters' psychology in depth. The test validated our assumption that film language stereotypically portrays the way in which men and women talk, drawing on recognisable traits attached to femininity and masculinity in our culture. In fact, speakers were mostly able to assign the utterances to the right gender.

In addition, all our informants showed metalinguistic (or second-level) awareness of stereotypical concepts and linguistic clues, and several of them also provided us with insightful and creative inferences based on the event described in the utterance. We interpret this as a sign that stereotypes are conceptual in nature, deeply entrenched in our representation of the world and accessed via linguistic clues. The “reverse stereotypes” also reinforce this idea.

Acknowledgements: We would like to thank professors Silvia Bruti, Alessandro Lenci, Belinda Crawford and Monika Bednarek for useful comments. We also gratefully acknowledge the reviewers for their valuable comments.

References

Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text & Talk, 23(3), 321–346.
Paul Baker. 2006. Using Corpora in Discourse Analysis. London: Continuum.
Paul Baker. 2014. Using Corpora to Analyze Gender. London; New York: Bloomsbury Academic.
Monika Bednarek. 2010. The Language of Fictional Television: Drama and Identity. London: Continuum.
Monika Bednarek. 2015. Corpus-Assisted Multimodal Discourse Analysis of Television and Film Narratives. In P. Baker, T. McEnery (eds.), Corpora and Discourse Studies: Integrating Discourse and Corpora, 63-87. Basingstoke, UK: Palgrave Macmillan.
Silvia Bruti and Gianmarco Vignozzi. 2016. Routines as social pleasantries in period dramas: a corpus linguistic analysis. In R. Ferrari, S. Bruti (eds.), A Language of One's Own: Idiolectal English, 207-239. Bologna: I libri di Emil.
Deborah Cameron. 2010. Gender, Language and the New Biologism. Constellations, 17(4), 526–39.
Frederic Chaume. 2012. Audiovisual Translation: Dubbing. Manchester: St Jerome.
Jennifer Coates. 1993. Women, Men and Language. London: Longman.
Jonathan Culpeper. 2001. Language and characterisation: people in plays and other texts. Harlow: Longman.
Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. In Proceedings of ACL, 892–901.
Penelope Eckert and Sally McConnell-Ginet. 2003. Language and Gender. Cambridge: Cambridge University Press.
Pierfranca Forchini. 2012. Movie Language Revisited. Evidence from Multi-Dimensional Analysis and Corpora. Bern: Peter Lang.
Michael Gregory and Susanne Carroll. 1978. Language and Situation: Language Varieties and their Social Contexts. London/New York: Routledge.
Susan C. Herring and John C. Paolillo. 2006. Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459.
Janet Holmes. 2006. Gendered Talk at Work: Constructing Gender Identity through Workplace Discourse. Oxford: Blackwell.
Adam Kilgarriff, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. The Sketch Engine: ten years on. Lexicography, 1, 7-36.
Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell. 2004. The Sketch Engine. Information Technology. Available online at: www.sketchengine.co.uk
Susanne Kord and Elisabeth Krimmer. 2005. Hollywood Divas, Indie Queens, and TV Heroines: Contemporary Screen Images of Women. Lanham: Rowman & Littlefield.
Rolf Kreyer. 2014. Review: P. Baker. 2014. Using Corpora to Analyze Gender. London/New York: Bloomsbury. International Journal of Corpus Linguistics 19, 570-575.
Robin Lakoff. 1975. Language and woman's place. New York, NY: Harper & Row.
Tom McArthur. 1981. Lexicon of Contemporary English. London: Longman.
Anthony M. McEnery, Richard Z. Xiao, and Yukio Tono. 2006. Corpus-based Language Studies: An Advanced Resource Book. London/New York: Routledge.
Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16(4), 372–403.
Louise Mullany. 2007. Gendered Discourse in the Professional Workplace. Basingstoke, NY: Palgrave Macmillan.
Paulo Quaglio. 2009. Television Dialogue: The Sitcom Friends vs. Natural Conversation. Philadelphia: John Benjamins.
Paul Rayson. 2009. Wmatrix: A Web-based Corpus Processing Environment. Computing Department, Lancaster University. Available online at: http://ucrel.lancs.ac.uk/wmatrix/
Paul Rayson, Dawn Archer, Scott Piao, and Anthony M. McEnery. 2004. The UCREL semantic analysis system. Proceedings of the beyond named entity recognition semantic labelling for NLP tasks workshop, Lisbon, Portugal, 2004.
Alexandra Schofield and Leo Mehr. 2016. Gender distinguishing features in film dialogue. NAACL CLfL.
L. J. Shrum. 2008. Media consumption and perceptions of social reality. In J. Bryant & M. B. Oliver (eds.), Media Effects: Advances in Theory and Research, 3rd Edition. New York, NY: Routledge.
Mary M. Talbot. 2003. “Gender Stereotypes: Reproduction and Challenge”. In J. Holmes & M. Meyerhoff (eds.), The Handbook of Language and Gender, 468-86. Oxford: Blackwell.
Deborah Tannen. 1991. You just don't understand: Women and men in conversation. London: Virago.
Deborah Tannen. 1994. Gender and Discourse. New York: Oxford University Press.
Peter Trudgill. 1972. Sex, covert prestige and linguistic change in the urban British English of Norwich. Language in Society 1, 179-195.
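
The keyness filter described in Section 3 (items are retained when their log-likelihood value reaches the 6.63 cut-off) can be illustrated with a minimal sketch. It uses the two-cell log-likelihood formula commonly applied to corpus comparison; that this matches WMatrix's own computation in every detail is an assumption. The subcorpus sizes are the ones reported in the paper, whereas the function names `log_likelihood` and `is_key` and the word counts in the usage lines are hypothetical and serve only as illustration.

```python
import math

# Subcorpus sizes reported in Section 3 of the paper.
F_TOKENS = 55_766
M_TOKENS = 39_270

# Chi-square critical value for 1 degree of freedom at p < 0.01,
# i.e. the 99% confidence cut-off mentioned in the text.
LL_CUTOFF_99 = 6.63


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-cell log-likelihood for one item observed in two corpora.

    freq_a, freq_b: occurrences of the item in corpus A and corpus B.
    size_a, size_b: total token counts of corpus A and corpus B.
    """
    total = freq_a + freq_b
    # Expected frequencies if the item were spread proportionally to corpus size.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_key(freq_a, freq_b, size_a=F_TOKENS, size_b=M_TOKENS):
    """True when the item's LL value reaches the 99% cut-off."""
    return log_likelihood(freq_a, freq_b, size_a, size_b) >= LL_CUTOFF_99


if __name__ == "__main__":
    # Hypothetical counts (F subcorpus, M subcorpus), NOT taken from the study.
    for word, f_count, m_count in [("example_a", 120, 40), ("example_b", 90, 65)]:
        ll = log_likelihood(f_count, m_count, F_TOKENS, M_TOKENS)
        print(f"{word}: LL = {ll:.2f}, key = {is_key(f_count, m_count)}")
```

The statistic only measures how far an item's distribution departs from proportionality; which subcorpus over-uses the item has to be read off the relative frequencies, which is why the clouds in the study were additionally inspected by hand.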