A quantitative study of gender representation and authors’ gender in a large-market print medium⋆

Christoph Bartl¹, Sharwin Rezagholi²,∗ and Mareike Schumacher³

¹ Department Computer Science, University of Applied Sciences Technikum Wien, Austria
² Department Computer Science, University of Applied Sciences Technikum Wien, Austria
³ Institute of Literary Studies, University of Stuttgart, Germany

CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
∗ Corresponding author.
Email: christoph.bartl@me.com (C. Bartl); sharwin.rezagholi@technikum-wien.at (S. Rezagholi); mareike.schumacher@ur.de (M. Schumacher)
ORCID: 0000-0003-1090-0240 (S. Rezagholi); 0000-0002-7952-4194 (M. Schumacher)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract

We analyse gender representation in articles published by the Austrian daily newspaper ‘Der Standard’ in the years 2021 and 2022. We use named entity recognition and automated gender classification of first names to count the number of female and male persons in articles. The analysis reveals the dominance of male persons in article content. We find that female authors exhibit a significantly higher tendency to mention female persons in their articles.

Keywords

Gender of journalists, gender representation in articles, print newspaper content

1. Introduction

This paper asks whether (i) female persons are less likely to be mentioned by newspaper journalists than male persons, and (ii) whether the propensity to mention female persons differs between female and male journalists. To answer these questions we analyse the entire article output of the Austrian daily newspaper Der Standard (https://www.derstandard.at) from the years 2021 and 2022. These articles are publicly available online. We additionally obtained the full names of the authors of all articles from Der Standard; this authorship information is non-public for most of the articles. Der Standard is the fourth-largest daily newspaper in Austria, having more than 500,000 daily readers [29].

This study employs a binary notion of gender, as does the language policy of Der Standard, which prescribes the sole use of feminine and masculine pronouns, effectively prohibiting the use of neopronouns. Therefore this study does not contribute to the abolishment of a binary notion of gender, an aim prominently championed within the digital humanities by Laura Mandell [20].

We employ pretrained models for natural language processing to automatically identify and enumerate female and male persons mentioned in article texts. In particular, we use named entity recognition and automated gender-assignment to first names. The respective methods are susceptible to bias. We therefore estimate the error profile of the gender assignment to first names and verify that there is no significant bias that might have distorted our analysis.

We statistically estimate the probability that a journalist mentions a female person when mentioning a person. We find that females are less likely to be mentioned by journalists. This finding holds for male as well as female authors, but we find it to be less pronounced in female journalists. These effects are present to different degrees in different editorial departments, with some departments not exhibiting the imbalance at all. The causal pathways leading to these effects could include statistical ‘self-selection’, whereby female authors and male authors have a differing propensity to report on certain issues, or female authors could tend to highlight the roles of female persons more than their male colleagues, even when reporting on the same issue.

Literature review.
The issue of gender representation and gender inequality is actively discussed in the digital humanities, including computational linguistics [12], digital film studies [3, 32], game studies [31], and computational literary studies [4, 5, 8, 13, 19, 30, 32]. Newspapers and magazines in particular have been studied with respect to gender representation and possible gender bias. Yun et al. found that women were given more space in online than in print journals and that, in a significant fraction of cases, women were portrayed in stereotypical ways [16]. Kian et al. analysed tennis news, finding that female reporters did not write more often about female athletes than their male colleagues but that female reporters tended to use more stereotypical descriptions [17]. Kozlowski et al. [18] analyse the magazines from an Argentinian publisher from 2008 to 2018 using topic modelling and find that the prevalence of thematic areas differs between magazines that target female readers and those that target male ones. They find that this gap is diminishing with respect to certain topics, such as ‘family’ and ‘children’, whereas it remains large in others, such as ‘fashion’ and ‘horoscope’. The large-scale analysis of Shor et al. [27], in which more than 20,000 prominent personalities of male and female gender and different (but matching) professions were searched in about 2,000 English-language newspapers, came to the conclusion that the reduced media coverage of women is not in line with the readers’ interest, which does not favour prominent men over prominent women. They thus provide some evidence in favour of the hypothesis that newspapers and magazines foster stereotypes and gender bias. The work most related to ours is due to Mateos de Cabo et al. [21], whose analysis of Spanish online newspapers found that females were more likely to be mentioned in female-authored articles, and to Shor et al. [28], whose analysis of about 2,000 news sources found that the fraction of females in articles increased from 19% in 1983 to 27% in 2008. The latter also found significant differences between editorial departments.

The digital humanities community has formulated a need to further the application of its methods to questions of gender. These voices include Miriam Posner [23], who criticized that gender-related work in the digital humanities does not receive sufficient attention, neither from the scholarly community nor from news outlets. In 2018 Susan Brown stated that a feminist perspective is largely lacking in the digital humanities, going as far as calling ‘feminism’ the ‘f word’, suggesting that feminist approaches are effectively silenced [6]. In 2019 Laura Mandell argued that studies on gender within the digital humanities tend to reproduce stereotypes rather than analyse them [20]. Coining the term ‘data feminism’ in 2020, D’Ignazio and Klein drew attention to gender representation biases in the digital humanities and in other data-driven fields [10]. Although gender bias has been studied in various domains [9, 25, 15, 30, 19, 11, 24, 12], we agree that a sufficient corpus of statistical results remains absent.

2. Data and methods

We start with the text and metadata of 87,032 articles, corresponding to the entire journalistic output of Der Standard between January 1, 2021 and December 31, 2022. The metadata of the articles includes authorship, publication date, title, and editorial department. We removed from consideration all articles that were written by a group of journalists or signed by a press agency. This restriction is in line with our interest in the behaviour of individual writers at a newspaper, as opposed to press agencies, group authorship, or anonymous authorship. We retain 36,204 articles.

Named entity recognition and gender-assignment. We use the Python package GenderGuesser 0.4.0 to assign gender to the authors’ given names.
GenderGuesser uses a database of 40,000 gender-assigned first names to assign gender to a given name [2]. Since some given names, such as ‘Andrea’, ‘Maria’ or ‘Robin’, are not gender-exclusive, GenderGuesser returns ‘mostly female’, ‘mostly male’, and ‘androgynous’ in some cases. We assigned the first two of these categories to ‘female’ and ‘male’ respectively. We manually checked the gender-assignments of all 1,571 authors and corrected three cases. We therefore treat the assignment of gender to the names of authors as correct in the remainder of our analysis.

We used the Python package Flair 0.12.2 [1] to recognize personal names in the article texts using the named entity recognition model ‘ner-german-large’ [26]. In a first step we restrict attention to names consisting of a given name and a family name, since the editorial policy of Der Standard prescribes the use of the full name at least once per article. The gender of the detected given names was then determined using GenderGuesser. Names such as ‘Barack Obama’, ‘Viktor F.’, ‘Angela Merkel’, ‘Nina H.’, and ‘Luke Skywalker’ were identified. In a second step we parse the articles for mentions of the identified persons that use only a part of the full name. To clarify our counting method: an article mentioning two persons, the same male person 9 times and a female person once, is considered a text where 90% of mentioned persons are male.

We evaluated our extraction of full names by comparing it to a manual count of the female and male full names in 200 randomly selected articles (Table 2). We use binomial estimates to quantify the conditional accuracies of the automated extraction (Table 1). The fairness criterion of predictive parity requires the equality of the true positive rate for male and female cases [7].
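The binomial estimates reported in Table 1 can be recomputed from the counts in Table 2. The following is a minimal sketch, assuming Wald intervals at the 95% level that are clipped to the unit interval (an assumption that is consistent with the reported lower bounds of 0.000); the function name `wald_estimate` is hypothetical and not taken from the authors' code.

```python
import math


def wald_estimate(successes, trials, z=1.96):
    """Binomial point estimate, its standard deviation, and a Wald interval.

    The interval is clipped to [0, 1]; the clipping is an assumption of this sketch.
    """
    p = successes / trials
    sd = math.sqrt(p * (1.0 - p) / trials)
    lower = max(0.0, p - z * sd)
    upper = min(1.0, p + z * sd)
    return p, sd, (lower, upper)


# Counts from the contingency table (Table 2): (event count, row total).
table_2_counts = {
    "classified as male | male": (621, 715),
    "classified as female | female": (141, 160),
    "classified as female | male": (2, 715),
    "classified as male | female": (3, 160),
    "not detected | male": (92, 715),
    "not detected | female": (16, 160),
}

for label, (count, total) in table_2_counts.items():
    estimate, sd, (lower, upper) = wald_estimate(count, total)
    print(f"P({label}): {estimate:.3f} (sd {sd:.3f}), [{lower:.3f}, {upper:.3f}]")
```

For the first row this yields an estimate of about 0.869 with a standard deviation of about 0.013, in line with Table 1.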
We find that the true positive rates for male and female names are numerically very similar, approximately 0.87 and 0.88, and that the hypothesis that they are equal cannot be rejected (binomial proportions test, $p = 0.68$).

Generative model. Our main interest is the probability that an author, when mentioning a person, does mention a female person. Our task is complicated by the fact that the automated detection of full names and the automated assignment of gender to the respective first names could be biased. The probability that we have access to is $\mathbb{P}(\text{detected as female} \mid \text{name detected})$. We consider a simple generative model for our data (Figure 1), which illustrates that the events ‘detected as female’ and ‘detected as male’ do not allow the identification of the parameter of interest, that is $p$, unless certain assumptions are made. The two necessary assumptions are the absence of mix-ups, that is

$$ \mathbb{P}(\text{detected as female} \mid \text{male}) = \mathbb{P}(\text{detected as male} \mid \text{female}) = 0, \tag{1} $$

and unbiased non-detection, that is

$$ \mathbb{P}(\text{not detected} \mid \text{female}) = \mathbb{P}(\text{not detected} \mid \text{male}). \tag{2} $$

If Equations 1 and 2 hold, then

$$ \mathbb{P}(\text{fem. name detected} \mid \text{name detected}) = \mathbb{P}(\text{fem. name mentioned} \mid \text{name mentioned}). $$

Indeed, if $d$ denotes the common detection probability implied by Equations 1 and 2, the left-hand side equals $pd / (pd + (1-p)d) = p$.

Table 1: Performance of gendered name recognition.

| | Estimate | St. dev. | 95% interval (Wald) |
|---|---|---|---|
| $\mathbb{P}(\text{classified as male} \mid \text{male})$ | 0.869 | 0.013 | [0.844, 0.893] |
| $\mathbb{P}(\text{classified as female} \mid \text{female})$ | 0.881 | 0.026 | [0.831, 0.931] |
| $\mathbb{P}(\text{classified as female} \mid \text{male})$ | 0.003 | 0.002 | [0.000, 0.007] |
| $\mathbb{P}(\text{classified as male} \mid \text{female})$ | 0.019 | 0.010 | [0.000, 0.040] |
| $\mathbb{P}(\text{not detected} \mid \text{male})$ | 0.129 | 0.013 | [0.104, 0.153] |
| $\mathbb{P}(\text{not detected} \mid \text{female})$ | 0.100 | 0.024 | [0.054, 0.146] |

Figure 1: Generative model. Arrows are labelled with estimated conditional probabilities computed on the basis of the manually labelled test set (Table 2). [Tree diagram: a mentioned name is female with probability $p$ and male with probability $1-p$; a female name is detected as female (0.88), not detected (0.10), or detected as male (0.02); a male name is detected as male (0.87), not detected (0.13), or detected as female (0.00).]

Table 2: Contingency table for gendered name recognition.

| | Classified as male | Classified as female | Not recognized | Total |
|---|---|---|---|---|
| Male | 621 | 2 | 92 | 715 |
| Female | 3 | 141 | 16 | 160 |
| Total | 624 | 143 | 108 | |

It is not necessary to assume that non-detection does not occur, only that non-detection is unbiased. The empirical probabilities corresponding to those in Equation 2 are similar, approximately 0.10 and 0.13, and the hypothesis that they are equal cannot be rejected (binomial proportions test, $p = 0.31$). As Table 1 reports, the empirical probabilities for mix-ups equal approximately 0.00 and 0.02. We consider these values sufficiently low to assume that Equation 1 holds. In Table 2 the fraction of female persons among the mentioned persons equals $160/(160 + 715) \approx 0.18$, while the fraction of names classified as female among the classified names equals $143/(143 + 624) \approx 0.19$. This numerically illustrates the absence of bias.

Statistical model. The details of our statistical approach are presented in Appendix A. Our basic modelling assumption is that the number of persons in an article is predetermined, but the respective journalist ‘chooses’ the gender of the mentioned persons independently from a Bernoulli distribution. The parameter of this Bernoulli distribution is specific to the subset of the data for which an estimate is desired. We believe that this model is sufficient to organise the data in a tractable and intuitive fashion. To be concrete: we estimate the probability that a journalist uses the name of a female person when using the name of a person. Note that our estimate does not equal the fraction of female persons in a certain subset of articles. Our estimate is descriptive of authors’ behaviour, not of article output (see Appendix A).
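The difference between this author-level estimate and the pooled fraction of female persons can be illustrated with a toy example. The sketch below assumes the single-subset form of the estimator given in Appendix A, that is the arithmetic mean of the per-author relative frequencies; the data and the function names are hypothetical.

```python
from collections import defaultdict


def pooled_fraction(articles):
    """Fraction of female persons among all detected persons (describes article output)."""
    total_female = sum(k for _author, k, _m in articles)
    total_persons = sum(m for _author, _k, m in articles)
    return total_female / total_persons


def author_averaged_estimate(articles):
    """Mean of the per-author relative frequencies (describes author behaviour)."""
    female_by_author = defaultdict(int)
    persons_by_author = defaultdict(int)
    for author, k, m in articles:
        female_by_author[author] += k
        persons_by_author[author] += m
    ratios = [
        female_by_author[a] / persons_by_author[a]
        for a in persons_by_author
        if persons_by_author[a] > 0
    ]
    return sum(ratios) / len(ratios)


# Hypothetical articles: (author, detected female persons k, detected persons m).
toy_articles = [
    ("author_a", 1, 10),
    ("author_a", 1, 10),
    ("author_b", 2, 4),
]

print(pooled_fraction(toy_articles))           # 4 of 24 detected persons are female
print(author_averaged_estimate(toy_articles))  # mean of 2/20 and 2/4
```

In this toy data the prolific author dominates the pooled fraction, whereas both authors count equally in the author-averaged estimate.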
It is important to note that every author, regardless of the number of persons mentioned in their respective articles, has equal importance for our point estimates, but the differing degrees of uncertainty for different authors, caused by the different quantities of mentioned persons, are reflected in the interval estimates, that is in the confidence intervals. When we estimate probabilities specific to editorial departments, we employ a weighting scheme whereby the degree of membership of an author in an editorial department is taken into account. Consult Appendix A for details on our statistical approach.

3. Descriptive analysis

Of the 36,204 articles, 12,736 (35%) were written by a female author and 23,468 (65%) were written by a male author. Among the 1,571 unique authors 606 (39%) are female. The average author has contributed 23 articles to the dataset. The articles are of varying length, the median length being 3,859 characters (1st quartile = 2,560, 3rd quartile = 5,248).

The mean of the fraction of females in an article equals 25%. The respective mean for the articles written by male writers equals 20%. On the other hand, the mean for the articles authored by female writers equals 33%. The empirical distributions of the fraction of female persons in an article are visualized in Figure 2. Note that we only consider articles that do mention at least one person. It is apparent that many articles that do mention persons do not mention females at all. This is true for female- as well as male-authored articles. It is evident that the distributions for female- and male-authored articles differ. The distribution for female-authored articles stochastically dominates the distribution for male-authored articles (McFadden’s test [22], $p = 0.00$). The distributions exhibit concentrations at the extremes. This illustrates that many articles solely mention persons of one gender.

Figure 2: Empirical distributions of the fraction of females in an article, quartiles on the vertical axes. [Three panels: all articles, male-authored articles, female-authored articles; vertical axis: fraction of female persons in article.]

Differentiating with respect to the editorial departments of Der Standard (Table 3), one finds that the department Family produces the smallest number of articles but has the largest fraction of female authorship. The largest number of articles is produced by the department Culture, which corresponds to roughly 13% of our data.

4. Statistical estimation

Our first interests are to estimate the probabilities that (i) an author who mentions a person does mention a female person, (ii) a female author who mentions a person does mention a female person, and (iii) a male author who mentions a person does mention a female person. The respective estimated probabilities are reported in Table 4 and visualised in Figure 3. While the probability that an author mentions a female person when mentioning a person is estimated to be roughly 27%, the respective estimate for female authors is roughly 46% and roughly 14% for male authors. The differences are highly statistically significant ($p = 0.00$). We conclude that there is strong evidence, at least in our data, that female journalists are more likely to mention female persons in their articles compared to male journalists.

To elucidate the differences between authors from different editorial departments, we estimate the probability that a journalist from a given department mentions a female person when mentioning a person. These estimates are reported in Table 5 and visualized in Figure 4.
There are editorial departments whose output is likely to mention females, such as Female Standard and Family, and departments whose writers are unlikely to mention females, such as Automobile and Sport. These findings are in line with previous studies from different countries and languages [17, 28, 18].

Table 3: Data composition and authors’ gender in terms of editorial departments.

| Department | Article count | Article percent | Female authorship | Fraction female | Unique authors | Unique female authors | Fraction female authors |
|---|---|---|---|---|---|---|---|
| Wirtschaft (Economy) | 3131 | 8.65 | 1462 | 0.47 | 134 | 44 | 0.33 |
| Karriere (Career) | 724 | 2.00 | 584 | 0.81 | 78 | 40 | 0.51 |
| Recht (Law) | 704 | 1.94 | 148 | 0.21 | 131 | 36 | 0.27 |
| Die Standard (Female Standard) | 367 | 1.01 | 335 | 0.91 | 54 | 46 | 0.85 |
| Gesundheit (Health) | 657 | 1.81 | 603 | 0.92 | 47 | 30 | 0.64 |
| Automobil (Automobile) | 644 | 1.78 | 44 | 0.07 | 35 | 9 | 0.26 |
| Web | 3734 | 10.31 | 44 | 0.01 | 46 | 9 | 0.20 |
| Reisen (Travel) | 319 | 0.88 | 140 | 0.44 | 52 | 23 | 0.44 |
| Meinung (Opinion) | 4205 | 11.61 | 1472 | 0.35 | 692 | 226 | 0.33 |
| International (International) | 3812 | 10.53 | 1360 | 0.36 | 150 | 54 | 0.36 |
| Inland (Domestic) | 1972 | 5.45 | 638 | 0.32 | 88 | 41 | 0.47 |
| Lifestyle | 1588 | 4.39 | 833 | 0.52 | 112 | 60 | 0.54 |
| Etat (Government) | 1902 | 5.25 | 716 | 0.38 | 98 | 37 | 0.38 |
| Immobilien (Realty) | 853 | 2.36 | 364 | 0.43 | 39 | 17 | 0.44 |
| Bildung (Education) | 466 | 1.29 | 335 | 0.72 | 64 | 38 | 0.59 |
| Zukunft (Future) | 628 | 1.73 | 145 | 0.23 | 44 | 25 | 0.57 |
| Sport | 1375 | 3.80 | 28 | 0.02 | 61 | 14 | 0.23 |
| Familie (Family) | 238 | 0.66 | 230 | 0.97 | 32 | 25 | 0.78 |
| Kultur (Culture) | 4625 | 12.77 | 1691 | 0.37 | 294 | 131 | 0.45 |
| Panorama | 2330 | 6.44 | 858 | 0.37 | 177 | 70 | 0.40 |
| Wissenschaft (Science) | 1930 | 5.33 | 706 | 0.37 | 156 | 80 | 0.51 |

Figure 3: Estimated probabilities that an author, when mentioning a person, does mention a female person (with 95%-confidence intervals). [Rows: all authors, male authors, female authors; horizontal axis: probability that author mentions a female person when mentioning a person.]

Figure 4: Estimated probability that an author from the respective department mentions a female person when mentioning a person (with 95%-confidence intervals). [One row per editorial department; horizontal axis: probability that author mentions a female person when mentioning a person.]

To disentangle the effect of authors’ gender and editorial departments, we stratify our analysis with respect to both. These estimates are visualized in Figure 5 and reported in Table 6, including hypothesis tests for the null-hypothesis that female and male authors behave identically. We obtain significantly different estimates for female and male authors for many editorial departments. The departments Opinion, Career, Female Standard, Science, Economy, Law, Culture, Lifestyle, Web, and Automobile exhibit highly significant differences with respect to author gender ($p = 0.00$). The departments Domestic, Government, Health, Realty, Travel, and Education do not exhibit statistically significant gender differences ($p > 0.1$).

Figure 5: Estimated probability that a journalist from the respective department and of the respective gender mentions a female person when mentioning a person (with 95%-confidence intervals). [Two rows per editorial department, one for female and one for male authors; horizontal axis: probability that author mentions a female person when mentioning a person.]

5. Conclusion and caveats

We present some statistical evidence for the hypothesis that female and male journalists have differing propensities to mention female persons in their writing. This effect varies between editorial departments, at least in our data, and is stronger within certain editorial departments than across editorial departments. Further research is needed to elucidate whether the findings of the present study are driven by a mechanism through which female journalists write about different topics than male journalists. Even if this were true, it would remain ambiguous whether the ultimate cause is the issue-assignment policy within newsrooms, or a desire by journalists to write on topics featuring persons of a certain gender. The latter could also correspond to a conscious attempt by female journalists to highlight female persons in order to counteract existing gender imbalances. We deem it an interesting avenue for further research to quantitatively elucidate the relationship between journalistic topics, gender representation, and authorship. If the observed differences were caused by different propensities in mentioning female persons even when reporting on the same topic, this would indicate gender-specificity in journalists’ viewpoints.
Finally we want to highlight that the present study is most certainly marred by the problem that many relevant and potentially confounding factors, such as topics, have not yet been taken into account. Therefore the present study is but an empirical quantification of a status quo.

Acknowledgments

The authors thank Martin Kotynek and Werner Weichselberger from Der Standard for their support.

References

[1] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf. “FLAIR: An easy-to-use framework for state-of-the-art NLP”. In: Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2019), pp. 54–59.
[2] D. Arcos. Gender-guesser. 2016. url: https://github.com/lead-ratings/gender-guesser.
[3] D. Bamman, B. O’Connor, and N. A. Smith. “Learning Latent Personas of Film Characters”. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2013, pp. 352–361. url: https://aclanthology.org/P13-1035.
[4] O. Baylog, L. Dimmit, T. Heller, G. Kirilloff, S. Smith, G. Thomas, C. Warren, and J. Wehrwein. “More than Custom has Pronounced Necessary”: Exploring the Correlation between Gendered Verbs and Character in the 19th Century Novel. Nebraska Literary Lab. 2016.
[5] J. Bergenmar and K. Leppänen. “Gender and Vernaculars in Digital Humanities and World Literature”. In: NORA - Nordic Journal of Feminist and Gender Research 25.4 (2017), pp. 232–246. doi: 10.1080/08038740.2017.1378256.
[6] S. Brown. “Delivery Service: Gender and the Political Unconscious of Digital Humanities”. In: Bodies of Information: Intersectional Feminism and the Digital Humanities. University of Minnesota Press, 2018, pp. 261–286.
[7] A. Castelnovo, R. Crupi, G. Greco, D. Regoli, I. G. Penco, and A. C. Cosentini. “A clarification of the nuances in the fairness metrics landscape”. In: Scientific Reports 12.4209 (2022).
[8] J. Cheng. “Fleshing Out Models of Gender in English-Language Novels (1850–2000)”. In: Journal of Cultural Analytics 5.1 (2020). doi: 10.22148/001c.11652.
[9] M. Conroy. “Quantifying the Gap: The Gender Gap in French Writers’ Wikidata”. In: Journal of Cultural Analytics 8.2 (2023). doi: 10.22148/001c.74068.
[10] C. D’Ignazio and L. F. Klein. Data Feminism. MIT Press, 2020.
[11] M. Flüh, J. Horstmann, and M. Schumacher. “Genderaspekte in Fantasy-Jugendromanen von 2008 bis 2020: Distant Gender Reading”. In: Gender in der deutschsprachigen Kinder- und Jugendliteratur. De Gruyter, 2022, pp. 457–482. doi: 10.1515/9783110726404-025.
[12] C. Freitas and D. Santos. “Human Depiction in Portuguese. Distant reading Brazilian and Portuguese literature”. In: Journal of Computational Literary Studies 2 (2024).
[13] S. Hota and S. Argamon. Performing gender: Automatic stylistic analysis of Shakespeare’s characters. 2006.
[14] F. Hu and J. Zidek. “The weighted likelihood”. In: Canadian Journal of Statistics 30.3 (2002), pp. 347–371. doi: 10.2307/3316141.
[15] M. Jockers and G. Kirilloff. “Understanding Gender and Character Agency in the 19th Century Novel”. In: Journal of Cultural Analytics 2.2 (2016). doi: 10.22148/16.010.
[16] H. Jung Yun, M. Postelnicu, N. Ramoutar, and L. Lee Kaid. “Where Is She?: Coverage of women in online news magazines”. In: Journalism Studies 8.6 (2007), pp. 930–947. doi: 10.1080/14616700701556823.
[17] E. M. Kian, J. S. Fink, and M. Hardin. “Examining the Impact of Journalists’ Gender in Online and Newspaper Tennis Articles”. In: Women in Sport and Physical Activity Journal 20.2 (2011), pp. 3–21. doi: 10.1123/wspaj.20.2.3.
[18] D. Kozlowski, G. Lozano, C. Felcher, F. Gonzalez, and E. Altszyler. Gender bias in magazines oriented to men and women: A computational approach. 2020.
[19] E. Kraicer and A. Piper. “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction”. In: Journal of Cultural Analytics 3.2 (2019). doi: 10.22148/16.032.
[20] L. Mandell. “Gender and Cultural Analytics: Finding or Making Stereotypes?” In: Debates in the Digital Humanities 2019. University of Minnesota Press, 2019, pp. 3–26.
[21] R. Mateos de Cabo, R. Gimeno, M. Martínez, and L. López. “Perpetuating Gender Inequality via the Internet? An Analysis of Women’s Presence in Spanish Online Newspapers”. In: Sex Roles 70.1 (2014), pp. 57–71. doi: 10.1007/s11199-013-0331-y.
[22] D. McFadden. “Testing for Stochastic Dominance”. In: Studies in the Economics of Uncertainty. Springer, 1989, pp. 113–134.
[23] M. Posner. “What’s Next: The Radical, Unrealized Potential of Digital Humanities”. In: Debates in the Digital Humanities 2016. University of Minnesota Press, 2016, pp. 32–41.
[24] T. Schmidt, I. Engl, J. Herzog, and L. Judisch. “Towards an Analysis of Gender in Video Game Culture: Exploring Gender specific Vocabulary in Video Game Magazines”. In: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference (DHN 2020) (2020), pp. 333–341.
[25] M. Schumacher and M. Flüh. “Made to Be a Woman: A case study on the categorization of gender using an individuation-based approach in the analysis of literary texts”. In: Digital Humanities Quarterly 17.3 (2023). url: https://www.digitalhumanities.org/dhq/vol/17/3/000728/000728.html.
[26] S. Schweter and A. Akbik. “FLERT: Document-level features for named entity recognition”. 2020.
[27] E. Shor, A. van de Rijt, and B. Fotouhi. “A Large-Scale Test of Gender Bias in the Media”. In: Sociological Science 6 (2019), pp. 526–550. doi: 10.15195/v6.a20.
[28] E. Shor, A. van de Rijt, A. Miltsov, V. Kulkarni, and S. Skiena. “A Paper Ceiling: Explaining the Persistent Underrepresentation of Women in Printed News”. In: American Sociological Review 80.5 (2015), pp. 960–984. doi: 10.1177/0003122415596999.
[29] Statista. Österreich Tageszeitungen nach Anzahl der Leser. 2022. url: https://de.statista.com/statistik/%20daten/studie/307114/umfrage/tageszeitungen-in-oesterreich-nach-anzahl-der-leser/.
[30] T. Underwood, D. Bamman, and S. Lee. “The Transformation of Gender in English-Language Fiction”. In: Journal of Cultural Analytics 3.2 (2018). doi: 10.22148/16.019.
[31] T. Unterhuber. Männlich codiert?: Annäherung an eine Medien- und Geschlechtergeschichte des Computerspiels. 2021.
[32] E.-M. Venzmer. „Oh, the [digital] humanities!“ – Eine quantitative Gender-Analyse von The Big Bang Theory. 2023.

A. Appendix

Maximum likelihood estimate. We consider the weighted likelihood

$$ L(p_1, \dots, p_l) \propto \prod_{j=1}^{l} \left( \prod_{i \in S_j} \left( p_j^{k_i} (1 - p_j)^{m_i - k_i} \right)^{w_i} \right), \tag{3} $$

where $k_i$ denotes the number of persons detected as female in article $i$, $m_i$ denotes the number of detected persons in article $i$, $\{S_1, \dots, S_l\}$ are disjoint subsets of the data $\{1, \dots, n\}$, and the weight $w_i$ equals the reciprocal of the number of natural persons detected in articles written by the respective journalist $a_i$, that is

$$ w_i = \frac{1}{\sum_{j : a_i = a_j} m_j}. \tag{4} $$

Note that we have discarded multiplicative constants from Equation 3. Weighted likelihood estimation is a well-established method in several circumstances [14]. The likelihood (Equation 3) is maximized at the parameter-values $\{\hat{p}_1, \dots, \hat{p}_l\}$ given by

$$ \hat{p}_j = \frac{\sum_{i \in S_j} w_i k_i}{\sum_{i \in S_j} w_i m_i}. \tag{5} $$

Under our choice of weighting (Equation 4), the maximum-likelihood estimates according to Equation 5 can be written as

$$ \hat{p}_j = \frac{\sum_{a \in A} \frac{k_{a,j}}{m_a}}{\sum_{a \in A} \frac{m_{a,j}}{m_a}} = \frac{1}{\sum_{a \in A} \frac{m_{a,j}}{m_a}} \sum_{a \in A} \frac{m_{a,j}}{m_a} \, \frac{k_{a,j}}{m_{a,j}}, $$

where $A$ denotes the set of unique authors, $m_{a,j}$ denotes the number of persons detected in texts of author $a$ in subset $S_j$, $m_a$ denotes the number of persons detected in texts of author $a$, and $k_{a,j}$ denotes the number of female persons detected in articles by author $a$ in subset $S_j$.
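Equations 4 and 5 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: articles are represented as hypothetical tuples (author, subset label, $k_i$, $m_i$), every article mentions at least one detected person, and the function names are invented for this example.

```python
from collections import defaultdict


def author_weights(articles):
    """Equation 4: one weight per article, the reciprocal of the number of
    persons detected across all articles of that article's author."""
    persons_by_author = defaultdict(int)
    for author, _subset, _k, m in articles:
        persons_by_author[author] += m
    return [1.0 / persons_by_author[author] for author, _s, _k, _m in articles]


def weighted_estimates(articles):
    """Equation 5: weighted estimate of p_j for every subset label j."""
    weights = author_weights(articles)
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for w, (_author, subset, k, m) in zip(weights, articles):
        numerator[subset] += w * k
        denominator[subset] += w * m
    return {subset: numerator[subset] / denominator[subset] for subset in denominator}


# Hypothetical data: (author, subset label, detected female persons, detected persons).
toy_articles = [
    ("author_a", "Sport", 1, 10),
    ("author_a", "Culture", 3, 10),
    ("author_b", "Culture", 2, 4),
]

estimates = weighted_estimates(toy_articles)
print(estimates)
```

In the toy data, author_a contributes to the Culture estimate with half the weight of author_b, because only half of the persons detected in author_a's texts appear in Culture articles.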
The estimator $\hat{p}_j$ is thus the weighted mean of the naive per-author estimates for the subset, that is $k_{a,j}/m_{a,j}$, weighted by the ‘degree of membership’ of author $a$ in subset $S_j$, that is by $m_{a,j}/m_a$. Note that this estimator is such that multiplying all data from a certain author by a constant does not change the estimate. In the special case of a single subset equal to the entirety of the data, the estimator takes the form

$$ \hat{p} = \frac{1}{|A|} \sum_{a \in A} \left( \frac{\sum_{i : a_i = a} k_i}{\sum_{i : a_i = a} m_i} \right), $$

which is the arithmetic average of the per-author relative frequencies.

Confidence intervals. The variance of $\hat{p}_j$ equals

$$ V(\hat{p}_j) = \frac{1}{\left( \sum_{i \in S_j} w_i m_i \right)^2} \sum_{i \in S_j} w_i^2 V(k_i), $$

where $k_i \sim \text{binomial}(m_i, p_j)$ and hence $V(k_i) = p_j (1 - p_j) m_i$. Therefore the plug-in estimator for the variance of $\hat{p}_j$ is

$$ V(\hat{p}_j) \approx \hat{p}_j (1 - \hat{p}_j) \frac{\sum_{i \in S_j} w_i^2 m_i}{\left( \sum_{i \in S_j} w_i m_i \right)^2}. \tag{6} $$

This enables us to use a normal approximation to the distribution of $\hat{p}_j$ to construct confidence intervals.

Hypothesis tests. To test null-hypotheses of the form $p_j = p_{j'}$, we use the test statistic $\hat{p}_j - \hat{p}_{j'}$, which is approximately distributed as

$$ \hat{p}_j - \hat{p}_{j'} \sim N\left( p_j - p_{j'}, \; V(\hat{p}_j) + V(\hat{p}_{j'}) \right) $$

and therefore has mean zero under the null-hypothesis. The variances are computed according to Equation 6.

Table 4: Estimated probabilities that an author, when mentioning a person, does mention a female person.

| Model | Article count | Estimate | St. dev. | 95% interval |
|---|---|---|---|---|
| All authors | 30,099 | 0.265 | 0.006 | [0.254, 0.277] |
| Male authors | 19,444 | 0.141 | 0.006 | [0.129, 0.152] |
| Female authors | 10,655 | 0.464 | 0.011 | [0.444, 0.485] |

Table 5: Estimated department-specific probability that an author mentions a female person when mentioning a person.

| Model | Article count | Estimate | St. dev. | 95% interval |
|---|---|---|---|---|
| Opinion | 3293 | 0.241 | 0.011 | [0.218, 0.263] |
| International | 3577 | 0.203 | 0.013 | [0.178, 0.229] |
| Domestic | 1934 | 0.172 | 0.010 | [0.152, 0.191] |
| Career | 502 | 0.305 | 0.048 | [0.211, 0.399] |
| Government | 1780 | 0.265 | 0.032 | [0.202, 0.328] |
| Female Standard | 312 | 0.759 | 0.035 | [0.691, 0.827] |
| Health | 577 | 0.333 | 0.023 | [0.289, 0.378] |
| Science | 1803 | 0.308 | 0.021 | [0.267, 0.348] |
| Economy | 2858 | 0.185 | 0.016 | [0.153, 0.217] |
| Sport | 1333 | 0.136 | 0.014 | [0.108, 0.163] |
| Family | 139 | 0.514 | 0.081 | [0.355, 0.674] |
| Panorama | 2062 | 0.256 | 0.023 | [0.211, 0.302] |
| Realty | 685 | 0.174 | 0.030 | [0.115, 0.233] |
| Law | 374 | 0.204 | 0.040 | [0.124, 0.283] |
| Culture | 4400 | 0.313 | 0.012 | [0.289, 0.337] |
| Future | 484 | 0.318 | 0.050 | [0.220, 0.417] |
| Lifestyle | 1138 | 0.355 | 0.021 | [0.313, 0.396] |
| Travel | 161 | 0.274 | 0.058 | [0.159, 0.388] |
| Web | 1960 | 0.152 | 0.024 | [0.104, 0.200] |
| Education | 390 | 0.402 | 0.056 | [0.293, 0.511] |
| Automobile | 337 | 0.066 | 0.037 | [0.000, 0.138] |

Table 6: Estimated department- and gender-specific probabilities that a journalist mentions a female person when mentioning a person.

| Model | Art. ct. | Estimate | St. dev. | 95% interval | p-value, $H_0: p_m = p_f$ |
|---|---|---|---|---|---|
| Opinion, female authors | 1030 | 0.579 | 0.025 | [0.530, 0.628] | 0.00 |
| Opinion, male authors | 2263 | 0.093 | 0.009 | [0.075, 0.110] | |
| International, female authors | 1279 | 0.234 | 0.024 | [0.187, 0.281] | 0.09 |
| International, male authors | 2298 | 0.186 | 0.015 | [0.157, 0.216] | |
| Domestic, female authors | 631 | 0.173 | 0.019 | [0.135, 0.211] | 0.92 |
| Domestic, male authors | 1303 | 0.171 | 0.009 | [0.154, 0.188] | |
| Career, female authors | 398 | 0.600 | 0.064 | [0.475, 0.724] | 0.00 |
| Career, male authors | 104 | 0.060 | 0.037 | [0.000, 0.133] | |
| Government, female authors | 662 | 0.340 | 0.066 | [0.210, 0.470] | 0.11 |
| Government, male authors | 1118 | 0.220 | 0.034 | [0.154, 0.287] | |
| Female Standard, female authors | 289 | 0.780 | 0.035 | [0.711, 0.850] | 0.00 |
| Female Standard, male authors | 23 | 0.362 | 0.048 | [0.267, 0.456] | |
| Health, female authors | 526 | 0.329 | 0.025 | [0.280, 0.378] | 0.82 |
| Health, male authors | 51 | 0.341 | 0.045 | [0.252, 0.429] | |
| Science, female authors | 673 | 0.443 | 0.031 | [0.382, 0.503] | 0.00 |
| Science, male authors | 1130 | 0.134 | 0.022 | [0.091, 0.177] | |
| Economy, female authors | 1332 | 0.258 | 0.026 | [0.207, 0.310] | 0.00 |
| Economy, male authors | 1526 | 0.155 | 0.020 | [0.117, 0.194] | |
| Sport, female authors | 23 | 0.064 | | | |
| Sport, male authors | 1310 | 0.141 | 0.011 | [0.120, 0.162] | |
| Family, female authors | 134 | 0.543 | 0.091 | [0.364, 0.722] | |
| Family, male authors | 5 | 0.309 | | | |
| Panorama, female authors | 766 | 0.336 | 0.052 | [0.234, 0.437] | 0.03 |
| Panorama, male authors | 1296 | 0.214 | 0.023 | [0.168, 0.260] | |
| Realty, female authors | 271 | 0.135 | 0.029 | [0.079, 0.191] | 0.25 |
| Realty, male authors | 414 | 0.200 | 0.048 | [0.106, 0.294] | |
| Law, female authors | 76 | 0.630 | 0.083 | [0.468, 0.792] | 0.00 |
| Law, male authors | 298 | 0.036 | 0.023 | [0.000, 0.080] | |
| Culture, female authors | 1591 | 0.432 | 0.019 | [0.394, 0.470] | 0.00 |
| Culture, male authors | 2809 | 0.217 | 0.015 | [0.188, 0.245] | |
| Future, female authors | 93 | 0.423 | 0.068 | [0.289, 0.558] | 0.01 |
| Future, male authors | 391 | 0.169 | 0.064 | [0.043, 0.295] | |
| Lifestyle, female authors | 537 | 0.455 | 0.033 | [0.390, 0.520] | 0.00 |
| Lifestyle, male authors | 601 | 0.238 | 0.024 | [0.191, 0.286] | |
| Travel, female authors | 40 | 0.351 | 0.072 | [0.210, 0.491] | 0.19 |
| Travel, male authors | 121 | 0.207 | 0.084 | [0.043, 0.372] | |
| Web, female authors | 21 | 0.374 | 0.063 | [0.251, 0.497] | 0.00 |
| Web, male authors | 1939 | 0.111 | 0.024 | [0.064, 0.159] | |
| Education, female authors | 268 | 0.360 | 0.060 | [0.243, 0.477] | 0.40 |
| Education, male authors | 122 | 0.462 | 0.105 | [0.256, 0.669] | |
| Automobile, female authors | 15 | 0.646 | 0.072 | [0.505, 0.787] | 0.00 |
| Automobile, male authors | 322 | 0.060 | 0.035 | [0.000, 0.129] | |
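The interval estimates and hypothesis tests described in Appendix A can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, intervals are clipped to the unit interval (consistent with the reported lower bounds of 0.000), and the test is taken to be two-sided with the variance of the difference equal to the sum of the two variances.

```python
import math


def plug_in_variance(p_hat, weights, person_counts):
    """Equation 6: plug-in variance of a weighted estimate, given the article
    weights w_i and the numbers of detected persons m_i in the subset."""
    numerator = sum(w * w * m for w, m in zip(weights, person_counts))
    denominator = sum(w * m for w, m in zip(weights, person_counts)) ** 2
    return p_hat * (1.0 - p_hat) * numerator / denominator


def normal_interval(p_hat, variance, z=1.96):
    """Normal-approximation interval, clipped to [0, 1] (clipping is an assumption)."""
    half_width = z * math.sqrt(variance)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def two_sided_p_value(p_hat_1, variance_1, p_hat_2, variance_2):
    """Two-sided p-value for H0: p_1 = p_2 under the normal approximation."""
    z = (p_hat_1 - p_hat_2) / math.sqrt(variance_1 + variance_2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal cdf Phi.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimates and standard deviations for the department International (Table 6).
female_estimate, female_sd = 0.234, 0.024
male_estimate, male_sd = 0.186, 0.015

interval = normal_interval(female_estimate, female_sd ** 2)
p_value = two_sided_p_value(female_estimate, female_sd ** 2, male_estimate, male_sd ** 2)
print(interval, p_value)
```

With the rounded values from Table 6 the sketch gives a p-value of roughly 0.09 for International, which agrees with the reported value; small deviations for other rows are to be expected because the inputs are rounded to three decimals.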