=Paper=
{{Paper
|id=Vol-3834/paper41
|storemode=property
|title=A quantitative study of gender representation and authors' gender in a large-market print medium
|pdfUrl=https://ceur-ws.org/Vol-3834/paper41.pdf
|volume=Vol-3834
|authors=Christoph Bartl,Sharwin Rezagholi,Mareike Schumacher
|dblpUrl=https://dblp.org/rec/conf/chr/BartlRS24
}}
==A quantitative study of gender representation and authors' gender in a large-market print medium==
<pdf width="1500px">https://ceur-ws.org/Vol-3834/paper41.pdf</pdf>
<pre>
                                A quantitative study of gender representation and
                                authors’ gender in a large-market print medium⋆
                                Christoph Bartl (1), Sharwin Rezagholi (2,∗) and Mareike Schumacher (3)
                                (1) Department Computer Science, University of Applied Sciences Technikum Wien, Austria
                                (2) Department Computer Science, University of Applied Sciences Technikum Wien, Austria
                                (3) Institute of Literary Studies, University of Stuttgart, Germany


                                           Abstract
                                           We analyse gender representation in articles published by the Austrian daily newspaper ’Der Standard’
                                           in the years 2021 and 2022. We use named entity recognition and automated gender classification of first
                                           names to count the number of female and male persons in articles. The analysis reveals the dominance
                                           of male persons in article content. We find that female authors exhibit a significantly higher tendency
                                           to mention female persons in their articles.

                                           Keywords
                                           Gender of journalists, gender representation in articles, print newspaper content


                                1. Introduction
                                This paper asks whether (i) female persons are less likely to be mentioned by newspaper jour-
                                nalists than male persons, and (ii) whether the propensity to mention female persons differs
                                between female and male journalists. To answer these questions we analyse the entire article
                                output of the Austrian daily newspaper Der Standard (https://www.derstandard.at) from the
                                years 2021 and 2022. These articles are publicly available online. We additionally obtained the
                                full names of the authors of all articles from Der Standard; this authorship information is non-
                                public for most of the articles. Der Standard is the fourth-largest daily newspaper in Austria,
                                having more than 500,000 daily readers [29].
                                   This study employs a binary notion of gender, as does the language policy of Der Standard,
                                which prescribes the sole use of feminine and masculine pronouns, effectively prohibiting the
                                use of neopronouns. Therefore this study does not contribute to the abolishment of a binary
                                notion of gender, an aim prominently championed within the digital humanities by Laura Man-
                                dell [20].
                                   We employ pretrained models for natural language processing to automatically identify and
                                enumerate female and male persons mentioned in article texts. In particular, we use named
                                entity recognition and automated gender-assignment to first names. The respective methods


                                CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
                                ∗ Corresponding author.
                                Email: christoph.bartl@me.com (C. Bartl); sharwin.rezagholi@technikum-wien.at (S. Rezagholi);
                                mareike.schumacher@ur.de (M. Schumacher)
                                ORCID: 0000-0003-1090-0240 (S. Rezagholi); 0000-0002-7952-4194 (M. Schumacher)
                                         © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


are susceptible to bias. We therefore estimate the error profile of the gender assignment to first
names and verify that there is no significant bias that might have distorted our analysis.
   We statistically estimate the probability that a journalist mentions a female person when
mentioning a person. We find that females are less likely to be mentioned by journalists. This
finding holds for male as well as female authors, but we find it to be less pronounced in female
journalists. These effects are present to different degrees in different editorial departments,
with some departments not exhibiting the imbalance at all. The causal pathways leading to
these effects could include statistical ’self-selection’, whereby female authors and male authors
have a differing propensity to report on certain issues, or female authors could tend to highlight
the roles of female persons more than their male colleagues, even when reporting on the same
issue.

Literature review. The issue of gender representation and gender inequality is the subject of
lively discussion in the digital humanities, including computational linguistics [12], digital film
studies [3, 32], game studies [31], and computational literary studies [4, 5, 8, 13, 19, 30, 32].
Newspapers and magazines in particular have been studied with respect to gender representation and
possible gender bias. Yun et al. found that women were given more space in online than in
print journals and that, in a significant fraction of cases, women were portrayed in stereotyp-
ical ways [16]. Kian et al. analysed tennis news, finding that female reporters did not write
more often about female athletes than their male colleagues but that female reporters tended
to use more stereotypical descriptions [17]. Kozlowski et al. [18] analyse the magazines from
an Argentinian publisher from 2008 to 2018 using topic modelling and find that the prevalence
of thematic areas differs between magazines that target female readers and those that target
male ones. They find that this gap is diminishing with respect to certain topics, such as ’family’
and ’children’, whereas it remains large in others, such as ’fashion’ and ’horoscope’. The large-
scale analysis of Shor et al. [27], in which more than 20,000 prominent personalities of male and
female gender and different (but matching) professions were searched in about 2,000 English-
language newspapers, came to the conclusion that the reduced media coverage of women is
not in line with the readers’ interest, which does not favour prominent men over prominent
women. They thus provide some evidence in favour of the hypothesis that newspapers and
magazines foster stereotypes and gender bias.
   The work most related to ours is due to Mateos de Cabo et al. [21], whose analysis of Spanish
online newspapers found that females were more likely to be mentioned in female-authored
articles, and to Shor et al. [28], whose analysis of about 2,000 news sources found that the
fraction of females in articles increased from 19% in 1983 to 27% in 2008. The latter also found
significant differences between editorial departments.
   The digital humanities community has formulated a need to further the application of its
methods to questions of gender. These voices include Miriam Posner [23], who criticized that
gender-related work in the digital humanities does not receive sufficient attention, either from
the scholarly community or from news outlets. In 2018 Susan Brown stated that a feminist
perspective is largely lacking in the digital humanities, going as far as calling ’feminism’ the ’f
word’, suggesting that feminist approaches are effectively silenced [6]. In 2019 Laura Mandell
argued that studies on gender within the digital humanities would rather reproduce stereotypes


than analyse them [20]. Coining the term ’data feminism’ in 2020, D’Ignazio and Klein drew
attention to gender representation biases in the digital humanities and in other data-driven
fields [10]. Although gender bias has been studied in various domains [9, 25, 15, 30, 19, 11, 24,
12], we agree that a sufficient corpus of statistical results remains absent.


2. Data and methods
We start with the text and metadata of 87,032 articles, corresponding to the entire journalistic
output of Der Standard between January 1, 2021 and December 31, 2022. The metadata of the
articles includes authorship, publication date, title, and editorial department. We removed all
articles from consideration that were written by a group of journalists or signed by a press
agency. This restriction is in line with our interest in the behaviour of individual writers at
a newspaper, as opposed to press agencies, group authorship, or anonymous authorship. We
retain 36,204 articles.

Named entity recognition and gender-assignment. We use the Python package Gender-
Guesser 0.4.0 to assign gender to the authors’ given names. GenderGuesser uses a database
of 40,000 gender-assigned first names to assign gender to a given name [2]. Since some given
names, such as ‘Andrea’, ’Maria’ or ‘Robin’, are not gender-exclusive, GenderGuesser returns
‘mostly female’, ‘mostly male’, and ‘androgynous’ in some cases. We assigned the first two
of these categories to ‘female’ and ‘male’ respectively. We manually checked the gender-
assignments of all 1,571 authors and corrected three cases. We therefore treat the assignment
of gender to the names of authors as certainly correct in the remainder of our analysis.
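
The collapsing of the package's labels into a binary assignment can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `collapse_label` is hypothetical, the label strings are those that gender-guesser is understood to return (`female`, `mostly_female`, `male`, `mostly_male`, `andy`, `unknown`), and leaving androgynous and unknown names unassigned is an assumption, since the text does not say how these were handled before the manual check.

```python
from typing import Optional

# Labels assumed to come from gender-guesser's Detector.get_gender;
# this sketch only post-processes them and does not import the package.
_BINARY = {
    "female": "female",
    "mostly_female": "female",
    "male": "male",
    "mostly_male": "male",
}


def collapse_label(label: str) -> Optional[str]:
    """Map a detector label to 'female', 'male', or None (left unassigned)."""
    return _BINARY.get(label)


# With the package installed, a label would be obtained along these lines:
#   import gender_guesser.detector as gender
#   label = gender.Detector().get_gender("Maria")
```

Names that map to `None` would then be candidates for manual review, in the spirit of the manual check described above.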
   We used the Python package Flair 0.12.2 [1] to recognize personal names in the article texts
using the named entity recognition model ‘ner-german-large’ [26]. In a first step we restrict
attention to names consisting of a given name and a family name since the editorial policy
of Der Standard prescribes the use of the full name at least once per article. The gender of
the detected given names was then determined using GenderGuesser. Names such as ‘Barack
Obama’, ‘Viktor F.’, ‘Angela Merkel’, ‘Nina H.’, and ’Luke Skywalker’ were identified. In a
second step we parse the articles for mentions of the identified persons using only a part of the
full name. To clarify our counting method: An article mentioning two persons, the same male
person 9 times and a female person once, is considered a text where 90% of mentioned persons
are male.
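
The counting rule can be illustrated with a short sketch. It assumes that mentions have already been extracted and gendered; the function name `gender_fractions` and the placeholder persons are hypothetical and only serve to reproduce the 90% example from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def gender_fractions(mentions: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fractions of mentions by gender.

    `mentions` holds one (person, gender) pair per mention, so a person
    who is mentioned nine times contributes nine pairs.
    """
    counts = Counter(gender for _, gender in mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: n / total for gender, n in counts.items()}


# Hypothetical article: one male person mentioned nine times,
# one female person mentioned once.
article = [("person A", "male")] * 9 + [("person B", "female")]
fractions = gender_fractions(article)
```

Counting mentions rather than distinct persons is what makes this article 90% male instead of 50% male.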
   We evaluated our extraction of full names in comparison to the manual counting of the num-
ber of female and male full names in 200 randomly selected articles (Table 2). We use binomial
estimates to quantify the conditional accuracies of the automated extraction (Table 1). The fair-
ness criterion of predictive parity requires the equality of the true positive rate for male and
female cases [7]. We find that these two rates are numerically very similar, approximately 0.87
and 0.88, and that the hypothesis that they are equal cannot be rejected (Binomial proportions
test, 𝑝 = 0.68).
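
The binomial estimates and Wald intervals can be recomputed from the counts in Table 2. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the Wald interval is clipped to [0, 1], and the pooled two-proportion z-test is only one common form of a binomial proportions test. The source does not specify which variant was used, so the resulting p-value need not match the reported one exactly.

```python
import math


def wald_interval(successes: int, trials: int, z: float = 1.96):
    """Binomial point estimate, standard deviation, and clipped Wald interval."""
    p = successes / trials
    sd = math.sqrt(p * (1 - p) / trials)
    return p, sd, (max(0.0, p - z * sd), min(1.0, p + z * sd))


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (assumed variant)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Counts from Table 2: 621 of 715 male names and 141 of 160 female names
# were classified correctly.
male_est = wald_interval(621, 715)
female_est = wald_interval(141, 160)
p_value = two_proportion_z_test(621, 715, 141, 160)
```

After rounding to three decimals, `male_est` and `female_est` agree with the first two rows of Table 1.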

Generative model. Our main interest is the probability that an author, when mentioning a
person, does mention a female person. Our task is complicated by the fact that the automated


Table 1
Performance of gendered name recognition.
                                               Estimate      St. dev.        95% interval (Wald)
             ℙ(classified as male|male)          0.869        0.013             [0.844, 0.893]
             ℙ(classified as female|female)      0.881        0.026             [0.831, 0.931]
             ℙ(classified as female|male)        0.003        0.002             [0.000, 0.007]
             ℙ(classified as male|female)        0.019        0.010             [0.000, 0.040]
             ℙ(not detected|male)                0.129        0.013             [0.104, 0.153]
             ℙ(not detected|female)              0.100        0.024             [0.054, 0.146]


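[Figure 1 (tree diagram): a mentioned name is female with probability p and male with probability 1−p.
 A female name is detected as female (0.88), detected as male (0.02), or not detected (0.10).
 A male name is detected as male (0.87), detected as female (0.00), or not detected (0.13).]

Figure 1: Generative model. Arrows are labelled with estimated conditional probabilities computed
on the basis of the manually labelled test set (Table 2).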


detection of full names and the automated assignment of gender to the respective first names
could be biased. The probability that we have access to is

                              ℙ(detected as female|name detected).

We consider a simple generative model for our data (Figure 1), which illustrates that the events
’detected as female’ and ’detected as male’ do not allow the identification of the parameter of
interest, that is 𝑝, unless certain assumptions are made. The two necessary assumptions are
the absence of mix-ups, that is

                ℙ(detected as female|male) = ℙ(detected as male|female) = 0,                       (1)

and unbiased non-detection, that is

                        ℙ(not detected|female) = ℙ(not detected|male).                             (2)

If Equations 1 and 2 hold, then

     ℙ(fem. name detected|name detected) = ℙ(fem. name mentioned|name mentioned).
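
The step from Equations 1 and 2 to this identity can be spelled out. The symbols q_f and q_m for the non-detection probabilities of female and male names are introduced here for illustration and do not appear in the original. Under Equation 1 a female name is either detected as female or not detected at all, and likewise for male names, so that, assuming q < 1,

```latex
\begin{align*}
\mathbb{P}(\text{fem. name detected} \mid \text{name detected})
  &= \frac{p\,(1-q_f)}{p\,(1-q_f) + (1-p)\,(1-q_m)}
     && \text{(Equation 1)} \\
  &= \frac{p\,(1-q)}{(1-q)\,\bigl(p + (1-p)\bigr)}
     && \text{(Equation 2: } q_f = q_m = q\text{)} \\
  &= p .
\end{align*}
```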


Table 2
Contingency table for gendered name recognition.
                     Classified as male   Classified as female   Not recognized   Total
           Male             621                     2                 92           715
           Female            3                     141                16           160
           Total            624                    143                108


It is not necessary to assume that non-detection does not occur, only that non-detection is
unbiased. The empirical probabilities corresponding to those in Equation 2 are similar,
approximately 0.10 and 0.13, and the hypothesis that they are equal cannot be rejected (Binomial
proportions test, 𝑝 = 0.31). As Table 1 reports, the empirical probabilities for mix-ups equal
approximately 0.00 and 0.02. We consider these values sufficiently low to assume that
Equation 1 holds. In Table 2 the fraction of female persons among the mentioned persons equals
160/(160 + 715) ≈ 0.18 while the fraction of names classified as female among the classified
names equals 143/(143 + 624) ≈ 0.19. This numerically illustrates the absence of bias.
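
The size of the residual bias can also be checked numerically by pushing the true female fraction through the generative model of Figure 1 with the empirical rates from Table 2. This is a sketch for illustration; the function name and its parameter names are hypothetical.

```python
def detected_female_fraction(p, f_as_f, f_as_m, m_as_m, m_as_f):
    """P(detected as female | name detected) in the generative model.

    p      : probability that a mentioned name is female
    f_as_f : P(detected as female | female)
    f_as_m : P(detected as male | female)
    m_as_m : P(detected as male | male)
    m_as_f : P(detected as female | male)
    Non-detection takes the remaining probability mass in each branch.
    """
    detected_female = p * f_as_f + (1 - p) * m_as_f
    detected_male = p * f_as_m + (1 - p) * m_as_m
    return detected_female / (detected_female + detected_male)


# Empirical rates taken from the contingency table (Table 2).
p_true = 160 / (160 + 715)
observed = detected_female_fraction(
    p_true, f_as_f=141 / 160, f_as_m=3 / 160, m_as_m=621 / 715, m_as_f=2 / 715
)

# Without mix-ups and with equal non-detection the parameter is recovered.
ideal = detected_female_fraction(0.3, f_as_f=0.9, f_as_m=0.0, m_as_m=0.9, m_as_f=0.0)
```

With the Table 2 rates, `observed` coincides with 143/(143 + 624), and it differs from `p_true` by less than one percentage point.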

Statistical model. The details of our statistical approach are presented in Appendix A. Our
basic modelling assumption is that the number of persons in an article is predetermined, but
the respective journalist ’chooses’ the gender of the mentioned persons independently from a
Bernoulli distribution. The parameter of this Bernoulli distribution is specific to the subset of
the data for which an estimate is desired. We believe that this model is sufficient to organise
the data in a tractable and intuitive fashion. To be concrete: We estimate the probability that a
journalist uses the name of a female person when using the name of a person. Note that our es-
timate does not equal the fraction of female persons in a certain subset of articles. Our estimate
is descriptive of authors’ behaviour, not of article output (see Appendix A). It is important to
note that every author, regardless of the number of persons mentioned in their respective arti-
cles, has equal importance for our point estimates, but the differing degrees of uncertainty for
different authors, caused by the different quantities of mentioned persons, are reflected in the
interval estimates, that is in the confidence intervals. When we estimate probabilities specific
to editorial departments, we employ a weighting scheme whereby the degree of membership
of an author in an editorial department is taken into account. Consult Appendix A for details
on our statistical approach.
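
The difference between an author-level estimate and the fraction of female persons in the article output can be made concrete with a toy example. Appendix A is not reproduced here, so the sketch does not implement the authors' estimator; it assumes, for illustration only, that the author-level point estimate is an unweighted mean of per-author fractions. The function names and the data are hypothetical.

```python
def pooled_fraction(authors):
    """Fraction of female mentions over all mentions (describes article output)."""
    female = sum(f for f, _ in authors)
    total = sum(n for _, n in authors)
    return female / total


def author_level_mean(authors):
    """Unweighted mean of per-author fractions (every author counts equally)."""
    fractions = [f / n for f, n in authors if n > 0]
    return sum(fractions) / len(fractions)


# Hypothetical data: (female mentions, total mentions) per author.
authors = [(1, 1), (0, 99)]
pooled = pooled_fraction(authors)        # 1 / 100
by_author = author_level_mean(authors)   # (1.0 + 0.0) / 2
```

A prolific author dominates the pooled fraction but not the author-level mean, which is why the two quantities can differ substantially.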


3. Descriptive analysis
Of the 36,204 articles, 12,736 (35%) were written by a female author and 23,468 (65%) were
written by a male author. Among the 1,571 unique authors 606 (39%) are female. The average
author has contributed 23 articles to the dataset. The articles are of varying length, the median
length being 3,859 characters (1st quartile = 2,560, 3rd quartile = 5,248).
   The mean of the fraction of females in an article equals 25%. The respective mean for the arti-
cles written by male writers equals 20%. On the other hand, the mean for the articles authored


[Figure 2 (three panels): All articles; Male-authored articles; Female-authored articles.
 Vertical axis in each panel: fraction of female persons in article.
 Axis ticks: 0.00, 0.07, 0.43, 1.00 (all articles); 0.00, 0.33, 1.00 (male-authored);
 0.0, 0.2, 0.6, 1.0 (female-authored).]

Figure 2: Empirical distributions of the fraction of females in an article, quartiles on the vertical axes.


by female writers equals 33%. The empirical distributions of the fraction of female persons
in an article are visualized in Figure 2. Note that we only consider articles that do mention
at least one person. It is apparent that many articles that do mention persons do not mention
females at all. This is true for female- as well as male-authored articles. It is evident that the dis-
tributions for female- and male-authored articles differ. The distribution for female-authored
articles stochastically dominates the distribution for male-authored articles (McFadden’s test
[22], 𝑝 = 0.00). The distributions exhibit concentrations at the extremes. This illustrates that
many articles solely mention persons of one gender.
   Differentiating with respect to the editorial departments of Der Standard (Table 3), one finds
that the department Family produces the smallest number of articles but has the largest fraction
of female authorship. The largest number of articles is produced by the department Culture,
which corresponds to roughly 13% of our data.


4. Statistical estimation
Our first interests are to estimate the probabilities that (i) an author who mentions a person
does mention a female person, (ii) a female author who mentions a person does mention a
female person, and (iii) a male author who mentions a person does mention a female person.
The respective estimated probabilities are reported in Table 4 and visualised in Figure 3. While
the probability that an author mentions a female person when mentioning a person is estimated
to be roughly 27%, the respective estimate for female authors is roughly 46% and roughly 14%
for male authors. The differences are highly statistically significant (𝑝 = 0.00). We conclude
that there is strong evidence, at least in our data, that female journalists are more likely to
mention female persons in their articles compared to male journalists.
   To elucidate the differences between authors from different editorial departments, we esti-
mate the probability that a journalist from a given department mentions a female person when
mentioning a person. These estimates are reported in Table 5 and visualized in Figure 4. There
are editorial departments whose output is likely to mention females, such as Female Standard
and Family, and departments whose writers are unlikely to mention females, such as Automo-


Table 3
Data composition and author’s gender in terms of editorial departments.
  Department                       Article  Article  Female      Fraction    Unique   Unique   Fraction
                                   count    percent  authorship  female      authors  female   female
                                                     count       (articles)           authors  (authors)
  Wirtschaft (Economy)             3131     8.65     1462        0.47        134      44       0.33
  Karriere (Career)                724      2.00     584         0.81        78       40       0.51
  Recht (Law)                      704      1.94     148         0.21        131      36       0.27
  Die Standard (Female Standard)   367      1.01     335         0.91        54       46       0.85
  Gesundheit (Health)              657      1.81     603         0.92        47       30       0.64
  Automobil (Automobile)           644      1.78     44          0.07        35       9        0.26
  Web                              3734     10.31    44          0.01        46       9        0.20
  Reisen (Travel)                  319      0.88     140         0.44        52       23       0.44
  Meinung (Opinion)                4205     11.61    1472        0.35        692      226      0.33
  International (International)    3812     10.53    1360        0.36        150      54       0.36
  Inland (Domestic)                1972     5.45     638         0.32        88       41       0.47
  Lifestyle                        1588     4.39     833         0.52        112      60       0.54
  Etat (Government)                1902     5.25     716         0.38        98       37       0.38
  Immobilien (Realty)              853      2.36     364         0.43        39       17       0.44
  Bildung (Education)              466      1.29     335         0.72        64       38       0.59
  Zukunft (Future)                 628      1.73     145         0.23        44       25       0.57
  Sport                            1375     3.80     28          0.02        61       14       0.23
  Familie (Family)                 238      0.66     230         0.97        32       25       0.78
  Kultur (Culture)                 4625     12.77    1691        0.37        294      131      0.45
  Panorama                         2330     6.44     858         0.37        177      70       0.40
  Wissenschaft (Science)           1930     5.33     706         0.37        156      80       0.51


[Figure 3 (interval plot): rows for all authors, male authors, and female authors.
 Horizontal axis: probability that author mentions a female person when mentioning a person,
 with ticks at 0, 0.14, 0.27, 0.46, and 1.]

Figure 3: Estimated probabilities that an author, when mentioning a person, does mention a female
person (with 95%-confidence intervals).


[Figure 4 (interval plot): one row per department, from top to bottom: Opinion, International,
 Domestic, Career, Government, Female Standard, Health, Science, Economy, Sport, Family, Panorama,
 Realty, Law, Culture, Future, Lifestyle, Travel, Web, Education, Automobile.
 Horizontal axis: probability that author mentions a female person when mentioning a person (0.0 to 1.0).]

Figure 4: Estimated probability that an author from the respective department mentions a female
person when mentioning a person (with 95%-confidence intervals).


bile and Sport. These findings are in line with previous studies from different countries and
languages [17, 28, 18].
   To disentangle the effect of authors’ gender and editorial departments, we stratify our anal-
ysis with respect to both. These estimates are visualized in Figure 5 and reported in Table 6,
including hypothesis tests for the null-hypothesis that female and male authors behave identi-
cally. We obtain significantly different estimates for female and male authors for many edito-


[Figure 5 (interval plot): two rows per department, one for female authors and one for male authors,
 from top to bottom: Opinion, International, Domestic, Career, Government, Female Standard, Health,
 Science, Economy, Sport, Family, Panorama, Realty, Law, Culture, Future, Lifestyle, Travel, Web,
 Education, Automobile.
 Horizontal axis: probability that author mentions a female person when mentioning a person (0.0 to 1.0).]

Figure 5: Estimated probability that a journalist from the respective department and of the respective
gender mentions a female person when mentioning a person (with 95%-confidence intervals).


rial departments. The departments Opinion, Career, Female Standard, Science, Economy, Law,
Culture, Lifestyle, Web, and Automobile exhibit highly significant differences with respect to
author gender (𝑝 = 0.00). The departments Domestic, Government, Health, Realty, Travel, and
Education do not exhibit statistically significant gender differences (𝑝 > 0.1).


5. Conclusion and caveats
We present some statistical evidence for the hypothesis that female and male journalists have
differing propensities to mention female persons in their writing. This effect varies between
editorial departments, at least in our data, and is stronger within certain editorial departments
than across editorial departments. Further research is needed to elucidate whether the find-
ings of the present study are driven by a mechanism through which female journalists write
about different topics than male journalists. Even if this were true, it would remain ambiguous
whether the ultimate cause is the issue-assignment policy within newsrooms, or a desire by
journalists to write on topics featuring persons of a certain gender. The latter could also corre-
spond to a conscious attempt by female journalists to highlight female persons in an attempt to
counteract existing gender imbalances. We deem it an interesting avenue for further research
to quantitatively elucidate the relationship between journalistic topics, gender representation,
and authorship. If the observed differences were caused by different propensities in mentioning
female persons even when reporting on the same topic, this would indicate gender-specificity
in journalists’ viewpoints. Finally we want to highlight that the present study is most
certainly marred by the problem that many relevant and potentially confounding factors, such as
topics, have not yet been taken into account. Therefore the present study is but an empirical
quantification of a status quo.


Acknowledgments
The authors thank Martin Kotynek and Werner Weichselberger from Der Standard for their
support.


References
 [1] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf. “FLAIR: An easy-
     to-use framework for state-of-the-art NLP”. In: Proceedings of the 2019 Annual Conference
     of the North American Chapter of the Association for Computational Linguistics (2019),
     pp. 54–59.
 [2] D. Arcos. Gender-guesser. 2016. url: https://github.com/lead-ratings/gender-guesser.
 [3] D. Bamman, B. O’Connor, and N. A. Smith. “Learning Latent Personas of Film Char-
     acters”. In: Proceedings of the 51st Annual Meeting of the Association for Computational
     Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2013,
     pp. 352–361. url: https://aclanthology.org/P13-1035.
 [4] O. Baylog, L. Dimmit, T. Heller, G. Kirilloff, S. Smith, G. Thomas, C. Warren, and J.
     Wehrwein. “More than Custom has Pronounced Necessary”: Exploring the Correlation between
     Gendered Verbs and Character in the 19th Century Novel. Nebraska Literary Lab, 2016.


 [5] J. Bergenmar and K. Leppänen. “Gender and Vernaculars in Digital Humanities and
     World Literature”. In: NORA - Nordic Journal of Feminist and Gender Research 25.4 (2017),
     pp. 232–246. doi: 10.1080/08038740.2017.1378256.
 [6] S. Brown. “Delivery Service: Gender and the Political Unconscious of Digital Humanities”.
     In: Bodies of Information: Intersectional Feminism and the Digital Humanities.
     University of Minnesota Press, 2018, pp. 261–286.
 [7] A. Castelnovo, R. Crupi, G. Greco, D. Regoli, I. G. Penco, and A. C. Cosentini. “A
     clarification of the nuances in the fairness metrics landscape”. In: Scientific Reports
     12.4209 (2022).
 [8] J. Cheng. “Fleshing Out Models of Gender in English-Language Novels (1850–2000)”. In:
     Journal of Cultural Analytics 5.1 (2020). doi: 10.22148/001c.11652.
 [9] M. Conroy. “Quantifying the Gap: The Gender Gap in French Writers’ Wikidata”. In:
     Journal of Cultural Analytics 8.2 (2023). doi: 10.22148/001c.74068.
[10]   C. D’Ignazio and L. F. Klein. Data Feminism. MIT Press, 2020.
[11]   M. Flüh, J. Horstmann, and M. Schumacher. “Genderaspekte in Fantasy-Jugendromanen
       von 2008 bis 2020: Distant Gender Reading”. In: Gender in der deutschsprachigen Kinder-
       und Jugendliteratur. De Gruyter, 2022, pp. 457–482. doi: 10.1515/9783110726404-025.
[12]   C. Freitas and D. Santos. “Human Depiction in Portuguese. Distant reading Brazilian and
       Portuguese literature”. In: Journal of Computational Literary Studies 2 (2024).
[13]   S. Hota and S. Argamon. Performing gender: Automatic stylistic analysis of Shakespeare’s
       characters. 2006.
[14]   F. Hu and J. Zidek. “The weighted likelihood”. In: Canadian Journal of Statistics 30.3
       (2002), pp. 347–371. doi: 10.2307/3316141.
[15]   M. Jockers and G. Kirilloff. “Understanding Gender and Character Agency in the 19th
       Century Novel”. In: Journal of Cultural Analytics 2.2 (2016). doi: 10.22148/16.010.
[16]   H. Jung Yun, M. Postelnicu, N. Ramoutar, and L. Lee Kaid. “Where Is She?: Coverage of
       women in online news magazines”. In: Journalism Studies 8.6 (2007), pp. 930–947. doi:
       10.1080/14616700701556823.
[17]   E. M. Kian, J. S. Fink, and M. Hardin. “Examining the Impact of Journalists’ Gender in
       Online and Newspaper Tennis Articles”. In: Women in Sport and Physical Activity Journal
       20.2 (2011), pp. 3–21. doi: 10.1123/wspaj.20.2.3.
[18]   D. Kozlowski, G. Lozano, C. Felcher, F. Gonzalez, and E. Altszyler. Gender bias in
       magazines oriented to men and women: A computational approach. 2020.
[19]   E. Kraicer and A. Piper. “Social Characters: The Hierarchy of Gender in Contemporary
       English-Language Fiction”. In: Journal of Cultural Analytics 3.2 (2019). doi:
       10.22148/16.032.
[20]   L. Mandell. “Gender and Cultural Analytics: Finding or Making Stereotypes?” In: Debates
       in the Digital Humanities 2019. University of Minnesota Press, 2019, pp. 3–26.


[21]   R. Mateos de Cabo, R. Gimeno, M. Martínez, and L. López. “Perpetuating Gender
       Inequality via the Internet? An Analysis of Women’s Presence in Spanish Online
       Newspapers”. In: Sex Roles 70.1 (2014), pp. 57–71. doi: 10.1007/s11199-013-0331-y.
[22]   D. McFadden. “Testing for Stochastic Dominance”. In: Studies in the Economics of
       Uncertainty. Springer, 1989, pp. 113–134.
[23]   M. Posner. “What’s Next: The Radical, Unrealized Potential of Digital Humanities”. In:
       Debates in the Digital Humanities 2016. University of Minnesota Press, 2016, pp. 32–41.
[24]   T. Schmidt, I. Engl, J. Herzog, and L. Judisch. “Towards an Analysis of Gender in Video
       Game Culture: Exploring Gender specific Vocabulary in Video Game Magazines”. In:
       Proceedings of the Digital Humanities in the Nordic Countries 5th Conference (DHN 2020)
       (2020), pp. 333–341.
[25]   M. Schumacher and M. Flüh. “Made to Be a Woman: A case study on the categorization
       of gender using an individuation-based approach in the analysis of literary texts”. In:
       Digital Humanities Quarterly 17.3 (2023). url:
       https://www.digitalhumanities.org/dhq/vol/17/3/000728/000728.html.
[26]   S. Schweter and A. Akbik. “FLERT: Document-level features for named entity
       recognition”. 2020.
[27]   E. Shor, A. van de Rijt, and B. Fotouhi. “A Large-Scale Test of Gender Bias in the Media”.
       In: Sociological Science 6 (2019), pp. 526–550. doi: 10.15195/v6.a20.
[28]   E. Shor, A. van de Rijt, A. Miltsov, V. Kulkarni, and S. Skiena. “A Paper Ceiling:
       Explaining the Persistent Underrepresentation of Women in Printed News”. In: American
       Sociological Review 80.5 (2015), pp. 960–984. doi: 10.1177/0003122415596999.
[29]   Statista. Österreich Tageszeitungen nach Anzahl der Leser. 2022. url:
       https://de.statista.com/statistik/%20daten/studie/307114/umfrage/tageszeitungen-in-oesterreich-nach-anzahl-der-leser/.
[30]   T. Underwood, D. Bamman, and S. Lee. “The Transformation of Gender in English-
       Language Fiction”. In: Journal of Cultural Analytics 3.2 (2018). doi: 10.22148/16.019.
[31]   T. Unterhuber. Männlich codiert?: Annäherung an eine Medien- und Geschlechtergeschichte
       des Computerspiels. 2021.
[32]   E.-M. Venzmer. “Oh, the [digital] humanities!” – Eine quantitative Gender-Analyse von The
       Big Bang Theory. 2023.


A. Appendix
Maximum likelihood estimate. We consider the weighted likelihood

\[
  L(p_1, \dots, p_l) \propto \prod_{j=1}^{l} \left( \prod_{i \in S_j} \left( p_j^{k_i} (1 - p_j)^{m_i - k_i} \right)^{w_i} \right), \tag{3}
\]


where 𝑘𝑖 denotes the number of persons detected as female in article 𝑖, 𝑚𝑖 denotes the number
of detected persons in article 𝑖, {𝑆1 , ..., 𝑆𝑙 } are disjoint subsets of the data {1, ..., 𝑛}, and the weight
𝑤𝑖 equals the reciprocal of the number of natural persons detected in articles written by the
respective journalist 𝑎𝑖 , that is
\[
  w_i = \frac{1}{\sum_{j : a_i = a_j} m_j}. \tag{4}
\]
Note that we have discarded multiplicative constants from Equation 3. Weighted likelihood
estimation is a well-established method in several circumstances [14]. The likelihood (Equation
3) is maximized at the parameter-values {𝑝1̂ , ..., 𝑝𝑙̂ } given by
\[
  \hat{p}_j = \frac{\sum_{i \in S_j} w_i k_i}{\sum_{i \in S_j} w_i m_i}. \tag{5}
\]
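
The step from Equation 3 to Equation 5 is a standard first-order condition. A short sketch of the omitted derivation, using only the symbols defined above (the constant collects the discarded multiplicative terms):

```latex
\begin{align*}
\log L(p_1, \dots, p_l)
  &= \text{const} + \sum_{j=1}^{l} \sum_{i \in S_j} w_i \left[ k_i \log p_j + (m_i - k_i) \log(1 - p_j) \right], \\
\frac{\partial \log L}{\partial p_j}
  &= \frac{1}{p_j} \sum_{i \in S_j} w_i k_i - \frac{1}{1 - p_j} \sum_{i \in S_j} w_i (m_i - k_i) = 0 \\
\Longrightarrow \quad \hat{p}_j
  &= \frac{\sum_{i \in S_j} w_i k_i}{\sum_{i \in S_j} w_i m_i}.
\end{align*}
```

Since the weights are positive, the log-likelihood is concave in each 𝑝𝑗 , so this stationary point is the maximizer.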

Under our choice of weighting (Equation 4), the maximum-likelihood estimates according to
Equation 5 can be written as
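\[
  \hat{p}_j = \frac{\sum_{a \in A} \frac{k_{a,j}}{m_a}}{\sum_{a \in A} \frac{m_{a,j}}{m_a}}
            = \frac{1}{\sum_{a \in A} \frac{m_{a,j}}{m_a}} \sum_{a \in A} \frac{m_{a,j}}{m_a} \frac{k_{a,j}}{m_{a,j}},
\]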

where 𝐴 denotes the set of unique authors, 𝑚𝑎,𝑗 denotes the number of persons detected in
texts of author 𝑎 in subset 𝑆𝑗 , 𝑚𝑎 denotes the number of persons detected in texts of author
𝑎, and 𝑘𝑎,𝑗 denotes the number of female persons detected in articles by author 𝑎 in subset 𝑆𝑗 .
This is but the weighted mean of the naive per-author estimates for the subset, that is 𝑘𝑎,𝑗 /𝑚𝑎,𝑗 ,
weighted by the ’degree of membership’ of author 𝑎 in subset 𝑆𝑗 , that is by 𝑚𝑎,𝑗 /𝑚𝑎 . Note that
this estimator is such that multiplying all data from a certain author by a constant does not
change the estimate. In the special case of a single subset equal to the entirety of the data, the
estimator takes the form
\[
  \hat{p} = \frac{1}{|A|} \sum_{a \in A} \left( \frac{\sum_{i : a_i = a} k_i}{\sum_{i : a_i = a} m_i} \right),
\]
which is but the arithmetic average of the per-author relative frequencies.
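
The estimator can be illustrated with a minimal Python sketch. The toy data, the tuple layout `(author, k_i, m_i)` and all function names are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy data: (author, k_i = female persons, m_i = all persons) per article.
ARTICLES = [("a", 1, 4), ("a", 1, 6), ("b", 3, 5), ("c", 2, 2)]


def author_totals(articles):
    """m_a: number of persons detected across ALL articles of each author."""
    totals = defaultdict(int)
    for author, _k, m in articles:
        totals[author] += m
    return totals


def weighted_estimate(articles, subset=None):
    """Equation 5 with the weights w_i = 1 / m_{a_i} of Equation 4.

    `subset` is an iterable of article indices (the set S_j);
    None means a single subset containing all articles.
    """
    totals = author_totals(articles)
    indices = range(len(articles)) if subset is None else list(subset)
    num = sum(articles[i][1] / totals[articles[i][0]] for i in indices)
    den = sum(articles[i][2] / totals[articles[i][0]] for i in indices)
    return num / den


def per_author_average(articles):
    """Arithmetic mean of the per-author relative frequencies k_a / m_a."""
    k_tot, m_tot = defaultdict(int), defaultdict(int)
    for author, k, m in articles:
        k_tot[author] += k
        m_tot[author] += m
    return sum(k_tot[a] / m_tot[a] for a in m_tot) / len(m_tot)


if __name__ == "__main__":
    print(weighted_estimate(ARTICLES))          # all articles as one subset
    print(per_author_average(ARTICLES))         # agrees with the line above
    print(weighted_estimate(ARTICLES, [0, 2]))  # a smaller subset S_j
```

On the toy data the pooled frequency would be 7/17, whereas the weighted estimate gives every author the same total weight regardless of how many persons their articles mention.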

Confidence intervals. The variance of 𝑝𝑗̂ equals
\[
  V(\hat{p}_j) = \frac{1}{\left( \sum_{i \in S_j} w_i m_i \right)^2} \sum_{i \in S_j} w_i^2 V(k_i),
\]

where 𝑘𝑖 ∼ binomial(𝑚𝑖 , 𝑝𝑗 ) and hence 𝑉 (𝑘𝑖 ) = 𝑝𝑗 (1 − 𝑝𝑗 )𝑚𝑖 . Therefore the plug-in estimator
for the variance of 𝑝𝑗̂ is
\[
  V(\hat{p}_j) \approx \hat{p}_j (1 - \hat{p}_j) \frac{\sum_{i \in S_j} w_i^2 m_i}{\left( \sum_{i \in S_j} w_i m_i \right)^2}. \tag{6}
\]
This enables us to use a normal approximation to the distribution of 𝑝𝑗̂ to construct confidence
intervals.
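
A minimal Python sketch of the plug-in variance and the resulting interval follows. The function names, the toy inputs and the critical value 1.96 for a 95% interval are assumptions for illustration. Truncating the bounds to [0, 1] is also an assumption, suggested by the lower bounds of 0.000 reported in Tables 5 and 6.

```python
import math


def plugin_variance(p_hat, weights, persons):
    """Equation 6: p(1 - p) * sum(w_i^2 * m_i) / (sum(w_i * m_i))^2."""
    num = sum(w * w * m for w, m in zip(weights, persons))
    den = sum(w * m for w, m in zip(weights, persons)) ** 2
    return p_hat * (1.0 - p_hat) * num / den


def normal_interval(p_hat, variance, z=1.96, clip=True):
    """Normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    half = z * math.sqrt(variance)
    lo, hi = p_hat - half, p_hat + half
    if clip:  # assumption: bounds are truncated to [0, 1]
        lo, hi = max(lo, 0.0), min(hi, 1.0)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical toy inputs: persons per article and the weights of Equation 4.
    persons = [4, 6, 5, 2]
    weights = [0.1, 0.1, 0.2, 0.5]
    var = plugin_variance(0.6, weights, persons)
    print(var, normal_interval(0.6, var))
```

As a sanity check, for a single article with 𝑚 persons and weight 1/𝑚 the expression reduces to the familiar binomial variance 𝑝(1 − 𝑝)/𝑚.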


Hypothesis tests. To test null-hypotheses of the form 𝑝𝑗 = 𝑝𝑗 ′ , we construct a test using the
test statistic
\[
  \hat{p}_j - \hat{p}_{j'} \sim N\left( \hat{p}_j - \hat{p}_{j'},\; V(\hat{p}_j) + V(\hat{p}_{j'}) \right),
\]
where the variances are computed according to Equation 6.
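
One way to turn this statistic into a p-value is sketched below. It assumes a two-sided test in which, under the null hypothesis, the difference of the estimates is centred at zero; the function name is hypothetical. The inputs in the usage lines are the rounded estimates and standard deviations from Table 6.

```python
import math


def two_sided_p_value(p1, sd1, p2, sd2):
    """Two-sided p-value for H0: p1 = p2 under a normal approximation."""
    z = (p1 - p2) / math.sqrt(sd1 ** 2 + sd2 ** 2)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Rounded values from Table 6 (female authors first, male authors second).
    print(two_sided_p_value(0.234, 0.024, 0.186, 0.015))  # International
    print(two_sided_p_value(0.173, 0.019, 0.171, 0.009))  # Domestic
```

With these rounded inputs the sketch gives values that round to 0.09 and 0.92, in line with the p-values reported for the two departments in Table 6.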


Table 4
Estimated probabilities that an author, when mentioning a person, mentions a female person.
                Model             Article count     Estimate   St. dev.   95% interval
                All authors             30,099       0.265      0.006     [0.254,0.277]
                Male authors            19,444       0.141      0.006     [0.129,0.152]
                Female authors          10,655       0.464      0.011     [0.444,0.485]


Table 5
Estimated department-specific probability that an author mentions a female person when mentioning
a person.
               Model              Article count     Estimate   St. dev.   95% interval
               Opinion                     3293       0.241     0.011     [0.218,0.263]
               International               3577       0.203     0.013     [0.178,0.229]
               Domestic                    1934       0.172     0.010     [0.152,0.191]
               Career                       502       0.305     0.048     [0.211,0.399]
               Government                  1780       0.265     0.032     [0.202,0.328]
               Female Standard              312       0.759     0.035     [0.691,0.827]
               Health                       577       0.333     0.023     [0.289,0.378]
               Science                     1803       0.308     0.021     [0.267,0.348]
               Economy                     2858       0.185     0.016     [0.153,0.217]
               Sport                       1333       0.136     0.014     [0.108,0.163]
               Family                       139       0.514     0.081     [0.355,0.674]
               Panorama                    2062       0.256     0.023     [0.211,0.302]
               Realty                       685       0.174     0.030     [0.115,0.233]
               Law                          374       0.204     0.040     [0.124,0.283]
               Culture                     4400       0.313     0.012     [0.289,0.337]
               Future                       484       0.318     0.050     [0.220,0.417]
               Lifestyle                   1138       0.355     0.021     [0.313,0.396]
               Travel                       161       0.274     0.058     [0.159,0.388]
               Web                         1960       0.152     0.024     [0.104,0.200]
               Education                    390       0.402     0.056     [0.293,0.511]
               Automobile                   337       0.066     0.037     [0.000,0.138]


Table 6
Estimated department- and gender-specific probabilities that a journalist mentions a female person
when mentioning a person.
   Model                             Art. ct.   Estimate   St. dev.   95% interval    p-value, 𝐻0 ∶ 𝑝𝑚 = 𝑝𝑓
   Opinion, female authors             1030      0.579      0.025     [0.530,0.628]                    0.00
   Opinion, male authors               2263      0.093      0.009     [0.075,0.110]
   International, female authors       1279      0.234      0.024     [0.187,0.281]                    0.09
   International, male authors         2298      0.186      0.015     [0.157,0.216]
   Domestic, female authors             631      0.173      0.019     [0.135,0.211]                    0.92
   Domestic, male authors              1303      0.171      0.009     [0.154,0.188]
   Career, female authors                398     0.600      0.064     [0.475,0.724]                    0.00
   Career, male authors                  104     0.060      0.037     [0.000,0.133]
   Government, female authors           662      0.340      0.066     [0.210,0.470]                    0.11
   Government, male authors            1118      0.220      0.034     [0.154,0.287]
   Female Standard, female authors       289     0.780      0.035     [0.711,0.850]                    0.00
   Female Standard, male authors          23     0.362      0.048     [0.267,0.456]
   Health, female authors                526     0.329      0.025     [0.280,0.378]                    0.82
   Health, male authors                   51     0.341      0.045     [0.252,0.429]
   Science, female authors              673      0.443      0.031     [0.382,0.503]                    0.00
   Science, male authors               1130      0.134      0.022     [0.091,0.177]
   Economy, female authors             1332      0.258      0.026     [0.207,0.310]                    0.00
   Economy, male authors               1526      0.155      0.020     [0.117,0.194]
   Sport, female authors                 23      0.064
   Sport, male authors                 1310      0.141      0.011     [0.120,0.162]
   Family, female authors                134     0.543      0.091     [0.364,0.722]
   Family, male authors                    5     0.309
   Panorama, female authors             766      0.336      0.052     [0.234,0.437]                    0.03
   Panorama, male authors              1296      0.214      0.023     [0.168,0.260]
   Realty, female authors                271     0.135      0.029     [0.079,0.191]                    0.25
   Realty, male authors                  414     0.200      0.048     [0.106,0.294]
   Law, female authors                    76     0.630      0.083     [0.468,0.792]                    0.00
   Law, male authors                     298     0.036      0.023     [0.000,0.080]
   Culture, female authors             1591      0.432      0.019     [0.394,0.470]                    0.00
   Culture, male authors               2809      0.217      0.015     [0.188,0.245]
   Future, female authors                 93     0.423      0.068     [0.289,0.558]                    0.01
   Future, male authors                  391     0.169      0.064     [0.043,0.295]
   Lifestyle, female authors             537     0.455      0.033     [0.390,0.520]                    0.00
   Lifestyle, male authors               601     0.238      0.024     [0.191,0.286]
   Travel, female authors                 40     0.351      0.072     [0.210,0.491]                    0.19
   Travel, male authors                  121     0.207      0.084     [0.043,0.372]
   Web, female authors                   21      0.374      0.063     [0.251,0.497]                    0.00
   Web, male authors                   1939      0.111      0.024     [0.064,0.159]
   Education, female authors             268     0.360      0.060     [0.243,0.477]                    0.40
   Education, male authors               122     0.462      0.105     [0.256,0.669]
   Automobile, female authors             15     0.646      0.072     [0.505,0.787]                    0.00
   Automobile, male authors              322     0.060      0.035     [0.000,0.129]



</pre>