<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Orfeas Menis Mastromichalakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Filandrianos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Tsouparopoulou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitris Parsanoglou</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Symeonaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Stamou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence and Learning Systems Laboratory, National Technical University of Athens</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Social Policy, Panteion University of Social and Political Sciences</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Sociology, National and Kapodistrian University of Athens</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Gender bias in machine translation (MT) systems poses significant challenges that often result in the reinforcement of harmful stereotypes. Especially in the labour domain, where occupations are frequently and inaccurately associated with specific genders, such biases perpetuate traditional gender stereotypes with a significant impact on society. Addressing these issues is crucial for ensuring equitable and accurate MT systems. This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph. GOSt-MT integrates comprehensive gender statistics from real-world labour data and textual corpora used in MT training. This Knowledge Graph allows for a detailed analysis of gender bias across English, French, and Greek, facilitating the identification of persistent stereotypes and areas requiring intervention. By providing a structured framework for understanding how occupations are gendered in both labour markets and MT systems, GOSt-MT contributes to efforts aimed at making MT systems more equitable and reducing gender biases in automated translations.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Gender Bias</kwd>
        <kwd>Machine Translation</kwd>
        <kwd>Occupations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Gender bias in machine translation systems is a pervasive issue that compromises the accuracy and
fairness of automated translations. Such biases can reinforce harmful stereotypes and contribute to
gender inequality, particularly in the context of occupational terms. This problem is exacerbated when
MT systems, widely used in diverse applications, systematically associate certain professions with
specific genders. Consider the example in Figure 1, where “the doctor”, without any gender indication, is
translated into Greek as ‘ο γιατρός’ (the male doctor), while “the nurse” is consistently rendered as ‘η
νοσοκόμα’ (the female nurse). This illustrates how MT systems can reinforce gender stereotypes by
associating certain occupations predominantly with one gender [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Such biases are not only misleading
but also detrimental, as they perpetuate traditional gender roles and contribute to the gender disparities
observed in various professional sectors. Addressing and mitigating these biases is critical to ensure
that technology promotes gender equality rather than perpetuating discrimination.
      </p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Our motivation derives from two primary concerns: the persistent gender inequalities in the labour
market and the existence of gendered algorithmic bias [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ], as they are both highlighted in strategic
social policy documents such as the European Commission’s Gender Equality Strategy 2020-2025 (EU
Commission, 2020) 1. The Commission emphasises the necessity of challenging gender stereotypes,
which are fundamental drivers of gender inequality across all societal domains, and identifies gender
stereotypes as significant contributors to the gender pay and pension gaps. Moreover, the Strategy places
a specific focus on the impact of Artificial Intelligence, highlighting the need for further exploration of
its potential to amplify or contribute to gender biases. Gender bias in machine translation systems is a
significant element of this concern.
1https://commission.europa.eu/strategy-and-policy/policies/justice-and-fundamental-rights/gender-equality/gender-equality-strategy_en
      </p>
      <p>
        To identify bias and its source, it is essential to incorporate external knowledge that accurately
reflects the actual world, such as the distribution of occupations in actual labour markets and within
training datasets across different countries and languages [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This underscores the importance of
employing tools like Knowledge Graphs to refine and improve AI systems, ensuring they support
fairness and transparency in decision-making [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Semantic information and specifically Knowledge Graphs (KGs) have become increasingly prominent
as tools that enhance machine learning systems, particularly in areas like explainable AI (XAI) [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10,
11, 12, 13</xref>
        ], fairness, fact checking [14, 15, 16], and reasoning [17, 18, 19, 20]. They serve as foundational
elements by structuring vast datasets, which grounds large language models and other AI technologies
with a well-organized layer of knowledge [21]. This structured knowledge is essential for addressing
critical issues, such as gender bias, ensuring these systems operate within ethical guidelines [21, 22].
      </p>
      <p>Our research aims to investigate both horizontal and vertical occupational gender segregation, and
how these phenomena manifest in various types of gender bias in machine translation, in English,
French, and lower-resource languages like Greek. In this paper, we propose a novel approach to studying
occupation-related gender bias in MT systems through the creation of a Knowledge Graph (KG) on
Gender and Occupation Statistics for Machine Translation (GOSt-MT). Built upon the International
Standard Classification of Occupations (ISCO-08), a hierarchical framework endorsed by the
International Labour Organisation (ILO, 2012) that categorizes occupations into groups at different levels,
GOSt-MT incorporates real labour statistical data and statistical data from textual corpora to support
and facilitate the detection and study of stereotypical automatic translations. By integrating structured
occupational classifications and comprehensive gender statistics into a Knowledge Graph, we offer
a nuanced understanding of how occupations are “gendered” in both actual labour markets and MT
training datasets, offering insights into identifying and resisting gender biases in the world(s) of
employment with twofold utility: identifying recurring stereotypical representations that persist even
though reality, i.e. existing data, is fundamentally different; and mapping professional areas that still
require interventions to overcome gender imbalances. This work contributes to the broader effort of making
MT systems more equitable and reliable, promoting gender fairness and eliminating stereotypes in
various societal domains.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Recent research has increasingly focused on uncovering and mitigating gender biases in machine
translation systems. Notably, studies such as those by [23] have empirically demonstrated how commercial
translation systems often perpetuate gender stereotypes by assigning genders to professions based on
societal biases rather than linguistic accuracy. Similarly, [24] highlighted the tendency of translation
algorithms to prefer masculine pronouns even in contexts where gender is unspecified. In the era of Large
Language Models, the study [25] reveals that tools like ChatGPT2, Google Translate3, and Microsoft
Translator4 perpetuate gender defaults and stereotypes, particularly failing to translate the English
gender-neutral pronoun “they” into equivalent gender-neutral pronouns in other languages, resulting
in translations that are incoherent and incorrect, especially for low-resource languages. This conclusion
also holds for high-resource languages such as Italian, as the preliminary analysis in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] demonstrates
that ChatGPT’s performance across diferent scenarios reveals a strong male bias, particularly when
not explicitly prompted to consider gender alternatives.
2https://chatgpt.com/ 3https://translate.google.com 4https://translator.microsoft.com
      </p>
      <p>Furthermore, studies such as [23, 26, 27, 28] have highlighted how gender biases manifest in the
assignment of pronouns to professions in machine translation systems. Professions like doctors,
engineers, and presidents are frequently associated with male pronouns, while roles like dancers, nurses,
and teachers are typically linked to female pronouns. Moreover, language models have been shown to
override explicit gender information in translations; for example, a translation from English to Spanish
incorrectly changed the gender of a female doctor to male, as noted in [28].</p>
      <p>This leads to a systematic failure to include feminine and gender-neutral options, underscoring the
need for ongoing improvements in machine translation models to ensure they align with evolving
societal norms and support inclusive communication.</p>
      <p>Knowledge Graphs (KGs) have been increasingly utilized to promote responsible and fair AI
applications [29]. For instance, [30] provides a comprehensive survey on bias in AI and highlights the
role of KGs in detecting and correcting biases, demonstrating how integrating KGs with machine
learning models can enhance the transparency and accountability of AI applications. To the best of
our knowledge, no existing works have utilized statistics from knowledge graphs to identify biases in
MT systems. This research aims to fill this gap by providing a valuable resource to the community,
specifically for identifying occupational gender biases and tracing their origins in machine translation
systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In this section, we delve into the methods and techniques employed to create the GOSt-MT Knowledge
Graph. GOSt-MT integrates statistics from both real-world labour data and textual datasets, providing
a comprehensive resource for analyzing gender bias in machine translation. To achieve this, we utilized
multiple sources, including national and European statistical agencies and databases, to extract accurate
and up-to-date labour statistics. In addition, we developed a pipeline to extract gender statistics from
textual datasets, ensuring thorough analysis and integration of diverse data sources. The following
subsections detail our methodology and the specific resources and tools we employed throughout this
work, highlighting the steps taken to ensure the accuracy and reliability of the GOSt-MT Knowledge
Graph.</p>
      <sec id="sec-3-1">
        <title>3.1. Real World Statistics</title>
        <p>For the purpose of this study, we conducted secondary analyses using mainly data from EUROSTAT’s
labour market participation indicators, which are drawn from the European Union Labour Force Survey
(EU-LFS) (EUROSTAT 2022, 2024), the National Statistical Authorities of Greece (ELSTAT5) and the UK’s
NOMIS-Office for National Statistics (ONS) 6. The EU-LFS is a large-scale European sample survey that
provides quarterly and annual statistics on labour market participation and inactivity among individuals
aged 15 and older. It is the largest survey of its kind in Europe, offering extensive data that ensures
comparability across countries and over time due to its standardised definitions, classifications, and
variables. The survey follows guidelines set by the International Labour Organisation (ILO) and employs
standard classifications such as the International Standard Classification of Occupations (ISCO-08),
detailing occupations at the 4-digit level for the current main job and at the 3-digit level for the last
job. Publicly available statistics include employment data by detailed occupation (ISCO-08 2-digit
level), broken down by age and gender. More specifically, we employed secondary statistical analysis
on EUROSTAT’s and ELSTAT’s data to estimate the gendered distributions of occupations in Greece
(2011-2022 at ISCO-08 3-digit level), the UK (2013-2019) at ISCO-08 2-digit level and France
(2013-2022) at the same level. Analysis was also performed on NOMIS-ONS data for the UK 2020-2023 to
produce results for the gender distribution of occupations at SOC2020 4-digit level. SOC2020 stands
for the Standard Occupational Classification 2020, a system used in the UK to classify and categorise
occupations. SOC2020 is developed by ONS and is used for a variety of purposes, including statistical
analyses and labour market studies. SOC2020 is based upon the same classification principles as the
2008 version of its international equivalent ISCO-08.
5https://www.statistics.gr/en/home/
6Official Census and Labour Market Statistics, ONS, https://www.nomisweb.co.uk/datasets/aps218/reports/employment-by-occupation?compare=K02000001</p>
        <p>More specifically, the occupational distributions were estimated based on the availability of data in
each country, i.e. the gendered distribution at the 2-digit level was calculated for all three countries,
encompassing 43 respective occupations for males and females from 2011-2022 for Greece, 2013-2022
for France, and 2013-2019 for the UK. For Greece, gendered distributions at the 3-digit level were also
estimated based on secondary data analysis from the National Statistical Authority of Greece and the
Mechanism of Labour Market Diagnosis provided by the Hellenic Republic, Ministry of Labour and
Social Insurance. This analysis provides gendered distributions for 130 occupations at that level from
2011-2022. Additionally, for the UK, based on data from NOMIS-ONS, distributions for 412 occupations
are estimated for the years 2020-2023, encompassing the period following Brexit. Moreover, a secondary
analysis was conducted for specific 3-digit level occupations, such as doctors, which were not publicly
available from EUROSTAT for France. This examination utilised data from the OECD Data Explorer
archive7 and the World Health Organization’s European Health Information Gateway8.</p>
        <p>Calculating the respective percentages for the three countries from 2011 and onwards enabled
an examination of the evolution of these distributions over time, revealing the occupational gender
segregation trends over these periods in the specified countries. For instance, Figure 2 illustrates the
changes in gender distribution among medical practitioners (doctors) over the past decade. The statistics
depicted in this Figure reveal notable trends and differences across Greece, France, and the UK. In
Greece, male doctors ranged from 56.53% to 63.88%, with a significant decline to 56.53% in 2022, while
female doctors increased from 36.12% to 43.47%, indicating a shift towards gender balance. The UK
shows a more balanced distribution, with male doctors decreasing from 55.43% in 2011 to 50.56% in
2023, and female doctors rising from 44.57% to 49.44%. France also trends towards gender balance, with
male doctors decreasing from 60.61% in 2011 to 52.52% in 2021, and female doctors increasing from
39.39% to 47.48%. Comparatively, the UK has the most stable gender distribution, approaching parity by
2023, while France follows closely behind. Greece, although showing improvement, still has a more
pronounced gender disparity. This analysis highlights the dynamic nature of gender distribution in the
medical profession and the varying rates of progress across these countries.</p>
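        <p>As an illustration of this secondary analysis step, the following Python sketch shows how absolute employment counts are converted into the male and female shares discussed above. The counts are invented for illustration; the real figures come from the EU-LFS and national statistical tables.</p>
        <preformat>
# Turn absolute employment counts into gendered percentage shares.
# The numbers below are illustrative placeholders, not EU-LFS data.
import pandas as pd

counts = pd.DataFrame(
    {
        "year": [2011, 2022, 2011, 2022],
        "gender": ["male", "male", "female", "female"],
        "employed_thousands": [63.9, 56.5, 36.1, 43.5],
    }
)

# Percentage share of each gender within each year's total employment.
totals = counts.groupby("year")["employed_thousands"].transform("sum")
counts["share_pct"] = (counts["employed_thousands"] / totals * 100).round(2)
print(counts)
        </preformat>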
        <p>Further analysis on the gender distribution in occupations across Greece, France, and the UK reveals
consistent patterns of gender disparity. Technical and manual trades are predominantly male in all three
countries, indicating a significant gender imbalance in sectors requiring technical skills and manual
labour. Conversely, care-giving and administrative roles are predominantly female, reflecting societal
trends where women are more represented in these fields. However, the medical profession should not
be considered predominantly male, as evidenced by the nearly equal percentages in the UK (50.56% male
and 49.44% female) and the substantial female representation in Greece (57.53% male and 43.47% female
for 2023) according to the latest available data. Conversely, midwifery nurses can be categorically
considered female, as the respective percentage is equal to 100%.</p>
        <sec id="sec-3-1-1">
          <title>7https://data-explorer.oecd.org/ 8https://gateway.euro.who.int/en/</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Statistics</title>
        <p>Our methodology for extracting gender statistics for occupations from textual datasets involves a
comprehensive three-fold pipeline. As illustrated in Figure 3, this pipeline consists of three sequential
modules designed to detect, link, and analyze occupational terms and their associated genders within
textual data. The first module focuses on detecting occupations within a given text. Using a Large
Language Model (LLM), this module scans the text to identify and extract occupational terms accurately.
Once the occupations are detected, the second module comes into play, linking these terms to the
corresponding occupations in the GOSt-MT Knowledge Graph. This linking process ensures that each
detected occupation is mapped to a standardized occupational classification, facilitating consistent
analysis and comparison. The third module is dedicated to identifying the gender associated with each
detected occupation. This module determines the gender references within the context of the text,
enabling us to compile precise gender statistics for each occupation. Through this pipeline, we are able
to generate detailed statistics on the gender distribution of occupations within a textual dataset. These
statistics are then incorporated into the GOSt-MT, enriching the Knowledge Graph with valuable data
on gender representation.</p>
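        <p>The following Python sketch summarises how the three modules compose into a single statistics-gathering pass over a corpus. The three callables are placeholders for the modules detailed in the next subsections, not a released API.</p>
        <preformat>
# Hedged orchestration sketch of the three-module pipeline.
# extract_occupations, link_to_kg and identify_gender are placeholder
# callables standing in for the modules described in 3.2.1-3.2.3.
from collections import Counter
from typing import Callable, Iterable, Optional

def occupation_gender_stats(
    corpus: Iterable[str],
    extract_occupations: Callable[[str], list],
    link_to_kg: Callable[[dict], Optional[str]],
    identify_gender: Callable[[str, dict], str],
) -> Counter:
    """Aggregate (ISCO occupation, gender) mention counts over a corpus."""
    stats = Counter()
    for text in corpus:
        for mention in extract_occupations(text):    # module 1: LLM detection
            occupation = link_to_kg(mention)         # module 2: KG linking
            if occupation is None:                   # rejected as hallucination
                continue
            gender = identify_gender(text, mention)  # module 3: gender resolution
            if gender in ("male", "female"):         # "unclear" cases are skipped
                stats[(occupation, gender)] += 1
    return stats
        </preformat>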
        <p>3.2.1. Occupation Extraction</p>
        <p>For the Occupation Extraction module, we employed a Large Language Model (LLM) to detect
occupations in a given text. The LLM was instructed in a zero-shot prompting manner to identify occupations
from the text, along with their respective contextual references and corresponding descriptions. The
latter facilitated matching the identified occupations with the corresponding occupations in the GOSt-MT
KG (see 3.2.2 for further details) as well as mitigating LLM hallucinations. As an illustration, consider
the following example of an input sentence and the respective output of the Occupation Extraction
module (this example was created with the Llama-2-70b 9 [31] model as our deployed LLM):</p>
        <sec id="sec-3-2-1">
          <title>Example 1. Detecting occupations in text</title>
          <p>Input:
The doctor put the cast on my leg while talking to the nurses about his new car.</p>
          <p>Output:
Occupation title: Doctor
Appearing in text as: doctor
Description: A medical professional who diagnoses and treats illnesses and injuries.</p>
          <p>Occupation title: Nurse
Appearing in text as: nurses
Description: A healthcare professional who assists doctors and provides hands-on care to patients.</p>
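          <p>The exact prompt used with the deployed LLM is not reproduced here; the following Python sketch shows one plausible zero-shot template together with a parser for the three-field output format of Example 1. Both the prompt wording and the regular expression are assumptions.</p>
          <preformat>
# Hypothetical zero-shot prompt and parser for the Occupation Extraction
# output format shown in Example 1 (title / surface form / description).
import re

PROMPT = (
    "List every occupation mentioned in the text below. For each one, "
    "answer with three lines, labelled 'Occupation title:', "
    "'Appearing in text as:' and 'Description:' (one sentence).\n\n"
    "Text: {text}"
)

def parse_occupations(llm_output: str) -> list:
    """Parse the three-field blocks produced by the zero-shot prompt."""
    pattern = re.compile(
        r"Occupation title:\s*(.+?)\s*"
        r"Appearing in text as:\s*(.+?)\s*"
        r"Description:\s*(.+?)\s*(?=Occupation title:|\Z)",
        re.DOTALL,
    )
    return [
        {"title": t, "surface": s, "description": d}
        for t, s, d in pattern.findall(llm_output)
    ]
          </preformat>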
          <p>We experimented with multiple LLMs including variations of Llama 2 [31], Mistral, Mixtral [32],
Tower [33], and Meltemi [34]. The results across the models were very similar, due to the simplicity of
the task, particularly in cases where one or more occupations were referred to in the texts. For the final
results, we utilized Mixtral-8x7B-v0.1 10, which empirically has shown the best performance 11.
9https://huggingface.co/meta-llama/Llama-2-70b
10https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
11https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard</p>
          <p>While experimenting with the LLMs we identified two primary forms of hallucinations and addressed
them separately. The first form involves the LLM detecting occupations that are not present in the text.
To address this, we asked the LLM to provide the in-text form of the detected occupations along with
their titles and descriptions. We then used fuzzy string matching to verify that these detected terms
were indeed part of the input text. If a detected term did not match any words in the input text above
a certain threshold, it was disregarded as a hallucination. The second form of hallucination occurs
when the LLM incorrectly identifies non-occupational terms as occupations. This issue was particularly
prevalent with smaller models and in cases where no occupations were present in the input text. We
addressed this form of hallucination using the second module of our pipeline, which is described in
detail in the following subsection.</p>
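          <p>A minimal version of the first filter, the in-text verification step described above, could look as follows. We use Python's standard difflib here; the paper does not name the fuzzy-matching library, and the 0.85 threshold is an illustrative assumption.</p>
          <preformat>
# Verify that the surface form the LLM claims to have seen actually
# occurs in the input text; otherwise treat the detection as a
# hallucination. The threshold value is an assumption.
from difflib import SequenceMatcher

def appears_in_text(surface: str, text: str, threshold: float = 0.85) -> bool:
    """Fuzzy-match the claimed surface form against token windows of the text."""
    surface = surface.lower()
    tokens = text.lower().split()
    n = max(1, len(surface.split()))
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i : i + n])
        if SequenceMatcher(None, surface, window).ratio() >= threshold:
            return True
    return False

# appears_in_text("nurses", "The doctor ... talking to the nurses ...") -> True
          </preformat>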
          <p>3.2.2. Linking to GOSt-MT</p>
          <p>To ensure the occupations detected by the Large Language Model in the first stage align with the
GOSt-MT Knowledge Graph that is curated by domain specialists, we implemented a linking module. Since the
KG is based on ISCO-08, which includes not only an occupation taxonomy but also descriptions for each
occupation, we framed this task as a retrieval problem. The descriptions generated by the LLM for each
detected job title are used to retrieve the most closely matching occupation from the KG. To accomplish
this, we converted both the descriptions of each occupation in the KG and those generated by the LLM
into embeddings. Following the approach proposed by [35], we utilized angle-based embeddings to map
the descriptions into a latent space where they can be easily compared. We then used cosine similarity
as the distance metric to find the closest matching descriptions. In the following example, you can
see the occupations of the GOSt-MT KG that matched the detected occupations of the previous step
illustrated in Example 1.</p>
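          <p>A condensed sketch of this retrieval step is given below. It loads an AnglE-trained checkpoint through the sentence-transformers interface as a stand-in for the angle-optimized embeddings of [35]; the model name, the abbreviated ISCO descriptions, and the 0.5 threshold are illustrative assumptions.</p>
          <preformat>
# Link an LLM-generated description to the closest ISCO-08 description
# by cosine similarity in embedding space. Model name and threshold are
# placeholders, not the paper's exact configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("WhereIsAI/UAE-Large-V1")  # an AnglE checkpoint

ISCO_DESCRIPTIONS = {
    "221": "Medical doctors study, diagnose, treat and prevent illness ...",
    "222": "Nursing and midwifery professionals provide treatment and care ...",
}

def link_description(description: str, threshold: float = 0.5):
    """Return the best-matching ISCO code, or None if below the threshold."""
    codes = list(ISCO_DESCRIPTIONS)
    corpus_emb = model.encode(
        [ISCO_DESCRIPTIONS[c] for c in codes], normalize_embeddings=True
    )
    query_emb = model.encode(description, normalize_embeddings=True)
    scores = util.cos_sim(query_emb, corpus_emb)[0]
    best = int(scores.argmax())
    return codes[best] if float(scores[best]) >= threshold else None
          </preformat>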
          <p>By setting a similarity threshold, we can efectively filter out hallucinations where a detected term is
misidentified as an occupation. If the similarity between the LLM-provided description and any existing
occupation description in the KG falls below this threshold, the detected occupation is disregarded. This
retrieval and embedding-similarity approach helps us ensure that only valid occupations, as defined in
our curated KG, are considered, thereby addressing potential hallucinations from the initial detection
stage. By rigorously matching descriptions, we maintain the accuracy and reliability of the occupation
data integrated into the GOSt-MT Knowledge Graph.</p>
          <p>3.2.3. Gender Identification</p>
          <p>The final and most challenging part of our pipeline is the gender identification module. This module
aims to identify the gender of an occupation in the text or conclude that the gender cannot be determined
from the context. By doing this, we can calculate gender statistics for the occupations detected and
matched with GOSt-MT in the previous stages and ultimately incorporate these statistics into the
Knowledge Graph. We identified three distinct cases for deriving the gender of an occupation, which
we investigate stepwise.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Example 2. Linking the detected occupations to GOSt-MT</title>
          <p>Doctor → Medical Doctor (ISCO code: 221): Medical doctors (physicians) study, diagnose, treat and prevent
illness, disease, injury, and other physical and mental impairments in humans through the application of
the principles and procedures of modern medicine. They plan, supervise and evaluate the implementation
of care and treatment plans by other health care providers, and conduct medical education and research
activities.</p>
          <p>Nurse → Nursing and midwifery professional (ISCO code: 222): Nursing and midwifery professionals
provide treatment and care services for people who are physically or mentally ill, disabled or infirm, and
others in need of care due to potential risks to health including before, during and after childbirth. They
assume responsibility for the planning, management and evaluation of the care of patients, including the
supervision of other health care workers, working autonomously or in teams with medical doctors and
others in the practical application of preventive and curative measures.</p>
          <p>If one case determines the gender, we do not proceed to the next steps. The first case occurs when
the occupation word itself indicates gender. This is common in notional gender languages such as
English as well as grammatical gender languages such as Spanish, French, and Greek, where variations
in words often signify gender (e.g. waiter/waitress, or in Greek ‘νοσοκόμος’ for a male nurse and
‘νοσοκόμα’ for a female nurse). We use the SpaCy12 library to automatically detect if a word has a
gender indication. If the occupation word does not indicate gender, we proceed to the second case,
where gender is directly mentioned through pronouns. For example, in the sentence “He is a nurse”,
the pronoun “He” directly indicates the gender of the nurse. To identify such cases, we construct the
syntactic dependency tree using SpaCy and check for any direct links from a gendered pronoun to the
occupation. If neither the occupation word nor direct pronouns indicate gender, we move to the third
case: gender indication through coreference. Consider the text, “Today the doctor came to the hospital
45 minutes late. Consequently, his first appointment had already left.” Here, the gender of “doctor” is
inferred from the pronoun “his” in the second sentence. For this, we use the Coreferee13 library to find
all linguistic expressions (also called mentions) in the given text that refer to the same entity, here the
occupation of interest. We then check the gender of the words and pronouns linked to the occupation. If
we find a gender indication, we determine the occupation’s gender; if not, we conclude that the gender
cannot be determined from the text and exclude this detection from our statistics. Consider the example
below that follows Examples 1, and 2 and illustrates the output of the Gender Identification module for
the input of Example 1.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Example 3. Identifying the gender of the detected occupations</title>
          <p>Doctor → Male (Coreference)
Nurse → Not Clear
12https://spacy.io/api/morphology
13https://pypi.org/project/coreferee/</p>
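          <p>A condensed Python sketch of the first two cases is shown below; it assumes spaCy's small English model is installed, and omits the coreference case handled by the Coreferee library, whose setup is version-specific.</p>
          <preformat>
# Cases 1 and 2 of the stepwise gender identification: morphological
# gender on the occupation word itself, then gendered pronouns that are
# syntactically adjacent in the dependency tree. Case 3 (coreference)
# is omitted here.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model is installed
LABELS = {"Masc": "male", "Fem": "female"}

def occupation_gender(text: str, occupation: str) -> str:
    doc = nlp(text)
    for tok in doc:
        if tok.text.lower() != occupation.lower():
            continue
        # Case 1: the occupation word itself carries grammatical gender.
        gender = tok.morph.get("Gender")
        if gender:
            return LABELS.get(gender[0], "unclear")
        # Case 2: a gendered pronoun directly linked in the dependency tree.
        for n in list(tok.children) + [tok.head] + list(tok.head.children):
            g = n.morph.get("Gender")
            if n.pos_ == "PRON" and g:
                return LABELS.get(g[0], "unclear")
    return "unclear"

print(occupation_gender("He is a nurse.", "nurse"))  # -> male (case 2)
          </preformat>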
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The Knowledge Graph</title>
      <p>Based on the methodology described in Section 3, we collected real-world gender statistics on the
labour market as well as occupation-related gender statistics from textual datasets. In this work,
we have focused on employment data from the UK, Greece, and France, and we have extracted the
respective statistics for English, Greek, and French, from the WMT dataset 14 as well as a part of
the C4 dataset [36] 15. This extensive data collection enabled us to create the GOSt-MT Knowledge
Graph. By systematically integrating structured occupational classifications with comprehensive gender
statistics, we have constructed a detailed and accurate representation of gender distribution across
various occupations.
14https://huggingface.co/datasets/wmt/wmt14 15https://huggingface.co/datasets/allenai/c4</p>
      <p>The GOSt-MT Knowledge Graph serves as a resource for studying gender bias in machine translation
systems and providing valuable insights into gender representation within diferent professional sectors.
This comprehensive approach allows for a nuanced understanding of how occupations are “gendered”
in both the actual labour market and the textual data used to train MT systems.</p>
      <p>For example, by analyzing the WMT dataset, a widely used resource for training machine translation
systems, we discovered a consistent gender misalignment in the occupational category of lawyers.
Specifically, in over 85% of instances where a gender was assigned to a lawyer, it was male rather than
female. This bias could potentially be transferred to a machine translation model trained using this
dataset, perpetuating a stereotype with significant societal impact. Moreover, such biases are not only
detrimental to societal equality but also fail to accurately represent the distribution of this profession
in the real world. Specifically, real-world statistics from 2011 to 2022 show that there were consistently
more female lawyers each year, with the share of women ranging from 56% to 62%.</p>
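      <p>The misalignment check behind this example reduces to a comparison of gender shares from the two sources, as in the sketch below; the ten-point flagging margin is our own illustrative choice, not a value used in the paper.</p>
      <preformat>
# Flag an occupation whose corpus gender share diverges from the
# labour-market share by more than a chosen margin (illustrative).
def misaligned(corpus_male_pct: float, labour_male_pct: float,
               margin: float = 10.0) -> bool:
    return abs(corpus_male_pct - labour_male_pct) > margin

# Lawyers: over 85% of gendered WMT mentions are male, while real-world
# statistics imply roughly 38-44% male lawyers.
print(misaligned(85.0, 40.0))  # -> True
      </preformat>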
      <p>This analysis underscores that, beyond computer scientists and AI researchers, the GOSt-MT is also
of great interest to social researchers and scholars in fields such as Science and Technology Studies
(STS). It provides a robust tool for examining the intersection of technology and gender, offering a
valuable resource for those aiming to address and mitigate gender biases in both technology and society.</p>
      <sec id="sec-4-1">
        <title>4.1. Structure</title>
        <p>The structure of the GOSt-MT Knowledge Graph is presented in Figure 4. GOSt-MT is fundamentally
based on the International Standard Classification of Occupations (ISCO-08), which provides a
hierarchical taxonomy of occupations. This hierarchy organizes occupations in our KG into broader and
narrower categories linked through “subclassOf” relations. For example, “Professionals” (ISCO Code
2) includes “Health Professionals” (ISCO Code 22) as a subclass, which further branches into several
occupations including “Medical Doctors” (ISCO Code 221) and “Nursing and Midwifery Professionals”
(ISCO Code 222). Each occupation in the KG has a title, description, and ISCO code, all extracted from
the ISCO-08 standard.</p>
        <p>The GOSt-MT Knowledge Graph also integrates comprehensive statistical data about gender
representation in various occupations. This integration is achieved through “Statistics” entities, which link
occupations to gender statistics. Each “Statistics” entity includes two key attributes: malePercentage
and femalePercentage. These percentages indicate the proportion of male and female workers in a
given occupation, or the respective proportion of masculine and feminine mentions of occupations in
textual corpora.</p>
        <p>The “Statistics” entities are connected to either a “Dataset” entity or a “Survey” entity, depending on
the source of the data. Each “Dataset” entity includes a title and description, reflecting the
dataset’s content. If the statistics are derived from a survey, the “Survey” entity also includes a title,
description, and the year or time period of the survey.</p>
        <p>Furthermore, each “Statistics” entity is linked to a “Country” entity, providing contextual information
about the geographical origin of the data or the language of the textual corpora respectively. When the
statistics are linked to a dataset, the relationship is represented by the “hasLanguage” relation, indicating
the language of the analyzed texts. Conversely, if the statistics are from a survey, the “linkedToCountry”
relation specifies the country from which the survey data originated and to which they refer.</p>
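        <p>To make this structure concrete, the following rdflib sketch encodes a small fragment of Figure 4. The property spellings malePercentage, femalePercentage, and linkedToCountry follow the text; the namespace URI, the hasStatistics link name, and the node identifiers are our own placeholders.</p>
        <preformat>
# Hedged rdflib sketch of a GOSt-MT fragment: an ISCO-08 occupation,
# its place in the hierarchy, and one survey-derived Statistics node.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

GOST = Namespace("http://example.org/gost-mt#")  # placeholder namespace
g = Graph()

# ISCO-08 hierarchy: 22 Health Professionals, 221 Medical Doctors
g.add((GOST.MedicalDoctors, RDFS.subClassOf, GOST.HealthProfessionals))
g.add((GOST.MedicalDoctors, GOST.iscoCode, Literal("221")))

# A Statistics node with the two key attributes named in the text.
stats = GOST.statsDoctorsGreece2022  # placeholder identifier
g.add((stats, RDF.type, GOST.Statistics))
g.add((GOST.MedicalDoctors, GOST.hasStatistics, stats))  # link name assumed
g.add((stats, GOST.malePercentage, Literal(56.53)))
g.add((stats, GOST.femalePercentage, Literal(43.47)))
g.add((stats, GOST.linkedToCountry, GOST.Greece))  # survey-sourced statistics
        </preformat>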
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Future Work</title>
      <p>This study highlights the significant challenges posed by gender bias within machine translation (MT)
systems, particularly regarding the representation of occupational roles. The development of the
GOSt-MT Knowledge Graph represents a novel approach to integrating real-world labour statistics
with the textual corpora used in MT training. By combining statistics from multiple sources into a
single knowledge graph, we provide an opportunity to study and identify misalignments between the
occupational distributions across genders in the real world and those in the training sets of MT models.</p>
      <p>Future work will focus on expanding our methodology to include a broader array of datasets, thereby
enriching the statistical analysis available for commonly used training corpora in large language models.
Additionally, using the GOSt-MT pipeline to identify occupational titles and their genders will be crucial
for detecting discrepancies in gender representation of occupations between the input and output of
MT systems. Subsequently, GOSt-MT could be employed to identify the sources of these misalignments,
whether they arise from the datasets, inherent algorithmic biases, or a combination of both.</p>
    </sec>
    <sec id="sec-6">
      <title>Limitations</title>
      <p>This study is subject to certain limitations that must be acknowledged. First, the statistics integrated
into the GOSt-MT Knowledge Graph are derived exclusively from European and UK labour markets.
This regional focus may limit the generalizability of our findings to other geographic areas where
occupational roles and gender distributions may differ significantly.</p>
      <p>Additionally, the GOSt-MT pipeline itself may not be entirely free from biases, similar to those it
aims to identify. A particular point of concern is the coreference model, which relies on a language
model potentially vulnerable to the same gender biases we seek to identify. While these models have
been specifically trained to mitigate such biases—thereby making them less susceptible—it is crucial to
recognize that no model is completely immune to bias. This was a decisive factor in opting for these
specialized models over more general large language model (LLM) techniques, which may not have the
same focus on minimizing gender bias.</p>
      <p>Lastly, the GOSt-MT pipeline’s applicability to languages with limited available data represents
another limitation. For languages that lack substantial textual or labour market data, the effectiveness
of the GOSt-MT in detecting gender biases may be compromised. This underscores the need for future
research to adapt and refine the pipeline for broader linguistic coverage, ensuring that the benefits of
this research can be extended to a wider array of languages and cultural contexts.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly to check
grammar and spelling. After using these tools, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research work is co-funded by the European Union’s Horizon Europe Research and Innovation
programme under Grant Agreement No 101070631 and by UK Research and Innovation (UKRI)
under the UK government’s Horizon Europe funding guarantee (Grant No 10039436), FSTP Pilot
Project SURE-GB.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Mastromichalakis</surname>
          </string-name>
          , G. Filandrianos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Symeonaki</surname>
          </string-name>
          , G. Stamou,
          <article-title>Assumed identities: Quantifying gender bias in machine translation of gender-ambiguous occupational terms</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2503.04372. arXiv:2503.04372.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Vanmassenhove</surname>
          </string-name>
          ,
          <article-title>Gender bias in machine translation and the era of large language models</article-title>
          ,
          <source>Gendered Technology in Translation and Interpreting: Centering Rights in the Development of Language Technology</source>
          (
          <year>2024</year>
          )
          <fpage>225</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <article-title>Unveiling gender bias in terms of profession across llms: Analyzing and addressing sociological implications</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2307.09162. arXiv:2307.09162.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Kirk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Volpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Iqbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Benussi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dreyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shtedritski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Asano</surname>
          </string-name>
          ,
          <article-title>Bias out-of-thebox: An empirical analysis of intersectional occupational biases in popular generative language models</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>2611</fpage>
          -
          <lpage>2624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gorti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chadha</surname>
          </string-name>
          ,
          <article-title>Unboxing occupational bias: Grounded debiasing llms with us labor data</article-title>
          ,
          <source>arXiv preprint arXiv:2408.11247</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Menis-Mastromichalakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Filandrianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Symeonaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stamatopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parsanoglou</surname>
          </string-name>
          , G. Stamou,
          <article-title>Gender bias in machine learning: insights from official labour statistics and textual analysis</article-title>
          ,
          <source>Quality &amp; Quantity</source>
          (
          <year>2025</year>
          ). URL: https://link.springer.com/article/10.1007/s11135-025-02261-0. doi:10.1007/s11135-025-02261-0.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O. Menis</given-names>
            <surname>Mastromichalakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liartis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          , G. Stamou,
          <article-title>Don't erase, inform! detecting and contextualizing harmful language in cultural heritage collections</article-title>
          , arXiv e-prints (
          <year>2025</year>
          ) arXiv-
          <fpage>2505</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dervakos</surname>
          </string-name>
          , K. Thomas,
          <string-name>
            <given-names>G.</given-names>
            <surname>Filandrianos</surname>
          </string-name>
          , G. Stamou,
          <article-title>Choose your data wisely: A framework for semantic counterfactuals</article-title>
          , in: E. Elkind (Ed.),
          <source>Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>382</fpage>
          -
          <lpage>390</lpage>
          . URL: https://doi.org/10.24963/ijcai.2023/43. doi:10.24963/ijcai.2023/43, main track.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimitriou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lymperaiou</surname>
          </string-name>
          , G. Filandrianos, K. Thomas, G. Stamou,
          <article-title>Structure your data: Towards semantic graph counterfactuals</article-title>
          , in: R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp (Eds.),
          <source>Proceedings of the 41st International Conference on Machine Learning</source>
          , volume
          <volume>235</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>10897</fpage>
          -
          <lpage>10926</lpage>
          . URL: https://proceedings.mlr.press/v235/dimitriou24a.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liartis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dervakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Menis-Mastromichalakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chortaras</surname>
          </string-name>
          , G. Stamou,
          <article-title>Semantic queries explaining opaque machine learning classifiers</article-title>
          ,
          <source>in: DAO-XAI</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] O. M. Mastromichalakis, E. Dervakos, A. Chortaras, G. Stamou, Rule-based explanations of machine learning classifiers using knowledge graphs, in: Proceedings of the AAAI Symposium Series, volume 3, 2024, pp. 193–202.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Liartis, E. Dervakos, O. Menis-Mastromichalakis, A. Chortaras, G. Stamou, Searching for explanations of black-box classifiers in the space of semantic queries, Semantic Web (2023) 1–42.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] O. Menis-Mastromichalakis, G. Filandrianos, J. Liartis, E. Dervakos, G. Stamou, Semantic prototypes: Enhancing transparency without black boxes, arXiv preprint arXiv:2407.15871 (2024).</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] J. Kim, S. Park, Y. Kwon, Y. Jo, J. Thorne, E. Choi, FactKG: Fact verification via reasoning on knowledge graphs, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 16190–16206. URL: https://aclanthology.org/2023.acl-long.895. doi:10.18653/v1/2023.acl-long.895.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Z. Yuan, A. Vlachos, Zero-shot fact-checking with semantic triples and knowledge graphs, 2023. URL: https://arxiv.org/abs/2312.11785. arXiv:2312.11785.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] L. Luo, T. Vu, D. Phung, R. Haf, Systematic assessment of factual knowledge in large language models, in: H. Bouamor, J. Pino, K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore, 2023, pp. 13272–13286. URL: https://aclanthology.org/2023.findings-emnlp.885. doi:10.18653/v1/2023.findings-emnlp.885.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] M.-V. Nguyen, L. Luo, F. Shiri, D. Phung, Y.-F. Li, T.-T. Vu, G. Haffari, Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs, 2024. URL: https://arxiv.org/abs/2402.11199. arXiv:2402.11199.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] L. Luo, Y.-F. Li, G. Haffari, S. Pan, Reasoning on graphs: Faithful and interpretable large language model reasoning, in: International Conference on Learning Representations, 2024.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] P. Giadikiaroglou, M. Lymperaiou, G. Filandrianos, G. Stamou, Puzzle solving using reasoning of large language models: A survey, 2024. URL: https://arxiv.org/abs/2402.11291. arXiv:2402.11291.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, X. Wu, Unifying large language models and knowledge graphs: A roadmap, IEEE Transactions on Knowledge and Data Engineering 36 (2024) 3580–3599. doi:10.1109/TKDE.2024.3352100.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] H. Khorashadizadeh, F. Z. Amara, M. Ezzabady, F. Ieng, S. Tiwari, N. Mihindukulasooriya, J. Groppe, S. Sahri, F. Benamara, S. Groppe, Research trends for the interplay between large language models and knowledge graphs, 2024. URL: https://arxiv.org/abs/2406.08223. arXiv:2406.08223.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] E. Derner, S. S. de la Fuente, Y. Gutiérrez, P. Moreda, N. Oliver, Leveraging large language models to measure gender bias in gendered languages, 2024. URL: https://arxiv.org/abs/2406.13677. arXiv:2406.13677.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] M. O. Prates, P. H. Avelar, L. C. Lamb, Assessing gender bias in machine translation: a case study with Google Translate, Neural Computing and Applications 32 (2020) 6363–6381.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, K.-W. Chang, Men also like shopping: Reducing gender bias amplification using corpus-level constraints, arXiv preprint arXiv:1707.09457 (2017).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] S. Ghosh, A. Caliskan, ChatGPT perpetuates gender bias in machine translation and ignores non-gendered pronouns: Findings across Bengali and five other low-resource languages, in: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 2023, pp. 901–912.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] T. N. Fitria, Gender bias in translation using Google Translate: Problems and solution, Language Circle: Journal of Language and Literature 15 (2021).</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] C. Ciora, N. Iren, M. Alikhani, Examining covert gender bias: A case study in Turkish and English machine translation models, arXiv preprint arXiv:2108.10379 (2021).</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] G. Stanovsky, N. A. Smith, L. Zettlemoyer, Evaluating gender bias in machine translation, in: A. Korhonen, D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 1679–1684. URL: https://aclanthology.org/P19-1164. doi:10.18653/v1/P19-1164.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] J. Pan, S. Razniewski, J.-C. Kalo, S. Singhania, J. Chen, S. Dietze, H. Jabeen, J. Omeliyanenko, W. Zhang, M. Lissandrini, R. Biswas, G. de Melo, A. Bonifati, E. Vakaj, M. Dragoni, D. Graux, Large language models and knowledge graphs: Opportunities and challenges, Transactions on Graph Data and Knowledge (2023). URL: https://hal.science/hal-04370111. doi:10.48550/arXiv.2308.06374.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, ACM Computing Surveys (CSUR) 54 (2021) 1–35.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom, Llama 2: Open foundation and fine-tuned chat models, 2023. URL: https://arxiv.org/abs/2307.09288. arXiv:2307.09288.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M.-A. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mixtral of experts, 2024. URL: https://arxiv.org/abs/2401.04088. arXiv:2401.04088.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] D. M. Alves, J. Pombal, N. M. Guerreiro, P. H. Martins, J. Alves, A. Farajian, B. Peters, R. Rei, P. Fernandes, S. Agrawal, P. Colombo, J. G. C. de Souza, A. F. T. Martins, Tower: An open multilingual large language model for translation-related tasks, 2024. URL: https://arxiv.org/abs/2402.17733. arXiv:2402.17733.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] L. Voukoutis, D. Roussis, G. Paraskevopoulos, S. Sofianopoulos, P. Prokopidis, V. Papavasileiou, A. Katsamanis, S. Piperidis, V. Katsouros, Meltemi: The first open large language model for Greek, 2024. URL: https://arxiv.org/abs/2407.20743. arXiv:2407.20743.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] X. Li, J. Li, Angle-optimized text embeddings, arXiv preprint arXiv:2309.12871 (2023).</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] J. Dodge, M. Sap, A. Marasović, W. Agnew, G. Ilharco, D. Groeneveld, M. Mitchell, M. Gardner, Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus, 2021. URL: https://arxiv.org/abs/2104.08758. arXiv:2104.08758.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>