=Paper= {{Paper |id=Vol-3808/paper17 |storemode=property |title=AI-fairness and equality of opportunity: a case study on educational achievement |pdfUrl=https://ceur-ws.org/Vol-3808/paper17.pdf |volume=Vol-3808 |authors=Angel S. Marrero,Gustavo A. Marrero,Carlos Bethencourt,Liam James,Roberta Calegari |dblpUrl=https://dblp.org/rec/conf/aequitas/MarreroMBJC24 }} ==AI-fairness and equality of opportunity: a case study on educational achievement== https://ceur-ws.org/Vol-3808/paper17.pdf
                         AI-fairness and equality of opportunity: a case study on
                         educational achievement
                         Ángel S. Marrero¹, Gustavo A. Marrero¹, Carlos Bethencourt¹, Liam James² and
                         Roberta Calegari²
                         ¹ Department of Economics and Research Center of Social Inequality and Governance, University of La Laguna, Spain
                         ² Department of Computer Science and Engineering, Alma Mater Studiorum, Università di Bologna, Italy


                                        Abstract
                                        This study focuses on predicting students’ academic performance, examining how AI predictive models often
                                        reflect socioeconomic inequalities driven by factors such as parental socioeconomic status and home environment,
                                        which affect the fairness of predictions. We compare three AI models in an ablation study
                                        to understand how these sensitive features (referred to as circumstances) influence predictions. Our findings
                                        reveal biases in predictions that favor advantaged groups, depending on whether the goal is to identify excellence
                                        or underperformance. Additionally, a two-stage estimation procedure is proposed in the third model to mitigate
                                        the impact of sensitive features on predictions, thereby offering a model that can be considered fair with respect
                                        to inequality of opportunity.

                                        Keywords
                                        AI-fairness, socioeconomic equality of opportunity, AI-ethics




                         1. Introduction
                         Academic performance in primary school is a good predictor of an individual’s future income and
                         well-being [1]. Anticipating low academic performance levels is relevant to implementing corrective
                         policies at early ages, and anticipating high academic performance is also relevant to applying incentive
                         mechanisms to achieve excellence [2]. To achieve this, it is necessary to have good predictive models,
                         in terms of accuracy. However, it is equally important for these models to generate fair predictions [3].
                         This means they should provide consistent predictions across different groups, ensuring no disparity in
                         awarding excellence prizes to students regardless of their social background or their parents’ educational
                         levels. It is therefore essential to have models that ensure fairness in their predictions.
                            In this study, which aims to predict 6th grade educational performance from 3rd grade
                         performance data, we seek to link the outcomes of predictive models and their degree of bias (as
                         traditionally measured within the AI fairness community) to the socio-economic concept of inequality of
                         opportunity [4] reflected in the data and in real life. Inequality of opportunity in educational achievement
                         is measured as the inequality explained by factors beyond the individual’s control,
                         such as the socioeconomic status of the parents, the cultural environment at home, immigrant
                         status, the state of health at birth, the neighborhood of birth, etc. (in the terminology of the AI fairness
                         literature, these are the sensitive variables). In particular, we analyze how sensitive
                         variables (referred to as circumstances in economics) that explain a relevant share of
                         the inequality in educational achievement lead to unfair predictions whenever
                         they influence the predictive models directly or indirectly (i.e., through other predictors). Thus, for a
                         model to generate fair predictions, its predictors must be free of the influence of the variables
                         that generate inequality of opportunity in society.

                          AEQUITAS 2024: Workshop on Fairness and Bias in AI | co-located with ECAI 2024, Santiago de Compostela, Spain
                          amarrerl@ull.edu.es (Á. S. Marrero); gmarrero@ull.edu.es (G. A. Marrero); cbethenc@ull.edu.es (C. Bethencourt);
                          liam.james2@unibo.it (L. James); roberta.calegari@unibo.it (R. Calegari)
                          ORCID: 0000-0003-0093-9571 (Á. S. Marrero); 0000-0003-4030-0078 (G. A. Marrero); 0000-0002-5605-9576 (C. Bethencourt);
                          0009-0001-7809-7514 (L. James); 0000-0003-3794-2942 (R. Calegari)
                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                          CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

                            To this end, in this paper, we provide a comparison of three AI models that allow us to conduct an
ablation study to understand how circumstances can influence predictions in predictive models and how
this type of analysis leads to strategies for detecting unfairness in predictions. A two-stage estimation
procedure is proposed. In the first stage, the predictors are cleaned of the influence of circumstances.
In the second stage, the target variable is predicted (or adjusted, depending on the objective) using a
model that includes predictors not dependent on the sensitive variables. We also observe that a model
incorporating sensitive variables that explain the existing inequality of opportunities generates biases in
the predictions, favoring one class or another depending on the part of the distribution being predicted.
Thus, if the aim is to predict the upper tail of the distribution (to detect excellence), the worst predictions
occur for those classes most disadvantaged by the most relevant circumstances (i.e., students whose
parents have lower levels of education or who live in homes with a poorer cultural environment). The
opposite occurs when the objective is to predict the lower tail of the distribution (i.e., students with
worse educational performance).
   Accordingly, the paper is organized as follows. Section 2 describes the dataset and the method. Section 3
reports the results and discusses them. Finally, Section 4 concludes this preliminary work.


2. Data and Method
2.1. Database
In this paper, we use the data on primary and secondary education provided by the Canary Agency
for University Quality and Educational Evaluation (ACCUEE) from academic years 2015/16 to 2018/19.
ACCUEE is a public organization founded with the purpose of continuously improving education at
both the university level and other educational stages within the Canary Islands. Among its activities,
ACCUEE is responsible for the evaluation and accreditation of programs implemented in various
educational centers. It collects data on students’ academic performance and relevant variables that
determine their environment, considering census characteristics and demographic context, offering a
more reliable approach than other national or even international statistics.
   One of the main advantages of this database is that the surveys conducted by ACCUEE on the student
population are longitudinal in nature, allowing for the evaluation of the progress and development of
the individuals under study. This longitudinal data collection helps to control for factors intrinsic to
temporal circumstances while enhancing data quality and enabling estimates and comparisons of the
individual over time.
   The database provided by ACCUEE comprises 83,857 observations. Each row refers to a single student
at a given grade and academic year. Primary education data for the academic year 2015/16 and 2018/19
is gathered through a comprehensive census of the entire population. For other grades and academic
years, the data is collected through sampling. Longitudinal data is also included: students in 3rd grade
(primary school) during the 2015/2016 academic year are sampled again in their 6th grade, and this is
the information that we use in the application of this paper. The database contains information on 561
variables (columns) for each student, representing data collected from various contextual questionnaires
and performance tests across different subjects. Specifically, these columns are categorized into seven
thematic blocks:

    • Block 1: ID Variables. This block consists of variables that identify the individual through different
      approaches (organizational, educational center, academic year studied, survey ID, etc.).
    • Block 2: Informative Variables. These include codes that identify whether the surveyed individuals
      responded to the different blocks of questions.
    • Block 3: Grades Obtained. This block comprises the grades obtained by students in the reference
      subjects, using a continuous or categorical classification.
    • Block 4: Student Questionnaire. This block includes questions aimed at understanding the level
      of agreement (categorical) or the situation (coding/continuous) of the surveyed student.
    Table 1
    Set of circumstances (or sensitive features)
                   Circumstance                                      Block
                   𝐶1 (gender)                                       Student Circumstances
                   𝐶2 (mother education)                             Family Circumstances
                   𝐶3 (father education)                             Family Circumstances
                   𝐶4 (mother occupation)                            Family Circumstances
                   𝐶5 (father occupation)                            Family Circumstances
                   𝐶6 (immigrant status)                             Family Circumstances
                   𝐶7 (start schooling age)                          Family Circumstances
                   𝐶8 (number of books in household)                 Family Circumstances
                   𝐶9 (adults reading books)                         Family Circumstances
                   𝐶10 (public/private school)                       School Circumstances
                   𝐶11 (capital/non capital island)                  School Circumstances
                   𝐶12 (% teachers who change school each year)      School Circumstances


    • Block 5: Principal Questionnaire. This block includes questions aimed at understanding the level
      of agreement (categorical) or the situation (coding/continuous) of the principal of the student’s
      school. Coherence is assumed for the same educational centers.
    • Block 6: Family Questionnaire. This block includes questions aimed at understanding the level
      of agreement (categorical) or the situation (coding/continuous) of the family of the surveyed
      student. Coherence is assumed for the same family units.
    • Block 7: Teacher Questionnaire. This block includes questions aimed at understanding the level of
      agreement (categorical) or the situation (coding/continuous) of the tutor of the surveyed student.
      Coherence is assumed for the same tutors.

  The dataset consists of 47,043,777 data points (83,857 rows multiplied by 561 columns).
However, 21,225,226 of these are classified as missing values, constituting 45.12% of the database. Because
the database covers the Canary Islands only, statistical issues such as sample representativeness
or selection bias are reduced. However, there are still potential biases in the database. The most relevant ones are:

    • Family Questionnaire Bias. In each grade and academic year, there is a high percentage of
      students (around 20 - 40%) without information on their family situation (Block 6). There may
      be unexamined correlations between not responding to the family questionnaire and the target
      variable (academic performance).
    • Missing Values Bias. As mentioned, there is a large amount of missing data, which could indicate
      correlations between the student’s situation (unobservable) and the target variable.
    • Sample Bias. Since data collection takes place in the Canary Islands, transferring the obtained
      results to other regions might not account for characteristics unique to this specific territory.
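
The cell count and the missing-value share reported above can be reproduced from the figures given in the text. The following is a minimal arithmetic check; the variable names are illustrative.

```python
# Figures quoted in the text for the ACCUEE database.
rows = 83_857          # observations (one student at a given grade and year)
columns = 561          # variables per student
missing = 21_225_226   # cells classified as missing values

cells = rows * columns
missing_pct = 100 * missing / cells

print(f"{cells:,} cells, {missing_pct:.2f}% missing")
```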

   We have a target variable $Y_t$, which is the performance of students in the 6th grade of primary
education. $Y_{t-1}$, the achievement observed for students (in the same school) in previous years (in 3rd
grade in our application), is the most widely used predictor in the related literature. We have a set of
sensitive factors (or circumstances), denoted by $C_1, \dots, C_n$ (see Table 1). These factors are all beyond a
student’s control; hence any inequality resulting from them must be considered unfair and
should therefore be corrected or compensated.
   Our protected and unprotected groups are defined according to the circumstances considered. Each
circumstance is split into two categories, which form the groups. We consider the left category as the
unprotected group, while the right category forms the protected groups. These groups can be seen in
Table 2.
    Table 2
    Protected and unprotected groups
           Circumstance                             Unprotected group          Protected group
           Gender                                   Male                       Female
           Mother education                         Tertiary education         Rest of educational levels
           Father education                         Tertiary education         Rest of educational levels
           Mother occupation                        High occupation            Rest of occupational levels
           Father occupation                        High occupation            Rest of occupational levels
           Immigrant status                         No                         Yes
           Start schooling age                      Less than 3 years old      More than 3 years old
           Number of books in household             More than 1000 books       Less than 1000 books
           Public/private school                    Public                     Private
           Capital/non capital island               Yes (Capital)              No
           See adults reading books                 Daily                      Not daily
           Teachers who change school each year     Almost no change           Change
           Socioeconomic status                     Above the median           Below the median


2.2. Inequality of Opportunity
To estimate the inequality of opportunity in educational achievement (IOEA) in 3rd grade, we follow
the ex-ante approach proposed by [5] and recently used by [6, 7], and estimate the following reduced-form
equation:

\[
Y_{i,t-1} = \alpha + \sum_{k=1}^{K} \beta_k C_{i,k,t-1} + v_{i,t-1} \tag{1}
\]

From a practical point of view, we want to measure the IOEA at the time policy makers would take
policy actions; that is, at the end of the 3rd grade, in order to correct potential unfairness in the 6th
grade.
   We estimate equation 1 by ordinary least squares (OLS) and recover the fitted part; our measure of
IOEA is the ratio of the variance of $Y_{t-1}$ explained by the set of circumstances (i.e., the variance
of the fitted part) to the total variance. In a linear model, this ratio is exactly the $R^2$ of
the regression. [5] explains that, with standardized achievement measures such as those used
in this paper, standard inequality indices such as the Gini or the mean log deviation (MLD) should be
avoided, and that the variance is a convenient alternative.
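
As a minimal sketch of this measure, the snippet below fits equation (1) by OLS on synthetic data and computes the IOEA as the variance of the fitted part over the total variance, then checks that it matches the usual R². The data-generating process, the two binary circumstances, and the function name `ioea` are illustrative assumptions; this is neither the ACCUEE data nor the authors' code.

```python
import numpy as np

def ioea(y_prev, circumstances):
    """Share of the variance of y_prev explained by circumstances (OLS R^2)."""
    X = np.column_stack([np.ones(len(y_prev)), circumstances])  # add intercept
    beta, *_ = np.linalg.lstsq(X, y_prev, rcond=None)           # OLS estimates
    fitted = X @ beta
    return fitted.var() / y_prev.var(), fitted

# Synthetic example (assumption): two binary circumstances plus noise.
rng = np.random.default_rng(0)
n = 5_000
C = rng.integers(0, 2, size=(n, 2)).astype(float)
y_prev = 0.8 * C[:, 0] + 0.3 * C[:, 1] + rng.normal(size=n)

share, fitted = ioea(y_prev, C)

# Textbook R^2 = 1 - SSR / SST, computed independently for comparison.
r2 = 1.0 - ((y_prev - fitted) ** 2).sum() / ((y_prev - y_prev.mean()) ** 2).sum()
print(f"IOEA = {share:.3f}, R^2 = {r2:.3f}")
```

The two numbers coincide because the regression includes an intercept, so the fitted values and the outcome share the same mean.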
   Next, to estimate the relative importance of each circumstance for educational achievement,
we use a multivariate regression-based decomposition approach [8, 9], which adapts the decomposition
of [10] to determine the contribution of each factor (or group of factors) to explaining educational
achievement. The relative factor IOEA weight for any circumstance $C_k$ is given by:

\[
S_{C_k} = \frac{\mathrm{cov}[\hat{\beta}_k C_k, \hat{Y}]}{\hat{\sigma}^2_{\hat{Y}}}
        = \hat{\beta}_k \, \frac{\hat{\sigma}_{C_k}}{\hat{\sigma}_{\hat{Y}}} \, \mathrm{cor}[C_k, \hat{Y}] \tag{2}
\]

  where $\hat{\beta}_k$ is the estimated OLS coefficient from (1) associated with circumstance $C_k$,
$\hat{\sigma}_{C_k}$ is the standard deviation of $C_k$, and $\hat{\sigma}^2_{\hat{Y}}$ is the
variance of the fitted achievement in the 3rd grade (i.e., the fitted target variable in (1)).
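
The decomposition can be sketched as follows on synthetic data. The snippet computes each share as the covariance between one circumstance's fitted component and the full fitted value, divided by the variance of the fitted value, and compares it with the equivalent form based on the correlation between each circumstance and the fitted value. The data, coefficients, and the function name `factor_shares` are assumptions made for illustration only.

```python
import numpy as np

def factor_shares(y_prev, C):
    """Relative weight of each column of C in the variance of the fitted part."""
    X = np.column_stack([np.ones(len(y_prev)), C])      # intercept + circumstances
    beta, *_ = np.linalg.lstsq(X, y_prev, rcond=None)   # OLS estimates
    fitted = X @ beta
    shares = []
    for k in range(C.shape[1]):
        component = beta[k + 1] * C[:, k]                # beta_k * C_k
        cov = np.mean((component - component.mean()) * (fitted - fitted.mean()))
        shares.append(cov / fitted.var())
    return np.array(shares), beta, fitted

# Synthetic example (assumption): two binary circumstances plus noise.
rng = np.random.default_rng(0)
n = 5_000
C = rng.integers(0, 2, size=(n, 2)).astype(float)
y_prev = 0.8 * C[:, 0] + 0.3 * C[:, 1] + rng.normal(size=n)

shares, beta, fitted = factor_shares(y_prev, C)

# Equivalent form: beta_k * sd(C_k) / sd(fitted) * cor(C_k, fitted).
alt = np.array([
    beta[k + 1] * C[:, k].std() / fitted.std() * np.corrcoef(C[:, k], fitted)[0, 1]
    for k in range(C.shape[1])
])
print(shares.round(3), shares.sum().round(3))
```

By construction the shares add up to one, which is why a table of relative factor shares totals 100%.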

2.3. Predictive Models
In this section, we define the models used in the use case, all of which are regression models.
We start with a simple linear framework to better understand the relevance
of the different variables (features) included in the model. This analysis can be easily extended to more
sophisticated predictive models, such as conditional inference trees, random forests, or neural
networks. Our baseline (linear) predictive model is the following (Model 1):

\[
Y_{i,t} = \alpha + \beta_1 Y_{i,t-1} + \epsilon_{i,t} \tag{3}
\]

   where $Y_{i,t}$ is the academic performance in mathematics at time $t$ (6th grade of primary education),
$Y_{i,t-1}$ is the academic performance at time $t-1$ (3rd grade of primary education), $\beta_1$ is a parameter to
be estimated and $\epsilon_{i,t}$ is an error term.
   To understand the origin of potential AI unfairness, we also use the following variants. Model
2 extends Model 1 by including the set of circumstances (protected features) measured at $t-1$: in addition
to the student’s academic performance at $t-1$, we also include a set of $k$ circumstances.

\[
Y_{i,t} = \alpha + \beta_1 Y_{i,t-1} + \sum_{j=1}^{k} \beta_{2j} C_{i,j,t-1} + \epsilon_{i,t} \tag{4}
\]

   Notice that the interpretation of $\beta_1$ in (4) differs from its interpretation in Model 1, since $Y_{i,t-1}$
correlates with $C_{i,j,t-1}$. Now $\beta_1$ captures the impact of educational achievement in 3rd grade on
achievement in 6th grade while controlling for the potential (unfair) differences generated by our set of
circumstances in 3rd grade, which extend to 6th grade and probably to the future. The estimated $\beta_{2j}$
represent the impact of 3rd-grade circumstances on achievement in 6th grade that is not channeled through
achievement in 3rd grade. If the entire impact of circumstances is channeled through its effect on achievement
in 3rd grade, the estimated $\beta_{2j}$ coefficients should be close to zero.
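
The last point can be illustrated with a small simulation. In the sketch below, the circumstances affect 6th-grade achievement either only through 3rd-grade achievement (`direct_effect = 0.0`) or also directly (`direct_effect = 0.4`), and the coefficients on the circumstances in Model 2 react accordingly. The simulated data, the parameter values, and the helper names `ols` and `simulate` are assumptions, not part of the study.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, with an intercept prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def simulate(direct_effect, n=5_000, seed=0):
    """Synthetic cohort; direct_effect is the extra impact of C1 on Y_t."""
    rng = np.random.default_rng(seed)
    C = rng.integers(0, 2, size=(n, 2)).astype(float)
    y_prev = 0.8 * C[:, 0] + 0.3 * C[:, 1] + rng.normal(size=n)
    y_t = 0.5 + 0.7 * y_prev + direct_effect * C[:, 0] + rng.normal(scale=0.5, size=n)
    return C, y_prev, y_t

for direct in (0.0, 0.4):
    C, y_prev, y_t = simulate(direct)
    b_model1 = ols(y_prev, y_t)                          # [alpha, beta_1]
    b_model2 = ols(np.column_stack([y_prev, C]), y_t)    # [alpha, beta_1, beta_21, beta_22]
    print(direct, b_model1.round(2), b_model2.round(2))
```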
   Model 3 decomposes the predictor $Y_{i,t-1}$ into the part explained by circumstances and the part not
explained by circumstances (a residual term). In the IO literature (see [11]), this residual term is associated
with the part of achievement not related to (observed) circumstances and instead related to
effort and non-observed circumstances. The two parts might correlate differently with the target
variable. For instance, [11] and [12] show that the circumstance part is negatively related to subsequent
economic growth, while the effort component is positively related.
   To estimate Model 3, we start from the estimates of model (1). We then decompose $Y_{i,t-1}$ into its fitted
part, $\hat{Y}_{i,t-1}$ (the part explained by circumstances and associated with the IO estimated in (1)), and the
residual part, $\hat{v}_{i,t-1}$, which captures other factors not included in the model and uncorrelated with the
considered circumstances. To simplify notation, we call this residual term “Effort”. We then estimate
Model 3 as follows:

\[
Y_{i,t} = \alpha + \beta_4 \hat{Y}_{i,t-1} + \beta_5 \hat{v}_{i,t-1} + \epsilon_{i,t} \tag{5}
\]

   In this way, we distinguish the predictions of $Y_{i,t}$ due exclusively to circumstances
($\alpha + \beta_4 \hat{Y}_{i,t-1}$) from those due exclusively to effort
($\alpha + \beta_4 \bar{\hat{Y}}_{t-1} + \beta_5 \hat{v}_{i,t-1}$), where $\bar{\hat{Y}}_{t-1}$ is the average
value of the predictions $\hat{Y}_{i,t-1}$, included so that both predictions have the same average level.
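
A minimal sketch of the two-stage procedure on synthetic data is given below. Stage 1 splits 3rd-grade achievement into a fitted part and a residual ("Effort"); stage 2 regresses 6th-grade achievement on both parts and builds the two predictions described above. The simulated data, the coefficient values, and the helper name `ols` are assumptions for illustration; the study itself uses the ACCUEE data.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns the coefficients and the fitted values."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, X1 @ beta

# Synthetic cohort (assumption): two binary circumstances and latent effort.
rng = np.random.default_rng(0)
n = 5_000
C = rng.integers(0, 2, size=(n, 2)).astype(float)
effort = rng.normal(size=n)                      # unobserved in real data
y_prev = 0.8 * C[:, 0] + 0.3 * C[:, 1] + effort
y_t = 0.5 + 0.7 * y_prev + 0.2 * C[:, 0] + rng.normal(scale=0.5, size=n)

# Stage 1: part of 3rd-grade achievement explained by circumstances, and residual.
_, y_prev_hat = ols(C, y_prev)
v_hat = y_prev - y_prev_hat

# Stage 2: regress 6th-grade achievement on both parts.
(alpha, beta4, beta5), _ = ols(np.column_stack([y_prev_hat, v_hat]), y_t)

# Predictions driven only by circumstances, and only by effort.
pred_circumstances = alpha + beta4 * y_prev_hat
pred_effort = alpha + beta4 * y_prev_hat.mean() + beta5 * v_hat

print(np.corrcoef(pred_effort, C[:, 0])[0, 1].round(6))
```

Because OLS residuals are orthogonal to the regressors in the sample, the effort-based prediction is uncorrelated with every circumstance included in stage 1, which is the property the fairness comparison relies on.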

2.4. AI fairness metrics
We evaluate the models in terms of fairness using the equalized odds metric, which we assess through
recall. The metric is satisfied when the model’s predictions ensure that students from both protected and
unprotected groups (e.g., females and males) have equal recall (strictly speaking, equalized odds also
requires equal false positive rates across groups). Recall is defined as the ratio of
True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN).
   For each model, we estimate the student’s academic performance prediction ($\hat{Y}_t$). Since this prediction
is continuous, we discretize it into quartiles to create our predicted classes. Likewise, we categorize the
actual academic performance variable into quartiles to establish our true classes.
   Next, to evaluate the models’ predictive fairness for each group, we construct confusion matrices for
different quartiles of academic performance. For instance, the confusion matrix for the first quartile of
academic performance (below 25th percentile) is shown in Table 3:
   We specifically calculate the equalized odds metric for the low tail of the academic performance
distribution (Q1, below the 25th percentile), the center of the distribution (between 25th and 75th
percentiles), and the high tail of the distribution (above 75th percentile).
    Table 3
    Confusion matrix for the first quartile of academic performance (below the 25th percentile)
                                                                True class
                                                      Positive (Below 25)     Negative (Above 25)
         Predicted class     Positive (Below 25)      TP                      FP
                             Negative (Above 25)      FN                      TN
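
The construction of this confusion matrix can be sketched in a few lines. The snippet assumes that actual and predicted scores are each cut at their own first quartile, which is one reading of the discretization described above; the toy scores and the function names are hypothetical.

```python
from statistics import quantiles

def lower_tail_confusion(y_true, y_pred):
    """Confusion counts for the class "below the 25th percentile".

    Actual and predicted scores are each cut at their own first quartile.
    """
    q1_true = quantiles(y_true, n=4)[0]
    q1_pred = quantiles(y_pred, n=4)[0]
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual_score, predicted_score in zip(y_true, y_pred):
        actual = actual_score < q1_true
        predicted = predicted_score < q1_pred
        if predicted:
            counts["TP" if actual else "FP"] += 1
        else:
            counts["FN" if actual else "TN"] += 1
    return counts

def recall(counts):
    """Recall = TP / (TP + FN)."""
    return counts["TP"] / (counts["TP"] + counts["FN"])

y_true = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical 6th-grade scores
y_pred = [2, 5, 1, 4, 3, 6, 8, 7]   # hypothetical model predictions
counts = lower_tail_confusion(y_true, y_pred)
print(counts, recall(counts))
```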


    Table 4
    Relative factor shares associated with each circumstance (sensitive factor) in IOEA
                                    Circumstance                  Relative factor share
                       Father education                                         27.53%
                       Books in household                                       27.40%
                       Mother education                                         16.05%
                       Socioeconomic status                                     13.19%
                       Start schooling age                                       6.57%
                       Teachers who change school every year                     2.71%
                       Public/private school                                     2.39%
                       Father occupation                                         1.64%
                       See adults reading books                                  1.35%
                       Gender                                                    0.55%
                       Mother occupation                                         0.45%
                       Capital island                                            0.14%
                       Immigrant status                                          0.03%
                       Total                                                     100%


   For each sensitive feature, we calculate the equalized odds metric, whose values can be associated
with fair or unfair model predictions. Fair: the odds are close to 1, so the model predicts all groups
within each circumstance equally well. Unfair: the odds are far from 1. Odds below 1 mean
that the predictions benefit protected groups (lower categories); odds above 1 mean
that they benefit unprotected groups (upper categories).
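
The text does not spell out the formula behind these odds. One reading consistent with the interpretation above is the recall of the unprotected group divided by the recall of the protected group; the sketch below implements that assumption on toy data. The function names and the example labels are hypothetical.

```python
def recall(actual, predicted):
    """Recall = TP / (TP + FN) for boolean class labels."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    return tp / (tp + fn)

def recall_ratio(actual, predicted, unprotected):
    """Recall of the unprotected group over recall of the protected group.

    Assumed direction: values above 1 mean the model detects the positive
    class better among unprotected students, values below 1 the opposite.
    """
    def group(flag):
        pairs = [(a, p) for a, p, u in zip(actual, predicted, unprotected) if u == flag]
        return [a for a, _ in pairs], [p for _, p in pairs]

    return recall(*group(True)) / recall(*group(False))

T, F = True, False
actual      = [T, T, T, T, F, F,  T, T, T, T, F, F]   # e.g. truly above the 75th percentile
predicted   = [T, T, T, F, F, T,  T, T, F, F, F, F]   # model output for the same class
unprotected = [T] * 6 + [F] * 6                        # first six students are unprotected

ratio = recall_ratio(actual, predicted, unprotected)
print(ratio)
```

In this toy case the unprotected group has recall 0.75 and the protected group 0.5, so the ratio is above 1 and the predictions favour the unprotected group.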
   We will show that the degree of AI unfairness depends on the model used (the type of variables
included in the model) and the part of the distributions we look at (upper, middle, or lower).


3. Results
3.1. Inequality of opportunity estimates
We estimate equation 1 through OLS. Table 4 shows the relative factor shares: father education, the
number of books at home (as proxy of cultural environment), followed by mother’s education, general
socioeconomic status of the household and the start schooling age, are the most relevant circumstances
explaining achievement variability in 3rd grade.
   In AI terminology, a circumstance must always be considered a sensitive feature since, in
a fair society, circumstances should not be correlated with a student’s achievement. How does the relevance
of each circumstance correlate with the AI-(un)fairness measure associated with each sensitive factor?

3.2. Discussion
We show below the equalized-odds measure for each circumstance (sensitive feature) obtained for each
of the three models used (and four predictions generated): model 1, which includes only achievement in
3rd grade; model 2, which assumes model 1 extended with circumstances; model 3, in which we show
the predictions of the part only using circumstances (which would be the inequality of opportunity
model) and the part using the effort component (which would be the effort model). We do this for each
circumstance and for the prediction of three parts of the distribution.

    • Looking at performance below the 25th percentile, which could be relevant if the policy is aimed
      at giving reinforcement in 3rd grade to the most disadvantaged students to reduce educational
      failure in 6th grade.
    • Looking at performance in the upper tail (with performance above the 75th percentile), which
      could be relevant if the objective is to reward excellence with a scholarship program, for example.
    • Looking at intermediate performance, to detect a group of students representative of a class or
      school to select case and control groups for an education policy experiment.

The results are shown in Figures 1-3 for the different predicted percentiles. The detailed results for each
sensitive variable (circumstance) are presented in Tables 5-7.
   All detailed results can be found in the appendix. We summarize the main results below, which are
also recapped in Table 5.
   The model using the predictions of our effort proxy is the only one that achieves AI fairness regardless
of which part of the distribution we want to predict. For all sensitive variables, the odds ratio is very
close to one, indicating that the model (whether good or bad) predicts one group almost as well as
another.
   On the other hand, for the remaining predictions, which come from models that include circumstances,
the results differ depending on the group we consider and the part of
the performance distribution we are predicting.
   Thus, for example, when we seek to predict the upper part of the distribution (i.e., to predict whether
the student will be excellent in 6th grade), the baseline model and the rest of the models that include
information from the sensitive variables predict worse for the most disadvantaged category (i.e., lower
socioeconomic status, with lower educational level of the parents, with fewer books in the home, etc.)
than for the favored categories. Using these predictions for decision making (e.g., to award prizes for
excellence) would generate a clear injustice in favor of those who have a more advantageous starting
point, with the consequent increase (most likely) of unequal opportunities in the future. The model
generates more unfair predictions the greater the weight of circumstances in the model (in our case the
model that only uses circumstances as a predictor).
   Moreover, the greatest differences in the predictions are seen among the circumstances that turn out
to be most relevant in explaining the inequality of opportunities in the initial year. Thus, the model that
generates the greatest injustice is when we compare the predictions between students of high and low
educated parents, or when we compare students in homes with high and low cultural environments.
   On the contrary, if the aim is to predict the lower tail of the distribution (for example, to offer classes
that reinforce learning), the prediction model that includes circumstances also generates injustices (in the
sense that it predicts better for one class than for another) but, in this case, the favored group (the one
with better predictions) is the most disadvantaged. That is, it predicts school failure better for children
from lower classes or worse circumstances than for children from better circumstances. Applying the
predictions of this model would probably help to reduce inequality of opportunity in the future, but
through a bad policy: it reduces inequality of opportunity by improving the situation of the most
disadvantaged and worsening that of the most advantaged.
   Ideally, across the entire distribution, predictive models should be fair in the sense that they do not
discriminate between groups with different circumstances, and this will eventually equalize
the outcomes of individuals.


4. Conclusion
In this study, we have explored the complexities of predicting students’ academic performance using AI
models, with a particular focus on addressing socioeconomic inequalities that influence predictive
outcomes. Our analysis revealed significant biases in predictions favoring advantaged groups, particularly
when sensitive variables related to socioeconomic status and home environment are not appropriately
managed in predictive modeling.

    Table 5
    Summary of results
    Model                       Below 25th percentile           Between 25th and 75th           Above 75th percentile
                                (low tail of the                percentiles (center of the      (high tail of the
                                distribution)                   distribution)                   distribution)
    Model 1 (baseline)          Unfairly favours                ?                               Unfairly favours
                                protected groups                                                unprotected groups
    Model 2                     More biased, favours            ?                               More biased, favours
                                protected groups                                                unprotected groups
    Model 3 - Prediction of     Much more biased, favours       ?                               Much more biased, favours
    circumstances               the protected groups                                            the unprotected groups
    Model 3 - Prediction of     Fair, odds ratio close to 1     Fair, odds ratio close to 1     Fair, odds ratio close to 1
    effort                      for all circumstances           for all circumstances           for all circumstances

Figure 1: Predictions below the 25th percentile. Odds of protected groups

   Through a comparative evaluation of three AI models and an ablation study, we demonstrated how
these sensitive features, also known as circumstances, can distort predictions, leading to unfair outcomes
in educational assessments. Importantly, we proposed a two-stage estimation procedure in our third
model to mitigate these biases.
   The findings underscore the critical importance of integrating fairness considerations into predictive
modeling practices, particularly in educational settings where equitable outcomes are essential. Future
research should further refine and validate these methodologies across diverse datasets and educational
contexts to foster more inclusive and equitable predictive models.


Acknowledgments
This paper was partially supported by the “AEQUITAS” project funded by the European Union’s Horizon
Europe research and innovation programme under grant number 101070363.

Figure 2: Predictions between the 25th and 75th percentiles. Odds of protected groups

Figure 3: Predictions above the 75th percentile. Odds of protected groups


References
 [1] E. Diener, M. L. Diener, C. Diener, Factors predicting the subjective well-being of nations, Journal
     of Personality and Social Psychology 69(5) (1995) 851–864.
     URL: https://api.semanticscholar.org/CorpusID:20833520.
 [2] H. F. Ladd, Holding schools accountable: Performance-based reform in education, 1996. URL:
     https://api.semanticscholar.org/CorpusID:21305932.
 [3] R. Yu, Q. Li, C. Fischer, S. Doroudi, D. Xu, Towards accurate and fair prediction of college
     success: Evaluating different sources of student data, in: Educational Data Mining, 2020. URL:
     https://api.semanticscholar.org/CorpusID:220486717.
 [4] W. H. Sewell, Inequality of opportunity for higher education, American Sociological Review 36
     (1971) 793. URL: https://api.semanticscholar.org/CorpusID:34820586.
 [5] F. H. G. Ferreira, J. Gignoux, The measurement of educational inequality: Achievement and
     opportunity, 2014. URL: https://api.semanticscholar.org/CorpusID:260636749.
 [6] G. A. Marrero, J. G. Rodríguez, Unfair inequality and growth, The Scandinavian Journal of
     Economics (2023). URL: https://api.semanticscholar.org/CorpusID:258023458.
 [7] G. A. Marrero, J. C. Palomino, G. Sicilia, Inequality of opportunity in educational achievement in
     Western Europe: contributors and channels, The Journal of Economic Inequality 22 (2023) 383–410.
     URL: https://api.semanticscholar.org/CorpusID:265183210.
 [8] M. Brewer, L. Wren-Lewis, Accounting for changes in income inequality: Decomposition analyses
     for the UK, 1978–2008, ERN: Other Econometrics: Mathematical Methods & Programming (Topic)
     (2016). URL: https://api.semanticscholar.org/CorpusID:19612310.
 [9] G. S. Fields, Accounting for income inequality and its change: A new method, with application
     to the distribution of earnings in the United States, 2012.
     URL: https://api.semanticscholar.org/CorpusID:202257605.
[10] A. F. Shorrocks, Inequality decompositions by factor components, Econometrica 50 (1982) 193–211.
     URL: https://api.semanticscholar.org/CorpusID:7478703.
[11] G. A. Marrero, Inequality of opportunity and growth, 2013.
     URL: https://api.semanticscholar.org/CorpusID:67827959.
[12] G. A. Marrero, J. G. Rodríguez, Inequality of opportunity in Europe, Microeconomics: Welfare
     Economics & Collective Decision-Making eJournal (2012).
     URL: https://api.semanticscholar.org/CorpusID:154345034.



A. Complete Results
In the tables of this appendix, the number or letter in brackets after each column name identifies the
model that produced the result:

    • 1: refers to model 1 (Equation 3)
    • 2: refers to model 2 (Equation 4)
    • C: refers to the circumstance-based model (see the discussion of Model 3 in Section 2.3)
    • E: refers to the effort-based model (see the discussion of Model 3 in Section 2.3)
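
The Recall and Eq. Odds columns of the appendix tables can be read along the following lines. This is a minimal Python sketch, assuming that Recall is the true-positive rate within each category and that Eq. Odds is a category's recall divided by the recall of the first-listed (reference) category. The tabulated values are consistent with this reading (for example, 0.47 / 0.50 = 0.94 for Gender under model 1 in Table 6). The function names and the toy data are illustrative and do not come from the study.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Recall (true-positive rate) of the positive class within each group."""
    hits = defaultdict(int)       # correctly predicted positives per group
    positives = defaultdict(int)  # actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                hits[group] += 1
    return {g: hits[g] / positives[g] for g in positives}


def odds_vs_reference(recalls, reference):
    """Ratio of each group's recall to the recall of the reference group."""
    return {g: r / recalls[reference] for g, r in recalls.items()}


# Toy data, invented for illustration only (1 = student falls in the target band).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0, 0, 1]
groups = ["ref", "ref", "ref", "ref", "prot", "prot", "prot", "prot", "ref", "prot"]

recalls = recall_by_group(y_true, y_pred, groups)
odds = odds_vs_reference(recalls, reference="ref")
print(recalls)  # recall per group
print(odds)     # the reference group is 1.0 by construction
```

Under this reading, a value above 1 means the model identifies members of the non-reference category more often than members of the reference category, and a value close to 1 indicates similar treatment of both.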

    Table 6
    Predictions below the 25th percentile

Circumstance                        Category                     Recall (1)  Eq. Odds (1)  Recall (2)  Eq. Odds (2)  Recall (C)  Eq. Odds (C)  Recall (E)  Eq. Odds (E)
Gender                              Males                              0.50          1.00        0.49          1.00        0.31          1.00        0.49          1.00
                                    Females                            0.47          0.94        0.50          1.02        0.40          1.31        0.43          0.89
Mother education                    Tertiary education                 0.40          1.00        0.32          1.00        0.03          1.00        0.51          1.00
                                    Rest of educational levels         0.53          1.33        0.59          1.84        0.53         18.44        0.43           0.8
Father education                    Tertiary education                 0.39          1.00        0.24          1.00        0.02          1.00        0.55          1.00
                                    Rest of educational levels         0.51          1.32        0.57          2.40        0.46         19.63        0.43          0.78
Mother occupation                   High occupation                    0.43          1.00        0.45          1.00        0.22          1.00        0.48          1.00
                                    Rest of occupational levels        0.50          1.15        0.51          1.14        0.39          1.77        0.45          0.95
Father occupation                   High occupation                    0.44          1.00        0.43          1.00        0.25          1.00        0.46          1.00
                                    Rest of occupational levels        0.50          1.13        0.52          1.19        0.39          1.58        0.46          0.99
Books in household                  More than 100                      0.38          1.00        0.29          1.00        0.02          1.00        0.54          1.00
                                    Less than 100                      0.51          1.32        0.53          1.83        0.42         17.60        0.44           0.8
Teachers changing school each year  Almost no change                   0.43          1.00        0.43          1.00        0.16          1.00        0.50          1.00
                                    Change                             0.51          1.18        0.52          1.21        0.44          2.67        0.44           0.9
Immigration status                  No                                 0.48          1.00        0.50          1.00        0.35          1.00        0.46          1.00
                                    Yes                                0.52          1.09        0.46          0.93        0.38          1.08        0.50          1.09
Public/private school               Private                            0.40          1.00        0.35          1.00        0.09          1.00        0.50          1.00
                                    Public                             0.51          1.26        0.53          1.51        0.43          4.94        0.45           0.9
Capital island                      Yes                                0.47          1.00        0.49          1.00        0.32          1.00        0.45          1.00
                                    No                                 0.57          1.20        0.55          1.12        0.58          1.81        0.49          1.08
Start schooling age                 Less than 3 years old              0.47          1.00        0.44          1.00        0.26          1.00        0.48          1.00
                                    More than 3 years old              0.50          1.06        0.57          1.29        0.51          2.00        0.43          0.88
See adults reading books            Daily                              0.48          1.00        0.47          1.00        0.29          1.00        0.48          1.00
                                    More rarely (not daily)            0.49          1.02        0.52          1.11        0.40          1.40        0.45          0.94
Socioeconomic status                Above the median                   0.41          1.00        0.34          1.00        0.07          1.00        0.51          1.00
                                    Below the median                   0.54          1.32        0.60          1.79        0.55          8.44        0.43          0.84
Average                                                                              1.18                      1.41                      6.36                      0.92
Weighted average                                                                     1.29                      1.89                     14.68                      0.83

    Table 7
    Predictions between the 25th and 75th percentiles

Circumstance                        Category                     Recall (1)  Eq. Odds (1)  Recall (2)  Eq. Odds (2)  Recall (C)  Eq. Odds (C)  Recall (E)  Eq. Odds (E)
Gender                              Males                              0.56          1.00        0.58          1.00        0.51          1.00        0.55          1.00
                                    Females                            0.54          0.96        0.55          0.95        0.51          1.00        0.52          0.95
Mother education                    Tertiary education                 0.58          1.00        0.62          1.00        0.53          1.00        0.52          1.00
                                    Rest of educational levels         0.53          0.92        0.52          0.83        0.49          0.92        0.55          1.06
Father education                    Tertiary education                 0.56          1.00        0.61          1.00        0.41          1.00        0.50          1.00
                                    Rest of educational levels         0.55          0.97        0.54          0.89        0.55          1.32        0.55          1.12
Mother occupation                   High occupation                    0.56          1.00        0.56          1.00        0.42          1.00        0.53          1.00
                                    Rest of occupational levels        0.55          0.99        0.56          1.01        0.53          1.27        0.54          1.02
Father occupation                   High occupation                    0.56          1.00        0.60          1.00        0.44          1.00        0.53          1.00
                                    Rest of occupational levels        0.55          0.99        0.55          0.92        0.53          1.21        0.54          1.02
Books in household                  More than 100                      0.59          1.00        0.62          1.00        0.41          1.00        0.53          1.00
                                    Less than 100                      0.54          0.92        0.55          0.88        0.53          1.32        0.54          1.02
Teachers changing school each year  Almost no change                   0.58          1.00        0.60          1.00        0.50          1.00        0.53          1.00
                                    Change                             0.54          0.93        0.54          0.91        0.51          1.03        0.54          1.03
Immigration status                  No                                 0.55          1.00        0.56          1.00        0.51          1.00        0.54          1.00
                                    Yes                                0.55          0.99        0.62          1.11        0.54          1.07        0.53          0.99
Public/private school               Private                            0.59          1.00        0.62          1.00        0.46          1.00        0.55          1.00
                                    Public                             0.54          0.91        0.55          0.89        0.52          1.12        0.53          0.97
Capital island                      Yes                                0.56          1.00        0.56          1.00        0.51          1.00        0.53          1.00
                                    No                                 0.52          0.94        0.58          1.04        0.48          0.93        0.55          1.04
Start schooling age                 Less than 3 years old              0.56          1.00        0.59          1.00        0.52          1.00        0.52          1.00
                                    More than 3 years old              0.55          0.98        0.52          0.88        0.48          0.91        0.57          1.09
See adults reading books            Daily                              0.56          1.00        0.59          1.00        0.51          1.00        0.53          1.00
                                    More rarely (not daily)            0.55          0.98        0.54          0.91        0.50          0.98        0.54          1.03
Socioeconomic status                Above the median                   0.57          1.00        0.61          1.00        0.54          1.00        0.52          1.00
                                    Below the median                   0.54          0.94        0.52          0.85        0.48          0.90        0.55          1.07
Average                                                                              0.96                      0.93                      1.08                      1.03
Weighted average                                                                     0.94                      0.87                      1.15                      1.06

    Table 8
    Predictions above the 75th percentile

Circumstance                        Category                     Recall (1)  Eq. Odds (1)  Recall (2)  Eq. Odds (2)  Recall (C)  Eq. Odds (C)  Recall (E)  Eq. Odds (E)
Gender                              Males                              0.53          1.00        0.53          1.00        0.42          1.00        0.47          1.00
                                    Females                            0.49          0.93        0.48          0.91        0.31          0.75        0.48          1.03
Mother education                    Tertiary education                 0.58          1.00        0.63          1.00        0.61          1.00        0.47          1.00
                                    Rest of educational levels         0.42          0.73        0.35          0.55        0.05          0.08        0.49          1.04
Father education                    Tertiary education                 0.58          1.00        0.68          1.00        0.67          1.00        0.46          1.00
                                    Rest of educational levels         0.45          0.77        0.36          0.53        0.10          0.15        0.49          1.08
Mother occupation                   High occupation                    0.58          1.00        0.62          1.00        0.64          1.00        0.49          1.00
                                    Rest of occupational levels        0.48          0.83        0.46          0.74        0.25          0.39        0.47          0.97
Father occupation                   High occupation                    0.57          1.00        0.63          1.00        0.57          1.00        0.47          1.00
                                    Rest of occupational levels        0.48          0.84        0.44          0.70        0.26          0.45        0.48          1.04
Books in household                  More than 100                      0.59          1.00        0.67          1.00        0.75          1.00        0.47          1.00
                                    Less than 100                      0.47          0.80        0.43          0.65        0.19          0.25        0.48          1.04
Teachers changing school each year  Almost no change                   0.56          1.00        0.58          1.00        0.59          1.00        0.47          1.00
                                    Change                             0.48          0.85        0.46          0.80        0.23          0.38        0.48          1.02
Immigration status                  No                                 0.51          1.00        0.51          1.00        0.38          1.00        0.48          1.00
                                    Yes                                0.48          0.93        0.53          1.04        0.26          0.68        0.46          0.96
Public/private school               Private                            0.57          1.00        0.61          1.00        0.64          1.00        0.45          1.00
                                    Public                             0.49          0.85        0.47          0.77        0.25          0.39        0.49          1.08
Capital island                      Yes                                0.52          1.00        0.51          1.00        0.40          1.00        0.48          1.00
                                    No                                 0.45          0.86        0.48          0.94        0.16          0.40        0.48          1.01
Start schooling age                 Less than 3 years old              0.53          1.00        0.55          1.00        0.43          1.00        0.47          1.00
                                    More than 3 years old              0.46          0.86        0.40          0.72        0.22          0.52        0.49          1.04
See adults reading books            Daily                              0.54          1.00        0.56          1.00        0.47          1.00        0.48          1.00
                                    More rarely (not daily)            0.49          0.91        0.47          0.84        0.28          0.60        0.48          1.00
Socioeconomic status                Above the median                   0.56          1.00        0.61          1.00        0.57          1.00        0.46          1.00
                                    Below the median                   0.44          0.79        0.35          0.58        0.07          0.12        0.50          1.10
Average                                                                              0.84                      0.75                      0.40                      1.03
Weighted average                                                                     0.79                      0.61                      0.22                      1.06
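
The Average row appears to be the unweighted mean of the odds values of the 13 non-reference categories, one per circumstance. The Python check below is a sketch under that assumption, using the Table 8 values copied in row order; the dictionary name is illustrative. The weights behind the Weighted Average row are not stated in the tables, so that row is not reproduced here.

```python
from statistics import mean

# Odds of the non-reference category of each circumstance, copied from Table 8
# in row order (Gender, Mother education, ..., Socioeconomic status).
table8_odds = {
    "1": [0.93, 0.73, 0.77, 0.83, 0.84, 0.80, 0.85, 0.93, 0.85, 0.86, 0.86, 0.91, 0.79],
    "2": [0.91, 0.55, 0.53, 0.74, 0.70, 0.65, 0.80, 1.04, 0.77, 0.94, 0.72, 0.84, 0.58],
    "C": [0.75, 0.08, 0.15, 0.39, 0.45, 0.25, 0.38, 0.68, 0.39, 0.40, 0.52, 0.60, 0.12],
    "E": [1.03, 1.04, 1.08, 0.97, 1.04, 1.04, 1.02, 0.96, 1.08, 1.01, 1.04, 1.00, 1.10],
}

# Unweighted mean per model, rounded to the two decimals used in the table.
averages = {model: round(mean(values), 2) for model, values in table8_odds.items()}
print(averages)  # {'1': 0.84, '2': 0.75, 'C': 0.4, 'E': 1.03}
```

The four means match the Average row of Table 8 (0.84, 0.75, 0.40 and 1.03), which supports reading that row as a simple mean over circumstances.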