AI-fairness and equality of opportunity: a case study on educational achievement

Ángel S. Marrero (amarrerl@ull.edu.es), Department of Economics and Research Center of Social Inequality and Governance, University of La Laguna, Spain

Gustavo A. Marrero (gmarrero@ull.edu.es) and Carlos Bethencourt, Department of Economics and Research Center of Social Inequality and Governance, University of La Laguna, Spain

Liam James (liam.james2@unibo.it), Department of Economics and Research Center of Social Inequality and Governance, University of La Laguna, Spain; Department of Computer Science and Engineering, Alma Mater Studiorum Università di Bologna, Italy

Roberta Calegari (roberta.calegari@unibo.it), Department of Computer Science and Engineering, Alma Mater Studiorum Università di Bologna, Italy

Santiago de Compostela, Spain

ISSN 1613-0073

Keywords: AI-fairness, socioeconomic equality of opportunity, AI-ethics

ORCID: 0000-0003-4030-0078 (G. A. Marrero), 0000-0002-5605-9576 (C. Bethencourt), 0009-0001-7809-7514 (L. James), 0000-0003-3794-2942 (R. Calegari)

This study focuses on predicting students' academic performance, examining how AI predictive models often reflect socioeconomic inequalities driven by factors such as parental socioeconomic status and home environment, which affect the fairness of predictions. We compare three AI models in an ablation study to understand how these sensitive features (referred to as circumstances) influence predictions. Our findings reveal biases in predictions that favor advantaged groups, depending on whether the goal is to identify excellence or underperformance. Additionally, a two-stage estimation procedure is proposed in the third model to mitigate the impact of sensitive features on predictions, thereby offering a model that can be considered fair with respect to inequality of opportunity.

Introduction

Academic performance in primary school is a good predictor of an individual's future income and well-being [1]. Anticipating low academic performance levels is relevant to implementing corrective policies at early ages, and anticipating high academic performance is also relevant to applying incentive mechanisms to achieve excellence [2]. To achieve this, it is necessary to have good predictive models, in terms of accuracy. However, it is equally important for these models to generate fair predictions [3]. This means they should provide consistent predictions across different groups, ensuring no disparity in awarding excellence prizes to students regardless of their social background or their parents' educational levels. It is therefore essential to have models that ensure fairness in their predictions.

In this study, which aims to predict 6th grade educational performance using data on 3rd grade performance, we seek to link the outcomes of predictive models and their degree of bias (as traditionally measured within the AI fairness community) to the socio-economic concept of inequality of opportunity [4] reflected in the data and in real life. Inequality of opportunity in educational achievement is measured as the inequality that is explained by factors beyond the control of the individual, such as the socioeconomic status of the parents, the cultural environment at home, immigrant status, the state of health at birth, the neighborhood of birth, etc. (in the terminology of the AI fairness literature, these are the sensitive variables). In particular, we analyze how certain sensitive variables (referred to as circumstances in economics) that explain a relevant percentage of the inequality in educational achievement generate unfair predictions whenever they influence the predictive models directly or indirectly (i.e., through other predictors). Thus, for a model to generate fair predictions, the predictors included must be free of the influence of variables that generate inequality of opportunity in society.

To this end, we compare three AI models that allow us to conduct an ablation study to understand how circumstances can influence predictions and how this type of analysis leads to strategies for detecting unfairness in predictions. A two-stage estimation procedure is proposed. In the first stage, the predictors are cleaned of the influence of circumstances. In the second stage, the target variable is predicted (or adjusted, depending on the objective) using a model that includes predictors not dependent on the sensitive variables. We also observe that a model incorporating sensitive variables that explain the existing inequality of opportunity generates biases in the predictions, favoring one class or another depending on the part of the distribution being predicted. Thus, if the aim is to predict the upper tail of the distribution (to detect excellence), the worst predictions occur for those classes most disadvantaged by the most relevant circumstances (i.e., students whose parents have lower levels of education or who live in homes with a poorer cultural environment). The opposite occurs when the objective is to predict the lower tail of the distribution (i.e., students with worse educational performance).

Accordingly, the paper is organized as follows. Section 2 describes the dataset. Section 3 reports the results and discusses them. Finally, Section 4 concludes this preliminary work.

Data and Method

Database

In this paper, we use the data on primary and secondary education provided by the Canary Agency for University Quality and Educational Evaluation (ACCUEE) from academic years 2015/16 to 2018/19. ACCUEE is a public organization founded with the purpose of continuously improving education at both the university level and other educational stages within the Canary Islands. Among its activities, ACCUEE is responsible for the evaluation and accreditation of programs implemented in various educational centers. It collects data on students' academic performance and relevant variables that determine their environment, considering census characteristics and demographic context, offering a more reliable approach than other national or even international statistics.

One of the main advantages of this database is that the surveys conducted by ACCUEE on the student population are longitudinal in nature, allowing for the evaluation of the progress and development of the individuals under study. This longitudinal data collection helps to control for factors intrinsic to temporal circumstances while enhancing data quality and enabling estimates and comparisons of the individual over time.

The database provided by ACCUEE comprises 83,857 observations. Each row refers to a single student at a given grade and academic year. Primary education data for the academic year 2015/16 and 2018/19 is gathered through a comprehensive census of the entire population. For other grades and academic years, the data is collected through sampling. Longitudinal data is also included: students in 3rd grade (primary school) during the 2015/2016 academic year are sampled again in their 6th grade, and this is the information that we use in the application of this paper. The database contains information on 561 variables (columns) for each student, representing data collected from various contextual questionnaires and performance tests across different subjects. Specifically, these columns are categorized into seven thematic blocks:

• Block 1: ID Variables. This block consists of variables that identify the individual through different approaches (organizational, educational center, academic year studied, survey ID, etc.).
• Block 2: Informative Variables. These include codes that identify whether the surveyed individuals responded to the different blocks of questions.
• Block 3: Grades Obtained. This block comprises the grades obtained by students in the reference subjects, using a continuous or categorical classification.
• Block 4: Student Questionnaire. This block includes questions aimed at understanding the level of agreement (categorical) or the situation (coding/continuous) of the surveyed student.
• Block 5: Principal Questionnaire. This block includes questions aimed at understanding the level of agreement (categorical) or the situation (coding/continuous) of the principal of the student's school. Coherence is assumed for the same educational centers.
• Block 6: Family Questionnaire. This block includes questions aimed at understanding the level of agreement (categorical) or the situation (coding/continuous) of the family of the surveyed student. Coherence is assumed for the same family units.
• Block 7: Teacher Questionnaire. This block includes questions aimed at understanding the level of agreement (categorical) or the situation (coding/continuous) of the tutor of the surveyed student. Coherence is assumed for the same tutors.

The dataset consists of 47,043,777 data points (the number of rows multiplied by the number of columns). However, 21,225,226 of these are classified as missing values, constituting 45.12% of the database. The fact that this database is purely Canarian in nature reduces statistical issues such as sample representativeness or selection bias. However, there are still potential biases in the database. The most relevant ones are:

• Family Questionnaire Bias. In each grade and academic year, there is a high percentage of students (around 20-40%) without information on their family situation (Block 6). There may be unexamined correlations between not responding to the family questionnaire and the target variable (academic performance).
• Missing Values Bias. As mentioned, there is a large amount of missing data, which could indicate correlations between the student's situation (unobservable) and the target variable.
• Sample Bias. Since data collection takes place in the Canary Islands, transferring the obtained results to other regions might not account for characteristics unique to this specific territory.
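The data-point count and the missing-value share reported above follow directly from the stated numbers of rows, columns, and missing cells. A minimal check, using only the figures given in the text (the variable names are illustrative):

```python
rows, cols = 83_857, 561      # observations and variables reported for the ACCUEE database
missing = 21_225_226          # cells classified as missing values

cells = rows * cols                     # total number of data points
missing_share = 100 * missing / cells   # share of missing cells, in percent

print(cells)                   # 47043777
print(round(missing_share, 2)) # 45.12
```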

We have a target variable $Y_t$, which is the performance of students in the 6th grade of primary education. $Y_{t-1}$, the achievement observed for students (in the same school) in previous years (in 3rd grade in our application), is the most widely used predictor in the related literature. We have a set of sensitive factors (or circumstances), denoted by $C_1, \ldots, C_n$ (see Table 1). These factors are all beyond a student's control; hence any inequality resulting from them must be considered unfair and therefore corrected or compensated.

Our protected and unprotected groups are defined according to the circumstances considered. Each circumstance is split into two categories, which form the groups. We consider the left category as the unprotected group, while the right category forms the protected groups. These groups can be seen in Table 2.

Inequality of Opportunity

To estimate the inequality of opportunity in educational achievement (IOEA) in 3rd grade, we follow the ex-ante approach proposed by [5] and recently used by [6,7], and estimate the following reduced form equation:

$$Y_{i,t-1} = \alpha + \sum_{k=1}^{K} \beta_k C_{i,k,t-1} + v_{i,t-1} \qquad (1)$$

From a practical point of view, we want to measure the IOEA at the time policy makers would take policy actions; that is, at the end of the 3rd grade, in order to correct potential unfairness in the 6th grade.

We estimate equation (1) by ordinary least squares (OLS) and recover the fitted part; our measure of IOEA is the ratio of the variance of $Y_{t-1}$ explained by the set of circumstances (i.e., the variance of the fitted part) to the total variance. In a linear model, this ratio is exactly the $R^2$ of the regression. [5] explains that, when using standardized achievement measures such as those used in this paper, standard inequality indices such as the Gini or the mean log deviation (MLD) must be disregarded, and that the variance is a convenient alternative.

Next, we want to estimate the relative importance of each circumstance for educational achievement, and we use a multivariate regression-based decomposition approach [8,9], which adapts the decomposition of [10] to determine the contribution of each factor (or group of factors) to explaining educational achievement. The relative factor IOEA weight for any circumstance $C_k$ is given by:

$$S_{C_k} = \frac{\operatorname{cov}[\hat{\beta}_k C_k, \hat{Y}]}{\hat{\sigma}^2_{\hat{Y}}} = \frac{\hat{\beta}_k \hat{\sigma}_{C_k}}{\hat{\sigma}_{\hat{Y}}} \operatorname{cor}[Y, \hat{Y}] \qquad (2)$$

where $\hat{\beta}_k$ is the estimated OLS coefficient from (1) associated with circumstance $C_k$, and $\hat{\sigma}^2_{\hat{Y}}$ is the variance of the fitted achievement in the 3rd grade (i.e., the fitted target variable in (1)).
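The two quantities defined in this subsection can be illustrated with a small sketch: the IOEA as the share of variance explained by circumstances, and the relative factor weights from the covariance form of equation (2). The sketch runs on synthetic data, not the ACCUEE data; the two binary circumstances (`parent_edu`, `many_books`), their effect sizes, and the helper functions are assumptions made only for illustration.

```python
import random

def ols(X, y):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)

rng = random.Random(0)
n = 500
# Hypothetical binary circumstances and a synthetic 3rd-grade score.
parent_edu = [rng.randint(0, 1) for _ in range(n)]
many_books = [rng.randint(0, 1) for _ in range(n)]
C = list(zip(parent_edu, many_books))
y_prev = [0.8 * e + 0.5 * b + rng.gauss(0, 1) for e, b in C]

# Equation (1): regress 3rd-grade achievement on the circumstances.
alpha, *beta = ols(C, y_prev)
fitted = [alpha + sum(bk * ck for bk, ck in zip(beta, row)) for row in C]

# IOEA: variance of the fitted part over total variance (the R^2).
ioea = cov(fitted, fitted) / cov(y_prev, y_prev)

# Relative factor weights: cov(beta_k * C_k, fitted) / var(fitted).
shares = [cov([bk * row[k] for row in C], fitted) / cov(fitted, fitted)
          for k, bk in enumerate(beta)]

print(round(ioea, 3), [round(s, 3) for s in shares])
```

By construction the weights add up to one, because the circumstance terms sum to the fitted value minus the intercept; this makes them readable as shares of the explained variance.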

Predictive Models

This section defines the models used in the use case, all of which are regression models. We start with a simple linear framework to better understand the relevance of the different variables (features) included in the model. This analysis can be easily extended to more sophisticated predictive models, such as conditional inference trees, random forests, or neural network approaches. Our baseline (linear) predictive model is the following (Model 1):

$$Y_{i,t} = \alpha + \beta_1 Y_{i,t-1} + \epsilon_{i,t} \qquad (3)$$

where $Y_{i,t}$ is the academic performance in mathematics at time $t$ (6th grade of primary education), $Y_{i,t-1}$ is the academic performance at time $t-1$ (3rd grade of primary education), $\beta_1$ is a parameter to be estimated, and $\epsilon_{i,t}$ is an error term.

In order to understand the origin of potential AI unfairness, we also use the following variants. Model 2 extends Model 1 by including the set of circumstances (protected features) measured at $t-1$: in addition to the student's academic performance at $t-1$, we also include a set of $k$ circumstances.

$$Y_{i,t} = \alpha + \beta_1 Y_{i,t-1} + \sum_{j=1}^{k} \beta_{2j} C_{i,j,t-1} + \epsilon_{i,t} \qquad (4)$$

Notice that the interpretation of $\beta_1$ in (4) differs from its interpretation in Model 1, since $Y_{i,t-1}$ correlates with $C_{i,j,t-1}$. Now $\beta_1$ captures the relationship between educational achievement in 3rd grade and in 6th grade while taking into consideration the potential (unfair) differences generated by our set of circumstances in 3rd grade and extended to 6th grade and probably to the future. The estimated $\beta_{2j}$ represent the impact of circumstances in 3rd grade on achievement in 6th grade that is not channeled through achievement in 3rd grade. If the entire impact of circumstances is channeled through their effect on achievement in 3rd grade, the estimated $\beta_{2j}$ coefficients should be close to zero.

Model 3 decomposes the predictor $Y_{i,t-1}$ into the part explained by circumstances and the part not explained by circumstances (a residual term). In the IO literature (see [11]), this residual term is associated with the part of achievement not related to (observed) circumstances and instead related to effort and to non-observed circumstances. The two parts might correlate differently with the target variable. For instance, [11] and [12] show that the circumstance part is negatively related to subsequent economic growth, while the effort component is positively correlated.

To estimate Model 3, we start from the estimates of equation (1). Then, we decompose $Y_{i,t-1}$ into its fitted part, $\hat{Y}_{i,t-1}$ (the part explained by circumstances and associated with the IO estimated in (1)), and the residual part, $\hat{v}_{i,t-1}$, which captures other factors not included in the model and uncorrelated with the considered circumstances. To simplify notation, we call this residual term "Effort". We then estimate Model 3 as follows:

$$Y_{i,t} = \alpha + \beta_4 \hat{Y}_{i,t-1} + \beta_5 \hat{v}_{i,t-1} + \epsilon_{i,t}$$

By doing so, we want to distinguish predictions of $Y_{i,t}$ due exclusively to circumstances ($\alpha + \beta_4 \hat{Y}_{i,t-1}$) from those due exclusively to effort ($\alpha + \beta_4 \bar{\hat{Y}}_{t-1} + \beta_5 \hat{v}_{i,t-1}$), where $\bar{\hat{Y}}_{t-1}$ represents the average value of the predictions $\hat{Y}_{i,t-1}$, included so that both predictions have the same average level.
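The two-stage procedure behind Model 3 can be sketched as follows. This is a simplified illustration on synthetic data, not the authors' implementation: it assumes a single binary circumstance `c` (so the stage-one OLS fit reduces to group means), and all coefficients used to generate the data are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(1)
n = 400
# Hypothetical data: one binary circumstance, 3rd-grade and 6th-grade scores.
c = [rng.randint(0, 1) for _ in range(n)]
y_prev = [0.7 * ci + rng.gauss(0, 1) for ci in c]
y_now = [0.6 * yp + 0.3 * ci + rng.gauss(0, 1) for yp, ci in zip(y_prev, c)]

# Stage 1 (equation (1)): with a single dummy regressor, the OLS fitted value
# is the mean of y_prev within each category of the circumstance.
group_mean = {g: mean([y for y, ci in zip(y_prev, c) if ci == g]) for g in (0, 1)}
fitted = [group_mean[ci] for ci in c]              # circumstance part
effort = [y - f for y, f in zip(y_prev, fitted)]   # residual ("Effort")

# Stage 2 (Model 3): fitted and effort are orthogonal in-sample, so the joint
# OLS coefficients equal the two simple-regression slopes.
beta4 = slope(fitted, y_now)
beta5 = slope(effort, y_now)
alpha = mean(y_now) - beta4 * mean(fitted)         # effort has mean zero

# Circumstance-based and effort-based predictions with the same average level.
pred_circ = [alpha + beta4 * f for f in fitted]
pred_effort = [alpha + beta4 * mean(fitted) + beta5 * e for e in effort]

effort_mean_by_group = {
    g: mean([p for p, ci in zip(pred_effort, c) if ci == g]) for g in (0, 1)
}
print(effort_mean_by_group)
```

In this sketch the effort-based prediction has the same average in both categories of the circumstance, because the stage-one residual averages to zero within each category, while the circumstance-based prediction differs across categories by construction.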

AI fairness metrics

We evaluate the models in terms of fairness using the equalized odds metric. This metric is satisfied when the model's predictions ensure that students from both protected and unprotected groups (e.g., females and males) have equal recall. Recall is defined as the ratio of True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN).

For each model, we estimate the student's academic performance prediction ($\hat{Y}_t$). Since this prediction is continuous, we discretize it into quartiles to create our predicted classes. Likewise, we categorize the actual academic performance variable into quartiles to establish our True Class.
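The discretization step can be sketched as below for the three parts of the distribution analyzed in this paper (below the 25th percentile, between the 25th and 75th percentiles, and above the 75th percentile). The scores are made up, and the quantile interpolation method and the handling of values exactly on a cut point are assumptions, since the text does not specify them.

```python
import statistics

def tail_class(value, q25, q75):
    """Label a score as low tail, center, or high tail of the distribution."""
    if value < q25:
        return "low"
    if value > q75:
        return "high"
    return "center"

def classify(scores):
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q25, _, q75 = statistics.quantiles(scores, n=4)
    return [tail_class(s, q25, q75) for s in scores]

# Hypothetical observed and predicted 6th-grade scores for eight students.
observed = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [2.1, 1.9, 3.5, 3.0, 5.2, 6.8, 6.1, 7.4]

true_class = classify(observed)    # True Class
pred_class = classify(predicted)   # predicted class

for t, p in zip(true_class, pred_class):
    print(t, p)
```

Note that each variable is cut at its own quartiles, so the predicted classes depend only on the ranking of the predictions, not on their scale.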

Next, to evaluate the models' predictive fairness for each group, we construct confusion matrices for different quartiles of academic performance. For instance, the confusion matrix for the first quartile of academic performance (below 25th percentile) is shown in Table 3:

We specifically calculate the equalized odds metric for the low tail of the academic performance distribution (Q1, below the 25th percentile), the center of the distribution (between the 25th and 75th percentiles), and the high tail of the distribution (above the 75th percentile). For each sensitive feature, we calculate the equalized odds metric, whose values can be associated with fair or unfair model predictions. In the appendix tables, the odds of a category are its recall divided by the recall of the reference (unprotected) category of the same circumstance, whose odds are therefore 1.00. Fair: Odds are close to 1, thus the model predicts all groups within each circumstance equally well. Unfair: Odds are far from 1. Odds lower than 1 mean that recall is lower for the protected group (lower categories), so the model predicts better for unprotected groups (upper categories); Odds above 1 mean that the model predicts better for protected groups.
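A minimal sketch of the recall comparison described above, for one part of the distribution and one circumstance. The ratio is taken as the recall of the protected category over the recall of the unprotected category, which is consistent with the appendix tables (for example, the Gender row of Table 6 reports recalls of 0.50 and 0.47 and odds of 0.94 for Model 1). The toy labels and the function names are illustrative assumptions.

```python
def recall(true_pos, pred_pos):
    """TP / (TP + FN) for boolean lists marking the positive class."""
    tp = sum(t and p for t, p in zip(true_pos, pred_pos))
    fn = sum(t and not p for t, p in zip(true_pos, pred_pos))
    return tp / (tp + fn)

def recall_odds(true_pos, pred_pos, group, protected, unprotected):
    """Recall of the protected group divided by recall of the unprotected group."""
    def group_recall(g):
        idx = [i for i, gi in enumerate(group) if gi == g]
        return recall([true_pos[i] for i in idx], [pred_pos[i] for i in idx])
    return group_recall(protected) / group_recall(unprotected)

# Toy example: the positive class is "below the 25th percentile".
group    = ["M"] * 5 + ["F"] * 5
true_pos = [True, True, True, True, False] * 2
pred_pos = [True, True, False, False, False,   # males: 2 of 4 positives found
            True, False, False, False, True]   # females: 1 of 4 positives found

odds = recall_odds(true_pos, pred_pos, group, protected="F", unprotected="M")
print(odds)  # 0.5
```

A value of 1 would indicate that positives are detected equally often in both groups; here the model finds true positives half as often among the protected group.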

We will show that the degree of AI unfairness depends on the model used (the type of variables included in the model) and the part of the distributions we look at (upper, middle, or lower).

Results

Inequality of opportunity estimates

We estimate equation (1) through OLS. Table 4 shows the relative factor shares: father's education and the number of books at home (as a proxy of the cultural environment), followed by mother's education, the general socioeconomic status of the household, and the start schooling age, are the most relevant circumstances explaining achievement variability in 3rd grade.

In AI terminology, a circumstance must always be considered a sensitive feature, since, in a fair society, it should not be correlated with a student's achievement. How does the relevance of each circumstance correlate with the AI-(un)fairness measure associated with each sensitive factor?

Discussion

We show below the equalized-odds measure for each circumstance (sensitive feature) obtained for each of the three models used (and the four predictions generated): model 1, which includes only achievement in 3rd grade; model 2, which extends model 1 with circumstances; and model 3, for which we show the predictions of the part using only circumstances (the inequality of opportunity model) and of the part using the effort component (the effort model). We do this for each circumstance and for the prediction of three parts of the distribution.

• Looking at performance below the 25th percentile, which could be relevant if the policy is aimed at giving reinforcement in 3rd grade to the most disadvantaged students to reduce educational failure in 6th grade.
• Looking at performance in the upper tail (above the 75th percentile), which could be relevant if the objective is to reward excellence with a scholarship program, for example.
• Looking at intermediate performance, to detect a group of students representative of a class or school in order to select case and control groups for an education policy experiment.

The results are shown in Figures 1-3 for the different predicted percentiles. The detailed results for each sensitive variable (circumstance) are presented in Tables 5-7.

All detailed results can be found in the appendix. We summarize the main results below, which are also recapped in Table 5.

The model using the predictions of our effort proxy is the only one that achieves AI fairness regardless of which part of the distribution we want to predict. For all sensitive variables, the odds ratio is very close to one, indicating that the model (whether good or bad) predicts almost equally well for one group as for another.

On the other hand, for the remaining predictions, which come from models that include circumstances, the quality of the predictions differs depending on the group we consider and the part of the performance distribution we are predicting.

Thus, for example, when we seek to predict the upper part of the distribution (i.e., to predict whether the student will be excellent in 6th grade), the baseline model and the rest of the models that include information from the sensitive variables predict worse for the most disadvantaged category (i.e., lower socioeconomic status, with lower educational level of the parents, with fewer books in the home, etc.) than for the favored categories. Using these predictions for decision making (e.g., to award prizes for excellence) would generate a clear injustice in favor of those who have a more advantageous starting point, with the consequent increase (most likely) of unequal opportunities in the future. The model generates more unfair predictions the greater the weight of circumstances in the model (in our case the model that only uses circumstances as a predictor).

Moreover, the greatest differences in the predictions are seen among the circumstances that turn out to be most relevant in explaining the inequality of opportunity in the initial year. Thus, the greatest injustice arises when we compare the predictions between students of high- and low-educated parents, or between students in homes with high and low cultural environments.

On the contrary, if the aim is to predict the lower tail of the distribution (for example, to give classes to reinforce learning), the prediction model that includes circumstances generates injustices (in the sense that it predicts better for one class than for another) but, in this case, the favored group (the one with better predictions) is the most disadvantaged. That is, it predicts school failure better for children from lower classes or worse circumstances than for children from better circumstances. Applying the predictions of this model would probably help to reduce inequality of opportunity in the future, but through a bad policy: it reduces inequality of opportunity by improving the situation of the most disadvantaged while worsening that of the most advantaged.

Ideally, across the entire distribution, predictive models should be fair in the sense that they do not discriminate against one group in favor of another on the basis of circumstances, so that individuals' outcomes are eventually equalized.

Conclusion

In this study, we have explored the complexities of predicting students' academic performance using AI models, with a particular focus on addressing socioeconomic inequalities that influence predictive outcomes. Our analysis revealed significant biases in predictions favoring advantaged groups. Through a comparative evaluation of three AI models and an ablation study, we demonstrated how these sensitive features, also known as circumstances, can distort predictions, leading to unfair outcomes in educational assessments. Importantly, we proposed a two-stage estimation procedure in our third model to mitigate these biases.

The findings underscore the critical importance of integrating fairness considerations into predictive modeling practices, particularly in educational settings where equitable outcomes are essential. Future research should further refine and validate these methodologies across diverse datasets and educational contexts to foster more inclusive and equitable predictive models.

A. Complete Results

In the tables shown in the appendix, numbers or letters are used in brackets to distinguish results achieved by the different models presented in this work:

• 1: refers to model 1 (Equation 3)
• 2: refers to model 2 (Equation 4)
• C: refers to the circumstance-based model
• E: refers to the effort-based model (for this and the previous bullet point see the discussion regarding Model 3 in Section 2.3)

Figure 1: Predictions below the 25th percentile. Odds of protected groups

Figure 2: Predictions between the 25th and 75th percentiles. Odds of protected groups

Table 1: Set of circumstances (or sensitive features). Column header: Circumstance

Table 2: Protected and unprotected groups

| Circumstance | Group |
| --- | --- |
| Gender | Male, Female |
| Mother education | Tertiary education, Rest of educational levels |
| Father education | Tertiary education, Rest of educational levels |
| Mother occupation | High occupation, Rest of occupational levels |
| Father occupation | High occupation, Rest of occupational levels |
| Immigrant status | No, Yes |
| Start schooling age | Less than 3 years old, More than 3 years old |
| Number of books in household | More than 1000 books, Less than 1000 books |
| Public/private school | Public, Private |
| Capital/non capital island | Yes (Capital), No |
| See adults reading books | Daily, Not daily |
| Teachers who change school each year | Almost no change, Change |
| Socioeconomic status | Above the median, Below the median |

Table 3: Protected and unprotected groups. Column headers: Circumstance; Group; Positive (Below 25); Negative (Above 25)

Table 5: Summary of results

| | Below 25th percentile (Tail of the distribution) | Between 25th and 75th percentiles (Center of the distribution) | Above 75th percentile (High tail of the distribution) |
| --- | --- | --- | --- |
| Model 1 (baseline) | Unfairly favours protected groups | ? | Unfairly favours unprotected groups |

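Table 6: Prediction below 25th percentile

| Circumstance | Category | Recall (1) | Eq. Odds (1) | Recall (2) | Eq. Odds (2) | Recall (C) | Eq. Odds (C) | Recall (E) | Eq. Odds (E) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | Males | 0.50 | 1.00 | 0.49 | 1.00 | 0.31 | 1.00 | 0.49 | 1.00 |
| | Females | 0.47 | 0.94 | 0.50 | 1.02 | 0.40 | 1.31 | 0.43 | 0.89 |
| Mother education | Tertiary education | 0.40 | 1.00 | 0.32 | 1.00 | 0.03 | 1.00 | 0.51 | 1.00 |
| | Rest of educational levels | 0.53 | 1.33 | 0.59 | 1.84 | 0.53 | 18.44 | 0.43 | 0.8 |
| Father education | Tertiary education | 0.39 | 1.00 | 0.24 | 1.00 | 0.02 | 1.00 | 0.55 | 1.00 |
| | Rest of educational levels | 0.51 | 1.32 | 0.57 | 2.40 | 0.46 | 19.63 | 0.43 | 0.78 |
| Mother Occupation | High occupation | 0.43 | 1.00 | 0.45 | 1.00 | 0.22 | 1.00 | 0.48 | 1.00 |
| | Rest of occupational levels | 0.50 | 1.15 | 0.51 | 1.14 | 0.39 | 1.77 | 0.45 | 0.95 |
| Father Occupation | High occupation | 0.44 | 1.00 | 0.43 | 1.00 | 0.25 | 1.00 | 0.46 | 1.0 |
| | Rest of occupational levels | 0.50 | 1.13 | 0.52 | 1.19 | 0.39 | 1.58 | 0.46 | 0.99 |
| Books in household | More than 100 | 0.38 | 1.00 | 0.29 | 1.00 | 0.02 | 1.00 | 0.54 | 1.0 |
| | Less than 100 | 0.51 | 1.32 | 0.53 | 1.83 | 0.42 | 17.60 | 0.44 | 0.8 |
| Teachers changing school each year | Almost no change | 0.43 | 1.00 | 0.43 | 1.00 | 0.16 | 1.00 | 0.50 | 1.0 |
| | Change | 0.51 | 1.18 | 0.52 | 1.21 | 0.44 | 2.67 | 0.44 | 0.9 |
| Immigration status | No | 0.48 | 1.00 | 0.50 | 1.00 | 0.35 | 1.00 | 0.46 | 1.09 |
| | Yes | 0.52 | 1.09 | 0.46 | 0.93 | 0.38 | 1.08 | 0.50 | 1.0 |
| Public/private school | Private | 0.40 | 1.00 | 0.35 | 1.00 | 0.09 | 1.00 | 0.50 | 1.00 |
| | Public | 0.51 | 1.26 | 0.53 | 1.51 | 0.43 | 4.94 | 0.45 | 0.9 |
| Capital island | Yes | 0.47 | 1.00 | 0.49 | 1.00 | 0.32 | 1.00 | 0.45 | 1.00 |
| | No | 0.57 | 1.20 | 0.55 | 1.12 | 0.58 | 1.81 | 0.49 | 1.08 |
| Start schooling age | Less than 3 years old | 0.47 | 1.00 | 0.44 | 1.00 | 0.26 | 1.00 | 0.48 | 1.00 |
| | More than 3 years old | 0.50 | 1.06 | 0.57 | 1.29 | 0.51 | 2.00 | 0.43 | 0.88 |
| See adults reading books | Daily | 0.48 | 1.00 | 0.47 | 1.00 | 0.29 | 1.00 | 0.48 | 1.00 |
| | More rarely (not daily) | 0.49 | 1.02 | 0.52 | 1.11 | 0.40 | 1.40 | 0.45 | 0.94 |
| Socioeconomic status | Above the median | 0.41 | 1.00 | 0.34 | 1.00 | 0.07 | 1.00 | 0.51 | 1.00 |
| | Below the median | 0.54 | 1.32 | 0.60 | 1.79 | 0.55 | 8.44 | 0.43 | 0.84 |
| Average | | | 1.18 | | 1.41 | | 6.36 | | 0.92 |
| Weighted Average | | | 1.29 | | 1.89 | | 14.68 | | 0.83 |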

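Table 7: Predictions between the 25th and 75th percentiles

| Circumstance | Category | Recall (1) | Eq. Odds (1) | Recall (2) | Eq. Odds (2) | Recall (C) | Eq. Odds (C) | Recall (E) | Eq. Odds (E) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | Males | 0.56 | 1.00 | 0.58 | 1.00 | 0.51 | 1.00 | 0.55 | 1.00 |
| | Females | 0.54 | 0.96 | 0.55 | 0.95 | 0.51 | 1.00 | 0.52 | 0.95 |
| Mother education | Tertiary education | 0.58 | 1.00 | 0.62 | 1.00 | 0.53 | 1.00 | 0.52 | 1.00 |
| | Rest of educational levels | 0.53 | 0.92 | 0.52 | 0.83 | 0.49 | 0.92 | 0.55 | 1.06 |
| Father education | Tertiary education | 0.56 | 1.00 | 0.61 | 1.00 | 0.41 | 1.00 | 0.50 | 1.00 |
| | Rest of educational levels | 0.55 | 0.97 | 0.54 | 0.89 | 0.55 | 1.32 | 0.55 | 1.12 |
| Mother Occupation | High occupation | 0.56 | 1.00 | 0.56 | 1.00 | 0.42 | 1.00 | 0.53 | 1.00 |
| | Rest of occupational levels | 0.55 | 0.99 | 0.56 | 1.01 | 0.53 | 1.27 | 0.54 | 1.02 |
| Father Occupation | High occupation | 0.56 | 1.00 | 0.60 | 1.00 | 0.44 | 1.00 | 0.53 | 1.00 |
| | Rest of occupational levels | 0.55 | 0.99 | 0.55 | 0.92 | 0.53 | 1.21 | 0.54 | 1.02 |
| Books in household | More than 100 | 0.59 | 1.00 | 0.62 | 1.00 | 0.41 | 1.00 | 0.53 | 1.00 |
| | Less than 100 | 0.54 | 0.92 | 0.55 | 0.88 | 0.53 | 1.32 | 0.54 | 1.02 |
| Teachers changing school each year | Almost no change | 0.58 | 1.00 | 0.60 | 1.00 | 0.50 | 1.00 | 0.53 | 1.00 |
| | Change | 0.54 | 0.93 | 0.54 | 0.91 | 0.51 | 1.03 | 0.54 | 1.03 |
| Immigration status | No | 0.55 | 1.00 | 0.56 | 1.00 | 0.51 | 1.00 | 0.54 | 1.00 |
| | Yes | 0.55 | 0.99 | 0.62 | 1.11 | 0.54 | 1.07 | 0.53 | 0.99 |
| Public/private school | Private | 0.59 | 1.00 | 0.62 | 1.00 | 0.46 | 1.00 | 0.55 | 1.00 |
| | Public | 0.54 | 0.91 | 0.55 | 0.89 | 0.52 | 1.12 | 0.53 | 0.97 |
| Capital island | Yes | 0.56 | 1.00 | 0.56 | 1.00 | 0.51 | 1.00 | 0.53 | 1.00 |
| | No | 0.52 | 0.94 | 0.58 | 1.04 | 0.48 | 0.93 | 0.55 | 1.04 |
| Start schooling age | Less than 3 years old | 0.56 | 1.00 | 0.59 | 1.00 | 0.52 | 1.00 | 0.52 | 1.00 |
| | More than 3 years old | 0.55 | 0.98 | 0.52 | 0.88 | 0.48 | 0.91 | 0.57 | 1.09 |
| See adults reading books | Daily | 0.56 | 1.00 | 0.59 | 1.00 | 0.51 | 1.00 | 0.53 | 1.00 |
| | More rarely (not daily) | 0.55 | 0.98 | 0.54 | 0.91 | 0.50 | 0.98 | 0.54 | 1.03 |
| Socioeconomic status | Above the median | 0.57 | 1.00 | 0.61 | 1.00 | 0.54 | 1.00 | 0.52 | 1.00 |
| | Below the median | 0.54 | 0.94 | 0.52 | 0.85 | 0.48 | 0.90 | 0.55 | 1.07 |
| Average | | | 0.96 | | 0.93 | | 1.08 | | 1.03 |
| Weighted Average | | | 0.94 | | 0.87 | | 1.15 | | 1.06 |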

Acknowledgments

This paper was partially supported by the "AEQUITAS" project funded by the European Union's Horizon Europe research and innovation programme under grant number 101070363.

References

[1] E. Diener, M. L. Diener, C. Diener, Factors predicting the subjective well-being of nations, Journal of Personality and Social Psychology 69 (1995).
[2] H. F. Ladd, Holding schools accountable: Performance-based reform in education, 1996.
[3] R. Yu, Q. Li, C. Fischer, S. Doroudi, D. Xu, Towards accurate and fair prediction of college success: Evaluating different sources of student data, Educational Data Mining, 2020.
[4] W. H. Sewell, Inequality of opportunity for higher education, American Sociological Review 36 (1971) 793.
[5] F. H. G. Ferreira, J. Gignoux, The measurement of educational inequality: Achievement and opportunity, 1 (2014).
[6] G. A. Marrero, J. G. Rodríguez, Unfair inequality and growth, The Scandinavian Journal of Economics (2023).
[7] G. A. Marrero, J. C. Palomino, G. Sicilia, Inequality of opportunity in educational achievement in western Europe: contributors and channels, The Journal of Economic Inequality 22 (2023).
[8] M. Brewer, L. Wren-Lewis, Accounting for changes in income inequality: Decomposition analyses for the UK, 1978., ERN: Other Econometrics: Mathematical Methods & Programming (Topic) (2016).
[9] G. S. Fields, Accounting for income inequality and its change: A new method, with application to the distribution of earnings in the United States, 2012.
[10] A. F. Shorrocks, Inequality decompositions by factor components, Econometrica 50 (1982).
[11] G. A. Marrero, Inequality of opportunity and growth, 2013.
[12] G. A. Marrero, J. G. Rodríguez, Inequality of opportunity in Europe, Microeconomics: Welfare Economics & Collective Decision-Making eJournal (2012).