           Advancing Fairness in Public Funding Using Domain Knowledge

Thomas Goolsby, Sheikh Rabiul Islam, Ingrid Russell
University of Hartford
goolsby@hartford.edu, shislam@hartford.edu, irussell@hartford.edu


Abstract

Artificial Intelligence (AI) has become an integral part of several modern-day solutions impacting many aspects of our lives. Therefore, it is of paramount importance that AI-powered applications are fair and unbiased. In this work, we propose a domain knowledge infused AI-based system for public funding allocation in the transportation sector, keeping potential fairness-related pitfalls in mind. In the transportation sector, in general, the funding allocated to a particular geographic area corresponds to the population in that area. However, we found that areas with a high diversity index have higher public transit ridership, and this is a crucial piece of information to consider for an equitable distribution of funding. Therefore, in our proposed approach, we use the above fact as domain knowledge to guide the developed model in detecting and mitigating hidden bias in funding distribution. Our intervention has the potential to improve the declining rate of public transit ridership, which has decreased by 3% in the last decade. An increase in public transit ridership has the potential to reduce the use of personal vehicles as well as the carbon footprint.

Keywords

domain knowledge, artificial intelligence, machine learning, federal funding, federal transit administration, public transportation, bias, fairness

Introduction

Available public data establishes a set of criteria based on census data to determine how funding is tabulated and granted to federal transit agencies in major Urbanized Areas (UZAs) in the United States (Giorgis 2020). The current system takes into consideration a range of census-based criteria (Giorgis 2020) and is supposed to take into consideration protected attributes defined in Title VI of the Civil Rights Act of 1964 (Title VI 1964), among other determinants. This raises the question of how, and whether, it is possible to use AI-based systems to allocate federal funding in an equitable fashion while abiding by Title VI guidelines.

In this work, we investigate the federal allocation of funds for public transportation with fairness issues in mind. When we talk about fairness in this paper, we are speaking to the mitigation of hidden bias that can be introduced inadvertently during the machine learning process. Ultimately, fairness in AI, for the purposes of this paper, means employing known techniques to eliminate hidden bias. Furthermore, the FTA is supposed to distribute public funds in an equitable fashion, as defined in Title VI of the Civil Rights Act of 1964; thus, it is our goal to replicate that equity using a machine learning approach that mitigates bias that may arise during the process. In the transportation sector, in general, the funding allocated to a particular geographic area corresponds to the population in that area. However, we found that areas with a high diversity index have higher public transit ridership, and this is a crucial piece of information to consider for an equitable distribution of funding. Therefore, in our proposed approach, we use the above fact as domain knowledge to guide the developed model in detecting and mitigating hidden bias in funding distribution.

Domain knowledge is a high-level, abstract concept that encompasses the problem area. For example, in a car classification problem from images, the domain knowledge could be that a convertible has no roof, or that a sedan has four doors. However, encoding this domain knowledge in a black-box model is challenging. Bias can occur during data collection, data preprocessing, algorithm processing, or the act of making an algorithmic decision. Through the comparison of machine learning models with and without domain knowledge, this work measures the effectiveness of domain knowledge integration. We use different machine learning classifiers such as Random Forests (RF), Extra Trees (ET), and K-Nearest Neighbor (KNN), to name a few, for the experiments. We also use IBM AI Fairness 360 to detect and mitigate bias and evaluate several standard fairness metrics to further emphasize the effect of incorporating domain knowledge into our proposed approach.
In T. Kido, K. Takadama (Eds.), Proceedings of the AAAI 2022 Spring Symposium "How Fair is Fair? Achieving Wellbeing AI", Stanford University, Palo Alto, California, USA, March 21–23, 2022. Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



Background

A good amount of work has been conducted in the domain of bias and fairness in AI. Mehrabi et al. developed a general survey exploring this topic. They emphasize the importance of a continuous feedback loop between data, algorithms, and users (Mehrabi et al. 2021). This accentuates how susceptible AI algorithms are to bias, which can be introduced when data is collected. It is important to be aware of the kinds of bias that can occur as well.

Given how unique the interaction between data and users is, there are two biases in particular that apply to the data we are working with. One is omitted variable bias, which occurs when one or more important variables are left out of the model (Riegg 2008; Mustard 2003; Clarke 2005). A simple example of this type of bias could arise with an algorithm trained to predict when users will unsubscribe from a company's service. A possible omitted variable here could be a strong competitor entering the market that the algorithm was unaware of (Mehrabi et al. 2021). The introduction of this competitor would be the omitted variable, which would then lead to bias being introduced into the algorithm when it tries to predict when a particular customer will unsubscribe. The other important form of bias is aggregation bias, which occurs when a one-size-fits-all model is used for groups with different conditional distributions (Suresh and Guttag 2019). Both omitted variable bias and aggregation bias are notable in machine learning applications since they are technical biases that can occur at any point in the machine learning process, which makes them particularly difficult to counteract. The authors of this work discussed how the introduction of discrimination in AI is unique since it is a direct interaction between data and users. Again, domain knowledge is being used to attempt to counteract specific instances of bias like this.

Furthermore, it is important to understand the problematic nature of introducing racial categories to machine learning. Programmers face a unique dilemma in this problem domain since they can either be blind to racial group disparities or be conscious of those racial categories (Benthall et al. 2019). However, regardless of which path the programmer chooses, both options ultimately reify the negative and inaccurate implications of race in society. Moreover, observing differences between races in the United States is inherently problematic: race differences are created by ascribing race classifications onto individuals who were previously racially unspecified. This ultimately leads to the newly racially classified individuals being linked to stereotyped and stigmatized beliefs about non-white groups (Omi and Winant 2014). When applying domain knowledge to the allocation of federal funds, we must be extremely cautious of these implications. Link and Phelan provide a clear definition of stigma as "the co-occurrence of labeling, stereotyping, separation (segregation), status debasement, and discrimination" (AI Fairness 360 2021). By understanding the systemic instillment of stigma in racial categories, this work looks for ways to introduce fair domain knowledge without reifying those dangerous stigmas. This ultimately has implications for the development of a fair AI algorithm for allocating federal funds for public transportation.

Public transit agencies are supposed to abide by Title VI of the Civil Rights Act of 1964. The Federal Transit Administration (FTA) follows closely the rules written in Title VI, which protects people from discrimination based on race, color, and national origin in programs and activities receiving federal financial assistance (Title VI 1964). Within this work, we also abide by these laws to develop a legally applicable AI for allocating federal funds, and we investigate the disparities. A fair and unbiased AI algorithm for allocating federal funds for public transportation could further help combat the national decline in public transit ridership. William J. Mallett of the Congressional Research Service emphasized that public transit ridership has declined nationally by 7% over the last decade (Mallett 2018). Competing transportation options like personal vehicles, ride-sourcing (e.g., Uber), and bike-sharing are partially at the forefront of the national decline. Some solutions proposed in that work are incentive funding, raising user fees on personal automobiles, and improving general funding for public transportation (Mallett 2018). That is where this work comes in: to attempt to answer the question of whether an AI algorithm embedded with fairness can contribute to a more equitable solution.

Experiments and Results

This project explores how domain knowledge can be integrated to ensure fairness in AI. A publicly available dataset on the allocation of federal funds to public transportation agencies is used (Giorgis 2020). This dataset is the basis on which this exploration and application of machine learning rests. The dataset includes official data from 2014-2019 on 449 Federal Transit Administration (FTA) defined public transportation agencies in the continental United States, Alaska, Hawaii, and Puerto Rico. The dataset is read into RStudio using R version 4.1.0 and Python version 3.8.0. The R programming language is used in a simple R script, while Python is used in isolated code chunks within an R Markdown (Rmd) file. For bias detection and mitigation, we use the IBM AI Fairness 360 open-source toolkit (AI Fairness 360 2021).
Data Preprocessing

This dataset (Giorgis 2020) is preprocessed into a summarized form which gives totals for individual transit agencies per year. The data started off with 42 columns and 36,656 rows. Empty columns and rows are deleted, which leaves the dataset with 40 columns and 18,673 rows. The overall dataset is then split into separate data containers for individual years, producing six separate datasets for the six individual years (2014-2019). Each of the six datasets contains 13 columns and anywhere from 440 to 444 rows depending on the year. Finally, the separate data containers are combined back into a single data container which consists of summarized data for every given FTA UZA per year. This summarized data container covering all data from 2014-2019 has 13 columns and 2,615 rows.
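For concreteness, a minimal sketch of this summarization step in Python/pandas (the paper performs this step in R); the file name and the Year/UZA_Name column labels are placeholders rather than the dataset's actual headers:

```python
import pandas as pd

# Load the raw allocation dataset (file name is a placeholder).
raw = pd.read_csv("federal_funding_allocation.csv")   # 42 columns x 36,656 rows

# Drop entirely empty rows and columns, as described above.
df = raw.dropna(axis=0, how="all").dropna(axis=1, how="all")

# Summarize totals per agency (UZA) per year: one frame per year
# (2014-2019), then concatenate the frames back together.
yearly = [
    df[df["Year"] == y]
      .groupby("UZA_Name", as_index=False)
      .sum(numeric_only=True)
      .assign(Year=y)
    for y in range(2014, 2020)
]
summary = pd.concat(yearly, ignore_index=True)  # ~13 columns x 2,615 rows
```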
Furthermore, the measure of operating expenses is converted into classes on which supervised machine learning can take place. Operating expense classes are determined by examining the distribution of operating expenses across transit agencies. It was found that the distribution was skewed towards the lower end (< $100,000,000). It was also found that the total operating expenses of a specific transit agency have a high correlation, roughly 95%, with the population of its service area. These are the factors that lead to the current distribution of operating expense level classes. Data is also utilized from the 2020 national census, specifically diversity indices at the state and county level. Data engineering techniques are used to incorporate both state- and county-level diversity indices into the summarized public funds allocation dataset.

To evaluate the fairness of the models with domain knowledge, the diversity index by county had to be sorted into classes. Diversity index by county is used as the primary form of domain knowledge here since it provides a clearer vision of the diversity across populations. The diversity index serves as a measure of how likely it is that two individuals chosen at random from a population are from different racial and ethnic groups (U.S. Census Bureau 2021). The diversity index is bound between 0 and 1, where a value of 0 indicates that everyone in the population has the same racial and ethnic characteristics, while a value closer to 1 indicates that everyone in the population has different racial and ethnic characteristics (U.S. Census Bureau 2021). Therefore, we observe the diversity index by county for each of the 449 FTA-defined public transportation agencies, and found it an effective incorporation of census-based domain knowledge. To convert the diversity index by county into classes, the distribution of the values is evaluated. As seen in Figure 1, a great number of observations (roughly 55%) have a diversity index between 0.25 and 0.5.

Figure 1: Histogram of diversity index by county

Since the distribution looked as such with 4 bins, the diversity index by county was split into 4 classes. The first class is "Very Low", which constitutes all observations with a diversity index greater than or equal to 0 and less than 0.25. The next class is "Low", made up of observations with a diversity index greater than or equal to 0.25 and less than 0.5. The "Moderate" class includes all observations with a diversity index greater than or equal to 0.5 and less than 0.75. Finally, the last class is "High", which includes the remaining observations: those with a diversity index greater than or equal to 0.75 and less than 1 (the maximum value possible). These class bounds are also supported by the fact that the diversity index by county had its largest correlation with the population of a particular UZA. The correlation between these two values is 0.26, which is the highest correlation that diversity index by county has with any other variable in the data set (see Figure 2). Furthermore, one of the variables with the highest correlation with primary UZA population is unlinked passenger trips (0.76). Total unlinked passenger trips serves as an FTA-defined measure of public transportation ridership. Therefore, we can see the relation here: urban areas with a higher population tend to have higher public transit ridership as well as a higher diversity index by county.
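The four-class split described above is a straightforward interval binning; a sketch continuing the previous listing, where summary and the Diversity_Index_County column name are assumptions:

```python
import pandas as pd

# Bin the county diversity index into the four classes described above.
bins = [0.0, 0.25, 0.5, 0.75, 1.0]
labels = ["Very Low", "Low", "Moderate", "High"]
summary["Diversity_Class"] = pd.cut(
    summary["Diversity_Index_County"],
    bins=bins,
    labels=labels,
    right=False,   # intervals are [low, high), matching the text
)
```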
Figure 2: Heatmap of all numeric variables in data set

Furthermore, 7 of the top 10 UZAs by ridership (Table 1) are from the top 10 most diverse states (Jensen et al. 2021): Hawaii, California, Nevada, Maryland, District of Columbia, Texas, New Jersey, New York, Georgia, and Florida (Jensen et al. 2021).

Table 1. Top 10 UZAs with the highest ridership

  New York-Newark, NY-NJ-CT
  Los Angeles-Long Beach-Anaheim, CA
  Chicago, IL-IN
  Washington, DC-VA-MD
  San Francisco-Oakland, CA
  Boston, MA-NH-RI
  Philadelphia, PA-NJ-DE-MD
  Seattle, WA
  Miami, FL

Although the diversity index of a county has its highest correlation (0.26) with the population of a UZA, it has a comparatively low correlation (0.14) with total operating expenses in that area. This finding encourages us to develop an equitable distribution technique.

Model Creation

Both the R and Python programming languages are used to create machine learning models on the dataset. R is primarily used to preprocess the dataset, while Python is used to develop classification models with a 70/30 training and test set split. Random Forest, Extra Trees, and K-Nearest Neighbor models without domain knowledge (i.e., without considering the diversity index) are developed and analyzed. The scikit-learn package is used to develop the Python-based supervised learning models (e.g., Random Forest), while the class package is used to develop the K-Nearest Neighbor algorithm in R. For the models without domain knowledge, 12 columns are used. The 11 predictors are all numeric values; some of the variables include Primary UZA Population, Total Unlinked Passenger Trips, and Total Passenger Miles Traveled, to name a few. These 11 predictors are used to predict Total Operating Expenses, which serves as a general measure of how much money a specific FTA transportation agency is receiving/spending. The models with domain knowledge have 12 predictors: the same 11 predictors as the models without domain knowledge, plus our variable representing Diversity Index by County employed as domain knowledge.

The goal of measuring the accuracy, precision, recall, and ROC performance metrics was to take a first look at whether incorporating domain knowledge into some simple classification models drastically affects those values. As seen in Tables 2-4, the accuracy, precision, recall, and ROC metrics are calculated, each of which has a value of 0.99X. The metrics with domain knowledge (i.e., after incorporating the diversity index as encoded domain knowledge) deviated only slightly from the metrics produced by the models without domain knowledge. The largest difference between the metrics of models with and without domain knowledge can be seen in the K-Nearest Neighbor models. The average difference between the metrics of the models without domain knowledge and those with domain knowledge is 0.00265. This difference is negligible and acceptable considering the overall societal impact.

Table 2. Accuracy, precision, recall, and ROC metrics for Random Forest models w/ and w/o domain knowledge

  Random Forest   Without Domain Knowledge   With Domain Knowledge
  Accuracy        0.99492                    0.99490
  Precision       0.99488                    0.99501
  Recall          0.99492                    0.99490
  ROC             0.99998                    0.99998

Table 3. Accuracy, precision, recall, and ROC metrics for Extra Trees models w/ and w/o domain knowledge

  Extra Trees     Without Domain Knowledge   With Domain Knowledge
  Accuracy        0.99619                    0.99618
  Precision       0.99617                    0.99622
  Recall          0.99619                    0.99618
  ROC             0.99998                    0.99998
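A condensed sketch of the modeling setup summarized in Tables 2-4 (shown entirely in Python/scikit-learn for brevity, although the paper builds KNN with R's class package); the predictor and label column names are hypothetical, continuing the earlier listings:

```python
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical predictor names; the real dataset has 11 numeric predictors.
numeric_predictors = [
    "Primary_UZA_Population",
    "Total_Unlinked_Passenger_Trips",
    "Total_Passenger_Miles_Traveled",
    # ... the remaining numeric predictors from the summarized dataset
]

# Encode the ordered diversity classes as integers for modeling.
summary["Diversity_Class_Code"] = summary["Diversity_Class"].cat.codes

y = summary["Operating_Expense_Level"]   # class label (Low / Medium / High)

for tag, cols in [("without DK", numeric_predictors),
                  ("with DK", numeric_predictors + ["Diversity_Class_Code"])]:
    X_train, X_test, y_train, y_test = train_test_split(
        summary[cols], y, test_size=0.30, random_state=0)   # 70/30 split
    for model in (RandomForestClassifier(), ExtraTreesClassifier(),
                  KNeighborsClassifier()):
        pred = model.fit(X_train, y_train).predict(X_test)
        print(tag, type(model).__name__,
              accuracy_score(y_test, pred),
              precision_score(y_test, pred, average="weighted"),
              recall_score(y_test, pred, average="weighted"))
```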




Table 4. Accuracy, precision, recall, and ROC metrics for K-Nearest Neighbor models w/ and w/o domain knowledge

  K-Nearest Neighbor   Without Domain Knowledge   With Domain Knowledge
  Accuracy             0.99111                    0.98854
  Precision            0.99099                    0.98849
  Recall               0.99111                    0.98854
  ROC                  0.99952                    0.99655

Fairness Evaluation Preprocessing

For evaluating fairness in the models with domain knowledge, the IBM AI Fairness 360 tool is used; we use the R package of this tool for our experiment. To begin the process of evaluating fairness, the data set needs to be converted into a binary representation of itself. The most important columns are chosen to be present in the fairness evaluation. These variables were deemed the most important since they all presented the highest correlation with the variable being predicted: Total Operating Expenses. Furthermore, these variables are all numeric values, which is imperative to the development of classification models that can be evaluated using IBM AI Fairness 360 (AI Fairness 360 2021). Considering that all these variables are numeric values, it is much clearer where to set bounds when converting variables to binary representations.

The chosen variables include the population of the UZA in which a transit agency operates, total unlinked passenger trips, year, and operating expense level. Since the operating expense level is already broken into classes (Low, Medium, High), a separate column is made for each. For example, there is one column labeled "Operating Expense Level Low", which has a 1 if the operating expenses are categorized as "Low" and a 0 in every other row. A little more nuance is needed to convert the UZA population and total unlinked passenger trips columns to binary representations. The density of both these variables shows a heavy concentration of observations at the lower end (Figures 3 & 4).

Figure 3: Density plot for Primary UZA Population

Figure 4: Density plot for Total Unlinked Passenger Trips

Since both these variables have so many observations near the lower end of the range, the ranges for the classes are chosen to reflect this trend. For the UZA population, three classes are created to split this column into a binary representation, with the following range for each class:

- Low: population [0, 250K)
- Medium: population [250K, 1M)
- High: population [1M, MAX]

A very similar idea is used to split total unlinked passenger trips into classes. The National Transit Database (NTD) and the FTA explain that unlinked passenger trips are the number of boardings on public transportation vehicles in a fiscal year for a specific transportation agency (Federal Transit Administration 2021). Transit agencies must count each passenger that boards their vehicles, regardless of how many vehicles the passenger boards from origin to destination (Federal Transit Administration 2021). Similar to the previous variable, three classes are created with the following ranges:

- Low: total unlinked passenger trips [0, 5M)
- Medium: total unlinked passenger trips [5M, 100M)
- High: total unlinked passenger trips [100M, MAX)

The Year variable is also split into a binary representation. The year in this data set ranges from 2014 to 2019; thus, a separate column is made for each year, where a value of 1 means the specific observation is from that year. The last variable converted to a binary representation is, of course, the diversity index by county. Simply, for this column, a value of 1 is given if the diversity index is categorized as "Moderate" or "High" and a value of 0 if the diversity index is categorized as "Very Low" or "Low".
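The conversion just described amounts to a set of threshold indicator columns; a sketch in Python/pandas, continuing the earlier listings, with all column names hypothetical:

```python
import pandas as pd

# Build the binary representation described above.
binary = pd.DataFrame(index=summary.index)

# One indicator column per operating-expense level (Low / Medium / High).
for level in ["Low", "Medium", "High"]:
    binary[f"Operating_Expense_Level_{level}"] = (
        summary["Operating_Expense_Level"] == level
    ).astype(int)

# UZA population classes: Low [0, 250K), Medium [250K, 1M), High [1M, MAX].
pop = summary["Primary_UZA_Population"]
binary["Primary_UZA_Population_Low"] = (pop < 250_000).astype(int)
binary["Primary_UZA_Population_Medium"] = (
    (pop >= 250_000) & (pop < 1_000_000)).astype(int)
binary["Primary_UZA_Population_High"] = (pop >= 1_000_000).astype(int)

# Unlinked passenger trips: Low [0, 5M), Medium [5M, 100M), High [100M, MAX).
trips = summary["Total_Unlinked_Passenger_Trips"]
binary["Trips_Low"] = (trips < 5_000_000).astype(int)
binary["Trips_Medium"] = ((trips >= 5_000_000) & (trips < 100_000_000)).astype(int)
binary["Trips_High"] = (trips >= 100_000_000).astype(int)

# One indicator column per year (2014-2019).
for yr in range(2014, 2020):
    binary[f"Year_{yr}"] = (summary["Year"] == yr).astype(int)

# Diversity index: 1 for "Moderate"/"High", 0 for "Very Low"/"Low".
binary["High_Diversity_Index_County"] = (
    summary["Diversity_Class"].isin(["Moderate", "High"]).astype(int)
)
```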
Figure 5 provides a snapshot of the data after all variables have been converted to binary representations. The data set still has 2,615 rows; however, the binary data set has 16 columns.

  High Diversity    High Operating   Medium Operating   Low Operating   High Primary     Medium Primary
  Index by County   Expenses         Expenses           Expenses        UZA Population   UZA Population
  1                 0                0                  1               0                0
  0                 1                0                  0               0                1
  1                 1                0                  0               1                0
  0                 1                0                  0               1                0
  1                 0                1                  0               0                0
  0                 1                0                  0               0                1
  1                 1                0                  0               0                1
  0                 1                0                  0               1                0
  1                 1                0                  0               1                0
  0                 1                0                  0               1                0

Figure 5: Snapshot of the binary representation of the data

Fairness Metrics Calculation

We create a new R script to calculate the desired fairness metrics. A simple definition of a fairness metric, as provided in the documentation of the IBM AI Fairness 360 tool, is a quantification of unwanted bias in training data or models (AI Fairness 360 2021). The fairness metrics evaluated in this project are statistical parity difference, disparate impact, equal opportunity difference, and the Theil index. A brief definition of each observed fairness metric is as follows, with standard formulations sketched after the list:

- Statistical parity difference: the difference in the rate of favorable outcomes received by the unprivileged group versus the privileged group.
- Disparate impact: the ratio of the rate of a favorable outcome for the unprivileged group to that of the privileged group.
- Equal opportunity difference: the difference in true positive rates between the unprivileged and the privileged groups.
- Theil index: measures the inequality in benefit allocation for individuals.
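For reference, a sketch of the standard formulations behind these definitions, following the AIF360 documentation, where \hat{Y} is the predicted outcome, D the protected-group indicator, b_i = \hat{y}_i - y_i + 1 the per-individual benefit, and \mu its mean:

```latex
\mathrm{SPD} = P(\hat{Y} = 1 \mid D = \text{unpriv}) - P(\hat{Y} = 1 \mid D = \text{priv})

\mathrm{DI} = \frac{P(\hat{Y} = 1 \mid D = \text{unpriv})}{P(\hat{Y} = 1 \mid D = \text{priv})}

\mathrm{EOD} = \mathrm{TPR}_{\text{unpriv}} - \mathrm{TPR}_{\text{priv}}

\mathrm{Theil} = \frac{1}{n} \sum_{i=1}^{n} \frac{b_i}{\mu} \ln \frac{b_i}{\mu}
```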
These four fairness metrics were chosen based on the information provided with the IBM AI Fairness 360 tool. Furthermore, these four metrics specifically evaluate privileged versus unprivileged groups in terms of individual and group fairness. In this project, we are looking at the distribution of funds between FTA transportation agencies that are based in a county with a high diversity index (>= 0.75). By observing these specific fairness metrics, we can see how favorable outcomes, i.e., higher federal funding, may be unequally distributed among privileged and unprivileged groups.

Furthermore, we chose to employ the IBM AI Fairness 360 toolkit as it provides a compact and efficient collection of fairness evaluation libraries. The problem area of the project is well encapsulated in the recommended uses of the toolkit. The creators of the toolkit explain that it should be used in limited settings, one of which is allocation assessment problems with well-defined protected attributes (AI Fairness 360 2021). This project's problem area deals with the allocation of funds. Moreover, and more importantly, the dataset being used for the fairness evaluation has a well-defined protected attribute, the diversity index by county, which, as explained earlier in the paper, is subject to unintentional bias in the FTA's process and is protected under Title VI of the Civil Rights Act of 1964.

The reweighing function is our tool of choice in the IBM AI Fairness 360 toolkit, as it assigns weights to training set tuples instead of changing class labels (Kamiran and Calders 2012). This is favorable since we want to analyze how the diversity index by county plays a role in the mitigation of bias in this problem.

In the R environment, the "aif360" library is used, which includes all the metrics and capabilities provided by the IBM AI Fairness 360 project. The library is loaded into the R environment along with the binary data set from Figure 5. To run any metric calculations with this library, R data frames must be converted into an aif dataset, which asks for the protected attribute, the privileged (i.e., reference) and unprivileged values of the protected attribute, and the target variable. For our case, the target variable is the "Operating Expense Level High" column. To reiterate, a value of 1 is given in this column if the observation is considered to have "High" operating expenses, i.e., operating expenses of more than $1,000,000,000. The protected attribute in this project is the diversity index by county column that was added as a piece of domain knowledge. To capture the nature of the protected attribute, the privileged group consists of observations that have a value of 0 ("Very Low" and "Low" diversity indices), and the unprivileged group consists of observations that have a value of 1 ("Moderate" and "High" diversity indices).
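A sketch of the equivalent dataset construction and group-fairness metrics in AIF360's Python API (the paper uses the aif360 R wrapper); binary and its column names come from the earlier hypothetical listing:

```python
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Wrap the binary frame as an AIF360 dataset: label, protected attribute,
# and which label value counts as the favorable outcome.
dataset = BinaryLabelDataset(
    df=binary,
    label_names=["Operating_Expense_Level_High"],
    protected_attribute_names=["High_Diversity_Index_County"],
    favorable_label=1,
    unfavorable_label=0,
)

# Privileged: "Very Low"/"Low" diversity counties; unprivileged: "Moderate"/"High".
privileged = [{"High_Diversity_Index_County": 0}]
unprivileged = [{"High_Diversity_Index_County": 1}]

metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())
```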
The IBM AI Fairness 360 library uses underlying classification models to help develop and calculate fairness metrics. Since it uses classification models, we need two data sets so that the true data can be compared with the predicted data. Thus, we have one aif data set holding the raw binary data, and another that is nearly identical except that the "Operating Expense Level High" variable was predicted by a simple logistic regression model (this is called the newly classified dataset). The reweighing technique (Kamiran and Calders 2012; AIF360 2021), which modifies the weights of different training instances, is used to help mitigate any bias present in this project. The reweighing algorithm is applied to both the original binary data set and the classified data set. Once both data sets are reweighed, the fairness metrics can be calculated and compared against the original data. Graphs are produced to show the difference and improvement after bias is mitigated through reweighing. Figures 6, 7, 8, and 9 show the comparison of fairness metrics between the original data and the reweighed data with bias mitigated.
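Continuing the sketch above: building the newly classified dataset with a simple logistic regression, computing the classification-based metrics, and applying reweighing; the feature handling is deliberately simplified here:

```python
from sklearn.linear_model import LogisticRegression
from aif360.algorithms.preprocessing import Reweighing
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# The "newly classified" dataset: identical structure, but the label is
# predicted by a simple logistic regression (feature handling simplified).
clf = LogisticRegression(max_iter=1000).fit(dataset.features,
                                            dataset.labels.ravel())
dataset_pred = dataset.copy(deepcopy=True)
dataset_pred.labels = clf.predict(dataset.features).reshape(-1, 1)

# True-vs-predicted metrics: equal opportunity difference and Theil index.
cm = ClassificationMetric(dataset, dataset_pred,
                          unprivileged_groups=unprivileged,
                          privileged_groups=privileged)
print("Equal opportunity difference:", cm.equal_opportunity_difference())
print("Theil index:", cm.theil_index())

# Reweighing assigns instance weights instead of changing class labels
# (Kamiran and Calders 2012); the metrics are then recomputed on the
# reweighed data, which is weight-aware.
rw = Reweighing(unprivileged_groups=unprivileged,
                privileged_groups=privileged)
dataset_rw = rw.fit_transform(dataset)
rw_metric = BinaryLabelDatasetMetric(dataset_rw,
                                     unprivileged_groups=unprivileged,
                                     privileged_groups=privileged)
print("SPD after reweighing:", rw_metric.statistical_parity_difference())
```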
Calculating all four desired fairness metrics shows that mitigating bias through reweighing leaves each metric either unchanged or slightly improved. As seen in all graphs, both the original data and the mitigated data are within the fair range. Statistical parity difference (i.e., discrimination) was reduced from 0.051 to 0.035 using domain knowledge (see Figure 6). Statistical parity, also called demographic parity, ensures each group has an equal probability of being assigned to the positive predicted class.

By mitigating bias, we can produce fairness metrics that are closer to true fairness, which is a value of 0 for statistical parity difference, equal opportunity difference, and the Theil index, and a value of 1 for disparate impact. Currently, we are infusing the diversity index as domain knowledge. In the future, however, we would also like to investigate infusible domain knowledge further by examining other criteria such as native language spoken and family income.

Figure 6: Statistical parity difference of original vs mitigated data

Figure 7: Disparate impact of original vs mitigated data

Figure 8: Equal opportunity difference of original vs mitigated data

Figure 9: Theil index of original vs mitigated data

Contributions and Future Works

By investigating the implications of domain knowledge for fair decision-making, this work explores how true fairness in AI can be achieved within the application of public funding allocation. This work investigates how
federal agencies like the FTA could apply AI in the process of allocating funds. In general, the allocation of FTA funds corresponds to the population in an area (i.e., UZA). However, we found that areas with a higher diversity index have higher public transit ridership. Our proposed domain knowledge infused approach can reduce the statistical parity difference, which helps ensure each group has an equal probability of being assigned to the positive predicted class. Finding the right domain knowledge is very challenging. Going forward, we want to incorporate and investigate the impact of other protected variables (e.g., native language spoken, family income) and find a way to enhance the infusible domain knowledge that reduces different disparities. An increase in public transit ridership has the potential to reduce the use of personal vehicles as well as the carbon footprint. A quantitative analysis of this possibility could be another direction of research.

References

[1] Giorgis, J. D. (2020). Federal Funding Allocation [Dataset]. United States Department of Transportation, FTA Federal Funding Allocation Since 2014. https://catalog.data.gov/dataset/federal-funding-allocation
[2] Title VI of the Civil Rights Act of 1964, Pub. L. 88-352, 78 Stat. 241 (1964).
[3] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1-35.
[4] Benthall, S., & Haynes, B. D. (2019, January). Racial categories in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 289-298).
[5] Mallett, W. J. (2018, March). Trends in public transportation ridership: Implications for federal policy (No. R45144). Congressional Research Service.
[6] U.S. Census Bureau. (2021, October 14). Racial and ethnic diversity in the United States: 2010 census and 2020 census. Census.gov. Retrieved November 22, 2021, from https://census.gov/library/visualizations/interactive/racial-and-ethnic-diversity-in-the-united-states-2010-and-2020-census.html
[7] Jensen, E., Jones, N., Rabe, M., Pratt, B., Medina, L., Orozco, K., & Spell, L. (2021, August 12). The chance that two people chosen at random are of different race or ethnicity groups has increased since 2010. Census.gov. Retrieved November 24, 2021, from https://www.census.gov/library/stories/2021/08/2020-united-states-population-more-racially-ethnically-diverse-than-2010.html
[8] Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33.
[9] aif360.algorithms.preprocessing.Reweighing - AIF360 0.4.0 documentation. (n.d.). Retrieved November 24, 2021, from https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.preprocessing.Reweighing.html
[10] Riegg, S. K. (2008). Causal inference and omitted variable bias in financial aid research: Assessing solutions. The Review of Higher Education, 31(3), 329-354.
[11] Mustard, D. B. (2003). Reexamining criminal behavior: The importance of omitted variable bias. Review of Economics and Statistics, 85(1), 205-211.
[12] Clarke, K. A. (2005). The phantom menace: Omitted variable bias in econometric research. Conflict Management and Peace Science, 22(4), 341-352.
[13] Suresh, H., & Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.
[14] Omi, M., & Winant, H. (2014). Racial formation in the United States. Routledge.
[15] AI Fairness 360. (2021). Retrieved November 25, 2021, from https://aif360.mybluemix.net/
[16] Federal Transit Administration Office of Budget and Policy. (2021, December 13). National Transit Database 2021 policy manual. Retrieved January 24, 2022, from https://www.transit.dot.gov/sites/fta.dot.gov/files/2021-12/2021-NTD-Reduced-Reporting-Policy-Manual_1-1.pdf