=Paper= {{Paper |id=Vol-2812/RDAI-2021_paper_5 |storemode=property |title=The Limits of Global Inclusion in AI Development |pdfUrl=https://ceur-ws.org/Vol-2812/RDAI-2021_paper_5.pdf |volume=Vol-2812 |authors=Alan Chan,Chinasa T. Okolo,Zachary Terner,Angelina Wang }} ==The Limits of Global Inclusion in AI Development== https://ceur-ws.org/Vol-2812/RDAI-2021_paper_5.pdf
                              The Limits of Global Inclusion in AI Development

                   Alan Chan*1, Chinasa T. Okolo*2, Zachary Terner*3, Angelina Wang*4
                   1 Mila, Université de Montréal   2 Cornell University
                   3 National Institute of Statistical Sciences   4 Princeton University
                   alan.chan@mila.quebec, chinasa@cs.cornell.edu, zterner@niss.org, angelina.wang@princeton.edu



* Author order is alphabetical by last name. All authors contributed equally to this work.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                               Abstract

Those best-positioned to profit from the proliferation of artificial intelligence (AI) systems are those with the most economic power. Extant global inequality has motivated Western institutions to involve more diverse groups in the development and application of AI systems, including hiring foreign labour and establishing extra-national data centres and laboratories. However, given both the propensity of wealth to abet its own accumulation and the lack of contextual knowledge in top-down AI solutions, we argue that more focus should be placed on the redistribution of power, rather than just on including underrepresented groups. Unless more is done to ensure that opportunities to lead AI development are distributed justly, the future may hold only AI systems which are unsuited to their conditions of application and which exacerbate inequality.

                               Introduction

The arm of global inequality is long, rendering itself visible especially in the development of artificial intelligence (AI). In an analysis of publications at two major machine learning conference venues, NeurIPS 2020 and ICML 2020, Chuvpilo (2020) found that of the top 10 countries in terms of publication index, none were located in Latin America, Africa, or Southeast Asia. Vietnam, the highest placing country of these groups, comes in 27th place. Of the top 10 institutions by publication index, eight were based in the United States, including American tech giants like Google, Microsoft, and Facebook. Indeed, the full lists of the top 100 universities and top 100 companies by publication index include no companies or universities based in Africa or Latin America. Although conference publications are just one metric, they remain the predominant medium in which progress in AI is disseminated, and as such serve as a signal of who is generating research.

These statistics are unsurprising. The predominance of the United States in these rankings is consistent with its economic and cultural dominance, just as the appearance of China with the second highest index is a marker of its growing might. Also comprehensible is the relative absence of countries in the Global South, given the exploitation and underdevelopment of these regions by European colonial powers (Frank 1967; Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012).

Current global inequality in AI development involves both a concentration of profits and a danger of ignoring the contexts to which AI is applied. As AI systems become increasingly integrated into society, those responsible for developing and implementing such systems stand to profit to a large extent. If these players are predominantly located outside of the Global South, a disproportionate share of economic benefit will also fall outside of this region, exacerbating extant inequality. Furthermore, the ethical application of AI systems requires knowledge of the contexts in which they are to be applied. As recent work (Grush 2015; De La Garza 2020; Coalition for Critical Technology 2020; Beede et al. 2020; Sambasivan et al. 2020) has highlighted, work that lacks this contextual knowledge can fail to help the targeted individuals, and can even harm them (e.g., misdiagnoses in medical applications).

Whether explicitly in response to these problems or not, calls have been made for broader inclusion in the development of AI (Asemota 2018; Lee et al. 2019). At the same time, some have acknowledged the limitations of inclusion. Sloane et al. (2020) describe and argue against participation-washing, whereby the mere fact that somebody has participated in a project lends it moral legitimacy. In this work, we examine the implications of participation for global inequality, focusing particularly on the limited ways in which inclusion in AI development is practised in the Global South. We look specifically at how this plays out in the domains of datasets and research labs, and conclude with a discussion of opportunities for ameliorating the power imbalance in AI development.

                               Datasets

Given the centrality of large amounts of data in today’s machine learning systems, there would appear to be substantial opportunity for inclusion in data collection and labeling processes. While there are benefits to more diverse participation in data-gathering pipelines (that is, processes involved in the collection, labeling, and other processing of data for use in machine-learning systems), we will highlight how this approach does not go far enough in addressing global inequality in AI development.

Data collection itself is a practice fraught with problems of inclusion and representation. Two large, publicly available image datasets, ImageNet (Deng et al. 2009; Russakovsky et al. 2015) and OpenImages (Krasin et al. 2017), are US- and Eurocentric (Shankar et al. 2017). Shankar et al. (2017) further argue that models trained on these datasets perform worse on images from the Global South. For example, images of grooms are classified with lower accuracy when they come from Ethiopia and Pakistan, compared to images of grooms from the United States. In this vein, DeVries et al. (2019) show that images of the same word, like “wedding” or “spices”, look very different when queried in different languages, as they are presented distinctly in different cultures. Thus, publicly available object recognition systems fail to correctly classify many of these objects when they come from the Global South. A representative dataset is crucial to allowing models to learn how certain objects and concepts are represented in different cultures.

Since many deep learning techniques require large amounts of data to train their models, the importance of data labeling has grown. The data collection and labeling market is expected to grow to $6.5 billion USD by 2027 (Grand View Research 2020), while Cognilytica (2019) estimates that over 80% of the machine learning development process consists of data preparation tasks (collection, cleaning, and labeling). Large tech companies such as Uber and Alphabet rely heavily on these services, with some paying millions of dollars monthly (Synced 2019).

At the same time, data labeling is a time-consuming, repetitive process. Its importance in machine-learning research and development has led to the crowdsourcing of this work, whereby anonymous individuals are remunerated for completing it. A major venue for crowdsourcing work is Amazon Mechanical Turk; according to Difallah, Filatova, and Ipeirotis (2018), fewer than 2% of Mechanical Turk workers come from the Global South (a vast majority come from the USA and India). Other notable companies in this domain, Samasource, Scale AI, and Mighty AI, also operate in the United States, but they crowdsource workers from around the world, primarily relying on low-wage workers from sub-Saharan Africa and Southeast Asia (Murgia 2019). This leads to a significant disparity between the millions in profits earned by data labeling companies and worker earnings; for example, workers at Samasource earn around $8 USD a day (Lee 2018) while the company made $19 million in 2019 (sam 2021). While Lee (2018) notes that $8 USD may well be a living wage in certain areas, the massive profit disparity remains despite the importance of these workers to the core businesses of these companies. Additionally, many of these workers are contributing to AI systems that are likely to be biased against underrepresented populations in the locales they are deployed in (Buolamwini and Gebru 2018; Obermeyer et al. 2019) and may not be directly benefiting their local communities. While data labeling is not as physically intensive as traditional factory labor, workers report the pace and volume of their tasks as “mentally exhausting” and “monotonous” due to the strict requirements needed for labeling images, videos, and audio to client specifications (Gent 2019; Croce and Musa 2019). In the Global South, local companies have recently begun to proliferate, like Fastagger in Kenya, Sebenz.ai in South Africa, and Supahands in Malaysia. As AI development continues to scale, the expansion of these companies opens the door for low-skilled laborers to enter the workforce but also presents a chance for exploitation to continue.

Barriers to Participation Several barriers exist to participating in data labeling. The most obvious is that a computing device and stable internet access are required for access to these data labeling platforms. These goods are highly correlated with socioeconomic status and geographic location, and thus serve as a barrier to participation for many (Harris, Straker, and Pollock 2017). A reliable internet connection is necessary for finding tasks to complete, completing those tasks, and accessing the remuneration for those tasks. Further, those in the Global South pay higher prices for Internet access compared to their counterparts in the Global North (i.e., Western countries) (Nzekwe 2019). Another barrier is the method of payment for data labeling services on some of these platforms. For example, Amazon Mechanical Turk, a widely used platform for finding data labelers, only allows payment to a U.S. bank account or in the form of an Amazon.com gift card (Amazon 2020). These methods of payment may not be what a worker desires, and can serve as a deterrent to working for this platform.

Problems with Participation Although global inclusion in the data pipeline can be beneficial, it is no panacea for global inequality in AI development, and in fact can even be detrimental if not approached with care. The development of AI is highly concentrated in countries in the Global North for a variety of reasons, such as an abundance of capital, well-funded research institutions, and technical infrastructure. The existence of these advantageous conditions is inextricable from the history of colonial exploitation of the Global South, whereby European states plundered labour and capital for the benefit of the metropoles, to the detriment of the colonized (Frank 1967; Rodney 1972). A key justification for this exploitation was white supremacy: the colonized, as “uncivilized”, were held to be most fit to perform physically excruciating labour, at wages lower than those paid to Europeans. As such, colonized peoples were for the most part prevented from engaging in the more lucrative businesses of insurance, banking, industry, and trading (Rodney 1972). Although the labour and natural capital of colonized nations were indispensable to European economic projects, European institutions and individuals captured the vast majority of this wealth.

It is instructive to view inclusion in the data pipeline as a continuation of this exploitative history. With respect to data collection, current practices can neglect consent and poorly represent areas of the Global South. Image datasets are often collected without consent from the people involved, even in pornographic contexts (Prabhu and Birhane 2020; Paullada et al. 2020), while others (e.g., companies, end-users) benefit from their use. Jo and Gebru (2020) suggest drawing from the long tradition of archives when collecting data because
this is a discipline that has already been thinking about challenges like consent and privacy. Indeed, beyond a possible honorarium for participation in the data collection process, no large-scale, successful schema currently exists for compensating users for the initial and continued use of their data in machine-learning systems, although some efforts are currently underway (Kelly 2020). However, the issue of compensation elides the question of whether such large-scale data collection should occur in the first place. Indeed, the process of data collection can contribute to an “othering” of the subject and cement inaccurate or harmful beliefs. Even if data come from somewhere in the Global South, they are often from the perspective of an outsider (Wang, Narayanan, and Russakovsky 2020). That the outsider may not understand the context or may have an agenda counter to the interest of the subject is reflected in the data captured, as has been extensively studied in the case of photography (Ranger 2001; Batziou 2011; Thompson 2016). Ignorance of context can cause harm, as Sambasivan et al. (2020) discuss in the case of fair ML in India, where distortions in the data (e.g., a given sample corresponds to multiple individuals because of shared device usage) distort the meaning of fairness definitions that were formulated in Western contexts. Furthermore, the history of phrenology reveals the role that the measurement and classification of colonial subjects had in justifying domination (Bank 1996; Poskett 2013). Denton et al. (2020) point out the need to interrogate more deeply the norms and values behind the creation of datasets, as they are often extractive processes that benefit only the dataset collector and users.

As another significant part of the data collection pipeline, data labeling is an extremely low-paying job involving rote, repetitive tasks that offer no room for upward mobility. Individuals may not require many technical skills to label data, but they do not develop any meaningful technical skills either. The anonymity of platforms like Amazon’s Mechanical Turk inhibits the formation of social relationships between the labeler and the client that could otherwise have led to further educational opportunities or better remuneration. Although data is central to the AI systems of today, data labelers receive only a disproportionately tiny portion of the profits of building these systems. In parallel with colonial projects of resource extraction, data labeling as extraction of meaning from data is no way out of a cycle of colonial dependence.

The people doing the work of data labeling have been termed “ghost-workers” (Gray and Suri 2019). The labour of these unseen workers generates massive profits that others capture. While our following discussion provides US statistics because those are the ones most readily available, it is easy to imagine similar or worse labour situations in the Global South. ImageNet (Deng et al. 2009; Russakovsky et al. 2015), a benchmark dataset essential to recent progress in computer vision, would not have been possible without the work of data labelers (Gershgorn 2017). However, the workers themselves made a median of only around $2 USD per hour, with only 4% making more than the US federal minimum wage of $7.25/hour (Hara et al. 2018), itself a far cry from a living wage. The study attributed much of this low-wage structure to the time spent on activities that were not compensated, such as finding tasks or working on tasks that are ultimately rejected. This leads into another major problem, the power dynamics on a platform like Amazon Mechanical Turk, where all of the power is given to the requester of the task. Requesters have the power to set any price they want (as low as $.01), reject the completed work of a worker, and misleadingly claim their task will take a length of time much shorter than what it would actually take (Semuels 2018). In the US, workers in this business are considered independent contractors rather than employees, so protections guaranteed by the Fair Labor Standards Act do not apply. A similar lack of protections can be seen for data labelers in the Global South (Kaye 2019). This power imbalance emphasizes the need for labor protection.

                               Research Labs

Establishing research labs has been essential for major tech companies to advance the development of their respective technologies while providing valuable contributions to the field of computer science (Nature 1915). In the United States, General Electric (GE) Research Laboratory is widely accepted as the first industrial research lab, providing early technological achievements to GE and establishing the company as a leader in industrial innovation (Center 2011). As artificial intelligence becomes more important to the bottom lines of many large tech companies, industrial research labs focused solely on artificial intelligence and its applications have been spun out. Companies from Google to Amazon to Snapchat have doubled down in this field and opened up labs leveraging artificial intelligence for web search, language processing, video recognition, voice applications, and much more. As AI becomes increasingly integrated into the livelihoods of consumers around the world, tech companies have recognized the importance of democratizing AI development and moving it outside the bounds of the Global North. Of five notable tech companies developing AI solutions (Google, Microsoft, IBM, Facebook, and Amazon), Google, Microsoft, and IBM have research labs in the Global South, and all five have development centers, customer support centers, or data centers within these regions. Despite their presence throughout the Global South, AI research centers tend to be concentrated in certain countries. Within Southeast Asia, the representation of lab locations is limited to India; in South America, representation is limited to Brazil. In sub-Saharan Africa we find a bit more spread in location, with AI labs established in Accra, Ghana; Nairobi, Kenya; and Johannesburg, South Africa.

Barriers to Participation For a company to choose to establish an AI research center, the company must believe this initiative to be in its financial interest. Unfortunately, several barriers exist. The necessity of generating reliable returns for shareholders precludes ventures that appear too risky, especially for smaller companies. The perception of risk can take a variety of forms and may be influenced by stereotypes to differing extents. Two such factors are political or economic instability and a relatively lower proportion of tertiary formal education in the local population, which can be traced to the history of colonial exploitation and underdevelopment (Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012), whereby European colonial powers extracted labour, natural resources, and economic surplus from colonies, while at the same time subordinating their economic development to that of the metropoles. It is hard to imagine the establishment of a top-tier research university, with the attendant technical training afforded to the local populace, in regions repeatedly denuded of wealth.

Problems with Participation While the opening of data centers and AI research labs in the Global South appears beneficial for the local workforce, these positions may require technical expertise which the local population might not have. This would instead introduce opportunities for displacement by those from the Global North who have had more access to the specialized training needed to develop, maintain, and deploy AI systems. Given the unequal distribution of AI development globally, it is common for AI researchers and practitioners to work and study in places outside of their home countries (i.e., outside of the Global South). For example, the current director of Google AI Accra, originally from Senegal, was recruited to Google from Facebook AI Research in Menlo Park, CA (Adekanmbi 2018; Asemota 2018). The director of Microsoft’s new lab in Nairobi, Kenya, was recruited from Microsoft Research India; before that, she was a research scientist at Xerox in France (O’Neill 2020; Research 2020). While the directors of many research labs established in the Global South have experience working in related contexts, we find that local representation is sorely lacking at both the leadership and general workforce levels. Grassroots AI education and training initiatives by communities such as Deep Learning Indaba, Data Science Africa, and Khipu AI in Latin America aim to increase local AI talent, but since these initiatives are less than five years old, it is hard to measure their current impact on improving the pipeline of AI researchers and machine learning engineers. However, with the progress made by these organizations in publishing novel research at premier AI conferences, hosting conferences of their own, and much more, the path to inclusive representation in the global AI workforce is strengthening.

Although several tech companies have established research facilities across the world and in the Global South, these efforts remain insufficient to address long-term problems in the AI ecosystem. A recent report from Georgetown University’s Center for Security and Emerging Technology (CSET) describes the establishment of AI labs abroad by US companies, namely Facebook, Google, IBM, and Microsoft (Heston and Zwetsloot 2020). The report notes that while 68% of the 62 AI labs are located outside of the United States, 68% of the staff are located within the United States. Therefore, the international offices remain half as populated on average relative to the domestic locations. Additionally, none of these offices are located in South America and only four are in Africa. To advance equity within AI and improve inclusion efforts, it is imperative that companies not only establish locations in underrepresented regions, but also hire employees and include voices from those regions in a proportionate manner.

The CSET report also notes that AI labs generally form abroad in one of three ways: through the acquisition of startups; by establishing partnerships with local universities or institutions; and by relocating internal staff or hiring new staff in these locations (Heston and Zwetsloot 2020). The first two of these methods may favor locations with an already-established technological or AI presence, as many AI startups are founded in locations where a financial and technological support system exists for them. Similarly, the universities with whom tech companies choose to partner are often already leaders in the space, as evidenced by Facebook’s partnership with Carnegie Mellon professors and MIT’s partnerships with both IBM and Microsoft. The general strategy of partnering with existing institutions and of acquiring startups has the potential to reinforce existing inequities by investing in locations with already thriving tech ecosystems. One notable exception to this is Google’s investment into infrastructure, skills training, and startups in Ghana (Asemota 2018). Long-term investment and planning in the Global South can form the stepping stones for broadening AI to include underrepresented and marginalized communities.

Even with long-term investment into regions in the Global South, the question remains of whether local residents are provided opportunities to join management and contribute to important strategic decisions. Several organizations have emphasized the need for AI development within a country to happen at the grassroots level, so that those implementing AI as a solution understand the context of the problem being solved (Mbayo 2020; Gul 2019). Indigenous decision-making is just as important in negotiating the values that AI technologies are to instantiate, such as through AI ethics declarations that are at the moment heavily Western-based (Jobin, Ienca, and Vayena 2019). Although this is critical not only to the success of individual AI solutions but also to equitable participation within the field at large, more can and should be done. True inclusion necessitates that underrepresented voices can be found in all ranks of a company’s hierarchy, including in positions of upper management. Tech companies which are establishing a footprint in these regions are uniquely positioned to offer this opportunity to natives of the region. Taking advantage of this ability will be critical to ensuring that the benefits of AI apply not only to technical problems that arise in the Global South, but also to socioeconomic inequalities which persist around the world.

                               Opportunities

In the face of global inequality in AI development, there are a few promising opportunities.

Affinity Groups While AI and technology in general have long excluded marginalized populations, grassroots efforts by organizations to ensure that indigenous communities are actively involved as stakeholders of AI have recently emerged with strength. Black in AI, a nonprofit organization with worldwide membership, was founded to increase
the global representation of Black-identifying students, researchers, and practitioners in the field of AI, and has made significant progress in increasing the number of Black scholars attending and publishing in NeurIPS and other premier AI conferences (Earl 2020; Silva 2021). Inclusion in AI is extremely sparse in higher education, and recent efforts by Black in AI have focused on instituting programming to support members in graduate programs and in their postgraduate careers. Other efforts such as Khipu AI, based in Latin America, have been established to provide a venue to train aspiring AI researchers in advanced machine learning topics, foster collaborations, and actively participate in how AI is being used to benefit Latin America. Other communities based on the African continent, such as Data Science Africa and Deep Learning Indaba, have expanded their efforts, establishing conferences, workshops, and dissertation awards, and developing curricula for the broader African AI community. These communities are clear about their respective missions and the focus of collaboration. Notably, Masakhane, a grassroots organization focusing on improving the representation of African languages in the field of natural language processing, shares the sentiment expressed in this paper on how AI research should be approached:

   Masakhane are not just annotators or translators. We are researchers. We can likely connect you with annotators or translators but we do not support shallow engagement of Africans as only data generators or consumers (Masakhane 2021).

As these initiatives grow across the Global South, we hope large organizations and technology companies partner with and adopt the values of these respective initiatives to ensure AI developments are truly representative of the global populace.

Research Participation. One key component of AI inclusion efforts should be to elevate the involvement and participation of those historically excluded from technological development. Many startups and several governments across the Global South are creating opportunities for local communities to participate in the development and implementation of AI programs (Mbayo 2020; Gul 2019; Galperin and Alarcon 2018). In situations where the central involvement has been data labeling, strides should be taken to add model development roles to the opportunity catalog there. Currently, data labelers are often wholly detached from the rest of the ML pipeline, with workers oftentimes not knowing how their labor will be used or for what purpose (Graham 2018). Little sense of fulfillment comes from menial tasks, and when these workers are exploited solely for the knowledge they produce, without being brought into the fold of the product they are helping to create, a deep chasm exists between workers and the downstream product (Rogstadius et al. 2011). Thus, in addition to policy that improves work conditions and wages for data labelers, workers should be provided with education opportunities that allow them to contribute to the models they are building in ways beyond labeling (Gray and Suri 2019). Similarly, where participation in the form of model development is the norm, employers should seek to involve local residents in the ranks of management and in the process of strategic decision-making. The advancement of an equitable AI workforce and ecosystem requires that those in positions of data collection and training be afforded opportunities to lead their organizations. Including these voices in positions of power has the added benefit of ensuring the future hiring and promotion of local community members.

AI as Development. The massive inequalities in the development of AI can appear daunting. Will it ever be possible to close the gap? Similar concerns arise in the broader study of economic development, from which one can draw lessons.

Despite the large developmental gap between the Global North and the Global South, the latter part of the 20th century saw some countries bridge it. For example, while the GDP per capita of South Korea was far lower than that of the USA in the 1960s, by 2000 the gap had considerably narrowed, especially in comparison to world GDP per capita over the same time period.¹ Much work (Chang 2009; Lin 2011; Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014) has linked the relative economic success of South Korea to the policy of import substitution industrialization (ISI), whereby a country replaces foreign imports with domestic production in an attempt to build high-productivity industries (e.g., electronics), rather than rely on exports of low-productivity industries (e.g., agriculture). The idea is that once the so-called “infant industries” have developed enough, they will be able to compete in international markets without government support. The execution of ISI involves protectionist trade policies, subsidies for targeted industries, and sufficient investment in education and infrastructure. While ISI can be incredibly successful, as in the cases of Samsung and POSCO from South Korea (Chang 2009), its execution relies on sufficient agricultural input and human capital, careful management of foreign reserves, and state capacity for coordination with private partners (Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014). In the absence of these factors, ISI can fail and the country can even go through de-industrialization.

¹ https://ourworldindata.org/grapher/average-real-gdp-per-capita-across-countries-and-regions?time=1869..2016&country=KOR~USA~OWID_WRL

We suggest viewing AI development as a path forward for economic development, in light of the lessons learned from ISI policies. Rather than rely upon foreign construction of AI systems for domestic application, where any returns from these systems are not reinvested domestically, we encourage the formation of domestic AI development activity. This development activity should not be focused on low-productivity activities, such as data labeling, but instead on high-productivity activities like model development/deployment and research. An AI-focused ISI policy could include state-led investments into AI-related education and infrastructure, funding for private bodies to engage in domestic AI development, and limitations on the extent to which foreign companies may be involved in or profit from domestic AI activities. While it remains essential, as it was in historical ISI policies, to work with and assimilate technology and expertise from foreign companies, it is imperative that domestic expertise be developed in tandem to shape the future of AI development and reap its large profits.

This is by no means an easy task, and an AI-focused ISI policy encounters many of the same difficulties as historical ISI policies, such as the necessity of bringing in expertise and technology and of ensuring that sufficient education and infrastructure (e.g., internet access) exist. It will likely encounter many new difficulties that are unique to AI development as well. Even in the absence of centralized state coordination, however, recent initiatives like Deep Learning Indaba and Khipu have promoted the importance of indigenous AI development and have advanced education in AI.

                        Conclusion

As the development of artificial intelligence continues to progress across the world, the exclusion of those from communities most likely to bear the brunt of algorithmic inequity only stands to worsen. We address this problem by exploring the challenges and benefits of broadening inclusion in the field of AI. We examine the limits of current AI inclusion methods and the problems of participation regarding AI labs that major tech companies have situated in the Global South, and we discuss opportunities for AI to accelerate development within disadvantaged regions.

We hope the actions we propose can help to begin the movement of communities in the Global South from being just beneficiaries or subjects of AI systems to being active, engaged participants. Giving these communities true agency over the AI systems integrated into their livelihoods will maximize the impact of these systems and lead the way for global inclusion in AI.

As a limitation of our work, it is important to acknowledge we are currently all located at, and have been educated at, North American institutions. Our positions in these institutions thus limit our perspective, and we respect the considerations we may have missed and the voices we have not heard in the course of writing this work.

                        References

2021. Samasource. URL https://www.causeiq.com/organizations/samasource,262547062/.
Adekanmbi, B. 2018. 10 inspiring Facts about Moustapha Cisse, Google AI Ghana Pioneer Lead. URL https://www.datasciencenigeria.org/10-inspiring-facts-moustapha-cisse-google-ai-ghana-pioneer-lead/.
Amazon. 2020. FAQs. https://www.mturk.com/worker/help.
Aryeetey, E.; and Moyo, N. 2012. Industrialisation for Structural Transformation in Africa: Appropriate Roles for the State. Journal of African Economies 21(suppl 2): ii85. URL https://econpapers.repec.org/article/oupjafrec/v_3a21_3ay_3a2012_3ai_3asuppl_5f2_3ap_3a-ii85.htm. Publisher: Centre for the Study of African Economies (CSAE).
Asemota, V. 2018. ‘Ghana is the future of Africa’: Why Google built an AI lab in Accra. URL https://edition.cnn.com/2018/07/14/africa/google-ghana-ai/.
Bank, A. 1996. Of ‘Native Skulls’ and ‘Noble Caucasians’: Phrenology in Colonial South Africa. Journal of Southern African Studies 22(3): 387–403. ISSN 0305-7070. URL http://www.jstor.org/stable/2637310. Publisher: [Taylor & Francis, Ltd., Journal of Southern African Studies].
Batziou, A. 2011. Framing ‘otherness’ in press photographs: The case of immigrants in Greece and Spain. Journal of Media Practice 12(1): 41–60. ISSN 1468-2753. doi:10.1386/jmpr.12.1.41_1. URL https://doi.org/10.1386/jmpr.12.1.41_1. Publisher: Routledge. eprint: https://doi.org/10.1386/jmpr.12.1.41_1.
Beede, E.; Baylor, E.; Hersch, F.; Iurchenko, A.; Wilcox, L.; Ruamviboonsuk, P.; and Vardoulakis, L. M. 2020. A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, 1–12. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-6708-0. doi:10.1145/3313831.3376718. URL http://doi.org/10.1145/3313831.3376718.
Bruhn, M.; and Gallego, F. A. 2012. Good, Bad, and Ugly Colonial Activities: Do They Matter for Economic Development? The Review of Economics and Statistics 94(2): 433–461. URL https://ideas.repec.org/a/tpr/restat/v94y2012i2p433-461.html. Publisher: MIT Press.
Buolamwini, J.; and Gebru, T. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, 77–91.
Center, E. T. 2011. General Electric Research Lab. URL https://edisontechcenter.org/GEresearchLab.html.
Chang, H.-J. 2009. Bad Samaritans: The Myth of Free Trade and the Secret History of Capitalism. New York, NY: Bloomsbury Press. ISBN 978-1-59691-598-5.
Chuvpilo, G. 2020. AI Research Rankings 2020: Can the United States Stay Ahead of China? URL https://chuvpilo.medium.com/ai-research-rankings-2020-can-the-united-states-stay-ahead-of-china-61cf14b1216.
Coalition for Critical Technology. 2020. Abolish the #TechToPrisonPipeline. URL https://medium.com/@CoalitionForCriticalTechnology/abolish-the-techtoprisonpipeline-9b5b14366b16.
Cognilytica. 2019. Data Engineering, Preparation, and Labeling for AI. URL https://www.cognilytica.com/2019/03/06/report-data-engineering-preparation-and-labeling-for-ai-2019/.
Croce, N.; and Musa, M. 2019. The new assembly lines: Why AI needs low-skilled workers too. URL https://www.weforum.org/agenda/2019/08/ai-low-skilled-workers/.
De La Garza, A. 2020. States’ Automated Systems Are Trapping Citizens in Bureaucratic Nightmares With Their Lives on the Line. URL https://time.com/5840609/algorithm-unemployment/.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
Denton, E.; Hanna, A.; Amironesei, R.; Smart, A.; Nicole, H.; and Scheuerman, M. K. 2020. Bringing the People Back In: Contesting Benchmark Machine Learning Datasets. ICML Workshop on Participatory Approaches to Machine Learning.
DeVries, T.; Misra, I.; Wang, C.; and van der Maaten, L. 2019. Does Object Recognition Work for Everyone? Computer Vision and Pattern Recognition Workshop (CVPRW).
Difallah, D.; Filatova, E.; and Ipeirotis, P. 2018. Demographics and Dynamics of Mechanical Turk Workers. Proceedings of WSDM: The Eleventh ACM International Conference on Web Search and Data Mining.
Earl, C. C. 2020. Notes from the Black In AI 2019 Workshop. URL https://charlesearl.blog/2020/01/08/notes-from-the-black-in-ai-2019-workshop/.
Frank, A. G. 1967. Capitalism and underdevelopment in Latin America: historical studies of Chile and Brazil. New York: Monthly Review Press.
Galperin, H.; and Alarcon, A. 2018. The Future of Work in the Global South.
Gent, E. 2019. The ‘ghost work’ powering tech magic. URL https://www.causeiq.com/organizations/samasource,262547062/.
Gershgorn, D. 2017. The data that transformed AI research—and possibly the world. Quartz https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/.
Graham, M. 2018. The rise of the planetary labour market – and what it means for the future of work. NS Tech.
Grand View Research. 2020. Data Collection Labeling Market Size Worth 6.5 Billion By 2027. URL https://www.grandviewresearch.com/press-release/global-data-collection-labeling-market.
Gray, M. L.; and Suri, S. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass.
Grush, L. 2015. Google engineer apologizes after Photos app tags two black people as gorillas. URL https://www.theverge.com/2015/7/1/8880363/google-apologizes-photos-app-tags-two-black-people-gorillas.
Gul, E. 2019. Is Artificial Intelligence the frontier solution to Global South’s wicked development challenges? URL https://towardsdatascience.com/is-artificial-intelligence-the-frontier-solution-to-global-souths-wicked-development-challenges-4206221a3c78.
Hara, K.; Adams, A.; Milland, K.; Savage, S.; Callison-Burch, C.; and Bigham, J. 2018. A Data-Driven Analysis of Workers’ Earnings on Amazon Mechanical Turk. ACM Conference on Human Factors in Computing Systems (CHI).
Harris, C.; Straker, L.; and Pollock, C. 2017. A socioeconomic related ‘digital divide’ exists in how, not if, young people use computers. PloS one 12(3): e0175011.
Heston, R.; and Zwetsloot, R. 2020. Mapping U.S. Multinationals’ Global AI R&D Activity. URL https://cset.georgetown.edu/research/mapping-u-s-multinationals-global-ai-rd-activity/.
Jarosz, L. 2003. A Human Geographer’s Response to Guns, Germs, and Steel: The Case of Agrarian Development and Change in Madagascar. Antipode 35(4): 823–828. ISSN 1467-8330. doi:https://doi.org/10.1046/j.1467-8330.2003.00356.x. URL http://onlinelibrary.wiley.com/doi/abs/10.1046/j.1467-8330.2003.00356.x. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1046/j.1467-8330.2003.00356.x.
Jo, E. S.; and Gebru, T. 2020. Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning. ACM Conference on Fairness, Accountability, Transparency (FAccT).
Jobin, A.; Ienca, M.; and Vayena, E. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1(9): 389–399. ISSN 2522-5839. doi:10.1038/s42256-019-0088-2. URL http://www.nature.com/articles/s42256-019-0088-2.
Kaye, K. 2019. These companies claim to provide “fair-trade” data work. Do they? Technology Review URL https://www.technologyreview.com/2019/08/07/133845/cloudfactory-ddd-samasource-imerit-impact-sourcing-companies-for-data-annotation/.
Kelly, M. 2020. Andrew Yang is pushing Big Tech to pay users for data. URL https://www.theverge.com/2020/6/22/21298919/andrew-yang-big-tech-data-dividend-project-facebook-google-ubi.
Krasin, I.; Duerig, T.; Alldrin, N.; Ferrari, V.; Abu-El-Haija, S.; Kuznetsova, A.; Rom, H.; Uijlings, J.; Popov, S.; Veit, A.; Belongie, S.; Gomes, V.; Gupta, A.; Sun, C.; Chechik, G.; Cai, D.; Feng, Z.; Narayanan, D.; and Murphy, K. 2017. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://github.com/openimages.
Lee, D. 2018. Why Big Tech pays poor Kenyans to teach self-driving cars. BBC News.
Lee, M. K.; Kusbit, D.; Kahng, A.; Kim, J. T.; Yuan, X.; Chan, A.; See, D.; Noothigattu, R.; Lee, S.; Psomas, A.; and Procaccia, A. D. 2019. WeBuildAI: Participatory Framework for Algorithmic Governance. Proceedings of the ACM on Human-Computer Interaction 3(CSCW): 181:1–181:35. doi:10.1145/3359283. URL http://doi.org/10.1145/3359283.
Lin, J. Y. 2011. From Flying Geese to Leading Dragons: New Opportunities and Strategies for Structural Transformation in Developing Countries. Technical Report WPS 5702, World Bank.
Masakhane. 2021. Masakhane: A grassroots NLP community for Africa, by Africans. URL https://www.masakhane.io/.
Mbayo, H. 2020. Data and Power: AI and Development in the Global South. URL https://www.oxfordinsights.com/insights/2020/10/2/data-and-power-ai-and-development-in-the-global-south.
Mendes, A. P. F.; Bertella, M. A.; and Teixeira, R. F. A. P. 2014. Industrialization in Sub-Saharan Africa and import substitution policy. Revista de Economia Política 34(1): 120–138. ISSN 0101-3157. doi:10.1590/S0101-31572014000100008. URL http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0101-31572014000100008&lng=en&tlng=en.
Murgia, M. 2019. AI’s new workforce: the data-labelling industry spreads globally. Financial Times.
Nature. 1915. Industrial Research Laboratories. URL https://doi.org/10.1038/096419a0.
Nzekwe, H. 2019. Africans Are Paying More For Internet Than Any Other Part Of The World – Here’s Why. URL https://weetracker.com/2019/10/22/africans-pay-more-for-internet-than-other-regions/.
Obermeyer, Z.; Powers, B.; Vogeli, C.; and Mullainathan, S. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464): 447–453.
O’Neill, J. 2020. Jacki O’Neill — LinkedIn. URL https://www.linkedin.com/in/jacki-o-neill-5605534/.
Paullada, A.; Raji, I. D.; Bender, E. M.; Denton, E.; and Hanna, A. 2020. Data and its (dis)contents: A survey of dataset development and use in machine learning research. arXiv:2012.05345.
Poskett, J. 2013. Django Unchained and the racist science of phrenology | James Poskett. The Guardian ISSN 0261-3077. URL https://www.theguardian.com/science/blog/2013/feb/05/django-unchained-racist-science-phrenology.
Prabhu, V. U.; and Birhane, A. 2020. Large image datasets: A pyrrhic win for computer vision? arXiv:2006.16923.
Ranger, T. 2001. Colonialism, Consciousness and the Camera. Past & Present (171): 203–215. ISSN 0031-2746. URL http://www.jstor.org/stable/3600818. Publisher: [Oxford University Press, The Past and Present Society].
Research, M. 2020. Jacki O’Neill at Microsoft Research. URL https://www.microsoft.com/en-us/research/people/jaoneil/.
Rodney, W. 1972. How Europe underdeveloped Africa. London: Bogle L’Ouverture Publications. ISBN 978-0-9501546-4-0.
Rogstadius, J.; Kostakos, V.; Kittur, A.; Smus, B.; Laredo, J.; and Vukovic, M. 2011. An Assessment of Intrinsic and Extrinsic Motivation on Task Performance in Crowdsourcing Markets. Proceedings of the Fifth International Conference on Weblogs and Social Media.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3): 211–252. doi:10.1007/s11263-015-0816-y.
Sambasivan, N.; Arnesen, E.; Hutchinson, B.; and Prabhakaran, V. 2020. Non-portability of Algorithmic Fairness in India. arXiv:2012.03659 [cs] URL http://arxiv.org/abs/2012.03659. ArXiv: 2012.03659.
Semuels, A. 2018. The Internet Is Enabling a New Kind of Poorly Paid Hell. The Atlantic https://www.theatlantic.com/business/archive/2018/01/amazon-mechanical-turk/551192/.
Shankar, S.; Halpern, Y.; Breck, E.; Atwood, J.; Wilson, J.; and Sculley, D. 2017. No Classification without Representation: Assessing Geodiversity Issues in Open DataSets for the Developing World. NeurIPS workshop: Machine Learning for the Developing World.
Silva, M. 2021. URL https://blackinai.github.io/#/about.
Sloane, M.; Moss, E.; Awomolo, O.; and Forlano, L. 2020. Participation is not a Design Fix for Machine Learning. arXiv:2007.02423 [cs] URL http://arxiv.org/abs/2007.02423. ArXiv: 2007.02423.
Synced. 2019. Data Annotation: The Billion Dollar Business Behind AI Breakthroughs. URL https://medium.com/syncedreview/data-annotation-the-billion-dollar-business-behind-ai-breakthroughs-d929b0a50d23.
Thompson, A. 2016. Otherness and the Fetishization of Subject. URL https://petapixel.com/2016/11/16/otherness-fetishization-subject/.
Wang, A.; Narayanan, A.; and Russakovsky, O. 2020. REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets. European Conference on Computer Vision (ECCV).