The Limits of Global Inclusion in AI Development

Alan Chan*¹, Chinasa T. Okolo*², Zachary Terner*³, Angelina Wang*⁴
¹ Mila, Université de Montréal
² Cornell University
³ National Institute of Statistical Sciences
⁴ Princeton University
alan.chan@mila.quebec, chinasa@cs.cornell.edu, zterner@niss.org, angelina.wang@princeton.edu

* Author order is alphabetical by last name. All authors contributed equally to this work.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Those best-positioned to profit from the proliferation of artificial intelligence (AI) systems are those with the most economic power. Extant global inequality has motivated Western institutions to involve more diverse groups in the development and application of AI systems, including hiring foreign labour and establishing extra-national data centres and laboratories. However, given both the propensity of wealth to abet its own accumulation and the lack of contextual knowledge in top-down AI solutions, we argue that more focus should be placed on the redistribution of power, rather than just on including underrepresented groups. Unless more is done to ensure that opportunities to lead AI development are distributed justly, the future may hold only AI systems which are unsuited to their conditions of application, and exacerbate inequality.

Introduction

The arm of global inequality is long, rendering itself visible especially in the development of artificial intelligence (AI). In an analysis of publications at two major machine learning conference venues, NeurIPS 2020 and ICML 2020, Chuvpilo (2020) found that of the top 10 countries in terms of publication index, none were located in Latin America, Africa, or Southeast Asia. Vietnam, the highest placing country of these groups, comes in 27th place. Of the top 10 institutions by publication index, eight were based in the United States, including American tech giants like Google, Microsoft, and Facebook. Indeed, the full lists of the top 100 universities and top 100 companies by publication index include no companies or universities based in Africa or Latin America. Although conference publications are just one metric, they remain the predominant medium through which progress in AI is disseminated, and as such serve as a signal of who is generating research.

These statistics are unsurprising. The predominance of the United States in these rankings is consistent with its economic and cultural dominance, just as the appearance of China with the second highest index is a marker of its growing might. Also comprehensible is the relative absence of countries in the Global South, given the exploitation and underdevelopment of these regions by European colonial powers (Frank 1967; Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012).

Current global inequality in AI development involves both a concentration of profits and a danger of ignoring the contexts to which AI is applied. As AI systems become increasingly integrated into society, those responsible for developing and implementing such systems stand to profit to a large extent. If these players are predominantly located outside of the Global South, a disproportionate share of economic benefit will also fall outside of this region, exacerbating extant inequality. Furthermore, the ethical application of AI systems requires knowledge of the contexts in which they are to be applied. As recent work (Grush 2015; De La Garza 2020; Coalition for Critical Technology 2020; Beede et al. 2020; Sambasivan et al. 2020) has highlighted, work that lacks this contextual knowledge can fail to help the targeted individuals, and can even harm them (e.g., misdiagnoses in medical applications).

Whether explicitly in response to these problems or not, calls have been made for broader inclusion in the development of AI (Asemota 2018; Lee et al. 2019). At the same time, some have acknowledged the limitations of inclusion. Sloane et al. (2020) describe and argue against participation-washing, whereby the mere fact that somebody has participated in a project lends it moral legitimacy. In this work, we examine the implications of participation for global inequality, focusing particularly on the limited ways in which inclusion in AI development is practised in the Global South. We look specifically at how this plays out in the domains of datasets and research labs, and conclude with a discussion of opportunities for ameliorating the power imbalance in AI development.

Datasets

Given the centrality of large amounts of data in today’s machine learning systems, there would appear to be substantial opportunity for inclusion in data collection and labeling processes. While there are benefits to more diverse participation in data-gathering pipelines (that is, processes involved in the collection, labeling, and other processing of data for use in machine-learning systems), we will highlight how this approach does not go far enough in addressing global inequality in AI development.

Data collection itself is a practice fraught with problems of inclusion and representation. Two large, publicly available image datasets, ImageNet (Deng et al. 2009; Russakovsky et al. 2015) and OpenImages (Krasin et al. 2017), are US- and Eurocentric (Shankar et al. 2017). Shankar et al. (2017) further argue that models trained on these datasets perform worse on images from the Global South. For example, images of grooms are classified with lower accuracy when they come from Ethiopia and Pakistan, compared to images of grooms from the United States. Along this vein, DeVries et al. (2019) show that images of the same word, like “wedding” or “spices”, look very different when queried in different languages, as they are presented distinctly in different cultures. Thus, publicly available object recognition systems fail to correctly classify many of these objects when they come from the Global South. A representative dataset is crucial to allowing models to learn how certain objects and concepts are represented in different cultures.

Since many deep learning techniques require large amounts of data to train their models, the importance of data labeling has grown. The data collection and labeling market is expected to grow to $6.5 billion USD by 2027 (Grand View Research 2020), while Cognilytica (2019) estimates that over 80% of the machine learning development process consists of data preparation tasks (collection, cleaning, and labeling). Large tech companies such as Uber and Alphabet rely heavily on these services, with some paying millions of dollars monthly (Synced 2019).

At the same time, data labeling is a time-consuming, repetitive process. Its importance in machine-learning research and development has led to the crowdsourcing of this work, whereby anonymous individuals are remunerated for completing it. A major venue for crowdsourcing work is Amazon Mechanical Turk; according to Difallah, Filatova, and Ipeirotis (2018), less than 2% of Mechanical Turk workers come from the Global South (a vast majority come from the USA and India). Other notable companies in this domain, Samasource, Scale AI, and Mighty AI, also operate in the United States, but they crowdsource workers from around the world, primarily relying on low-wage workers from sub-Saharan Africa and Southeast Asia (Murgia 2019). This leads to a significant disparity between the millions in profits earned by data labeling companies and worker earnings; for example, workers at Samasource earn around $8 USD a day (Lee 2018) while the company made $19 million in 2019 (Samasource 2021). While Lee (2018) notes that $8 USD may well be a living wage in certain areas, the massive profit disparity remains despite the importance of these workers to the core businesses of these companies. Additionally, many of these workers are contributing to AI systems that are likely to be biased against underrepresented populations in the locales they are deployed in (Buolamwini and Gebru 2018; Obermeyer et al. 2019) and may not be directly benefiting their local communities. While data labeling is not as physically intensive as traditional factory labor, workers report the pace and volume of their tasks as “mentally exhausting” and “monotonous” due to the strict requirements needed for labeling images, videos, and audio to client specifications (Gent 2019; Croce and Musa 2019). In the Global South recently, local companies have begun to proliferate, like Fastagger in Kenya, Sebenz.ai in South Africa, and Supahands in Malaysia. As AI development continues to scale, the expansion of these companies opens the door for low-skilled laborers to enter the workforce but also presents a chance for exploitation to continue to occur.

Barriers to Participation. There are barriers to participating in data labeling. The most obvious is that a computing device and stable internet access are required for access to these data labeling platforms. These goods are highly correlated with socioeconomic status and geographic location, thus serving as a barrier to participation for many (Harris, Straker, and Pollock 2017). A reliable internet connection is necessary for finding tasks to complete, completing those tasks, and accessing the remuneration for those tasks. Further, those in the Global South pay higher prices for Internet access compared to their counterparts in the Global North (i.e., Western countries) (Nzekwe 2019). Another barrier is in the method of payment for data labeling services on some of these platforms. For example, Amazon Mechanical Turk, a widely used platform for finding data labelers, only allows payment to a U.S. bank account or in the form of an Amazon.com gift card (Amazon 2020). These methods of payment may not be what a worker desires, and can serve as a deterrent to working for this platform.

Problems with Participation. Although global inclusion in the data pipeline can be beneficial, it is no panacea for global inequality in AI development, and in fact, can even be detrimental if not approached with care. The development of AI is highly concentrated in countries in the Global North for a variety of reasons, such as an abundance of capital, well-funded research institutions, and technical infrastructure. The existence of these advantageous conditions is inextricable from the history of colonial exploitation of the Global South, whereby European states plundered labour and capital for the benefit of the metropoles, to the detriment of the colonized (Frank 1967; Rodney 1972). A key justification for this exploitation was white supremacy: the colonized, as “uncivilized”, were deemed most fit to perform physically excruciating labour, at wages lower than those paid to Europeans. As such, colonized peoples were for the most part prevented from engaging in the more lucrative businesses of insurance, banking, industry, and trading (Rodney 1972). Although the labour and natural capital of colonized nations were indispensable to European economic projects, European institutions and individuals captured the vast majority of this wealth.

It is instructive to view inclusion in the data pipeline as a continuation of this exploitative history. With respect to data collection, current practices can neglect consent and poorly represent areas of the Global South. Image datasets are often collected without consent from the people involved, even in pornographic contexts (Prabhu and Birhane 2020; Paullada et al. 2020), while others (e.g., companies, end-users) benefit from their use. Jo and Gebru (2020) suggest drawing from the long tradition of archives when collecting data because this is a discipline that has already been thinking about challenges like consent and privacy. Indeed, beyond a possible honorarium for participation in the data collection process, no large-scale, successful schema currently exists for compensating users for the initial and continued use of their data in machine-learning systems, although some efforts are currently underway (Kelly 2020). However, the issue of compensation elides the question of whether such large-scale data collection should occur in the first place. Indeed, the process of data collection can contribute to an “othering” of the subject and cement inaccurate or harmful beliefs. Even if data come from somewhere in the Global South, they are often from the perspective of an outsider (Wang, Narayanan, and Russakovsky 2020). That the outsider may not understand the context or may have an agenda counter to the interest of the subject is reflected in the data captured, as has been extensively studied in the case of photography (Ranger 2001; Batziou 2011; Thompson 2016). Ignorance of context can cause harm, as Sambasivan et al. (2020) discuss in the case of fair ML in India, where distortions in the data (e.g., a given sample corresponds to multiple individuals because of shared device usage) distort the meaning of fairness definitions that were formulated in Western contexts. Furthermore, the history of phrenology reveals the role that the measurement and classification of colonial subjects had in justifying domination (Bank 1996; Poskett 2013). Denton et al. (2020) point out the need to interrogate more deeply the norms and values behind the creation of datasets, as they are often extractive processes that benefit only the dataset collector and users.

As another significant part of the data collection pipeline, data labeling is an extremely low-paying job involving rote, repetitive tasks that offer no room for upward mobility. Individuals may not require many technical skills to label data, but they do not develop any meaningful technical skills either. The anonymity of platforms like Amazon’s Mechanical Turk inhibits the formation of social relationships between the labeler and the client that could otherwise have led to further educational opportunities or better remuneration. Although data is central to the AI systems of today, data labelers receive only a disproportionately tiny portion of the profits of building these systems. In parallel with colonial projects of resource extraction, data labeling as extraction of meaning from data is no way out of a cycle of colonial dependence.

The people doing the work of data labeling have been termed “ghost-workers” (Gray and Suri 2019). The labour of these unseen workers generates massive profits that others capture. While our following discussion provides US statistics because those are the ones most readily available, it is easy to imagine similar or worse labour situations in the Global South. ImageNet (Deng et al. 2009; Russakovsky et al. 2015), a benchmark dataset essential to recent progress in computer vision, would not have been possible without the work of data labelers (Gershgorn 2017). However, the workers themselves earned a median of only around $2 USD per hour, with only 4% making more than the US federal minimum wage of $7.25 per hour (Hara et al. 2018), itself a far cry from a living wage. The study attributed much of this low-wage structure to the time spent on activities that were not compensated, such as finding tasks or working on tasks that are ultimately rejected. This leads into another major problem: the power dynamics on a platform like Amazon Mechanical Turk, where all of the power is given to the requester of the task. Requesters have the power to set any price they want (as low as $0.01), reject the completed work of a worker, and misleadingly claim their task will take a length of time much shorter than what it would actually take (Semuels 2018). In the US, workers in this business are considered independent contractors rather than employees, so protections guaranteed by the Fair Labor Standards Act do not apply. A similar lack of protections can be seen for data labelers in the Global South (Kaye 2019). This power imbalance emphasizes the need for labor protection.

Research Labs

Establishing research labs has been essential for major tech companies to advance the development of their respective technologies while providing valuable contributions to the field of computer science (Nature 1915). In the United States, General Electric (GE) Research Laboratory is widely accepted as the first industrial research lab, providing early technological achievements to GE and establishing it as a leader in industrial innovation (Center 2011). As artificial intelligence becomes more important to the bottom lines of many large tech companies, industrial research labs have been spun out that focus solely on artificial intelligence and its applications. Companies from Google to Amazon to Snapchat have doubled down in this field and opened up labs leveraging artificial intelligence for web search, language processing, video recognition, voice applications, and much more. As AI becomes increasingly integrated into the livelihoods of consumers around the world, tech companies have recognized the importance of democratizing AI development and moving it outside the bounds of the Global North. Of five notable tech companies developing AI solutions (Google, Microsoft, IBM, Facebook, and Amazon), Google, Microsoft, and IBM have research labs in the Global South, and all five have either development centers, customer support centers, or data centers within these regions. Despite their presence throughout the Global South, AI research centers tend to be concentrated in certain countries. Within South and Southeast Asia, the representation of lab locations is limited to India; in South America, representation is limited to Brazil. In sub-Saharan Africa we find a bit more spread in location, with AI labs established in Accra, Ghana; Nairobi, Kenya; and Johannesburg, South Africa.

Barriers to Participation. For a company to choose to establish an AI research center, the company must believe this initiative to be in its financial interest. Unfortunately, several barriers exist. The necessity of generating reliable returns for shareholders precludes ventures that appear too risky, especially for smaller companies. The perception of risk can take a variety of forms and possibly be influenced by stereotypes to differing extents. Two such factors are political or economic instability and a relatively lower proportion of tertiary formal education in the local population, which can be traced to the history of colonial exploitation and underdevelopment (Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012), whereby European colonial powers extracted labour, natural resources, and economic surplus from colonies, while at the same time subordinating their economic development to that of the metropoles. It is hard to imagine the establishment of a top-tier research university, with the attendant technical training afforded to the local populace, in regions repeatedly denuded of wealth.

Problems with Participation. While the opening of data centers and AI research labs in the Global South appears beneficial for the local workforce, these positions may require technical expertise which the local population might not have. This would instead introduce opportunities for displacement by those from the Global North who have had more access to the specialized training needed to develop, maintain, and deploy AI systems. Given the unequal distribution of AI development globally, it is common for AI researchers and practitioners to work and study in places outside of their home countries (i.e., outside of the Global South). For example, the current director of Google AI Accra, originally from Senegal, was recruited to Google from Facebook AI Research in Menlo Park, CA (Adekanmbi 2018; Asemota 2018). The director for Microsoft’s new lab in Nairobi, Kenya, was recruited from Microsoft Research India; before that, she was a research scientist at Xerox in France (O’Neill 2020; Research 2020). While the directors of many research labs established in the Global South have experience working in related contexts, we find that local representation is sorely lacking at both the leadership and general workforce level. Grassroots AI education and training initiatives by communities such as Deep Learning Indaba, Data Science Africa, and Khipu AI in Latin America aim to increase local AI talent, but since these initiatives are less than five years old, it is hard to measure their current impact on improving the pipeline of AI researchers and machine learning engineers. However, with the progress made by these organizations in publishing novel research at premier AI conferences, hosting conferences of their own, and much more, the path to inclusive representation in the global AI workforce is strengthening.

Although several tech companies have established research facilities across the world and in the Global South, these efforts remain insufficient for addressing long-term problems in the AI ecosystem. A recent report from Georgetown University’s Center for Security and Emerging Technology (CSET) describes the establishment of AI labs abroad by US companies, namely Facebook, Google, IBM, and Microsoft (Heston and Zwetsloot 2020). The report notes that while 68% of the 62 AI labs are located outside of the United States, 68% of the staff are located within the United States. The international offices therefore together hold less than half the staff of the domestic locations (32% versus 68%) despite outnumbering them roughly two to one, which implies far smaller labs on average. Additionally, none of these offices are located in South America and only four are in Africa. To advance equity within AI and improve inclusion efforts, it is imperative that companies not only establish locations in underrepresented regions, but hire employees and include voices from those regions in a proportionate manner.

The CSET report also notes that AI labs form abroad generally in one of three ways: through the acquisition of startups; by establishing partnerships with local universities or institutions; and by relocating internal staff or hiring new staff in these locations (Heston and Zwetsloot 2020). The first two of these methods may favor locations with an already-established technological or AI presence, as many AI startups are founded in locations where a financial and technological support system exists for them. Similarly, the universities with which tech companies choose to partner are often already leaders in the space, as evidenced by Facebook’s partnership with Carnegie Mellon professors and MIT’s partnerships with both IBM and Microsoft. The general strategy of partnering with existing institutions and of acquiring startups has the potential to reinforce existing inequities by investing in locations with already thriving tech ecosystems. One notable exception to this is Google’s investment into infrastructure, skills training, and startups in Ghana (Asemota 2018). Long-term investment and planning in the Global South can form the stepping stones for broadening AI to include underrepresented and marginalized communities.

Even with long-term investment into regions in the Global South, the question remains of whether local residents are provided opportunities to join management and contribute to important strategic decisions. Several organizations have emphasized the need for AI development within a country to happen at the grassroots level, so that those implementing AI as a solution understand the context of the problem being solved (Mbayo 2020; Gul 2019). Indigenous decision-making is just as important in negotiating the values that AI technologies are to instantiate, such as through AI ethics declarations that are at the moment heavily Western-based (Jobin, Ienca, and Vayena 2019). Although this is critical not only to the success of individual AI solutions but also to equitable participation within the field at large, more can and should be done. True inclusion necessitates that underrepresented voices can be found in all ranks of a company’s hierarchy, including in positions of upper management. Tech companies which are establishing a footprint in these regions are uniquely positioned to offer this opportunity to natives of the region. Taking advantage of this ability will be critical to ensuring that the benefits of AI apply not only to technical problems that arise in the Global South, but to socioeconomic inequalities which persist around the world.

Opportunities

In the face of global inequality in AI development, there are a few promising opportunities.

Affinity Groups. While AI and technology in general have long excluded marginalized populations, grassroots efforts by organizations to ensure that indigenous communities are actively involved as stakeholders of AI have recently emerged in strength. Black in AI, a nonprofit organization with worldwide membership, was founded to increase the global representation of Black-identifying students, researchers, and practitioners in the field of AI, and has made significant improvements in increasing the number of Black scholars attending and publishing in NeurIPS and other premier AI conferences (Earl 2020; Silva 2021). Inclusion in AI is extremely sparse in higher education, and recent efforts by Black in AI have focused on instituting programming to support members in graduate programs and in their postgraduate careers. Other efforts such as Khipu AI, based in Latin America, have been established to provide a venue to train aspiring AI researchers in advanced machine learning topics, foster collaborations, and actively participate in how AI is being used to benefit Latin America. Other communities based on the African continent such as Data Science Africa and Deep Learning Indaba have expanded their efforts, establishing conferences, workshops, and dissertation awards, and developing curricula for the broader African AI community. These communities are clear about their respective missions and the focus of collaboration. Notably, Masakhane, a grassroots organization focusing on improving the representation of African languages in the field of natural language processing, shares the sentiment expressed in this paper on how AI research should be approached:

    Masakhane are not just annotators or translators. We are researchers. We can likely connect you with annotators or translators but we do not support shallow engagement of Africans as only data generators or consumers (Masakhane 2021).

As these initiatives grow across the Global South, we hope large organizations and technology companies partner with and adopt the values of these respective initiatives to ensure AI developments are truly representative of the global populace.

Research Participation. One key component of AI inclusion efforts should be to elevate the involvement and participation of those historically excluded from technological development. Many startups and several governments across the Global South are creating opportunities for local communities to participate in the development and implementation of AI programs (Mbayo 2020; Gul 2019; Galperin and Alarcon 2018). In situations where the central involvement has been data labeling, strides should be taken to add model development roles to the opportunities available there. Currently, data labelers are often wholly detached from the rest of the ML pipeline, with workers oftentimes not knowing how their labor will be used nor for what purpose (Graham 2018). Little sense of fulfillment comes from menial tasks, and by exploiting these workers solely for their produced knowledge without bringing them into the fold of the product that they are helping to create, a deep chasm exists between workers and the downstream product (Rogstadius et al. 2011). Thus, in addition to policy that improves work conditions and wages for data labelers, workers should be provided with education opportunities that allow them to contribute to the models they are building in ways beyond labeling (Gray and Suri 2019). Similarly, where participation in the form of model development is the norm, employers should seek to involve local residents in the ranks of management and in the process of strategic decision-making. The advancement of an equitable AI workforce and ecosystem requires that those in positions of data collection and training be afforded opportunities to lead their organizations. Including these voices in positions of power has the added benefit of ensuring the future hiring and promotion of local community members.

AI as Development. The massive inequalities in the development of AI can appear daunting. Will it ever be possible to close the gap? Similar concerns arise in the broader study of economic development, from which one can draw lessons. Despite the large developmental gap between the Global North and the Global South, the latter part of the 20th century saw some countries bridge it. For example, while the GDP per capita of South Korea was far lower than that of the USA in the 1960s, by 2000 the gap had considerably narrowed, especially in comparison to world GDP per capita over the same time period.¹ Much work (Chang 2009; Lin 2011; Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014) has linked the relative economic success of South Korea to the policy of import substitution industrialization (ISI), whereby a country attempts to replace foreign imports with domestic production in an attempt to build high-productivity industries (e.g., electronics), rather than rely on exports of low-productivity industries (e.g., agriculture). The idea is that once the so-called “infant industries” have developed enough, they will be able to compete in international markets without government support. The execution of ISI involves protectionist trade policies, subsidies for targeted industries, and sufficient investment in education and infrastructure. While ISI can be incredibly successful, as in the cases of Samsung and POSCO from South Korea (Chang 2009), its execution relies on sufficient agricultural input and human capital, careful management of foreign reserves, and state capacity for coordination with private partners (Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014). In the absence of these factors, ISI can fail and the country can even go through de-industrialization.

¹ https://ourworldindata.org/grapher/average-real-gdp-per-capita-across-countries-and-regions?time=1869..2016&country=KOR~USA~OWID_WRL

We suggest viewing AI development as a path forward for economic development, in light of the lessons learned from ISI policies. Rather than rely upon foreign construction of AI systems for domestic application, where any returns from these systems are not reinvested domestically, we encourage the formation of domestic AI development activity. This development activity should not be focused on low-productivity activities, such as data labeling, but instead on high-productivity activities like model development, deployment, and research. An AI-focused ISI policy could include state-led investments into AI-related education and infrastructure, funding for private bodies to engage in domestic AI development, and limitations on the extent to which foreign companies may be involved in or profit from domestic AI activities. While it remains essential, as it was in historical ISI policies, to work with and assimilate technology and expertise from foreign companies, it is imperative that domestic expertise be developed in tandem to shape the future of AI development and reap its large profits.

This is by no means an easy task, and an AI-focused ISI policy encounters many of the same difficulties as historical ISI policies, such as the necessity of bringing in expertise and technology, and of ensuring that sufficient education and infrastructure (e.g., internet access) exist. It will likely encounter many new difficulties that are unique to AI development as well. Even in the absence of centralized state coordination, however, recent initiatives like Deep Learning Indaba and Khipu have promoted the importance of indigenous AI development and have advanced education in AI.

Conclusion

As the development of artificial intelligence continues to progress across the world, the exclusion of those from communities most likely to bear the brunt of algorithmic inequity only stands to worsen. We address this problem by exploring the challenges and benefits of broadening inclusion in the field of AI. We examine the limits of current AI inclusion methods and the problems of participation regarding AI labs that major tech companies have situated in the Global South, and discuss opportunities for AI to accelerate development within disadvantaged regions.

We hope the actions we propose can help to begin the movement of communities in the Global South from being just beneficiaries or subjects of AI systems to being active, engaged participants. Having true agency over the AI systems integrated into the livelihoods of communities in the Global South will maximize the impact of these systems and lead the way for global inclusion in AI.

As a limitation of our work, it is important to acknowledge we are currently all located at, and have been educated at, North American institutions. Our positions in these institutions thus limit our perspective, and we respect the considerations we may have missed and the voices we have not heard in the course of writing this work.

References

2021. Samasource. URL https://www.causeiq.com/organizations/samasource,262547062/.

Adekanmbi, B. 2018. 10 inspiring Facts about Moustapha Cisse, Google AI Ghana Pioneer Lead. URL https://www.datasciencenigeria.org/10-inspiring-facts-moustapha-cisse-google-ai-ghana-pioneer-lead/.

Amazon. 2020. FAQs. URL https://www.mturk.com/worker/help.

Aryeetey, E.; and Moyo, N. 2012. Industrialisation for Structural Transformation in Africa: Appropriate Roles for the State. Journal of African Economies 21(suppl_2): ii85. URL https://econpapers.repec.org/article/oupjafrec/v_3a21_3ay_3a2012_3ai_3asuppl_5f2_3ap_3a-ii85.htm. Publisher: Centre for the Study of African Economies (CSAE).

Asemota, V. 2018. ‘Ghana is the future of Africa’: Why Google built an AI lab in Accra. URL https://edition.cnn.com/2018/07/14/africa/google-ghana-ai/.

Bank, A. 1996. Journal of Southern African Studies 22(3): 387–403. ISSN 0305-7070. URL http://www.jstor.org/stable/2637310. Publisher: Taylor & Francis, Ltd.

Batziou, A. 2011. Framing ‘otherness’ in press photographs: The case of immigrants in Greece and Spain. Journal of Media Practice 12(1): 41–60. ISSN 1468-2753. doi:10.1386/jmpr.12.1.41_1. URL https://doi.org/10.1386/jmpr.12.1.41_1. Publisher: Routledge.

Beede, E.; Baylor, E.; Hersch, F.; Iurchenko, A.; Wilcox, L.; Ruamviboonsuk, P.; and Vardoulakis, L. M. 2020. A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, 1–12. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-6708-0. doi:10.1145/3313831.3376718. URL http://doi.org/10.1145/3313831.3376718.

Bruhn, M.; and Gallego, F. A. 2012. Good, Bad, and Ugly Colonial Activities: Do They Matter for Economic Development? The Review of Economics and Statistics 94(2): 433–461. URL https://ideas.repec.org/a/tpr/restat/v94y2012i2p433-461.html. Publisher: MIT Press.

Buolamwini, J.; and Gebru, T. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, 77–91.

Center, E. T. 2011. General Electric Research Lab. URL https://edisontechcenter.org/GEresearchLab.html.

Chang, H.-J. 2009. Bad Samaritans: The Myth of Free Trade and the Secret History of Capitalism. New York, NY: Bloomsbury Press. ISBN 978-1-59691-598-5.

Chuvpilo, G. 2020. AI Research Rankings 2020: Can the United States Stay Ahead of China? URL https://chuvpilo.medium.com/ai-research-rankings-2020-can-the-united-states-stay-ahead-of-china-61cf14b1216.

Coalition for Critical Technology. 2020. Abolish the #TechToPrisonPipeline. URL https://medium.com/@CoalitionForCriticalTechnology/abolish-the-techtoprisonpipeline-9b5b14366b16.

Cognilytica. 2019. Data Engineering, Preparation, and Labeling for AI. URL https://www.cognilytica.com/2019/03/06/report-data-engineering-preparation-and-labeling-for-ai-2019/.

Croce, N.; and Musa, M. 2019. The new assembly lines: Why AI needs low-skilled workers too. URL https://www.weforum.org/agenda/2019/08/ai-low-skilled-workers/.

De La Garza, A. 2020. States’ Automated Systems Are Trapping Citizens in Bureaucratic Nightmares With Their Lives on the Line. URL https://time.com/5840609/algorithm-unemployment/.

Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-
Of ’Native Skulls’ and ’Noble Caucasians’: Fei, L. 2009. ImageNet: A Large-Scale Hierarchical Image Phrenology in Colonial South Africa. Journal of Southern Database. In CVPR. Denton, E.; Hanna, A.; Amironesei, R.; Smart, A.; Nicole, Heston, R.; and Zwetsloot, R. 2020. Mapping U.S. H.; and Scheuerman, M. K. 2020. Bringing the Peo- Multinationals’ Global AI Ramp;D Activity. URL ple Back In: Contesting Benchmark Machine Learning https://cset.georgetown.edu/research/mapping-u-s- Datasets. ICML Workshop on Participatory Approaches to multinationals-global-ai-rd-activity/. Machine Learning . Jarosz, L. 2003. A Human Geographer’s Response to Guns, DeVries, T.; Misra, I.; Wang, C.; and van der Maaten, L. Germs, and Steel: The Case of Agrarian Development 2019. Does Object Recognition Work for Everyone? Com- and Change in Madagascar. Antipode 35(4): 823–828. puter Vision and Pattern Recognition Workshop (CVPRW) ISSN 1467-8330. doi:https://doi.org/10.1046/j.1467- . 8330.2003.00356.x. URL http://onlinelibrary.wiley.com/ doi/abs/10.1046/j.1467-8330.2003.00356.x. eprint: Difallah, D.; Filatova, E.; and Ipeirotis, P. 2018. Demo- https://onlinelibrary.wiley.com/doi/pdf/10.1046/j.1467- graphics and Dynamics of Mechanical Turk Workers. Pro- 8330.2003.00356.x. ceedings of WSDM: The Eleventh ACM International Con- ference on Web Search and Data Mining . Jo, E. S.; and Gebru, T. 2020. Lessons from Archives: Strate- gies for Collecting Sociocultural Data in Machine Learning. Earl, C. C. 2020. Notes from the Black In AI 2019 Work- ACM Conference on Fairness, Accountability, Transparency shop. URL https://charlesearl.blog/2020/01/08/notes-from- (FAccT) . the-black-in-ai-2019-workshop/. Jobin, A.; Ienca, M.; and Vayena, E. 2019. The global land- Frank, A. G. 1967. Capitalism and underdevelopment in scape of AI ethics guidelines. Nature Machine Intelligence Latin America : historical studies of Chile and Brazil. New 1(9): 389–399. ISSN 2522-5839. 
doi:10.1038/s42256-019- York: Monthly Review Press. 0088-2. URL http://www.nature.com/articles/s42256-019- Galperin, H.; and Alarcon, A. 2018. The Future of Work in 0088-2. the Global South. Kaye, K. 2019. These companies claim to provide “fair- Gent, E. 2019. The ‘ghost work’ powering tech trade” data work. Do they? Technology Review URL magic. URL https://www.causeiq.com/organizations/ https://www.technologyreview.com/2019/08/07/133845/ samasource,262547062/. cloudfactory-ddd-samasource-imerit-impact-sourcing- companies-for-data-annotation/. Gershgorn, D. 2017. The data that transformed AI research—and possibly the world. Quartz Kelly, M. 2020. Andrew Yang is pushing Big Tech to pay https://qz.com/1034972/the-data-that-changed-the- users for data. URL https://www.theverge.com/2020/6/22/ direction-of-ai-research-and-possibly-the-world/. 21298919/andrew-yang-big-tech-data-dividend-project- facebook-google-ubi. Graham, M. 2018. The rise of the planetary labour market – and what it means for the future of work. NS Tech . Krasin, I.; Duerig, T.; Alldrin, N.; Ferrari, V.; Abu-El-Haija, S.; Kuznetsova, A.; Rom, H.; Uijlings, J.; Popov, S.; Veit, Grand View Research. 2020. Data Collection Label- A.; Belongie, S.; Gomes, V.; Gupta, A.; Sun, C.; Chechik, ing Market Size Worth 6.5 Billion By 2027. URL G.; Cai, D.; Feng, Z.; Narayanan, D.; and Murphy, K. 2017. https://www.grandviewresearch.com/press-release/global- OpenImages: A public dataset for large-scale multi-label data-collection-labeling-market. and multi-class image classification. Dataset available from Gray, M. L.; and Suri, S. 2019. Ghost Work: How to Stop https://github.com/openimages . Silicon Valley from Building a New Global Underclass . Lee, D. 2018. Why Big Tech pays poor Kenyans to teach Grush, L. 2015. Google engineer apologizes af- self-driving cars. BBC News . ter Photos app tags two black people as gorillas. Lee, M. K.; Kusbit, D.; Kahng, A.; Kim, J. 
T.; Yuan, X.; URL https://www.theverge.com/2015/7/1/8880363/google- Chan, A.; See, D.; Noothigattu, R.; Lee, S.; Psomas, A.; and apologizes-photos-app-tags-two-black-people-gorillas. Procaccia, A. D. 2019. WeBuildAI: Participatory Frame- Gul, E. 2019. Is Artificial Intelligence the frontier work for Algorithmic Governance. Proceedings of the solution to Global South’s wicked development chal- ACM on Human-Computer Interaction 3(CSCW): 181:1– lenges? URL https://towardsdatascience.com/is-artificial- 181:35. doi:10.1145/3359283. URL http://doi.org/10.1145/ intelligence-the-frontier-solution-to-global-souths-wicked- 3359283. development-challenges-4206221a3c78. Lin, J. Y. 2011. From Flying Geese to Leading Dragons : Hara, K.; Adams, A.; Milland, K.; Savage, S.; Callison- New Opportunities and Strategies for Structural Transfor- Burch, C.; and Bigham, J. 2018. A Data-Driven Analysis mation in Developing Countries. Technical Report WPS of Workers’ Earnings on Amazon Mechanical Turk. ACM 5702, World Bank. Conference on Human Factors in Computing Systems (CHI) Masakhane. 2021. Masakhane: A grassroots NLP commu- . nity for Africa, by Africans. URL https://www.masakhane. Harris, C.; Straker, L.; and Pollock, C. 2017. A socioeco- io/. nomic related’digital divide’exists in how, not if, young peo- Mbayo, H. 2020. Data and Power: AI and De- ple use computers. PloS one 12(3): e0175011. velopment in the Global South. URL https: //www.oxfordinsights.com/insights/2020/10/2/data-and- Sambasivan, N.; Arnesen, E.; Hutchinson, B.; and Prab- power-ai-and-development-in-the-global-south. hakaran, V. 2020. Non-portability of Algorithmic Fairness Mendes, A. P. F.; Bertella, M. A.; and Teixeira, R. F. in India. arXiv:2012.03659 [cs] URL http://arxiv.org/abs/ A. P. 2014. Industrialization in Sub-Saharan Africa 2012.03659. ArXiv: 2012.03659. and import substitution policy. Revista de Econo- Semuels, A. 2018. The Internet Is Enabling a mia Polı́tica 34(1): 120–138. ISSN 0101-3157. 
New Kind of Poorly Paid Hell. The Atlantic https: doi:10.1590/S0101-31572014000100008. URL http: //www.theatlantic.com/business/archive/2018/01/amazon- //www.scielo.br/scielo.php?script=sci arttext&pid=S0101- mechanical-turk/551192/. 31572014000100008&lng=en&tlng=en. Shankar, S.; Halpern, Y.; Breck, E.; Atwood, J.; Wilson, J.; Murgia, M. 2019. AI’s new workforce: the data-labelling and Sculley, D. 2017. No Classification without Representa- industry spreads globally. Financial Times . tion: Assessing Geodiversity Issues in Open DataSets for the Developing World. NeurIPS workshop: Machine Learning Nature. 1915. Industrial Research Laboratories. URL https: for the Developing World . //doi.org/10.1038/096419a0. Silva, M. 2021. URL https://blackinai.github.io/#/about. Nzekwe, H. 2019. Africans Are Paying More For In- ternet Than Any Other Part Of The World – Here’s Sloane, M.; Moss, E.; Awomolo, O.; and Forlano, L. Why. URL https://weetracker.com/2019/10/22/africans- 2020. Participation is not a Design Fix for Machine Learn- pay-more-for-internet-than-other-regions/. ing. arXiv:2007.02423 [cs] URL http://arxiv.org/abs/2007. 02423. ArXiv: 2007.02423. Obermeyer, Z.; Powers, B.; Vogeli, C.; and Mullainathan, S. 2019. Dissecting racial bias in an algorithm used to manage Synced. 2019. Data Annotation: The Billion Dol- the health of populations. Science 366(6464): 447–453. lar Business Behind AI Breakthroughs. URL https: //medium.com/syncedreview/data-annotation-the-billion- O’Neill, J. 2020. Jacki O’Neill — LinkedIn. URL https: dollar-business-behind-ai-breakthroughs-d929b0a50d23. //www.linkedin.com/in/jacki-o-neill-5605534/. Thompson, A. 2016. Otherness and the Fetishization of Paullada, A.; Raji, I. D.; Bender, E. M.; Denton, E.; and Subject. URL https://petapixel.com/2016/11/16/otherness- Hanna, A. 2020. Data and its (dis)contents: A survey of fetishization-subject/. dataset development and use in machine learning research. Wang, A.; Narayanan, A.; and Russakovsky, O. 
2020. RE- arXiv:2012.05345 . VISE: A Tool for Measuring and Mitigating Bias in Vi- Poskett, J. 2013. Django Unchained and the racist science sual Datasets. European Conference on Computer Vision of phrenology | James Poskett. The Guardian ISSN 0261- (ECCV) . 3077. URL https://www.theguardian.com/science/blog/ 2013/feb/05/django-unchained-racist-science-phrenology. Prabhu, V. U.; and Birhane, A. 2020. Large image datasets: A pyrrhic win for computer vision? arXiv:2006.16923 . Ranger, T. 2001. Colonialism, Consciousness and the Cam- era. Past & Present (171): 203–215. ISSN 0031-2746. URL http://www.jstor.org/stable/3600818. Publisher: [Ox- ford University Press, The Past and Present Society]. Research, M. 2020. Jacki O’Neill at Microsoft Re- search. URL https://www.microsoft.com/en-us/research/ people/jaoneil/. Rodney, W. 1972. How Europe underdeveloped Africa. London :: Bogle L’Ouverture Publications. ISBN 978-0- 9501546-4-0. Rogstadius, J.; Kostakos, V.; Kittur, A.; Smus, B.; Laredo, J.; and Vukovic, M. 2011. An Assessment of Intrinsic and Ex- trinsic Motivation on Task Performance in Crowdsourcing Markets. Proceedings of the Fifth International Conference on Weblogs and Social Media . Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet Large Scale Vi- sual Recognition Challenge. International Journal of Com- puter Vision (IJCV) 115(3): 211–252. doi:10.1007/s11263- 015-0816-y.