=Paper=
{{Paper
|id=Vol-2884/paper_130
|storemode=property
|title=Clean Water: How the AI Community Can Contribute to Accessing Water Sources in Developing Countries
|pdfUrl=https://ceur-ws.org/Vol-2884/paper_130.pdf
|volume=Vol-2884
|authors=Karthik Dusi,Thilanka Munasinghe
}}
==Clean Water: How the AI Community Can Contribute to Accessing Water Sources in Developing Countries==
Karthik Dusi (1), Thilanka Munasinghe (2)

(1) Department of Industrial and Systems Engineering, (2) Department of Information Technology and Web Science (ITWS), Rensselaer Polytechnic Institute, Troy, NY

AAAI Fall 2020 Symposium on AI for Social Good. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===
Access to water is one of the fundamental human rights. The lack of clean water plagues many countries worldwide and is one of the world's largest health concerns. The poor suffer most from a lack of access to improved water sources and often contract infectious diseases from unsafe water. This paper examines how the AI community can further research into the clean water data that is available and investigate the socio-economic factors that prevent some communities from gaining access to safe water sources. Preliminary and Exploratory Data Analysis were done on the UN data to understand the patterns, relations, and trends between related variables. Key correlations were investigated between different socioeconomic factors such as GDP, Corruption, and Infrastructure to understand what has the greatest effect on access to improved water sources. To do so, visualizations were built using Python and the Seaborn package, and the Pandas package was used to curate the data.

===Introduction===
Over 1.1 billion people in the world lack access to a general water source (World Wildlife Organization 2020). According to Worldwildlife.org, 2.7 billion people suffer from water scarcity at least one month a year, and 2.4 billion people are affected by inadequate clean water. These numbers have been rising and are expected to keep rising: scientists predict that by 2025, over two-thirds of the world's population will face water shortage issues (World Wildlife Organization 2020).

Several scholars and politicians have called for clean water to be recognized as a human right. Germany and Spain put forward a resolution at the UN to recognize clean water as a fundamental human right. However, the US, Russia, and Canada rejected this resolution in favor of examining issues affecting access to safe drinking water and sanitation (Editors et al. 2009). Three main reasons are cited as to why water should be a human right. One, ensuring access to clean water will significantly reduce the number of people affected by diseases (Editors et al. 2009). Two, the privatization of water does not ensure everyone has equal access (Editors et al. 2009). Three, the world's resources are being exploited to a point where the quality of our current water supply is threatened and must be improved (Editors et al. 2009). These reasons explain why water is a human right, but other factors also affect the lack of access to clean water.

===Literature Review===
Various socioeconomic factors may affect who has access to improved water sources, such as whether people live in an urban or a rural area, and a country's GDP, infrastructure, corruption, and government effectiveness. Countries like Afghanistan, Albania, Iran, and India are classified as 'developing countries' because of the rate at which their GDP per capita grows and the infrastructure they have to support elements necessary for human life, such as clean water (investopedia 2019).

In rural areas, drinking contaminated water can lead to diarrheal illnesses, enteropathy, and other serious diseases. In a paper investigating water quality in a South African rural community, at least a third of the population perceived the water as unsafe and felt they could get sick from it (Edokpayi et al. 2018). The system used to supply water to the community did not test positive for contaminants, but the system does not reach all community residents and is subject to frequent shutdowns (Edokpayi et al. 2018). Additionally, due to increased amounts of available water in the monsoon season, the research shows that there is more treated water in the region and more people feel comfortable drinking the water during the monsoons (Edokpayi et al. 2018).

In an opinion article, sustainability experts argue that the current Sustainable Development Goals (SDG) are based on the assumption that access to safe water sources includes sources with good quality water. However, there is an important distinction between safe water and quality water. Over 1.8 billion people were exposed to water sources contaminated by fecal matter and were overlooked by the misguided SDG statistics reported in 2012. The article, published in Current Opinion in Environmental Sustainability, suggests that the number of people reported as lacking access to safe drinking water was underestimated (Tortajada and Biswas 2018).

===Introduction to Dataset Explored===
The UN provides datasets that it has collected, as well as datasets from related organizations like the WHO, at data.un.org. Other sites like ourworldindata.org also have data relevant to understanding the problems behind the lack of access to clean water. To understand basic correlations and present ideas for the reader, a dataset with information on populations using improved water sources was explored (World Health Organization 2014). This dataset gives the percentage of the population using an improved water source for 192 countries, and further divides the percentages by whether people live in rural or urban areas. Rural areas are defined as areas not part of major metropolitan areas, which are in turn defined by population density and distance from the metropolitan city, and the rural data reflects data collected in those areas. Similarly, the urban data reflects data collected in areas that are part of major metropolitan areas. There is also historical data ranging from 1990 to 2012 for these countries, giving ample data to explore and analyze. Figure 1 outlines the general project workflow.

Figure 1: Data Acquisition and Project Work Flow

===Exploratory Data Analysis (EDA)===
Using the Pandas (Pandas NumFOCUS 2020) and Seaborn (Michael Waskom 2020) packages in Python, EDA was done on the collected dataset to understand the correlation between GDP per capita and the percentage of the total population with access to an improved water source, as well as the correlations between GDP and the percentages of urban and rural populations with access to improved water sources. First, rows with empty data were dropped to make sure that only rows with usable data were present. Next, boxplots were generated to see the distribution and to detect outliers.

Figure 2: Percent of Total Population with Access

The boxplot in Figure 2 shows the percentages of total populations with access to improved water sources; there are many outliers towards the lower end of the plot. This means many outlier points are lower than the boxplot minimum, which was calculated to be 62.3%. The median of this data is 90.9%, and the IQR is 24%. This means that the middle 50% of the percentage values span a range of 24 percentage points. Since the outliers are the countries with lower percentages of people having access to improved water sources, they should not merely be deleted. Instead, a new data frame could be made to contain the outlier data, and the boxplot for this 'outlier' data frame could then be compared to that of the original data frame.

Similarly, the percentages of urban populations and rural populations having access to improved water sources were explored using boxplots to see if there are any outliers. The majority of the rural populations fell within the Inter-Quartile Range (IQR), with a minimal number of outliers, but the urban populations had a large number of outliers. The boxplots with outliers are further reinforced by histograms of the same variables.

Figure 3: Histogram of Percent of Total Population with Access

Figure 4: Histogram of Percent of Urban Population with Access

The same method mentioned above for dealing with the total population's outliers could be used here to further explore the rural populations and to identify the countries in which rural populations are suffering more.

By looking at heatmaps, we can better understand the correlations between the variables. In this case, we want to look at the relation between GDP per capita and the percentages of populations with access to improved water sources. If we look at the total populations' heatmap in Figure 5, we can see a 0.49 correlation.

Figure 5: Correlation Matrix with GDP and Population

This implies a moderate correlation that could be worked with further if the outliers are removed. We also see that a country's total population does not correlate with the percentage of the population with access to improved water sources, with a correlation of 0.017. This means that socioeconomic factors are worth looking at, since we see that GDP per capita is worth analyzing further; other factors like infrastructure, corruption, and government effectiveness can be included in the dataset to build predictive models. This dataset can be built upon further by including other socioeconomic factors and by utilizing the historical data that is provided from 1990 to 2012. More recent data is also provided by ourworldindata.org, which could be used to verify a predictive model if one is developed (Ritchie, Max Roser 2019).

A paper by economists (Gomez, Perdiguero, and Sanz 2019) investigating factors affecting water access in rural areas of developing countries cites gross national income, female primary completion rate, agriculture, growth of rural population, and governance indicators as the main socio-economic factors affecting access to improved water sources for rural populations. By governance indicators, they refer to political stability, control of corruption, and regulatory quality as examples. They also recognize that the water source itself and the income of the group are two things that should influence the selection of factors being looked at, and they include other indicators of 'good' governance such as infrastructure, taxation, etc.

Combining this initial dataset with other indicators provided by the World Bank (The World Bank 2018, 2020) resulted in variables measuring Government Effectiveness, Overall Infrastructure, and the Corruption Perception Index. The initial dataset ranged from 1990 to 2012, but the World Bank dataset had data from 1995 to 2012. For preliminary purposes, the following analyses were done on data collected for the year 2012. Looking at only 124 countries in 2012, the heatmap in Figure 6 was generated to investigate correlations.

Figure 6: Correlation Matrix with Other Socioeconomic Factors

In Figure 6, a stronger blue color indicates a higher correlation between the two variables. We see a medium to strong correlation between the corruption perception index (Corruption Perceptions Index 2020) and the percentage of the rural population with access. This suggests that some rural populations' lack of access to improved water sources may be linked to the country's corruption perception index.

We can also observe a strong relationship between government effectiveness and the percentage of the rural population that has access to improved water sources, which makes sense given that more effective governments are able to provide water sources to all parts of the country.

There is a medium correlation between the infrastructure rating and the percentage of the total population with access to improved water sources. This could be because the infrastructure rating considers all infrastructure in the country, and it may be more prudent to observe only water-related infrastructure, like drainage basins, sewers, reservoirs, etc.

===Further Analysis and how AI Community can help===
The AI community can help leverage this data and turn it into a usable tool for governments and relief organizations by helping them predict where resources must be allocated first to enhance access to improved water sources. A model that predicts where clean water sources will deplete given trends in GDP, infrastructure, and other socioeconomic factors would be very useful, as several scholars assert that by 2025, two-thirds of the world's population will face water shortage (World Wildlife Organization 2020).

Additionally, models can be used to investigate where water quality is low. A data collection feature could be added to machines and water filters that continuously check whether the water is safe to drink, providing data that data scientists can use to narrow down where water contamination is happening. Prototypes of devices that can detect whether water quality is low and report the data to a database exist, and could be used for this application.

By using machine learning techniques and neural networks, this existing data, coupled with other socio-economic datasets, can be used for further analysis and to develop prediction models. Lack of clean water leads to many infectious diseases, such as deadly diarrheal diseases, cholera, and typhoid. By using a model to see where no clean water is available, medical professionals can help prevent the spread of infectious diseases in those areas. Stakeholders for this type of application would be public policy experts, healthcare professionals, and infrastructure professionals who could help provide data and insights regarding which socioeconomic factors are most prevalent in prohibiting access to clean water.

===Conclusion===
This paper presents a preliminary understanding of what could be done to collect and explore the data to help solve the problem of access to improved water sources. Looking at correlations between key indicators and populations with access to water sources provides a basic understanding of what features to use in future models. Additionally, looking at rural populations rather than urban populations may be more productive, since urban areas tend to be well developed and have good water sources. We plan to use the insights gained from this initial analysis to test different hypotheses and research questions in the future. Obstacles that must be overcome are the lack of data for specific countries and certain yearly periods. Most countries have recent data, but only some go back to 1995 and beyond. More emphasis needs to be placed on adequate data collection. Organizations such as the World Bank and the United Nations should emphasize the importance of regular and thorough data collection from their member countries. As upstanding citizens of the world, and with the new technologies available to us, the AI community must push itself to develop tools that can be used to direct relief efforts to the places where access to clean water is a problem.

===References===
Ritchie, Max Roser. 2019. Clean Water - Our world in data. Unsafe water is responsible for 1.2 million deaths each year, https://ourworldindata.org/water-access, Accessed on: September 24, 2020.

Corruption Perceptions Index. 2020. The corruption perceptions index Ranks of countries. https://www.transparency.org/en/cpi/2019/results#, Accessed on: September 24, 2020.

Editors, P. M.; et al. 2009. Clean water should be recognized as a human right. PLoS Med 6(6): e1000102.

Edokpayi, J.; Rogawski, E.; Kahler, D.; Hill, C.; Reynolds, C.; Nyathi, E.; Smith, J.; Odiyo, J.; Samie, A.; Bessong, P.; et al. 2018. Challenges to sustainable safe drinking water: A case study of water quality and use across seasons in rural communities in Limpopo Province, South Africa, Water (Switzerland), 2018, 10: 1–18. DOI 10: w10020159.

Gomez, M.; Perdiguero, J.; and Sanz, A. 2019. Socioeconomic factors affecting water access in rural areas of low and middle income countries. Water 11(2): 202.

investopedia. 2019. Top 25 Developed and Developing Countries. https://www.investopedia.com/updates/top-developing-countries/, Accessed on: September 24, 2020.

Michael Waskom. 2020. seaborn: statistical data visualization. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, https://seaborn.pydata.org/, Accessed on: September 24, 2020.

Pandas NumFOCUS. 2020. Pandas Library. https://pandas.pydata.org/, Accessed on: September 24, 2020.

The World Bank. 2018. Government Effectiveness. Perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies, https://bit.ly/30c2MrT, Accessed on: September 24, 2020.

The World Bank. 2020. Quality of overall infrastructure. https://bit.ly/331Wywr, Accessed on: September 24, 2020.

Tortajada, C.; and Biswas, A. K. 2018. Achieving universal access to clean water and sanitation in an era of water scarcity: strengthening contributions from academia. Current opinion in environmental sustainability 34: 21–25.

World Health Organization. 2014. Population using improved drinking-water sources. https://data.un.org/Data.aspx?q=water&d=WHO&f=MEASURE CODE%3aWHS5 122, Accessed on: September 24, 2020.

World Wildlife Organization. 2020. water-scarcity. https://www.worldwildlife.org/threats/water-scarcity, Accessed on: September 24, 2020.
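
The paper does not publish its analysis code, so the following is only a minimal sketch of the three EDA steps it describes: dropping incomplete rows, separating boxplot outliers into their own table rather than deleting them, and computing the correlation that a heatmap cell would display. It assumes the conventional 1.5 × IQR boxplot rule, which the paper does not state explicitly. The toy rows, country labels, and function names are hypothetical and are not taken from the UN/WHO data. To stay dependency-free it uses only the Python standard library. In the Pandas and Seaborn stack the paper names, the same steps would roughly correspond to `DataFrame.dropna`, `DataFrame.quantile`, `DataFrame.corr`, and `seaborn.heatmap`.

```python
import math
import statistics

# Toy rows: (country, GDP per capita, % of total population with access).
# All values are made up for illustration; None marks a missing entry.
ROWS = [
    ("A", 500.0, 30.0),
    ("B", 42000.0, 95.0),
    ("C", 38000.0, 92.0),
    ("D", 45000.0, 99.0),
    ("E", 40000.0, 94.0),
    ("F", None, 88.0),
    ("G", 41000.0, 96.0),
]


def drop_incomplete(rows):
    """Keep only rows in which every field is present."""
    return [row for row in rows if all(value is not None for value in row)]


def split_outliers(rows, index):
    """Split rows into (inliers, outliers) using the 1.5 * IQR boxplot rule."""
    values = [row[index] for row in rows]
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [row for row in rows if lower <= row[index] <= upper]
    outliers = [row for row in rows if not lower <= row[index] <= upper]
    return inliers, outliers


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


clean = drop_incomplete(ROWS)                  # step 1: drop empty rows
inliers, outliers = split_outliers(clean, 2)   # step 2: keep outliers aside
r = pearson([row[1] for row in clean], [row[2] for row in clean])  # step 3

print("outlier countries:", [row[0] for row in outliers])
print("GDP vs. access correlation:", round(r, 3))
```

Keeping the outliers in a separate list mirrors the paper's suggestion of an 'outlier' data frame: the low-access countries are the ones of most interest, so they are set aside for their own analysis instead of being discarded.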