=Paper= {{Paper |id=Vol-2884/paper_130 |storemode=property |title=Clean Water: How the AI Community Can Contribute to Accessing Water Sources in Developing Countries |pdfUrl=https://ceur-ws.org/Vol-2884/paper_130.pdf |volume=Vol-2884 |authors=Karthik Dusi,Thilanka Munasingha }} ==Clean Water: How the AI Community Can Contribute to Accessing Water Sources in Developing Countries== https://ceur-ws.org/Vol-2884/paper_130.pdf
Clean Water: How the AI Community Can Contribute to Accessing Water Sources in Developing Countries
Karthik Dusi¹, Thilanka Munasinghe²
¹Department of Industrial and Systems Engineering
²Department of Information Technology and Web Science (ITWS)
Rensselaer Polytechnic Institute
Troy, NY
AAAI Fall 2020 Symposium on AI for Social Good.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Access to water is one of the fundamental human rights. Clean water is an issue plaguing many countries worldwide and is one of the world's largest health concerns. The poor suffer most from a lack of access to improved water sources and often contract infectious diseases from unsafe water. This paper examines how the AI community can further research into the clean water data that is available and investigate the socio-economic factors that prevent some communities from gaining access to safe water sources. Preliminary and exploratory data analyses were performed on UN data to understand the patterns, relations, and trends between related variables. Key correlations were investigated between socioeconomic factors such as GDP, corruption, and infrastructure to understand which has the greatest effect on access to improved water sources. To do so, visualizations were built using Python and the Seaborn package, and the data was curated using the Pandas package.

Introduction
Over 1.1 billion people in the world lack access to a general water source (World Wildlife Organization 2020). According to worldwildlife.org, 2.7 billion people suffer from water scarcity at least one month a year, and 2.4 billion people are affected by inadequate sanitation. These numbers have been rising and are expected to keep rising: scientists predict that by 2025, two-thirds of the world's population will face water shortages (World Wildlife Organization 2020).

Several scholars and politicians have called for clean water to be recognized as a human right. Germany and Spain put forward a resolution at the UN to recognize clean water as a fundamental human right. However, the US, Russia, and Canada rejected this resolution in favor of examining issues affecting access to safe drinking water and sanitation (Editors et al. 2009). Three main reasons are cited as to why water should be a human right. First, ensuring access to clean water will significantly reduce the number of people affected by disease (Editors et al. 2009). Second, the privatization of water does not ensure that everyone has equal access (Editors et al. 2009). Third, the world's resources are being exploited to the point where the quality of our current water supply is threatened and must be improved (Editors et al. 2009). These reasons explain why water is a human right, but other factors also contribute to the lack of access to clean water.

Literature Review
Various socioeconomic factors may affect who has access to improved water sources, such as whether people live in an urban or a rural area, and a country's GDP, infrastructure, corruption, and government effectiveness. Countries such as Afghanistan, Albania, Iran, and India are classified as 'developing countries' because of the rate at which their GDP per capita grows and the infrastructure they have to support elements necessary for human life, such as clean water (investopedia 2019).

In rural areas, drinking contaminated water can lead to diarrheal illnesses, enteropathy, and other serious diseases. In a paper investigating water quality in a South African rural community, at least a third of the population perceived the water as unsafe and felt they could get sick from it (Edokpayi et al. 2018). The system used to supply water to the community did not test positive for contaminants, but it does not reach all community residents and is subject to frequent shutdowns (Edokpayi et al. 2018). Additionally, because more water is available in the monsoon season, the research shows that there is more treated water in the region and more people feel comfortable drinking the water during the monsoons (Edokpayi et al. 2018).

Sustainability experts have argued that the current Sustainable Development Goals (SDGs) are based on the assumption that access to safe water sources includes access to good-quality water. However, there is an important distinction between safe water and quality water. Over 1.8 billion people were exposed to water sources contaminated by fecal matter and were overlooked by the misguided SDG statistics reported in 2012. The article, published in Current Opinion in Environmental Sustainability, suggests that the number of people reported as lacking access to safe drinking water was underestimated (Tortajada and Biswas 2018).

Introduction to Dataset Explored
The UN provides datasets that it has collected, as well as datasets from related organizations such as the WHO, at data.un.org. Other sites, such as ourworldindata.org, also have data relevant to understanding the problems behind the lack of access to clean water. To understand basic correlations and present ideas to the reader, a dataset with information on populations using improved water sources was explored (World Health Organization 2014). This dataset gives the percentage of the population using an improved water source for 192 countries and further divides the percentages by whether people live in rural or urban areas. Rural areas are defined as areas that are not part of major metropolitan areas, which are defined by population density and distance from the metropolitan city; the rural data reflects data collected in those areas. Similarly, the urban data reflects data collected in areas that are part of major metropolitan areas. Historical data from 1990 to 2012 is also available for these countries, giving ample data to explore and analyze. Figure 1 outlines the general project workflow.

Figure 1: Data Acquisition and Project Work Flow

Exploratory Data Analysis (EDA)
Using the Pandas (Pandas NumFOCUS 2020) and Seaborn (Michael Waskom 2020) packages in Python, EDA was performed on the collected dataset to understand the correlation between GDP per capita and the percentage of the total population with access to an improved water source, as well as the correlations between GDP and the percentages of urban and rural populations with access to improved water sources. First, rows with empty data were dropped so that only rows with usable data remained. Next, boxplots were generated to see the distribution and to detect outliers.

Figure 2: Percent of Total Population with Access

The boxplot in Figure 2 shows the percentages of total populations with access to improved water sources; there are many outliers toward the lower end of the plot. This means that many points fall below the boxplot's lower bound (the "minimum"), which was calculated to be 62.3%. The median of this data is 90.9%, and the IQR is 24%. This means that the middle 50% of the values, from the first to the third quartile, span a range of 24 percentage points. Rather than simply deleting these outliers, since they are the countries with the lowest percentages of people with access to improved water sources, a new data frame could be made to contain the outlier data, and the boxplot of this 'outlier' data frame could then be compared with that of the original data frame.

Similarly, the percentages of urban and rural populations with access to improved water sources were explored using boxplots to check for outliers. Most of the rural values fell within the boxplot's whiskers, with a minimal number of outliers, but the urban populations had a large number of outliers. The boxplots with outliers are further reinforced by histograms of the same variables.

Figure 3: Histogram of Percent of Total Population with Access
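The cleaning and outlier steps described above can be sketched in Pandas. This is a minimal illustration under stated assumptions, not the authors' code: the country labels, the `total_pct` column name, and all values are hypothetical. Outliers are flagged with the conventional 1.5 × IQR rule, which is also the default whisker length (`whis=1.5`) of Seaborn's `boxplot`. Pandas' `quantile` uses linear interpolation by default, so quartiles computed by other tools may differ slightly.

```python
import pandas as pd

# Hypothetical sample: percent of total population using an improved water source.
# Country labels and values are illustrative only, not taken from the WHO dataset.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "total_pct": [99.0, 97.5, 95.0, 93.0, 91.0, 88.0, 40.0, None],
})

# Step 1: drop rows with missing values so only usable data remains.
clean = df.dropna().reset_index(drop=True)

# Step 2: compute the quartiles and the interquartile range (IQR).
q1 = clean["total_pct"].quantile(0.25)
q3 = clean["total_pct"].quantile(0.75)
iqr = q3 - q1

# Step 3: flag points outside the 1.5 * IQR fences (boxplot whisker rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = (clean["total_pct"] < lower_fence) | (clean["total_pct"] > upper_fence)

# Step 4: keep the outliers in their own data frame instead of deleting them,
# so their distribution can be compared with the remaining countries.
outliers = clean[is_outlier]
inliers = clean[~is_outlier]

print(f"median={clean['total_pct'].median():.1f} IQR={iqr:.2f} lower fence={lower_fence:.2f}")
print("outlier countries:", outliers["country"].tolist())
```

Here the single low value (country "G") is moved into `outliers` rather than dropped. Plotting `inliers` and `outliers` separately with Seaborn's `boxplot` would give the comparison proposed in the text, and the same steps apply unchanged to the urban and rural percentage columns.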
Figure 4: Histogram of Percent of Urban Population with Access

The same method described above for handling the total population's outliers could be used here to further explore the rural populations and to identify exactly which countries' rural populations are suffering more.

By looking at heatmaps, we can better understand the correlations between the variables. In this case, we want to look at the relationship between GDP per capita and the percentages of populations with access to improved water sources. The total-population heatmap in Figure 5 shows a correlation of 0.49.

Figure 5: Correlation Matrix with GDP and Population

This implies a moderate correlation that could be examined further if the outliers are removed. We also see that a country's total population does not correlate with the percentage of the population with access to improved water sources (correlation of 0.017). This suggests that socioeconomic factors are worth examining: GDP per capita merits further analysis, and other factors such as infrastructure, corruption, and government effectiveness can be added to the dataset to build predictive models.

This dataset can be built upon further by including other socioeconomic factors and by using the historical data provided from 1990 to 2012. More recent data is also available from ourworldindata.org, which could be used to validate a predictive model once one is developed (Ritchie and Roser 2019).

A paper by economists investigating the factors that affect water access in rural areas of developing countries (Gomez, Perdiguero, and Sanz 2019) cites gross national income, female primary completion rate, agriculture, rural population growth, and governance indicators as the main socioeconomic factors affecting access to improved water sources for rural populations. Examples of the governance indicators they use are political stability, control of corruption, and regulatory quality. They also recognize that the water source itself and the income of the group are two things that should influence which factors are examined, and they include other indicators of 'good' governance, such as infrastructure and taxation.

Combining this initial dataset with indicators provided by the World Bank (The World Bank 2018, 2020) and Transparency International (Corruption Perceptions Index 2020) added variables measuring Government Effectiveness, Overall Infrastructure, and the Corruption Perceptions Index. The initial dataset covered 1990 to 2012, but the World Bank dataset covered only 1995 to 2012. For preliminary purposes, the following analyses used data from 2012 only. Using the 124 countries available for 2012, the heatmap in Figure 6 was generated to investigate correlations.

Figure 6: Correlation Matrix with Other Socioeconomic Factors

In Figure 6, a stronger blue color indicates a higher correlation between two variables. There is a medium to strong correlation between the Corruption Perceptions Index (Corruption Perceptions Index 2020) and the percentage of the rural population with access. Because a higher CPI score indicates less perceived corruption, this suggests that rural populations tend to have less access to improved water sources in countries perceived as more corrupt, although the correlation alone does not show that corruption causes the lack of access.

We can also observe a strong relationship between government effectiveness and the percentage of the rural population with access to improved water sources, which makes sense given that more effective governments are able to provide water sources to all parts of the country.

There is a medium correlation between the infrastructure rating and the percentage of the total population with access to improved water sources. This may be because the rating considers all infrastructure in the country; it may be more informative to examine only water-related infrastructure, such as drainage basins, sewers, and reservoirs.

Further Analysis and How the AI Community Can Help
The AI community can help leverage this data and turn it into a usable tool for governments and relief organizations
by helping them predict where resources must be allocated first to enhance access to improved water sources. Using such a model to predict where clean water sources will be depleted, given trends in GDP, infrastructure, and other socioeconomic factors, would be very useful, as several scholars assert that by 2025, two-thirds of the world's population will face water shortages (World Wildlife Organization 2020).

Additionally, models can be used to investigate where water quality is low. Machines and water filters that continuously check whether water is safe to drink could be given a data collection feature, providing data that data scientists can use to narrow down where water contamination is happening. Prototypes of devices that can detect low water quality and report the data to a database exist and could be used for this application.

By using machine learning techniques and neural networks, this existing data, coupled with other socioeconomic datasets, can be used for further analysis and to develop prediction models. Lack of clean water leads to many infectious diseases, such as deadly diarrheal diseases, cholera, and typhoid. By using a model to see where clean water is unavailable, medical professionals can help prevent the spread of infectious diseases in those areas. Stakeholders for this type of application would be public policy experts, healthcare professionals, and infrastructure professionals, who could help provide data and insights regarding which socioeconomic factors most prevent access to clean water.

Conclusion
This paper presents a preliminary understanding of what could be done to collect and explore data to help solve the problem of access to improved water sources. Looking at correlations between key indicators and populations with access to water sources provides a basic understanding of which features to use in future models. Additionally, focusing on rural populations rather than urban populations may be more productive, since urban populations tend to be well developed and have good water sources. We plan to use the insights gained from this initial analysis to test different hypotheses and research questions in the future. Obstacles that must be overcome are the lack of data for specific countries and for certain years. Most countries have recent data, but only some go back to 1995 and beyond. More emphasis needs to be placed on adequate data collection. Organizations such as the World Bank and the United Nations should emphasize the importance of regular and thorough data collection from their member countries. As upstanding citizens of the world, and with the new technologies available to us, the AI community must push itself to develop tools that can be used to direct relief efforts to the places where access to clean water is a problem.

References
Corruption Perceptions Index. 2020. The Corruption Perceptions Index: ranks of countries. https://www.transparency.org/en/cpi/2019/results#. Accessed on: September 24, 2020.
Editors, P. M.; et al. 2009. Clean water should be recognized as a human right. PLoS Med 6(6): e1000102.
Edokpayi, J.; Rogawski, E.; Kahler, D.; Hill, C.; Reynolds, C.; Nyathi, E.; Smith, J.; Odiyo, J.; Samie, A.; Bessong, P.; et al. 2018. Challenges to sustainable safe drinking water: A case study of water quality and use across seasons in rural communities in Limpopo Province, South Africa. Water (Switzerland) 10: 1–18. DOI 10: w10020159.
Gomez, M.; Perdiguero, J.; and Sanz, A. 2019. Socioeconomic factors affecting water access in rural areas of low and middle income countries. Water 11(2): 202.
investopedia. 2019. Top 25 Developed and Developing Countries. https://www.investopedia.com/updates/top-developing-countries/. Accessed on: September 24, 2020.
Michael Waskom. 2020. seaborn: statistical data visualization. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. https://seaborn.pydata.org/. Accessed on: September 24, 2020.
Pandas NumFOCUS. 2020. Pandas Library. https://pandas.pydata.org/. Accessed on: September 24, 2020.
Ritchie, H.; and Roser, M. 2019. Clean Water. Our World in Data. Unsafe water is responsible for 1.2 million deaths each year. https://ourworldindata.org/water-access. Accessed on: September 24, 2020.
The World Bank. 2018. Government Effectiveness. Perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies. https://bit.ly/30c2MrT. Accessed on: September 24, 2020.
The World Bank. 2020. Quality of overall infrastructure. https://bit.ly/331Wywr. Accessed on: September 24, 2020.
Tortajada, C.; and Biswas, A. K. 2018. Achieving universal access to clean water and sanitation in an era of water scarcity: strengthening contributions from academia. Current Opinion in Environmental Sustainability 34: 21–25.
World Health Organization. 2014. Population using improved drinking-water sources. https://data.un.org/Data.aspx?q=water&d=WHO&f=MEASURE_CODE%3aWHS5_122. Accessed on: September 24, 2020.
World Wildlife Organization. 2020. Water scarcity. https://www.worldwildlife.org/threats/water-scarcity. Accessed on: September 24, 2020.