=Paper=
{{Paper
|id=Vol-2884/paper_126
|storemode=property
|title=Using AI to Identify Optimal Drilling Locations for Sustainable Irrigation for
Subsistence Agriculture
|pdfUrl=https://ceur-ws.org/Vol-2884/paper_126.pdf
|volume=Vol-2884
|authors=Wanru Li,Kathryn B. Laskey,Mekuanent Muluneh,Rupert Douglas-Bate,Hemant Purohit,Paul Houser
|dblpUrl=https://dblp.org/rec/conf/aaaifs/LiLMDPH20
}}
==Using AI to Identify Optimal Drilling Locations for Sustainable Irrigation for
Subsistence Agriculture==
Wanru Li (1), Kathryn B. Laskey (1), Mekuanent Muluneh (2), Rupert Douglas-Bate (3), Hemant Purohit (1), Paul Houser (1)
(1) George Mason University, United States; (2) Arba Minch University, Ethiopia; (3) Global MapAid, United Kingdom
(1) [wli15, klaskey, hpurohit, phouser]@gmu.edu; (2) mulunehmekuanent@gmail.com; (3) rupertdouglas@gmail.com
AAAI Fall 2020 Symposium on AI for Social Good. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In East Africa, many drought events have occurred over the past few decades. Droughts have resulted in severe food crises, especially for countries relying heavily on agriculture. From the perspective of sustainability, utilizing groundwater for crop irrigation could be an avenue toward resilience to drought. In this study, we aim to use AI to identify optimal drilling locations for sustainable irrigation for subsistence agriculture. Our initial focus is the Hare watershed in southern Ethiopia. To identify suitable drilling locations, a hydrogeological model (TOPMODEL) for estimation of discharge and depth to water table will be implemented first; machine learning models will then be constructed to estimate the probability of finding groundwater at a particular location; and finally these will be provided as inputs to an optimization model. Since this study is in progress, preliminary intermediate results are presented in this paper. A topographic wetness index (TWI) map was developed. TWI captures topographic features related to groundwater potential and will be an important input to our drilling location model.

Keywords

AI; Groundwater potential; Topographic wetness index; TOPMODEL; Machine learning; Optimal drilling locations

Introduction

In Ethiopia, small farmers comprise 95% of all farmers, and about 80% of the population (Douglas-Bate et al. 2019). Thus, the population is heavily reliant on agriculture. A decreasing water supply has affected the yield of crops and increased vulnerability to hunger. In 2017, about 20.6% of Ethiopians suffered from hunger (Ethiopia Hunger Statistics, n.d.). At present, in 2020, Ethiopia faces a large outbreak of desert grasshoppers, which results in loss of food and income (ActionAid UK 2017). To mitigate the impact of food shortages, drilling wells for irrigation could support sustainability for subsistence agriculture.

Understanding the factors that affect groundwater availability is important for estimating the probability of finding water at a location. Previous research has shown that lithology, geological structures, drainage density, soils, lineament density, geomorphology, slope, and land use/land cover are the main factors that have an impact on the occurrence and movement of groundwater in an area (Jaiswal et al. 2003, Greenbaum 1985, Jha et al. 2010, Andualem and Demeke 2019). A recent groundwater drought study has shown that the evapotranspiration rate, precipitation, and soil moisture are significant factors affecting groundwater drought propagation (Han et al. 2019).

To estimate groundwater potential, we used TOPMODEL (a TOPography based hydrological MODEL) proposed by Beven and Kirkby (1979). TOPMODEL simulates hydrological processes and has been used in a variety of applications. The topographic wetness index (TWI), one of the TOPMODEL outputs, uses elevation data to estimate places where water tends to accumulate. Moreover, previous studies have shown that TOPMODEL has successfully predicted streamflow (Ambroise et al. 1996, Ibbitt and Woods 2004, Nourani et al. 2011, Andualem and Demeke 2019).

Optimization approaches have been widely applied in optimal well placement problems. Ma et al. (2019) developed a mixed-integer linear programming model to identify the layout of wells that minimizes the total irrigation costs in an oasis area in Northwest China. A nonlinear programming model was constructed by Liu et al. (2019) to find the optimal well layouts with minimized pumping costs in another oasis area in Northwest China. Yin et al. (2020) developed a nonlinear multi-objective model to explore optimal freshwater pumping strategies and optimal pumping locations. The multi-objective setup ensures groundwater sustainability. However, these well placement studies do not include uncertainty in the optimization models. Researchers in a previous study modeled the optimization problem using an infinite aquifer assumption, that is, it is assumed there are no constraints on the amount of water that can be pumped out from the wells (Ma et al. 2019). In fact, this is a very strong assumption which may
not be appropriate for areas that are facing severe water scarcity. The optimization model of this study will enable decision making under uncertainty, incorporate a sustainable irrigation objective, and relax the infinite aquifer assumption. The overall objective of this study is to identify optimal drilling locations for sustainable irrigation. To achieve this objective, we first model the hydrogeological processes by implementing TOPMODEL to estimate the discharge and depth to water table. The outputs of TOPMODEL will be incorporated into machine learning algorithms to estimate the probability of finding water at a particular location, which will then be used as input to an optimization model for identifying optimal drilling locations. We will demonstrate our model with a prototype in the Hare watershed in Ethiopia.

Applying machine learning requires data on wells. Data availability will be a challenging problem. A recent study has addressed challenges in collecting groundwater data (Lall et al. 2020). The researchers compared the number of well data points that were used in two studies. One study examined global water table depths based on 1.4 million well data points in North America and hundreds of wells in Africa. The other study focused on groundwater age estimates using 6455 wells around the globe. The comparison highlights the extreme paucity of well information globally, especially in Africa. This finding is consistent with the extreme limitations in available data on wells in the study area in the present paper.

The rest of the paper is organized as follows. Section 2 presents the research questions for this study; Section 3 describes the study area and data; Section 4 discusses the methodologies used in this study; Section 5 shows preliminary results and discussion; Section 6 concludes the paper.

Research Questions

This study reports on research questions related to groundwater recharge, groundwater potential, probability of finding shallow groundwater, and optimal drilling locations. Although this paper focuses on the Hare region, the questions can be generalized to other agricultural regions with similar characteristics such as dry seasons and irrigation difficulties.

Questions Related to Groundwater Recharge
• What is the groundwater recharge within the study area?

Questions Related to Groundwater Potential
• What is the depth to the water table for different sites in the study area?
• What factors are considered to generate a groundwater potential map?

Questions Related to Probability of Finding Shallow Groundwater
• What are the factors that affect shallow groundwater availability?
• What is the estimated probability of finding water when drilling at a specific location?

Questions Related to Optimal Drilling Locations
• Without depleting the water table, how many wells can be drilled to help satisfy the irrigation need?
• Where are optimal drilling locations that could yield water within an acceptable distance of the crop fields?
• What are the optimal distances between wells?

Description of Study Area and Data

Study Area

The Hare region is located near the Abaya Lake in southern Ethiopia with latitude 6°1′ to 6°17′ N and longitude 37°27′ to 37°36′ E. The total area is 195.43 km². Elevation of the region ranges from 1161 m to 3465 m.

Data Collection and Preparation

In this study, observed daily discharge data for the period 1987 to 2006 were collected from the Ministry of Water, Irrigation and Energy of Ethiopia. Units were converted from cubic meters to cubic millimeters. To be consistent with the meteorological data, the plan was to collect data from 1987 to 2016. However, discharge data could be obtained only from 1987 to 2006.

Digital Elevation Model (DEM) data were obtained from Alaska satellite services with a resolution of 25 m by 25 m. For convenience of data analysis, an elevation matrix was created representing a digital elevation model with equally sized pixels and equal NS and EW resolution.

Meteorological data including precipitation and temperature were collected to estimate streamflow and groundwater recharge. Daily precipitation data were retrieved from three meteorological stations (Arba Minch, Chencha, and Dorze) for the period 1987 to 2016. Since precipitation data in 1987 are missing for the Dorze station and temperature data from 2006 to 2016 are missing for both the Dorze and Chencha stations, filling in the missing values is necessary. All the missing data were downloaded from NASA POWER Single Point Data Access (power.larc.nasa.gov, n.d.). Since we have multiple precipitation and temperature measurements, measurements from the three stations were integrated. The Thiessen Polygon approach (Rhynsburger 1973) was used to determine the average precipitation and average temperature in Hare. The basic concept of this approach can be summarized as follows. First, we divide the watershed into three polygons (Figure 1), namely Arba Minch (34.47 km²), Chencha (64.79 km²), and Dorze (96.17 km²). Each contains a measurement point
(Figure 1). The coordinates for the measurement points are shown in Table 1. Second, we take a weighted average of the measurements based on the size of each polygon. The formula is:

<math>\bar{P} = \frac{\sum_{i=1}^{n} P_i A_i}{\sum_{i=1}^{n} A_i}</math>

where <math>\bar{P}</math> is the weighted average; <math>P_i</math> is the measurement at polygon <math>i</math>; <math>A_i</math> is the area of polygon <math>i</math>; <math>n</math> is the total number of measurement points. After performing the above steps, we have the finalized weighted average precipitation and temperature data. To be consistent with the other data, the period 1987 to 2006 was used.

Figure 1. The watershed divided by the Thiessen Polygon approach

Table 1: Coordinates for each meteorological station near Hare
{| class="wikitable"
! Station Name !! X_UTM (m) !! Y_UTM (m) !! Longitude (degree) !! Latitude (degree)
|-
| Arba Minch || 339823.781 || 666130.500 || 37.553 || 6.025
|-
| Chencha || 342243.250 || 691186.313 || 37.574 || 6.251
|-
| Dorze || 341939.290 || 683857.503 || 37.571 || 6.185
|}

Potential evapotranspiration (ETp) is an important input for estimating groundwater recharge. Global ETp data were downloaded from the NASA FLDAS site. The ETp data units and range are the same as the observed discharge and precipitation: millimeters per day from Jan 01, 1987 to Dec 31, 2006.

Soil parameters including texture, moisture, porosity, and hydraulic conductivity are related to the groundwater recharge potential. Previous soil data are not satisfactory, and an intensive field campaign over 195 square kilometers will be required to collect the soil parameters. This is a time-consuming and costly project.

Existing local well information is urgently needed for this study. Such information includes whether the well is working (dry or not), what type the well is (hand dug, borehole, or deep well), how much water the well yields, and what the depth of the well is (depth to water table). With support from the Czech Geological Survey, we have obtained locations of only four existing wells in the Hare watershed in Ethiopia. Information other than the location of these wells is unknown. Since local well data are not available online or from any local organizations, field work is required to collect the data we need. This is another labor-intensive and costly task.

As the project progresses, other geological parameters, such as lithology, geological structures, drainage density, lineament density, and land use/land cover, will be collected and processed.

Methodology

Hydrogeology Model

In this study, TOPMODEL was used to estimate discharge and depth to water table. The inputs to TOPMODEL include the topographic wetness index computed from the digital elevation data, a delay function derived from the DEM and outlet data, a set of parameters that need to be calibrated (Table 2), and hydrometeorological and geological variables including precipitation data, potential evapotranspiration, and observed discharge.

Table 2: Parameter set for TOPMODEL (Buytaert, 2011)
{| class="wikitable"
! Parameter !! Description !! Possible unit
|-
| <math>Q_{s0}</math> || Initial subsurface flow per unit area || m
|-
| <math>T_0</math> || Transmissivity of the soil profile at full saturation || m²/h
|-
| <math>\ln T_e</math> || Log of the areal average of <math>T_0</math> || m²/h
|-
| <math>m</math> || Model parameter controlling the rate of decline of transmissivity in the soil profile || -
|-
| <math>S_{r0}</math> || Initial root zone storage deficit || m
|-
| <math>S_{r\,max}</math> || Maximum root zone storage deficit || m
|-
| <math>t_d</math> || Unsaturated zone time delay per unit storage deficit || h/m
|-
| <math>V_{ch}</math> || Channel flow outside the catchment || m/h
|-
| <math>V_r</math> || Channel flow inside catchment || m/h
|-
| <math>K_0</math> || Surface hydraulic conductivity || m/h
|-
| CD || Capillary drive || m
|-
| dt || The timestep || h
|}

The optimal parameters are chosen by matching as closely as possible the simulated discharge from TOPMODEL to observed discharge in the training period. To do this, input parameters including <math>m</math>, <math>T_0</math>, and <math>S_{r\,max}</math> are adjusted to obtain the best match between model results and training data. After calibration, model validation is performed on the validation data set to evaluate the goodness
of the calibrated parameters. The calibration metric is the Nash-Sutcliffe efficiency criterion. Values close to 1 indicate a good fit; a value of 1 indicates a perfect match (Nash and Sutcliffe 1970). The formula for Nash-Sutcliffe efficiency is:

<math>R^2 = 1 - \frac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{sim,i})^2}{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^2}</math>

where <math>Q_{obs,i}</math> is the observed discharge at time step <math>i</math>; <math>Q_{sim,i}</math> is the simulated discharge at time step <math>i</math>; <math>\bar{Q}_{obs}</math> is the mean of the observed discharge; and <math>N</math> is the total number of time steps.

The depth to water table is simulated based on the saturation deficit, which is simulated using TOPMODEL. To evaluate the goodness of the simulated depth to water table, information on the depth of the existing wells should be collected, which requires field work.

Machine Learning Algorithms

As mentioned in the previous section, to find optimal drilling locations, we need to predict the probability of drilling water out of a well in the Hare region. This would be an input parameter for the optimization model. We divide the Hare region into small pixels of equal area. A machine learning model, such as logistic regression, could be constructed to predict the probability of water availability for each pixel in Hare. The binary dependent variable is whether the well at the location yields water. The independent variables may include precipitation, elevation, ETp, land use/land cover, soil texture, and percentage of topsoil moisture; these data will be collected along with the dependent variable through field work.

Optimization Approaches

To find the optimal drilling locations, a two-stage stochastic mixed integer programming (SMIP) problem could be formulated. The two-stage SMIP approach allows users to make decisions under uncertainty with two decision variables, one in the first stage and the other in the second stage (Küçükyavuz and Sen 2017). In this study, we plan to formulate our problem with binary first-stage and continuous second-stage variables. The objective functions for the two stages should be defined with respect to the two decision variables. Uncertainty only exists in the second stage.

The general idea of the two-stage SMIP optimization will be described starting from the initial formulation, including the objective functions, decision variables for each stage, uncertainty, and possible constraints, followed by the reformulation of the problem and how to solve it.

For the initial formulation, the first-stage objective function could be minimizing the total construction cost, with decision variable <math>x_i</math> denoting whether there is a well (<math>x_i = 0</math> or <math>1</math>) at location <math>i</math>. The second-stage objective function could be minimizing the pumping cost, with decision variable <math>y_i</math> denoting the pumping hours. Uncertainty could be the yield of water, which has a distribution that should be determined prior to building the optimization model. A set of constraints (e.g., restrictions on the pumping hours and total amount of water withdrawn) will be added to fulfill the groundwater sustainability considerations. A reformulation will be derived from the initial formulation to make the problem tractable. Gurobi, an optimization solver, will be used to solve this optimization problem.
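As a rough illustration of the two-stage structure described above, the following Python sketch enumerates every first-stage siting decision for a toy instance and evaluates the second stage per yield scenario. All data here are hypothetical and not taken from the study: the site names, construction and pumping costs, yield scenarios, demand, shortfall penalty, and per-well cap on pumping hours are assumptions made only for illustration. Pumping hours are written `y` to keep them distinct from the binary siting variable `x`. Exhaustive enumeration stands in for a solver such as Gurobi and is only workable for a handful of candidate sites.

```python
from itertools import product

# Hypothetical toy data (not from the study).
SITES = ["A", "B", "C"]
BUILD_COST = {"A": 100.0, "B": 80.0, "C": 60.0}   # first-stage cost per well
PUMP_COST = {"A": 1.0, "B": 1.5, "C": 2.0}        # cost per pumping hour
MAX_HOURS = 10.0                                  # assumed sustainability cap per well
DEMAND = 40.0                                     # water needed per period
PENALTY = 20.0                                    # cost per unit of unmet demand

# Yield scenarios: (probability, {site: water per pumping hour}).
SCENARIOS = [
    (0.5, {"A": 4.0, "B": 3.0, "C": 1.0}),
    (0.3, {"A": 2.0, "B": 3.0, "C": 1.0}),
    (0.2, {"A": 0.0, "B": 1.0, "C": 1.0}),        # site A turns out dry
]


def second_stage_cost(x, yields):
    """Minimum pumping-plus-shortfall cost for fixed wells x and known yields.

    With a single demand constraint and per-well hour caps, this small LP
    behaves like a fractional knapsack: use wells in order of cost per unit
    of water, as long as pumping is cheaper than the shortfall penalty.
    """
    wells = [s for s in SITES if x[s] == 1 and yields[s] > 0]
    wells.sort(key=lambda s: PUMP_COST[s] / yields[s])
    remaining, cost = DEMAND, 0.0
    for s in wells:
        unit_cost = PUMP_COST[s] / yields[s]
        if remaining <= 0 or unit_cost >= PENALTY:
            break
        water = min(remaining, yields[s] * MAX_HOURS)  # y_s = water / yield
        cost += unit_cost * water
        remaining -= water
    return cost + PENALTY * max(remaining, 0.0)


def expected_total_cost(x):
    """First-stage construction cost plus expected second-stage cost."""
    build = sum(BUILD_COST[s] for s in SITES if x[s] == 1)
    recourse = sum(p * second_stage_cost(x, y) for p, y in SCENARIOS)
    return build + recourse


def solve_by_enumeration():
    """Try every 0/1 siting vector and keep the cheapest in expectation."""
    best_x, best_cost = None, float("inf")
    for bits in product([0, 1], repeat=len(SITES)):
        x = dict(zip(SITES, bits))
        cost = expected_total_cost(x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


if __name__ == "__main__":
    x, cost = solve_by_enumeration()
    print("wells:", [s for s in SITES if x[s]], "expected cost:", round(cost, 2))
```

The shortfall penalty is one possible way to keep the second stage feasible in every scenario; a real formulation might instead impose hard sustainability constraints on total withdrawal, as the text suggests.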
Preliminary Results and Discussion

Groundwater Potential

The topographic wetness index map of Hare, Ethiopia, derived from digital elevation data, shows where water may tend to accumulate (Figure 2). Higher values of the topographic index indicate large contributing areas and low slopes. Higher topographic indices (darker green to purple) are mainly found in the southern part of the watershed, with a few in the central and northern parts. These regions have greater potential to become saturated with rainfall. Higher TWI values are found in areas with surface water, such as streams and wetlands. Lower TWI values indicate that an area has a small contributing area and a high slope. In our study, lower TWI values (yellow) are found in the central and northern parts of the watershed. Since lower TWI indicates lower moisture storage in the soils, there may be little accumulation in many parts of the Hare watershed. As such, it could be challenging to find good shallow drilling locations for drawing groundwater.

Figure 2. Topographic Wetness Index Map for Hare, Ethiopia
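For reference, the topographic wetness index in the TOPMODEL literature (Beven and Kirkby 1979) is commonly written ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope angle. The Python sketch below shows only this last step, assuming that contributing area and slope have already been derived from the DEM by some flow-routing tool. The function names, the `eps` floor for flat cells, and the sample grid are illustrative assumptions and do not describe how the map in Figure 2 was produced.

```python
import math


def topographic_wetness_index(specific_area_m, slope_rad, eps=1e-6):
    """TWI = ln(a / tan(beta)) for one DEM cell.

    specific_area_m: upslope contributing area per unit contour length (m).
    slope_rad: local slope angle in radians.
    eps: assumed floor on tan(beta) so flat cells do not divide by zero.
    """
    if specific_area_m <= 0:
        raise ValueError("contributing area must be positive")
    tan_beta = max(math.tan(slope_rad), eps)
    return math.log(specific_area_m / tan_beta)


def twi_grid(area_grid, slope_grid):
    """Apply the index cell by cell to two equally shaped nested lists."""
    return [
        [topographic_wetness_index(a, b) for a, b in zip(area_row, slope_row)]
        for area_row, slope_row in zip(area_grid, slope_grid)
    ]


# Made-up 2 x 2 example: steep ridge cells (top row) and flatter valley cells.
area = [[25.0, 250.0], [2500.0, 25000.0]]
slope = [
    [math.radians(30.0), math.radians(10.0)],
    [math.radians(2.0), math.radians(0.5)],
]
twi = twi_grid(area, slope)
```

In this toy grid the valley cell with the largest contributing area and the gentlest slope receives the highest index, which matches the interpretation of high TWI values given above.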
Optimal Drilling Locations

Before finalizing the formulation of the optimization problem, we need to estimate the parameters for an initial formulation from the collected data in our study area. As mentioned, data collection is the most challenging task in this study. If the collection of some data items requires too much effort and cost to be practical, the formulation will be modified accordingly. The research questions related to optimal drilling locations will be answered after completing the data collection, parameter estimation, and optimization.

Conclusion

This study focuses on using AI to identify optimal drilling locations for sustainable irrigation for subsistence farmers in Hare, Ethiopia. We have found that collecting hydrogeological data is the main challenge in developing an AI model. After data items are collected, we will first construct the TOPMODEL to estimate discharge and depth to water table, which will be used as inputs to machine learning models that estimate the probability of finding water at a particular location. With the probabilities as input, an optimization model for identifying optimal drilling locations for sustainable irrigation for subsistence agriculture will be constructed. Our preliminary intermediate result is the topographic wetness index map of Hare, Ethiopia. The TWI map indicates that the southern part of the watershed has greater potential to accumulate water; the central and northern parts of the watershed show lower moisture storage in soils, which makes it challenging to identify shallow groundwater. As the study moves forward, more results will be provided.

Acknowledgements

This research is performed by the MODL (Modeling Optimal Drilling Locations) team, which comprises researchers from the George Mason University Center for Resilient and Sustainable Communities (C-RASC), the Arba Minch University Water Technology Institute, and Global MapAid, with support from Czech Geological Survey. Wanru Li gratefully acknowledges support from a C-RASC fellowship for her efforts on this project.

References

ActionAid UK. 2017. Food crisis in East Africa 2017-2019. Available at: https://www.actionaid.org.uk/about-us/what-we-do/emergencies-disasters-humanitarian-response/east-africa-crisis-facts-and-figures.

Ambroise, B., Beven, K. and Freer, J., 1996. Toward a generalization of the TOPMODEL concepts: Topographic indices of hydrological similarity. Water Resources Research, 32(7), pp.2135-2145.

Andualem, T.G. and Demeke, G.G., 2019. Groundwater potential assessment using GIS and remote sensing: A case study of Guna tana landscape, upper blue Nile Basin, Ethiopia. Journal of Hydrology: Regional Studies, 24, p.100610.

Beven, K.J. and Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Journal, 24(1), pp.43-69.

Buytaert, W., 2011. topmodel: Implementation of the Hydrological Model TOPMODEL in R. Global Change Biology, pp.679-706.

Douglas-Bate, R., Pascual, P., Prakash, H., Kemal, A., and Mohammed, K., 2019. AI scoping mission, Ethiopia, to enhance sustainable irrigation for food supply. Paper presented at Association for Advancement of Artificial Intelligence.

Ethiopia Hunger Statistics, n.d., Available at: https://www.macrotrends.net/countries/ETH/ethiopia/hungerstatistics#:~:text=Ethiopia%20hunger%20statistics%20for%202017,a%202.2%25%20decline%20from%202013

Greenbaum, D., 1985. Review of remote sensing applications to groundwater exploration in basement and regolith.

Han, Z., Huang, S., Huang, Q., Leng, G., Wang, H., Bai, Q., Zhao, J., Ma, L., Wang, L. and Du, M., 2019. Propagation dynamics from meteorological to groundwater drought and their possible influence factors. Journal of Hydrology, 578, p.124102.

Ibbitt, R. and Woods, R., 2004. Re-scaling the topographic index to improve the representation of physical processes in catchment models. Journal of Hydrology, 293(1-4), pp.205-218.

Jaiswal, R.K., Mukherjee, S., Krishnamurthy, J. and Saxena, R., 2003. Role of remote sensing and GIS techniques for generation of groundwater prospect zones towards rural development--an approach. International Journal of Remote Sensing, 24(5), pp.993-1008.

Jha, M.K., Chowdary, V.M. and Chowdhury, A., 2010. Groundwater assessment in Salboni Block, West Bengal (India) using remote sensing, geographical information system and multi-criteria decision analysis techniques. Hydrogeology Journal, 18(7), pp.1713-1728.

Küçükyavuz, S. and Sen, S., 2017. An introduction to two-stage stochastic mixed-integer programming. In Leading Developments from INFORMS Communities (pp. 1-27). INFORMS.

Lall, U., Josset, L. and Russo, T., 2020. A Snapshot of the World's Groundwater Challenges. Annual Review of Environment and Resources, 45.

Ma, T., Wang, J., Liu, Y., Sun, H., Gui, D. and Xue, J., 2019. A Mixed Integer Linear Programming Method for Optimizing Layout of Irrigated Pumping Well in Oasis. Water, 11(6), p.1185.

Nash, J.E. and Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I - A discussion of principles. Journal of Hydrology, 10(3), pp.282-290.

Nourani, V., Roughani, A. and Gebremichael, M., 2011. TOPMODEL capability for rainfall-runoff modeling of the Ammameh watershed at different time scales using different terrain algorithms. Journal of Urban and Environmental Engineering, 5(1), pp.1-14.

power.larc.nasa.gov, n.d. NASA POWER Data Access Viewer. Available at: https://power.larc.nasa.gov/data-access-viewer/.

Rhynsburger, D., 1973. Analytic delineation of Thiessen polygons. Geographical Analysis, 5(2), pp.133-144.

Yin, J., Pham, H.V. and Tsai, F.T.C., 2020. Multiobjective Spatial Pumping Optimization for Groundwater Management in a Multiaquifer System. Journal of Water Resources Planning and Management, 146(4), p.04020013.