=Paper= {{Paper |id=Vol-2884/paper_126 |storemode=property |title=Using AI to Identify Optimal Drilling Locations for Sustainable Irrigation for Subsistence Agriculture |pdfUrl=https://ceur-ws.org/Vol-2884/paper_126.pdf |volume=Vol-2884 |authors=Wanru Li,Kathryn B. Laskey,Mekuanent Muluneh,Rupert Douglas-Bate,Hemant Purohit,Paul Houser |dblpUrl=https://dblp.org/rec/conf/aaaifs/LiLMDPH20 }} ==Using AI to Identify Optimal Drilling Locations for Sustainable Irrigation for Subsistence Agriculture== https://ceur-ws.org/Vol-2884/paper_126.pdf
Using AI to Identify Optimal Drilling Locations for Sustainable Irrigation for Subsistence Agriculture
Wanru Li,¹ Kathryn B. Laskey,¹ Mekuanent Muluneh,² Rupert Douglas-Bate,³ Hemant Purohit,¹ Paul Houser¹
¹George Mason University, United States; ²Arba Minch University, Ethiopia; ³Global MapAid, United Kingdom
¹[wli15, klaskey, hpurohit, phouser]@gmu.edu; ²mulunehmekuanent@gmail.com; ³rupertdouglas@gmail.com




Abstract
In East Africa, many drought events have occurred over the past few decades. Droughts have caused severe food crises, especially in countries that rely heavily on agriculture. From the perspective of sustainability, using groundwater for crop irrigation could be an avenue toward resilience to drought. In this study, we aim to use AI to identify optimal drilling locations for sustainable irrigation for subsistence agriculture. Our initial focus is the Hare watershed in southern Ethiopia. To identify suitable drilling locations, we will first implement a hydrogeological model (TOPMODEL) to estimate discharge and depth to the water table. Next, we will construct machine learning models to estimate the probability of finding groundwater at a particular location. Finally, we will provide these estimates as inputs to an optimization model. Since this study is in progress, this paper presents preliminary intermediate results. We developed a topographic wetness index (TWI) map. TWI captures topographic features related to groundwater potential and will be an important input to our drilling location model.

Keywords
AI; Groundwater potential; Topographic wetness index; TOPMODEL; Machine learning; Optimal drilling locations

AAAI Fall 2020 Symposium on AI for Social Good.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction
In Ethiopia, small farmers make up 95% of all farmers and about 80% of the population (Douglas-Bate et al. 2019). The population is therefore heavily reliant on agriculture. A decreasing water supply has reduced crop yields and increased vulnerability to hunger. In 2017, about 20.6% of Ethiopians suffered from hunger (Ethiopia Hunger Statistics, n.d.). In 2020, Ethiopia has also faced a large outbreak of desert locusts, which has caused losses of food and income (ActionAid UK 2017). To mitigate the impact of food shortages, drilling wells for irrigation could help make subsistence agriculture sustainable.

Understanding the factors that affect groundwater availability is important for estimating the probability of finding water at a location. Previous research has shown that lithology, geological structures, drainage density, soils, lineament density, geomorphology, slope, and land use/land cover are the main factors that affect the occurrence and movement of groundwater in an area (Jaiswal et al. 2003, Greenbaum 1985, Jha et al. 2010, Andualem and Demeke 2019). A recent groundwater drought study has shown that the evapotranspiration rate, precipitation, and soil moisture are significant factors in the propagation of groundwater drought (Han et al. 2019).

To estimate groundwater potential, we used TOPMODEL (a TOPography based hydrological MODEL), proposed by Beven and Kirkby (1979). TOPMODEL simulates hydrological processes and has been used in a variety of applications. The topographic wetness index (TWI), a central TOPMODEL quantity, uses elevation data to estimate where water tends to accumulate. Previous studies have also shown that TOPMODEL predicts streamflow successfully (Ambroise et al. 1996, Ibbitt and Woods 2004, Nourani et al. 2011, Andualem and Demeke 2019).

Optimization approaches have been widely applied to well placement problems. Ma et al. (2019) developed a mixed-integer linear programming model to find the layout of wells that minimizes total irrigation costs in an oasis area in Northwest China. Liu et al. (2019) constructed a nonlinear programming model to find well layouts that minimize pumping costs in another oasis area in Northwest China. Yin et al. (2020) developed a nonlinear multi-objective model to explore optimal freshwater pumping strategies and pumping locations; the multi-objective setup is intended to ensure groundwater sustainability. However, none of these well placement studies includes uncertainty in its optimization model. In addition, one previous study used an infinite aquifer assumption. That is, it assumed there are no constraints on the amount of water that can be pumped from the wells (Ma et al. 2019). This is a very strong assumption, which may
not be appropriate for areas facing severe water scarcity. The optimization model in this study will support decision making under uncertainty, incorporate a sustainable irrigation objective, and relax the infinite aquifer assumption.

The overall objective of this study is to identify optimal drilling locations for sustainable irrigation. To achieve this objective, we first model the hydrogeological processes with TOPMODEL to estimate discharge and depth to the water table. The TOPMODEL outputs will be fed into machine learning algorithms that estimate the probability of finding water at a particular location. These probabilities will then be used as input to an optimization model that identifies optimal drilling locations. We will demonstrate our model with a prototype in the Hare watershed in Ethiopia.

Applying machine learning requires data on wells, and data availability is a major challenge. A recent study examined the difficulty of collecting groundwater data (Lall et al. 2020). The researchers compared the number of well data points used in two studies. One study examined global water table depths using 1.4 million well data points in North America but only hundreds of wells in Africa. The other study estimated groundwater age using 6455 wells around the globe. The comparison highlights the extreme paucity of well information globally, especially in Africa. This finding matches our experience: available well data for the study area of the present paper are extremely limited.

The rest of the paper is organized as follows. Section 2 presents the research questions for this study; Section 3 describes the study area and data; Section 4 discusses the methodologies used in this study; Section 5 presents preliminary results and discussion; Section 6 concludes the paper.

Research Questions
This study addresses research questions on groundwater recharge, groundwater potential, the probability of finding shallow groundwater, and optimal drilling locations. Although this paper focuses on the Hare region, the questions can be generalized to other agricultural regions with similar characteristics, such as dry seasons and irrigation difficulties.

Questions Related to Groundwater Recharge
• What is the groundwater recharge within the study area?

Questions Related to Groundwater Potential
• What is the depth to the water table at different sites in the study area?
• What factors should be considered in generating a groundwater potential map?

Questions Related to Probability of Finding Shallow Groundwater
• What factors affect shallow groundwater availability?
• What is the estimated probability of finding water when drilling at a specific location?

Questions Related to Optimal Drilling Locations
• How many wells can be drilled to help satisfy the irrigation need without depleting the water table?
• Where are the optimal drilling locations that could yield water within an acceptable distance of the crop fields?
• What are the optimal distances between wells?

Description of Study Area and Data

Study Area
The Hare region is located near Lake Abaya in southern Ethiopia, between latitudes 6°1′ and 6°17′ N and longitudes 37°27′ and 37°36′ E. The total area is 195.43 km². Elevation in the region ranges from 1161 m to 3465 m.

Data Collection and Preparation
In this study, observed daily discharge data for 1987 to 2006 were collected from the Ministry of Water, Irrigation and Energy of Ethiopia. Discharge was converted from cubic meters to millimeters (runoff depth over the catchment area), the unit also used for the precipitation and ETp data. To match the period of the meteorological data, we planned to collect discharge data for 1987 to 2016; however, discharge data could be obtained only for 1987 to 2006.

Digital Elevation Model (DEM) data with a resolution of 25 m by 25 m were obtained from Alaska satellite services. For convenience of data analysis, we created an elevation matrix representing the DEM with equally sized pixels and equal north-south and east-west resolution.

Meteorological data, including precipitation and temperature, were collected to estimate streamflow and groundwater recharge. Daily precipitation data for 1987 to 2016 were retrieved from three meteorological stations: Arba Minch, Chencha, and Dorze. Precipitation data for 1987 are missing for the Dorze station, and temperature data for 2006 to 2016 are missing for both the Dorze and Chencha stations, so the missing values had to be filled in. All missing data were downloaded from NASA POWER Single Point Data Access (power.larc.nasa.gov, n.d.). Because precipitation and temperature are measured at multiple stations, the measurements from the three stations were integrated. The Thiessen Polygon approach (Rhynsburger 1973) was used to determine the average precipitation and average temperature in Hare. The approach can be summarized as follows. First, we divide the watershed into three polygons (Figure 1): Arba Minch (34.47 km²), Chencha (64.79 km²), and Dorze (96.17 km²). Each contains a measurement point
(Figure 1). The coordinates of the measurement points are shown in Table 1. Second, we take a weighted average of the measurements based on the area of each polygon:

$$\bar{P} = \frac{\sum_{i=1}^{n} P_i A_i}{\sum_{i=1}^{n} A_i}$$

where $\bar{P}$ is the weighted average; $P_i$ is the measurement at polygon $i$; $A_i$ is the area of polygon $i$; and $n$ is the total number of measurement points. These steps produce the final weighted-average precipitation and temperature series. For consistency with the other data, the period 1987 to 2006 was used.

Table 1: Coordinates for each meteorological station near Hare
Station Name   X_UTM (m)    Y_UTM (m)    Longitude (degree)   Latitude (degree)
Arba Minch     339823.781   666130.500   37.553               6.025
Chencha        342243.250   691186.313   37.574               6.251
Dorze          341939.290   683857.503   37.571               6.185

Figure 1. The divided watershed by Thiessen Polygon approach

Potential evapotranspiration (ETp) is an important input for estimating groundwater recharge. Global ETp data were downloaded from the NASA FLDAS site. The ETp data use the same units as the observed discharge and precipitation (millimeters per day) and cover the same period, January 1, 1987 to December 31, 2006.

Soil parameters, including texture, moisture, porosity, and hydraulic conductivity, are related to groundwater recharge potential. Existing soil data are not satisfactory, and collecting the soil parameters will require an intensive field campaign over 195 km². This is a time-consuming and costly project.

This study urgently needs information on existing local wells. Such information includes whether the well is working (dry or not), the type of well (hand dug, borehole, or deep well), how much water the well yields, and the depth of the well (depth to the water table). With support from the Czech Geological Survey, we have obtained the locations of only four existing wells in the Hare watershed in Ethiopia. Nothing is known about these wells beyond their locations. Since local well data are not available online or from any local organization, field work is required to collect the data we need. This is another labor-intensive and costly task.

As the project progresses, other geological and land-surface parameters, such as lithology, geological structures, drainage density, lineament density, and land use/land cover, will be collected and processed.

Methodology

Hydrogeology Model
In this study, TOPMODEL was used to estimate discharge and depth to the water table. The inputs to TOPMODEL include the topographic wetness index computed from the digital elevation data, a delay function derived from the DEM and the outlet data, a set of parameters that need to be calibrated (Table 2), and hydrometeorological and geological variables, including precipitation, potential evapotranspiration, and observed discharge.

Table 2: Parameter set for TOPMODEL (Buytaert, 2011)
Parameter   Description [unit]
Qs0         Initial subsurface flow per unit area [m]
T0          Transmissivity of the soil profile at full saturation [m²/h]
lnTe        Log of the areal average of T0 [m²/h]
m           Model parameter controlling the rate of decline of transmissivity in the soil profile
Sr0         Initial root zone storage deficit [m]
Srmax       Maximum root zone storage deficit [m]
td          Unsaturated zone time delay per unit storage deficit [h/m]
Vch         Channel flow velocity outside the catchment [m/h]
Vr          Channel flow velocity inside the catchment [m/h]
K0          Surface hydraulic conductivity [m/h]
CD          Capillary drive [m]
dt          Time step [h]

The optimal parameters are chosen so that the discharge simulated by TOPMODEL matches the observed discharge in the training period as closely as possible. To do this, input parameters including $m$, $T_0$, and $Sr_{max}$ are adjusted to obtain the best match between model results and training data. After calibration, model validation is performed on the validation data set to evaluate the goodness
of the calibrated parameters. The calibration metric is the Nash-Sutcliffe efficiency criterion (Nash and Sutcliffe 1970), denoted $R^2$ below. Values close to 1 indicate a good fit, and a value of 1 indicates a perfect match. A value of 0 means the simulation is no better than the mean of the observed discharge. The Nash-Sutcliffe efficiency is:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs,i} - Q_{sim,i}\right)^2}{\sum_{i=1}^{N} \left(Q_{obs,i} - \bar{Q}_{obs}\right)^2}$$

where $Q_{obs,i}$ is the observed discharge at time step $i$; $Q_{sim,i}$ is the simulated discharge at time step $i$; $\bar{Q}_{obs}$ is the mean of the observed discharge; and $N$ is the total number of time steps.

TOPMODEL simulates the saturation deficit, from which the depth to the water table is derived. Evaluating the simulated depth to the water table requires depth information from existing wells, which must be collected through field work.

Machine Learning Algorithms
As mentioned in the previous section, finding optimal drilling locations requires predicting the probability that a well drilled in the Hare region will yield water. This probability will be an input parameter for the optimization model. We divide the Hare region into small pixels of equal area. A machine learning model, such as logistic regression, could then predict the probability of water availability for each pixel in Hare. The binary dependent variable is whether a well at the location yields water. The independent variables may include precipitation, elevation, ETp, land use/land cover, soil texture, and percentage of topsoil moisture. These data will be collected together with the dependent variable through field work.

Optimization Approaches
To find the optimal drilling locations, we could formulate a two-stage stochastic mixed integer programming (SMIP) problem. The two-stage SMIP approach supports decision making under uncertainty with two sets of decision variables. First-stage decisions are made before the uncertain quantities are observed, and second-stage decisions are made after they are revealed (Küçükyavuz and Sen 2017). In this study, we plan to formulate our problem with binary first-stage variables and continuous second-stage variables. Each stage's objective function should be defined in terms of that stage's decision variables, and uncertainty enters only in the second stage.

We describe the two-stage SMIP in three parts: the initial formulation (objective functions, decision variables for each stage, uncertainty, and possible constraints), the reformulation of the problem, and the solution method.

In the initial formulation, the first-stage objective function could minimize the total construction cost, with decision variable $x_i$ indicating whether there is a well ($x_i = 0$ or $1$) at location $i$. The second-stage objective function could minimize the pumping cost, with decision variable $y_i$ denoting the pumping hours at location $i$. The uncertain quantity could be the water yield, whose distribution should be determined before the optimization model is built. A set of constraints (e.g., restrictions on pumping hours and on the total amount of water withdrawn) will be added to address groundwater sustainability. The initial formulation will then be reformulated to make the problem tractable. The optimization solver Gurobi will be used to solve this optimization problem.
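To make the two-stage structure concrete, the sketch below solves a toy instance by brute force. Everything in it is hypothetical. It assumes four candidate sites and three yield scenarios, and all costs, capacities, the seasonal demand, the withdrawal cap, and the shortfall penalty are invented numbers, not study data. The recourse model (meet an irrigation demand, pay a penalty per unit of unmet demand) is also our assumption about one possible formulation, not the authors'. Writing $f_i$ for construction cost, $c_i$ for cost per pumping hour, $q_i^s$ for the yield of site $i$ in scenario $s$ (probability $p_s$), $H$ for the hour limit, $D$ for demand, $W$ for the withdrawal cap, $M$ for the penalty, and $u$ for unmet demand, the toy model is

$$\min_{x \in \{0,1\}^n} \sum_i f_i x_i + \sum_s p_s \min_{y, u \ge 0} \Big\{ \sum_i c_i y_i + M u \;:\; y_i \le H x_i,\; \sum_i q_i^s y_i + u \ge D,\; \sum_i q_i^s y_i \le W \Big\}$$

The first stage enumerates every binary drilling plan. For each scenario, the second-stage LP is solved by filling demand with the cheapest water per m³ first, which is optimal here because water is the only shared resource. Enumeration grows as $2^n$, so this only illustrates the model; a realistic instance would go to a MIP solver such as Gurobi, as planned above.

```python
from itertools import product

# Hypothetical toy data (not from the study): four candidate sites, three yield scenarios.
BUILD_COST = [50.0, 65.0, 40.0, 55.0]   # first-stage cost of drilling a well at site i
PUMP_COST = [0.8, 0.6, 1.0, 0.7]        # second-stage cost per pumping hour at site i
MAX_HOURS = 100.0                       # cap on pumping hours per well (sustainability)
DEMAND = 900.0                          # irrigation demand per season (m^3)
WITHDRAWAL_CAP = 1200.0                 # cap on total water withdrawn (m^3)
SHORTFALL_PENALTY = 1.0                 # cost per m^3 of unmet demand

# Each scenario: (probability, yield in m^3/h at each site); a yield of 0 is a dry well.
SCENARIOS = [
    (0.5, [6.0, 4.0, 5.0, 3.0]),
    (0.3, [3.0, 0.0, 4.0, 2.0]),
    (0.2, [0.0, 2.0, 0.0, 1.0]),
]


def recourse_cost(open_sites, yields):
    """Optimal second-stage cost for one scenario, given the drilled sites.

    The second stage is a continuous LP with one shared resource (water), so filling
    demand from the cheapest water per m^3 first is optimal; wells whose water costs
    at least the shortfall penalty are never worth pumping.
    """
    usable = sorted(
        (PUMP_COST[i] / yields[i], yields[i] * MAX_HOURS)  # (cost per m^3, max m^3)
        for i in open_sites
        if yields[i] > 0 and PUMP_COST[i] / yields[i] < SHORTFALL_PENALTY
    )
    target = min(DEMAND, WITHDRAWAL_CAP)  # never withdraw more than the cap allows
    delivered, cost = 0.0, 0.0
    for unit_cost, capacity in usable:
        take = min(capacity, target - delivered)
        if take <= 0:
            break
        delivered += take
        cost += unit_cost * take
    return cost + SHORTFALL_PENALTY * (DEMAND - delivered)


def solve():
    """Enumerate every binary first-stage plan; return the one with least expected total cost."""
    n = len(BUILD_COST)
    best_plan, best_cost = None, float("inf")
    for plan in product((0, 1), repeat=n):
        open_sites = [i for i in range(n) if plan[i] == 1]
        construction = sum(BUILD_COST[i] for i in open_sites)
        expected_recourse = sum(p * recourse_cost(open_sites, y) for p, y in SCENARIOS)
        total = construction + expected_recourse
        if total < best_cost:
            best_plan, best_cost = plan, total
    return best_plan, best_cost


if __name__ == "__main__":
    plan, cost = solve()
    print("drill at sites:", [i for i, x in enumerate(plan) if x], "expected cost:", round(cost, 2))
```

`solve()` returns the cheapest plan as a 0/1 tuple together with its expected cost. Lowering the probability of the wet scenario shows how yield uncertainty can change which sites are worth drilling.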


Preliminary Results and Discussion

Groundwater Potential
The topographic wetness index map of Hare, Ethiopia, derived from the digital elevation data, shows where water may tend to accumulate (Figure 2). High TWI values indicate large contributing areas and low slopes. The higher indices (darker green to purple) are found mainly in the southern part of the watershed, with a few in the central and northern parts. These regions have greater potential to become saturated by rainfall. Higher TWI values also occur in areas with surface water, such as streams and wetlands. Lower TWI values indicate small contributing areas and steep slopes. In our study area, lower TWI values (yellow) are found in the central and northern parts of the watershed. Since lower TWI indicates lower moisture storage in the soils, there may be little water accumulation in many parts of the Hare watershed. As a result, it could be challenging to find good shallow drilling locations for drawing groundwater.

Figure 2. Topographic Wetness Index Map for Hare, Ethiopia
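The sketch below illustrates how a TWI map like Figure 2 can be computed from a DEM. It uses the TOPMODEL topographic index $TWI = \ln(a / \tan\beta)$, where $a$ is the upslope contributing area per unit contour width and $\beta$ is the local slope. The paper does not say which flow-routing scheme produced Figure 2. This sketch therefore assumes simple single-direction (D8) routing, a contour width equal to the cell size, and a floor of 1e-3 on $\tan\beta$ for flat cells and pits. The function name is our own, and this is not the implementation used in the study.

```python
import numpy as np

# 8 neighbour offsets (row, col) used by D8 routing.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def topographic_wetness_index(dem, cell_size, min_slope=1e-3):
    """TWI = ln(a / tan(beta)) on a grid, assuming single-direction (D8) flow routing.

    dem: 2-D array of elevations (m); cell_size: grid spacing (m), e.g. 25 for the Hare DEM.
    a is the upslope contributing area per unit contour width (contour width = cell_size);
    tan(beta) is the steepest downhill slope, floored at min_slope so pits stay finite.
    """
    rows, cols = dem.shape
    area = np.full(dem.shape, cell_size * cell_size, dtype=float)  # each cell drains itself
    tan_beta = np.full(dem.shape, min_slope, dtype=float)
    # Visit cells from highest to lowest: every upslope contribution has arrived
    # before a cell hands its accumulated area to its steepest downhill neighbour.
    for flat_index in np.argsort(dem, axis=None)[::-1]:
        r, c = divmod(int(flat_index), cols)
        steepest, target = 0.0, None
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                slope = (dem[r, c] - dem[rr, cc]) / (cell_size * np.hypot(dr, dc))
                if slope > steepest:
                    steepest, target = slope, (rr, cc)
        if target is not None:  # cells with no lower neighbour keep the slope floor
            tan_beta[r, c] = max(steepest, min_slope)
            area[target] += area[r, c]
    specific_area = area / cell_size
    return np.log(specific_area / tan_beta)


if __name__ == "__main__":
    # Toy 3x3 plane dropping 10 m per 25 m row toward the south (hypothetical data).
    dem = np.array([[30.0, 30.0, 30.0], [20.0, 20.0, 20.0], [10.0, 10.0, 10.0]])
    print(np.round(topographic_wetness_index(dem, cell_size=25.0), 2))
```

On this toy plane, flow accumulates down the slope and the bottom row has no lower neighbour. The bottom row therefore gets the highest TWI and the top row the lowest, which matches the reading of Figure 2: large contributing areas and low slopes give high TWI.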
Optimal Drilling Locations
Before finalizing the formulation of the optimization problem, we need to estimate the parameters of an initial formulation from the data collected in our study area. As mentioned, data collection is the most challenging task in this study. If collecting some data items requires too much effort and cost to be practical, the formulation will be modified accordingly. The research questions related to optimal drilling locations will be answered once data collection, parameter estimation, and optimization are complete.

Conclusion
This study focuses on using AI to identify optimal drilling locations for sustainable irrigation for subsistence farmers in Hare, Ethiopia. We have found that collecting hydrogeological data is the main challenge in developing an AI model. After the data are collected, we will first build TOPMODEL to estimate discharge and depth to the water table. These estimates will be used as inputs to machine learning models that estimate the probability of finding water at a particular location. With these probabilities as input, we will construct an optimization model that identifies optimal drilling locations for sustainable irrigation for subsistence agriculture. Our preliminary intermediate result is the topographic wetness index map of Hare, Ethiopia. The TWI map indicates that the southern part of the watershed has greater potential to accumulate water. The central and northern parts show lower soil moisture storage, which makes it challenging to find shallow groundwater there. Further results will be reported as the study progresses.

Acknowledgements
This research is performed by the MODL (Modeling Optimal Drilling Locations) team, which comprises researchers from the George Mason University Center for Resilient and Sustainable Communities (C-RASC), the Arba Minch University Water Technology Institute, and Global MapAid, with support from the Czech Geological Survey. Wanru Li gratefully acknowledges support from a C-RASC fellowship for her efforts on this project.

References
ActionAid UK, 2017. Food crisis in East Africa 2017-2019. Available at: https://www.actionaid.org.uk/about-us/what-we-do/emergencies-disasters-humanitarian-response/east-africa-crisis-facts-and-figures.
Ambroise, B., Beven, K. and Freer, J., 1996. Toward a generalization of the TOPMODEL concepts: Topographic indices of hydrological similarity. Water Resources Research, 32(7), pp.2135-2145.
Andualem, T.G. and Demeke, G.G., 2019. Groundwater potential assessment using GIS and remote sensing: A case study of Guna tana landscape, upper blue Nile Basin, Ethiopia. Journal of Hydrology: Regional Studies, 24, p.100610.
Beven, K.J. and Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Journal, 24(1), pp.43-69.
Buytaert, W., 2011. topmodel: Implementation of the Hydrological Model TOPMODEL in R. Global Change Biology, pp.679-706.
Douglas-Bate, R., Pascual, P., Prakash, H., Kemal, A. and Mohammed, K., 2019. AI scoping mission, Ethiopia, to enhance sustainable irrigation for food supply. Paper presented at Association for Advancement of Artificial Intelligence.
Ethiopia Hunger Statistics, n.d. Available at: https://www.macrotrends.net/countries/ETH/ethiopia/hungerstatistics#:~:text=Ethiopia%20hunger%20statistics%20for%202017,a%202.2%25%20decline%20from%202013
Greenbaum, D., 1985. Review of remote sensing applications to groundwater exploration in basement and regolith.
Han, Z., Huang, S., Huang, Q., Leng, G., Wang, H., Bai, Q., Zhao, J., Ma, L., Wang, L. and Du, M., 2019. Propagation dynamics from meteorological to groundwater drought and their possible influence factors. Journal of Hydrology, 578, p.124102.
Ibbitt, R. and Woods, R., 2004. Re-scaling the topographic index to improve the representation of physical processes in catchment models. Journal of Hydrology, 293(1-4), pp.205-218.
Jaiswal, R.K., Mukherjee, S., Krishnamurthy, J. and Saxena, R., 2003. Role of remote sensing and GIS techniques for generation of groundwater prospect zones towards rural development--an approach. International Journal of Remote Sensing, 24(5), pp.993-1008.
Jha, M.K., Chowdary, V.M. and Chowdhury, A., 2010. Groundwater assessment in Salboni Block, West Bengal (India) using remote sensing, geographical information system and multi-criteria decision analysis techniques. Hydrogeology Journal, 18(7), pp.1713-1728.
Küçükyavuz, S. and Sen, S., 2017. An introduction to two-stage stochastic mixed-integer programming. In Leading Developments from INFORMS Communities (pp. 1-27). INFORMS.
Lall, U., Josset, L. and Russo, T., 2020. A Snapshot of the World's Groundwater Challenges. Annual Review of Environment and Resources, 45.
Ma, T., Wang, J., Liu, Y., Sun, H., Gui, D. and Xue, J., 2019. A Mixed Integer Linear Programming Method for Optimizing Layout of Irrigated Pumping Well in Oasis. Water, 11(6), p.1185.
Nash, J.E. and Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I: A discussion of principles. Journal of Hydrology, 10(3), pp.282-290.
Nourani, V., Roughani, A. and Gebremichael, M., 2011. TOPMODEL capability for rainfall-runoff modeling of the Ammameh watershed at different time scales using different terrain algorithms. Journal of Urban and Environmental Engineering, 5(1), pp.1-14.
power.larc.nasa.gov, n.d. NASA POWER Data Access Viewer. Available at: https://power.larc.nasa.gov/data-access-viewer/.
Rhynsburger, D., 1973. Analytic delineation of Thiessen polygons. Geographical Analysis, 5(2), pp.133-144.
Yin, J., Pham, H.V. and Tsai, F.T.C., 2020. Multiobjective Spatial Pumping Optimization for Groundwater Management in a Multiaquifer System. Journal of Water Resources Planning and Management, 146(4), p.04020013.