Capturing Scientific Knowledge for Water Resources Sustainability in the Rio Grande Area

Natalia Villanueva-Rosales
Cyber-ShARE Center of Excellence, Department of Computer Science
University of Texas at El Paso, US
nvillanuevarosales@utep.edu

Luis Garnica Chavira¹
Center for Environmental Resources & Management
University of Texas at El Paso, US
luis@gitgudconsulting.com

Smriti Rajkarnikar Tamrakar¹
Cyber-ShARE Center of Excellence, Department of Computer Science
University of Texas at El Paso, US
smritirtamrakar@gmail.com

Deana Pennington
Cyber-ShARE Center of Excellence, Geology Department
University of Texas at El Paso, US
ddpennington@utep.edu

Raul Alejandro Vargas-Acosta
Cyber-ShARE Center of Excellence, Department of Computer Science
University of Texas at El Paso, US
ravargasaco@miners.utep.edu

Frank Ward
Department of Agricultural Economics and Agricultural Business
New Mexico State University, US
fward@nmsu.edu

Alex S. Mayer
Department of Civil and Environmental Engineering
Michigan Technological University, US
asmayer@mtu.edu

¹ Affiliated with the University of Texas at El Paso when producing this work.

K-CAP2017 Workshops and Tutorials Proceedings, © Copyright held by the owner/author(s)

ABSTRACT

This paper presents our experience in capturing scientific knowledge to enable the creation of user-defined modeling scenarios that combine the availability and use of water resources with potential climate conditions in the middle Rio Grande region. The knowledge representation models in this project were created and validated by an international, interdisciplinary team of scientists and engineers. These models enable the automated generation of water optimization models and the visualization of output data and provenance traces that support the reuse of scientific knowledge. Our efforts include an educational and outreach component to enable students and a wide variety of stakeholders (e.g., farmers, city planners, and the general public) to access and run water models. Our approach, the Integrated Water Sustainability Modeling Framework, uses ontologies and lightweight standards such as JSON-LD to enable the exchange of data across the different components of the system and third-party tools, including modeling and visualization tools. Future work includes the ability to automatically integrate further models (i.e., model integration).

CCS CONCEPTS

• Computing methodologies → Artificial intelligence → Knowledge representation and reasoning

KEYWORDS

Knowledge representation, provenance, workflow visualization, interdisciplinary research.

1 INTRODUCTION

The Middle Rio Grande watershed comprises parts of southern New Mexico and far west Texas in the U.S. and northern Chihuahua in Mexico. Figure 1 contains a map illustrating the study area of this project, modified from [24] using Google My Maps.

Over the past 100 years, the Middle Rio Grande has been the primary source of water in this desert region, providing water for substantial irrigated agriculture and for three municipalities with a combined population of over 2 million people. The surface water in the region is highly managed in accordance with national treaties, state compacts, and water rights that date back well over a century [25]. However, due to recent periods of severe drought and growing demand, the river alone no longer meets regional water needs, leading to increased groundwater use and dropping water tables [22]. Sustainable water management in this region faces a number of drivers of change, including: 1) climate change that is impacting both water supply and demand [11]; 2) agricultural practices and trends, including high water demand crops and greater reliance on groundwater for irrigation [22]; 3) urban growth [16]; and 4) growing demand for environmental services such as riverside habitat and environmental flows [9]. A core question is how can water be managed so that the three competing sectors—
agricultural, urban, and environmental—can realize a sustainable future in this challenged water system?

Investigating potential ways to achieve long-term water sustainability requires the use of simulation models that integrate the biophysical workings of the natural system with human choices that impact the system. Such modeling approaches enable the computational testing of alternative climate, population, and water use scenarios that can improve understanding of the coupled human-natural system and facilitate discussion among researchers, water managers, and other stakeholders [28]. A wide range of water models exists, each typically focusing on one aspect of the system (e.g., groundwater, surface water, or water economics). Exploring potential solutions to water sustainability requires integration across these aspects, which are addressed by researchers from different disciplines using different modeling approaches [1]. Yet the resulting infrastructure must be lightweight, usable, and useful for people with a wide range of technical skills, including stakeholders who may have limited modeling and technical experience [13].

This paper discusses the efforts of a large, interdisciplinary group to create a water modeling framework to address this problem. Our solution, the Integrated Water Sustainability Modeling Framework, or IWASM for short, combines hydrologic biophysical models [15] with an economic optimization model [26] into a “bucket model” implemented in the General Algebraic Modeling System (GAMS) [8]. “Bucket model” is a longstanding phrase used by hydrologic modelers for models that consider water storage as a set of buckets that have inflows (increasing storage) and outflows (decreasing storage). The IWASM bucket model simulates major water sources, uses, and losses, together with water supply constraints, to improve our understanding of the hydrology, agronomy, institutions, and economics that guide analysis of policy and management and to answer questions important to stakeholders.

A key challenge in this collaborative project was developing a shared understanding of team members’ expertise and how their research could contribute to a more comprehensive whole. Integration of deep knowledge has been identified as one of seven key challenges confronting interdisciplinary teams [4]. One approach to overcoming this challenge is to facilitate structured team interactions that expose team members to vocabulary, concepts, and methods with which they may be unfamiliar [20]. The team must evolve their understanding of the problem from initially ill-structured, vague, and incomplete to well-structured, explicitly represented, and integrated across disciplines.

Our approach uses knowledge representation languages and tools to automate the exchange of data between IWASM modules and third-party tools. IWASM Web-based interfaces support the use of the bucket model by stakeholders. A provenance trace describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing [18]. Capturing provenance for the execution of the model, including information about the model, input parameters, and output variables, aims to support the understanding and reusability of the bucket model. The representation of data and provenance in this project is further described in sections 2 and 3. One example of reusing a provenance trace is its visualization through a third-party visualization suite with minimal effort. We envision that other tools that can ingest data in standard Web-based languages such as JSON-LD [3] and the Web Ontology Language (OWL) [19] will further demonstrate the ability to share and reuse scientific knowledge and resources using knowledge representation languages.

Figure 1: Map of the study area extending from Elephant Butte Reservoir in Southern New Mexico through the El Paso/Ciudad Juarez region in Texas and Chihuahua, Mexico to the entrance of the Rio Conchos from Mexico, modified from [24].

2 IDENTIFYING DATA AND KNOWLEDGE FOR WATER SUSTAINABILITY MODELING IN THE RIO GRANDE AREA

Due to the interdisciplinary nature of this project, the modeling team was exposed in the early stages to artifacts such as concept maps that allowed them to represent and negotiate the minimal information needed to communicate with members from disciplines including Computer Science, Civil Engineering, Hydrology, and Agriculture. Concept maps, diagrams, and Excel files were generated to create a shared understanding of the bucket model, its inputs, outputs, and parameters, as well as the semantics of these data. Through several workshops and meetings, the modeling and development teams identified the importance of keeping track of data sources, user-defined parameters, and workflow steps every time an instance of the model was generated.

The need to track provenance information was also identified by potential end-users of IWASM through a survey [21]. This survey was taken by 36 scientists and students working on water resources modeling in the El Paso – Juarez border area during the Regional Water Symposium in January 2017 at the University of Texas at El Paso. Respondents came from a diverse pool of disciplines, including Water Sustainability, Hydrology, Geology, Environmental Science, Economics, and Computer Science. After a short demo of IWASM, the respondents answered a list of questions using a five-point scale from “strongly disagree” to “strongly agree”, as well as open-ended questions. Survey results showed that most of the respondents considered it important to know the source of the data (88% answered agree or strongly agree). Moreover, 88% of the respondents indicated that knowing the source of the parameters used in the model would instill trust in the model, and 81% of respondents indicated that data and model provenance increased their trust to use or reproduce a water model generated from IWASM. Similarly, 88% of the respondents considered it important to know how the data was manipulated to generate a water model. In addition, 85% of respondents considered that it would be easier for them to replicate a water model if the provenance of data and workflow were provided to them along with the model outputs. A slightly smaller percentage of respondents (69%) indicated that they were willing to spend additional time annotating data sources and workflows so that other people could reuse them. In general, respondents indicated that a provenance trace is important for them. This survey, along with input from the research team, influenced the design decisions for modeling metadata, including provenance, in IWASM.

3 CAPTURING DATA AND KNOWLEDGE FOR WATER SUSTAINABILITY

The bucket model requires a variety of data inputs that originate from multiple decoupled sources and heterogeneous formats, e.g., spreadsheets, database records, or full-text documents. To integrate these data and formats, JSON-LD was chosen because it is a lightweight serialization of Linked Data. Most of the data retrieved to execute the bucket model in IWASM is transformed semi-automatically using third-party transformation tools, e.g., CSV-to-JSON [5]. Data is manually curated and annotated with vocabulary describing modeling or provenance concepts, e.g., agriculture; thus, IWASM extends JSON-LD standards.

    { "modelOutputs" : [{
        "varLabel" : "Discounted Net Regional Farm Income",
        "varCategory" : "Summary",
        "varName" : "T_ag_ben_v",
        "varValue" : [{
          "p" : "1-policy_hist",
          "w" : "1-w_supl_base",
          "value" : 1884324.28 }],
        "varDescription" : "Discounted net present value of regional farm income",
        "varUnit" : "1000 USD" }],
      "@context": {
        "modelOutputs": "http://purl.org/wf4ever/wfdesc#Output",
        "rdfs" : "http://www.w3.org/2000/01/rdf-schema/",
        "sio" : "http://semanticscience.org/resource/",
        "varLabel": { "@id": "rdfs:label", "@type": "xsd:string"},
        "varCategory": { "@id": "sio:SIO_000137",
                         "@type": "xsd:string"
    }}}

Figure 2: Excerpt of IWASM output composed of a variable, its corresponding value, and annotations.

Figure 2 provides an example of the output variable farm income represented as an array of JSON objects. The context object enables the semantic annotation of fields with linked-data vocabulary, e.g., the SIO Ontology [7].

4 AUTOMATING THE DATA INTEGRATION AND EXCHANGE OF DATA IN THE WATER SUSTAINABILITY MODEL

Figure 3 shows an excerpt of a JSON-LD file containing the provenance trace of a sample user-scenario execution on IWASM. The terms used to annotate the JSON-LD are mapped to the PROV-O ontology [14] and the schema.org vocabulary [10]. This figure illustrates how the JSON-LD describes that the model-outputs were generated by the previous task in the user-scenario execution, called review-and-run, and that they were derived from a list of variables. Note that the terms wasGeneratedBy and wasAttributedTo are mapped to PROV-O by using the JSON-LD context containing the namespace prov, and that the terms hasName and hasURL from schema.org extend the description of the modeling agent.

    {"@id": "Step5: model-outputs",
     "@type": "prov:Entity",
     "wasGeneratedBy": "review-and-run",
     "wasAttributedTo": "Modeling Agent",
     "wasDerivedFrom": "List of Variables",
     "Modeling Agent": [{
       "@id": "prov:SoftwareAgent",
       "@type": "@id",
       "hasName": "The General Algebraic Modeling System (GAMS)",
       "hasURL": "https://www.gams.com/" }],
     "@context": {
       "prov" : "http://www.w3.org/ns/prov#",
       "sch" : "http://schema.org/",
       "wasGeneratedBy" : "prov:wasGeneratedBy",
       "wasAttributedTo" : "prov:wasAttributedTo",
       "wasDerivedFrom" : "prov:wasDerivedFrom",
       "hasName" : "sch:name",
       "hasURL" : "sch:url"
    }}

Figure 3: Excerpt of JSON-LD file containing provenance data of a user-scenario execution in IWASM.

5 CAPTURING PROVENANCE IN IWASM

The bucket model requires a large number of data sources, fixed parameters, and customizable parameters. In this project, we used a design pattern for workflow execution described in the wprov namespace, which has also been used by the research team in the context of biodiversity modeling [21]. A design pattern in the context of this project is a generic, yet customizable, solution that provides a template to represent generic elements and their relationships. The provenance captured in IWASM is mapped to PROV-O and other widely used controlled vocabularies, including Workflow Description (wfdesc) [23] and Dublin Core Metadata Initiative (dcterms) [27]. The provenance trace captured in IWASM records the main components of the user-scenario execution, including workflow information, user-scenario execution steps, inputs, parameter collection, and output (variable) results.

[Figure 4 graph. Example collection members shown: Urban Water Use, Farm Income, Water Stocks, Water Price Elasticity of Demand, Urban Average Cost. Namespaces: prov: https://www.w3.org/ns/prov-o; wprov: http://ontology.cybershare.utep.edu/wprov]

Figure 4: Graphical representation of a user-scenario workflow execution provenance trace in the Integrated Water Modeling Platform. Provenance concepts and their relations are aligned to PROV-O concepts.

Figure 4 shows a graphical representation of a user-scenario execution provenance trace in IWASM. The wprov:user-workflow represents the overall user-scenario execution, composed of a series of steps, and uses the water-model (bucket model) as a guideline to execute those steps. The PROV-O property prov:wasInformedBy links the wprov:user-workflow with the specific steps executed, e.g., wprov:human-intervention. Each workflow step is connected to the previous step by the wprov:hadNextStep relation.

The wprov:list-of-parameters, an extension of prov:Collection, is linked through the property prov:hadMember to each parameter (wprov:Parameter) sent to the bucket model implementation in GAMS. Steps in the user-scenario execution, e.g., wprov:review-and-run, are linked to the wprov:ModelingAgent, an extension of prov:Agent, using the prov:wasAssociatedWith property. The outputs of the wprov:review-and-run step are annotated as wprov:model-outputs and linked to this step with the prov:wasGeneratedBy property. The wprov:model-outputs are linked to a wprov:list-of-variables, an extension of prov:Collection, through the property prov:wasDerivedFrom. The wprov:list-of-variables is linked to output variables and their values through the property prov:hadMember.

The automated generation of provenance in IWASM uses the metadata from the bucket model and the workflow provenance pattern, both currently stored in an instance of the MongoDB [17] database. The wprov workflow provenance pattern, also represented in JSON, is used to automatically generate the provenance trace of a user-scenario execution. The user-scenario execution provenance is merged with additional model metadata into a single provenance JSON-LD file, illustrated in Figure 5. The integrated JSON-LD file can be directly downloaded or shared as a link with other users, and can be consumed by third-party tools such as the JSON visualization tool used in IWASM, described in the following section.

6 VISUALIZING PROVENANCE TO INSTILL TRUST AND PROMOTE REUSABILITY

The JSON-LD generated by IWASM can be reused by third-party applications due to the use of standard languages. IWASM provides a module to visualize the metadata and provenance trace of a user-scenario execution using the third-party tool jsonld-vis [12] (Figure 5). This open-source visualization tool constructs a visualization graph of JSON-LD files. A few modifications to the services provided by jsonld-vis were made in order to generate a workflow-like visualization. Figure 5 shows the provenance for the outputs of the model, including the modeling agent.

Figure 5: Visualization of provenance trace generated for a user-scenario execution using the third-party tool jsonld-vis.

7 PRELIMINARY EVALUATION

From the scientific perspective, a standard model evaluation approach was used to verify that the model works as intended and produces believable results. This approach relies on selecting a time period to simulate for which observational data exists; in this case, reservoir capacity, streamflow at two gauges, and groundwater depth in specific wells were used. The data are subdivided into two parts [2]. The first part is used to calibrate the model (training dataset) and the second part is used to test how well results match observations. A twenty-year period from 1994 to 2013 was used. Simulated results for this time period were strongly correlated with observations, indicating the model has acceptable validity.

To verify that the infrastructure created was generating the same results as if the modeling tool GAMS were executed directly, we used a black-box approach: a model with the same inputs was generated both using GAMS directly and using the Web interface. The outputs of the two models were compared to make sure they were the same, and thus to verify that the Web-based graphical user interface, web service executions, and the infrastructure created were generating the expected results.

From the end-user perspective, we evaluated the usability of the graphical user interface in a number of ways. Initially, we asked team members and others affiliated with the project to step through a series of tasks and provide feedback through a survey, as described in section 2. Then, we asked other participants in two workshops to step through the same tasks and provide feedback, both through a survey and facilitated discussion. Lastly, we recruited five students with agricultural backgrounds to test the interface, assuming they would more closely represent our agricultural stakeholders.

We are in the process of incorporating suggestions from end-users into current versions of the bucket model and graphical user interface.

8 CONCLUSIONS

This paper reports on our efforts towards providing a Web-based platform, IWASM, that enables the generation of user-scenario executions of the bucket model, which integrates the biophysical workings of nature with human choices that impact the system. User scenarios include alternative climate, population, and water usage that can improve understanding of the coupled human-natural system and facilitate discussions and policy making among a wide range of stakeholders. This highly interdisciplinary endeavor used proven techniques for knowledge negotiation, including the creation of concept models and the development of common vocabularies through ontologies and knowledge representation languages that enable the integration and exchange of data through the Web. The requirements elicitation process, as well as the development of IWASM, was driven by the interdisciplinary research team of this project along with input from potential end-users. As a result, IWASM provides a friendly interface that enables user-scenario executions of the bucket model, as well as outputs of the system with a provenance trace serialized as a JSON-LD file. The provenance visualization module illustrates the reuse of JSON-LD files by third-party tools and fosters the understanding and reusability of models by end-users, including stakeholders who may not be familiar with modeling systems.

9 FUTURE WORK

The bucket model is constantly evolving to support additional features such as the dynamic generation of parameters. IWASM is also being updated to support these changes. We are in the process of incorporating additional models of water, including simulation models of water consumption that use different modeling tools. Our ultimate goal is to enable users to ask English-like scientific questions that will trigger the automatic selection and execution of a modeling algorithm exposed as a Semantic Web Service, based on our previous work on workflow orchestration for biodiversity sciences [6]. This new feature will also assist end-users in the selection of parameters using context provided by ontologies. Additional data will be needed for new versions of the data model, including data provided by members of the research team in Mexico. These data introduce the challenge of integrating data collected through different survey protocols, different unit scales (e.g., Metric instead of English), and different languages (e.g., Spanish). We will pursue the use of further ontologies and ontology mappings to automate the integration of these data, which ultimately represent different perspectives in studying water sustainability.

ACKNOWLEDGMENTS

This material is based upon work that is supported by the National Institute of Food and Agriculture, U.S. Department of Agriculture, under award number 2015-68007-23130 “Sustainable water resources for irrigated agriculture in a desert river basin facing climate change and competing demands: From characterization to solutions”. The authors would like to thank the research team (scientists and students) participating in this project and the GAMS developers for their valuable contributions. Special thanks to Bill Hargrove, Joe Heyman, Dave Gutzler, Alfredo Granados, Zhuping Sheng, Jose Caballero, and Sarah Sayles for their contributions to this work, and Ismael Villanueva-Miranda for the generation of Figure 1. This work used resources from Cyber-ShARE Center of Excellence, which is supported by National Science Foundation grant number HRD-0734825.

REFERENCES

[1] Belete, G.F. et al. 2017. An overview of the model integration process: From pre-integration assessment to testing. Environmental Modelling & Software. 87, Supplement C (Jan. 2017), 49–63. DOI:https://doi.org/10.1016/j.envsoft.2016.10.013.
[2] Bennett, N.D. et al. 2013. Characterising performance of environmental models. Environmental Modelling & Software. 40, (2013), 1–20. DOI:https://doi.org/10.1016/j.envsoft.2012.09.011.
[3] World Wide Web Consortium. 2014. JSON-LD 1.0: a JSON-based serialization for linked data. (Jan. 2014).
[4] Cooke, N.J. 2015. Enhancing the Effectiveness of Team Science. The National Academies Press.
[5] CSV to JSON - CSVJSON: 2014. http://www.csvjson.com/csv2json. Accessed: 2017-11-22.
[6] Del Rio, N. et al. 2013. ELSEWeb meets SADI: Supporting Data-to-model Integration for Biodiversity Forecasting. Discovery Informatics Symposium (2013).
[7] Dumontier, M. et al. 2014. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J. Biomedical Semantics. 5, (2014), 14.
[8] GAMS - Cutting Edge Modeling: 2017. https://www.gams.com/. Accessed: 2017-10-05.
[9] Green, P. et al. 2015. Freshwater ecosystem services supporting humans: Pivoting from water crisis to water solutions. Global Environmental Change. 34, (Sep. 2015), 108–118. DOI:https://doi.org/10.1016/j.gloenvcha.2015.06.007.
[10] Guha, R.V. et al. 2016. Schema.Org: Evolution of Structured Data on the Web. Commun. ACM. 59, 2 (Jan. 2016), 44–51. DOI:https://doi.org/10.1145/2844544.
[11] Gutzler, D.S. 2013. Regional climatic considerations for borderlands sustainability. Ecosphere. 4, 1 (Jan. 2013), 1–12. DOI:https://doi.org/10.1890/ES12-00283.1.
[12] jsonld-vis: Turn JSON-LD into pretty graphs: 2015. https://github.com/scienceai/jsonld-vis. Accessed: 2017-11-26.
[13] Kelly (Letcher), R.A. et al. 2013. Selecting Among Five Common Modelling Approaches for Integrated Environmental Assessment and Management. Environ. Model. Softw. 47, C (Sep. 2013), 159–181. DOI:https://doi.org/10.1016/j.envsoft.2013.05.005.
[14] Lebo, T. et al. 2013. Prov-o: The prov ontology. W3C Recommendation, 30th April. (2013).
[15] Loucks, D.P. and van Beek, E. Water Resource Systems Planning and Management - An | Daniel P. Loucks | Springer.
[16] McDonald, R.I. et al. 2014. Water on an urban planet: Urbanization and the reach of urban water infrastructure. Global Environmental Change. 27, (Jul. 2014), 96–105. DOI:https://doi.org/10.1016/j.gloenvcha.2014.04.022.
[17] MongoDB: 2007. https://www.mongodb.com. Accessed: 2017-11-22.
[18] Moreau, L. and Groth, P. 2013. Provenance: An Introduction to PROV. Synthesis Lectures on the Semantic Web: Theory and Technology. 3, 4 (Sep. 2013), 1–129. DOI:https://doi.org/10.2200/S00528ED1V01Y201308WBE007.
[19] OWL 2 Web Ontology Language Document Overview (Second Edition): 2012. https://www.w3.org/TR/owl2-overview/. Accessed: 2017-07-10.
[20] Pennington, D. et al. 2016. The EMBeRS project: employing model-based reasoning in socio-environmental synthesis. Journal of Environmental Studies and Sciences. 6, 2 (Jun. 2016), 278–286. DOI:https://doi.org/10.1007/s13412-015-0335-8.
[21] Rajkarnikar Tamrakar, S. 2017. Describing Data and Workflow Provenance Using Design Patterns and Controlled Vocabularies. ETD Collection for University of Texas, El Paso. (Jan. 2017), 1–72.
[22] Sheng, Z. 2013. Impacts of groundwater pumping and climate variability on groundwater availability in the Rio Grande Basin. Ecosphere. 4, 1 (Jan. 2013), 1–25. DOI:https://doi.org/10.1890/ES12-00270.1.
[23] The Wfdesc ontology (wfdesc): 2015. http://lov.okfn.org/dataset/lov/vocabs/wfdesc. Accessed: 2017-10-26.
[24] USDA Project CAP Study Area: 2015. http://purl.org/iwasm/basemapmeta. Accessed: 2017-11-22.
[25] Walsh, C. 2013. Water infrastructures in the U.S./Mexico borderlands. Ecosphere. 4, 1 (Jan. 2013), 1–20. DOI:https://doi.org/10.1890/ES12-00268.1.
[26] Ward, F.A. and Crawford, T.L. 2016. Economic performance of irrigation capacity development to adapt to climate in the American Southwest. Journal of Hydrology. 540, (2016), 757–773.
[27] Weibel, S. et al. 1998. Dublin core metadata for resource discovery.
[28] Zvoleff, A. and An, L. 2014. Analyzing Human–Landscape Interactions: Tools That Integrate. Environmental Management. 53, 1 (Jan. 2014), 94–111. DOI:https://doi.org/10.1007/s00267-012-0009-1.
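As an illustration of how a third-party tool might consume the provenance record shown in Figure 3, the following minimal sketch resolves the short terms (wasGeneratedBy, wasAttributedTo, wasDerivedFrom) to full IRIs using the record's own JSON-LD context. The helper names expand_term and expand_record are hypothetical and are not part of IWASM; the sketch handles only the flat, string-valued context entries that appear in Figure 3 and is not a complete JSON-LD processor.

```python
import json

# Provenance record transcribed from Figure 3, abridged to its top-level links.
FIGURE_3 = json.loads("""
{
  "@id": "Step5: model-outputs",
  "@type": "prov:Entity",
  "wasGeneratedBy": "review-and-run",
  "wasAttributedTo": "Modeling Agent",
  "wasDerivedFrom": "List of Variables",
  "@context": {
    "prov": "http://www.w3.org/ns/prov#",
    "sch": "http://schema.org/",
    "wasGeneratedBy": "prov:wasGeneratedBy",
    "wasAttributedTo": "prov:wasAttributedTo",
    "wasDerivedFrom": "prov:wasDerivedFrom",
    "hasName": "sch:name",
    "hasURL": "sch:url"
  }
}
""")


def expand_term(term, context):
    """Expand a term or prefix:suffix name to a full IRI (hypothetical helper).

    Only string-valued context entries are supported, which is all that
    Figure 3 uses. This is not the full JSON-LD expansion algorithm.
    """
    value = context.get(term, term)  # term alias -> compact IRI, if defined
    prefix, sep, suffix = value.partition(":")
    is_known_prefix = isinstance(context.get(prefix), str)
    if sep and is_known_prefix and not suffix.startswith("//"):
        return context[prefix] + suffix  # prefix -> namespace IRI
    return value  # already absolute, or unknown: leave unchanged


def expand_record(record):
    """Return (subject, predicate IRI, object) triples for plain string fields."""
    context = record.get("@context", {})
    subject = record["@id"]
    return [
        (subject, expand_term(key, context), value)
        for key, value in record.items()
        if not key.startswith("@") and isinstance(value, str)
    ]


if __name__ == "__main__":
    for triple in expand_record(FIGURE_3):
        print(triple)
```

Under these assumptions, each short term in the record resolves to a PROV-O property IRI in the http://www.w3.org/ns/prov# namespace, which is what allows tools that know nothing about IWASM to interpret the links between model outputs, workflow steps, and the modeling agent.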