Exploring Climate Change and Its Impact on Agriculture Using Volunteered Geographic Information

Hamed Mehdipoor
Department of Geo-Information Processing, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
h.mehdipoor@utwente.nl

Abstract

The PhD research presented in this paper aims to develop workflows for the fine-scale study of climate change and its impact on agriculture using volunteered geographic information in phenology. First, a consistency-checking workflow was developed to ensure the quality of volunteered observations. Next, spatio-temporal variation in plant phenology is modelled using novel predictors, so that we can move from point-related to gridded phenological products. Finally, long-term gridded time series of phenological data relevant to agriculture are generated using the developed phenological models.

Keywords: VGI, consistency checking, spatio-temporal modelling, machine learning, contextual geo-information.

1 Introduction

Progress in information and communication technologies and in location-aware devices has radically eased the way in which “non-experts” can produce geo-information. Many “non-experts” can now collect, distribute and even analyze geo-information on a voluntary basis. This has resulted in a variety of new data, which fall into the realm of what has been called volunteered geographic information or VGI (1). VGI-based projects that monitor the status of our planet at relatively fine spatial and temporal scales provide scientists with a novel source of geo-information.

VGI consistency is, however, a major concern, especially when VGI is used in modelling activities (2, 3). This is because there is always some degree of spatial and temporal inconsistency in the recorded locations and times of the volunteered observations. Volunteers often do not follow scientific principles of sampling design, and levels of expertise vary among them (4-6).
Moreover, unlike traditional geographic information, VGI typically lacks automated consistency checks, as current approaches mostly rely on human intervention (3, 7). Human-based approaches are costly and time-consuming, and are impracticable in many situations, such as the monitoring of fast-changing phenomena.

Another concern with the use of fine-resolution VGI is finding a robust modelling approach that accounts for potential spatial and temporal bias in volunteered observations. For example, VGI is often collected near human residences, or on weekends and public holidays (8, 9). Current spatio-temporal modelling approaches that rely solely on statistical algorithms cannot provide accurate predictions in the presence of such biases in the data. In addition, more spatio-temporal contextual information is available now than ever before, yet current modelling approaches cannot efficiently exploit such valuable but high-dimensional sources of input data.

There is still a lack of robust workflows that address the above-mentioned concerns. This PhD research aims to design and test workflows that facilitate the use of VGI in terms of consistency checking and spatio-temporal modelling. These workflows combine VGI, contextual geo-information and computational processing power to achieve this aim, with VGI in phenology as the application domain.

Copyright (c) by the paper's authors. Copying permitted for private and academic purposes. In: A. Comber, B. Bucher, S. Ivanovic (eds.): Proceedings of the 3rd AGILE PhD School, Champs sur Marne, France, 15-17 September 2015, published at http://ceur-ws.org

2 Volunteered phenological observations

The world is experiencing climate change and this raises several pressing questions. An important one is “How does climate change affect human abilities to secure agricultural products?”
Phenology, the science of the timing of seasonal plant and animal activities, provides relevant spatio-temporal information to answer this question. Phenological ground observations contain the location and time of species life cycle events (e.g. first flowering of a plant) and are often collected by volunteers; in this research they are called volunteered phenological observations (VPOs).

VPOs provide timely phenological data at almost no cost, as well as extensive spatial and temporal coverage (10). However, there are differences in the collection protocols and in the quality level of VPOs, which negatively affect the consistency and modelling of the phenological observations (11). Phenological models are another way to obtain information about seasonal plant and animal activities. They predict the timing of events from contextual environmental information such as climatic information, which is typically available over larger areas and longer periods than VPOs (12). In this way, gaps in the phenological observation record can be compensated for (13).

Most current phenological models have been calibrated using spatially and temporally biased and inconsistent VPOs, as well as point-related climatic information (14, 15). This reduces the accuracy of model outputs at locations other than those where climatic data and VPOs are available. In this PhD research, we develop workflows that facilitate the study of climate change and its impact on agriculture by 1) providing consistent VPOs about plants, 2) modelling spatio-temporal variation in plant phenology at fine spatial and temporal scales using heterogeneous data sources and machine learning, and 3) generating long-term gridded time series of phenological data relevant to agriculture by making use of VPOs and correlated observations relevant to agriculture.
This information can help agricultural decision-makers and farmers understand how to secure agricultural products against climate change. From a geoscience point of view, realizing the workflows introduces potentially novel computational approaches to analyze, model and mine VGI.

3 The workflows

To date, the consistency-checking workflow (16) has been designed and tested on a dataset that contains the location, the year and the day of the year of the first flowering of cloned lilac shrubs (17). The dataset covers the contiguous United States, with observations available from 1980 to 2013. The most detailed set of climatic data for the US, namely the DAYMET database¹, was used as contextual spatio-temporal geo-information.

The proposed workflow requires three steps to identify inconsistent observations (Fig 1). Clustering the observations based on the contextual conditions in which they were collected provides considerable information about the variability that one should expect in the observations. When the contextual information is high-dimensional, mapping it to a low-dimensional space facilitates both the clustering and the subsequent visualization steps. Once the observations are assigned to clusters, inconsistencies are identified by looking at the outliers present in each cluster.

Fig 1. The main steps of the workflow for identifying inconsistencies in VPOs

The second workflow (Fig 2) aims to create a novel plant phenology model. For this purpose, appropriate machine learning methods will be applied to gridded meteorological data, gridded digital elevation model data and available VPOs. In the third workflow (Fig 3), gridded time series of phenological data relevant to agriculture are generated.
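As an illustration of the consistency-checking workflow described above, the following is a minimal sketch of its clustering and outlier-detection steps. It is not the implementation used in (16): plain k-means and a median-absolute-deviation rule are stand-ins chosen for brevity, the contextual variables are assumed to have already been mapped to a low-dimensional space, and the function names (`kmeans`, `flag_inconsistent`), the parameters (`k`, `cutoff`, `min_scale`) and the toy data are all hypothetical.

```python
from statistics import mean, median


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic start (evenly spaced points)."""
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every observation to its nearest cluster center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(mean(dim) for dim in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def flag_inconsistent(context, day_of_year, k=2, cutoff=3.0, min_scale=1.0):
    """Cluster observations by context, then flag per-cluster outliers.

    An observation is flagged when its day of year lies more than `cutoff`
    scaled median absolute deviations from the median of its cluster.
    `min_scale` (in days) is an assumed floor that avoids a zero spread.
    """
    labels = kmeans(context, k)
    flags = [False] * len(day_of_year)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        values = [day_of_year[i] for i in idx]
        center = median(values)
        mad = median(abs(v - center) for v in values)
        limit = cutoff * 1.4826 * max(mad, min_scale)
        for i in idx:
            flags[i] = abs(day_of_year[i] - center) > limit
    return labels, flags


# Toy data (invented): two contextual variables per observation,
# e.g. an already-reduced temperature axis and elevation axis.
context = [(15.0, 0.1), (15.5, 0.2), (14.8, 0.1), (15.2, 0.3), (15.1, 0.2),
           (5.0, 1.5), (5.3, 1.6), (4.9, 1.4), (5.1, 1.5), (5.2, 1.7)]
doy = [100, 102, 101, 99, 160, 130, 131, 129, 132, 130]

labels, flags = flag_inconsistent(context, doy)
print([i for i, flagged in enumerate(flags) if flagged])  # [4]
```

In this toy example, the observation with day of year 160 is flagged because it was collected under the same contextual conditions as observations clustered around day 100. The same value would not be judged against the second cluster, which is the point of clustering by context before looking for outliers.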
On the one hand, ground-based observations relevant to agriculture are sparse and thus less appropriate than gridded time series of phenological data for studying trends and changes potentially attributable to climate change. On the other hand, the generation of gridded time series of phenological data relevant to agriculture faces a lack of data at appropriate scales to link contextual environmental information to ground-based observations relevant to agriculture.

¹ http://daymet.ornl.gov/dataaccess.html

[Figure 2: flowchart. Elements: gridded elevation database; gridded daily weather database; historical VGI-based VPOs; machine learning algorithms; phenology modelling; new phenology model; validation ("Is it valid?"; if no, selection of another algorithm); gridded phenological data about indicator plants.]

Fig 2. The workflow for creating the spatio-temporal plant phenology model

[Figure 3: flowchart. Elements: gridded time series of phenological data relevant to the indicator plant; VPOs relevant to agriculture; gridded elevation database; correlation checking ("Are they correlated?"; if no, finding another potential gridded data source; if yes, interpolation); gridded phenological data about the target agricultural plant; validation; gridded time series of phenological data relevant to the agricultural plant.]

Fig 3. The workflow for generating long-term gridded time series of phenological data relevant to agriculture

In summary, VGI-based initiatives can use the workflows in phenology, but also in other environmental applications. The workflows rely on computational power, which makes quality checking less time-consuming and data modelling more accurate. However, the efficiency of the workflows still needs to be evaluated in other real-world case studies, which is planned as future work.

Acknowledgements

I am grateful to AGILE for providing the opportunity to develop this position paper through the third AGILE PhD School 2015. I would like to thank Prof. Alexis Comber for his valuable comments during the school.

References

1. Goodchild MF.
Citizens as sensors: the world of volunteered geography. GeoJournal. 2007;69(4):211-21.
2. Flanagin AJ, Metzger MJ. The credibility of volunteered geographic information. GeoJournal. 2008;72(3):137-48.
3. Goodchild MF, Li L. Assuring the quality of volunteered geographic information. Spat Stat. 2012;1:110-20.
4. Kelling S, Yu J, Gerbracht J, Wong WK, editors. Emergent Filters: Automated Data Verification in a Large-scale Citizen Science Project. eScienceW; 2011: IEEE.
5. See L, Comber A, Salk C, Fritz S, van der Velde M, Perger C, et al. Comparing the quality of crowdsourced data contributed by expert and non-experts. PLoS ONE. 2013;8(7):e69958.
6. Comber A, Brunsdon C, See L, Fritz S, McCallum I. Comparing Expert and Non-expert Conceptualisations of the Land: An Analysis of Crowdsourced Land Cover Data. Spatial Information Theory. Lecture Notes in Computer Science. 8116: Springer International Publishing; 2013. p. 243-60.
7. Elwood S, Goodchild M, Sui D. Prospects for VGI Research and the Emerging Fourth Paradigm. Crowdsourcing Geographic Knowledge: Springer Netherlands; 2013. p. 361-75.
8. Reddy S, Dávalos LM. Geographical sampling bias and its implications for conservation priorities in Africa. J Biogeogr. 2003;30(11):1719-27.
9. Sparks TH, Huber K, Tryjanowski P. Something for the weekend? Examining the bias in avian phenological recording. Int J Biometeorol. 2008;52(6):505-10.
10. Vliet AJH. Challenging times: towards an operational system for monitoring, modelling and forecasting of phenological changes and their socio-economic impact, proceedings 31 March to 2 April, 2003 Wageningen, The Netherlands. 2004.
11. Yanenko O, Schlieder C. Enhancing the Quality of Volunteered Geographic Information: A Constraint-Based Approach. Bridging the Geographic Information Sciences. Lecture Notes in Geoinformation and Cartography: Springer Berlin Heidelberg; 2012. p. 429-46.
12. Chuine I, de Cortazar-Atauri IG, Kramer K, Hänninen H. Plant Development Models.
Phenology: An Integrative Environmental Science: Springer; 2013. p. 275-93.
13. Schwartz MD. Monitoring global change with phenology: the case of the spring green wave. Int J Biometeorol. 1994;38(1):18-22.
14. Hamunyela E, Verbesselt J, Roerink G, Herold M. Trends in Spring Phenology of Western European Deciduous Forests. Remote Sensing. 2013;5(12):6159-79.
15. Chmielewski F-M. Phenology in Agriculture and Horticulture. Phenology: An Integrative Environmental Science: Springer; 2013. p. 539-61.
16. Mehdipoor H, Zurita-Milla R, Rosemartin AH, Gerst K, Weltzin JF. Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study. PLoS ONE. 2015.
17. Rosemartin AH, Denny EG, Weltzin JF, Lee Marsh R, Wilson BE, Mehdipoor H, et al. Lilac and honeysuckle phenology data 1956–2014. Sci Data. 2015;2:150038.