Application of Automated Means of Content Analysis of Data from Geoinformation Networks to Study the Accessibility of Landscaping Facilities

Boris A. Nizomutdinov, Vladimir A. Kazak, Petr A. Begen

ITMO University, Kronverksky Pr. 49, bldg. A, St. Petersburg, 197101, Russia

Abstract
The article presents a method developed by the authors for assessing the accessibility of urban improvement facilities for low-mobility groups of the population, based on the analysis of text data from social networks and on the socio-psychological well-being of city residents. The object of the study was the profiles of landscaping objects in Google Maps located in the Petrogradsky district of St. Petersburg; 25 urban landscaping objects (parks, gardens, squares) were selected. During the study, the total number of comments for each improvement object was determined, the reviews were analyzed, reviews describing the impressions and experiences of elderly and low-mobility people were identified, and the tonality of these messages was evaluated. The analysis showed that the comments contain information from low-mobility groups of the population describing the problems of improvement facilities, in particular accessibility. The authors conclude that automated data collection from geoinformation social systems has great potential. The findings show that overlapping data from Google Maps enriches an analysis that would previously have relied on a single source.

Keywords
Geoinformation social networks, reviews, accessibility for people with limited mobility, parsing, data analysis

1. Introduction
In most regions, various programs and methods have been developed to assess the accessibility of the urban environment. Their main goal is to ensure unhindered access to priority facilities and services for disabled people and other low-mobility groups of the population.
However, the process of identifying problem areas has a considerable number of nuances.

Today, there are various ways to assess the accessibility of the environment, for example, surveys, observations, the study of project documentation, and Internet surveys. However, all these methods are labor-intensive: they require significant human resources over a long period, and the amount of work depends on the size of the area under study. As a result, city authorities operating in a resource-saving mode often do not use them. This study proposes an original approach to assessing the accessibility of urban improvement facilities for low-mobility groups of the population, based on data that these groups post on social networks.

Improving the quality of life of the population, as one of the main tasks of the socio-economic development of the city, is a consequence of the successful interaction of social institutions and city residents in maintaining public relations to solve the city's urgent problems. For monitoring, comprehensive ratings have been developed to assess the quality of life in cities. These ratings are aimed primarily at assessing the human potential of active working residents and practically do not address the interests of vulnerable groups of the population (elderly people, women, parents with young children, adolescents, youth, disabled people, etc.). In megacities, due to large flows of information, “communicative gaps” arise between residents and social institutions, which leads to an increase in social distance and a decrease in understanding of the aspects of urban existence relevant to these groups of citizens. The introduction of “hybrid management systems” is therefore relevant. Such systems assume a subject-subject system of relations between the city and its residents, based on an experimental study of residents' ideas about their well-being and on automated monitoring systems built on these ideas.

SSI-2021: XXIII All-Russian Conference on Scientific Services & Internet, September 20–23, 2021, Moscow (on-line), Russia
EMAIL: boris@itmo.ru (B.A. Nizomutdinov); kazakvauniversity@gmail.com (V.A. Kazak); petyabegen@mail.ru (P.N. Begen)
ORCID: 0000-0002-4090-9564 (B.A. Nizomutdinov); 0000-0002-2158-5031 (V.A. Kazak); 0000-0002-0613-3133 (P.N. Begen)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Among the studies that draw on various sources of information, it is possible to distinguish between those whose data are obtained from “voluntarily provided” information and those whose data are obtained from “not voluntarily provided” information. Voluntarily provided information is generated by users, for example, social media data in reviews and comments, while non-voluntarily provided information comes from sources that collect data on user activity, for example, data from mobile operators. The proposed method uses data that users generate independently and publish in the public domain.

The authors of [1] address a knowledge gap by proposing a method for identifying opportunity places for urban regeneration, which includes pre-processing, analysis and interpretation of separate and overlapping data from location-based social networks (LBSNs). A twofold perspective is adopted: people-based and place-based. Data from four LBSNs (Foursquare, Twitter, Google Places and Airbnb) reflect the people-based approach, as they provide insights into individual preferences, usage and activities.

A group of scientists [2] has developed a method for predicting the state of an urban area based on the geospatial activity of users in a social network. One of the most popular social networks, Instagram, was taken as the source of spatial data. Two large cities with different features of online activity were selected as target cities: New York, USA, and St.
Petersburg, Russia; a convolutional neural network based on three-dimensional convolution layers was used for processing.

The study [3] shows that low-mobility groups of the population actively participate in social processes using ICT, take part in online public discussions and are included in democratic processes in electronic form.

In this study, one of the most popular geoinformation social networks, Google Maps, was taken as the source of spatial data, and St. Petersburg was chosen as the target city. Active progress in and the widespread dissemination of the Smart City concept create the need for systems capable of accurately predicting the future state of the urban environment and landscaping facilities. Forecasting the state of an urban area requires the use of various data sources; new data sources continue to emerge, and social networks are one of them. Social media data has become a valuable addition to the input data of a modern decision support system. With data on the accessibility problems of improvement facilities for low-mobility groups in urban areas, researchers can extract information about the current situation, detect potential problems and develop recommendations for their elimination.

This study is based on existing methodologies for analyzing and interpreting data from geoinformation social networks to identify potential accessibility issues. Reviews of landscaping objects in Google Maps are considered as layers of information for analyzing the surroundings inside the city.

This document is structured as follows. First, the theoretical basis of the work is outlined; it builds on previous studies in which a parser was developed to collect information. Second, the sources and the general method of preparing data for analysis are described. Third, data analysis is described.
Finally, the results are presented, followed by a discussion, the main conclusions and the limitations of the study.

2. Research methodology
In this article, we focused on studying the accessibility of facilities for vulnerable social groups: disabled people, families with strollers and pensioners. We wanted to find out whether they write reviews about parks that contain accessibility information. For the study, we chose one district of St. Petersburg, the Petrogradsky district. We compiled a list of parks and squares for this area, found the card in Google Maps for each improvement object, and excluded all objects that have no text reviews. The final database contained 21 objects of urban improvement in the Petrogradsky district.

The development of a new methodology will make it possible to improve public space and make it modern and accessible to low-mobility groups of the population. The proposed methodology involves:
• determining the method of downloading data from Internet resources;
• determining the audience of users of urban objects;
• compiling a dictionary of words that can characterize the accessibility of an object;
• analyzing and processing the received data.
The general scheme of the analysis is shown in Figure 1.

Figure 1: Search for landscaping objects on the map

At the information collection step, it was planned to use the Google API; however, during detailed study and preparation of the parser, it turned out that the Google API returns only the last 5 reviews, which is not suitable for this study. To solve the problem of automated review collection, the ready-made Outscraper parser was used. With this parser, reviews can be downloaded by object ID in Google Maps, and it does not have such critical limitations as the standard Google API.
Figure 2: Search for landscaping objects on the map

An important aspect that was taken into account when collecting information is personal data. From a legal point of view, projects involving automated data collection from social networks touch both on special legislation on personal data (with respect to the object of research) and on the related intellectual property rights of the creators of the social network (with respect to the data source). The difficulty of interpreting these legal relations arises from the fact that changes in legislation lag far behind the development of technology.

It follows from the legislation on personal data and information that information posted by users themselves on social networks is publicly available from a legal point of view, but in some cases the courts come to a different conclusion.

The best option for researchers is not to collect users' full names and to depersonalize all collected data, using and storing only statistical information. When the parser was running, users' full names were not saved; each record was assigned an ID, which excluded personalization, and in this form the information can be considered depersonalized. This approach avoids the collection, storage and processing of users' personal data from social networks.

During the collection of information, 21 objects were processed and 4900 reviews were collected; all reviews were depersonalized and stored in a single database.

2.1. Search for thematic reviews about accessibility issues
At the next stage, we compiled a dictionary of 60 words that can characterize the accessibility of an object, for example, “ramp”, “barrier” and “wheelchair”. Then, using a script, we searched the collected database for these words and their various word forms and selected all the reviews that contain terms from our dictionary.
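The depersonalization and dictionary search steps described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the record fields, the four stems standing in for the 60-word dictionary, and the prefix matching used to catch word forms are all assumptions, and the sample reviews are invented English text.

```python
import re
from itertools import count

# Hypothetical stems standing in for the 60-word dictionary; the actual
# word list is not given in the paper.
ACCESSIBILITY_STEMS = ["ramp", "barrier", "wheelchair", "stroller"]

# One pattern matching any word that starts with a stem, so that simple
# word forms ("ramps", "barriers") are caught as well.
STEM_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, ACCESSIBILITY_STEMS)) + r")\w*",
    flags=re.IGNORECASE,
)


def depersonalize(raw_reviews):
    """Drop author names and give each record a sequential ID."""
    ids = count(1)
    return [
        {"id": next(ids), "object": r["object"], "text": r["text"]}
        for r in raw_reviews
    ]


def select_accessibility_reviews(reviews):
    """Keep reviews whose text contains at least one dictionary term."""
    selected = []
    for review in reviews:
        hits = sorted(
            {m.group(0).lower() for m in STEM_PATTERN.finditer(review["text"])}
        )
        if hits:
            selected.append({**review, "matched": hits})
    return selected


# Invented sample records (field names are assumptions).
raw = [
    {"author": "A. Person", "object": "Park 1",
     "text": "No ramps at the entrance."},
    {"author": "B. Person", "object": "Park 1",
     "text": "Lovely flowers in spring."},
    {"author": "C. Person", "object": "Garden 2",
     "text": "Wheelchair users face a barrier at the gate."},
]

reviews = depersonalize(raw)
accessibility_reviews = select_accessibility_reviews(reviews)
```

Prefix matching is a deliberately crude way to cover word forms and can produce false positives (for example, "rampant" would match the stem "ramp"); for Russian text, matching on lemmas produced by a morphological analyzer would likely be more precise.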
During the search, 450 reviews describing accessibility were selected. This is valuable information that can help improve these facilities.

Additionally, to develop this direction, we tested machine learning methods for determining the subject of comments. To determine the subject of a review automatically, an algorithm is being developed that solves a text clustering problem. Clustering is the splitting of a set of documents into clusters, that is, subsets of similar documents whose parameters are unknown in advance. The number of clusters can be arbitrary or fixed (set by the user at the initial stage). Clustering is a well-known unsupervised learning task (learning on data not marked up by experts).

With machine learning methods, the algorithm produces n groups (clusters) into which the source text array can potentially be divided. The resulting n clusters should then be analyzed on the basis of the texts that have fallen into them or by a list of keywords specific to each cluster. The clustering problem was solved with the k-means method (KMeans).

Figure 3: Partitioning into clusters

The algorithm minimizes the total squared deviation of cluster points from the centers of their clusters. To implement the algorithm, the KMeans class from the sklearn.cluster module was used. To train the algorithm on the collected data, the data must first be pre-processed (punctuation marks and noise removed, etc.) and represented in vector (numeric) form. For this, basic methods and approaches of natural language processing (NLP) are used. The pandas library was used to load the collected text array into the program.
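For reference, the k-means criterion described above can be written out explicitly. The symbols are notation introduced here for illustration and do not come from the paper: $k$ is the number of clusters, $C_j$ is the set of vectors assigned to cluster $j$, $x_i$ is the numeric vector of one text, and $\mu_j$ is the center of cluster $j$.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The algorithm alternates between assigning each $x_i$ to its nearest center and recomputing every $\mu_j$ as the mean of its cluster; neither step can increase $J$, so the procedure converges, although possibly to a local minimum only.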
The “text” field from the original array was selected as the training text data. The data were pre-processed as follows: punctuation marks and invisible symbols were removed using regular expressions, and Latin letters, single letters and extra spaces were removed.

With the help of the pymorphy2 library, all words were reduced to their normal form (for example, an inflected form of the adjective “electronic” is reduced to its dictionary form “electronic”). This reduces the dimension of the data array without losing significant features of the text. A list of stop words was also generated using the TfidfVectorizer class from the sklearn library to remove unnecessary noise from the source data (the words are listed in the file stopwords.txt). The conversion of text into vector (numeric) form was also carried out with the TfidfVectorizer class, which builds a matrix of weights for each word based on the tf-idf approach. The processed data were then passed to the KMeans algorithm to solve the clustering problem. The trained model was saved to a file for further use in other tasks.

During the analysis, we collected reviews that described the accessibility of facilities for people with reduced mobility and grouped them by the main accessibility-related topics, for example: strollers, wheelchairs, disabled people, accessibility, family members and the elderly, restrictions. We had to combine the terms “wheelchair for children” (stroller) and “wheelchair for the disabled”, since they have the same name in Russian.

The “Restrictions” group includes general conditions related to access, functions and restrictions. Separately, we can single out a group of reviews describing the problems faced by older people; there were 37 such reviews.
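The pre-processing, vectorization and clustering steps described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the six English reviews and the stop-word list are invented, the number of clusters is set to 2 arbitrarily, and the pymorphy2 lemmatization and Latin-letter removal are omitted because the toy data is English.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy English reviews, invented for illustration (the study used Russian text).
texts = [
    "No ramp at the entrance, wheelchair users cannot enter!",
    "The ramp is too steep for a wheelchair.",
    "Wheelchair ramp at the entrance is broken.",
    "Beautiful flowers and fountain in summer.",
    "The fountain and flowers are beautiful.",
    "Flowers near the fountain look beautiful in summer.",
]

# Hypothetical stop-word list standing in for stopwords.txt.
STOP_WORDS = ["the", "at", "is", "and", "in", "for", "are", "too", "no", "near"]


def clean(text):
    """Lowercase, then strip punctuation, single characters and extra spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    text = re.sub(r"\b\w\b", " ", text)   # single-character tokens
    return re.sub(r"\s+", " ", text).strip()


cleaned = [clean(t) for t in texts]

# Build the tf-idf weight matrix: one row per review, one column per term.
vectorizer = TfidfVectorizer(stop_words=STOP_WORDS)
X = vectorizer.fit_transform(cleaned)

# Fixed seed and several restarts so the toy result is reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# vocabulary_ maps term -> column index; sorting by index lists terms by column.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for cluster_id, center in enumerate(model.cluster_centers_):
    top_terms = [terms[i] for i in center.argsort()[::-1][:3]]
    print(cluster_id, top_terms)
```

Printing the highest-weighted terms of each cluster center is one simple way to obtain the per-cluster keyword lists mentioned in the text. A fitted model of this kind can be persisted with, for example, `joblib.dump`, although the paper does not say which mechanism was used.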
Figure 4: Partitioning into clusters

Reviews from people with limited mobility contain both descriptions of the advantages and indications of the disadvantages of parks and squares in the area. The problems mentioned include difficulties moving along park paths, inconvenient entrances, uninformative information boards, benches that are uncomfortable for low-mobility groups of the population, and other problems requiring attention.

Figure 5: Example of a review about a park

2.2. Definition of tonality
It is difficult to determine the tone of a comment by clustering texts into groups, because a text contains many features by which it can be grouped, and the algorithm does not know in advance which comment is “positive” and which is “negative”.

For such a problem, the supervised learning approach (training on an array of data marked up by experts) is most often used, which reduces it to text classification (the distribution of objects into previously known groups or categories). However, the data we collected has no preliminary markup for “positive” or “negative” tone, so another available, prepared array of texts must be used. There are very few such corpora for the Russian language, because marking up large text corpora requires considerable time and human resources from researchers.

As a Russian-language array with marked-up positive and negative texts, we used the corpus collected by Yulia Rubtsova from Twitter, which contains user reviews and comments on a variety of topics (politics, economics, IT, sports, medicine, etc.). The training corpus consisted of 114,911 positive and 111,923 negative entries.
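The supervised setting described above, in which a classifier is trained on a labeled corpus and then applied to unlabeled reviews, can be illustrated with a deliberately simple stand-in. The sketch below uses tf-idf features with logistic regression from scikit-learn, which is not the model used in the study, and a tiny invented English training set in place of the Twitter corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: 1 = positive, 0 = negative.
# It stands in for a large labeled corpus and is far too small for real use.
train_texts = [
    "great park with clean paths",
    "lovely garden and comfortable benches",
    "nice quiet park",
    "beautiful clean garden",
    "broken benches and dirty paths",
    "terrible entrance without a ramp",
    "dirty park with broken lamps",
    "awful uncomfortable benches",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorize the labeled texts and fit a linear classifier on them.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

# Apply the trained classifier to unlabeled reviews (also invented).
new_reviews = ["clean and lovely garden", "dirty broken entrance"]
predicted = classifier.predict(new_reviews)

# Share of reviews predicted as negative, as a fraction between 0 and 1.
negative_share = float((predicted == 0).mean())
```

The same pattern (fit on labeled texts, predict on collected reviews, report the share of negative predictions) applies whatever classifier is plugged in; only the model inside the pipeline changes.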
To determine the tonality of text messages, a ready-made implementation by researchers at the company Mail.ru, freely available for research (https://github.com/sismetanin/sentiment-analysis-of-tweets-in-russian), was used. This implementation uses a convolutional neural network (CNN), which showed an average accuracy of 78.1% in determining the tonality of text, which is good enough for such problems. In the resulting database, more than 35% of the reviews had a negative tone. It is planned to increase the accuracy of tonality determination.

3. Conclusion
In the course of this work, a methodology for assessing the accessibility of urban improvement facilities was presented and described. As an illustrative example, objects of the urban environment (parks, squares, gardens) located in the Petrogradsky district were taken. The collected reviews were analyzed, and the analysis revealed that, in general, urban improvement facilities are not in fully proper condition: 23% of negative reviews indicate that there are territories in the district that are hard to reach for low-mobility groups. The district administration should pay attention to these territories and promptly correct the situation.

This study combined the methodological foundations of traditional socio-psychological research and the possibilities of modern information technologies to substantiate value-oriented management of urban infrastructure development. The selected source of information, Google Maps, showed that users generate a large number of reviews, some of which concern the accessibility problems of urban facilities for low-mobility groups of the population. This project is a practical and promising solution in poorly formalized fields of knowledge.
In addition, the use of such a solution at the national level can serve as an example of the introduction of digital technologies and platform solutions in public administration, business and society.

During the processing of information, it was possible to identify informative reviews that describe the accessibility problems of individual parks for low-mobility groups of the population. The proposed methods of data extraction and processing have shown good results; at the next stage, it is planned to collect reviews for all districts of St. Petersburg and to construct a heat map.

The study has a number of limitations that are planned to be addressed in the future. In particular, it does not consider the reliability of reviews and so-called “fake reviews” that can be left by unscrupulous citizens. However, we believe that for urban improvement facilities the proportion of fake reviews is lower than in the commercial sector. In the commercial sector, there is a whole industry for writing positive and negative reviews (SERM), whereas for city parks there is simply no need to write paid or fake reviews; therefore, the reviews can be considered reliable.

In addition, certain limitations of the study are associated with the unavailability of some information about the socio-demographic characteristics of users. This limitation follows directly from the personal data processing policy: we did not collect or process personal data.

The results of the study can be used to develop recommendations for the management and development of the city's infrastructure. In this regard, the project has scientific and educational value and contributes to the implementation of one of the priority directions of city development, related to improving the quality of the urban environment and ensuring the effectiveness of its management and development.
Additionally, the data obtained can be used in Smart City projects for targeted operational monitoring of the actual needs of the population.

4. References
[1] P. Martí, C. García-Mayor, L. Serrano-Estrada, Identifying opportunity places for urban regeneration through LBSNs. Cities 90 (2019) 191–206. https://doi.org/10.1016/j.cities.2019.02.001
[2] K. D. Mukhina, A. A. Visheratin, Gali-Ketema Mbogo, D. Nasonov, Forecasting of the Urban Area State Using Convolutional Neural Networks. 2018 23rd Conference of Open Innovations Association (FRUCT). https://doi.org/10.23919/FRUCT.2018.8588075
[3] I. Grigoryeva, L. Vidiasova, D. Zhuk, Seniors' Inclusion into e-Governance: Social Media, e-Services, e-Petitions Usage. ICEGOV '15-16: Proceedings of the 9th International Conference on Theory and Practice of Electronic Governance, March 2016, pp. 173–176. https://doi.org/10.1145/2910019.2910022
[4] N. Hochman, L. Manovich, Zooming into an Instagram City: Reading the local through social media. First Monday 18 (7) (2013). https://doi.org/10.5210/fm.v18i7.4711
[5] D. Arribas-Bel, K. Kourtit, P. Nijkamp, J. Steenbruggen, Cyber Cities: Social Media as a Tool for Understanding Cities. Applied Spatial Analysis and Policy 8 (3) (2015) 231–247.
[6] L. Mitchell, The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE 5 (8) (2013) 64–71.
[7] A. E. Nenko, A. M. Semenova, A. A. Galaktionova, Evaluation of the quality of public spaces according to reviews in Google Maps. Scientific service on the Internet: proceedings of the XXII All-Russian Scientific Conference (September 21–25, 2020, online). Moscow: IPM named after M.V. Keldysh, 2020, pp. 473–485.
[8] S. Van Canneyt, S. Schockaert, O. Van Laere, B. Dhoedt, Detecting places of interest using social media. Proceedings 2012 IEEE/WIC/ACM International Conference on Web Intelligence. 2012, pp. 447–451.
[9] F. Mairesse, M. Walker, M. Mehl, R. Moore, Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research 30 (2007) 457–500. https://doi.org/10.1613/jair.2349
[10] A. Korneeva, Yu. Zeremskaya, O. Loyko, Virtual space as a sphere of the personal identity's formation. Journal of Economics and Social Sciences 8 (8) (2016) 31–35.
[11] Y. Kim, J. H. Kim, Using computer vision techniques on Instagram to link users' personalities and genders to the features of their photos: An exploratory study. Information Processing & Management 6 (54) (2018) 1101–1114.
[12] A. Rogers, A. Romanov, A. Rumshisky, S. Volkova, M. Gronas, A. Gribov, RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. Proceedings of COLING 2018, pp. 755–763.