Intellectual Classifier Development of Citizens' Messages on the “Our St. Petersburg” Portal: Experience in Using Machine Learning Methods

Petr Begen [0000-0002-0613-3133] and Andrei Chugunov [0000-0001-5911-529X]

ITMO University, Kronverksky pr., 49, 197101, St. Petersburg, Russia
peetabegen@yandex.ru, chugunov@itmo.ru

Abstract. The functional features of the “Our St. Petersburg” portal are investigated, and shortcomings in the existing process of submitting messages about city problems are identified. An approach to the automatic classification of citizens' messages into the categories existing on the portal is described. From about 1.5 million reports submitted by citizens, training and test samples were formed in a ratio of 80% to 20%, respectively. Using the training sample and 194 categories, an automatic classification algorithm was trained with classical machine learning methods: the naive Bayes classifier, decision trees and artificial neural networks. The trained algorithm was then evaluated on the test sample using a classification quality metric. The analysis showed that the algorithm based on artificial neural networks gives the best result among the methods used, with an average classification accuracy of approximately 82%. The trained algorithm was used in the development of an intellectual classifier, a web application that implements API mechanisms for interaction with the main modules of the portal information system.

Keywords: Artificial Intelligence, Machine Learning, Artificial Neural Networks, Classifier, e-participation.

1 Introduction

Nowadays the use of information technology to improve public administration is no longer perceived as an innovation: e-government technologies and systems have already entered citizens' everyday life and have become an integral part of the state machine.
Research and development now focus not on translating traditional organizational processes into electronic form, but on improving the efficiency of information systems. The research presented in this article belongs to the latter type.

Recently, more and more attempts have been made to formulate criteria for the effectiveness of e-governance and e-participation as a mechanism for feedback from governments to citizens. According to researchers, improved information systems are an important factor in the growth of citizens' institutional trust in the actions of the authorities and in the opportunities to influence these actions [1], and government responsiveness to e-citizens is the basic criterion of e-participation efficiency [2].

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In 2018 the Russian Federation was developing approaches to restructuring state policy in the field of innovative development and informatization. This was due to the designation of a new priority and the adoption of the national program “Digital economy” of the Russian Federation. These processes stimulated a new round of interest in optimizing and improving the work of state information systems, including increasing the efficiency of their functioning. Currently, many processes in e-governance systems require the participation of government officials or subordinate organizations, and automating individual operations can significantly improve the efficiency of state information systems.

One of the approaches that has begun to be used in the development of state information systems is “Artificial intelligence” (AI), which is considered one of the main trends in the development of modern information technologies (IT) and is included in all lists of so-called “breakthrough technologies”.
There are many analytical and forecasting publications in which AI technologies are assigned a key role at the present stage of digital transformation [3], which is often associated with the development of “Industry 4.0”.

It should be noted that developed countries began to form focused approaches to AI development and to the introduction of these technologies and methods in 2017–2018. During this period, countries such as Canada, China, Denmark, Finland, France, India, Italy, Japan, Singapore, South Korea, Sweden, Taiwan, the UAE and the UK adopted strategic documents to promote the development and use of AI [4]. These documents address, in varying degrees of detail, such areas as research, development of the education system, promotion of AI use in the public and private sectors, ethical and legal aspects of application, standards, data infrastructure and protection of digital information.

The present work is a local research project within a research direction devoted to studying how information systems that support electronic interaction between citizens and authorities function in a variety of contexts, from applied research to the creation of models of the institutional environment of e-governance.

Within this research direction a series of projects is being implemented devoted to the empirical analysis of e-participation practices, where e-participation is defined as “a set of methods and tools that ensure electronic interaction between citizens and authorities in order to take into account the views of citizens in state and municipal administration when making political and managerial decisions” [5, p. 60]. The pilot project whose results are presented in this paper is aimed at automating the classification of messages posted by citizens on the “Our St. Petersburg” portal. Using the “Our St.
Petersburg” portal, residents of the city can send messages related to housing and communal services, city improvement, and the state of sidewalks and roads, obtain reference information on a city facility of interest, etc. An important component of the portal is a system of organizational measures and rules for processing messages, which involves many services and authorities of St. Petersburg [6].

2 Functional Features of the “Our St. Petersburg” Portal

2.1 About the “Our St. Petersburg” Portal

The “Our St. Petersburg” portal was created in 2014 on the initiative of the Governor of St. Petersburg, G. S. Poltavchenko, for prompt interaction between residents and representatives of St. Petersburg. The portal gives the user the following opportunities [7]:

• to send messages about problems related to housing and communal services and city improvement, the condition of roads and sidewalks, illegal construction and trade facilities, and violations of land or migration legislation;
• to inform the city about the lack of reference information on bulletin boards, as well as about the poor sanitary condition of premises in budgetary institutions of education, health, culture, social protection and employment;
• to receive additional information about the city's address programs and managing organizations, as well as reference information on a municipal facility of interest;
• to view the technical and economic passports of apartment buildings in St. Petersburg and obtain information about the organizations that service them.

Messages sent through the “Our St. Petersburg” portal must be considered by city services within strictly established time limits that depend on the category chosen from the message classifier. The portal user can receive information about the progress of consideration and processing of sent messages, as well as evaluate the response received.

The first version of the “Our St.
Petersburg” portal was opened in January 2014, and during that year the regulations for processing citizens' appeals were gradually modernized and developed. From the very beginning it was decided to develop this information resource on the basis of the City Monitoring Center, which processes citizens' telephone calls (to the operational services of the city on various problems).

The portal continues to develop rapidly: as of December 2019, citizens of St. Petersburg had filed more than 2 million messages about urban problems, more than 96% of which had been resolved, and the number of registered users was about 157 thousand and continues to grow.

As of December 2019, the scheme of functional relations of the “Our St. Petersburg” portal, presented in [6], is as shown in Fig. 1.

[Figure 1: diagram linking the applicant (citizen), the portal http://gorod.gov.spb.ru with its applicant, moderator, coordinator, controller and executive accounts, the City Monitoring Center, executive authorities, and e-government systems (USIA, IEIS).]

Fig. 1. Information and institutional architecture of the “Our St.
Petersburg” portal (2019)

Over more than 5 years of the portal's existence (2014–2019) the following indicators were achieved:

• the number of problem categories increased 3.5 times: from 56 to 194;
• the number of authorities involved in the work increased 2.4 times: from 23 to 56;
• the number of organizations/performers increased 10 times: from 54 to 540;
• the number of applicants' personal accounts increased 4.8 times: from 6,344 to 30,640;
• the number of registered users on the portal increased more than 9 times: from 17 thousand to 157 thousand;
• the number of initial problem reports increased by an average of more than 140 times.

These statistics show that the portal enjoys considerable and growing popularity among city residents. Thus, the “Our St. Petersburg” portal is becoming one of the most effective e-participation tools implemented in St. Petersburg: thanks to publicity and openness, many problems were quickly brought to the attention of the city administration and successfully solved in the shortest possible time. As a result, the load on the portal is increasing significantly. According to statistics [8] presented by the Administration of St. Petersburg, the portal receives up to 2.5 thousand messages from citizens every day and about the same number of responses from executives. Such a high load on the moderating service (which now has a team of 22 moderators) hampers the effective work of various services and raises the question of optimizing the existing functionality of the portal so that incoming citizens' messages are processed and transferred to executive authorities faster.

In this paper the process of submitting a message to the portal and classifying it in accordance with the requirements is considered in more detail, and a solution for optimizing this process is presented.
2.2 The Current Process of Submitting a Message to the Portal

The existing process of submitting a message to the portal is arranged as follows:

1. First, you must register and log in on the portal. This is possible with an account in the social network “Vkontakte”, through the Unified identification and authentication system (USIA) or through the Unified single sign-in system.
2. To report a problem, you must select one of the 194 available problem categories using a standard keyword search form, which suggests several options for the desired category based on the query.
3. Specify the location of the problem by adding a point on the map or by entering the street name and house number in the search bar.
4. Add a photo in any supported format confirming the presence of the problem.
5. Briefly describe the problem in a special window (up to 1000 printed characters). When describing the problem, avoid abbreviations, obscene language, and messages, requests or petitions of a personal nature.
6. If necessary, specify the name of the organization to which you have already applied.
7. Confirm sending the generated message about the problem by pressing the “Send message” button.

A significant drawback was found in step 2 of this process: it is quite difficult for the user to determine the correct message category among the large number offered. According to statistics, moderators reject 20–25% of incoming citizens' messages because the problem described does not match the category selected in the classifier.

Thus, the proposed solution should simplify the process of submitting a message for the user, speed up the moderating service's check of the message for compliance with the requirements, and reduce the percentage of messages rejected because of an incorrectly selected problem category.
2.3 An Approach to Optimizing the Classification of Messages Submitted to the “Our St. Petersburg” Portal

As mentioned earlier, the moderation service, which now has 22 moderators, must process each received message within one working day in accordance with the Order of work with messages. On days of peak load the number of messages grows by a factor of 1.5–2. Based on these statistics, we can estimate the time available for processing each incoming message. With the officially adopted 8-hour working day, each moderator needs to process almost 114 new messages from users (2.5 thousand daily messages divided among 22 moderators), so a moderator can spend no more than about 4 minutes of working time on one message. According to our calculations, it takes a moderator up to 1–1.5 minutes to determine whether the category corresponds to the problem described in the text of the message. On peak days, when the number of incoming messages increases 1.5–2 times, the number of messages that each moderator must process per working day grows to 170–228, and the time available for each message shrinks to 2–3 minutes. Therefore, the moderation time limits in such emergency situations have to be extended, which is promptly reported in the “news” section of the portal. Once message classification is optimized, the time needed to check that the message text conforms to the selected category and the requirements is expected to fall considerably, to no more than 30 seconds.

The statistics also indicate that 20–25% of all incoming messages from citizens are rejected by moderators because of an incorrectly chosen category. Thus, to optimize this process, it is necessary to define criteria that make it possible to increase the efficiency of this procedure and reduce the percentage of rejected messages to 15%.
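The workload estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that the daily load is spread evenly across all 22 moderators and that the whole 8-hour day is spent on moderation; the function name `minutes_per_message` is illustrative and not taken from the portal's software.

```python
import math


def minutes_per_message(daily_messages, moderators, workday_hours=8):
    """Return (messages per moderator, minutes available per message).

    Assumes an even split of messages and a fully productive working day.
    """
    per_moderator = math.ceil(daily_messages / moderators)
    minutes = workday_hours * 60 / per_moderator
    return per_moderator, minutes


BASELINE = 2500   # messages per day, the upper figure reported for the portal
MODERATORS = 22   # size of the moderating team

# Normal load and the two peak-load factors mentioned in the text.
for load_factor in (1.0, 1.5, 2.0):
    per_mod, minutes = minutes_per_message(BASELINE * load_factor, MODERATORS)
    print(f"load x{load_factor}: {per_mod} messages per moderator, "
          f"{minutes:.1f} min per message")
```

Under these assumptions the normal load gives 114 messages and roughly 4 minutes per message, and the doubled load gives 228 messages and roughly 2 minutes, in line with the figures quoted in the text.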
As one of the solutions, it is proposed to develop and implement automatic classification of citizens' messages. To minimize the risk that the user defines the category erroneously and to increase the efficiency of the moderating service in processing incoming messages, the following approach is proposed:

• when submitting a problem report, the user is no longer obliged to choose a problem category from the Classifier or to enter keywords in the search form; the user only needs to describe the problem in the message text. The rest of the submission procedure, such as specifying the location of the problem on the map and uploading supporting photos, is retained;
• for the moderating service, a module for automatic classification of message text is developed, which presents its result as a ranked list of three categories with the corresponding percentage of classification accuracy for subsequent selection by the moderator.

Methods for implementing this approach are described below.

3 Intellectual Message Classifier

3.1 Algorithm of Messages' Automatic Classification

AI technologies such as machine learning and Natural Language Processing techniques are proposed to implement automatic classification of message text. To achieve this goal the following tasks were set:

• prepare data for training and testing the classification algorithm;
• apply basic Natural Language Processing methods to the obtained data;
• build and train a classification model based on machine learning techniques;
• test the trained model on a test sample and obtain an accuracy score for further analysis of the result.

About 1.5 million citizens' messages were obtained from the portal database. A message sent to the portal already has a category defined by the user, so messages checked and accepted by the moderating service were used as data.
In accordance with common practice, the data were divided into training and test samples in a ratio of 80/20. Note that the test sample does not participate in training, which means that the model “sees” this data for the first time during testing. This approach allows us to obtain objective estimates of the classification accuracy of the trained model.

For the model to be able to work with the incoming data stream, the data must be preprocessed and represented in numerical form. At the preparatory stage all obtained data is processed: punctuation marks, invisible symbols and numbers are removed, and words are converted to lower case and to their initial form (for words with different prefixes, suffixes and endings) [9]. The TF-IDF measure [10] is used to represent the data as numeric vectors; it reflects the importance of each word from a certain set of words (the number of words in the set determines the dimension of the vector) in each text. The technique also helps to exclude the most frequently encountered words (for example, prepositions and conjunctions) or, vice versa, rarely encountered ones, because such words carry little useful information and only add noise to unstructured texts. The search for significant features in the text was further improved by forming a stop-word list, which mainly includes names of streets and urban facilities that also do not have a significant impact on determining the problem category.

As another Natural Language Processing method, Word2Vec algorithms [11] were used to represent words in vector space. These algorithms use the context of a word in the text to form its numerical representation, so words used in the same context have similar vectors. This approach also provides an effective way to identify significant features in text and improve the final classification result.
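The preprocessing and TF-IDF steps described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (the paper reports using scikit-learn and Keras): the stop-word list, the English toy messages and the function names are hypothetical, lemmatization is omitted, and the weighting shown is one common TF-IDF variant (term frequency divided by text length, multiplied by the natural logarithm of N/df).

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's list mainly holds street and facility names.
STOP_WORDS = {"the", "on", "in", "a", "is", "nevsky", "street"}


def preprocess(text):
    """Lower-case the text, drop punctuation and digits, remove stop words.

    Conversion of words to their initial form, which the paper also applies,
    is omitted in this sketch.
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tf_idf(documents):
    """Return (vocabulary, vectors) with tf = count / length and idf = ln(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many texts each word occurs at least once.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty texts
        vectors.append([
            counts[tok] / length * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors


messages = [
    "Pothole on the road in Nevsky street, 12!",
    "Garbage is not removed in the yard",
    "Broken road sign on the road",
]
vocab, vecs = tf_idf(messages)
```

Each message becomes a vector whose dimension equals the vocabulary size. With this weighting a word that occurs in every text receives a weight of zero, which matches the remark that very frequent words carry little useful information.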
To build the classification model, the following machine learning methods, which show good results on text according to the literature, were chosen: the naive Bayesian classifier [12], the decision tree [13] and artificial neural networks [14]. Three neural networks with different architectures were considered: a feed-forward network (FFN), a convolutional network (CNN) and a recurrent network (RNN) with an LSTM block. Each neural network architecture has its advantages and disadvantages, but each gives good results in classification problems, so it was decided to apply different methods and architectures and analyze the results under the conditions of our problem.

The model was developed in the Python programming language. The Keras framework (an add-on over TensorFlow) and the scikit-learn library were used to implement the machine learning methods and configure the neural network architectures.

After the model was trained with the different methods, tests were conducted on the test sample. To assess the quality of the trained model we used the F-measure, which is the harmonic mean of the Precision and Recall of classification. The F-measure has the following general form [15]:

$$F\text{-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{1}$$

Precision is the proportion of texts that truly belong to a given category among all texts that the model has assigned to that category, and Recall is the proportion of texts of a category found by the model among all texts of that category in the test sample.

The results of the models trained with different machine learning methods are presented in Table 1 below:

Table 1.
F-measure and training time for different machine learning methods

Machine learning method      F-measure   Model's training time
Naive Bayesian classifier    0.659       20 min
Decision tree                0.703       5 h 14 min
FFN                          0.7836      1 h 45 min
CNN                          0.8199      2 h 5 min
RNN with LSTM                0.8095      2 h 35 min

According to the F-measure values presented, the best and reasonably fast-to-train machine learning method for our classification task was the convolutional neural network (CNN), which showed almost 82% accuracy in identifying the problem category from the message text. The model with a recurrent neural network (RNN) with an LSTM block, which is traditionally one of the best in text classification problems, performed slightly worse (a difference of about 1%). Thus, the algorithm using a convolutional neural network, as the best-performing one, was chosen for further development of the intellectual message classifier.

3.2 Criteria for the Success of Intellectual Message Classifier Operation

To implement this business function within the existing functionality of the system and ensure the correctness of the automatic classification, it is necessary to create a list of criteria.
As a result of analyzing the algorithm for working with messages and the business functions, a final list of criteria for the success of the tools for selecting the subject of citizens' messages and optimizing the process of submitting a message to the portal was compiled:

• the subject of the message must correspond to the categories and to the time period for the occurrence or elimination of problems specified in the Classifier;
• the message should describe a problem in only one of the categories;
• the text of the message (if necessary) should contain the same coordinates of the problem location as those of the location selected on the map;
• the message should not duplicate another message (by the set of parameters: object, category, reason, address / coordinates of the problem) that is posted on the portal and is under consideration;
• the message should not contain groundless, unproven accusations against Executive bodies of the St.
Petersburg state power and the state institutions (enterprises) subordinated to them, Federal bodies of state power, individuals or legal entities;
• the message should not contain personal data of third parties distributed without their consent;
• the message should not contain messages, requests or petitions of a personal nature related to the work of the portal;
• the message should not contain information distributed for commercial purposes or for any purposes other than those of the Order (including spam, advertising in the message text, images, video files, links to third-party resources of the information and telecommunication network “Internet”);
• the message must be a logically complete statement and must not contain typos and (or) errors that prevent the meaning of the appeal from being understood or allow it to be interpreted ambiguously;
• the message must contain a stylistically correct request that corresponds to the norms of business communication;
• the message should be written in Cyrillic, preferably in lowercase letters, and should not contain inappropriate abbreviations or obscene language;
• the text of the message should not exceed the limit of 1000 characters;
• duplication of categories and of message texts in the Classifier must be excluded;
• for each category there must be at least 30 examples of relevant message text to train the classification model successfully;
• the accuracy of determining each category using machine learning methods should be at least 80% and should constantly improve.

If the formed criteria for the success of the automatic classification are met, it is planned to reduce the average time a moderator spends processing one message by at least 25% (from 4 minutes to 3 under normal load on the portal), reduce the percentage of messages rejected because of an incorrectly selected category (from 20–25% to 15–20%, i.e.
by at least 5 percentage points), improve the usability of the portal and make it easier for the user to submit a message.

The intellectual message classifier is designed for the moderating service in order to improve the efficiency and convenience of working with citizens' messages. It is going to be a web application that implements API mechanisms for interaction with the existing modules of the information system of the “Our St. Petersburg” portal.

The classifier under development will automatically determine the category of the user's message in asynchronous mode and present the result to the moderating service as a ranked list of the three most probable categories with an indication of the accuracy percentage for each. If the percentage for each of the categories is below 5%, the submitted message is considered not to match any of the available categories, which will also prompt the service to make a further decision. This approach will allow the service to verify the correctness of the problem category proposed by the classifier, to review the message text faster when possible errors are detected, and to pass it on to executive authorities.

4 Conclusions

In this work the functional interaction of the components of the “Our St. Petersburg” portal was considered, and the processes of submitting a message to the portal and its further processing by various services were described.

Based on the identified problems that negatively affect the work of the moderating and other services of the portal with citizens' messages, an approach to optimizing the classification of messages was proposed.

At this stage an algorithm was developed for automatic classification of citizens' messages into categories on the “Our St. Petersburg” portal based on machine learning methods.
The algorithm was trained on data previously divided into training and test samples in a ratio of 80/20, respectively, and preprocessed and represented in vector space using Natural Language Processing methods. The best machine learning method used in the automatic classification algorithm was the convolutional neural network (CNN), which showed an average category determination accuracy (i.e. F-measure) of about 82%. The developed algorithm with this method was used in further development of an intellectual classifier for the moderating service.

As further stages, it is planned to explore the use of the intellectual classifier for automatically checking the compliance of messages with the rules approved in the Order of work with messages, and to carry out an analysis to identify the increase in the efficiency of the portal's services.

This work was supported by the Russian Science Foundation, project No. 18-18-00360 “E-participation as Politics and Public Policy Dynamic Factor”.

References

1. Jansen, A.: The understanding of ICTs in public sector and its impact on governance. In: Electronic government: Proceedings of the 11th IFIP WG 8.5 international conference EGOV-2012, LNCS book series, vol. 7443, pp. 174–186 (2012).
2. Vidyasova, L.A., Misnikov, Y.G.: Kriterii ocenki social'noj effektivnosti portalov elektronnogo uchastiya v Rossii. Informacionnye resursy Rossii 5(159), 16–19 (2017).
3. Pandya, J.: The Geopolitics of Artificial Intelligence, https://www.forbes.com/sites/cognitiveworld/2019/01/28/the-geopolitics-of-artificial-intelligence/#5a4b420979e1, last accessed 2019/12/07.
4. Dutton, T.: An Overview of National AI Strategies, https://medium.com/politics-ai/an-overview-of-national-ai-strategies-2a70ec6edfd, last accessed 2019/12/07.
5. Chugunov, A.V.: Vzaimodejstvie grazhdan s vlast'yu kak kanal obratnoj svyazi v institucional'noj srede elektronnogo uchastiya. Vlast' (10), 59–66 (2017).
6.
Chugunov, A.V., Rybalchenko, P.A.: Razvitie sistemy elektronnogo vzaimodejstviya grazhdan s vlastyami v Sankt-Peterburge: opyt portala “Nash Peterburg”: 2014–2018 gg. Informacionnye resursy Rossii (6), 27–34 (2018).
7. O portale, https://gorod.gov.spb.ru/about/, last accessed 2019/12/09.
8. Portalu “Nash Sankt-Peterburg” – pyat’!, https://www.gov.spb.ru/gov/otrasl/c_information/news/159410/, last accessed 2019/12/08.
9. Zibert, A.O., Hrustalev, V.I.: Razrabotka sistemy opredeleniya nalichiya zaimstvovanij v rabotah studentov vysshih uchebnyh zavedenij. Metody predvaritel'noj obrabotki teksta. Universum: Tekhnicheskie nauki: elektron. nauchn. zhurn 4(5), (2014), http://7universum.com/ru/tech/archive/item/1258, last accessed 2019/12/07.
10. Ingersoll, G.S., Morton, T.S., Ferris, E.L.: Obrabotka nestrukturirovannyh tekstov. Poisk, organizaciya i manipulirovanie. Per. s angl. Slinkin, A.A. DMK Press, Moscow (2015).
11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. (2013), https://arxiv.org/abs/1301.3781, last accessed 2019/12/06.
12. Barsegyan, A.A., Kupriyanov, M.S., Holod, I.I., Tess, M.D., Elizarov, S.I.: Analiz dannyh i processov: ucheb. posobie. 3d izd., pererab. i dop. BHV-Peterburg, St. Petersburg (2009).
13. Aggarwal, C.C.: Data Classification: Algorithms and Applications. 1st edn. Chapman & Hall/CRC (2014).
14. Prasanna, P.L., Rao, D.R.: Text classification using artificial neural networks. International Journal of Engineering & Technology 7(1.1), 603–606 (2018).
15. Sasaki, Y.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007).