=Paper=
{{Paper
|id=Vol-3909/Paper_20.pdf
|storemode=property
|title=Towards the Information Technology for Online Citizen Services Detection and Assessment on E-Government National Portals
|pdfUrl=https://ceur-ws.org/Vol-3909/Paper_20.pdf
|volume=Vol-3909
|authors=Andrii Kopp,Oleksandr Chornenkyi
|dblpUrl=https://dblp.org/rec/conf/iti2/KoppC24
}}
==Towards the Information Technology for Online Citizen Services Detection and Assessment on E-Government National Portals==
Andrii Kopp¹ and Oleksandr Chornenkyi²*

² V.N. Karazin Kharkiv National University, Svobody sq. 4, Kharkiv, 61022, Ukraine

===Abstract===
The problems of retrieving, processing, and analyzing information obtained from web sources to gain valuable insights, processing large data volumes, and applying different data analysis techniques are highly relevant in cross-disciplinary research, including studies at the intersection of computer science and the social sciences. Various social science fields, such as political science, may benefit from web data extraction techniques when information has to be scraped from websites. Therefore, this paper proposes an approach to using information technology for automatic data collection from e-government national portals, assessment of the availability and variety of online citizen services, and further analysis by researchers in political science. The proposed information technology detects citizen services provided by national portals and categorizes them according to the specified branches. The Python programming language is used to develop the data processing components, while the Power BI analytical tool is used to visualize the obtained results on a dashboard. The performance of the proposed solution is verified by processing the national portals of several countries whose homepages allow web scraping and offer English versions.

===Keywords===
E-Government Services, National Portal, Online Citizen Services, Information Technology, Web Scraping

===1. Introduction===
====1.1. Motivation====
Today, humanity lives in the information age, whose evolution is driven by the rapid development of information and communication technologies that have been advancing since the second half of the 20th century.
Digital technology and the new opportunities it brings were swiftly adopted by world society, which in turn started the transformation of various aspects of human life, including politics. In the late 20th and early 21st centuries, new technologies significantly influenced the foreign and domestic policy of different countries. Information and communication technologies have begun to change the attitudes of states, governments, and politicians towards the economy, methods of governance, and, most importantly, ways of communicating with their citizens [1, 2].

Researchers of the second half of the 20th century, analyzing the possible paths of human development, argued that the world is going through unavoidable changes: some developed countries are gradually moving from industrial to post-industrial society, and new technologies are one of the main drivers of this transition [3, 4]. There was an understanding that post-industrial society is a society that cannot do without the dominance of modern technologies and in which information is the priority value [3-5]. In this way, the concept of a post-industrial society can be considered equivalent to that of an information society.

Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine. Corresponding author. These authors contributed equally. kopp93@gmail.com (A. Kopp); chornenkyi.o.o@gmail.com (O. Chornenkyi). ORCID: 0000-0002-3189-5623 (A. Kopp); 0009-0001-9479-1776 (O. Chornenkyi). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.
As the Internet began to spread widely around the world, many enthusiasts interested in improving communication with government agencies began to use the then-new technology to create the first relevant web resources, which in turn could be supported by local governments [1]. With the subsequent evolution of Internet-related technologies at the beginning of the 21st century, an understanding emerged that building an inclusive information society requires rethinking old methods of public administration and developing e-government models. The e-government model, in turn, is open to communication with citizens and includes the development and support of specialized government web portals and online services through which citizens can receive information and services over the Internet [6].

Today, the digitalization of government structures and the creation of an e-government model is a priority for many developed countries, and the E-Government Development Index (EGDI) shows that some states have achieved tangible success in this regard [7].

In previous research [8], we showed the general feasibility and prospects of using the web scraping method and data analytics tools to analyze national government web portals. In the current study, we focus on using the proposed approach to comparatively analyze the government web portals of n countries. The previous research [8] covered only two countries, whereas the current study covers all countries from the EGDI ranking [7] whose portals had an English version and to which we had technical access.

====1.2. Related Work====
Political science today has a rather large arsenal of research methods. However, like all other sciences, the social sciences, and political science in particular, are in a state of constant movement, transformation, and improvement of methodology.
In terms of significant transformations, there has recently been growing interest among humanities researchers in computational research methods that use digital technology. This has led to the emergence of an interdisciplinary field, namely Computational Social Science (CSS). The growing interest of scholars in computational social science can also be explained by the rapid development of the Internet and the growth of digital data worldwide [9].

It should also be noted that in political science the use of methods based on computer technology dates back to the second half of the 20th century [10, 11]. First of all, the use of computer technology in political science can be associated with agent-based simulation modeling of social and political processes. This approach makes it possible to investigate a large number of event scenarios and find implicit causal relationships [11]. Methods related to automatic computerized text recognition have also seen some development, making it possible to process text data in a fairly short period of time [10].

The evolution of the Internet has been one of the main reasons for the widespread digitalization of human life. Today, any user activity on the Internet can be recorded, and web pages, social networks, online media, blogs, file exchanges, etc. can be a source of valuable information. Web pages can be used to analyze web resources. Web scraping can be understood as collecting information from a particular web page of interest to the researcher [13]. Nowadays, many politicians, parties, and government agencies use web pages to communicate with citizens. In general, the creation of dedicated web portals is part of the e-government model. The web scraping approach is valuable for political science because it allows us to examine how websites are used [14].

====1.3. Problem Statement====
This paper presents an approach to the automated collection and analysis of data from the national e-government web portals of different countries. The main purpose of the study is to improve the assessment of the structure and content of these portals in terms of the availability and variety of services provided to citizens. The analysis is aimed at identifying key characteristics of the web portals, such as the number of available electronic services, their thematic distribution across the e-government service catalog, and the richness of services in various citizen service branches.

===2. Materials and Methods===
====2.1. National Web Portals Data Preparation====
The data workflow begins with uploading a prepared data set containing information about countries, their national portals, and the relevant e-government indices (Fig. 1).

Figure 1: General structure and heading of the prepared data set using EGDI national portals data.

The process starts with converting the data from the Excel-based spreadsheet format to the JSON format. The pandas library [16] is used to load the required dataset (Fig. 1) from the Excel file. Formally, a spreadsheet can be represented as a set of records:

<math>D_{xlsx} = \{r_1, r_2, \dots, r_n\},</math> (1)

where:
* <math>n</math> is the number of records in the data set <math>r_1, r_2, \dots, r_n</math>;
* <math>r_i</math> is the vector of attribute values, <math>i = \overline{1, n}</math>.

To load this data (1) into the programming environment, a table read operation is used, which transforms the spreadsheet into a data frame denoted as <math>D_{df}</math>. Then, the data frame <math>D_{df}</math> is transformed into the JavaScript Object Notation (JSON) format. Formally, this process is described by the following operation:

<math>f_{json}(D_{df}) = D_{json},</math> (2)

where:
* <math>f_{json}</math> is the function used to transform the set of records <math>D_{df}</math> into the respective set of JSON-based objects <math>D_{json} = \{r'_1, r'_2, \dots, r'_n\}</math>;
* <math>r'_i</math> is the JSON object that corresponds to a separate record in the data set, <math>i = \overline{1, n}</math>.
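The transformation of tabular records into JSON objects described above can be illustrated with a minimal sketch. It assumes the spreadsheet rows are already available as Python dictionaries; the table read step is omitted to keep the example self-contained, and the column names `country` and `portal_url`, the sample URLs, and the helper `records_to_json` are hypothetical and not taken from the paper.

```python
import json

# Hypothetical rows standing in for the EGDI spreadsheet records;
# the column names and URLs are illustrative assumptions only.
records = [
    {"country": "Estonia", "portal_url": "https://example.org/ee"},
    {"country": "Ukraine", "portal_url": "https://example.org/ua"},
]


def records_to_json(rows):
    """Serialize a list of row dictionaries into a JSON array string."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


json_text = records_to_json(records)

# Round trip: every JSON object maps back to one record dictionary.
restored = json.loads(json_text)
print(len(restored), restored[0]["country"])
```

Each element of the JSON array corresponds to exactly one spreadsheet record, so the number of objects equals the number of records in the data set.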
The obtained JSON string <math>D_{json}</math> (2) is then deserialized into the internal Python object <math>D_{dict}</math>, which can be denoted as a set of dictionaries:

<math>D_{dict} = \{r''_1, r''_2, \dots, r''_n\},</math> (3)

where:
* <math>n</math> is the number of records in the data set <math>r_1, r_2, \dots, r_n</math>;
* <math>r''_i</math> is the dictionary of attributes in the key-value format, corresponding to the record in the data set, <math>i = \overline{1, n}</math>.

Finally, the process finishes with saving the data (3) to a JSON file. The write operation can be formally described as follows:

<math>f_{json}(D_{dict}, F_{json}),</math> (4)

where:
* <math>f_{json}</math> is the function that stores the set of dictionaries <math>D_{dict}</math> in a JSON file;
* <math>F_{json}</math> is the resulting JSON file.

Therefore, the data preparation process can be formally described as the sequence of operations that transform (4) the Excel spreadsheet into the JSON file:

<math>D_{xlsx} \xrightarrow{read} D_{df} \xrightarrow{f_{json}} D_{json} \xrightarrow{deserialize} D_{dict} \xrightarrow{f_{json}} F_{json}.</math> (5)

Then, the obtained JSON file (Fig. 2) is used by another software component for national portal analysis using web scraping techniques.

====2.2. National Web Portals Data Processing====
Each national web portal is accessed by making Hypertext Transfer Protocol (HTTP) requests. All hyperlinks are extracted from the resulting Hypertext Markup Language (HTML) content, which is then analyzed for thematic keywords relevant to major government service areas and citizen services according to the Integrated Architecture Framework for E-Government (IAFEG) [18]:
* taxation;
* education;
* healthcare;
* immigration;
* employment.

Figure 2: Developed Python component for Excel to JSON transformation.

Fig. 3 demonstrates the IAFEG structure and citizen services as part of the e-government system.

Figure 3: Integrated Architecture Framework for E-Government [18].

Let us formally define the set of countries and corresponding web portals as follows:

<math>C = \{c_1 = (p_1, u_1), c_2 = (p_2, u_2), \dots, c_n = (p_n, u_n)\},</math>
(6)

where:
* <math>c_i</math> is the country data record on the EGDI website, <math>i = \overline{1, n}</math>;
* <math>p_i</math> is the national web portal page corresponding to the country data record, <math>i = \overline{1, n}</math>;
* <math>u_i</math> is the Uniform Resource Locator (URL), i.e. the web address of the respective national web portal, <math>i = \overline{1, n}</math>.

The initial stage of the national web portals data (6) processing flow is loading the data from the previously created JSON file:

<math>F_{json} \xrightarrow{load} D_{dict},</math> (7)

where:
* <math>F_{json}</math> is the JSON file with the prepared EGDI national web portals data;
* <math>D_{dict}</math> is the corresponding set of dictionaries, which contain information about countries and their web portals.

For each web portal page <math>p_i</math>, <math>i = \overline{1, n}</math>, that corresponds to the country <math>c_i</math>, <math>i = \overline{1, n}</math> (7), an HTTP request is made. Let us formally define the web page request as follows:

<math>h_i = request(u_i), \quad i = \overline{1, n},</math> (8)

where:
* <math>h_i</math> is the result of the request to web portal page <math>p_i</math>, <math>i = \overline{1, n}</math>;
* <math>u_i</math> is the URL of the corresponding national web portal page <math>p_i</math>, <math>i = \overline{1, n}</math>.

Each web portal page <math>p_i</math>, <math>i = \overline{1, n}</math>, is parsed using the BeautifulSoup library [20], which extracts all the hyperlinks from the web portal page:

<math>L_i = \{l_{i1}, l_{i2}, \dots, l_{im}\}, \quad i = \overline{1, n},</math> (9)

where <math>l_{ij}</math> is the hyperlink extracted from the web portal page, <math>i = \overline{1, n}</math>, <math>j = \overline{1, m}</math>.

For each country web portal (9), a search is carried out by thematic categories of e-government citizen services defined by dictionaries of keywords. Let us formally describe such thematic categories using the following set:

<math>T = \{t_1, t_2, \dots, t_q\},</math> (10)

where <math>t_k</math> is the thematic category of citizen services (e.g. taxation, education, healthcare, immigration, employment, etc.), each associated with the set of keywords <math>\Omega = \{w_{k1}, w_{k2}, \dots, w_{kz}\}</math>, <math>k = \overline{1, q}</math>.

The IAFEG-based keywords for the e-government citizen services are outlined in Table 1.
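The hyperlink extraction and the keyword-based search by thematic categories can be sketched as follows. This is only an illustration under stated assumptions: the standard-library `html.parser` module stands in for BeautifulSoup [20] so that the example has no third-party dependencies, the HTML fragment is hard-coded instead of being fetched over HTTP, and the names `LinkCollector`, `match_links`, and `SAMPLE_HTML` are hypothetical. Only two of the categories from Table 1 are included, with their keywords as listed there.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Two thematic categories with the keywords given in Table 1.
KEYWORDS = {
    "taxation": ["tax", "finance", "income", "money", "debt", "credit"],
    "education": ["education", "school", "study", "child", "training", "student"],
}


def match_links(links, keywords):
    """Return (category, href) pairs whose link text contains a category keyword."""
    matched = []
    for href, text in links:
        lowered = text.lower()
        for category, words in keywords.items():
            if any(word in lowered for word in words):
                matched.append((category, href))
    return matched


# Hypothetical homepage fragment; a real run would fetch it via an HTTP request.
SAMPLE_HTML = """
<ul>
  <li><a href="/tax">Money and tax</a></li>
  <li><a href="/schools">Education and learning</a></li>
  <li><a href="/contact">Contact us</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
services = match_links(collector.links, KEYWORDS)
print(services)
```

The sketch uses plain substring matching on the lowercased link text, so a short keyword such as "care" would also match words like "career". Whether the original component guards against such cases is not stated in the paper.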
{| class="wikitable"
|+ Table 1: Proposed thematic categories of citizen services and their keywords based on IAFEG [18]
! Services !! Keywords
|-
| Taxation || tax, finance, income, money, debt, credit
|-
| Education || education, school, study, child, training, student
|-
| Health || health, insurance, care, sick, medical, funeral
|-
| Immigration || immigration, citizen, travel, visa, residence, international
|-
| Employment || employment, work, job, business, license, certification
|}

For each hyperlink <math>l_{ij}</math>, <math>i = \overline{1, n}</math>, <math>j = \overline{1, m}</math> (9), the presence of keywords (10) in the hyperlink text is checked. If a keyword from the thematic category <math>t_k</math>, <math>k = \overline{1, q}</math>, is found, the hyperlink is stored and considered as corresponding to the respective e-government citizen service. Therefore, the set of thematic categories and hyperlinks that, as we assume, provide access to the corresponding citizen services is formed for each country:

<math>S_i = \{(t_k, l_{ij})\}, \quad i = \overline{1, n},</math> (11)

where:
* <math>t_k</math> is the thematic category of citizen services, <math>k = \overline{1, q}</math>;
* <math>l_{ij}</math> is the hyperlink extracted from the web portal page and considered as the access point to the corresponding thematic category (or its sub-category) of citizen services (e.g. taxation, education, healthcare, immigration, employment, etc.), <math>i = \overline{1, n}</math>, <math>j = \overline{1, m}</math>.

An example of scraping the United Kingdom (UK) national portal [21] is demonstrated in Fig. 4.

Figure 4: Example of UK national portal data scraping [21].

====2.3. National Web Portals Data Analysis====
As can be seen (Fig. 4), for any country, a thematic structure of citizen services is built based on the collected hyperlinks. This makes it possible to estimate the number of citizen services detected using each thematic category (11) by introducing the following equation:

<math>N_i = \sum_{(t_k, l_{ij}) \in S_i} t_k.</math> (12)

Moreover, it becomes possible to estimate the service richness [8] of the national portal with e-government services (12):

<math>Rs_i = \frac{1}{|T|} |S_i|,</math> (13)

where:
* <math>S_i</math> is the set of detected citizen services based on the introduced thematic categories (Fig. 3) and keywords (Table 1), <math>i = \overline{1, n}</math>;
* <math>T</math> is the set of thematic categories characterizing citizen services.

Finally, the general process of e-government national portals data processing can be formally represented as follows:

<math>D_{xlsx} \xrightarrow{read} D_{df} \xrightarrow{f_{json}} D_{json} \xrightarrow{deserialize} D_{dict} \xrightarrow{f_{json}} F_{json} \xrightarrow{load} D_{dict} \xrightarrow{request} \{L_i, i = \overline{1, n}\} \xrightarrow{match} \{S_i, i = \overline{1, n}\} \xrightarrow{calculate} \{(N_i, Rs_i), i = \overline{1, n}\}.</math> (14)

Further analysis of the data obtained using the introduced pipeline (14) requires powerful visual tools that are freely accessible and easy to use for non-professionals in information technology, i.e. social or political scientists. Hence, we propose to use Microsoft Power BI for the further analysis of the obtained web scraping results. Power BI is a high-performance Business Intelligence (BI) tool for advanced data visualization and data-driven decision making [22]. The proposed solution combines the developed Python component (Fig. 5) and the Power BI data analytics tool [22] for national web portal scraping and evaluation.

Figure 5: Developed Python component for national portal scraping and analysis.

Hence, using the proposed Python-based data pipeline and the Power BI analytical tool, the proposed solution makes it possible to automatically collect data from the national portals of the countries listed in the EGDI index [7], as well as to evaluate their functionality by comparing them against the IAFEG structure of citizen services [18]. Using the proposed data analytics solution, social or political science scholars can easily compare and analyze the development of online services provided by the national portals of different countries. This toolkit can also be used to identify best practices in offering online citizen services via e-government portals.

===3. Results and Discussion===
The general EGDI-based dataset [7] includes 193 countries, each with an e-government evaluation and the URL of a national portal. Fig. 6 demonstrates an example of the countries list and the information provided for a country, including the national portal website URL, using Estonia as an example.

Figure 6: Example of the countries list and Estonia information with the national portal URL [7].

However, many national portals mentioned on the EGDI website [7] are either not accessible or do not provide an English version. Moreover, the proposed toolkit failed to process several web portals among the remaining ones (i.e. those accessible and with English versions). Table 2 demonstrates the stages of EGDI dataset [7] discovery, preliminary check (manual removal of countries with an inaccessible or non-English interface), and processing using the proposed technology (14).

{| class="wikitable"
|+ Table 2: Processing results
! Stage !! Countries !! Remark
|-
| Discovery || 193 || The initial EGDI [7] list consists of 193 countries
|-
| Preliminary check || 87 || Removed 106 records describing countries whose national portals are either not accessible or do not provide English versions
|-
| Processing || 69 || Failed to process the national web portals of 18 countries
|}

As can be seen from Table 2, almost 55% of the country records were removed from the initial dataset because of inaccessible national portals or the absence of English versions. The remaining 87 records were processed using the proposed solution. However, only 79% of the available national portals were successfully scraped.
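The shares quoted for Table 2 can be re-derived from the three stage counts. The following short check is a sketch using only the numbers reported in the table; the variable names are illustrative.

```python
# Stage counts reported in Table 2.
discovered = 193   # countries in the initial EGDI list
after_check = 87   # countries left after the preliminary check
processed = 69     # countries whose portals were scraped successfully

removed = discovered - after_check   # records dropped by the preliminary check
failed = after_check - processed     # portals the toolkit could not process

removed_share = removed / discovered      # share of the initial list that was removed
processed_share = processed / after_check # share of available portals that were scraped
overall_share = processed / discovered    # share of the initial list that was scraped

print(removed, failed)
print(f"{removed_share:.1%} {processed_share:.1%} {overall_share:.1%}")
```

The first two shares round to the 55% and 79% stated in the text. The third shows that roughly a third of all 193 EGDI countries ended up in the analyzed sample.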
The countries and corresponding national portals are distributed as follows:
* successfully processed: Mexico, Ukraine, Brazil, Maldives, Cambodia, Jamaica, Australia, Dominica, Liberia, Antigua and Barbuda, Monaco, China, Luxembourg, Micronesia (Federated States of), Republic of Moldova, Romania, Latvia, Uzbekistan, Nigeria, Somalia, Pakistan, Kazakhstan, Bahrain, Timor-Leste, United Arab Emirates, Viet Nam, Czech Republic, Croatia, Switzerland, Belgium, Austria, Germany, Canada, Cyprus, Kuwait, Japan, Fiji, Chile, Armenia, Bahamas, Barbados, Botswana, Grenada, Georgia (Country), Saint Kitts and Nevis, Montenegro, Rwanda, Singapore, Slovenia, United Kingdom of Great Britain and Northern Ireland, Vanuatu, South Africa, Sweden, Italy, Mauritius, Saint Lucia, Malawi, Netherlands, Solomon Islands, Portugal, Kiribati, Liechtenstein, Norway, Samoa, Cameroon, Finland, Trinidad and Tobago, United States of America, and Estonia;
* failed processing: Bulgaria, Spain, Iran (Islamic Republic of), Jordan, Lithuania, Eritrea, Ghana, Ireland, Israel, Kyrgyzstan, Malta, Namibia, Philippines, New Zealand, Morocco, Palau, Thailand, and Zimbabwe.

Fig. 7 demonstrates the created Power BI dashboard that consolidates information about countries, the Online Service Index (OSI) measures of these countries, and the introduced measures:
* number of detected citizen services (12);
* richness of the citizen services (according to the IAFEG [18]), calculated as the ratio of the number of detected services to the number of all thematic categories [8].

Figure 7: Power BI dashboard consolidating information about countries' national portals.

As can be seen from Fig. 7, the national portals are placed on a scatter chart:
* the X-axis of this scatter chart is the OSI, which estimates the scope and quality of online services provided by a web portal according to the EGDI methodology [7];
* the Y-axis is the service richness (13), which estimates the correspondence of the national portal to the Integrated Architecture Framework for E-Government [18].

The size of each point reflects the number of detected online citizen services (Fig. 7).

In Fig. 7 we consider the first ten countries ordered by OSI value: Estonia, Ukraine, Singapore, United Kingdom of Great Britain and Northern Ireland, Japan, Kazakhstan, China, Germany, Australia, and Netherlands. Interestingly, processing the national portals of Ukraine, Australia, and the Netherlands resulted in 0 detected citizen services and, therefore, service richness values of 0.00. However, according to the OSI measurement on the EGDI website [7], Ukraine has a score of 0.99, while Australia and the Netherlands have 0.92.

The analysis of the Ukrainian indicators revealed that the EGDI rating contains the URL of the Cabinet of Ministers (CM) homepage [23] instead of the Diia portal [24], whereas it is the Diia portal that provides the online citizen services searched for by the proposed technology according to the IAFEG [18]. As for the Australian and Dutch national portals, the reasons for the undetected online services are similar.

===4. Conclusion and Future Work===
This paper proposed an information technology for online citizen services detection and assessment on e-government national portals. The main purpose of this study was to improve the assessment of the structure and content of national portals in terms of the availability and variety of online services provided to citizens. Such a solution can be used by political scientists to perform experiments, find best practices of online citizen services provision, compare different national portals, and get valuable insights.
An approach to data extraction from e-government national portals and further processing to assess the availability and variety of online citizen services is proposed, and the corresponding information technology is implemented using Python, third-party libraries, and Power BI. The obtained results have shown a difference between the EGDI-based OSI measurements and the availability of detected citizen services. It was noticed that some national portal URLs provided by EGDI do not lead to the portals that actually provide online citizen services, e.g. the URL for Ukraine leads to the CM website, not Diia. In the future, the proposed approach will be improved to traverse all national portal pages.

===Declaration on Generative AI===
The authors have not employed any Generative AI tools.

===References===
[1] M. Castells, The Internet galaxy: Reflections on the Internet, business, and society, Oxford University Press, 2002.
[2] M. Castells, Communication power, Oxford University Press, 2013.
[3] A. Toffler, The Third Wave: The Classic Study of Tomorrow, New York: Bantam, 1980.
[4] D. Bell, The coming of post-industrial society: A venture of social forecasting, Basic Books, 1973.
[5] Y. Masuda, The information society as post-industrial society, World Future Society, 1981.
[6] N. K. Hanna, Transforming government and building the information society: Challenges and opportunities for the developing world, New York: Springer, 2010.
[7] E-Government Development Index (EGDI). URL: https://publicadministration.un.org/egovkb/en-us/About/Overview/-E-Government-Development-Index.
[8] A. Kopp, O. Chornenkyi, Towards the Information Technology Usage for E-Government Portal Assessment based on Web Data Extraction Techniques, CEUR Workshop Proceedings 3646 (2023) 12-22. URL: https://ceur-ws.org/Vol-3646/Paper_2.pdf.
[9] A. Edelmann, T. Wolff, D. Montagne, C. A. Bail, Computational social science and sociology, Annual Review of Sociology 46(1) (2020) 61-81. doi:10.1146/annurev-soc-121919-054621.
[10] O. Chornenkyi, Use of information and communication technologies for political science research, The Journal of V.N. Karazin Kharkiv National University: Issues of Political Science 42 (2022) 38-44. doi:10.26565/2220-8089-2022-42-06.
[11] C. F. Voinea, Political Attitudes: Computational and Simulation Modelling, John Wiley & Sons, 2016. doi:10.1002/9781118833162.
[12] J. N. Victor, A. H. Montgomery, M. Lubell, The Oxford handbook of political networks, Oxford University Press, 2017.
[13] L. Curini, R. Franzese, The SAGE handbook of research methods in political science and international relations, Sage, 2020.
[14] V. Lowndes, D. Marsh, G. Stoker, Theory and Methods in Political Science, 4th ed., Bloomsbury Academic, 2017.
[15] Python. URL: https://www.python.org/.
[16] pandas - Python Data Analysis Library. URL: https://pandas.pydata.org/.
[17] json - JSON encoder and decoder. URL: https://docs.python.org/3/library/json.html.
[18] Computer and Communications 7(1) (2019) 1-9. doi:10.4236/jcc.2019.71001.
[19] urllib - URL handling modules. URL: https://docs.python.org/3/library/urllib.html.
[20] BeautifulSoup4. URL: https://pypi.org/project/beautifulsoup4/.
[21] GOV.UK. URL: https://gov.uk/.
[22] Microsoft Power BI. URL: https://powerbi.microsoft.com/en-us/.
[23] Cabinet of Ministers of Ukraine. URL: http://www.kmu.gov.ua/.
[24] Unified State Web Portal of Electronic Services Diia. URL: https://diia.gov.ua/.