DIGILOG: Towards a Monitoring Platform for Digital
                                Transformation of European Communities
                                Jonathan Gerber1,∗,† , Jasmin S. Saxer1,† , Bruno B. Kreiner1,† and Andreas Weiler1,†
                                1
                                 Institute of Computer Science, Zurich University of Applied Sciences, Technikumstrasse 9, 8401 Winterthur, Switzerland
                                https:// www.zhaw.ch/ en/ engineering/ institutes-centres/ init/


                                            Abstract
                                            DIGILOG is an interdisciplinary research project between Computer and Political Science. The goal
                                            of the research project is to monitor and evaluate the digital transformation of the local governments
                                            of Europe. The project will generate coherent data for a systematic comparison using methodological
                                            triangulation, i.e., quantitative and qualitative methods. It will take the form of a regular and automated
                                            quantitative survey of all local authorities in 47 European countries (members of the Council of Europe),
                                            based on web crawling and machine learning techniques - this is a novel approach in the context of the
                                            social sciences - and qualitative research, namely case studies in selected European countries. Renowned
                                            scholars from the University of Potsdam, ZHAW, and the Vienna University of Economics and Business,
                                            with extensive experience in local government and comparative research, form the consortium of this
                                            project.
                                            Key project deliverables will be an openly accessible monitoring platform of digital transformation
                                            at the local tier of government, journal articles, an edited volume, and publications for practitioners.
                                            The real-time platform “Monitoring Digital Transformation in European Local Governments” will be
                                            accessible to researchers and practitioners worldwide and contribute to a better understanding of long-
                                            term developments. The duration of the project submitted to the SNSF/DFG is three years; however, by
                                            automating the process, the real-time platform will continue to exist and be updated regularly beyond
                                            this time frame. The research project will yield policy-relevant knowledge concerning local digitization
                                            measures from a European perspective, which can then be utilized to improve policymaking for future
                                            public sector modernization.

                                            Keywords
                                            digital transformation, content monitoring, data source evaluation, website embedding


                                1. Introduction
                                Digital transformation, a crucial innovation in local government, is anticipated to reshape
                                European public service delivery, administration structures, and overall governance. The recent
                                COVID-19 pandemic underscored the significance of well-prepared digital administration,
                                particularly at the local government level, which plays a pivotal role in digital transformation.
                                However, current comparative research on the digital transformation of state and administration

                                Joint Proceedings of RCIS 2024 Workshops and Research Projects Track, May 14-17, 2024, Guimarães, Portugal
                                ∗
                                    Corresponding author.
                                †
                                    These authors contributed equally.
                                Envelope-Open gerj@zhaw.ch (J. Gerber); saxr@zhaw.ch (J. S. Saxer); bapt@zhaw.ch (B. B. Kreiner); wele@zhaw.ch (A. Weiler)
                                GLOBE https://www.zhaw.ch/de/ueber-uns/person/gerj/ (J. Gerber); https://www.zhaw.ch/de/ueber-uns/person/saxr/
                                (J. S. Saxer); https://www.zhaw.ch/de/ueber-uns/person/bapt/ (B. B. Kreiner);
                                https://www.zhaw.ch/de/ueber-uns/person/wele/ (A. Weiler)
                                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
lacks sufficient investigation into local government levels, creating a knowledge gap on
implementation and effects across Europe.

DIGILOG1 is a research project determined to close this gap. It is an international
and interdisciplinary project that consists of political and computer scientists from the
University of Potsdam (DE), the Vienna University of Economics and Business (AU), and
the Zurich University of Applied Science ZHAW (CH). The Researchers of the project in
the field of Computer Science are the contributing authors of this paper. The project is
financed by the Swiss National Science Foundation (SNSF / Project Nr. 200839) and Deutsche
Forschungsgemeinschaft (DFG, German Research Foundation). The start of the project was in
spring 2022 and the end will be in summer 2025. The research project seeks to address this
above-mentioned gap by examining two key questions:
       • What are the dynamics, scale, and pace of digital transformation in European local
         governments? Is the change radical, revolutionary, incremental, or evolutionary, and are
         there identifiable regional differences?
       • What effects does digital transformation have on these organizations, specifically in terms
         of output (service delivery, organization, processes, and resources), outcomes (perfor-
         mance and accountability), and impact (citizen acceptance, governance, and emerging
         tensions)?
To address these questions comprehensively, data will be collected in different ways from all
municipalities in the 46 member states of the Council of Europe. As shown in Figure 1 we
collect data for the different communities in three ways.

1.1. Case Studies
In conjunction with the quantitative surveys, comparative case studies are conducted in selected
municipalities, which are also part of the extended survey sample. The case studies are car-
ried out in communities with different administrative cultures to capture the country-specific
variance of local administrative systems. The case study approach relies on field research
methods, semi-structured expert interviews, and focus groups conducted with local CEOs, Chief
Information Officers (CIOs), department heads, employee representatives, and staff. The aim is
to gain in-depth insights into the internal processes and actor constellations of the respective
digital transformation paths, building on the quantitative part’s interim results by capturing the
municipalities’ organizational realities.

1.2. Survey
In addition to the qualitative case studies, the DIGILOG project is based on two quantitative
forms of data collection: a web crawler for analyzing municipal websites and a survey among
the leaders of European municipal administrations.
   The survey has several objectives. The main goal is to collect information on the status of the
surveyed municipal administrations’ external and internal digital transformation, from which a
1
    https://www.digilog-project.org/
Figure 1: Three different ways of collecting data for the real-time monitoring platform for the digital
transformation of municipalities in the 46 member states of the Council of Europe.


Europe-wide index will be created. In the external domain, this primarily includes the digital
service offerings of the administrations, classified into various maturity levels according to an
established social science model. The categorization spans from basic information provision
to options for digital interaction with administrative personnel and completely digital and
seamless administrative process handling. The internal domain, on the other hand, covers
aspects such as the technical equipment of the administration, forms of internal communication,
data management, and the automation of processes and routine decisions.
   Furthermore, the survey collects data on various other variables related to digitization. These
include factors that can help explain the state of digital transformation in municipalities, such
as the size and organizational form of the municipality, as well as those that can reflect the
consequences of digitization, such as questions about the efficiency of administration or the
satisfaction of citizens with administrative work.

1.3. Web Crawler and Monitoring Platform
Web crawling is a central component of the DIGILOG project. In addition to surveys, automatic
crawling and analysis of municipal websites are part of the quantitative analysis. The results
of the data analysis described below will be displayed on a dashboard within a monitoring
platform. Additionally, the lists of website URLs and email addresses for surveys, if not already
provided, are completed through crawling public data sources.
   The monitoring platform is based on three main components that interact with each other:
web crawling, data storage, and subsequent data analysis. This platform ensures monitoring
of the political municipalities’ websites during the project duration. To manage the volume
of data, several methods enable targeted data collection with minimal information loss. One
project goal is to explore and implement the most efficient method for this task.
   Data storage is ensured with two different database systems, a relational and a document-
oriented system. The relational system stores database keys and normalized information.
Complementarily, the website documents and the analysis results are stored in a document-
oriented system. For analysis, clues (e.g., mention of selected services or keywords) indicating
digital transformation are extracted and evaluated. Various methods from Natural Language
Processing (NLP), a subfield of Machine Learning, are applied.
   The analysis, in turn, can provide effective feedback to the intelligent crawler, contributing
to its continuous improvement. The quality of the analysis is ensured by domain experts
who interpret and contextualize the results for management, political science, and public
administration.

1.4. Measurement of Digital Transformation
Several relevant indices on digital service provision exist, offering country rankings and poten-
tially serving as a valuable foundation for an index on local digital service provision within the
scope of this project. The European Commission publishes the Digital Economy and Society
Index Report on digital public services; however, it lacks specificity for the local government
tier [1]. The Digital Adoption Index by the World Bank, a composite index gauging the adoption
of digital technologies globally, focuses on the government sector, with sub-indices covering
core administrative systems, online public services, and digital identification. The United Na-
tions’ E-Government Development Index assesses the effectiveness of public service delivery,
identifying patterns in e-government development and regional challenges. Despite its Local
Online Service Index focusing on the local level, evaluating the scope and quality of online
services, telecommunication infrastructure development, and human capital, it only assesses
portals in a selection of 100 cities worldwide, overlooking smaller local governments [2]. The
E-Government Monitor, conducted through a representative survey of populations in Germany,
Austria, and Switzerland, explores the usage and satisfaction related to e-government services.
Results indicate a pronounced use of e-government services in Austria, followed by Switzerland
and Germany [3]. Nonetheless, once again, this index lacks specificity for the local government
tier. The German Index of Digitalization (Deutschland-Index Digitalisierung) scrutinizes digital
infrastructure, the use of digital services, the digital economy, and e-government in individual
German states but is confined to Germany [4].


2. Related Work
The project described is interdisciplinary. It intersects with the research area of political Science
and Information Retrieval in Computer Science. However, we only focus on the related work of
Information Retrieval related to this project.
   There is already work claiming to measure the level of digital transformation within local
governments. Garcia-Sanchez et al. [5] presents an analysis of the development of e-governments
of 102 Spanish municipalities where they select features from various papers and frameworks.
Pina et al. [6] conducted an empirical study about the effect of e-government on transparency,
openness, and hence accountability in 15 countries of the EU and a total of 318 government
websites. This task of assessing websites even finds its application in other domains such as
health [7].
   Since we focus on website content to measure digital transformation, we note the importance
of existing work on website processing, classification, and embedding, which is the encoding of
data into a lower-dimensional representation in such a way that preserves some relationship in
the data. We might focus on a website’s visual or textual aspects, or even both, and leverage
machine learning for our digitalization measurements. It’s not surprising that recent work
often uses Large Language Models (LLMs) and Convolutional Neural Networks (CNNs). Other
classical machine-learning approaches rely more on feature engineering. However, they do
not generalize as well as the state-of-the-art models due to their lack of flexibility regarding
structural changes of an HTML page. A large amount of related work exists in the field of
text-based embedding and classification of websites, which might help us categorize certain
website elements. Kowsari et al. [8] and Minaee et al. [9] provide reviews on past work on text
classification in general, while Hashemi [10] gives us a survey on web page classification. While
”classification” refers to categorizing websites, before making the final prediction, we need to
transform website data into a more manageable form which can involve creating embeddings
for the websites. These website embeddings can be compared based on numerical similarity
for various use cases. The classification models can be used to detect important digitalization
elements on the website while also giving us insight into how to process websites effectively.

2.1. Visual, text, and mixed Website Classification
Visual-only classifications are, in many cases, applied to the detection of harmful content such
as propaganda of terrorism [11], alcohol, adult content, weapons [12, 13] or just food, fashion or
landscapes [14]. These classes all have distinct visual features. However, in many cases, these
approaches can’t distinguish between visually similar pages (e.g., municipality homepage vs.
tourism page of the same municipality).
   In text-based website classification, some approaches rely on classical machine learning
[15, 16]. However, the majority are based on neural networks [17, 18, 19, 20] and the more
recent approaches are transformers architecture [21, 22, 23, 24]. Most notably, [23] proposes
MarkupLM for document understanding tasks based on the raw text and markup language,
which is also used to code websites.
   A mixed approach using both textual and visual features can be seen in [25] and [26]. The
ladder encodes multiple parts of a website, such as a screenshot and metadata, and combines
them to feed it into a neural network as input. The model is trained to categorize websites
into 14 different classes. While previous work gives insight into how websites are processed
and represented numerically, we must apply this knowledge to our specific data. How exactly
website data is handled is not a solved problem. Kiesel et al.[27], for example, compares
different web page segmentation algorithms. Dividing the page into individual segments might
provide more concentrated information sources for our future algorithms. Finally, recent AI
chatbots such as ChatGPT or open-source variants are capable of understanding a wide range of
instructions. Recent developments have made it possible for the models to even react to image
input while understanding user instruction, making them large multimodal models. They are
foundation models that can be used in a variety of ways, and they can understand website code
as well as screenshots. As development continues, it is becoming easier to use these models for
automatic extraction, summarization, analysis, and categorization of municipality websites. As
these models generate text, natural language analysis is essential.


3. Recent and Future Work
The field of our work in this project consists of two parts:

    • The URL gathering consists of the following questions: Has the municipality a website,
      and if so, what is the URL? Furthermore, the retrieved URLs must be distinguished from
      non-municipality URLs to eliminate false positives.
    • The website must be preprocessed (website segmentation, selection of relevant data, and
      removal of noisy data) and processed. The municipality website must be assessed based
      on the criteria defined by political scientists. A classifying model must be capable of
      detecting certain features if they exist on this website.

Assessing a website requires a semantic understanding of a website by the machine learning
model used to process the Websites. Whether it is URL classification (specifically discerning
municipality websites from others), topic modeling (classification of services), or e-service
detection on web pages, a robust foundation in embedding is essential. In our previous work,
we conducted not yet published experiments with general pre-trained webpage embedding
models and developed a basic embedding method to effectively differentiate municipality
websites from non-municipality ones. All methods yielded very good results, with the more
complex ones resulting in slightly better results. However, it’s crucial to acknowledge that basic
embeddings demonstrated a faster processing speed than more complex models, a significant
consideration given the vast number of websites in our study. We additionally evaluated
different data sources concerning their completeness of data. The categories evaluated were
search engines, encyclopedias, and blind requests with fabricated URLs based on certain patterns.
The retrieved URLs partially consisted of wrong URLs that did not belong to the local government
or municipality. Although the URL appeared to be correct in many of those cases, containing
the municipality name, the content was of another topic such as tourism, airports, other official
organizations in this municipality, or even completely unrelated content to the municipality.
Thus, an automated distinction and classification by analyzing the website’s content was
required.
   Furthermore, as mentioned in Section 1.4, there are many ways of measuring digitization. In
a conference paper, we defined three key aspects of our analysis, which consisted of different
indices. The categories are Service Maturity (measurement of provision of information, com-
munication possibility, and transactions), Usability (evaluation of accessibility and convenience
of use), and Technical Maturity (evaluation of security and privacy). This index was published
in a conference paper [28]. We tested the index on a sample of municipality websites and are
currently working on implementing and applying it to the whole data set. Looking ahead,
our plan encompasses the application of webpage embedding techniques for e-form detection,
including webpage segmentation and relevant information extraction. Further, we plan to
leverage large Language Models for topic modeling of webpages and webpage content. This
approach aims to further automate the process of monitoring the digital transformation of
European communities.


4. Acknowledgment
This work is supported by Grant No. GR 200839 of the Swiss National Science Foundation (SNF)
and German Research Foundation (DFG) for the research project “Digital Transformation at
the Local Tier of Government in Europe: Dynamics and Effects from a Cross-Countries and
Over-Time Comparative Perspective (DIGILOG)”.


References
 [1] European Commission, The Digital Economy and Society Index (DESI), 2023. URL: https:
     //digital-strategy.ec.europa.eu/en/policies/desi.
 [2] UN DESA, UN E-Government Survey 2022 - The Future of Digital Government, Technical
     Report, New York, 2022.
 [3] Initiative D21 and TUM, eGovernment Monitor 2023, Technical Report, 2023.
     URL: https://initiatived21.de/uploads/03_Studien-Publikationen/eGovernment-MONITOR/
     2023/egovernment_monitor_23.pdf.
 [4] Kompetenzzentrum Öffentliche IT, Deutschland-Index der Digitalisierung, 2023. URL:
     https://www.oeffentliche-it.de/deutschland-index.
 [5] I.-M. García-Sánchez, L. Rodríguez-Domínguez, J.-V. Frias-Aceituno, Evolutions in e-
     governance: evidence from Spanish local governments, Environmental Policy and Gover-
     nance 23 (2013) 323–340. Publisher: Wiley Online Library.
 [6] V. Pina, L. Torres, S. Royo, Are ICTs improving transparency and accountability in the
     EU regional and local governments? An empirical study, Public administration 85 (2007)
     449–472. Publisher: Wiley Online Library.
 [7] F. Monnet, L. Pivodic, C. Dupont, R.-M. Dröes, L. Van den Block, Information on advance
     care planning on websites of dementia associations in Europe: A content analysis, Aging
     & Mental Health 27 (2023) 1821–1831. Publisher: Taylor & Francis.
 [8] K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, D. Brown, Text
     classification algorithms: A survey, Information 10 (2019) 150. Publisher: Multidisciplinary
     Digital Publishing Institute.
 [9] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, J. Gao, Deep learn-
     ing–based text classification: a comprehensive review, ACM computing surveys (CSUR)
     54 (2021) 1–40. Publisher: ACM New York, NY, USA.
[10] M. Hashemi, Web page classification: a survey of perspectives, gaps, and future directions,
     Multimedia Tools and Applications 79 (2020) 11921–11945. Publisher: Springer.
[11] M. Hashemi, M. Hall, Detecting and classifying online dark visual propaganda, Image and
     Vision Computing 89 (2019) 95–105. Publisher: Elsevier.
[12] A. Akusok, Y. Miche, J. Karhunen, K.-M. Bjork, R. Nian, A. Lendasse, Arbitrary cate-
     gory classification of websites based on image content, IEEE Computational Intelligence
     Magazine 10 (2015) 30–41. Publisher: IEEE.
[13] L. Espinosa-Leal, A. Akusok, A. Lendasse, K.-M. Björk, Website classification from webpage
     renders, in: Proceedings of ELM2019 9, Springer, 2021, pp. 41–50.
[14] D. López-Sánchez, J. M. Corchado, A. G. Arrieta, A CBR system for image-based webpage
     classification: case representation with convolutional neural networks, in: The Thirtieth
     International Flairs Conference, 2017.
[15] V. K. Bhalla, N. Kumar, An efficient scheme for automatic web pages categorization
     using the support vector machine, New Review of Hypermedia and Multimedia 22 (2016)
     223–242. Publisher: Taylor & Francis.
[16] G. Matošević, J. Dobša, D. Mladenić, Using machine learning for web page classification in
     search engine optimization, Future Internet 13 (2021) 9.
[17] E. Buber, B. Diri, Web page classification using RNN, Procedia Computer Science 154
     (2019) 62–72. Publisher: Elsevier.
[18] B. Y. Lin, Y. Sheng, N. Vo, S. Tata, Freedom: A transferable neural architecture for structured
     information extraction on web documents, in: Proceedings of the 26th ACM SIGKDD
     International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1092–1102.
[19] A. K. Nandanwar, J. Choudhary, Semantic features with contextual knowledge-based web
     page categorization using the GloVe model and stacked BiLSTM, Symmetry 13 (2021) 1772.
[20] Y. Zhou, Y. Sheng, N. Vo, N. Edmonds, S. Tata, Simplified dom trees for transferable
     attribute extraction from the web, arXiv preprint arXiv:2101.02415 (2021).
[21] X. Chen, Z. Zhao, L. Chen, D. Zhang, J. Ji, A. Luo, Y. Xiong, K. Yu, WebSRC: a dataset for
     web-based structural reading comprehension, arXiv preprint arXiv:2101.09465 (2021).
[22] A. Gupta, R. Bhatia, Ensemble approach for web page classification, Multimedia Tools
     and Applications 80 (2021) 25219–25240. Publisher: Springer.
[23] J. Li, Y. Xu, L. Cui, F. Wei, MarkupLM: Pre-training of Text and Markup Language
     for Visually-rich Document Understanding, 2022. URL: http://arxiv.org/abs/2110.08518,
     arXiv:2110.08518 [cs].
[24] A. K. Nandanwar, J. Choudhary, Contextual Embeddings-Based Web Page Categorization
     Using the Fine-Tune BERT Model, Symmetry 15 (2023) 395. URL: https://www.mdpi.com/
     2073-8994/15/2/395. doi:10.3390/sym15020395 , number: 2 Publisher: Multidisciplinary
     Digital Publishing Institute.
[25] R. Bruni, G. Bianchi, Website categorization: A formal approach and robustness analysis
     in the case of e-commerce detection, Expert Systems with Applications 142 (2020) 113001.
     Publisher: Elsevier.
[26] S. Lugeon, T. Piccardi, R. West, Homepage2Vec: Language-Agnostic Website Embedding
     and Classification, Proceedings of the International AAAI Conference on Web and Social
     Media 16 (2022) 1285–1291. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/19380.
     doi:10.1609/icwsm.v16i1.19380 .
[27] J. Kiesel, L. Meyer, F. Kneist, B. Stein, M. Potthast, An Empirical Comparison of Web Page
     Segmentation Algorithms, 2021, pp. 62–74. doi:10.1007/978- 3- 030- 72240- 1_5 .
[28] J. Marquardt, J. Gerber, J. Machljankin, C. Kaiser, & R. Steiner, Applying web crawling
     for data collection in the social sciences - Opportunities and limits using the example of
     digital transformation in European local governments, Zagreb, Croatia, 2023.