<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">DIGILOG: Towards a Monitoring Platform for Digital Transformation of European Communities</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jonathan</forename><surname>Gerber</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
								<address>
									<addrLine>Technikumstrasse 9</addrLine>
									<postCode>8401</postCode>
									<settlement>Winterthur</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Jasmin</forename><forename type="middle">S</forename><surname>Saxer</surname></persName>
							<email>saxr@zhaw.ch</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
								<address>
									<addrLine>Technikumstrasse 9</addrLine>
									<postCode>8401</postCode>
									<settlement>Winterthur</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bruno</forename><forename type="middle">B</forename><surname>Kreiner</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
								<address>
									<addrLine>Technikumstrasse 9</addrLine>
									<postCode>8401</postCode>
									<settlement>Winterthur</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andreas</forename><surname>Weiler</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
								<address>
									<addrLine>Technikumstrasse 9</addrLine>
									<postCode>8401</postCode>
									<settlement>Winterthur</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">DIGILOG: Towards a Monitoring Platform for Digital Transformation of European Communities</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1DD627B7904E38B7311F06D1B6A1ADE7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>digital transformation</term>
					<term>content monitoring</term>
					<term>data source evaluation</term>
					<term>website embedding</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>DIGILOG is an interdisciplinary research project between Computer and Political Science. The goal of the research project is to monitor and evaluate the digital transformation of the local governments of Europe. The project will generate coherent data for a systematic comparison using methodological triangulation, i.e., quantitative and qualitative methods. It will take the form of a regular and automated quantitative survey of all local authorities in 46 European countries (members of the Council of Europe), based on web crawling and machine learning techniques (a novel approach in the context of the social sciences), and qualitative research, namely case studies in selected European countries. Renowned scholars from the University of Potsdam, ZHAW, and the Vienna University of Economics and Business, with extensive experience in local government and comparative research, form the consortium of this project. Key project deliverables will be an openly accessible monitoring platform of digital transformation at the local tier of government, journal articles, an edited volume, and publications for practitioners. The real-time platform "Monitoring Digital Transformation in European Local Governments" will be accessible to researchers and practitioners worldwide and contribute to a better understanding of long-term developments. The duration of the project submitted to the SNSF/DFG is three years; however, by automating the process, the real-time platform will continue to exist and be updated regularly beyond this time frame. The research project will yield policy-relevant knowledge concerning local digitization measures from a European perspective, which can then be utilized to improve policymaking for future public sector modernization.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Digital transformation, a crucial innovation in local government, is anticipated to reshape European public service delivery, administration structures, and overall governance. The recent COVID-19 pandemic underscored the significance of well-prepared digital administration, particularly at the local government level, which plays a pivotal role in digital transformation. However, current comparative research on the digital transformation of state and administration lacks sufficient investigation into local government levels, creating a knowledge gap on implementation and effects across Europe.</p><p>DIGILOG <ref type="foot" target="#foot_0">1</ref> is a research project determined to close this gap. It is an international and interdisciplinary project that brings together political and computer scientists from the University of Potsdam (DE), the Vienna University of Economics and Business (AT), and the Zurich University of Applied Sciences ZHAW (CH). The project's researchers in the field of Computer Science are the contributing authors of this paper. The project is financed by the Swiss National Science Foundation (SNSF / Project Nr. 200839) and Deutsche Forschungsgemeinschaft (DFG, German Research Foundation). The project started in spring 2022 and will end in summer 2025. The research project seeks to address this gap by examining two key questions:</p><p>• What are the dynamics, scale, and pace of digital transformation in European local governments? Is the change radical, revolutionary, incremental, or evolutionary, and are there identifiable regional differences? 
• What effects does digital transformation have on these organizations, specifically in terms of output (service delivery, organization, processes, and resources), outcomes (performance and accountability), and impact (citizen acceptance, governance, and emerging tensions)?</p><p>To address these questions comprehensively, data will be collected in different ways from all municipalities in the 46 member states of the Council of Europe. As shown in Figure <ref type="figure" target="#fig_0">1</ref>, we collect data for the communities in three ways: case studies, a survey, and web crawling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Case Studies</head><p>In conjunction with the quantitative surveys, comparative case studies are conducted in selected municipalities, which are also part of the extended survey sample. The case studies are carried out in communities with different administrative cultures to capture the country-specific variance of local administrative systems. The case study approach relies on field research methods, semi-structured expert interviews, and focus groups conducted with local CEOs, Chief Information Officers (CIOs), department heads, employee representatives, and staff. The aim is to gain in-depth insights into the internal processes and actor constellations of the respective digital transformation paths, building on the quantitative part's interim results by capturing the municipalities' organizational realities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Survey</head><p>In addition to the qualitative case studies, the DIGILOG project is based on two quantitative forms of data collection: a web crawler for analyzing municipal websites and a survey among the leaders of European municipal administrations. The survey has several objectives. The main goal is to collect information on the status of the surveyed municipal administrations' external and internal digital transformation, from which a Europe-wide index will be created. In the external domain, this primarily includes the digital service offerings of the administrations, classified into various maturity levels according to an established social science model. The categorization spans from basic information provision to options for digital interaction with administrative personnel and completely digital and seamless administrative process handling. The internal domain, on the other hand, covers aspects such as the technical equipment of the administration, forms of internal communication, data management, and the automation of processes and routine decisions.</p><p>Furthermore, the survey collects data on various other variables related to digitization. These include factors that can help explain the state of digital transformation in municipalities, such as the size and organizational form of the municipality, as well as those that can reflect the consequences of digitization, such as questions about the efficiency of administration or the satisfaction of citizens with administrative work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3.">Web Crawler and Monitoring Platform</head><p>Web crawling is a central component of the DIGILOG project. In addition to surveys, automatic crawling and analysis of municipal websites are part of the quantitative analysis. The results of the data analysis described below will be displayed on a dashboard within a monitoring platform. Additionally, the lists of website URLs and email addresses for surveys, if not already provided, are completed through crawling public data sources.</p><p>The monitoring platform is based on three main components that interact with each other: web crawling, data storage, and subsequent data analysis. This platform ensures monitoring of the political municipalities' websites during the project duration. To manage the volume of data, several methods enable targeted data collection with minimal information loss. One project goal is to explore and implement the most efficient method for this task.</p><p>Data are stored in two different database systems: a relational and a document-oriented system. The relational system stores database keys and normalized information. Complementing this, the website documents and the analysis results are stored in the document-oriented system. For analysis, clues (e.g., mentions of selected services or keywords) indicating digital transformation are extracted and evaluated. Various methods from Natural Language Processing (NLP), a subfield of Machine Learning, are applied.</p><p>The analysis, in turn, can provide effective feedback to the intelligent crawler, contributing to its continuous improvement. The quality of the analysis is ensured by domain experts who interpret and contextualize the results for management, political science, and public administration.</p></div>
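<div xmlns="http://www.tei-c.org/ns/1.0"><p>The extraction of clues described in Section 1.3 can be illustrated with a minimal sketch. It is not the project's implementation: the keyword list <code>CLUE_KEYWORDS</code> and the function <code>extract_clues</code> are hypothetical names introduced here, and simple phrase counting stands in for the NLP methods that the project applies.</p>
<p><code lang="python">
```python
import re
from collections import Counter

# Hypothetical clue keywords; the project's actual service list is not given here.
CLUE_KEYWORDS = ("online form", "e-payment", "appointment booking")


def extract_clues(page_text, keywords=CLUE_KEYWORDS):
    """Count case-insensitive whole-phrase mentions of each keyword in the page text."""
    counts = Counter()
    for keyword in keywords:
        # Word boundaries avoid matching a keyword inside a longer word.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        hits = re.findall(pattern, page_text, flags=re.IGNORECASE)
        if hits:
            counts[keyword] = len(hits)
    return counts


sample = "Use our online form for appointment booking. The Online Form is free."
clues = extract_clues(sample)
print(dict(clues))
```
</code></p>
<p>In a setting like the one described, such counts could be stored with the website document in the document-oriented system and later be replaced by richer, model-based signals.</p></div>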
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4.">Measurement of Digital Transformation</head><p>Several relevant indices on digital service provision exist, offering country rankings and potentially serving as a valuable foundation for an index on local digital service provision within the scope of this project. The European Commission publishes the Digital Economy and Society Index Report on digital public services; however, it lacks specificity for the local government tier <ref type="bibr" target="#b0">[1]</ref>. The Digital Adoption Index by the World Bank, a composite index gauging the adoption of digital technologies globally, focuses on the government sector, with sub-indices covering core administrative systems, online public services, and digital identification. The United Nations' E-Government Development Index assesses the effectiveness of public service delivery, identifying patterns in e-government development and regional challenges. Despite its Local Online Service Index focusing on the local level, evaluating the scope and quality of online services, telecommunication infrastructure development, and human capital, it only assesses portals in a selection of 100 cities worldwide, overlooking smaller local governments <ref type="bibr" target="#b1">[2]</ref>. The E-Government Monitor, conducted through a representative survey of populations in Germany, Austria, and Switzerland, explores the usage and satisfaction related to e-government services. Results indicate a pronounced use of e-government services in Austria, followed by Switzerland and Germany <ref type="bibr" target="#b2">[3]</ref>. Nonetheless, once again, this index lacks specificity for the local government tier. 
The German Index of Digitalization (Deutschland-Index Digitalisierung) scrutinizes digital infrastructure, the use of digital services, the digital economy, and e-government in individual German states but is confined to Germany <ref type="bibr" target="#b3">[4]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The project described is interdisciplinary. It intersects the research areas of Political Science and Information Retrieval in Computer Science. However, we focus only on related work from Information Retrieval.</p><p>There is already work claiming to measure the level of digital transformation within local governments. Garcia-Sanchez et al. <ref type="bibr" target="#b4">[5]</ref> present an analysis of the development of e-government in 102 Spanish municipalities, for which they select features from various papers and frameworks. Pina et al. <ref type="bibr" target="#b5">[6]</ref> conducted an empirical study of the effect of e-government on transparency, openness, and hence accountability, covering 15 EU countries and a total of 318 government websites. The task of assessing websites is also applied in other domains such as health <ref type="bibr" target="#b6">[7]</ref>.</p><p>Since we focus on website content to measure digital transformation, we note the importance of existing work on website processing, classification, and embedding, i.e., the encoding of data into a lower-dimensional representation that preserves some relationship in the data. We might focus on a website's visual or textual aspects, or even both, and leverage machine learning for our digitalization measurements. It is not surprising that recent work often uses Large Language Models (LLMs) and Convolutional Neural Networks (CNNs). Other classical machine-learning approaches rely more on feature engineering. However, they do not generalize as well as the state-of-the-art models due to their lack of flexibility regarding structural changes of an HTML page. A large amount of related work exists in the field of text-based embedding and classification of websites, which might help us categorize certain website elements. Kowsari et al. 
<ref type="bibr" target="#b7">[8]</ref> and Minaee et al. <ref type="bibr" target="#b8">[9]</ref> provide reviews on past work on text classification in general, while Hashemi <ref type="bibr" target="#b9">[10]</ref> gives us a survey on web page classification. While "classification" refers to categorizing websites, before making the final prediction, we need to transform website data into a more manageable form which can involve creating embeddings for the websites. These website embeddings can be compared based on numerical similarity for various use cases. The classification models can be used to detect important digitalization elements on the website while also giving us insight into how to process websites effectively.</p></div>
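<div xmlns="http://www.tei-c.org/ns/1.0"><p>Comparing website embeddings by numerical similarity can be sketched with cosine similarity, one common choice of measure; the text above does not specify which measure the project uses. The three-dimensional vectors below are toy values invented for illustration, whereas real embeddings would come from a trained model and have many more dimensions.</p>
<p><code lang="python">
```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for illustration only.
municipality_page = [0.9, 0.1, 0.0]
another_municipality_page = [0.8, 0.2, 0.1]
tourism_page = [0.1, 0.2, 0.9]

same_kind = cosine_similarity(municipality_page, another_municipality_page)
different_kind = cosine_similarity(municipality_page, tourism_page)
print(round(same_kind, 3), round(different_kind, 3))
```
</code></p>
<p>With these toy values, the two municipality pages score much closer to each other than to the tourism page, which is the property a downstream classifier or nearest-neighbor search would rely on.</p></div>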
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Visual, Text, and Mixed Website Classification</head><p>Visual-only classifications are, in many cases, applied to the detection of harmful content such as terrorist propaganda <ref type="bibr" target="#b10">[11]</ref>, alcohol, adult content, weapons <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref> or just food, fashion or landscapes <ref type="bibr" target="#b13">[14]</ref>. These classes all have distinct visual features. However, in many cases, these approaches cannot distinguish between visually similar pages (e.g., municipality homepage vs. tourism page of the same municipality).</p><p>In text-based website classification, some approaches rely on classical machine learning <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>. However, the majority are based on neural networks <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20]</ref> and the more recent approaches are based on the transformer architecture <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>. Most notably, <ref type="bibr" target="#b22">[23]</ref> proposes MarkupLM for document understanding tasks based on the raw text and markup language, which is also used to code websites.</p><p>A mixed approach using both textual and visual features can be seen in <ref type="bibr" target="#b24">[25]</ref> and <ref type="bibr" target="#b25">[26]</ref>. The latter encodes multiple parts of a website, such as a screenshot and metadata, and combines them as input to a neural network. The model is trained to categorize websites into 14 different classes. 
While previous work gives insight into how websites are processed and represented numerically, we must apply this knowledge to our specific data. How exactly website data is handled is not a solved problem. Kiesel et al. <ref type="bibr" target="#b26">[27]</ref>, for example, compare different web page segmentation algorithms. Dividing the page into individual segments might provide more concentrated information sources for our future algorithms. Finally, recent AI chatbots such as ChatGPT or open-source variants are capable of understanding a wide range of instructions. Recent developments have made it possible for the models to even react to image input while understanding user instructions, making them large multimodal models. They are foundation models that can be used in a variety of ways, and they can understand website code as well as screenshots. As development continues, it is becoming easier to use these models for automatic extraction, summarization, analysis, and categorization of municipality websites. As these models generate text, natural language analysis is essential.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Recent and Future Work</head><p>The field of our work in this project consists of two parts:</p><p>• The URL gathering consists of the following questions: Does the municipality have a website, and if so, what is the URL? Furthermore, the retrieved URLs must be distinguished from non-municipality URLs to eliminate false positives. • The website must be preprocessed (website segmentation, selection of relevant data, and removal of noisy data) and processed. The municipality website must be assessed based on the criteria defined by political scientists. A classifying model must be capable of detecting certain features if they exist on this website.</p><p>Assessing a website requires a semantic understanding of the website by the machine learning model used to process it. Whether it is URL classification (specifically discerning municipality websites from others), topic modeling (classification of services), or e-service detection on web pages, a robust foundation in embedding is essential. In our previous work, we conducted as-yet-unpublished experiments with general pre-trained webpage embedding models and developed a basic embedding method to effectively differentiate municipality websites from non-municipality ones. All methods performed very well, with the more complex ones performing slightly better. However, the basic embeddings were faster to compute than the more complex models, a significant consideration given the vast number of websites in our study. We additionally evaluated different data sources with respect to the completeness of their data. The categories evaluated were search engines, encyclopedias, and blind requests with fabricated URLs based on certain patterns. Some of the retrieved URLs were wrong and did not belong to the local government or municipality. 
Although the URL appeared to be correct in many of those cases, containing the municipality name, the content concerned another topic such as tourism, airports, or other official organizations in the municipality, or was completely unrelated to the municipality. Thus, an automated distinction and classification by analyzing the website's content was required. Furthermore, as mentioned in Section 1.4, there are many ways of measuring digitization. We defined three key aspects of our analysis, each consisting of different indices: Service Maturity (measurement of provision of information, communication possibility, and transactions), Usability (evaluation of accessibility and convenience of use), and Technical Maturity (evaluation of security and privacy). This index was published in a conference paper <ref type="bibr" target="#b27">[28]</ref>. We tested the index on a sample of municipality websites and are currently working on implementing and applying it to the whole data set. Looking ahead, we plan to apply webpage embedding techniques to e-form detection, including webpage segmentation and relevant information extraction. Further, we plan to leverage Large Language Models for topic modeling of webpages and webpage content. This approach aims to further automate the process of monitoring the digital transformation of European communities.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Three different ways of collecting data for the real-time monitoring platform for the digital transformation of municipalities in the 46 member states of the Council of Europe.</figDesc><graphic coords="3,162.21,84.19,270.85,231.56" type="bitmap" /></figure>
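<div xmlns="http://www.tei-c.org/ns/1.0"><p>The blind requests with fabricated URLs mentioned in Section 3 could start from candidate URLs generated along the following lines. This is a sketch under stated assumptions: the patterns in <code>URL_PATTERNS</code> and the helper names <code>slugify</code> and <code>candidate_urls</code> are hypothetical, since the patterns actually used in the evaluation are not listed in this paper.</p>
<p><code lang="python">
```python
import re
import unicodedata

# Hypothetical URL patterns; the patterns used in the evaluation are not listed.
URL_PATTERNS = (
    "https://www.{slug}.{tld}",
    "https://www.gemeinde-{slug}.{tld}",
    "https://www.stadt-{slug}.{tld}",
)


def slugify(name):
    """Lowercase, strip diacritics, and join alphanumeric runs with hyphens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))


def candidate_urls(name, tld, patterns=URL_PATTERNS):
    """Fill each pattern with the slugified municipality name and a country TLD."""
    slug = slugify(name)
    return [pattern.format(slug=slug, tld=tld) for pattern in patterns]


for url in candidate_urls("La Chaux-de-Fonds", "ch"):
    print(url)
```
</code></p>
<p>Each candidate would then be requested and, as discussed above, its content classified, because a plausible URL containing the municipality name does not guarantee an official municipality website. Stripping diacritics is itself a simplification; language-specific transliterations (for example "ue" for "ü") would need additional patterns.</p></div>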
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.digilog-project.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Acknowledgment</head><p>This work is supported by Grant No. GR 200839 of the Swiss National Science Foundation (SNF) and German Research Foundation (DFG) for the research project "Digital Transformation at the Local Tier of Government in Europe: Dynamics and Effects from a Cross-Countries and Over-Time Comparative Perspective (DIGILOG)".</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://digital-strategy.ec.europa.eu/en/policies/desi" />
		<title level="m">The Digital Economy and Society Index (DESI)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>European Commission</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">UN E-Government Survey 2022 -The Future of Digital Government</title>
		<imprint>
			<publisher>UN DESA</publisher>
			<pubPlace>New York</pubPlace>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="https://initiatived21.de/uploads/03_Studien-Publikationen/eGovernment-MONITOR/2023/egovernment_monitor_23.pdf" />
		<title level="m">Initiative D21 and TUM, eGovernment Monitor 2023</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://www.oeffentliche-it.de/deutschland-index" />
		<title level="m">Deutschland-Index der Digitalisierung</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Kompetenzzentrum Öffentliche IT</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Evolutions in egovernance: evidence from Spanish local governments</title>
		<author>
			<persName><forename type="first">I.-M</forename><surname>García-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rodríguez-Domínguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-V</forename><surname>Frias-Aceituno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Environmental Policy and Governance</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="323" to="340" />
			<date type="published" when="2013">2013</date>
			<publisher>Wiley Online Library</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Are ICTs improving transparency and accountability in the EU regional and local governments? An empirical study</title>
		<author>
			<persName><forename type="first">V</forename><surname>Pina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Torres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Royo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Public administration</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="page" from="449" to="472" />
			<date type="published" when="2007">2007</date>
			<publisher>Wiley Online Library</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Information on advance care planning on websites of dementia associations in Europe: A content analysis</title>
		<author>
			<persName><forename type="first">F</forename><surname>Monnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pivodic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R.-M</forename><surname>Dröes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van den Block</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Aging &amp; Mental Health</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="1821" to="1831" />
			<date type="published" when="2023">2023</date>
			<publisher>Taylor &amp; Francis</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Text classification algorithms: A survey</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kowsari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jafari Meimandi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heidarysafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mendu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barnes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">150</biblScope>
			<date type="published" when="2019">2019</date>
			<publisher>Multidisciplinary Digital Publishing Institute</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning-based text classification: a comprehensive review</title>
		<author>
			<persName><forename type="first">S</forename><surname>Minaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kalchbrenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nikzad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chenaghlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM computing surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="40" />
			<date type="published" when="2021">2021</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Web page classification: a survey of perspectives, gaps, and future directions</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hashemi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="page" from="11921" to="11945" />
			<date type="published" when="2020">2020</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Detecting and classifying online dark visual propaganda</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hashemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Image and Vision Computing</title>
		<imprint>
			<biblScope unit="volume">89</biblScope>
			<biblScope unit="page" from="95" to="105" />
			<date type="published" when="2019">2019</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Arbitrary category classification of websites based on image content</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akusok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Miche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Karhunen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-M</forename><surname>Björk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lendasse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computational Intelligence Magazine</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="30" to="41" />
			<date type="published" when="2015">2015</date>
			<publisher>IEEE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Website classification from webpage renders</title>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa-Leal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Akusok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lendasse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-M</forename><surname>Björk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ELM2019 9</title>
				<meeting>ELM2019 9</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="41" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A CBR system for image-based webpage classification: case representation with convolutional neural networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>López-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Corchado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Arrieta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Thirtieth International Flairs Conference</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">An efficient scheme for automatic web pages categorization using the support vector machine</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">K</forename><surname>Bhalla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Review of Hypermedia and Multimedia</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="223" to="242" />
			<date type="published" when="2016">2016</date>
			<publisher>Taylor &amp; Francis</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Using machine learning for web page classification in search engine optimization</title>
		<author>
			<persName><forename type="first">G</forename><surname>Matošević</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dobša</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mladenić</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Future Internet</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Web page classification using RNN</title>
		<author>
			<persName><forename type="first">E</forename><surname>Buber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Diri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">154</biblScope>
			<biblScope unit="page" from="62" to="72" />
			<date type="published" when="2019">2019</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">FreeDOM: A transferable neural architecture for structured information extraction on web documents</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1092" to="1102" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Semantic features with contextual knowledge-based web page categorization using the GloVe model and stacked BiLSTM</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Nandanwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Choudhary</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Symmetry</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">1772</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Edmonds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tata</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.02415</idno>
		<title level="m">Simplified DOM trees for transferable attribute extraction from the web</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.09465</idno>
		<title level="m">WebSRC: a dataset for web-based structural reading comprehension</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Ensemble approach for web page classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bhatia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">80</biblScope>
			<biblScope unit="page" from="25219" to="25240" />
			<date type="published" when="2021">2021</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.08518</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Contextual Embeddings-Based Web Page Categorization Using the Fine-Tune BERT Model</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Nandanwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Choudhary</surname></persName>
		</author>
		<idno type="DOI">10.3390/sym15020395</idno>
		<ptr target="https://www.mdpi.com/2073-8994/15/2/395" />
	</analytic>
	<monogr>
		<title level="j">Symmetry</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">395</biblScope>
			<date type="published" when="2023">2023</date>
			<publisher>Multidisciplinary Digital Publishing Institute</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Website categorization: A formal approach and robustness analysis in the case of e-commerce detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bruni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bianchi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">142</biblScope>
			<biblScope unit="page">113001</biblScope>
			<date type="published" when="2020">2020</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Homepage2Vec: Language-Agnostic Website Embedding and Classification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lugeon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Piccardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>West</surname></persName>
		</author>
		<idno type="DOI">10.1609/icwsm.v16i1.19380</idno>
		<ptr target="https://ojs.aaai.org/index.php/ICWSM/article/view/19380" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="1285" to="1291" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">An Empirical Comparison of Web Page Segmentation Algorithms</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kneist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-72240-1_5</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="62" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Marquardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Machljankin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Steiner</surname></persName>
		</author>
		<title level="m">Applying web crawling for data collection in the social sciences - Opportunities and limits using the example of digital transformation in European local governments</title>
				<meeting><address><addrLine>Zagreb, Croatia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
