<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>A Coruña, Spain. montoyo@dlsi.ua.es (A. Montoyo); rafael@dlsi.ua.es (R. Muñoz); ygutierrez@dlsi.ua.es (Y. Gutiérrez)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Geo.IA: Artificial Geo-Intelligence Platform to Solve Citizens Problems and Facilitate Strategic Decision Making in the Public Administration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrés Montoyo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Muñoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoan Gutiérrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software and Computing Systems, University of Alicante</institution>
          ,
          <addr-line>Crta. San Vicente del Raspeig s/n, Alicante</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The objective of Geo-IA is to research, design and implement a Geo-Smart Artificial Intelligence technology platform for public and private business organizations. The GeoIA project presents a geolocation platform that integrates technological innovation to support a strategy for the creation of Smart Territories. To do this, Text Mining, Machine Learning (including deep learning) and Natural Language Processing technologies are deployed. The functionality of the geolocation platform is to analyze, integrate, share data, visualize and represent territorial indicators, with the aim of facilitating the monitoring and fulfillment of territorial strategies. In short, GeoIA promotes interoperability between public administration bodies and also provides citizens with mechanisms to access information of interest, where the magnitude of the integrated and interrelated data permits. GeoIA also provides digital knowledge (tools, linked information, semantics, virtual assistants) for use by public administrations to enhance their decision making through greater knowledge of the environment and to improve services to citizens.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontologies</kwd>
        <kwd>semantics</kwd>
        <kwd>semantic document profile</kwd>
        <kwd>entity recognition</kwd>
        <kwd>knowledge discovery</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A smart society uses advances in information and communication technologies (ICT) to promote sustainable improvements that better the lives of citizens. A "smart" environment provides people with advanced solutions to solve problems or apply innovative technologies to create efficient and adaptable services, connected cities and communities, informed, participative and satisfied citizens and, above all, intelligent solutions for the provision of services. Thus, in order to achieve these objectives and respond to the needs of citizens, 21st-century public administration must deepen the digitization and improvement of its services by promoting digital transformation as an organizational system or ecosystem in which all stakeholders (citizens, municipalities and companies) must develop at the same speed. However, the public sector adapts to change less rapidly than the private sector. It is therefore a worthwhile endeavour to provide digital knowledge (tools, linked information, semantics, virtual assistants) to public administration bodies so that they can improve services to citizens and thereby enhance public administration decision making through greater knowledge of the environment.</p>
      <p>With the above goal in mind, this project will apply a discipline based on continuous evolution that is central to Artificial Intelligence (AI), and grounded in three basic pillars: (i) data and text mining that will integrate and extract information from different heterogeneous data sources to transform it into an understandable structure for further use, (ii) a prediction system, to make the best decisions of future actions in public administration, as well as citizens’ behaviors for decision making based on machine learning (ML) and, (iii) a simplified text generation system to prescribe citizens’ needs based on natural language processing (NLP). The starting hypotheses to address this project will be the following:</p>
      <p>• (H1) Ontologies, the backbone technology that supports Linked Data and the semantic web, provide a means to overcome some of these challenges and thus help to obtain accurate profiles to produce satisfactory recommendations and personalized services.</p>
      <p>• (H2) An ontology-centric knowledge base allows the integration of data from heterogeneous sources and enables their analysis through advanced reasoning and inference processes.</p>
      <p>• (H3) Natural Language Processing, especially focused on entity-oriented machine learning, is essential to automate the construction and population of these knowledge bases, as well as to automatically generate simplified and synthesized text.</p>
      <p>• (H4) Knowledge-based Artificial GeoIntelligence applied to public sector sources such as OpenData, GIS, Meteorology and Social Media data will be essential to generate predictive and prescriptive models to improve decision making processes in Public Administration.</p>
      <sec id="sec-1-1">
        <title>1.1. Project objectives</title>
        <p>The main objective of the project is the research, design and implementation of a GeoIntelligence Artificial Intelligence (GeoIA) technology platform for public and private business organizations. Specifically, the platform will carry out the following activities:</p>
        <p>• The design and development of a flexible, scalable and robust technological architecture for integrating data from different sectors of the public administration, such as Geographic Information Systems (GIS), cadastral data, census data, infrastructure data, consumption data, and many other kinds of sector data.</p>
        <p>• The application and optimization of machine learning algorithms to perform predictive and prescriptive analytics to predict events or make sound strategic decisions.</p>
        <p>• The development of data visualization technologies to simplify communication with citizens by offering information in a simplified and synthesized form so that it can be easily understood.</p>
        <p>• The design and execution of a pilot test to validate the technologies developed in a key area such as public administration.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. State of the art</title>
      <p>The participating team behind this proposal has been involved in several national and international research projects, in which digital entities and their contextual language have been modeled through their identification, characterization, representation and exploitation. The participating team has also worked on projects related to knowledge generation and bias in language modeling, both technologies of great relevance for the present project proposal. Specifically, they are the REDES<sup>1</sup> project (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and LIVING-LANG<sup>2</sup> (RTI2018-094653-B-C22).</p>
      <p>The goal of REDES was to go a step beyond the representation of Digital Entities, enhancing the development of technologies to automatically discover Digital Entities from different heterogeneous sources to populate semantic structures and link them to shared data. This process involved defining Digital Entities for different domains and processing heterogeneous information from the web, the social web and the web of data. The digital entities were then semantically enriched and spatiotemporally controlled and, finally, the generated information was integrated into the digital entity model. The REDES project was the starting point for the following publications in indexed journals by the research team also involved in GeoIA: [<xref ref-type="bibr" rid="ref1">1</xref>], [<xref ref-type="bibr" rid="ref2">2</xref>], [3], [4] and [5].</p>
      <p>One of the main contributions of LIVING-LANG, which is relevant to the present project, is the characterization of digital entities through the use of human language in key dimensions for the social contextualization of these entities. These multidimensional features allow relating entities from different perspectives, improving the understanding of the exchanged content and creating new knowledge in the analysis of these related structures. The LIVING-LANG project produced the following indexed publications, whose authors are part of the GeoIA project team: [6], [7], [8] and [9].</p>
      <p>Figure 1 provides a summary of how the above projects have led to a number of Artificial Intelligence technologies, in particular Language Technologies, which have evolved incrementally.</p>
      <p>Thus, it is essential for public and private sector organizations to develop a data and text integration model capable of defining and creating semantic networks of geolocalized digital entities, understanding a digital entity as any concept that has associated information that characterizes it (a company, person, city, building, organization, etc.). These tools will allow the detection and extraction of semantic relationships between digital entities, obtaining the information from different types of sources (unstructured, structured and linked open data), as well as determining the quality, consistency and veracity of these relationships. In addition, from the previously created knowledge bases, machine learning techniques and algorithms will be adapted to work with spatio-temporal data to define GeoIA knowledge-based models.</p>
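      <p>The idea of a semantic network of geolocalized digital entities can be illustrated with a minimal Python sketch. It is not the GeoIA implementation: the DigitalEntity class, the "near" relation, the distance threshold and the approximate coordinates are all assumptions introduced here for illustration only. The sketch derives one kind of relationship (spatial proximity) between entities from their coordinates.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class DigitalEntity:
    """Hypothetical digital entity: identifier, type, location, free attributes."""
    entity_id: str
    entity_type: str
    lat: float
    lon: float
    attributes: dict = field(default_factory=dict)


def haversine_km(a: DigitalEntity, b: DigitalEntity) -> float:
    """Great-circle distance between two entities, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def near_relations(entities, max_km):
    """Return (subject, "near", object) triples for each pair within max_km."""
    triples = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if haversine_km(a, b) <= max_km:
                triples.append((a.entity_id, "near", b.entity_id))
    return triples


# Approximate, illustrative coordinates (assumptions, not project data).
entities = [
    DigitalEntity("ua", "organization", 38.3853, -0.5140, {"name": "University of Alicante"}),
    DigitalEntity("alicante", "city", 38.3452, -0.4810, {"name": "Alicante"}),
    DigitalEntity("valencia", "city", 39.4699, -0.3763, {"name": "Valencia"}),
]
relations = near_relations(entities, max_km=10.0)
print(relations)
```
]]></preformat>
      <p>In a fuller model, relations of this kind would sit alongside relations extracted from text and from linked open data, each carrying its own provenance so that quality and veracity can be assessed.</p>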
    </sec>
    <sec id="sec-3">
      <title>3. Proposed GeoIA System</title>
      <p>We propose the design and development of an Artificial GeoIntelligence system that focuses on the extraction of knowledge from heterogeneous sources and is capable of integrating different types of data from sectors of public administration, such as Geographic Information Systems (GIS), cadastral data, census data, infrastructure data, consumption data, and a host of sector data. This type of system has a direct impact on eco-digital transition policies, in addition to being able to offer a direct service to the population through the use of virtual assistants, e.g. chatbots.</p>
      <p><sup>1</sup> https://gplsi.dlsi.ua.es/proyectos/redes/; <sup>2</sup> https://gplsi.dlsi.ua.es/proyectos/livinglang/</p>
      <p>Figure 2: GeoIA solution in the context of digital transition.</p>
      <p>Figure 2 is a macro view of how GeoIA’s knowledge-based solutions are presented to support the digital transition towards optimizing operations in organizations by incorporating digital technologies.</p>
      <p>The GeoIA project, as an open and accessible data storage and processing center, will allow a controlled collaborative model between the different agents for the achievement of the project objectives, based on obtaining knowledge through the integration of the data collected and subsequently analyzed. Addressing each of the different phases requires a consortium with expertise both in the techniques involved and in their application to real areas of society. For this reason, the consortium is formed by: Instituto Tecnológico de Informática (ITI), the multinational company GFT, the company 1MillionBot, the company Gente Comunicación and the University of Alicante. The phases into which the system is divided are presented in Figure 3.</p>
      <sec id="sec-3-1">
        <title>3.1. Ingestion</title>
        <p>The data to be processed can be of different nature and format. It is therefore necessary to implement data cleansing, normalization, and transformation techniques in order to be able to enter data into the platform in a standardized way. Cleaning consists of correcting possible errors in the data, which are sometimes introduced manually and often caused by attempts at different encoding conversions (e.g. UTF-8, EPSG<sup>3</sup>, etc.). In cases where the data is not recognizable because it has undergone a significant alteration, it is eliminated.</p>
        <p>Standardization consists of unifying characteristics and information with the same meaning, under uniform criteria. For example: Alicante is equal to ALC. Transformation consists of extracting the data from the source document, recognizing its original format, and placing it in a common format among all the data incorporated into the platform, for example, GeoPandas<sup>4</sup>. Examples of types of data sources that may be relevant for this project are weather data APIs (e.g. OpenWeatherMap<sup>5</sup>), GIS (e.g. GeoNet<sup>6</sup>), census, city cadastre, air quality data (e.g. Breezometer<sup>7</sup>), social media user comments (e.g. Twitter<sup>8</sup> and Google<sup>9</sup>), and other data sources such as the National Institute of Statistics<sup>10</sup> (INE).</p>
        <p><sup>3</sup> https://epsg.io/; <sup>4</sup> https://geopandas.org; <sup>5</sup> https://openweathermap.org/; <sup>6</sup> https://www.geonet.es/; <sup>7</sup> https://www.breezometer.com/; <sup>8</sup> https://twitter.com/home; <sup>9</sup> https://www.google.com/; <sup>10</sup> https://www.ine.es/</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Integration</title>
        <p>The data incorporated into the platform lacks semantic information. Therefore, from a previously designed model, which is able to capture the semantics of the domain, these data are instantiated, as pieces of semantic information, and linked together. The linking process consists of identifying common characteristics for each instance, and these characteristics are converted into semantic relationships.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Storage</title>
        <p>The storage phase allows us to ensure that all data that has been instantiated and linked can persist as long-term memory, and also coexist with other data that has been previously incorporated. To guarantee this operation we opted for a NoSQL database, in which each instance becomes a document and each relation an index. The technology stack chosen for this platform was ElasticSearch<sup>11</sup> for textual information. It is a Lucene<sup>12</sup>-based search server that provides a full-text, distributed, multitenancy-capable search engine with a RESTful web interface and JSON documents. GeoPandas is used for integration queries, and Leaflet<sup>13</sup> and Folium<sup>14</sup> for map creation and visualization. The entire stack is built on the Python programming language.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Visualization</title>
        <p>In this phase, a series of queries are defined based on the user needs indicated in the user interface. From these queries, data and metadata are obtained to generate visualizations that interweave data of different nature, source and domain, incorporated in the platform. With this ability to interlink data, it is possible to carry out cross-cutting studies and recommendations to facilitate decision-making and to propose new strategies, whether political, economic, geographic, or other, and mixtures of these areas.</p>
        <p>One of the most important aspects to take into account in this task is semantic exploration and recommendation. Given the existence of a semantic database, it is necessary to develop mechanisms for the exploration of the semantic network, supported by SPARQL<sup>15</sup> or Cypher<sup>16</sup> queries, of document profiles and other digital entities. These mechanisms will make it possible not only to retrieve documents through metadata filters, but also to make aggregate queries to discover statistical trends, and to make recommendations of profiles (e.g., documents, authors, institutions, topics, named entities, etc.) through the semantic links that interconnect the network.</p>
        <p><sup>11</sup> https://www.elastic.co/; <sup>12</sup> https://lucene.apache.org/; <sup>13</sup> https://leafletjs.com/; <sup>14</sup> http://python-visualization.github.io/folium/; <sup>15</sup> https://skos.um.es/TR/rdf-sparql-query/; <sup>16</sup> https://neo4j.com/developer/cypher/</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Impact of the GeoIA Project</title>
      <p>This project is fully committed to the principle of efficient and rational public management, balancing control with productivity, by incorporating ICTs to improve administrative processes through advanced digitization tools and the provision of digital services to citizens and businesses. In this context, GeoIA impacts the following lines of action that are considered central to making smart cities a reality: digital public services (e-Government), paperless administration, inter-administrative cooperation and interoperability, rational management of ICT resources and promotion of technological innovation in public management. The specific actions included in the project are the promotion of the use and monitoring of the quality of the public administration’s electronic services, the promotion of the integration and exchange of data and documents between administrations, the implementation of an electronic administration platform, etc.</p>
      <p>Along these lines, the implementation of Geo.IA (GeoArtificial Intelligence) will promote the generation of an ecosystem that revolves around Data, facilitating the development, prototyping and deployment of data analysis techniques in any domain of the economy and society. This will enable the incorporation of value from the exploitation of the data available in the project by leveraging data to improve management and decision making. The effective integration of innovation and research efforts in the project through the formed consortium will be boosted, adding significant value. Geo.IA, serving as the cornerstone and vehicle, will act as the guiding thread for seamlessly incorporating these advancements into the productive and social fabric of the Valencian Community. Geo.IA bases its value proposition on channeling the efforts of the Valencian Community in accelerating the digitization of our public sectors. The emphasis is on knowing the requirements of Valencian Community citizens in their interactions with public administration bodies through an AI platform to extract knowledge. This involves knowledge of citizens in line with their geospatial position and knowledge for the public administration to drive strategic decision making. In summary, this proposal aims to boost the digitization and technological advancement of citizens and the Valencian Community’s public administration, enabling the analysis of the data generated in the GeoIA system to provide evidence-based decision making.</p>
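      <p>As a concrete illustration of the ingestion and integration phases described in Section 3, the following Python sketch standardizes place names (following the "Alicante is equal to ALC" example), discards unrecognizable records, and converts shared characteristics into relations. It is a minimal sketch under stated assumptions, not the platform's code: the lookup table, the record fields, the function names and the "same_place" relation label are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import unicodedata
from collections import defaultdict
from itertools import combinations

# Hypothetical lookup of canonical place codes. Only "Alicante is equal to
# ALC" comes from the text; the Valencia entry is an illustrative assumption.
PLACE_CODES = {"alicante": "ALC", "alc": "ALC", "valencia": "VLC", "vlc": "VLC"}


def standardize_place(raw):
    """Map a free-text place name to a canonical code, or None if unknown."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return PLACE_CODES.get(text.strip().lower())


def ingest(records):
    """Standardize raw records and drop those whose place is unrecognizable."""
    clean = []
    for rec in records:
        code = standardize_place(rec.get("place", ""))
        if code is None:
            continue  # unrecognizable data is eliminated
        clean.append({**rec, "place": code})
    return clean


def link(instances, keys=("place",)):
    """Turn shared characteristics into (id, "same_<key>", id) relations."""
    groups = defaultdict(list)
    for inst in instances:
        for key in keys:
            groups[(key, inst[key])].append(inst["id"])
    relations = []
    for (key, _value), ids in groups.items():
        for a, b in combinations(ids, 2):
            relations.append((a, f"same_{key}", b))
    return relations


raw_records = [
    {"id": "weather-1", "place": " Alicante ", "source": "weather"},
    {"id": "census-7", "place": "ALC", "source": "census"},
    {"id": "air-3", "place": "València", "source": "air quality"},
    {"id": "bad-9", "place": "Al?c@nte", "source": "social media"},
]
instances = ingest(raw_records)
relations = link(instances)
print(relations)
```
]]></preformat>
      <p>Passing further keys to the hypothetical link function (for example a date or a topic field) would produce additional relation types from the same instances.</p>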
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>GeoIA will provide solutions to integrate data from Geographic Information Systems (GIS), public sector organizations, census, land registry, cadastre, national statistics institute, weather forecast providers, social media, etc. These data will be integrated and transformed into Digital Entities (DE) to compose a Semantic Lake of Digital Entities in which real entities coexist and interrelate with external information, DBpedia for example. This will contribute to generating an ecosystem that can be exploited by public or private organizations and third party Apps oriented to citizen services. In addition, this allows analysis and intelligent visualization of data for decision making at different scales. The models will improve traditional geographic analytics, going from knowing where and when things happen to knowing why they happen in those places. If we add to this the information coming from public administration agencies (land registry, census, environmental information, etc.), a whole ecosystem is created, enriched by the use of knowledge that can be exploited by public administration agencies or private companies and by third party Apps oriented to citizen services.</p>
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>This project is funded by the Valencian Agency for Innovation (AVI) and the European Regional Development Fund (ERDF) through the project "GeoIA: Artificial GeoIntelligence platform to solve citizens problems and facilitate strategic decision making in public administrations" (INNEST/2023/11), and partially funded by the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) through the project NL4DISMIS: TLHs for an Equal and Accessible Inclusive Society (CIPROM/2021/021). Moreover, it was backed by the work of two COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia, and Policymakers in Media Accessibility”.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Tomás</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Moreno</surname></string-name>,
          <article-title>Developing an ontology schema for enriching and linking digital media assets</article-title>,
          <source>Future Generation Computer Systems</source>
          <volume>101</volume>
          (<year>2019</year>)
          <fpage>381</fpage>-<lpage>397</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>M.</given-names> <surname>Lloret-Climent</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montoyo</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>R. M.</given-names> <surname>Guillena</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Alonso-Stenberg</surname></string-name>,
          <article-title>A systemic and cybernetic perspective on causality, big data and social networks in tourism</article-title>,
          <source>Kybernetes</source>
          <volume>48</volume>
          (<year>2019</year>)
          <fpage>287</fpage>-<lpage>297</lpage>.
          URL: https://doi.org/10.1108/K-02-2018-0084. doi:10.1108/K-02-2018-0084.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Vázquez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montoyo</surname></string-name>,
          <article-title>Spreading semantic information by word sense disambiguation</article-title>,
          <source>Knowledge-Based Systems</source>
          <volume>132</volume>
          (<year>2017</year>)
          <fpage>47</fpage>-<lpage>61</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Vázquez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montoyo</surname></string-name>,
          <article-title>A semantic framework for textual data enrichment</article-title>,
          <source>Expert Syst. Appl.</source>
          <volume>57</volume>
          (<year>2016</year>)
          <fpage>248</fpage>-<lpage>269</lpage>.
          URL: https://doi.org/10.1016/j.eswa.2016.03.048. doi:10.1016/J.ESWA.2016.03.048.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>B.</given-names> <surname>Navarro-Colorado</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Saquete</surname></string-name>,
          <article-title>Cross-document event ordering through temporal, lexical and distributional knowledge</article-title>,
          <source>Knowl. Based Syst.</source>
          <volume>110</volume>
          (<year>2016</year>)
          <fpage>244</fpage>-<lpage>254</lpage>.
          URL: https://doi.org/10.1016/j.knosys.2016.07.032. doi:10.1016/J.KNOSYS.2016.07.032.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>S.</given-names> <surname>Estevez-Velarde</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Almeida-Cruz</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montoyo</surname></string-name>,
          <article-title>General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution</article-title>,
          <source>Inf. Sci.</source>
          <volume>543</volume>
          (<year>2021</year>)
          <fpage>58</fpage>-<lpage>71</lpage>.
          URL: https://doi.org/10.1016/j.ins.2020.07.035. doi:10.1016/J.INS.2020.07.035.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>A.</given-names> <surname>Piad-Morffis</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Almeida-Cruz</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Muñoz</surname></string-name>,
          <article-title>A computational ecosystem to support ehealth knowledge discovery technologies in spanish</article-title>,
          <source>J. Biomed. Informatics</source>
          <volume>109</volume>
          (<year>2020</year>)
          103517.
          URL: https://doi.org/10.1016/j.jbi.2020.103517. doi:10.1016/J.JBI.2020.103517.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>S.</given-names> <surname>Estevez-Velarde</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montoyo</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Almeida-Cruz</surname></string-name>,
          <article-title>Automl strategy based on grammatical evolution: A case study about knowledge discovery from text</article-title>,
          in: A. Korhonen, D. R. Traum, L. Màrquez (Eds.),
          <source>Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28-August 2, 2019, Volume 1: Long Papers</source>,
          Association for Computational Linguistics,
          <year>2019</year>,
          pp. <fpage>4356</fpage>-<lpage>4365</lpage>.
          URL: https://doi.org/10.18653/v1/p19-1428. doi:10.18653/V1/P19-1428.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>A.</given-names> <surname>Piad-Morffis</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gutiérrez</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Muñoz</surname></string-name>,
          <article-title>A corpus to support ehealth knowledge discovery technologies</article-title>,
          <source>J. Biomed. Informatics</source>
          <volume>94</volume>
          (<year>2019</year>).
          URL: https://doi.org/10.1016/j.jbi.2019.103172. doi:10.1016/J.JBI.2019.103172.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>