<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FD. A Platform for Monitoring Financial and Economic Information towards Alternative Investment Funds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>José Antonio García-Díaz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Antonio Miñarro-Giménez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ángela Almela</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gema Alcaraz-Mármol</string-name>
          <email>Gema.Alcaraz@uclm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María José Marín-Pérez</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco García-Sánchez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Valencia-García</string-name>
          <email>valencia@um.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Filología Moderna, Universidad de Castilla La Mancha</institution>
          ,
          <addr-line>45071</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Facultad de Informática, Universidad de Murcia, Campus de Espinardo</institution>
          ,
          <addr-line>30100 Murcia</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Facultad de Letras, Universidad de Murcia</institution>
          ,
          <addr-line>Campus de la Merced, 30001, Murcia</addr-line>
          ,
          <country>España</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>For efficient financial asset management, it is necessary to select, process and analyze specific information on the Internet. However, the large volume of information available, and the fact that most of this data is stored in an unstructured way, hinder this task. In this demo, we present Financial Dashboard (FD), a global platform to monitor financial assets from the Internet using Natural Language Processing and Semantic Web technologies focused on the Spanish language. The objective is to allow users to monitor financial data from a set of keywords, accounts and digital newspapers. FD compiles data periodically and annotates semantic information such as financial entities or sentiments. All the information is made available to users through a web dashboard composed of configurable and independent KPIs and through a REST API.</p>
      </abstract>
      <kwd-group>
        <kwd>Alternative Investment Funds</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>To boost financial management and to improve efficiency in the use of public and private economy-related resources, it is necessary to monitor the Internet in search of financial data. This process involves selecting, processing and analyzing global and local financial activity. Proper financial management helps companies and public authorities to identify risks and opportunities. However, this task is not an easy one. First, there is a huge amount of information on the Internet, which makes it challenging to handle, especially under real-time requirements. Second, most information is stored in an unstructured or semi-structured way, so it is hard to take advantage of it. Third, state-of-the-art technologies for Natural Language Processing (NLP) are mainly focused on the English language and have not been tested properly on Spanish texts.</p>
        <p>FD (Financial Dashboard) is a global platform that eases the monitoring of financial assets from the Internet using NLP tools and Semantic Web technologies. In a nutshell, this tool allows managers of companies and technicians of public organizations to establish a set of keywords, social network accounts and digital newspapers to monitor. The system compiles the data from those sources periodically and extracts semantic information, including financial entities and sentiments. The system also scores each piece of information in order to determine its relationship with a set of objectives that end users can define beforehand. Finally, all information is made accessible through a web dashboard composed of configurable and independent Key Performance Indicators (KPIs) and through a REST API.</p>
        <p>At the technological level, this platform makes use of NLP techniques based on state-of-the-art Large Language Models (LLMs) for extracting objective and subjective information from textual sources, and of Semantic Web technologies to map those concepts to a domain ontology.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background information</title>
      <sec id="sec-2-1">
        <p>In this section we briefly analyze similar tools and approaches for monitoring financial data on the Internet.</p>
        <p>In [1] the authors describe a procedure for measuring explicit and implicit linkages between large U.S. bank holding companies. In their methodology, the authors propose the use of mixed-frequency regression techniques. This component provides bank supervisors with knowledge about when new assets need to be monitored. Besides, the authors demonstrate how outcome variables can be applied to measure the extent to which firms are interconnected. Another related study is [2], in which the author introduces a framework for quantitative investment and trading in financial markets. This framework is subdivided into four components to monitor global variables: (1) quantitative investment trading, (2) financial risk monitoring, (3) economic situation, and (4) environmental risk monitoring.</p>
        <p>These previous studies do not take social data into account. However, in [3] the authors monitor fine-grained housing rental prices in order to provide insights for fair housing policies. To do so, they focus on housing rental websites in China. They consider features concerning the location, the neighborhood, the home structure and accessibility, among others. They use data from 2017 and 2018 and evaluate several classical machine-learning regression algorithms such as random forest, gradient boosting and support vector machines. Their results suggest that most of the generated models perform well and that the two most influential features are related to job opportunities and accessibility to health care services.</p>
        <p>To the best of our knowledge, there are no studies focused on monitoring financial data from social networks such as Twitter that consider texts written in Spanish.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System architecture</title>
      <sec id="sec-3-1">
        <title>3.1. Layer 1. Data acquisition module</title>
        <p>This project has two main data sources, namely, news from digital newspapers and publications from social media sites. On the one hand, the news items are extracted using a custom web crawler. This crawler can filter news sites based on two strategies. The first strategy is to filter by URL using regular expressions. For example, it is possible to restrict the system to consider only pages whose URL contains /economia/. The second strategy is to filter using CSS filters, as some of the news sites include certain rules in the style to denote financial content. The content is then stored in Markdown format, which keeps some structural information of the news. On the other hand, the social media items are extracted from Twitter. We use the Twitter API together with the UMUCorpusClassifier tool [4] to filter certain Twitter accounts of digital newspapers.</p>
        <p>The stored data is also pre-processed in order to remove hyperlinks, mentions, and texts written in languages other than Spanish. Finally, every piece of information is geolocated with a latitude, a longitude and a radius. The radius makes it possible to tie items to specific regions or cities, or to be more generic and cover autonomous communities or even countries. To calculate this position, we use different heuristics, such as looking for locations in the headline or the main text and then using a reverse geolocation utility to set the position on the map. If no information is found, we search for meta-data and author information.</p>
      </sec>
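      <p>As an illustration of the data acquisition step, the URL-based filter and the removal of hyperlinks and mentions could look like the following minimal Python sketch. The regular expressions, the function names keep_url and clean_text, and the example URLs are assumptions made for this example; the actual rules used by the FD crawler are not given in this paper.</p>
      <preformat>
```python
import re

# Hypothetical URL rule: keep only pages under an /economia/ path segment.
ECONOMY_URL = re.compile(r"/economia/")

# Illustrative cleaning patterns; the exact rules used by FD are not published.
HYPERLINK = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def keep_url(url: str) -> bool:
    """Return True when the URL matches the economy-section rule."""
    return ECONOMY_URL.search(url) is not None


def clean_text(text: str) -> str:
    """Strip hyperlinks and @mentions, then collapse leftover whitespace."""
    text = HYPERLINK.sub(" ", text)
    text = MENTION.sub(" ", text)
    return " ".join(text.split())


urls = [
    "https://example.com/economia/2023/ipc-sube",
    "https://example.com/deportes/2023/liga",
]
print([u for u in urls if keep_url(u)])
print(clean_text("El BCE sube los tipos @ejemplo https://example.com/x hoy"))
```
      </preformat>
      <p>Language filtering and geolocation are left out of the sketch, since both depend on external tools that the paper does not name.</p>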
      <sec id="sec-3-2">
        <p>a new corpus of financial headlines and sites to conduct a targeted sentiment analysis. The goal was to extract the main economic target of the document and then the sentiment towards this target, as well as the sentiments towards other companies and society in general. For this, we use two neural network models. The first one is trained on a Named Entity Recognition (NER) task and is capable of identifying the target. The second neural network is a multi-label document classification model that is able to capture the sentiments towards the three targets (main economic target, companies, and society) at once. For the latter model, different LLMs have been evaluated, including large and base models of BETO and MarIA, and also lightweight models based on distillation and multilingual LLMs.</p>
        <p>The next module is focused on extracting entities and mapping them to a domain ontology. For this, we created a novel ontology that contains concepts related to different financial sectors, including tourism, technologies, health, industry and energy, among others. Each concept in the ontology can be given a set of named entities to identify relevant companies and related actors. Each compiled piece of information is mapped to the ontology using semantic annotation based on an extended version of the Term Frequency–Inverse Document Frequency score (TF–IDF–e) [8]. This strategy is based on the TF–IDF measure, which calculates the frequency of the different terms (TF) and weights this information according to how informative each term is in the rest of the documents (IDF). Once we have obtained the TF–IDF for each of the terms of the ontology that appear explicitly in the texts, we calculate the weight of the terms that appear implicitly. For this, the extended TF–IDF takes into account the distance between each identified entity and the rest of the concepts in the ontology.</p>
        <p>Once the sentiments and the semantic annotations have been obtained, the relevance of each piece of information is ranked according to the users’ interests. This process makes it possible to prioritize some items over others.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Dashboard</title>
        <p>The last module of the FD platform is a dashboard. This dashboard is built using progressive web technologies. It enables users to create multiple projects and dashboards. Each dashboard is composed of several configurable KPIs, with each KPI associated with a group of custom filters. These filters make it possible to set concepts from the ontology, time series, keywords, or the way in which data is to be visualized (e.g., word clouds, timelines, heatmaps or tables). Besides, each KPI can be attached to specific filters, which makes it easier for end users to compare trends. It is also worth mentioning that the data from each KPI is accessible through a REST API too, so that the platform can be interconnected with external systems and tools.</p>
        <p>Figure 2 presents a screenshot of the dashboard. As can be seen, the dashboard contains a generic filter for all KPIs. In the capture, this filter is configured to show data from the last six months for four topics: electricity, the European Central Bank, the rental price, and diesel oil. Below, the main KPIs are shown. Some of them are configured to show the sentiments and targets for the selected topics. Another KPI panel contains the number of documents per topic, and there are also KPI panels to show relevant documents, pie charts and a word cloud.</p>
        <p>It is worth noting that the KPIs that are organized per target are based on the dataset published in the shared task FinancES 2023 [9], which consists of determining the main entity that appears in economic headlines and the sentiments towards this target, other companies and society in general. Targeted sentiment analysis can determine the polarity of certain texts towards different economic and social groups. This strategy distinguishes among three types of targets: (1) the main economic entity (MET), (2) the rest of the companies, and (3) society and consumers. Besides, this approach can extract the main entity using a NER system.</p>
      </sec>
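      <p>The TF–IDF weighting that underlies the semantic annotation module can be illustrated with the following minimal Python sketch. It assumes the common variant in which TF is the relative term frequency and IDF is log(N / df); the function name tf_idf and the toy headlines are made up for this example. The extension that propagates weights to implicit concepts through ontology distances (TF–IDF–e [8]) is not reproduced here, as its formula is not given in this paper.</p>
      <preformat>
```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for every term of every tokenized document.

    Documents are assumed to be non-empty lists of tokens.
    """
    n_docs = len(documents)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # TF is the relative frequency; IDF is log(N / df).
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy headlines, already tokenized and lowercased (illustrative data only).
docs = [
    ["el", "bce", "sube", "los", "tipos"],
    ["el", "precio", "del", "alquiler", "sube"],
    ["el", "precio", "de", "la", "electricidad", "baja"],
]
for scores in tf_idf(docs):
    # Show the three highest-weighted terms of each headline.
    print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```
      </preformat>
      <p>With this variant, a term that occurs in every document (such as "el" above) receives a weight of zero, whereas terms that occur in a single headline receive the highest weights.</p>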
    </sec>
    <sec id="sec-4">
      <title>4. Further work</title>
      <p>In this work we have described the FD platform for
monitoring economic and financial data on the Internet. This
platform relies on Semantic Web technologies and NLP
techniques for extracting, annotating and classifying
financial data from several data sources, including web
sites and social networks. The data is presented to the
end-users in a web platform that allows them to
configure a personalized dashboard with a set of configurable
KPIs.</p>
      <p>We are currently in the last stages of the development of the platform and we are preparing several case studies for its validation. Further work will focus on improving the explainability of the neural network models. In particular, we plan to create a module based on linguistic features [10] and to define KPIs that highlight the parts of the text that contributed the most to the sentiment predictions. Another idea for improving the platform is to add more data filters.</p>
      <p>Currently, we are working on incorporating information from video platforms such as YouTube. We will also focus on the definition of KPIs that cluster results per data source [11] and on increasing the number of filters. Finally, we will evaluate the feasibility of incorporating KPIs for the detection of fake news.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <sec id="sec-5-1">
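        <p>This work is part of the research projects AIInFunds (PDC2021-121112-I00) funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>G.</given-names> <surname>Hale</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Lopez</surname></string-name>, <article-title>Monitoring banking system connectedness with big data</article-title>, <source>Journal of Econometrics</source> <volume>212</volume> (<year>2019</year>) <fpage>203</fpage>–<lpage>220</lpage>. URL: https://www.sciencedirect.com/science/article/pii/S030440761930082X. doi:10.1016/j.jeconom.2019.04.027.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>H.</given-names> <surname>Pan</surname></string-name>, <article-title>Intelligent finance global monitoring and observatory: A new perspective for global macro beyond big data</article-title>, <source>in: 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS)</source>, <year>2019</year>, pp. <fpage>623</fpage>–<lpage>628</lpage>. doi:10.1109/ICPHYS.2019.8780156.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>L.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>He</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Xiao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Weng</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Cai</surname></string-name>, <article-title>Monitoring housing rental prices based on social media: An integrated approach of machine-learning algorithms and hedonic modeling to inform equitable housing policies</article-title>, <source>Land Use Policy</source> <volume>82</volume> (<year>2019</year>) <fpage>657</fpage>–<lpage>673</lpage>. URL: https://www.sciencedirect.com/science/article/pii/S0264837718316429. doi:10.1016/j.landusepol.2018.12.030.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>J. A.</given-names> <surname>García-Díaz</surname></string-name>, <string-name><given-names>Á.</given-names> <surname>Almela</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Alcaraz-Mármol</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <article-title>UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for natural language processing tasks</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>65</volume> (<year>2020</year>) <fpage>139</fpage>–<lpage>142</lpage>. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6292.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>A.</given-names> <surname>Gutiérrez-Fandiño</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Armengol-Estapé</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Pàmies</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Llop-Palao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Silveira-Ocampo</surname></string-name>, <string-name><given-names>C. P.</given-names> <surname>Carrino</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Armentano-Oller</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Rodriguez-Penagos</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gonzalez-Agirre</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Villegas</surname></string-name>, <article-title>MarIA: Spanish language models</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>68</volume> (<year>2022</year>) <fpage>39</fpage>–<lpage>60</lpage>. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>J.</given-names> <surname>Cañete</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Chaperon</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Fuentes</surname></string-name>, <string-name><given-names>J.-H.</given-names> <surname>Ho</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Kang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Pérez</surname></string-name>, <article-title>Spanish pre-trained bert model and evaluation data</article-title>, <source>in: PML4DC at ICLR 2020</source>, <year>2020</year>, pp. <fpage>1</fpage>–<lpage>10</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>J. A.</given-names> <surname>García-Díaz</surname></string-name>, <string-name><given-names>F.</given-names> <surname>García-Sánchez</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <article-title>Smart analysis of economics sentiment in spanish based on linguistic features and transformers</article-title>, <source>IEEE Access</source> <volume>11</volume> (<year>2023</year>) <fpage>14211</fpage>–<lpage>14224</lpage>. URL: https://doi.org/10.1109/ACCESS.2023.3244065. doi:10.1109/ACCESS.2023.3244065.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>M. Á.</given-names> <surname>Rodríguez-García</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <string-name><given-names>F.</given-names> <surname>García-Sánchez</surname></string-name>, <string-name><given-names>J. J.</given-names> <surname>Samper-Zapater</surname></string-name>, <article-title>Creating a semantically-enhanced cloud services environment through ontology evolution</article-title>, <source>Future Generation Computer Systems</source> <volume>32</volume> (<year>2014</year>) <fpage>295</fpage>–<lpage>306</lpage>. URL: https://www.sciencedirect.com/science/article/pii/S0167739X13001684. doi:10.1016/j.future.2013.08.003.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>J. A.</given-names> <surname>García-Díaz</surname></string-name>, <string-name><given-names>F.</given-names> <surname>García-Sánchez</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <article-title>Overview of FinancES 2023: Financial targeted sentiment analysis in spanish (to appear)</article-title>, <source>Procesamiento del Lenguaje Natural</source> (<year>2023</year>).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>J. A.</given-names> <surname>García-Díaz</surname></string-name>, <string-name><given-names>P. J.</given-names> <surname>Vivancos-Vicente</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Almela</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <article-title>UMUTextStats: A linguistic feature extraction tool for spanish</article-title>, <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>, <year>2022</year>, pp. <fpage>6035</fpage>–<lpage>6044</lpage>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>J. A.</given-names> <surname>García-Díaz</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Colomo-Palacios</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Valencia-García</surname></string-name>, <article-title>Psychographic traits identification based on political ideology: An author analysis study on spanish politicians’ tweets posted in 2020</article-title>, <source>Future Generation Computer Systems</source> <volume>130</volume> (<year>2022</year>) <fpage>59</fpage>–<lpage>74</lpage>. doi:10.1016/j.future.2021.12.011.</mixed-citation>
      </ref>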
    </ref-list>
  </back>
</article>