<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Contract Hub: Towards a More Accessible Public Procurement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Virginia Ramón-Ferrer</string-name>
          <email>virginia.ramon@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Álvaro Fontecha</string-name>
          <email>alvaro.fontecha@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <email>carlos.badenes@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <email>oscar.corcho@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>List of languages: German, Dutch, French, Greek, Hungarian, Italian, Maltese, Portuguese, Romanian, Slovak, Spanish, Catalan, Galician, Basque and Swedish</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, Comunidad de Madrid</institution>
          ,
          <addr-line>28660</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>9</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Public procurement is one of the pillars of the proper functioning of a country, given its direct impact on the economy. The European Union places great importance on the transparency and accessibility of public procurement data, and gives public access to a significant share of public procurement contracts. However, these data are often difficult and time-consuming for many users to consult and understand. In this paper, we present EU Contract Hub, a platform for the efficient exploration and analysis of large amounts of data on public procurement in Europe. The tool incorporates an innovative ingestion pipeline that unifies and enriches the information of more than a million public contracts, extracted from various formats, into a single common structure. This unification enables the efficient analysis of large amounts of public procurement data, especially data related to healthcare and its impact during the COVID-19 period, within the framework of the European project 'Procure: Public procurement assessment in the healthcare sector'. EU Contract Hub can be accessed at https://procure.linkeddata.es/.</p>
      </abstract>
      <kwd-group>
        <kwd>Public procurement consultation</kwd>
        <kwd>Public contract management</kwd>
        <kwd>Digital statistical tool</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Every day, public institutions from all European Union member states purchase goods and services
of various kinds. This process, known as public procurement, has to be highly rigorous and transparent,
given its impact on the economy of each country and of the European Union as a whole. Regarding
transparency, the European Union has various initiatives to make procurement data publicly available.
The Supplement to the Official Journal of the EU and its online version,
Tenders Electronic Daily (TED) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], for example, form a platform where public procurement contracts of
high economic value across the EU are published to ensure transparency and open competition.
      </p>
      <p>
        One clear example of the relevance of procurement is the work carried out during the COVID-19 crisis,
where the purchase of medical goods and services played a major role in combating the pandemic.
PROCURE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is an EU-funded project that aims to assess the impact of the COVID-19 pandemic on
health procurement organisations and practices in 13 participating EU countries. In the context of
this project, we present EU Contract Hub, an open platform that facilitates the exploration and analysis
of procurement data, making this information accessible to interested institutions and the general
public. Leveraging large amounts of heterogeneous data can be challenging, so we defined an ingestion
framework that unifies contracts in different formats into a single common structure to facilitate the joint
consultation of these documents. With this, we aim to simplify the public procurement consultation
process by adapting the data to a more user-friendly representation for non-experts and
by using large language models to further enrich this information. Homogenising both the data format
and the language makes it easier to cross-reference information across different data sources. The
platform can be openly accessed at https://procure.linkeddata.es/.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. System implementation</title>
      <p>
        EU Contract Hub (source code: https://github.com/procure-project/EU-Contract-Hub/) is a
document-oriented platform that allows us to create data visualisations and integrate them into
interactive dashboards. At the time of writing (17 September 2024; the tool is continuously updated with
new information and visualisations, so this may change), the hub stores and enables the efficient
consultation of over 1.2 million public procurement contracts extracted from Tenders Electronic Daily
(TED) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], in addition to socioeconomic data about the consortium countries extracted from OECD [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
and EUROSTAT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These sources were chosen to cover a pre-defined series of questions that the PROCURE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] members
needed to be able to answer using our tool. The questions concern the public procurement environment in
the healthcare sector and its key players (for example, the total value of healthcare public contracts
in each member country) and the organisation of the national health system (for example, the overall
country population). As shown in figure 2, data is sourced and unified from
various standardised forms completed by contracting authorities, which results in a wide range of
quality and completeness levels across documents. The same document can exist in different
versions with various formats, specifically eForms, XML and CSV, and each format may contain
information about the contract that is not present in the others. We therefore implemented an ingestion
framework that processes each formatted document and unifies the versions, gathering as much
information as possible about a specific contract in a single clean, processed structure. When
conflicts exist, meaning that the same attribute is available in several formats, we generally prioritise
data extracted from eForms and XML, except for some specific attributes, such as the contract value,
where we prioritise CSV information. We also collaborated with procurement experts to identify
information that is not explicitly present in the contract data but can be inferred from it through a set
of rules, such as the legal mechanisms used in each contract. This inference allows us to provide more
extensive information of interest to both the project consortium and the general users of our
tool. Furthermore, although TED offers search tools for consulting contracts individually,
we also aimed to identify overall trends, as required by the questions
defined in the project matrix, and to create user-friendly visualisations
for non-experts. EU Contract Hub provides these functionalities to support this kind of analysis.
      </p>
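      <p>The conflict-resolution rule described above can be sketched as follows. This is a minimal illustration and not the project's actual ingestion code: the source labels, the attribute names, the sample records and the fallback order used when CSV lacks the contract value are all assumptions made for the example.</p>
      <preformat>
```python
from typing import Any, Dict

# Default source order, as described in the text: eForms and XML before CSV.
DEFAULT_PRIORITY = ("eforms", "xml", "csv")

# Attributes for which CSV is preferred. The text names the contract value;
# the fallback order after CSV is an assumption of this sketch.
CSV_FIRST_ATTRIBUTES = {"value"}
CSV_FIRST_PRIORITY = ("csv", "eforms", "xml")


def merge_contract(versions: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the per-format versions of one contract into a single record."""
    attributes = set()
    for record in versions.values():
        attributes.update(record)

    merged: Dict[str, Any] = {}
    for attribute in sorted(attributes):
        if attribute in CSV_FIRST_ATTRIBUTES:
            order = CSV_FIRST_PRIORITY
        else:
            order = DEFAULT_PRIORITY
        # Take the first non-missing value in priority order.
        for source in order:
            candidate = versions.get(source, {}).get(attribute)
            if candidate is not None:
                merged[attribute] = candidate
                break
    return merged


# Hypothetical versions of the same contract in the three formats.
versions = {
    "eforms": {"title": "Surgical masks", "value": None, "country": "ES"},
    "xml": {"title": "Masks", "value": 120000, "cpv": "33140000"},
    "csv": {"value": 125000, "country": "ES"},
}
merged = merge_contract(versions)
print(merged)
```
      </preformat>
      <p>In this sketch an attribute that is missing in every version is simply left out of the merged record, so downstream code can distinguish unknown values from empty ones.</p>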
      <p>
        The stored contracts contain information related to contract awards and tenders, including the
type and purpose of the contract, the value, and the winning bidders. Given the multilingualism
present in the EU, most contracts are in the native language of the contracting country. This makes it
difficult for a user who wants to consult contracts from all over the EU to grasp a large
part of the relevant information in the contract notices. To tackle this problem, we have
machine-translated the non-English contract fields into English to widen access to the
information. Given that we could not supervise the translation of over a million contracts and needed a
model that could automatically translate at least 15 different languages (German, Dutch, French, Greek,
Hungarian, Italian, Maltese, Portuguese, Romanian, Slovak, Spanish, Catalan, Galician, Basque and
Swedish) into English, we decided to use
the Deep-Translator library [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with the Google Translate [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] model, which we previously evaluated
on two corpora extracted from OPUS [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: the Opus-100 Corpus [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], with a sample size of 2000, and the
OPUS Europarl Corpus [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], with a sample size of 4000. This evaluation yielded COMET
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] scores of 0.81 for the Opus-100 Corpus sample and 0.86 for the OPUS Europarl Corpus sample over the
16 non-English languages used in the 13 member states of the ProCure Project. This model was also
chosen because it can automatically detect the original language and translate it into English,
removing the need to detect the language beforehand. As of 17 September 2024, about 50% of the contracts in the
platform have translated attributes available; the translation of the remaining contracts' attributes is
under development, and we anticipate that it will be available in the near future.
      </p>
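      <p>As an illustration of the translation step, the sketch below uses the GoogleTranslator class of the Deep-Translator library with automatic source-language detection. The field names, the "_en" suffix and the helper functions are assumptions made for this example and do not necessarily reflect how the platform stores translated attributes. The translator is passed in as a function so that the example can run offline with a stand-in.</p>
      <preformat>
```python
from typing import Callable, Dict, Iterable


def make_google_translator() -> Callable[[str], str]:
    """Return a translate function backed by deep-translator (needs network access)."""
    # Imported lazily because deep-translator is a third-party package.
    from deep_translator import GoogleTranslator

    # source="auto" lets the service detect the original language.
    translator = GoogleTranslator(source="auto", target="en")
    return translator.translate


def translate_fields(contract: Dict[str, str], fields: Iterable[str],
                     translate: Callable[[str], str]) -> Dict[str, str]:
    """Return a copy of the contract with an extra '_en' attribute per translated field."""
    enriched = dict(contract)
    for field in fields:
        text = contract.get(field)
        if text:  # skip missing or empty attributes
            enriched[field + "_en"] = translate(text)
    return enriched


# Offline demonstration with a stand-in translator (hypothetical contract data).
contract = {"Title": "Suministro de mascarillas", "Description": ""}
demo = translate_fields(contract, ["Title", "Description"], lambda text: "[en] " + text)
print(demo)
```
      </preformat>
      <p>To call the real service, the stand-in can be replaced with make_google_translator(), assuming the deep-translator package is installed and network access is available.</p>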
      <p>This platform has been developed following the needs of the end users, especially the
consortium of the ProCure project. We also continuously receive feedback from
consortium members and apply the necessary changes to further improve the tool. Development has been
guided by two main requirements: contracts must be consultable
individually, by applying filters based on the existing attributes of each contract, and globally, through
data visualisations that enable the user to analyse the trends present in them. To
this end, the platform is divided into two main consultation interfaces, namely 1) Dashboard and 2) Discover.</p>
      <p>1) Dashboard. This interface enables graphical visualisation of the data to extract statistics and
trends. As shown in figure 3a, the visualisation has three main areas. The first
highlighted area is where data filters are entered, if needed. These filters can be entered
by hand, using Dashboards Query Language (DQL), or through the “Add Filter” option, which opens a filter
dialogue where the user defines the filter by selecting options. The second area
corresponds to the general information display, which shows the number of contracts taken into account for
the visualisations below, together with a small window where the data can be filtered directly
by Common Procurement Vocabulary (CPV) values and by the list of questions defined by the consortium.
Finally, the third highlighted area displays the different graphics and tables needed to answer the
consortium questions according to the filters entered. The visualisations on the dashboard are currently
designed to support natural language questions defined by the ProCure project consortium in the
context of healthcare procurement, but they can be modified and other visualisations can be created if
needed.</p>
      <p>Figure 3: (a) Dashboard; (b) Discover.</p>
      <p>2) Discover. This interface enables the consultation of individual contracts through different
attributes, such as the presence of the word “mask” in one of the attributes or a
specific value of a specific attribute. This search interface lets us consult over a million
contracts individually in a matter of seconds. As shown in figure 3b, the visualisation has three main
areas. The first is the filter panel, where we enter the requirements of our search by
attribute, such as a specific contract country or a range of CPV values. As
with the dashboard, filters can be entered either by hand, using Dashboards Query Language (DQL),
or through the “Add Filter” option. The second highlighted area is the attribute selection, where the
user chooses which contract attributes to display. The third
and last highlighted area displays the search results with the previously selected attributes and
the filters, if defined, applied. The contracts displayed there can then be consulted individually by
clicking on the magnifying glass icon next to the desired contract, which opens a side window
showing the full document details.</p>
    </sec>
    <sec id="sec-3">
      <title>3. EU Contract Hub</title>
      <p>During the demo we will demonstrate two use cases: a) the extraction of statistics and visualisations
related to healthcare contracts from Spain and France with a value equal to or higher than €100,000, and b)
the search for information on specific contracts, namely the procurement route of contracts issued by
the Spanish institution “Universidad Politécnica de Madrid” for protective gear.</p>
      <p>Case a) For this search, we go to the “Dashboard” interface of the platform (on the side menu of
the platform, OpenSearch dashboards → Dashboards). The requirements of our search
are contracts from Spain and France with a value equal to or higher than €100,000, so we define a DQL
query stating these filters: “(Country: ES or Country: FR) and Value &gt;= 100000”. The graphics and tables
displayed in the dashboard adapt to the filters defined. For example, as shown
in figure 4, we can see the proportion of healthcare-related contracts against those that
are not for the chosen countries (figure 4a). We can also find the number
of healthcare-related contracts against the total number of contracts, and their total
monetary value (figure 4b), or the distribution of the healthcare contracts by CPV value (figure 4c).</p>
      <p>Figure 4: (a) Healthcare Contracts by Country; (b) Healthcare Contracts Table; (c) Healthcare contracts by CPV.</p>
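      <p>For readers who prefer to query the index programmatically, the DQL filter of case a) could be expressed as an OpenSearch query DSL body along the following lines. This is only a sketch: the field names Country and Value are taken from the query in the text, while the assumption that Country is indexed as an exact-match (keyword) field, and the helper names, are ours. The small matches function applies the same condition to in-memory records to make the semantics explicit.</p>
      <preformat>
```python
import json


def country_value_filter(countries, min_value):
    """Build a query DSL body: Country in the given list and Value at least min_value."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumes Country is a keyword field, so terms matches exactly.
                    {"terms": {"Country": list(countries)}},
                    {"range": {"Value": {"gte": min_value}}},
                ]
            }
        }
    }


def matches(record, countries, min_value):
    """Apply the same condition to a plain dictionary (for local checks)."""
    value = record.get("Value")
    return (record.get("Country") in countries
            and value is not None
            and value >= min_value)


body = country_value_filter(["ES", "FR"], 100000)
print(json.dumps(body, indent=2))
```
      </preformat>
      <p>The body can be sent to the search endpoint of the index with any OpenSearch client; the index name and connection details depend on the deployment and are not covered here.</p>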
      <p>Case b) For this search, we go to the “Discover” interface of the platform (on the side
menu of the platform, OpenSearch dashboards → Discover). The requirements of our
search are that the Contracting Authority is the Universidad Politécnica de Madrid and that the product
requested is protective gear, so we define a DQL query stating these filters: “Contracting Authority Name:
"Universidad Politécnica de Madrid" and CPV Description: "Protective gear"”. Since we want
the procurement route of these contracts, in the attribute selection area we select “Procurement Route”,
in addition to “_id”, “Title” and “CPV”, to have a little more context on the resulting contracts. As
shown in figure 5, this search returns two contracts, both of which state that
they follow a “Direct Procurement” route.</p>
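      <p>The search of case b) can be sketched in the same way. The quoted values in the DQL query are phrase matches, which correspond to match_phrase clauses in the query DSL, and the attribute selection corresponds to the _source option. The field names come from the text; the helper name is hypothetical, and the document identifier is returned with every hit, so it does not need to be listed.</p>
      <preformat>
```python
def discover_query(authority, cpv_description, columns):
    """Build a query DSL body with two phrase filters and a selection of attributes."""
    return {
        # Only these attributes are returned for each matching contract.
        "_source": list(columns),
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"Contracting Authority Name": authority}},
                    {"match_phrase": {"CPV Description": cpv_description}},
                ]
            }
        },
    }


query = discover_query(
    "Universidad Politécnica de Madrid",
    "Protective gear",
    ["Procurement Route", "Title", "CPV"],
)
print(query["_source"])
```
      </preformat>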
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and future work</title>
      <p>In this paper we presented EU Contract Hub, an open platform for the efficient exploration, analysis,
unification, and enrichment of over a million public procurement contracts. We also presented
the ingestion framework created to unify various contract formats into a single structure
that enables the efficient consultation of both individual and global information regarding public
procurement.</p>
      <p>As future work, we will continue to study the feedback received from users and implement the
changes needed to improve our tool, and we will finalise the contract translations. We are
also currently developing the partial and/or total verbalisation of contracts, which will enable
us to give users a textual summary of each contract in addition to the table-formatted data currently
available. Our intention is to create a Retrieval Augmented Generation system over all this
data that enables contract consultation in natural language.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the European Union under the project ProCure (Grant Agreement number
101128437).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>Supplement to the Official Journal of the European Union</article-title>
          , https://ted.europa.eu/TED/main/HomePage.do,
          <year>2024</year>
          . Accessed: 2024-09-16.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Procure: Public procurement assessment in the healthcare sector</article-title>
          , https://www.projectprocure.eu/,
          <year>2024</year>
          . Accessed: 2024-09-16.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] Organisation for Economic Co-operation and Development,
          <source>OECD Statistics Database</source>
          ,
          <year>2024</year>
          . URL: https://stats.oecd.org. Accessed: 2024-09-13.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Eurostat,
          <source>Eurostat Database</source>
          ,
          <year>2024</year>
          . URL: https://ec.europa.eu/eurostat. Accessed: 2024-09-13.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Baccouri</surname>
          </string-name>
          , Deep-translator, https://github.com/nidhaloff/deep-translator,
          <year>2023</year>
          . Accessed: 2024-09-16.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Google</surname>
          </string-name>
          , Google translate, https://translate.google.com,
          <year>2024</year>
          . Accessed: 2024-09-16.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          ,
          <article-title>Parallel data, tools and interfaces in OPUS</article-title>
          , in: N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.),
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          , European Language Resources Association (ELRA), Istanbul, Turkey,
          <year>2012</year>
          , pp.
          <fpage>2214</fpage>
          -
          <lpage>2218</lpage>
          . URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Titov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sennrich</surname>
          </string-name>
          ,
          <article-title>Improving massively multilingual neural machine translation and zero-shot translation</article-title>
          , in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>1628</fpage>
          -
          <lpage>1639</lpage>
          . URL: https://aclanthology.org/2020.acl-main.148. doi:10.18653/v1/2020.acl-main.148.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <article-title>Europarl: A parallel corpus for statistical machine translation</article-title>
          , in:
          <source>Proceedings of Machine Translation Summit X: Papers</source>
          , Phuket, Thailand,
          <year>2005</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . URL: https://aclanthology.org/2005.mtsummit-papers.11.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Farinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <article-title>COMET: A neural framework for MT evaluation</article-title>
          , in:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>2685</fpage>
          -
          <lpage>2702</lpage>
          . URL: https://www.aclweb.org/anthology/2020.emnlp-main.213.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>