<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrating and Analysing Public Procurement Data through a Knowledge Graph: A Demonstration in a Nutshell</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmet Soylu</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Elvesæter</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Yedro Martínez</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matej Kovacic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matej Posinkovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Makgill</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Taggart</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Simperl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Till C. Lech</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jozef Stefan Institute</institution>
          ,
          <addr-line>Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>King's College London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>OpenCorporates Ltd</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>OpenOpps Ltd</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>SINTEF AS</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Universidad Politecnica de Madrid</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a demonstrator of a knowledge-graph-based approach for integrating and reconciling cross-border and cross-language procurement and company data from distributed data sources. The demonstrator also includes analysis of the resulting knowledge graph, exemplified in anomaly detection and cross-lingual search.</p>
      </abstract>
      <kwd-group>
        <kwd>Public procurement</kwd>
        <kwd>Knowledge graph</kwd>
        <kwd>Linked data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The availability of high-quality, open, and linked procurement data presents
an opportunity to enhance public procurement processes. In this respect,
the European Commission put forward several directives (e.g., Directive
2003/98/EC and Directive 2014/24/EU), which led to the emergence of national
and international public procurement portals. However, there is no common
agreement across the European Union (EU) on the data formats for exposing
such data sources or on the data models for representing such data, which has led to
a highly heterogeneous technical landscape.</p>
      <p>
        To deal with this technical heterogeneity and to connect
disparate data sources currently created and maintained in silos, we developed a
platform consisting of a set of modular REST APIs and ontologies to publish,
curate, integrate, analyse, and visualise an EU-wide, cross-border, and cross-lingual
procurement knowledge graph (KG). This paper presents a demonstrator of the
knowledge-graph-based platform and of end-user tools for integrating and
reconciling procurement and company data from distributed data sources, including
analytics tools exemplified in anomaly detection and cross-lingual search [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>[Fig. 1. Data ingestion process and platform architecture. Components: OpenOpps (OO) procurement database and OO API (1. download procurement data); OpenCorporates (OC) company database, OC API, and Reconciliation API (2. reconcile supplier data); ETL and JSON-to-RDF KG ingestion; triple store (linked to the distributed data sets via owl:sameAs) and document store (linked via rdfs:seeAlso); SPARQL API, Core API, and Cross-lingual Search API behind an API Gateway.]</p>
      <p>
Procurement and company data underlying the KG is provided by two main data
providers: OpenOpps<sup>1</sup> for procurement data (e.g., tenders and contracts) and
OpenCorporates<sup>2</sup> for supplier data (i.e., companies). OpenOpps has gathered
over three million tender documents from more than 685 publishers through
Web scraping and open APIs, while OpenCorporates currently holds
140 million entities collected from national registers. We integrated these two
high-quality data sets according to an ontology network to form a knowledge
graph. The ontology network includes an ontology for representing procurement
data based on the Open Contracting Data Standard (OCDS), namely the OCDS
ontology<sup>3</sup> [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and an ontology for representing company data, namely the
euBusinessGraph ontology<sup>4</sup> [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
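      <p>To make the idea of translating provider data into RDF according to the ontology network more concrete, the following minimal Python sketch maps one JSON tender record to N-Triples using only the standard library. It is an illustration under stated assumptions, not the platform's ETL component: the input field names (<monospace>id</monospace>, <monospace>title</monospace>, <monospace>supplier_ids</monospace>), the namespaces under http://example.org/, and the class and property names (<monospace>Tender</monospace>, <monospace>title</monospace>, <monospace>hasSupplier</monospace>) are hypothetical placeholders rather than terms of the published OCDS ontology.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json

# Hypothetical namespaces: placeholders, NOT the published OCDS ontology IRIs.
OCDS = "http://example.org/ocds#"
DATA = "http://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialise a plain string literal, escaping backslashes, quotes, and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def tender_to_ntriples(record: dict) -> list[str]:
    """Map one JSON tender record (hypothetical field names) to N-Triples lines."""
    tender = DATA + "tender/" + record["id"]
    triples = [
        (iri(tender), iri(RDF_TYPE), iri(OCDS + "Tender")),
        (iri(tender), iri(OCDS + "title"), literal(record["title"])),
    ]
    # Link the tender to each supplier organisation (hypothetical property).
    for supplier_id in record.get("supplier_ids", []):
        supplier = DATA + "organisation/" + supplier_id
        triples.append((iri(tender), iri(OCDS + "hasSupplier"), iri(supplier)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


raw = '{"id": "T-001", "title": "Road maintenance", "supplier_ids": ["C-42"]}'
for line in tender_to_ntriples(json.loads(raw)):
    print(line)
```
]]></preformat>
      <p>A production pipeline would more likely rely on an RDF library and the actual ontology IRIs; plain string serialisation is used here only to keep the sketch dependency-free.</p>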
      <p>The data ingestion process (see Fig. 1) comprises several steps that use the data
APIs of both providers: data curation, matching suppliers that appear in
tender data against company data (i.e., reconciliation), and translating the data sets
into the underlying graph data representation (i.e., RDF) in line with the
ontology network and linked data principles. The platform (see Fig. 1) employs a
triple store for the generated RDF data, linked to the original data sources
(using owl:sameAs), and a document store for the documents associated with
the public procurement data (using rdfs:seeAlso).</p>
      <sec id="sec-1-1">
        <title>1 https://openopps.com</title>
      </sec>
      <sec id="sec-1-2">
        <title>2 https://opencorporates.com</title>
      </sec>
      <sec id="sec-1-3">
        <title>3 https://github.com/TBFY/ocds-ontology/tree/master/model</title>
      </sec>
      <sec id="sec-1-4">
        <title>4 https://github.com/euBusinessGraph/eubg-data</title>
        <p>The data is made available
through a SPARQL API, a core REST-based API (i.e., the KG API), a cross-lingual
search API, and an API gateway providing a single entry point to these APIs.
The current release of the KG covers data from January 2019
onwards, and new data is onboarded daily. As of August 2020, the KG
consists of more than 126 million triples and contains information about 1.31
million tenders, 1.54 million awards, and more than 99 thousand companies<sup>5</sup>.
The source data collected from the data providers (in JSON) and the KG data
(in RDF) are openly available on Zenodo<sup>7</sup> under the Open Database License (ODbL)<sup>6</sup>.
An online catalogue<sup>8</sup> provides access to the data, schemas,
core APIs, tools, and added-value services (see Fig. 2).</p>
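        <p>The reconciliation step matches supplier names found in tender data against company records. The sketch below is a deliberately simple stand-in: it assumes a normalised-name comparison with Python's <monospace>difflib.SequenceMatcher</monospace> and an arbitrary 0.85 threshold, and it is not the reconciliation API used in the platform or the OpenCorporates matching logic. The company IRIs and names, the legal-form token list, and the choice to record a match as an owl:sameAs triple are assumptions for illustration only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import difflib
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Legal-form tokens dropped before comparison (illustrative, incomplete list;
# "d" and "o" cover the tokens of the Slovenian "d.o.o.").
LEGAL_FORMS = {"ltd", "limited", "plc", "as", "gmbh", "sa", "sl", "d", "o"}


def normalise(name: str) -> str:
    """Lower-case, strip punctuation, and drop common legal-form tokens."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def reconcile(supplier_name, companies, threshold=0.85):
    """Return (company_iri, score) for the best match at or above threshold, else None.

    `companies` maps a company IRI to its registered name (hypothetical data).
    """
    target = normalise(supplier_name)
    best = None
    for company_iri, registered_name in companies.items():
        score = difflib.SequenceMatcher(None, target, normalise(registered_name)).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (company_iri, score)
    return best


# Hypothetical company records and KG organisation IRI.
companies = {
    "http://example.org/company/gb/001": "Acme Road Services Limited",
    "http://example.org/company/gb/002": "Acme Holdings PLC",
}
match = reconcile("ACME Road Services Ltd.", companies)
if match is not None:
    tender_supplier = "http://example.org/resource/organisation/C-42"
    print(f"<{tender_supplier}> <{OWL_SAME_AS}> <{match[0]}> .")
```
]]></preformat>
        <p>String similarity alone is fragile for company names (distinct firms can have near-identical names), so identifiers such as registration numbers, where the tender data contains them, are a more reliable matching key.</p>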
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Data Analysis</title>
      <p>We implemented a number of analysis techniques on the KG: anomaly detection,
which uses ML techniques to identify patterns and anomalies, such as fraudulent
behaviour or monopolies, in procurement processes and networks across independently
produced data sets; and cross-lingual document search, for finding
documents that are similar to a given one regardless of its language.</p>
      <p>Public procurement is particularly susceptible to corruption, which can impede
economic development, create inefficiencies, and reduce competitiveness. At the
same time, manually analysing a large volume of procurement cases is not feasible.
Therefore, firstly, we applied several ML techniques (supervised, unsupervised,
and statistical) to the Slovenian public procurement data in the KG to
identify patterns and anomalies.</p>
      <sec id="sec-2-1">
        <title>5 http://data.tbfy.eu</title>
      </sec>
      <sec id="sec-2-2">
        <title>6 https://opendatacommons.org/licenses/odbl</title>
      </sec>
      <sec id="sec-2-3">
        <title>7 https://github.com/TBFY/data-sources</title>
      </sec>
      <sec id="sec-2-4">
        <title>8 https://tbfy.github.io/platform/</title>
        <p>We implemented a system and made it available
online<sup>9</sup>. The system can process tens of millions of records
and detect a broad class of anomalies. Fig. 3 depicts the supervised
analysis approach implemented in our platform, which is based on a decision tree. Users
select parameters of their choice (e.g., buyer size and bidder municipality)
and explore the various parameters contributing to the success of public tenders.</p>
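        <p>A decision tree such as the one in Fig. 3 is grown by repeatedly choosing the split that best separates successful from unsuccessful tenders. The minimal sketch below shows one such step: a single split chosen by weighted Gini impurity over categorical features. The feature names follow the examples in the text (buyer size, bidder municipality), but the records, their values, and the success label are invented for illustration, and this is not the model implemented in the platform.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, features, target):
    """Return (feature, value, impurity) for the equality test with the lowest
    weighted Gini impurity, scanning features and sorted values in order."""
    best = None
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            left = [row[target] for row in rows if row[feature] == value]
            right = [row[target] for row in rows if row[feature] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            # Keep the first split found on ties (strict comparison).
            if best is None or best[2] > score:
                best = (feature, value, score)
    return best


# Hypothetical tender records: feature names follow the text, values are invented.
tenders = [
    {"buyer_size": "large", "bidder_municipality": "same", "successful": True},
    {"buyer_size": "large", "bidder_municipality": "other", "successful": True},
    {"buyer_size": "small", "bidder_municipality": "same", "successful": True},
    {"buyer_size": "small", "bidder_municipality": "other", "successful": False},
    {"buyer_size": "small", "bidder_municipality": "other", "successful": False},
    {"buyer_size": "large", "bidder_municipality": "other", "successful": True},
]

print(best_split(tenders, ["buyer_size", "bidder_municipality"], "successful"))
```
]]></preformat>
        <p>On these records the buyer_size split yields a weighted impurity of 2/9, lower than the 1/3 obtained from bidder_municipality, so it would be chosen first; a full tree applies the same step recursively to each branch.</p>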
        <p>
          Procurement processes create not only structured data but also a
constant stream of additional documents. These are commonly published in the
official language of the corresponding public administration. Only some of these
are multilingual, but the documents in the local language are typically longer.
Therefore, secondly, we developed an added-value service<sup>10</sup> that can
find documents similar to a given one regardless of the language
in which it is made available [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We also created a Jupyter notebook with
some representative examples to facilitate its use<sup>11</sup>. This service (see Fig. 4)
is based on unsupervised probabilistic topic models, using cross-lingual
labels from sets of cognitive synonyms (synsets) to establish relations between
language-specific topics.
        </p>
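        <p>The key idea behind the search service is that topics learned separately per language become comparable once they are labelled with shared synsets. The following sketch illustrates only that idea under simplifying assumptions: each topic's weight is spread evenly over its synset labels, and documents are compared by cosine similarity in the resulting synset space. The topic-to-synset table, the synset identifiers, and the weighting scheme are hypothetical, and the sketch is not the concept-hierarchy method described in [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict

# Hypothetical labels: each language-specific topic is tagged with synset IDs
# (made-up identifiers standing in for WordNet-style synsets).
TOPIC_SYNSETS = {
    "en": {0: {"syn:road", "syn:construction"}, 1: {"syn:hospital", "syn:health"}},
    "es": {0: {"syn:health", "syn:medicine"}, 1: {"syn:road", "syn:maintenance"}},
}


def to_synset_vector(lang, topic_weights):
    """Project a per-language topic distribution onto the shared synset space
    by spreading each topic's weight evenly over its synset labels."""
    vector = defaultdict(float)
    for topic, weight in topic_weights.items():
        synsets = TOPIC_SYNSETS[lang][topic]
        for synset in synsets:
            vector[synset] += weight / len(synsets)
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


english_doc = to_synset_vector("en", {0: 0.9, 1: 0.1})     # mostly about roads
spanish_doc = to_synset_vector("es", {0: 0.1, 1: 0.9})     # mostly about roads
spanish_health = to_synset_vector("es", {0: 0.9, 1: 0.1})  # mostly about health
print(round(cosine(english_doc, spanish_doc), 3))
print(round(cosine(english_doc, spanish_health), 3))
```
]]></preformat>
        <p>With these weights, the English and Spanish documents that both concentrate on road-related topics score 0.5, while the English road document and the Spanish health document score about 0.11, even though no topic is shared directly between the two language models.</p>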
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this paper, we demonstrated the use of Semantic Web and Linked Data
technologies and principles to integrate open procurement and company data
sets and to unlock their value through advanced analytics. The KG enabled easier
and more advanced analytics, which would otherwise not have been possible. However, we also faced
a large number of data quality issues, such as missing, duplicate, and erroneous
data, even though there are mandates in place for buyers to provide correct data.</p>
      <sec id="sec-3-1">
        <title>9 http://tbfy.ijs.si 10 http://tbfy.librairy.linkeddata.es/search-api 11 http://bit.ly/tbfy-search-demo</title>
        <p>Acknowledgements. The work reported in this paper is partly funded by the EC
H2020 projects TheyBuyForYou (grant 780247) and euBusinessGraph (grant 732003).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Badenes-Olmedo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Scalable Cross-lingual Similarity through Language-specific Concept Hierarchies</article-title>
          .
          <source>In: Proc. of K-CAP 2019</source>
          . pp.
          <fpage>147</fpage>
          –
          <lpage>153</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>The euBusinessGraph Ontology: a Lightweight Ontology for Harmonizing Basic Company Information</article-title>
          .
          <source>Semantic Web (under review)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Soylu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Towards an Ontology for Public Procurement Based on the Open Contracting Data Standard</article-title>
          .
          <source>In: Proc. of I3E 2019</source>
          . pp.
          <fpage>230</fpage>
          –
          <lpage>237</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Soylu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Enhancing Public Procurement in the European Union through Constructing and Exploiting an Integrated Knowledge Graph</article-title>
          .
          <source>In: Proc. of ISWC 2020</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>