<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Overview of the TBFY Knowledge Graph for Public Procurement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmet Soylu</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Elvesæter</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philip Turk</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Simperl</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Makgill</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Taggart</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marko Grobelnik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Till C. Lech</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jožef Stefan Institute</institution>
          ,
          <addr-line>Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OpenCorporates Ltd</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>OpenOpps Ltd</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>SINTEF Digital</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Southampton</institution>
          ,
          <addr-line>Southampton</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A growing amount of public procurement data is being made available in the EU to improve the effectiveness, efficiency, transparency, and accountability of government spending. However, this data is highly heterogeneous because common data formats and models are lacking. To address this, we developed an ontology network for representing and linking tender and company data, and ingested relevant data from two prominent data providers into a knowledge graph called TBFY. In this poster paper, we present an overview of our knowledge graph.</p>
      </abstract>
      <kwd-group>
        <kwd>Public procurement</kwd>
        <kwd>Knowledge graph</kwd>
        <kwd>Ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the EU, public authorities spend around 14% of GDP on the purchase of
services, works, and supplies every year. Therefore, a growing amount of public
procurement data is being made available in the EU through public portals to
improve the effectiveness, efficiency, transparency, and accountability
of government spending. However, this data is highly heterogeneous because
common data formats and models for exposing it are lacking.</p>
      <p>
        There are various standardisation initiatives for electronic procurement, such
as the Open Contracting Data Standard (OCDS) and TED eSenders. However,
these are mostly document-oriented, aim primarily at interoperability, and
provide no standardised practices for referring to third parties, companies participating
in the process, etc. This again generates a lot of heterogeneity. The Semantic Web
approach has been proposed as a response [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example, several ontologies have
been developed, such as the PPROC ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for describing public processes and
contracts, the LOTED2 ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for public procurement notices, the PCO ontology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
for contracts in the public domain, and the MOLDEAS ontology [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for announcements
about public tenders. Each of these was developed with different concerns in
mind (legal, process-oriented, etc.), and none has seen significant adoption so far.
      </p>
      <p>To this end, we developed an ontology network for representing and linking
tender and company data and ingested relevant data from two prominent data
providers into a knowledge graph, called TBFY. In this poster paper, we present
an overview of our knowledge graph for public procurement.</p>
    </sec>
    <sec id="sec-2">
      <title>Knowledge Graph</title>
      <p>We integrated two datasets according to an ontology network: tender data
provided by OpenOpps<sup>10</sup> in the OCDS format and company data provided by
OpenCorporates<sup>11</sup>. OpenOpps has gathered over 2M tender documents from
more than 300 publishers through Web scraping and open APIs, while
OpenCorporates currently has 140M entities collected from national registers.</p>
      <sec id="sec-2-1">
        <title>Ontology Network</title>
        <p>We currently use two main ontologies. The first is an ontology for tender data
(see Figure 1), which we developed from the OCDS data model<sup>12</sup>.</p>
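        <p>Second, we reused the euBG ontology for company data<sup>13</sup>. Both ontologies
reuse other ontologies and vocabularies (FOAF, Dublin Core, etc.).
          <fn id="fn10"><label>10</label><p>https://openopps.com</p></fn>
          <fn id="fn11"><label>11</label><p>https://opencorporates.com</p></fn>
          <fn id="fn12"><label>12</label><p>https://github.com/TBFY/ocds-ontology</p></fn>
          <fn id="fn13"><label>13</label><p>https://github.com/euBusinessGraph/eubg-data</p></fn>
        </p>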
      </sec>
      <sec id="sec-2-2">
        <title>Data Ingestion</title>
        <p>The data ingestion process consists of several steps that use the data APIs of both
providers (see Figure 2). Initially, tender data is extracted from OpenOpps
for a given period of time and preprocessed, primarily to handle null values.
Suppliers appearing in the tender data are then matched against company data provided
by OpenCorporates using the OpenCorporates reconciliation service. The
matched company data is extracted, and the supplier data is annotated with
the corresponding company identifiers.</p>
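        <p>The preprocessing, matching, and annotation steps can be sketched as follows. This is a minimal illustration in Python, not the project's implementation. The record fields (name, jurisdiction, company_id), the sample records, and the reconcile function are assumptions made for the example; in particular, reconcile stands in for the OpenCorporates reconciliation service with a naive exact comparison of normalised names and does not call any real API.</p>
        <preformat>
```python
from typing import Optional

# Hypothetical company records standing in for OpenCorporates data.
COMPANIES = [
    {"company_id": "no/123456789", "name": "Example Consulting AS", "jurisdiction": "no"},
    {"company_id": "gb/01234567", "name": "Sample Services Ltd", "jurisdiction": "gb"},
]


def normalise(name: Optional[str]) -> str:
    """Lower-case a name and collapse whitespace; None becomes an empty string."""
    return " ".join((name or "").lower().split())


def preprocess(suppliers: list) -> list:
    """Drop supplier records whose name is null or blank."""
    return [s for s in suppliers if normalise(s.get("name"))]


def reconcile(supplier: dict) -> Optional[str]:
    """Stand-in for a reconciliation service: exact match on the normalised name."""
    for company in COMPANIES:
        if normalise(company["name"]) == normalise(supplier["name"]):
            return company["company_id"]
    return None


def annotate(suppliers: list) -> list:
    """Attach the matched company identifier (or None) to each supplier record."""
    return [dict(s, company_id=reconcile(s)) for s in preprocess(suppliers)]


# Hypothetical supplier records as they might appear in tender data.
suppliers = [
    {"name": "example  consulting as"},
    {"name": None},
    {"name": "Unknown Supplier"},
]
annotated = annotate(suppliers)
for record in annotated:
    print(record)
```
        </preformat>
        <p>In this sketch the record with a null name is dropped during preprocessing, the first supplier receives a company identifier, and the unmatched supplier keeps an empty identifier. A real reconciliation service would typically return ranked candidates with scores rather than a single exact match.</p>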
        <p>[Figure 2: data ingestion process. Recoverable labels: OpenOpps API (fetch tender data), preprocess, Reconciliation API (reconcile), annotate, OpenCorporates API (fetch company data), ontologies and RML mapping, upload to the knowledge graph (KG).]</p>
        <p>Finally, the extracted datasets are translated to RDF using the RDF Mapping
Language (RML)<sup>14</sup> according to our ontology network. The supplier data and
company data are linked through the owl:sameAs property. The data is then
uploaded to a graph database, namely GraphDB<sup>15</sup>.</p>
        <p>The current release of the knowledge graph includes 23M triples originating from
tender data collected initially for the first quarter of 2019. The knowledge graph
is available online<sup>16</sup>. An example query and its results are depicted in Figure 3.
The example query lists the ten companies in the Norwegian jurisdiction that
appear most often in the supplier role, where the jurisdiction data comes from
OpenCorporates and the contract data comes from OpenOpps.</p>
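        <p>To make the role of the owl:sameAs links concrete, the logic of such a query can be illustrated in plain Python over a handful of in-memory triples. This is a sketch under stated assumptions, not the query shown in Figure 3: apart from owl:sameAs, the predicate IRIs (hasSupplier, jurisdiction), the example.org resources, and the function top_suppliers are hypothetical and do not reflect the terms of the TBFY ontology network.</p>
        <preformat>
```python
from collections import Counter

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical predicates, invented for this example only.
HAS_SUPPLIER = "http://example.org/hasSupplier"
JURISDICTION = "http://example.org/jurisdiction"

# Hypothetical triples: awards point to suppliers (tender data), suppliers are
# linked to companies via owl:sameAs, and companies carry a jurisdiction.
triples = [
    ("http://example.org/award/1", HAS_SUPPLIER, "http://example.org/supplier/a"),
    ("http://example.org/award/2", HAS_SUPPLIER, "http://example.org/supplier/b"),
    ("http://example.org/award/3", HAS_SUPPLIER, "http://example.org/supplier/c"),
    ("http://example.org/supplier/a", OWL_SAME_AS, "http://example.org/company/x"),
    ("http://example.org/supplier/b", OWL_SAME_AS, "http://example.org/company/x"),
    ("http://example.org/supplier/c", OWL_SAME_AS, "http://example.org/company/y"),
    ("http://example.org/company/x", JURISDICTION, "no"),
    ("http://example.org/company/y", JURISDICTION, "gb"),
]


def top_suppliers(triples, jurisdiction, limit=10):
    """Count supplier roles per company in a jurisdiction, following owl:sameAs."""
    same_as = {s: o for s, p, o in triples if p == OWL_SAME_AS}
    in_jurisdiction = {
        s for s, p, o in triples if p == JURISDICTION and o == jurisdiction
    }
    counts = Counter()
    for _, predicate, obj in triples:
        if predicate != HAS_SUPPLIER:
            continue
        company = same_as.get(obj)  # None if the supplier was not reconciled
        if company in in_jurisdiction:
            counts[company] += 1
    return counts.most_common(limit)


result = top_suppliers(triples, "no")
print(result)
```
        </preformat>
        <p>The supplier side of each link comes from the tender data and the company side from the company data, so the count can only be restricted by jurisdiction for suppliers that were successfully reconciled.</p>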
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Open Issues</title>
      <p>Currently, one of the main issues concerns data quality, including missing,
duplicate, poorly formed, and erroneous data. For example, missing address
information, variations in address format, etc. reduce the quality of the reconciliation
process. We are working on various approaches to improve data quality,
ranging from machine learning to crowd-sourcing.
        <fn id="fn14"><label>14</label><p>http://rml.io/</p></fn>
        <fn id="fn15"><label>15</label><p>http://graphdb.ontotext.com/</p></fn>
        <fn id="fn16"><label>16</label><p>http://data.tbfy.eu</p></fn>
      </p>
      <p>Acknowledgements. This work has been partly funded by the EU H2020
projects TheyBuyForYou (780247) and euBusinessGraph (732003).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alvarez-Rodríguez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          , et al.:
          <article-title>New trends on e-Procurement applying semantic technologies: Current status and future challenges</article-title>
          .
          <source>Computers in Industry</source>
          <volume>65</volume>
          (
          <issue>5</issue>
          ),
          <fpage>800</fpage>
          –
          <lpage>820</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Distinto</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , et al.:
          <article-title>LOTED2: An ontology of European public procurement notices</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>267</fpage>
          –
          <lpage>293</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Muñoz-Soro</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          , et al.:
          <article-title>PPROC, an ontology for transparency in public procurement</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>295</fpage>
          –
          <lpage>309</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Nečaský</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Linked data support for filing public contracts</article-title>
          .
          <source>Computers in Industry</source>
          <volume>65</volume>
          (
          <issue>5</issue>
          ),
          <fpage>862</fpage>
          –
          <lpage>877</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>J.M.A.</given-names>
          </string-name>
          , et al.:
          <article-title>Towards a Pan-European E-Procurement Platform to Aggregate, Publish and Search Public Procurement Notices Powered by Linked Open Data: the Moldeas Approach</article-title>
          .
          <source>International Journal of Software Engineering and Knowledge Engineering</source>
          <volume>22</volume>
          (
          <issue>3</issue>
          ),
          <fpage>365</fpage>
          –
          <lpage>384</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>