<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Connected Data to Empower a Financial Services Organization: Project Helix at UBS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gilad Geron</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony Hammond</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilya Venger</string-name>
          <email>ilya.vengerg@ubs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UBS Business Solutions AG</institution>
          ,
          <addr-line>5 Broadgate, London, EC2M 2QS</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>UBS Business Solutions AG</institution>
          ,
          <addr-line>Badenerstrasse 574, Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>UBS is the world's largest wealth manager, with a diverse financial portfolio. Its technology landscape supports core financial activities as well as enterprise management and other enabling capabilities. UBS Group Technology recently introduced four 'big bets': a series of initiatives aimed at accelerating developer productivity while streamlining and strengthening controls. Big Bet #3 aims to make a step change in our understanding of the technology landscape. As in many enterprises, data is often managed at the application level. A holistic, big-picture view requires harmonizing data models and identifiers, and adding semantics that connect diverse knowledge domains across the organization. Helix is a project under Big Bet #3 to build out an enterprise knowledge graph (EKG). The distinctive capability of an EKG is that it yields insights across disparate datasets. Strong management buy-in from the start helped establish RDF as a critical technology for achieving large-scale data integration. To deliver on the vision of an enterprise-wide knowledge graph, we partnered with Cambridge Semantics and their product Anzo, which we selected for its end-to-end capability from data ingestion to analytics and distribution. One of Anzo's distinctive features is Anzograph, a scalable in-memory graph store that provides fast query lookups. Another is its strong support for executing chains of SPARQL queries, which allows us to enrich our models incrementally. The ability to reason about technology and about the representation of data (physical, logical, conceptual) is a necessary foundation for a consistent and scalable knowledge graph. To this end, we started by building out a set of OWL-based domain ontologies describing technology and data assets. 
Focusing first on relational databases (where most of our data resides), we wanted to leverage the existing metadata repositories, schemas and interrelationships between data points to derive ontologies consistent with the current world models. We have developed a mapping model and a methodology based on RDF shapes to express relationships and to link physical columns to ontologies that capture expert domain knowledge. We present two case studies: onboarding legacy models and connecting instance data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation and current situation</title>
    </sec>
    <sec id="sec-2">
      <title>Exposing existing models</title>
      <p>Our main configuration management database (CMDB) holds extremely rich
information about technology assets and has been our primary source for
building out a knowledge graph in the technology domain. In addition to instance
data, the CMDB includes a semantic metadata repository that holds definitions
for all classes of assets (component and system types) and a catalog of all
parameters. However, relationships between objects are not explicitly stored, and
the graph-like structure of the data is not exploited.</p>
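      <p>Because the metadata repository already lists the asset classes and the parameter catalog, much of an ontology can be read directly off it. The following minimal sketch, written in plain Python rather than the SPARQL used in practice, turns component and system type labels into owl:Class terms and catalog parameters into properties. The namespace https://example.org/cmdb#, the CamelCase naming rule and the function names are hypothetical; triples are plain (subject, predicate, object) tuples.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: derive OWL classes and properties from a CMDB-style metamodel.
# The metamodel labels, the namespace and the helper names are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
NS = "https://example.org/cmdb#"  # hypothetical ontology namespace


def local_name(label):
    """Turn a human-readable label into a CamelCase local name."""
    return "".join(word.capitalize() for word in label.split())


def metamodel_to_ontology(asset_types, parameters):
    """Emit (subject, predicate, object) triples for classes and properties.

    asset_types: labels of component/system types, e.g. "Database Instance"
    parameters: labels from the parameter catalog, e.g. "host name"
    """
    triples = set()
    for label in asset_types:
        iri = NS + local_name(label)
        triples.add((iri, RDF_TYPE, OWL_CLASS))
        triples.add((iri, RDFS_LABEL, label))
    for label in parameters:
        name = local_name(label)
        iri = NS + name[0].lower() + name[1:]  # properties start lower-case
        # Object vs datatype property is not known yet; instance data decides later
        triples.add((iri, RDF_TYPE, RDF_PROPERTY))
        triples.add((iri, RDFS_LABEL, label))
    return triples
```
      </preformat>
      <p>For example, the labels "Database Instance" and "host name" would yield the terms cmdb:DatabaseInstance and cmdb:hostName under the assumed namespace.</p>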
      <p>To create an application-based ontology on this dataset of two hundred million
triples, we implemented the following steps:</p>
      <list list-type="order">
        <list-item><p>ingested and RDF-ized the contents of the database and the metadata;</p></list-item>
        <list-item><p>ran a set of SPARQL queries on top of the CMDB metamodel to create an ontology with classes and properties;</p></list-item>
        <list-item><p>instantiated the contents of the database and assigned the instances to classes;</p></list-item>
        <list-item><p>queried the instance data to augment the ontology with object/datatype property distinctions and domain and range relationships;</p></list-item>
        <list-item><p>connected the resulting graph (expressed in our model) with other domain ontologies by running cross-domain queries on instance data and by manual enrichment.</p></list-item>
      </list>
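      <p>Step 4 relies on the fact that instance data reveals how each property is actually used: if the values of a property are IRIs of other instances, it behaves as an object property, otherwise as a datatype property, and the classes of its subjects and objects suggest its domain and range. The sketch below illustrates that idea in plain Python, assuming the instance triples and the class assignment from step 3 are available as tuples and a dictionary. The function name and the rule of declaring a domain or range only when exactly one class is observed are illustrative assumptions, not the Helix queries themselves.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of step 4: infer property kinds and domain/range from instance data.
# Input shapes, names and the single-class rule are illustrative assumptions.
from collections import defaultdict


def infer_property_axioms(instance_triples, class_of):
    """Return {property: (kind, domain, range)} from observed usage.

    instance_triples: iterable of (subject, property, object)
    class_of: dict mapping instance IRIs to their class (from step 3)
    A domain or range is reported only when exactly one class is observed.
    """
    subject_classes = defaultdict(set)
    object_classes = defaultdict(set)
    literal_seen = defaultdict(bool)
    for s, p, o in instance_triples:
        subject_classes[p].add(class_of.get(s))
        if o in class_of:
            # The object is a known instance, so this usage is a link
            object_classes[p].add(class_of[o])
        else:
            # Anything else is treated as a literal value
            literal_seen[p] = True
    axioms = {}
    for p, domains in subject_classes.items():
        # A property with any literal value is classified as a datatype property
        kind = "DatatypeProperty" if literal_seen[p] else "ObjectProperty"
        domain = next(iter(domains)) if len(domains) == 1 else None
        ranges = object_classes[p]
        single_range = kind == "ObjectProperty" and len(ranges) == 1
        axioms[p] = (kind, domain, next(iter(ranges)) if single_range else None)
    return axioms
```
      </preformat>
      <p>Keeping the inference data-driven means the ontology reflects how properties are used in practice; ambiguous cases (several observed classes) are left undeclared here and would be natural candidates for the manual enrichment mentioned in step 5.</p>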
    </sec>
    <sec id="sec-3">
      <title>Mapping with shapes</title>
      <p>For rapid expansion of the graph to multiple domains, we leveraged a central
data warehouse that held information from the core CMDB alongside multiple
other sources. From it we could easily generate a number of CSV extracts 'walking' across
the data landscape. The Anzo ingestion engine onboards tabular data into an
RDF dataset with an auto-generated model (based on source table and column
names). However, to supplement the data with semantics, we needed to 'lift' this
RDF dataset into our own semantically rich domain models.</p>
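      <p>The 'lift' is easier to picture once the source-shaped RDF is concrete. The minimal sketch below produces RDF from a CSV extract in the style of a direct mapping: each row becomes a resource typed by a class named after the table, and each column becomes a property named after the column. It is an illustration only; the namespace https://example.org/source/, the IRI patterns and the key_column parameter are hypothetical and do not reproduce Anzo's actual auto-generated model.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a table-to-RDF "direct" mapping, similar in spirit to an
# auto-generated ingestion model. Namespace and IRI patterns are hypothetical.
import csv
import io

SOURCE_NS = "https://example.org/source/"  # hypothetical ingestion namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def csv_to_triples(table_name, csv_text, key_column):
    """Turn each CSV row into a typed resource with one property per column."""
    class_iri = SOURCE_NS + table_name
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{SOURCE_NS}{table_name}/{row[key_column]}"
        triples.append((subject, RDF_TYPE, class_iri))
        for column, value in row.items():
            if value:  # skip empty cells rather than emitting empty literals
                triples.append((subject, f"{SOURCE_NS}{table_name}#{column}", value))
    return triples
```
      </preformat>
      <p>Such a model carries only the vocabulary of the source tables: a column named host_name is just a string-valued property, with no link to any domain concept, which is exactly the gap the lift has to close.</p>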
      <p>We created a set of RDF shapes, based on our model, for each of the ingested
classes, overlaid with type and annotation property mappings. At runtime, a
single SPARQL query runs over the shapes, the datasets and our own models,
materializing triples into a named graph. The query also generates names for the new
data instances using typed namespaces from our ontologies. This approach has proven
highly effective on a small pilot dataset of tens of millions of triples
and a couple of dozen shapes.</p>
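      <p>The shape-driven lift can be illustrated outside SPARQL. In this minimal sketch each shape, an assumed dictionary structure rather than the actual shape vocabulary used in Helix, maps a source class to a target ontology class, maps source properties to target properties, and names a typed namespace for minting instance IRIs. A single generic pass over the source triples then materializes the lifted triples into one named graph, represented here as quads. All names, namespaces and the graph IRI are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a shape-driven "lift" from a source-shaped RDF dataset into a
# domain model. The shape structure, namespaces and graph name are hypothetical;
# the text describes this step as one SPARQL query over shapes, datasets and models.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TARGET_GRAPH = "https://example.org/graph/lifted"  # hypothetical named graph

SHAPES = {
    # Keyed by source class: target class, typed namespace and property mappings
    "https://example.org/source/servers": {
        "target_class": "https://example.org/tech#Server",
        "instance_ns": "https://example.org/tech/server/",
        "key": "https://example.org/source/servers#server_id",
        "properties": {
            "https://example.org/source/servers#host_name": "https://example.org/tech#hostName",
        },
    },
}


def lift(source_triples, shapes, graph=TARGET_GRAPH):
    """Return (subject, predicate, object, graph) quads in the target model."""
    by_subject = {}
    for s, p, o in source_triples:
        by_subject.setdefault(s, []).append((p, o))
    quads = []
    for s, pairs in by_subject.items():
        values = dict(pairs)  # assumes single-valued source properties
        for source_class in [o for p, o in pairs if p == RDF_TYPE]:
            shape = shapes.get(source_class)
            if shape is None:
                continue  # no shape for this class: the resource is not lifted
            # Mint a new IRI in the typed namespace from the key property value
            new_iri = shape["instance_ns"] + values[shape["key"]]
            quads.append((new_iri, RDF_TYPE, shape["target_class"], graph))
            for source_prop, target_prop in shape["properties"].items():
                if source_prop in values:
                    quads.append((new_iri, target_prop, values[source_prop], graph))
    return quads
```
      </preformat>
      <p>Because the mappings live in data (the shapes) rather than in per-source query text, the same generic pass serves every source class, and onboarding another class amounts to adding a shape; this mirrors the single-query design described above.</p>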
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>