=Paper= {{Paper |id=Vol-1660/competition-paper2 |storemode=property |title=General Legal Entity Identifier Ontology |pdfUrl=https://ceur-ws.org/Vol-1660/competition-paper2.pdf |volume=Vol-1660 |authors=Robert Trypuz,Dominik Kuziński,Mirek Sopek |dblpUrl=https://dblp.org/rec/conf/fois/TrypuzKS16 }} ==General Legal Entity Identifier Ontology== https://ceur-ws.org/Vol-1660/competition-paper2.pdf
 March 2016




 General Legal Entity Identifier Ontology
           Robert TRYPUZ a,b,1 , Dominik KUZIŃSKI a and Mirek SOPEK a
                                    a MakoLab S.A.
      b The Faculty of Philosophy, The John Paul II Catholic University of Lublin



              Abstract. This paper describes some advantages of RDF/OWL representation of
              Legal Entity Identifier (LEI) data in contrast to the current XML representation. We
              propose an OWL specification of LEI data named General Legal Entity Identifier
              Ontology (GLEIO) that is not only compliant with the current XML representation
              but also allows for representation of changes of LEI and related LEI reference data
              over time. The paper also presents a web application that uses GLEIO to serve
              information about LEI and legal entities.

              Keywords. Legal Entity Identifier, LEI, General Legal Entity Identifier Ontology




1. Introduction

The Legal Entity Identifier (LEI) is a code based on ISO 17442. It is assigned to a legal
entity to identify it unambiguously and uniquely during financial transactions. The Common
Data File Format (henceforward: CDF Format) is a standard by which an LEI and the related
legal entity reference data are reported by local operating units (LOUs). LOUs’ reports are
expressed in XML files defined by an XML schema compliant with the CDF Format. The schema
consists of an LEI File Header and an LEI Data Record (see figure 1). This paper describes
some advantages of an RDF/OWL representation of global LEI data in contrast to the current
XML representation (section 2.1). We propose an OWL specification of LEI data named General
Legal Entity Identifier Ontology (GLEIO) that is not only compliant with the CDF Format but
also allows for representation of changes of LEI and related LEI reference data over time
(section 2.2). We focus on the benefits that can be gained by using a semantic
representation. The paper also presents a web portal that uses GLEIO to serve information
about LEI and legal entities (section 3).

Figure 1: XML schema for the Common Data File Format
  1 Corresponding Author: Robert Trypuz, MakoLab, ul. Rzgowska 30, 93-172 Łódź, Poland; E-mail: robert.trypuz@makolab.pl.


2. GLEIO – an RDF/OWL model of the CDF Format

2.1. Why use RDF/OWL to represent and share LEIs and related LEI reference data

In the White Paper2 presented to the Global LEI Foundation (GLEIF)3 we pointed out
drawbacks and limitations of the current XML representation of LEI data. Among them are:
lack of precise semantics, difficult extensibility, lack of global persistent identifiers and
lack of inference. We also showed that OWL ontologies address all of these limitations.
     Ontologies make it possible to classify entities and have enough expressive power to
introduce constraints at the level of things. For example, we can formally express that
“Each global LEI identifies exactly one legal entity”. Semantic relations can help in linking
collected information. For instance, in the CDF Format we find references to Countries and
Subdivisions. In a semantic representation they could be linked by a parthood relation (e.g.
Nordrhein-Westfalen is part of Germany). As a result the data may be queried in a new
way. For instance, once the parthood relations between Germany and its states are expressed
in a semantic representation of LEI data, it is easy to issue the following request in the
SPARQL query language: “Find the number of LEI codes for each German state.”
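
The request above can be sketched in plain Python over a handful of in-memory triples.
This is only an illustration under stated assumptions, not GLEIO itself: the property names
`partOf` and `locatedIn`, the `lei:` identifiers and the toy data are hypothetical, and in
practice the question would be posed as a SPARQL query against the RDF graph.

```python
from collections import Counter

# Toy (subject, predicate, object) triples. All identifiers and property
# names are hypothetical and chosen only for this illustration.
TRIPLES = [
    ("Nordrhein-Westfalen", "partOf", "Germany"),
    ("Bayern", "partOf", "Germany"),
    ("Mazowieckie", "partOf", "Poland"),
    ("lei:A", "locatedIn", "Nordrhein-Westfalen"),
    ("lei:B", "locatedIn", "Nordrhein-Westfalen"),
    ("lei:C", "locatedIn", "Bayern"),
    ("lei:D", "locatedIn", "Mazowieckie"),
]


def lei_count_per_subdivision(triples, country):
    """Count LEI codes for each subdivision that is part of `country`."""
    subdivisions = {s for s, p, o in triples if p == "partOf" and o == country}
    counts = Counter(
        o for s, p, o in triples if p == "locatedIn" and o in subdivisions
    )
    # Subdivisions without any LEI code are reported with a count of zero.
    return {sub: counts.get(sub, 0) for sub in subdivisions}


german_counts = lei_count_per_subdivision(TRIPLES, "Germany")
print(german_counts)
```

The parthood triples do the work here: the LEI data itself never mentions Germany, only
the subdivisions, and the grouping by country is obtained by following the `partOf` links.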
     Since ontologies link things (rather than describing the structure of documents), they
are flexible enough to allow for extensions. New data based on terminology from external
resources can be added to an RDF graph. This is possible because RDF/OWL ontologies
use URIs, which are universal, globally unique identifiers. By using URIs, browsers and
other applications can retrieve the data of the resource identified by that URI.
     Browsers render the retrieved representation so that it can be perceived by a user.
Other applications can use the content negotiation mechanism to access data in the
intended format. URIs of different RDF data can be easily linked. The large-scale,
influential initiative known as “Linked Open Data” provides a methodology for publishing
and interlinking data. By linking data we can reuse external resources which we need
for our representation (e.g. an ontology of ISO 3166 codes for countries). But linked data
also allows for gaining new information and formulating interesting queries. For
instance, having a semantic representation of LEI data and connecting its class of countries
with http://www.geonames.org/countries/ automatically gives us information about
the capital, area or population of a country. So, for instance, the following request can be
formed: “Find the number of legal entities that possess LEI codes for each Country. Order
the answers by the area of the country.”
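
The effect of linking to an external country dataset can be sketched as a simple join. The
following Python fragment is a toy illustration: the country codes are placeholders, the
area figures are invented for the example and are not taken from GeoNames, and the
descending sort order is an assumption, since the request does not fix a direction.

```python
from collections import Counter

# Local LEI data: legal entity -> country code (placeholder codes).
ENTITY_COUNTRY = {
    "entity:1": "AA",
    "entity:2": "AA",
    "entity:3": "ZZ",
    "entity:4": "QM",
}

# Stand-in for an external, linked dataset keyed by the same country codes.
# The area values are invented for this example.
EXTERNAL_COUNTRY_INFO = {
    "AA": {"area_km2": 500},
    "ZZ": {"area_km2": 1200},
    "QM": {"area_km2": 80},
}


def entities_per_country_by_area(entity_country, country_info):
    """Return (country, entity count, area) rows, largest area first."""
    counts = Counter(entity_country.values())
    rows = [
        (code, counts[code], country_info[code]["area_km2"])
        for code in counts
        if code in country_info  # countries unknown to the external data are skipped
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows_by_area = entities_per_country_by_area(ENTITY_COUNTRY, EXTERNAL_COUNTRY_INFO)
print(rows_by_area)
```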
     Ontologies also allow for inference. Reasoners for OWL ontologies can check the
consistency of an ontology, i.e. whether the set of ontological restrictions (axioms) has a
model. One can also verify whether the facts explicitly expressed in an OWL ontology
violate the restrictions.
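
As a rough illustration of checking facts against a restriction, the sketch below validates
the constraint “Each global LEI identifies exactly one legal entity” over explicit facts. It
is a closed-world check written for this example and is not how an OWL reasoner works: a
reasoner operates under the open-world assumption and without a unique name assumption,
so it would not report a missing entity and would treat two targets as the same individual
unless they are declared different. The identifiers are hypothetical.

```python
from collections import defaultdict


def check_exactly_one_entity(lei_codes, identifies):
    """Return the LEI codes that do not identify exactly one legal entity.

    lei_codes:  iterable of LEI identifiers known to the data set.
    identifies: iterable of (lei, legal_entity) facts.
    The result maps each violating LEI to the sorted list of its targets.
    """
    targets = defaultdict(set)
    for lei, entity in identifies:
        targets[lei].add(entity)
    return {
        lei: sorted(targets[lei])
        for lei in lei_codes
        if len(targets[lei]) != 1
    }


# Hypothetical facts: lei:B points at two entities, lei:C at none.
LEI_CODES = ["lei:A", "lei:B", "lei:C"]
IDENTIFIES = [
    ("lei:A", "entity:1"),
    ("lei:B", "entity:2"),
    ("lei:B", "entity:3"),
]

violations = check_exactly_one_entity(LEI_CODES, IDENTIFIES)
print(violations)
```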

2.2. General Legal Entity Identifier Ontology

GLEIO4 is an OWL ontology compliant with the CDF Format. Its canonical URI is
http://lei.info/gleio/. Its process of creation involved careful translation of the
  2 https://makolab.com/en/innovation/gleio
  3 https://www.gleif.org
   4 GLEIO has SHIQ(D) expressivity, i.e., it allows complement of any concept, concept intersection, universal
restrictions, unlimited existential quantification, role (binary predicate) hierarchy, transitive and inverse
relations, qualified cardinality restrictions and use of datatype properties, data values and data types. Hyperlinks to


format into OWL structure, i.e. introducing: classes of objects (e.g. Legal Entity, Business
Register); individuals (e.g. ISO 3166 codes for countries and subdivisions, particular
values of the LEI registration status such as “Annulled” or “Duplicate”); object properties
that relate objects to objects (e.g. hasAssociatedEntity, linking concrete legal entities
with other legal entities); data properties that relate objects to literals such as strings,
integers or dates (e.g. hasGLEICode, linking a general LEI registration with its code); and
constraints that specify the conceptualization expressed by the CDF Format (e.g. that a
general LEI registration has at most one LEI code and that each general LEI code identifies
exactly one legal entity). In the end each element of the CDF Format found its counterpart
in GLEIO, so it can be concluded that GLEIO accurately represents its domain (fidelity).
To improve intelligibility, each GLEIO class, individual and property that corresponds to
an element in the CDF Format was also described by annotations: rdfs:label, skos:definition,
rdfs:comment and rdfs:isDefinedBy. The last two annotations were used to point out the
element of the CDF Format that corresponds to a given GLEIO resource. We have also
used skos:closeMatch and skos:narrowMatch to link classes and properties of GLEIO
with the Financial Industry Business Ontology (FIBO) [Bennett, 2013].
Requirements for representing change. Temporal aspects play an important role in the CDF
Format. A general LEI registration status can change over time. The same holds for data
describing legal entities, e.g., the headquarters address of a legal entity can change. GLEIF
publishes daily an updated concatenated file which includes all LEIs issued globally and
the related LEI reference data. So every day we receive a fresh, up-to-date picture, but it
is not known what has changed. In GLEIO we decided to allow for explicit representation
of change. Following [Fine, 1999, Moltmann, 2016] we divided all entities into5: variable
entities, i.e. the entities that have different manifestations at different times, and
non-variable entities that do not have different manifestations as different objects at
different times. Among non-variable entities are manifestations of variable entities. Each
such manifestation has its time stamp (marking the beginning of its existence). The following
assumptions introduced in [Moltmann, 2016] have been adopted in GLEIO:

A1 Each variable entity has its last (or current if the entity exists) manifestation.
A2 If a variable entity is manifested at time t by an entity manifestation m, then it exists
      at t, takes the geographical location of m and has all properties possessed by m at t.
A3 If all manifestations of a variable entity have some property, then the variable
      entity possesses the property necessarily (technically speaking the property can be
      directly attributed to the variable entity).

     A1 is an axiom of GLEIO. A2 and A3 are rules implemented at the software level
and are used when our application (described in section 3 below) serves current data
about a variable object (e.g. a legal entity or an LEI): it takes the properties of the
object's current manifestation and the necessary properties of the variable object.
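
How an application might assemble the current data of a variable object from A1 to A3 can
be sketched as follows. This is a minimal Python model written for illustration, not the
authors' implementation: the class names, the `necessary` field and the property names
`legalForm` and `headquartersCity` are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Manifestation:
    timestamp: date   # beginning of existence of this manifestation
    properties: dict  # property name -> value holding for this manifestation


@dataclass
class VariableEntity:
    identifier: str
    # Properties attributed directly to the variable entity (A3).
    necessary: dict = field(default_factory=dict)
    manifestations: list = field(default_factory=list)

    def current(self):
        """A1: the last (or current) manifestation, i.e. the latest time stamp.

        Raises ValueError if the entity has no manifestation at all.
        """
        return max(self.manifestations, key=lambda m: m.timestamp)

    def current_view(self):
        """A2 + A3: necessary properties plus those of the current manifestation."""
        view = dict(self.necessary)
        view.update(self.current().properties)
        return view


entity = VariableEntity(
    identifier="entity:1",
    necessary={"legalForm": "S.A."},
    manifestations=[
        Manifestation(date(2016, 1, 10), {"headquartersCity": "Lodz"}),
        Manifestation(date(2016, 3, 1), {"headquartersCity": "Warszawa"}),
    ],
)
print(entity.current_view())
```

Keeping the necessary properties on the variable entity means they are stored once, while
each manifestation only carries the values that may differ between time stamps.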

the DL axioms of GLEIO, its Turtle serialization, its OWL Doc representation and a class taxonomy picture can be
found here: https://makolab.com/en/innovation/gleio.
   5 It is a topic for a separate paper to discuss why we have finally chosen this representation and not the
reification or 4D approaches compared and discussed in the OWL context, e.g. in [Welty and Fikes, 2006].


The representation based on variable entities and their manifestations allows for change
tracking across time stamps. Every day our LEI application reads a complete concatenated
XML file and creates a new manifestation of a legal entity or global LEI if a change was
found (e.g., the LEI status or the headquarters address of the legal entity changed).
Manifestations are linked by the temporal precedence relation
“hasPredecessorEntityManifestation” (see figure 2). Changes in the LEI registry are also
represented in GLEIO by manifestations. For instance, the status of an LEI registration can
change from “Issued” to “Lapsed” (here we are still dealing with the same LEI). In that
case, we create a new manifestation of the LEI. In some other cases, for instance when the
LEI status is “Duplicate”, the LEI will have a successor. In some sense it ceases to exist
and stops being updated. Its successor becomes the ‘issued’ LEI.

Figure 2: Variable entity and its manifestations
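
The daily change-tracking step can be sketched in Python as follows. This is a simplified
illustration under stated assumptions rather than the actual updater: manifestations are
plain dictionaries, the `predecessor` key stands in for the hasPredecessorEntityManifestation
link, and the property names `status` and `hqCity` are hypothetical.

```python
def record_snapshot(history, timestamp, properties):
    """Append a manifestation only if `properties` differ from the latest one.

    history: list of manifestation dicts, oldest first.
    Returns True if a new manifestation was created.
    """
    latest = history[-1] if history else None
    if latest is not None and latest["properties"] == properties:
        return False  # nothing changed, so no new manifestation
    history.append({
        "timestamp": timestamp,
        "properties": dict(properties),
        # Index of the preceding manifestation (None for the first one).
        "predecessor": len(history) - 1 if history else None,
    })
    return True


def changes(history):
    """Yield (timestamp, property, old, new) for each recorded difference."""
    for manifestation in history:
        if manifestation["predecessor"] is None:
            continue
        previous = history[manifestation["predecessor"]]["properties"]
        current = manifestation["properties"]
        for key in sorted(set(previous) | set(current)):
            old, new = previous.get(key), current.get(key)
            if old != new:
                yield (manifestation["timestamp"], key, old, new)


history = []
record_snapshot(history, "2016-03-01", {"status": "Issued", "hqCity": "Lodz"})
record_snapshot(history, "2016-03-02", {"status": "Issued", "hqCity": "Lodz"})
record_snapshot(history, "2016-03-03", {"status": "Lapsed", "hqCity": "Lodz"})
print(list(changes(history)))
```

Three daily snapshots produce only two manifestations here, and walking the predecessor
links recovers what changed and when, which the daily file alone does not reveal.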
     We believe that Moltmann’s idea of representing changes adopted in our ontology
can also be successfully reused for other domains.


3. Web portal allowing for storing and displaying information about LEI in LOD
   standard (lei.info)

GLEIO is part of an information system – a web portal – allowing for storing and displaying
information about LEI. The web portal is compliant with Linked Data principles6 .
     In figure 3 we present the components of the web portal. Input: GLEIF’s XML
concatenated files. A console application (Updater) runs on our server once a day. It
downloads a fresh XML data file and checks for newly updated records. For each
of them, the updater parses the data, transforms it to an RDF named graph and
saves it in a Triplestore (we use AllegroGraph installed on a single virtual machine).

  6 https://www.w3.org/DesignIssues/LinkedData.html


The updater is also responsible for tracking changes. GLEIO provides a schema for
the RDF graphs. Global LEI data stored in the Triplestore can be queried via a SPARQL
endpoint7 . The endpoint can also be used by external applications to access knowledge
about LEI data. Romantic Web8 is MakoLab’s open source library for .NET, used as an
object-triple mapping tool. “lei.info” is currently a .NET application that allows
searching for legal entities only by LEI or by name and serves information in multiple
formats (HTML or RDF serializations) by content negotiation and by RDF embedded in HTML.
In the future it will have a GUI allowing for tracking changes. At the moment of writing
the GLEIO knowledge base contained over 420K entity graphs, which gives over 24 million
triples in total. Many types of competency questions can be answered by our
representation. Examples include: a query returning legal entities from France registered
this year; a query for entities whose renewal date is today; a query counting entities in
LOUs; a query returning entities with HQ in VALENCIA, ES; and a query for the 5 newest LOUs
ordered by registration date. SPARQL queries execute in 1 to 500 milliseconds, depending
on the query. The response time for requesting a single LEI record from the lei.info
website is around 250 ms.

Figure 3: Variable entity and its manifestations
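
Serving several formats by content negotiation can be illustrated with a small Python
sketch that selects a media type from an HTTP Accept header. It is not the portal's .NET
code: the list of supported media types is an assumption (the paper only states HTML or
RDF serializations), and the matching is simplified, since it takes the highest q-value
among all matching ranges instead of preferring the most specific range.

```python
# Assumed set of representations, in server preference order.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def negotiate(accept_header, supported=SUPPORTED):
    """Pick the supported media type with the highest q-value, or None."""
    preferences = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # an unparsable q is treated as "not acceptable"
        preferences.append((media_range, quality))

    best, best_quality = None, 0.0
    for offered in supported:
        wildcard = offered.split("/")[0] + "/*"
        quality = max(
            (q for rng, q in preferences if rng in (offered, wildcard, "*/*")),
            default=0.0,
        )
        # Strict comparison keeps the earlier (preferred) type on ties.
        if quality > best_quality:
            best, best_quality = offered, quality
    return best


print(negotiate("text/turtle"))
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

With such a selection step, a browser that asks for HTML and an RDF client that asks for
Turtle can both use the same URI of an LEI record and receive different representations.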


4. Conclusion

We have presented the main ontological choices behind the General Legal Entity Identifier
Ontology (GLEIO). The ontology allows for representation of changes of LEI and related LEI
reference data over time by introducing variable entities and their manifestations. We have
also described the components of a web application using GLEIO.


References

[Bennett, 2013] Bennett, M. (2013). FIBO: Best practice in big data. J Bank Regul, 14(3-4):255–268.
[Fine, 1999] Fine, K. (1999). Things and their parts. Midwest Studies in Philosophy, 23(1):61–74.
[Moltmann, 2016] Moltmann, F. (2016). Variable Objects and Truth-Making. In Dumitru, M., editor,
    Metaphysics, Meaning, and Modality. Themes from Kit Fine, volume (forthcoming). Oxford University Press.
[Welty and Fikes, 2006] Welty, C. A. and Fikes, R. (2006). A Reusable Ontology for Fluents in OWL. In
    Bennett, B. and Fellbaum, C., editors, FOIS, volume 150 of Frontiers in Artificial Intelligence and
    Applications, pages 226–236. IOS Press.

  7 http://lei.info/sparql
  8 http://romanticweb.net/