<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Evaluation of Knowledge Graph Embeddings for Autonomous Driving Data: Experience and Practice</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruwan Wickramarachchi</string-name>
          <email>ruwan@email.sc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cory Henson</string-name>
          <email>cory.henson@us.bosch.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Sheth</string-name>
          <email>amit@sc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Institute, University of South Carolina</institution>
          ,
          <addr-line>Columbia, SC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Bosch Research and Technology Center</institution>
          ,
          <addr-line>Pittsburgh, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
<p>The autonomous driving (AD) industry is exploring the use of knowledge graphs (KGs) to manage the vast amount of heterogeneous data generated from vehicular sensors. The various types of equipped sensors include video, LIDAR and RADAR. Scene understanding is an important topic in AD which requires consideration of various aspects of a scene, such as detected objects, events, time and location. Recent work on knowledge graph embeddings (KGEs) - an approach that facilitates neuro-symbolic fusion - has been shown to improve the predictive performance of machine learning models. With the expectation that neuro-symbolic fusion through KGEs will improve scene understanding, this research explores the generation and evaluation of KGEs for autonomous driving data. We also present an investigation of the relationship between the level of informational detail in a KG and the quality of its derivative embeddings. By systematically evaluating KGEs along four dimensions - i.e. quality metrics, KG informational detail, algorithms, and datasets - we show that (1) higher levels of informational detail in KGs lead to higher quality embeddings, (2) type and relation semantics are better captured by the translational distance-based TransE algorithm, and (3) some metrics, such as the coherence measure, may not be suitable for intrinsically evaluating KGEs in this domain. Additionally, we present an (early) investigation of the usefulness of KGEs for two use-cases in the AD domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Forecasters predict that fully autonomous vehicles could be
commercially available in the next few years, and that within
the next few decades (i.e. by 2040) half of all vehicles sold
and 40 percent of vehicle travel could be autonomous
        <xref ref-type="bibr" rid="ref11">(Litman 2019)</xref>
        . While racing to realize this vision, the
automotive industry is investing heavily into machine learning and
other relevant AI technologies. To meet the increasing data
demands of ML algorithms, fleets of vehicles are now
deployed in multiple cities around the world and collecting
massive amounts of data. These vehicles are equipped with
various types of heterogeneous sensors, including – but not
limited to – video, LIDAR, and RADAR.
      </p>
      <p>To manage these vast amounts of automotive sensor data,
companies are beginning to experiment with the use of KGs.
In other industries, and for many years, KGs have proven
invaluable for helping to manage data stored within enterprise
data lakes, which are growing rapidly in popularity. More
specifically, KGs help to enable the principles of FAIR data
– i.e. findability, accessibility, interoperability, and re-use –
across an enterprise.</p>
      <p>
        Current research into the topic of neuro-symbolic fusion1
        <xref ref-type="bibr" rid="ref19">(Nickel et al. 2015; Sheth et al. 2019)</xref>
        , however, is
beginning to point to new and exciting uses of KGs. Essentially,
KGs are a key source of high-quality domain knowledge
and KGE technology is now enabling ML algorithms to
more directly access this knowledge. Recent studies have
already shown that the use of KGEs leads to improved
performance and predictive capabilities (
        <xref ref-type="bibr" rid="ref5">Chen et al. 2017</xref>
        ;
Wang et al.
        <xref ref-type="bibr" rid="ref4">2019; Myklebust et al. 2019</xref>
        ). For this reason,
we believe that neuro-symbolic fusion through KGEs may
provide valuable knowledge needed to improve scene
understanding for autonomous driving.
      </p>
      <p>
        In this paper, we share our experience with generating
and evaluating KGEs for the AD domain. To the best of our
knowledge, this is the first attempt of its kind in this
domain. Our investigation begins with two popular benchmark
datasets from Aptiv and Lyft. From each of these datasets
we generate multiple KGs with varying degrees of
informational detail. The generated KGs focus on representing
the various scenes, or situations, that an autonomous
vehicle encounters on the road. The purpose of creating KGs
with varying degrees of detail is to enable an examination
of the relationship between KG detail and the quality of
derivative embeddings. Each KG is then translated into a
set of KGEs, each derived from one of three popular
embedding algorithms: TransE
        <xref ref-type="bibr" rid="ref3">(Bordes et al. 2013)</xref>
        ,
RESCAL
        <xref ref-type="bibr" rid="ref16">(Nickel, Tresp, and Kriegel 2011)</xref>
, and HolE
        <xref ref-type="bibr" rid="ref15 ref17">(Nickel, Rosasco, and Poggio 2016)</xref>
        . Finally, the quality of
each KGE is systematically evaluated based on the
framework proposed in
        <xref ref-type="bibr" rid="ref1">(Alshargi et al. 2019)</xref>
        .
      </p>
      <p>1https://www.digitaltrends.com/cool-tech/neuro-symbolic-aithe-future/</p>
      <p>Our analysis of evaluating the KGEs along four
dimensions – i.e. quality metrics, KG informational detail,
algorithms, and datasets – leads to some interesting findings.
First, we show that KGE quality significantly improves as
the informational detail of a KG increases. Second, focusing
on the evaluation measures, we report that some of the
metrics such as the coherence measure may not be suitable to
evaluate KGEs in this domain. When considering the
effectiveness of KGE algorithms, we identify that the
translational distance-based TransE algorithm captures type
and relational semantics better than algorithms from the
class of semantic matching-based models. It is interesting
to note that these findings are consistent across the
evaluations on two datasets. Finally, we report preliminary
observations on using KGEs for two use cases from the AD
domain. Specifically, we demonstrate how the scene/sub-scene
understanding was improved as KG informational detail was
increased, and how KGEs can be used to compute scene
similarity.</p>
      <p>The two primary contributions of this paper include: (1)
a demonstration of the process of creating and evaluating
KGEs for AD data, and (2) an (early) examination of the
relationship between KG detail and the quality of KGEs.
In Section 2, we discuss the construction of KGs from the
benchmark automotive driving datasets. The translation of
KGs to KGEs is explained in Section 3, while Section 4
focuses on their evaluation. Details of the technology used, as
well as related work, will be discussed in each individual
section. An investigation on the usefulness of semantics in
the AD domain is discussed in Section 5. Finally, in Section
6 we conclude with a summary of our overall results and
directions for future research.</p>
    </sec>
    <sec id="sec-2">
<title>2 Scene Knowledge Graphs</title>
      <p>
        To evaluate the KGEs for the AD domain, several KGs were
created based on two popular benchmark datasets: NuScenes
from Aptiv (Caesar et al. 2019) and Lyft-Level5 from Lyft
        <xref ref-type="bibr" rid="ref9">(Kesten et al. 2019)</xref>
        . To annotate the data from the datasets,
a scene ontology was used.
      </p>
      <sec id="sec-2-1">
<title>2.1 Composition of the Datasets</title>
        <p>Both the NuScenes and Lyft datasets follow a similar
structure. NuScenes, for example, is divided into a set of 20
second driving segments/scenes, with 40 samples/sub-scenes
per segment (i.e. one sample/sub-scene every 0.5 seconds).
Each 20 second segment is associated with a temporal
interval and spatial area, while each sample is associated with the
data collected at a specific temporal instant (i.e. timestamp)
and spatial coordinates.</p>
        <p>The NuScenes dataset contains 850 driving segments with
34,149 samples. Each object and event detected in a sample
is associated with one of 23 categories2.</p>
        <p>
          The Lyft dataset contains 180 driving segments with
22,680 samples. Lyft has only 9 object and event categories
that are used for annotating samples.
        </p>
        <p>
          A scene is described as an observable volume of time and
space
          <xref ref-type="bibr" rid="ref7">(Henson et al. 2019)</xref>
          . In the AD domain, a scene
depicts a situation encountered by a vehicle. A few examples
may include a vehicle stopped at a traffic light, cruising on
the highway, or crashing into another vehicle. The concept
of scene acts as the polestar with which all information about
the vehicle, and its situation, is integrated. More
specifically, a scene may include information about time and
location, the occurring events, and the participating objects.
        </p>
        <p>A scene may also include sub-scenes. For example,
consider a vehicle driving for 20 seconds on a highway. This
drive may be represented as a single scene. However, during
this drive the vehicle may encounter several different
situations, each of which may also be represented as a scene.</p>
        <p>Figure 1 shows the properties associated with a
scene (depicted in Protege3).</p>
        <p>A subset of the events and features-of-interest (i.e.
objects) represented within the Scene Ontology is shown in
Figure 2. For this work, the Scene Ontology has been
extended to subsume all concepts found in both the NuScenes
and Lyft datasets.</p>
        <p>For each dataset, three distinct KGs are generated with
differing levels of informational detail. It should be noted that
the level of informational detail for each KG refers to the
inclusion of additional information about scenes. The three
levels include: (1) a base KG, (2) a KG with inferred type
relations for objects/events, and (3) a KG with additional
includes relations between scenes and objects/events. It should
be noted that this additional information does not correspond
to an increase in the logical expressivity of the KGs. It is also
worth mentioning that each KG includes the Scene Ontology
along with the facts derived from each dataset.</p>
        <p>Base KG Within the Base KG, each 20 second segment
is instantiated as a scene, and each sample is instantiated as
a sub-scene. The scene representing a 20 second segment
is associated with a temporal interval (of 20 seconds) and
a spatial location (e.g. a city). This scene is also associated
with a set of sub-scenes, representing the samples. Each
sub-scene is associated with a temporal instant, spatial point, and
the objects and events that participate in the scene. See
Figure 3(a) for an example.</p>
        <p>2https://www.nuscenes.org/data-annotation</p>
        <p>3https://protege.stanford.edu/</p>
      </sec>
      <sec id="sec-2-2">
        <title>KG with Inferred Types</title>
        <p>In the Base KG, objects
and events are explicitly typed to the most specific class
possible. For example, an object instance representing a
car is typed to the Car class. Because the Car class is a
sub-class of Vehicle, the instance is also a type of
Vehicle. However, this knowledge is only implicit in the
KG; implied by the semantics of the owl:subClassOf
relation. To make this knowledge explicit, a reasoner is used
to infer all implied types for each object and event instance.
See Figure 3(b) for an example.</p>
        <p>Example RDF (new triples preceded by !)
:inst-scene rdf:type scene:Scene .
:inst-scene scene:hasPart :inst-sub-scene .
:inst-sub-scene scene:includes :inst-car .
:inst-car rdf:type scene:Car .
! :inst-car rdf:type scene:Vehicle .
! :inst-car rdf:type scene:FeatureOfInterest .</p>
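        <p>The reasoner step can be illustrated with a toy transitive closure over the subclass hierarchy. This is only a sketch covering owl:subClassOf semantics (the paper used a full reasoner); the helper names are ours, and the hierarchy mirrors the RDF example above.</p>
        <p>
```python
# Toy sketch of the type-inference step: materialize all implied
# rdf:type facts from the owl:subClassOf hierarchy.
def superclasses(cls, subclass_of):
    """All (transitive) superclasses of cls under subclass_of edges."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def infer_types(type_facts, subclass_of):
    """Return the asserted rdf:type facts plus all inferred ones."""
    inferred = set(type_facts)
    for inst, cls in type_facts:
        inferred |= {(inst, sup) for sup in superclasses(cls, subclass_of)}
    return inferred

# Hierarchy from the paper's example: Car < Vehicle < FeatureOfInterest.
subclass_of = {"scene:Car": ["scene:Vehicle"],
               "scene:Vehicle": ["scene:FeatureOfInterest"]}
facts = {(":inst-car", "scene:Car")}
assert (":inst-car", "scene:Vehicle") in infer_types(facts, subclass_of)
```
        </p>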
        <p>KG with Include Paths Within the Base KG, objects and
events are associated with sub-scenes derived from samples
of a 20 second drive. The scene representing the entire drive
is associated with these participating objects and events
through a two-hop path.</p>
        <sec id="sec-2-2-1">
          <title>Example RDF</title>
          <p>:inst-scene scene:hasPart :inst-sub-scene .
:inst-sub-scene scene:includes :inst-car .</p>
          <p>In order to make a more direct association between a
scene representing a drive and the detected objects and
events, new includes relations are added to the KG. See
Figure 3(c) for an example.</p>
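          <p>The materialization step described above can be sketched as a simple two-hop join over the triple set. This is a toy sketch: the set-based "triple store" and helper name are ours, not the paper's tooling.</p>
          <p>
```python
# Sketch of include-path materialization: for every
# (scene scene:hasPart sub-scene) and (sub-scene scene:includes object),
# add a direct (scene scene:includes object) triple.
def add_include_paths(triples):
    has_part = [(s, o) for s, p, o in triples if p == "scene:hasPart"]
    includes = [(s, o) for s, p, o in triples if p == "scene:includes"]
    new = {(scene, "scene:includes", obj)
           for scene, sub in has_part
           for sub2, obj in includes if sub2 == sub}
    return set(triples) | new

kg = {(":inst-scene", "scene:hasPart", ":inst-sub-scene"),
      (":inst-sub-scene", "scene:includes", ":inst-car")}
assert (":inst-scene", "scene:includes", ":inst-car") in add_include_paths(kg)
```
          </p>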
          <p>Example RDF (new triples preceded by !)
:inst-scene scene:hasPart :inst-sub-scene .
:inst-sub-scene scene:includes :inst-car .
! :inst-scene scene:includes :inst-car .</p>
          <p>The goal of learning embeddings from a KG is to represent
the entities and relations in a low-dimensional vector space
while also maintaining the semantics contained in the KG.
This transformation allows KGs to be more easily
manipulated and used for downstream learning tasks such as link
prediction and KG completion.</p>
          <p>fr(h, t) = -||h + r - t||1/2    (1)</p>
          <p>TransE is one of the most efficient KGE algorithms,
having O(nd + md) space complexity and O(ntd) time
complexity. Despite its benefits, TransE falls short in capturing
1-N, N-1 and N-N relations in KGs.</p>
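          <p>Equation (1) is easy to sanity-check numerically. Below is a minimal numpy sketch of the TransE score (an illustration, not the authors' implementation):</p>
          <p>
```python
import numpy as np

# Minimal sketch of the TransE score in equation (1): the negative
# distance between h + r and t (L1 norm here; the original
# formulation allows either the L1 or L2 norm).
def transe_score(h, r, t, ord=1):
    return float(-np.linalg.norm(h + r - t, ord=ord))

h = np.array([1.0, 0.0]); r = np.array([0.0, 1.0]); t = np.array([1.0, 1.0])
# A triple that holds exactly (h + r == t) gets the maximal score, 0.
assert transe_score(h, r, t) == 0.0
# A corrupted tail scores strictly lower.
assert transe_score(h, r, np.array([2.0, 0.0])) < transe_score(h, r, t)
```
          </p>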
          <p>RESCAL RESCAL belongs to the semantic
matching/multiplicative class of KGE algorithms. RESCAL is
an expressive model which takes into account the inherent
structure in multi-relational KGs and captures complex
patterns over multiple hops in the KG. It represents each
relation r as a matrix Mr that captures all the interactions between
the vectors (h, t) of the entities h and t. The scoring function
fr(h, t) is a bi-linear function which computes the pairwise
interactions of the entities with respect to each relation r.</p>
          <p>fr(h, t) = hT Mr t = Σi=0..d-1 Σj=0..d-1 [Mr]ij [h]i [t]j    (2)</p>
          <p>The main limitation of using RESCAL with big KGs is
its high space and time complexity. RESCAL has
O(nd + md²) space complexity and O(ntd²) time
complexity.</p>
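          <p>A minimal sketch of the bilinear score in equation (2), again as an illustration rather than the authors' implementation:</p>
          <p>
```python
import numpy as np

# Sketch of the RESCAL bilinear score in equation (2):
# fr(h, t) = h^T Mr t, with one d x d matrix Mr per relation; each
# [Mr]ij weights the interaction between [h]i and [t]j.
def rescal_score(h, Mr, t):
    return float(h @ Mr @ t)

h = np.array([1.0, 0.0])
t = np.array([0.0, 1.0])
Mr = np.array([[0.0, 1.0],    # [Mr]01 rewards the h0-t1 interaction
               [0.0, 0.0]])
assert rescal_score(h, Mr, t) == 1.0
```
          </p>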
          <p>
            HolE HolE is an efficient successor of RESCAL and
addresses the high space complexity of RESCAL while
retaining its expressive power. HolE represents both entities and
relations as vectors in Rd. Given a triple (h, r, t), HolE first
creates a compositional vector using the circular correlation
operation, which compresses the pairwise interactions:
[h ⋆ t]i = Σk=0..d-1 [h]k [t](k+i) mod d    (3)
          </p>
          <p>
            KGEs are widely used for downstream tasks such as link
prediction
            <xref ref-type="bibr" rid="ref17 ref23">(Xiao, Huang, and Zhu 2016)</xref>
            and KG completion
            <xref ref-type="bibr" rid="ref10">(Lin et al. 2015)</xref>
            . The vector representation of KGEs also
allows background knowledge contained in KGs to be easily
integrated with other input features of a machine learning
model.
          </p>
          <p>
            To select candidate algorithms for our experiments, we
referred to the classification of KGE algorithms established by
            <xref ref-type="bibr" rid="ref20">(Wang et al. 2017)</xref>
            and
            <xref ref-type="bibr" rid="ref18">(Sharma, Talukdar, and others 2018)</xref>
            .
KGE algorithms are categorized into two main classes:
(1) translational distance-based algorithms and (2)
semantic matching models. For translational distance-based
algorithms (i.e. additive methods) the scoring function is
composed of distance measures, and vector addition/subtraction
is used to capture the vector interaction. For semantic
matching models (i.e. multiplicative methods) the
scoring function is based on a similarity measure, and the
entity-relation-entity interaction is captured via a multiplicative
score function. We initially selected one algorithm from
each class. Specifically, we selected TransE from the former
category and RESCAL from the latter category. RESCAL,
however, has limitations in handling big KGs due to its
high space and time complexity. As a result, we also
included HolE in our experiments, which is a more space-
and time-efficient successor of RESCAL.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>3.1 Preliminaries</title>
        <p>Next, the symbols and notations used throughout the paper
are introduced, along with important details of the KGE
algorithms.</p>
        <p>Notation: Given a set of entities E and a set of relations R,
we define a KG to be a set of triples (h, r, t),
T ⊆ E × R × (E ∪ L), where L is the set of literals. We consider E = C ∪ N,
where C is the set of concepts from the ontology and N is the set
of individuals. Lowercase bold characters represent vectors
of an entity or relation and uppercase bold characters
represent a set of vectors. For example, e ∈ E is the embedding
vector of e ∈ E. Most of the embedding algorithms –
including some used in our evaluation – generate vectors of
dimension d to represent entities (e ∈ Rd for e ∈ E) and
relations (r ∈ Rd for r ∈ R). However, some
algorithms learn a projection matrix Mr ∈ Rd×d to represent
relations. The scoring function fr : E × R × E → R used
in each algorithm is different. Translational distance-based
models use distance measures whereas semantic matching
based methods use similarity measures. Learning the
embeddings involves optimizing the parameters Θ in the loss
function L(T, T′; Θ), where T is the set of positive triples and T′
is the set of corrupted triples. When considering the time and
space complexities of each algorithm, we consider n = |E|,
m = |R| and nt to be the number of training triples.</p>
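        <p>The corrupted-triple set T′ is typically produced by replacing the head or the tail of each positive triple with a random entity. The paper does not spell out its exact sampling scheme, so the sketch below shows only the standard heuristic:</p>
        <p>
```python
import random

# Sketch of building the corrupted-triple set T' used in the loss
# L(T, T'; theta): replace the head or the tail of a positive triple
# with a different random entity.
def corrupt(triple, entities, rng):
    h, r, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)

rng = random.Random(0)
entities = [":inst-car", ":inst-scene", ":inst-sub-scene"]
pos = (":inst-scene", "scene:includes", ":inst-car")
neg = corrupt(pos, entities, rng)
# The relation is kept; the head or tail is always changed.
assert neg != pos and neg[1] == pos[1]
```
        </p>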
        <p>The details of each algorithm used are briefly discussed
below.
TransE TransE is considered the most representative of
the translational distance-based class of algorithms. Given
a triple (h, r, t), TransE represents r as a translation vector
from h to t. Hence h + r ≈ t when the triple (h, r, t) holds
true. The scoring function of TransE, fr, is defined as the
negative distance between h + r and t, and when (h, r, t)
holds, fr is expected to be large (equation 1).</p>
        <p>The total score for a given fact is then calculated using
the function fr(h, t), which considers both the compositional
vector and the relation vector:
fr(h, t) = rT(h ⋆ t) = Σi=0..d-1 [r]i Σk=0..d-1 [h]k [t](k+i) mod d    (4)</p>
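        <p>A minimal sketch of equation (4): the circular correlation is computed via FFT (which is what yields the O(ntd log d) time complexity noted below), and the FFT form can be checked against the direct sum. This is an illustration, not the authors' implementation.</p>
        <p>
```python
import numpy as np

# Sketch of the HolE score in equation (4): circular correlation of h
# and t via FFT, followed by a dot product with the relation vector r.
def ccorr(h, t):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)))

def hole_score(h, r, t):
    return float(r @ ccorr(h, t))

# The FFT form matches the direct sum in equation (4).
h = np.array([1.0, 2.0, 3.0]); t = np.array([0.5, 1.0, -1.0])
direct = np.array([sum(h[k] * t[(k + i) % 3] for k in range(3))
                   for i in range(3)])
assert np.allclose(ccorr(h, t), direct)
```
        </p>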
        <p>Due to the use of circular correlation, HolE achieves
O(nd + md) space complexity and O(ntd log d) time
complexity.</p>
      </sec>
      <sec id="sec-2-4">
<title>3.2 Visualizing KG Embeddings</title>
        <p>
          We selected 10 driving segments (including their samples)
from each dataset - NuScenes and Lyft - and created “mini”
KGs to visualize the embeddings in 2-dimensional (2D)
space. After experimenting with both PCA
          <xref ref-type="bibr" rid="ref22">(Wold, Esbensen,
and Geladi 1987)</xref>
          and t-Distributed Stochastic Neighbor
Embedding (t-SNE)
          <xref ref-type="bibr" rid="ref12">(Maaten and Hinton 2008)</xref>
          , t-SNE was
selected for dimensionality reduction as its 2D projections
yielded more meaningful clusters than PCA for the
generated embeddings. The embedding dimension d was set to
100 when generating embeddings for all our experiments.
        </p>
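        <p>The paper projects the d = 100 embeddings to 2D with t-SNE. As a self-contained sketch of the dimensionality-reduction step, here is the PCA baseline the authors compared against, in plain numpy; the random matrix stands in for real entity embeddings:</p>
        <p>
```python
import numpy as np

# Minimal 2D PCA projection of d-dimensional embeddings via SVD.
# (The paper ultimately chose t-SNE over PCA; PCA is sketched here
# because it is self-contained in numpy.)
def project_2d(X):
    Xc = X - X.mean(axis=0)                      # center the vectors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # top-2 principal axes

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))                   # 50 entities, d = 100
Y = project_2d(X)
assert Y.shape == (50, 2)
```
        </p>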
        <p>Our extended Scene Ontology identifies events and
features-of-interests (FoI) as top-level classes in the
ontology, and each instance of event or FoI are linked to Scenes
via the includes relation. FoIs are related to events through
the isParticipantOf relation. Therefore, we first look at how
FoIs and events are manifested in the embedding space for
each dataset.</p>
        <p>NuScenes KG Embeddings Figure 4 shows how the
events and FoIs form clusters in the embedding space based
on their type. For the sake of brevity, only cars and the
events in which they participate are highlighted. From this
visualization, one can see that instances of events such as
stopped car, moving car and parked car are clustered around
the instances of car. The embeddings represented in this
figure are generated from TransE on the “Base KG”.</p>
        <p>Lyft KG Embeddings Figure 5 shows a similar
visualization of the embeddings from the Lyft “Base KG”. This
image shows how the events in Lyft are clustered together
with FoIs. It may be noticed that Lyft contains fewer clusters
than NuScenes. This is the case since the Lyft dataset only
contains annotations for a few FoIs. Similar to what we’ve
seen with NuScenes, Lyft embeddings also show how the
instances of events such as stopped car, parked car and
driving straightforward are clustered around instances of car.</p>
      </sec>
    </sec>
    <sec id="sec-3">
<title>4 Evaluation</title>
      <p>
        The primary objective of our evaluation is to determine how
well the salient features and rich semantics of KGs, such
as type and relation semantics, transfer to the learned
embeddings. Our evaluation deviates from most prior work
evaluating KGE algorithms, which focuses on evaluating the
performance of some extrinsic downstream task (such as entity
classification). Here, we focus more on an intrinsic
evaluation of embeddings.
      </p>
      <p>
        There exists a large body of literature that evaluates the
effect of using KGEs on a downstream task. To our
knowledge, however, there is only one recent work which
introduces metrics to evaluate and quantify the intrinsic quality
of embeddings. Of the metrics introduced in
        <xref ref-type="bibr" rid="ref1">(Alshargi et al.
2019)</xref>
        , we adapt three metrics for our evaluation:
categorization measure, coherence measure, and semantic transition
distance. Figure 6 depicts the four dimensions involved in
our evaluation – i.e. quality metrics, datasets, KGE
algorithms and KGs with varying degrees of informational
detail.
Next we provide a brief overview of each of the evaluation
metrics.
      </p>
      <p>The categorization measure captures how well the
entities that are “typed” by the same background concept cluster
together. For example, all the entities that are typed by the
concept Car share common characteristics
and should be clustered together. Hence this metric is
computed by taking the cosine similarity s(V1, V2) = (V1 · V2) / (|V1||V2|)
of the averaged embedding vector (equation 5) of all such
entities and the embedding vector of the background concept k,
ck (equation 6).
avg(ck) = (1/N) Σi=1..N ei, for the N entities ei typed by ck    (5)
Categorization(ck) = s(avg(ck), ck)    (6)</p>
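      <p>Equations 5 and 6 reduce to a few lines of numpy; a toy sketch (the entity and concept vectors here are illustrative placeholders):</p>
      <p>
```python
import numpy as np

# Sketch of the categorization measure (equations 5 and 6): cosine
# similarity between the averaged embedding of all entities typed by a
# background concept and the embedding of the concept itself.
def cos(v1, v2):
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def categorization(entity_vecs, concept_vec):
    return cos(np.mean(entity_vecs, axis=0), concept_vec)

cars = np.array([[1.0, 0.1], [0.9, -0.1]])   # toy embeddings typed as Car
car_concept = np.array([1.0, 0.0])           # toy embedding of the concept
assert categorization(cars, car_concept) > 0.99
```
      </p>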
      <p>The coherence measure captures whether the adjacent
entities in the embedding space share a common background
concept. In introducing this measure, the authors hypothesise
that in the ideal case, all the entities that are typed by the
same background concept should form a cluster and the
background concept should be the centroid of this cluster.
This is quantified (equation 7) by taking the n closest entities
of the background concept ci and computing the proportion of
them that are actually typed by ci. For our experiments, we
choose n to be 1000.</p>
      <p>Coherence(ci) = #{ei | ei ∈ ci} / n, computed over the n entities nearest to ci    (7)</p>
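      <p>A toy sketch of equation (7); the paper uses n = 1000 and d = 100, while this example uses tiny 2D vectors for readability:</p>
      <p>
```python
import numpy as np

# Sketch of the coherence measure in equation (7): among the n entities
# nearest (by cosine similarity) to a background concept's vector, the
# fraction that are actually typed by that concept.
def coherence(concept_vec, entity_vecs, entity_types, concept, n):
    sims = entity_vecs @ concept_vec / (
        np.linalg.norm(entity_vecs, axis=1) * np.linalg.norm(concept_vec))
    top = np.argsort(-sims)[:n]
    return sum(entity_types[i] == concept for i in top) / n

vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
types = ["Car", "Car", "Human", "Human"]
assert coherence(np.array([1.0, 0.0]), vecs, types, "Car", n=2) == 1.0
```
      </p>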
      <p>The semantic transition distance is a widely used
metric in the word embedding literature and has been re-introduced for KGEs
to capture the relational semantics of KGs. For example,
assume hi is asserted as the domain of the property ri and ti
is asserted as its range. Then, if (hi, ri, ti) is correctly
represented in the embedding space, hi + ri should be close to ti.
This is formally
represented in equation 8, where Tr is the semantic transition distance
of relation r and s denotes cosine similarity.</p>
      <p>Tr(hi + ri, ti) = s(hi + ri, ti)    (8)</p>
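      <p>Equation (8) in a few lines of numpy, as a toy illustration:</p>
      <p>
```python
import numpy as np

# Sketch of the semantic transition distance in equation (8): the
# cosine similarity between h + r and t for an asserted triple.
def transition_distance(h, r, t):
    v = h + r
    return float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t)))

h = np.array([1.0, 0.0]); r = np.array([0.0, 1.0]); t = np.array([1.0, 1.0])
# h + r points exactly at t, so the similarity is maximal (1.0).
assert abs(transition_distance(h, r, t) - 1.0) < 1e-9
```
      </p>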
      <p>Next we report our evaluation results of these three
metrics with respect to each dataset/algorithm.</p>
      <p>Evaluation on the Lyft Dataset Figure 7 summarizes the
results of the categorization measure computed on the
embeddings generated from three algorithms on three KG
versions. It is clear from the figures that TransE performs
better than RESCAL and HolE, and the categorization
quality is mostly better in the KG with more informational
detail (i.e. KG w/ include paths) compared to the other two
variants. In our experimental setting, this
measure is computed considering the top-level concepts (FoIs
and events) in the Scene Ontology.</p>
      <p>The results of the coherence measure, as depicted in
Figure 8, show a similar trend to the categorization measure:
TransE performs better, while both RESCAL and HolE
fail to generate entity clusters with high purity that are close
to the background concept of those entities. It is interesting
to note that the KG with the highest informational detail shows
a significant improvement in the coherence measure on the
embeddings generated from TransE.</p>
      <p>Next we look at how the relational semantics in KGs
are transferred to KGEs by computing the semantic transition
distance for 11 relations defined in the Scene Ontology.
As per Figure 9, KGs with include paths are able to
capture relational semantics better than the other two variants
across all three algorithms. An interesting observation to
note here is that the isPartOf relation performs significantly
better in KGs with include paths across all three algorithms,
even though we only added new includes relations.
A possible explanation could be that the materialized include
paths make the relationship between scenes and sub-scenes
stronger in KGs with the highest informational detail.</p>
      <sec id="sec-3-1">
        <title>Evaluation on the NuScenes Dataset</title>
        <p>The evaluation process on the NuScenes dataset is similar to Lyft. However,
we report that RESCAL was not scalable to the NuScenes
KGs (having 10.8+ million triples and 2.1+ million entities).
Therefore, we evaluate NuScenes KGEs only on TransE and
HolE.</p>
        <p>The results of the categorization measure for the
NuScenes dataset follow a similar trend to Lyft (see
Figure 10). TransE embeddings on the KG w/ include paths yield
the best categorization performance, with the exception of a
few concepts where HolE outperforms on the base KG.
Except for these few outliers, the results show that a higher level
of informational detail in the KG achieves better categorization
irrespective of the KGE algorithm used for training.</p>
        <p>The benefits of informational detail in the KG are portrayed
well in the results of the coherence measure (see Figure 11).
Even though the coherence measure, for many concepts, is
either non-existent or close to zero, the KG w/ include paths
significantly outperforms the other two KG variants for those
values that exist.</p>
        <p>The results of the semantic transition distance for TransE
show consistent patterns similar to Lyft (see Figure 12).
HolE, however, shows that base KG embeddings perform on par with
embeddings trained on the KG w/ include paths.</p>
        <p>The evaluation of KGEs for the AD domain leads to some
interesting observations. We discuss our observations from three
perspectives: (1) the KGE algorithmic perspective, (2)
evaluation measures, and (3) the various levels of KG informational
detail.</p>
        <p>
          First, looking at the overall performance of KGE
algorithms, TransE performs better than RESCAL and HolE in
capturing both type and relational semantics. TransE is also
scalable to large KGs and shows consistent performance
across datasets. In addition to RESCAL’s space and time
complexity, its performance on all three metrics is worse
than TransE and HolE. Even though HolE’s performance is
sub-optimal compared to TransE, it was consistent across
the two datasets and the derived KGs. We hypothesize that
the better performance of TransE on all three metrics is due
to the way it is learning embeddings; i.e. using the
translational distance based scoring function inspired by word
embedding algorithms. The KGE quality metrics introduced by
          <xref ref-type="bibr" rid="ref1">(Alshargi et al. 2019)</xref>
          are inspired by the word embedding
literature. They evaluated these metrics on KGEs generated
from the word embedding-based RDF2Vec
          <xref ref-type="bibr" rid="ref17">(Ristoski and
Paulheim 2016)</xref>
          algorithm. Hence, it may be worth examining
whether these metrics are suitable only for evaluating
embeddings generated from translational distance / word
embedding inspired KGE algorithms.
        </p>
        <p>Second, when considering the evaluation measures used,
we observe that the coherence measure is not very
meaningful for evaluating the quality of KGEs in this domain. The
entities are mostly clustered based on either scenes/sub-scenes
or FoIs/events. Hence, the n most similar entities to a class
(e.g. Human) are mostly not homogeneous, resulting in a zero
or close-to-zero coherence value.</p>
        <p>Third, it has been consistently shown across multiple
datasets and algorithms that KGs with the highest levels of
informational detail are able to capture both type and
relational semantics better than the other two
variants. This discovery leads to an interesting future
direction for research. To the best of our knowledge, all existing
KGE algorithms in the literature are evaluated on base KGs
(i.e. KGs without any inference). Therefore, it stands to
reason that a KG embedding derived from a KG with more
informational detail should capture more of the salient features and
rich semantics of the KG. Such informational detail can be
captured either through pre-processing or by being automatically
extracted by the KGE algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Investigating Semantics of AD Domain</title>
      <p>In the previous section we looked at different quality aspects
of embeddings which are common to KGs in any domain.
Here, we report additional experiments which look at certain
aspects specific to the AD domain and scene understanding.</p>
      <sec id="sec-4-1">
<title>5.1 Scene/sub-scene Relationship</title>
        <p>An understanding of complex AD scenes is an important
task in the AD domain. The ability to distinguish one
complex scene from another requires looking at (1) how the
scene/sub-scene relationship (formally defined by the isPartOf
relation) is captured by the embeddings in different KG
versions, and (2) how well the FoIs and events cluster based on
the scenes/sub-scenes they belong to.</p>
        <p>Figure 13 shows how the scene/sub-scene relationship
evolves from effectively no relation (Figure 13(a)) in the base
KG to a more meaningful one (Figure 13(c)) in the KG with
the highest informational detail. We then examine how the
various levels of informational detail in KGs affect the
clustering of the FoIs and events of a scene based on the
scene/sub-scene relationship.</p>
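<p>For intuition, under the translational TransE model a well-captured isPartOf relation means that translating a sub-scene embedding by the relation vector lands near its parent scene. A toy check with hypothetical vectors (not taken from the trained models):</p>

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: distance between h + r and t (L2 norm).
    Smaller means more plausible. Illustrative only."""
    return float(np.linalg.norm(np.asarray(h) + np.asarray(r) - np.asarray(t)))

# Hypothetical embeddings: sub_scene + isPartOf should land on scene.
sub_scene = [0.25, 0.5]
is_part_of = [0.5, 0.25]
scene = [0.75, 0.75]
unrelated = [0.1, 0.9]

print(transe_score(sub_scene, is_part_of, scene))      # 0.0: plausible
print(transe_score(sub_scene, is_part_of, unrelated))  # larger: implausible
```

<p>A low score for true scene/sub-scene pairs and a high score for arbitrary pairs is one way the "meaningful" relationship in Figure 13(c) can manifest geometrically.</p>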
        <p>Figure 14 shows how the FoI/event-dominant clusters in
the base KG transfer to a clustering based on 10 scenes in
the KG w/ include paths. Interestingly, we can still see small
clusters formed by FoIs/events inside the larger scene
clusters. This suggests that KGs with access to more
informational detail are able to distinguish a scene by both its
participating FoIs/events and its scene/sub-scene
relationships.
</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2 Computing Scene Similarity</title>
        <p>We report preliminary results of using KGEs to compute
scene similarity. Our objective is to determine whether
two scenes are similar by considering only their KGEs. Given
a set of scene pairs, we calculate the cosine similarity
between the KGE vectors of the two scenes in each pair and then
select the pairs with the highest similarity. Figure 15(a)
shows the two most similar sub-scenes when pairs may include
sub-scenes from the same parent scene. On further
investigation, we found that these two sub-scenes are in fact
consecutive frames (or samples) from the same 20-second
driving segment. Figure 15(b) shows the two most similar
sub-scenes when pairs contain only sub-scenes from different
scenes. It is interesting to note that the KGE-based similarity
computation identified two sub-scenes that are not visually
similar but share common characteristics: the black string of
objects in Figure 15(b) (Left) are barriers (Static Objects),
and the orange string of objects in Figure 15(b) (Right) is a
set of stopped cars.
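The pair-selection step described above can be sketched as follows (the sub-scene vectors are hypothetical stand-ins for trained KGE vectors):

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_pair(scene_embeddings):
    """Return the pair of scenes whose KGE vectors have the highest
    cosine similarity."""
    return max(combinations(scene_embeddings, 2),
               key=lambda p: cosine(scene_embeddings[p[0]],
                                    scene_embeddings[p[1]]))

# Hypothetical sub-scene vectors from a trained KGE model.
scenes = {"subscene_a": [0.9, 0.1, 0.2],
          "subscene_b": [0.88, 0.12, 0.19],
          "subscene_c": [0.1, 0.9, 0.4]}
print(most_similar_pair(scenes))   # ('subscene_a', 'subscene_b')
```

Restricting `combinations` to pairs from different parent scenes reproduces the second experimental setting.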
</p>
      </sec>
    </sec>
    <sec id="sec-conclusion">
      <title>6 Conclusion</title>
      <p>In this paper, we presented an evaluation of KGEs for the
autonomous driving domain that considers multiple datasets,
metrics, algorithms, and levels of informational detail. This
evaluation supports the hypothesis that a KG with more
detailed information yields higher-quality KG embeddings
with respect to both type and relational semantics.
Furthermore, the evaluation highlights an important question about
the suitability of the metrics used in the existing literature
for evaluating a wide range of KGE algorithms. Finally, opening
rich areas for future research, we presented an early
investigation into the use of KGEs for two important use-cases from
the AD domain: scene/sub-scene understanding and
computing scene similarity.</p>
    </sec>
    <sec id="sec-5">
      <title>Appendices</title>
      <p>We include detailed evaluation results with respect to each
dataset, evaluation metric, KG, and algorithm in the appendices.
Tables 2, 3, and 4 in Appendix A contain the results for the
Lyft dataset, while Appendix B contains Tables 5, 6, and 7
summarizing the results for the NuScenes dataset.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix A: Evaluation Results of the Lyft dataset</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Alshargi</surname>, <given-names>F.</given-names></string-name>; <string-name><surname>Shekarpour</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Soru</surname>, <given-names>T.</given-names></string-name>; and <string-name><surname>Sheth</surname>, <given-names>A.</given-names></string-name> <year>2019</year>.
          <article-title>Metrics for evaluating quality of embeddings for ontological concepts</article-title>.
          <source>In AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019)</source>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><surname>Bordes</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Usunier</surname>, <given-names>N.</given-names></string-name>; <string-name><surname>Garcia-Duran</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Weston</surname>, <given-names>J.</given-names></string-name>; and <string-name><surname>Yakhnenko</surname>, <given-names>O.</given-names></string-name> <year>2013</year>.
          <article-title>Translating embeddings for modeling multi-relational data</article-title>.
          <source>In Advances in Neural Information Processing Systems</source>, <fpage>2787</fpage>-<lpage>2795</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Caesar</surname>, <given-names>H.</given-names></string-name>; <string-name><surname>Bankiti</surname>, <given-names>V.</given-names></string-name>; <string-name><surname>Lang</surname>, <given-names>A. H.</given-names></string-name>; <string-name><surname>Vora</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Liong</surname>, <given-names>V. E.</given-names></string-name>; <string-name><surname>Xu</surname>, <given-names>Q.</given-names></string-name>; <string-name><surname>Krishnan</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Pan</surname>, <given-names>Y.</given-names></string-name>; <string-name><surname>Baldan</surname>, <given-names>G.</given-names></string-name>; and <string-name><surname>Beijbom</surname>, <given-names>O.</given-names></string-name> <year>2019</year>.
          <article-title>nuScenes: A multimodal dataset for autonomous driving</article-title>.
          arXiv preprint arXiv:1903.11027.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>Chen</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Tian</surname>, <given-names>Y.</given-names></string-name>; <string-name><surname>Yang</surname>, <given-names>M.</given-names></string-name>; and <string-name><surname>Zaniolo</surname>, <given-names>C.</given-names></string-name> <year>2017</year>.
          <article-title>Multilingual knowledge graph embeddings for cross-lingual knowledge alignment</article-title>.
          <source>In Proceedings of the 26th International Joint Conference on Artificial Intelligence</source>, <fpage>1511</fpage>-<lpage>1517</lpage>. AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name><surname>Henson</surname>, <given-names>C.</given-names></string-name>; <string-name><surname>Schmid</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Tran</surname>, <given-names>T.</given-names></string-name>; and <string-name><surname>Karatzoglou</surname>, <given-names>A.</given-names></string-name> <year>2019</year>.
          <article-title>Using a knowledge graph of scenes to enable search of autonomous driving data</article-title>.
          <source>In Proceedings of the 2019 International Semantic Web Conference (ISWC 2019)</source>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name><surname>Kesten</surname>, <given-names>R.</given-names></string-name>; <string-name><surname>Usman</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Houston</surname>, <given-names>J.</given-names></string-name>; <string-name><surname>Pandya</surname>, <given-names>T.</given-names></string-name>; <string-name><surname>Nadhamuni</surname>, <given-names>K.</given-names></string-name>; <string-name><surname>Ferreira</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Yuan</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Low</surname>, <given-names>B.</given-names></string-name>; <string-name><surname>Jain</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Ondruska</surname>, <given-names>P.</given-names></string-name>; <string-name><surname>Omari</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Shah</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Kulkarni</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Kazakova</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Tao</surname>, <given-names>C.</given-names></string-name>; <string-name><surname>Platinsky</surname>, <given-names>L.</given-names></string-name>; <string-name><surname>Jiang</surname>, <given-names>W.</given-names></string-name>; and <string-name><surname>Shet</surname>, <given-names>V.</given-names></string-name> <year>2019</year>.
          <article-title>Lyft Level 5 AV dataset 2019</article-title>. https://level5.lyft.com/dataset/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name><surname>Lin</surname>, <given-names>Y.</given-names></string-name>; <string-name><surname>Liu</surname>, <given-names>Z.</given-names></string-name>; <string-name><surname>Sun</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Liu</surname>, <given-names>Y.</given-names></string-name>; and <string-name><surname>Zhu</surname>, <given-names>X.</given-names></string-name> <year>2015</year>.
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>.
          <source>In Twenty-Ninth AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Litman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Autonomous vehicle implementation predictions: Implications for transport planning</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name><surname>van der Maaten</surname>, <given-names>L.</given-names></string-name>, and <string-name><surname>Hinton</surname>, <given-names>G.</given-names></string-name> <year>2008</year>.
          <article-title>Visualizing data using t-SNE</article-title>.
          <source>Journal of Machine Learning Research</source> <volume>9</volume>(Nov):<fpage>2579</fpage>-<lpage>2605</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name><surname>Myklebust</surname>, <given-names>E. B.</given-names></string-name>; <string-name><surname>Jimenez-Ruiz</surname>, <given-names>E.</given-names></string-name>; <string-name><surname>Chen</surname>, <given-names>J.</given-names></string-name>; <string-name><surname>Wolf</surname>, <given-names>R.</given-names></string-name>; and <string-name><surname>Tollefsen</surname>, <given-names>K. E.</given-names></string-name> <year>2019</year>.
          <article-title>Knowledge graph embedding for ecotoxicological effect prediction</article-title>.
          <source>In International Semantic Web Conference</source>, <fpage>490</fpage>-<lpage>506</lpage>. Springer.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name><surname>Nickel</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Murphy</surname>, <given-names>K.</given-names></string-name>; <string-name><surname>Tresp</surname>, <given-names>V.</given-names></string-name>; and <string-name><surname>Gabrilovich</surname>, <given-names>E.</given-names></string-name> <year>2015</year>.
          <article-title>A review of relational machine learning for knowledge graphs</article-title>.
          <source>Proceedings of the IEEE</source> <volume>104</volume>(<issue>1</issue>):<fpage>11</fpage>-<lpage>33</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name><surname>Nickel</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Rosasco</surname>, <given-names>L.</given-names></string-name>; and <string-name><surname>Poggio</surname>, <given-names>T.</given-names></string-name> <year>2016</year>.
          <article-title>Holographic embeddings of knowledge graphs</article-title>.
          <source>In Thirtieth AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name><surname>Nickel</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Tresp</surname>, <given-names>V.</given-names></string-name>; and <string-name><surname>Kriegel</surname>, <given-names>H.-P.</given-names></string-name> <year>2011</year>.
          <article-title>A three-way model for collective learning on multi-relational data</article-title>.
          <source>In Proceedings of the 28th International Conference on International Conference on Machine Learning</source>, <fpage>809</fpage>-<lpage>816</lpage>. Omnipress.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name><surname>Ristoski</surname>, <given-names>P.</given-names></string-name>, and <string-name><surname>Paulheim</surname>, <given-names>H.</given-names></string-name> <year>2016</year>.
          <article-title>RDF2Vec: RDF graph embeddings for data mining</article-title>.
          <source>In International Semantic Web Conference</source>, <fpage>498</fpage>-<lpage>514</lpage>. Springer.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Talukdar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; et al.
          <year>2018</year>
          .
          <article-title>Towards understanding the geometry of knowledge graph embeddings</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <fpage>122</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name><surname>Sheth</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Gaur</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Kursuncu</surname>, <given-names>U.</given-names></string-name>; and <string-name><surname>Wickramarachchi</surname>, <given-names>R.</given-names></string-name> <year>2019</year>.
          <article-title>Shades of knowledge-infused learning for enhancing deep learning</article-title>.
          <source>IEEE Internet Computing</source> <volume>23</volume>(<issue>6</issue>):<fpage>54</fpage>-<lpage>63</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          (
          <issue>12</issue>
          ):
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name><surname>Wang</surname>, <given-names>H.</given-names></string-name>; <string-name><surname>Zhang</surname>, <given-names>F.</given-names></string-name>; <string-name><surname>Zhao</surname>, <given-names>M.</given-names></string-name>; <string-name><surname>Li</surname>, <given-names>W.</given-names></string-name>; <string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>; and <string-name><surname>Guo</surname>, <given-names>M.</given-names></string-name> <year>2019</year>.
          <article-title>Multi-task feature learning for knowledge graph enhanced recommendation</article-title>.
          <source>In The World Wide Web Conference</source>, <fpage>2000</fpage>-<lpage>2010</lpage>. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name><surname>Wold</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Esbensen</surname>, <given-names>K.</given-names></string-name>; and <string-name><surname>Geladi</surname>, <given-names>P.</given-names></string-name> <year>1987</year>.
          <article-title>Principal component analysis</article-title>.
          <source>Chemometrics and Intelligent Laboratory Systems</source> <volume>2</volume>(<issue>1-3</issue>):<fpage>37</fpage>-<lpage>52</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name><surname>Xiao</surname>, <given-names>H.</given-names></string-name>; <string-name><surname>Huang</surname>, <given-names>M.</given-names></string-name>; and <string-name><surname>Zhu</surname>, <given-names>X.</given-names></string-name> <year>2016</year>.
          <article-title>From one point to a manifold: knowledge graph embedding for precise link prediction</article-title>.
          <source>In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence</source>, <fpage>1315</fpage>-<lpage>1321</lpage>. AAAI Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>